Tag: llm

  • Field Report – AgentCon Silicon Valley (Part 2)

    Field Report – AgentCon Silicon Valley (Part 2)

    The Conference

    AgentCon Silicon Valley is a free, one-day, in-person conference for developers building with AI agents. This post covers my personal experience of the event and my thoughts about it. It is the second part of a two-part series; see my first post at https://unraveledstrands.com/2026/05/10/agentcon-silicon-valley-2026-part-1/. A major theme of the event was sharing the skills needed to be an agent boss: a builder who creates the tools and frameworks to delegate to and control agents.

    In this post, I will focus on the talks from the afternoon session of the conference.

    • Lessons from a No-Code Library – Drew Breunig
    • Securing Coding Agents: Sandboxes, Guardrails, and Real-World Attacks – Dan Ndombe
    • From One-Shot to Agentic: Optimizing Shop Intelligence with DSPy – Kshetrajna Raghavan
    • GitHub Agentic Workflows – Peli de Halleux
    • Client side Web AI Agents for the agentic internet of the future – Jason Mayes

    Lessons from a No-Code Library – Drew Breunig

    This talk was probably my favorite of the conference. It addressed the challenges of Spec-Driven Development (SDD) with AI coding—a workflow that is now practiced everywhere in the industry in one form or another. In my team, we use Confluence and Jira as the source of truth, with spec files guiding the agent during implementation.

    However, as many others have found, the difficulty is that agents generate code faster than we can review it. Since specs are never 100% perfect, an agent will inevitably make assumptions to fill in the gaps. Without a feedback loop, these “silent decisions” are never documented or tested. This is exactly why code review subagents and concepts like “ultrareview” have become so necessary.

    The Framework: The Spec-Driven Development Triangle

    Drew Breunig explored this problem while developing whenword, a library containing almost no manual code—only specs and tests. The implementation was left entirely to an agent, which often
    results in “spec drift” as the code evolves away from the original documentation.

    To manage this, Drew introduced the Spec-Driven Development Triangle, which balances Spec, Test, and Code. He uses an LLM as a judge to compare the final implementation against the original spec. The model identifies exactly where the agent filled in gaps or deviated from the requirements, flagging those points for the developer to review.

    This approach closely resembles Behavior-Driven Development (BDD). Combining BDD to define the ground truth with an LLM judge to verify the implementation is a practical way to maintain oversight.

    Image from https://www.dbreunig.com/2026/03/04/the-spec-driven-development-triangle.html

    The Tool: plumb

    To automate this feedback loop, Drew built plumb. The tool integrates directly into the development workflow via git hooks—specifically pre-commit and post-commit hooks.

    When a developer attempts to commit code, the pre-commit hook triggers an LLM analysis of the staged changes and the agent’s conversation history. It identifies any “silent decisions” made during implementation and blocks the commit if there are undocumented changes. Once the developer approves the findings, plumb automatically syncs those decisions back into the specs and tests, ensuring the documentation remains an accurate reflection of the software.
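
    The gating idea can be sketched in a few lines of Python. This only illustrates the flow described above and is not plumb's actual implementation: `Decision`, `gate_commit`, and `toy_judge` are hypothetical names, and the judge is a simple stand-in for the LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Decision:
    """One choice the agent made while implementing (hypothetical type)."""
    summary: str
    documented: bool  # True if the spec or tests already cover this choice


# A judge takes (staged_diff, spec_text) and returns the decisions it found.
Judge = Callable[[str, str], List[Decision]]


def gate_commit(staged_diff: str, spec_text: str, judge: Judge) -> int:
    """Return a hook-style exit code: 0 allows the commit, 1 blocks it."""
    undocumented = [d for d in judge(staged_diff, spec_text) if not d.documented]
    for decision in undocumented:
        print(f"Undocumented decision: {decision.summary}")
    return 1 if undocumented else 0


def toy_judge(staged_diff: str, spec_text: str) -> List[Decision]:
    """Stand-in for the LLM judge: flags a rounding rule the spec never mentions."""
    uses_rounding = "round(" in staged_diff
    spec_covers_it = "round" in spec_text.lower()
    if uses_rounding:
        return [Decision("Durations are rounded to whole minutes", spec_covers_it)]
    return []


if __name__ == "__main__":
    diff = "+    minutes = round(seconds / 60)"
    print(gate_commit(diff, "Format durations as text.", toy_judge))
```

    A git pre-commit hook aborts the commit when it exits with a non-zero status, which is why the sketch returns an exit code rather than raising an exception.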

    I find it inspiring how Drew took the philosophical lessons from his whenword experiment and translated them into a pragmatic, usable tool that solves a real engineering bottleneck.

    Securing Coding Agents: Sandboxes, Guardrails, and Real-World Attacks – Dan Ndombe

    Dan Ndombe is from Docker, and he was there to talk about Docker sandboxes. Providing a secure environment is definitely a very important aspect of agentic workflows and workloads. You are giving the agent autonomy to drive, but there must be guardrails and a safety net. This is similar to my talk at the 2026 Background Agent Summit, where I spoke about using Ona as an environment for agents to run background implementation.

    Docker sandboxes are MicroVM-isolated environments. By using a hardware-level hypervisor, each agent gets its own dedicated Linux kernel. This is in contrast to regular Docker containers, which share the host’s kernel, so a “jailbroken” agent could theoretically escape to the host machine via a kernel exploit.

    Features Dan highlighted include:

    • Hypervisor Isolation: Each sandbox runs in a lightweight MicroVM (using Apple Hypervisor, Windows WHP, or KVM),
      isolating the agent from the host processes entirely.
    • Network Guardrails: All egress is routed through a proxy that enforces strict domain allow-lists, preventing agents
      from exfiltrating secrets.
    • Private Docker Daemons: The sandboxes include their own private Docker engine, allowing agents to run docker build
      or docker compose (Docker-in-Docker) without needing dangerous “privileged” access to the host.

    I was most interested in the egress proxy. For long-running tasks and agents operating ‘in the wild,’ preventing the agent from having direct access to secrets stored in environment variables or accessible files is going to be extremely relevant and important.
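
    To make the allow-list idea concrete, here is a minimal sketch of the kind of check an egress proxy could apply to each outgoing request. It is not Docker's implementation: `ALLOWED_DOMAINS` and `is_egress_allowed` are hypothetical names, and a real policy would come from the sandbox configuration.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; a real policy would come from sandbox configuration.
ALLOWED_DOMAINS = {"github.com", "pypi.org"}


def is_egress_allowed(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Allow a request only if its host is an allowed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    # Match on a dot boundary so "notgithub.com" cannot pass as "github.com"
    return any(host == domain or host.endswith("." + domain) for domain in allowed)


if __name__ == "__main__":
    print(is_egress_allowed("https://api.github.com/repos"))
    print(is_egress_allowed("https://evil.example/steal?token=abc"))
```

    Matching on the full hostname with a dot boundary matters: a plain substring or suffix check would let look-alike hosts such as github.com.evil.example through.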

    Dan Ndombe’s core message provided a powerful summary of the “Agent Boss” era: “An AI agent is only as safe as we want it to be.” It isn’t about taking away the agent’s tools, but about ensuring those tools are used within a secure, kernel-level boundary that the human ‘Boss’ controls.

    From One-Shot to Agentic: Optimizing Shop Intelligence with DSPy – Kshetrajna Raghavan

    Kshetrajna Raghavan, a Principal Engineer at Shopify, delivered what I found to be the best case study for scaling agents in production. He shared the journey of “Shop Intelligence”—Shopify’s system for extracting structured data from millions of highly customized, non-standard merchant stores.

    His talk perfectly illustrated a lesson that I have learned and am applying at work: being an Agent Boss requires intense pragmatism.

    Given enough attempts, compute, and liberty, an advanced AI could probably solve the majority of the technical issues
    we face. But at what cost? If a task costs more to execute than the value it generates, it ceases to be a practical solution. Are we building Lamborghinis to do the job of a tow truck?

    Image from https://cleantechnica.com/2023/12/24/texas-dps-is-less-amused-with-the-model-y-superheavy-than-we-are/

    Shopify’s journey highlighted this exact tension:

    1. The “One-Shot” Wall
      Initially, Shopify used single, large API calls to OpenAI models to analyze store content. While this worked for
      simple cases, it was unsustainable. Not only was the cost of processing millions of stores astronomical, but a single
      prompt couldn’t handle the messy reality of diverse store layouts.
    2. Moving to the ReAct Loop
      The first major shift was moving from a static “blob” of text to an autonomous agent using a ReAct (Reasoning and
      Action) loop. Instead of guessing based on a single snapshot, the agent could explore the store—deciding which pages
      to visit and which data points were missing.
    3. The Swarm of Sub-Agents
      To manage the complexity of exploration, they broke the “do-it-all” agent into a swarm of specialized sub-agents. One
      agent might focus on brand identity, while another focuses on product categorization. This modularity doubled their
      precision, but relying on third-party APIs for this many distinct agent interactions was still prohibitively
      expensive.
    4. Self-Hosting and Compiling with DSPy
      The final, most impressive step was using DSPy to programmatically optimize the entire pipeline so they could bring it
      in-house. Instead of manually tuning prompts for OpenAI, they treated their workflows as code that could be compiled
      against specific metrics. This optimization allowed them to move away from third-party APIs entirely. By spinning up
      their own H100 clusters and leveraging self-hosted Qwen models, the economics of the system fundamentally changed.

    The results were incredible:
    • 75x Cost Reduction: The combination of an optimized agentic architecture and self-hosted models drove the cost down
      by a factor of 75, while actually improving data quality.
    • Universal Coverage: This efficiency allowed Shopify to scale the system to analyze every single store on the
      platform, something that would have been financially impossible on a per-token API model.
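
    The ReAct step in the journey above is easier to picture with a toy loop. This is a sketch under heavy assumptions, not Shopify's system: the store pages, the required fields, and `simple_policy` (which stands in for the model's reasoning step) are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy "store": page name -> facts found on that page.
STORE_PAGES: Dict[str, Dict[str, str]] = {
    "home": {"brand": "Acme Outdoors"},
    "products": {"category": "camping gear"},
    "about": {"founded": "2015"},
}

Action = Tuple[str, Optional[str]]  # ("visit", page) or ("finish", None)


def simple_policy(known: Dict[str, str], unvisited: List[str]) -> Action:
    """Stand-in for the LLM's reasoning: keep exploring while fields are missing."""
    required = {"brand", "category"}
    if required <= known.keys() or not unvisited:
        return ("finish", None)
    return ("visit", unvisited[0])


def react_loop(pages, policy, max_steps: int = 10) -> Dict[str, str]:
    """Alternate reasoning (pick an action) and acting (visit a page, observe)."""
    known: Dict[str, str] = {}
    unvisited = list(pages)
    for _ in range(max_steps):
        action, page = policy(known, unvisited)  # reason
        if action == "finish":
            break
        known.update(pages[page])  # act and observe
        unvisited.remove(page)
    return known


if __name__ == "__main__":
    print(react_loop(STORE_PAGES, simple_policy))
```

    The loop stops as soon as the required fields are filled, so the "about" page is never visited. That early exit is the point of the agentic approach: the agent only pays for the exploration it needs.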

    Raghavan’s core message was that “architecture compounds.”
    Success didn’t come from a single breakthrough, but from the steady evolution from one-shot API calls to a self-hosted swarm of sub-agents, all programmatically optimized for the real world.

    I am also applying this approach in my own work. We started using agentic workflows for basic debugging and have since built on those guardrails and architecture to allow agents to handle more complex tasks, like background agents performing code implementation and self-improvement. While we haven’t found a need to self-host models yet, switching to cheaper models (like MiniMax or AWS Nova Lite) for basic tasks has helped manage costs and made the system much more practical for everyday use.

    GitHub Agentic Workflows – Peli de Halleux

    I find GitHub Agentic Workflows to be an easy and practical way for engineering teams already using GitHub to adopt agentic workloads. This is especially true for large enterprises, which often have more restrictions and compliance requirements to follow. Applying AI in a more controlled environment such as a CI/CD process should also have less severe consequences when things go sour than a customer-facing agent would.

    The Agent Factory in Action

    Peli described a setup that feels a bit wild but makes total sense: AI writing software that writes AI software. In Peli’s Agent Factory, they’ve stopped trying to build one giant, monolithic agent. Instead, they have over 100 little “bite-sized” workflows doing highly specific jobs.

    Of course, when you have over 100 agents running around a repo, no human can read all their outputs. Peli’s solution to this is to just add more agents. They use “meta-agents” whose entire job is to watch the other agents and make sure they are behaving.

    screenshot from https://github.github.com/gh-aw/blog/2026-01-12-welcome-to-pelis-agent-factory/

    One thing ties back perfectly to Drew Breunig’s talk: GH AW has moved to a spec-only contribution model. This is the whenword experiment playing out at an enterprise scale. If you want to contribute to the Agent Factory, you don’t write code. You write a Markdown file (a spec) that describes exactly what you want the agent to do. The system then “compiles” that natural language into a secure GitHub Action.

    It’s an interesting look at the future. Seeing a giant like GitHub bet so heavily on spec-driven contributions makes
    me think this isn’t just a neat trick—it’s probably the only way we are going to safely manage these systems at scale.

    I managed to speak with Peli after his talk to get some extra tips and tricks for improving agentic CI/CD workflows. He recommended checking out qmd (https://github.com/tobi/qmd) as a code wiki. We also briefly discussed how to avoid reward hacking in the CI/CD process—a conversation that eventually inspired me to write my post on

    Specs vs. Code for Security

    This also made me think a lot about security. In a normal open-source or enterprise repo, you have to read every line of code to make sure someone hasn’t introduced a vulnerability. But with spec-driven development, you’re just auditing the intent. Because the actual implementation is generated within strict, pre-defined guardrails, a spec-driven contribution might actually be way more secure than a human-written one.

    Client side Web AI Agents for the agentic internet of the future – Jason Mayes

    Find out more https://github.com/jasonmayes/WebAIAgent

    I caught the final session of the conference, which focused on client-side Web AI agents. The speaker used flight booking as an example. Finding a flight usually requires navigating rigid, static filters—picking dates, checkboxes, and price sliders. The demo showed how a user could instead use voice to surface flights, with the UI changing to fit the user’s intent. This shift to client-side LLMs highlights a few distinct points for running AI in production:

    • Latency: Processing the prompt locally on the device’s hardware removes the round-trip delay to cloud APIs.
    • Task-Appropriate Models: As the Shopify talk noted, you don’t need a state-of-the-art model for every single task. A webpage doesn’t require a massive, generalized model; it just needs one capable of mapping user intent to specific local functions.
    • Production Economics: Moving inference to the client device removes the cloud infrastructure costs of running agents at scale.

    This architecture could change how we approach the specificity paradox in web design. Currently, developers and designers spend time building custom user experiences for every edge case, trying to predict what a user might do next. With native cognitive capability in the browser, websites could simply expose their tools and data structures via protocols like WebMCP, allowing a local model to parse the page and handle the specific workflow a user requests in that moment. This also points to a practical reason behind Google’s open-source strategy with Gemma 4. If lightweight models are going to run natively within the browser environment (like Chrome’s built-in AI architecture), the model weights must live on local consumer devices. Making Gemma open-weight aligns with a framework where rendering and agent orchestration happen entirely on the client side.

    Closing thoughts

    AgentCon was timely and highly relevant. It’s clear that being an “Agent Boss” will be a mandatory skillset in the AI
    era. This means taking responsibility for the agent’s environment—whether that’s a secure sandbox, a CI/CD pipeline,
    or a web browser. Ultimately, our success will be defined by how well we provide the right context and guardrails to
    turn autonomous actions back into human-led intent.

  • Field Report – AgentCon Silicon Valley 2026 (Part 1)

    Field Report – AgentCon Silicon Valley 2026 (Part 1)

    AgentCon Silicon Valley is a free, one-day, in-person conference for developers building with AI agents.

    One of the perks of living in the Bay Area is that every week there is a tech conference worth going to. Last week, I attended AgentCon Silicon Valley at the Computer History Museum in Mountain View, California, which, by the way, is a fantastic venue. Because there was so much great content to digest, I’m breaking my report into two parts to stay within a reasonable “context window” for an article.

    The Conference

    AgentCon 2026 took place on May 4th (yes, there were lots of Star Wars references). It is a small to mid-size conference with two to three concurrent tracks. The key sponsors were:

    The conference was great. The majority (if not all) of the speakers were engineers and developers, and the content was very applicable to my daily work. The full schedule can be found on the event’s page.

    Altogether I went to 9 talks. I am very glad that there were always seats available and that attendance was pretty evenly distributed between concurrent talks.

    These are the talks I attended:

    Part 1 (this entry, morning session)

    • Will The Real Autonomous Agent Please Stand Up – Patrick Chanezon, Dona Sarkar
    • Your agent needs a sandbox, not a desert – Samuel Colvin
    • How to Build Auditable Agents Using Context Graphs – Nyah Macklin

    Part 2 (afternoon session)

    • Agents Don’t Know What They Don’t Know – Rob Zuber
    • Lessons from a No-Code Library – Drew Breunig
    • Securing Coding Agents: Sandboxes, Guardrails, and Real-World Attacks – Dan Ndombe
    • From One-Shot to Agentic: Optimizing Shop Intelligence with DSPy – Kshetrajna Raghavan
    • GitHub Agentic Workflows – Peli de Halleux
    • Client side Web AI Agents for the agentic internet of the future – Jason Mayes

    Will The Real Autonomous Agent Please Stand Up – Patrick Chanezon, Dona Sarkar

    The session was presented by two excellent speakers, and I would have loved to hear them speak longer.

    To open AgentCon, Dona Sarkar shared two key concepts that framed the rest of the day. She spoke about the evolution of becoming an ‘Agent Boss’ and, at the same time, reminded us to check if we’re building real AI innovation or simply settling for ‘faster horses.’ Building faster horses is fine if that frees us up to work on transformative AI. These two thoughts really resonated throughout the sessions.

    “If I had asked people what they wanted, they would have said faster horses.” – Not Henry Ford.

    Dona also explored the various form factors AI will manifest in. Her talk was funny, entertaining, and highly informative.

    Patrick Chanezon

    Unfortunately, due to a work matter, I missed the first half of his talk. However, in the portion I caught, he spoke about the shifting roles of ICs and Managers in the agentic evolution. Doubling down on the theme of “Agent Bosses,” he explained that an IC’s success now depends on how effectively they can oversee their agents. He also referenced a key ACM article suggesting that as AI automates entry-level tasks, the industry must adopt a “preceptor” model to ensure junior developers still gain the critical judgment needed to become the next generation of senior engineers. To be successful in this new landscape, you will need the following skills:

    (taken from https://www.youtube.com/watch?v=0HI3OIi-YJY)

    Your agent needs a sandbox, not a desert – Samuel Colvin

    Samuel Colvin (the creator of Pydantic) introduced Monty, a Rust-based Python interpreter designed for safe agentic use. Unlike CPython, it operates on a “deny-all” security model, starting with zero capabilities. Because it is an interpreter, it boasts microsecond startup times. It’s an ideal tool for basic Python tasks, text manipulation, and math within a secure environment.

    How to Build Auditable Agents Using Context Graphs – Nyah Macklin

    Nyah discussed leveraging Neo4j to build context and memory graphs. This was one of my favorite talks because it was immediately applicable to my current projects. I am so happy to see Neo4j releasing these tools as completely open-source rather than “fauxpen” source.

    If you are serious about becoming an “Agent Boss,” a scalable, distributed context and memory system is a must. Research (such as the CommGPT paper) and practical application consistently show that you can achieve significantly better performance by providing a robust Knowledge Graph and RAG system rather than relying solely on fine-tuning.

    Key Takeaway: The primary advantage of GraphRAG over traditional vector-based RAG is its inherent ability to map and understand complex, interconnected relationships within data. In her talk, she mentioned that while neo4j-labs/agent-memory uses semantic search for core retrieval, it leverages Knowledge Graph structures for organization, deduplication, and context assembly.
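
    A toy example helps show what "semantic search for retrieval, graph for context assembly" can look like. This is a plain-Python sketch and does not use Neo4j or the neo4j-labs/agent-memory API: the graph, `seed_lookup` (a keyword match standing in for semantic search), and `assemble_context` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy knowledge graph: entity -> directly related entities.
GRAPH: Dict[str, List[str]] = {
    "invoice-service": ["billing-db", "payments-team"],
    "billing-db": ["postgres-cluster-2"],
    "payments-team": [],
    "postgres-cluster-2": [],
}


def seed_lookup(query: str, graph: Dict[str, List[str]]) -> List[str]:
    """Stand-in for semantic search: match entities whose name appears in the query."""
    return [name for name in graph if name in query.lower()]


def assemble_context(query: str, graph: Dict[str, List[str]], hops: int = 2) -> List[str]:
    """Expand from the seed entities along graph edges, deduplicating as we go."""
    seen: Set[str] = set()
    ordered: List[str] = []
    frontier = seed_lookup(query, graph)
    for _ in range(hops + 1):
        next_frontier: List[str] = []
        for entity in frontier:
            if entity in seen:
                continue  # deduplication: each entity enters the context once
            seen.add(entity)
            ordered.append(entity)
            next_frontier.extend(graph.get(entity, []))
        frontier = next_frontier
    return ordered


if __name__ == "__main__":
    print(assemble_context("Why is invoice-service slow?", GRAPH))
```

    A vector search alone would likely surface only the invoice service itself. Following the edges also pulls in the database and the cluster it runs on, which is the relationship-aware context the talk emphasized.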

    Closing Thoughts on Part 1

    The morning sessions at AgentCon made one thing clear: we are moving past the “AI as a chatbot” phase and into the “AI as a workforce” era. Becoming an Agent Boss isn’t just a catchy phrase; it’s a fundamental shift in how we think about code, memory, and security. Whether it was the security of Rust-based sandboxes or the structural power of Knowledge Graphs, the bar for “real innovation” is being set higher every day.

    I’m still processing the implications for the future of the engineering profession, but the transition toward an Agentic SDLC is clearly well underway.

    Coming Next in Part 2

    In the next entry, I’ll dive into the remaining six talks from the afternoon tracks, focusing on the practical “how-to” of securing and optimizing agents.

    Here’s what I’ll be covering:

    • Agents Don’t Know What They Don’t Know: Handling uncertainty with Rob Zuber.
    • Lessons from a No-Code Library: Drew Breunig on simplifying complexity.
    • Securing Coding Agents: A deep dive into guardrails and real-world attacks.
    • Optimizing Shop Intelligence: Using DSPy to move beyond one-shot prompts.
    • GitHub Agentic Workflows: How the industry giants are orchestrating agents.
    • Client-side Web AI Agents: The future of the agentic internet.

    Stay tuned—Part 2 will be live shortly!

  • Design Pattern Matters - Level up your Lambda Code (including AI Generated Code) with these 3 patterns

    Design Pattern Matters - Level up your Lambda Code (including AI Generated Code) with these 3 patterns

    3 Essential Design Patterns for Robust AWS Lambda Functions

    When you first start with AWS Lambda, it’s easy to write simple, single-file scripts. But to build robust, enterprise-grade serverless applications, you need to apply proven software design patterns. These patterns help you create code that is testable, maintainable, and scalable.

    This post will explore three essential design patterns—and their common anti-patterns—that will immediately elevate your Lambda functions.


    1. Dependency Injection and the Principle of Separation of Concerns

    Perhaps the most important principle for writing clean Lambda functions is Separation of Concerns. While not a formal design pattern itself, the principle is simple: always separate your core business logic from the Lambda handler code. The pattern we use to achieve this separation is Dependency Injection (DI).

    The Anti-Pattern: Mixing Logic in the Handler

    Developers often write all business logic directly inside the handler, creating the database client and mixing it with validation and event parsing. This makes the code impossible to test without creating complex mock AWS events.

    Python

    # ANTI-PATTERN EXAMPLE
    import boto3

    def lambda_handler(event, context):
      # Dependency is created and used directly inside the handler
      dynamodb_client = boto3.client('dynamodb')

      # Business logic is mixed with event parsing
      user_data = event['detail']
      if not user_data.get("email"):
        raise ValueError("Email is required.")

      # Database interaction is hardcoded
      dynamodb_client.put_item(
        TableName='Users',
        Item={'email': {'S': user_data['email']}}
      )
      return {"status": "User created"}
    
    

    The Pattern: Inject Your Dependencies

    You implement Separation of Concerns by designing your core logic functions to accept their dependencies (like a database client) as arguments. The Lambda handler is then only responsible for creating those dependencies and “injecting” them.

    Python

    # business_logic.py
    # This function is pure, testable, and knows nothing about Lambda.
    def process_user_signup(user_data: dict, db_client):
      if not user_data.get("email"):
        raise ValueError("Email is required.")
      # Same item shape as the anti-pattern example above
      db_client.put_item(
        TableName='Users',
        Item={'email': {'S': user_data['email']}}
      )
      return "User created"

    # --- lambda_handler.py ---
    import boto3
    from business_logic import process_user_signup

    # Initialize client once for reuse
    dynamodb_client = boto3.client('dynamodb')

    def lambda_handler(event, context):
      user_data = event['detail']
      # The dependency is "injected" into the core logic
      result = process_user_signup(user_data, dynamodb_client)
      return {"status": result}
    
    

    With this pattern, you can easily unit-test process_user_signup by passing it a simple dictionary and a mock database client.
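
    As a sketch of what such a test could look like, the snippet below uses MagicMock from the standard library as the mock database client. The core function is repeated so the snippet runs on its own; the Item shape is assumed to match the anti-pattern example, and the test function names are hypothetical.

```python
from unittest.mock import MagicMock


def process_user_signup(user_data: dict, db_client):
    """Copy of the core logic, repeated so this snippet runs on its own."""
    if not user_data.get("email"):
        raise ValueError("Email is required.")
    db_client.put_item(
        TableName="Users",
        Item={"email": {"S": user_data["email"]}},
    )
    return "User created"


def test_signup_writes_user():
    fake_db = MagicMock()  # records calls; no AWS credentials or network needed
    assert process_user_signup({"email": "a@example.com"}, fake_db) == "User created"
    fake_db.put_item.assert_called_once_with(
        TableName="Users", Item={"email": {"S": "a@example.com"}}
    )


def test_signup_rejects_missing_email():
    fake_db = MagicMock()
    try:
        process_user_signup({}, fake_db)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    fake_db.put_item.assert_not_called()  # nothing is written on invalid input


if __name__ == "__main__":
    test_signup_writes_user()
    test_signup_rejects_missing_email()
    print("all tests passed")
```

    Neither test builds a Lambda event or touches AWS, which is the practical payoff of keeping the handler thin.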

    Treat software like a well-run kitchen. Each chef has a single responsibility—like a software component. This is how complex systems deliver a quality product, whether it’s a meal or an application.

    2. The Dispatcher Pattern for Routing Events

    The Anti-Pattern: The if/elif/else Chain

    A single Lambda is often triggered by different event variations from the same source (e.g., a DynamoDB Stream sends INSERT, MODIFY, and REMOVE events). The most common anti-pattern is a long, cumbersome if/elif/else chain in the handler. This is hard to read and brittle to change.

    Python

    # ANTI-PATTERN EXAMPLE
    def lambda_handler(event, context):
      for record in event['Records']:
        event_name = record['eventName']
        if event_name == 'INSERT':
          print("Handling INSERT event...")
          # ... insert logic ...
        elif event_name == 'MODIFY':
          print("Handling MODIFY event...")
          # ... modify logic ...
        elif event_name == 'REMOVE':  # DynamoDB Streams reports deletions as REMOVE
          print("Handling REMOVE event...")
          # ... delete logic ...
        else:
          print("Warning: Unknown event type.")
    
    

    The Pattern: Use a Dictionary as a Dispatcher

    A cleaner approach is to use a dictionary as a “router” to map an event key to a specific handler function. This makes your handler readable and easy to extend.

    Python

    # event_handlers.py
    def handle_insert(record): print("Handling INSERT event...")
    def handle_modify(record): print("Handling MODIFY event...")
    def handle_unknown(record): print("Warning: Unknown event type.")

    # --- lambda_handler.py ---
    from event_handlers import handle_insert, handle_modify, handle_unknown

    # Maps a stream event name to the function that handles it
    EVENT_ROUTER = {
      'INSERT': handle_insert,
      'MODIFY': handle_modify,
    }


    def handle_records(records):
      for record in records:
        event_name = record['eventName']
        # Fall back to handle_unknown for event names not in the router
        handler_func = EVENT_ROUTER.get(event_name, handle_unknown)
        handler_func(record)


    def lambda_handler(event, context):
      handle_records(event['Records'])
      ...
    

    Adding support for deletions (DynamoDB Streams reports these with the event name REMOVE) is now as simple as creating a handle_remove function and adding one line to the EVENT_ROUTER.
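
    Here is a self-contained sketch of that extension. DynamoDB Streams reports deletions with the event name REMOVE, so that is the key used here. The handlers return strings instead of printing so the routing is easy to check, and `handle_remove` is a hypothetical name.

```python
def handle_insert(record): return "insert handled"
def handle_modify(record): return "modify handled"
def handle_remove(record): return "remove handled"  # the new handler
def handle_unknown(record): return "unknown event"


EVENT_ROUTER = {
    "INSERT": handle_insert,
    "MODIFY": handle_modify,
    "REMOVE": handle_remove,  # the one added line
}


def handle_records(records):
    """Route each record and collect the results so they are easy to assert on."""
    return [EVENT_ROUTER.get(r["eventName"], handle_unknown)(r) for r in records]


if __name__ == "__main__":
    print(handle_records([{"eventName": "REMOVE"}, {"eventName": "SOMETHING_ELSE"}]))
```

    Note that handle_records itself did not change: new event types touch only the router table and the new handler.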

    A switchboard (AI generated, probably wrong lol) – routes the conversation to the intended recipients.

    Expanding the Pattern: Handling Logical Outcomes

    The dispatcher pattern isn’t limited to routing based on an event’s type. It’s an even more powerful tool for handling different outcomes from your business logic, such as success, validation errors, or downstream failures. This allows you to create clean, explicit paths for every possible result of an operation.

    The Scenario: A Payment Processing Function

    Let’s imagine a Lambda function that processes a payment. This single operation can have multiple distinct outcomes. A common but messy way to handle this is with a large if/elif/else block directly in the handler. This code can get hard to read and test because the business logic, error handling, and response formatting are all tightly coupled in one place.
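
    For illustration, the messy version might look something like the sketch below. The `charge` function is a hypothetical stand-in for a real payment gateway call, and the status codes mirror the ones used later in this section.

```python
import json


def charge(amount, card_token) -> bool:
    """Hypothetical stand-in for a payment gateway call; declines one test token."""
    return card_token != "tok_declined"


# ANTI-PATTERN: validation, the gateway call, and HTTP formatting in one place
def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    amount = body.get("amount")
    card_token = body.get("card_token")
    if not isinstance(amount, (int, float)) or amount <= 0:
        return {"statusCode": 400, "body": "Amount must be a positive number."}
    elif not card_token:
        return {"statusCode": 400, "body": "Card token is required."}
    elif not charge(amount, card_token):
        return {"statusCode": 502, "body": "Payment provider error"}
    else:
        return {"statusCode": 200, "body": "Payment successful"}
```

    Every new rule or outcome adds another branch here, and the only way to test a business rule is to go through the full HTTP-shaped event.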

    Dispatching Based on Status

    With the dispatcher pattern, we separate these concerns. The core logic function determines the outcome, and the handler dispatches that result to a dedicated function responsible for formatting the response.

    Step 1: Define Outcome-Specific Handlers

    First, create a separate handler for each possible outcome. Their only job is to create the final HTTP response.

    # outcome_handlers.py
    
    def handle_success(result: dict):
      """Handle successful payment."""
      print(f"SUCCESS: Payment processed for transaction ID {result['transactionId']}.")
      ... # code for handling success outcome
      return {"statusCode": 200, "body": "Payment successful"}
    
    def handle_validation_error(error_message: str):
      """Handle validation error."""
      print(f"VALIDATION_ERROR: {error_message}")
      ... # code for handling validation error outcome
      return {"statusCode": 400, "body": error_message}
    
    def handle_gateway_error(error_details: str):
      """Handle Gateway Error"""
      ... # code for handling error
      return {"statusCode": 502, "body": "Payment provider error"}
    
    # The router maps an outcome status to a handler function
    STATUS_ROUTER = {
      'SUCCESS': handle_success,
      'VALIDATION_ERROR': handle_validation_error,
      'GATEWAY_ERROR': handle_gateway_error,
    }
    
    

    Step 2: Define the Core Logic and the Dispatcher Handler

    Next, the process_payment function contains the business rules and uses early returns to exit as soon as a rule fails. The main lambda_handler calls this function and uses the STATUS_ROUTER to dispatch the result.

    # lambda_handler.py
    import json
    from outcome_handlers import STATUS_ROUTER

    # NOTE: payment_gateway is assumed to be your payment provider's client,
    # imported or injected elsewhere; it is not defined in this snippet.

    def process_payment(request_body: dict) -> tuple[str, dict | str]:
      """
      Core business logic that returns a status and a result.
      It uses early returns to handle failures.
      """
      amount = request_body.get('amount')

      # Rule 1: Validate amount exists and is positive
      if not amount or not isinstance(amount, (int, float)) or amount <= 0:
        return ('VALIDATION_ERROR', "Amount must be a positive number.")

      card_token = request_body.get('card_token')

      # Rule 2: Validate card token exists
      if not card_token:
        return ('VALIDATION_ERROR', "Card token is required.")

      # --- All validation passed, proceed to core action ---
      print(f"Charging payment gateway ${amount}...")

      success = payment_gateway.charge(amount, card_token)

      if not success:
        return ('GATEWAY_ERROR', '...')
      return ('SUCCESS', {'transactionId': 'txn_12345'})


    def lambda_handler(event, context):
      """
      Main handler that dispatches work based on the outcome of the payment processing.
      """
      # API Gateway sends body as None when the request has no payload
      body = json.loads(event.get('body') or '{}')
      status, result = process_payment(body)
      # Index directly so an unmapped status fails with a clear KeyError
      handler_func = STATUS_ROUTER[status]
      return handler_func(result)
    
    

    Why This is Better

    This design is better as it provides clear separation of concerns:

    • Business Logic (process_payment): Knows how to validate and process a payment. It knows nothing about HTTP status codes or JSON response bodies.
    • Response Formatting (handle_* functions): Know how to create specific HTTP responses for different outcomes. They know nothing about business logic.
    • Orchestration (lambda_handler): Knows how to connect the two. Its only job is to call the logic and dispatch the result.

    3. Repository and DTOs for Consistent Data Handling

    The Anti-Pattern: Inconsistent Payloads and Duplicated Queries

    In a serverless system, Lambdas communicate via message queues and shared databases. This can lead to data inconsistencies if not managed properly. This pattern uses two techniques to enforce data contracts: one for data moving between services (in-flight) and one for data in your database (at-rest).

    Use Data Transfer Objects (DTOs) for Message Payloads

    The Problem: JSON payloads sent between Lambdas have no enforced structure. If a producer Lambda changes a key name (userId to user_id), the consumer Lambda breaks at runtime.

    The Solution: Define a strict contract using a Data Transfer Object (DTO), implemented as a Python dataclass. This DTO lives in a shared library or Lambda Layer.

    • Producer: Creates a DTO instance and serializes it to JSON.
    • Consumer: Deserializes the JSON back into a DTO instance. This fails immediately if the structure is wrong.
    • Note: There can be multiple consumers and producers sharing the same DTO.

    Python

    # shared/contracts.py
    from dataclasses import dataclass, asdict
    import json
    
    @dataclass
    class UserSignupDTO:
      user_id: str
      email_address: str
    
      def to_json(self): return json.dumps(asdict(self))
    
      @classmethod
      def from_json(cls, s: str): return cls(**json.loads(s))
    
    # In the consumer Lambda:
    # payload = UserSignupDTO.from_json(record['body'])
    # print(f"Processing user: {payload.email_address}")
    
    

    This approach prevents runtime errors from data mismatches, acts as self-documentation, and enables IDE autocompletion.
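
    To see the fail-fast behaviour concretely, here is a minimal sketch that repeats the UserSignupDTO from above and feeds it one well-formed payload and one where the producer renamed a key. The sample values are made up for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class UserSignupDTO:
  user_id: str
  email_address: str

  def to_json(self) -> str:
    return json.dumps(asdict(self))

  @classmethod
  def from_json(cls, s: str) -> "UserSignupDTO":
    return cls(**json.loads(s))


# Producer side: build the DTO and serialize it.
message = UserSignupDTO(user_id="user-123", email_address="a@example.com").to_json()

# Consumer side: a matching payload round-trips cleanly.
payload = UserSignupDTO.from_json(message)

# A payload with a renamed key (userId instead of user_id) is rejected
# as soon as it is deserialized, not somewhere deep in the business logic.
bad_message = json.dumps({"userId": "user-123", "email_address": "a@example.com"})
try:
  UserSignupDTO.from_json(bad_message)
  mismatch_detected = False
except TypeError:
  mismatch_detected = True
```

    Keep in mind that a plain dataclass only checks field names here, not value types. If you need type validation as well, that has to be added separately.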


    Use the Repository Pattern for Database Access

    The Problem: If multiple Lambdas access the same database table, you get duplicated query logic (e.g., the same boto3 call in five functions). Changing the query means updating it everywhere.

    The Solution: Use the Repository Pattern. Create a single class (e.g., UserRepository) that contains all database access logic for that entity.

    • All database queries for a specific table are methods within this single class.
    • Lambdas call methods on the repository object instead of writing raw queries.

    Python

    # shared/database.py
    import boto3
    
    class UserRepository:
      def __init__(
        self, 
        table_name="Users",
        ddb=boto3.resource('dynamodb')
      ):
        self.table = ddb.Table(table_name)
    
      def get_by_id(self, user_id: str):
        response = self.table.get_item(Key={'userId': user_id})
        return response.get('Item')
    
    # In any Lambda function:
    # user_repo = UserRepository()
    # user = user_repo.get_by_id("user-123")
    
    

    This keeps your code DRY (Don’t Repeat Yourself), makes maintenance easy (change logic in one place), and abstracts the database details from your business logic.
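
    Another payoff of routing all access through one class is testability. The sketch below is a variation on the class above, assuming the DynamoDB resource is always passed in rather than defaulted. FakeTable and FakeDynamoDB are hypothetical in-memory stand-ins that mimic only the Table(...) and get_item(...) calls used here, so the example runs without boto3 or AWS credentials.

```python
class FakeTable:
  """Hypothetical in-memory stand-in for a DynamoDB Table object."""

  def __init__(self, items):
    self._items = items  # maps userId -> item dict

  def get_item(self, Key):
    # Mirror the get_item response shape: 'Item' is absent when nothing matches.
    item = self._items.get(Key["userId"])
    return {"Item": item} if item is not None else {}


class FakeDynamoDB:
  """Hypothetical stand-in for the boto3 DynamoDB resource."""

  def __init__(self, tables):
    self._tables = tables

  def Table(self, name):
    return self._tables[name]


class UserRepository:
  def __init__(self, ddb, table_name="Users"):
    self.table = ddb.Table(table_name)

  def get_by_id(self, user_id: str):
    response = self.table.get_item(Key={"userId": user_id})
    return response.get("Item")


fake_ddb = FakeDynamoDB(
  {"Users": FakeTable({"user-123": {"userId": "user-123", "name": "Ada"}})}
)
user_repo = UserRepository(ddb=fake_ddb)

found = user_repo.get_by_id("user-123")    # the stored item
missing = user_repo.get_by_id("user-999")  # None, because 'Item' is absent
```

    Because the Lambdas only ever call get_by_id, swapping the fake for the real resource needs no change in the business logic.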


    Design Patterns Provide a Blueprint for AI

    The great news is that we live in the age of Large Language Models (LLMs). These models understand design patterns, and now that you understand why these patterns are important, you don’t have to implement them from scratch. You can use clever prompting to have an AI partner do the heavy lifting.

    More importantly, this method also prevents “AI code drift.” By consistently instructing an AI to use a specific pattern for a task—like always using the Repository Pattern for database access—you enforce architectural standards across your codebase. This ensures the code remains predictable and maintainable as the project evolves, regardless of who (or which agent/model) writes the prompt.

    Therefore, instead of asking “write me a lambda,” you can now ask:

    Prompt for DI: “Refactor this Python Lambda handler to use dependency injection. Separate the core business logic from the handler and make the DynamoDB client an injectable dependency.”

    Prompt for Dispatcher: “Write a Python Lambda handler that uses the dispatcher pattern to process DynamoDB Stream events. It should have separate functions for ‘INSERT’ and ‘MODIFY’ events and use a dictionary to route them.”

    Prompt for Repository/DTO: “Generate a Python UserRepository class that uses Boto3 to interact with a DynamoDB table named ‘Users’. Also, create a UserDTO dataclass to represent the user payload.”

    Ultimately, understanding design patterns lets you write better prompts and critically evaluate the AI-generated code, making the AI a more effective tool.

  • AI, Code, and Verification: A Simple Trick for Accurate Results

    AI, Code, and Verification: A Simple Trick for Accurate Results

    TLDR

    • LLMs can be terrible at math or at generating responses that require precision.
    • A simple rule is to ask the LLM to generate code to do the math instead of trusting its direct answer. This can be achieved with a simple prompt like –
      When asked to do any calculations or conversions, always generate code and run it instead of generating a response immediately

    Hallucination

    It’s a known problem that AIs “hallucinate,” especially when you need a precise answer – like doing math or counting.

    This was famously exposed when earlier generation LLMs got stumped by ‘gotcha’ questions like, “How many ‘r’s are in strawberry?”, which showed they weren’t really thinking. While most advanced models today have now learned to answer that question correctly, this isn’t necessarily because they’ve learned to reason, but because they have been specifically trained or prompted to patch that obvious flaw.

    Taken from https://www.reddit.com/r/singularity/comments/1enqk04/how_many_rs_in_strawberry_why_is_this_a_very/


    While this shows progress, it also reveals that their accuracy can be a result of targeted training rather than innate computational ability.

    This exact issue resurfaced for me with a more practical, real-world problem – and this is what I am doing now to prevent it!

    Feeling Lazy

    I was debugging an issue in MongoDB and had a seemingly simple task: convert a MongoDB ObjectId, 6616b9157bac1647326e11e1, into a human-readable timestamp.

    For those unfamiliar with MongoDB ObjectIds (including anyone who has used MongoDB without looking closely at them): a MongoDB ObjectId is a 12-byte value whose first 4 bytes are a timestamp. This timestamp represents the number of seconds that have passed since the Unix epoch (January 1, 1970). (see docs)

    The Hallucination

    And… it wasn’t just an answer—ChatGPT delivered it with the full swagger of a lead engineer who’s 100% sure of themselves. It laid out the whole thing step-by-step, explaining the ID format, how it pulled the timestamp, and all that.

    For the ObjectId shown above, the correct answer is 2024-04-10T16:06:45.000Z.

    The timestamp it gave me seemed legit at first since it was the right day. But something felt off; the time seemed to be off by a few hours. Thank goodness I listened to that little voice in my head and ran the conversion myself. Sure enough, ChatGPT was wrong!
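
    Running the conversion yourself takes only a few lines. This is a minimal sketch using just the Python standard library; the function name objectid_to_datetime is my own, not part of any MongoDB driver.

```python
from datetime import datetime, timezone


def objectid_to_datetime(object_id: str) -> datetime:
  """Decode the creation time embedded in a MongoDB ObjectId hex string."""
  if len(object_id) != 24:
    raise ValueError("An ObjectId is 12 bytes, i.e. 24 hex characters")
  # The first 4 bytes (8 hex characters) are a big-endian Unix timestamp in seconds.
  seconds = int(object_id[:8], 16)
  return datetime.fromtimestamp(seconds, tz=timezone.utc)


created_at = objectid_to_datetime("6616b9157bac1647326e11e1")
print(created_at.isoformat())
```

    If you already have pymongo installed, bson.ObjectId(...).generation_time should give you the same timezone-aware value without any manual slicing.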

    Not Just ChatGPT

    Curious, I tried the same prompt with Grok, Gemini, and Claude. The results were a mixed bag of confidently incorrect answers. This experience was a stark reminder that while the most obvious flaws are being patched, the underlying weakness in performing novel, precise conversions still persists.


    The Better Approach: Ask for the Code, Not the Answer

    This brings me to the core lesson I learned from this: instead of asking an LLM for the final answer, ask it to write code to produce the answer. My experience with Cursor was a perfect example. While the answer in its chat was wrong, it also provided a code snippet.

    Always ask for code!

    That code was the correct path. This approach plays to the AI’s strengths, shifting the task from a weak point (calculation) to a strong point (code generation). Ideally, the model would then execute that code in a sandboxed environment to provide a verified result.

    That’s right!

    A Simple Rule

    Here’s a simple rule: if it involves math or a conversion, always ask the LLM to write code.

    Here is a short example of how to do that with a simple prompt –

    When asked to do any calculations or conversions, always generate code and run it instead of generating a response immediately.

    This too works for counting “R”s =)
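
    For the strawberry question, the kind of code you would hope the model writes and runs is about as small as it gets. This is only an illustration of the idea, not the output of any particular model:

```python
word = "strawberry"

# Count occurrences of the letter, ignoring case.
r_count = word.lower().count("r")

print(f"'{word}' contains {r_count} 'r's")  # prints: 'strawberry' contains 3 'r's
```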

  • MCP Version 2025-06-18 Changes: Confused No More!

    MCP Version 2025-06-18 Changes: Confused No More!

    Hey there! In the midst of the Juneteenth holiday break, the Model Context Protocol (MCP) didn’t slow down. In its latest 2025-06-18 specification, MCP introduced significant enhancements to bolster its security posture. I’m especially interested in how these updates directly address a long-standing OAuth vulnerability: the “Confused Deputy” problem. Let’s dive in!

    The Confused Deputy Problem With MCP

    Working with AI agents that connect to various tools can bring new security challenges, particularly the “confused deputy” problem. This issue arises when a system, entrusted with certain permissions, is tricked into misusing that authority, often by directing an action to the wrong target. Here are the main ways this can manifest with MCP:

    Confused Deputy Scenario 1 (The “Wrong Legitimate Server” Mix-up):

    Your agent is a trusted assistant. It has permission to do things, like reading documents from Google Docs. A “confused deputy” happens when your agent tries to do something, but accidentally directs its action (and its granted permissions) to the wrong server, even if that server isn’t malicious.

    Example: Your company has two MCP servers that can read Google Docs:

    • Finance MCP Server: (https://finance.mycompany.com/mcp) – This server is meant for highly sensitive financial documents.
    • HR MCP Server: (https://hr.mycompany.com/mcp) – This server is meant for confidential HR documents.

    Both servers might offer a tool called “Google Doc Reader” with very similar descriptions. Your agent intends to read a sensitive financial report from Google Docs using the Finance MCP Server. However, due to a slight confusion (e.g., similar tool descriptions), your agent might mistakenly try to send the request (and its Google Docs access token) to the HR MCP Server. The HR server, though not malicious, is not authorized to see financial documents, creating a data leak or compliance issue.

    Confused Deputy Scenario 2 (The “Malicious Look-Alike Server” Trick):

    This scenario, highlighted in GitHub Issue #544, focuses on a more direct phishing attempt where a user is tricked into connecting to a malicious server from the start.

    Example: An attacker publishes a seemingly legitimate article or guide titled “MCP Configuration Best Practices from MyCompany Inc.” This guide subtly promotes configuring a malicious MCP server address (e.g., https://financc.mycompany.com/mcp – a typo, or https://mycompany-docs.net/mcp) in your MCP client application.

    • The Deception: The user, believing they are following official guidance, unknowingly configures their MCP client to use the malicious server’s address.
    • OAuth Flow Triggered: When the agent tries to perform its first action, the OAuth authorization flow begins. To the user, everything seems legitimate – the authorization prompts, the scopes requested – because the malicious server is designed to mimic the real one.
    • The Confusion (and Risk): Upon completing the authorization, your MCP client obtains an OAuth access token. The core of the confused deputy problem here is that the user, confused by the deception, has essentially granted legitimate authority (the OAuth token) to the wrong server. Your client then unknowingly sends this legitimate token to the attacker-controlled MCP server. Once the malicious server has your token, it can then use it to exfiltrate your sensitive data from Google Docs or other services your token has access to.

    How MCP Version 2025-06-18 Helps

    The 2025-06-18 MCP update introduces the following changes to address these problems:

    1. MCP Servers as OAuth Resource Servers

    What it means: MCP servers now function as “OAuth 2.0 Resource Servers.” This means their core responsibility is to validate the access tokens presented by MCP clients to determine if a request for a protected resource (like using a tool or accessing data) should be allowed. They are the guardians of their own services.

    How it helps (Exact Example of Validation): This makes the overall security setup much clearer and stronger. When an MCP server receives a request from an agent, it will perform critical checks on the access token provided in that request. Specifically, it will verify:

    • Signature: Is the token genuinely issued by a trusted Authorization Server and has it not been tampered with?
    • Expiration: Is the token still valid, or has its lifespan expired?
    • Issuer (iss claim): Was the token issued by an Authorization Server that this specific MCP server trusts?
    • Audience (aud claim): Was the token explicitly intended for this specific MCP server? (This is where RFC 8707’s resource parameter comes into play, as detailed below.)
    • Scope (scope claim): Does the token grant the necessary permissions (e.g., read:document, write:database, summarize:report) for the particular action the agent is trying to perform on this server?

    By performing these precise validations, the MCP server ensures that only genuinely authorized agents, with tokens specifically issued for it and with the correct permissions, can access its protected tools and data. This dramatically enhances security by ensuring every interaction is rigorously checked against industry-standard security rules.
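
    The checks above (apart from the signature) can be sketched in plain Python. This is a simplified illustration, not the MCP SDK or a full OAuth implementation: it assumes the token’s signature has already been verified by a JWT library and that the claims arrive as a dict. The issuer URL, the function name validate_claims, and the TokenRejected exception are hypothetical.

```python
import time
from typing import Optional

# Hypothetical configuration for the Finance MCP server in the running example.
EXPECTED_AUDIENCE = "https://finance.mycompany.com/mcp"
TRUSTED_ISSUERS = {"https://auth.mycompany.com"}


class TokenRejected(Exception):
  """Raised when a token fails one of the resource-server checks."""


def validate_claims(claims: dict, required_scope: str, now: Optional[float] = None) -> None:
  """Check already-decoded claims; signature verification is assumed done upstream."""
  now = time.time() if now is None else now

  if claims.get("exp", 0) <= now:
    raise TokenRejected("token expired")

  if claims.get("iss") not in TRUSTED_ISSUERS:
    raise TokenRejected("untrusted issuer")

  # 'aud' may be a single string or a list of strings.
  aud = claims.get("aud", [])
  audiences = [aud] if isinstance(aud, str) else list(aud)
  if EXPECTED_AUDIENCE not in audiences:
    raise TokenRejected("token was issued for a different resource")

  # 'scope' is a space-delimited string of granted permissions.
  if required_scope not in claims.get("scope", "").split():
    raise TokenRejected("missing required scope")


finance_token = {
  "iss": "https://auth.mycompany.com",
  "aud": "https://finance.mycompany.com/mcp",
  "exp": 1717603200,
  "scope": "read:document",
}
validate_claims(finance_token, "read:document", now=1717603000)  # passes silently

# The same token minted for the HR server is refused by the Finance server.
hr_token = dict(finance_token, aud="https://hr.mycompany.com/mcp")
try:
  validate_claims(hr_token, "read:document", now=1717603000)
  rejected_reason = None
except TokenRejected as exc:
  rejected_reason = str(exc)
```

    The order of the checks is a design choice; what matters is that a token failing any one of them never reaches the tool.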

    2. MCP Client to Indicate Resource (Using RFC 8707)

    What it means: When your MCP agent asks for permission (an access token) to use a tool on an MCP server, it now must explicitly tell the permission provider (Authorization Server) exactly which resource (MCP server) it plans to talk to.

    How it helps (Directly addresses the “Wrong Server” / Prompt injection Mix-up):

    • Let’s go back to our example. When your agent wants to read a financial document, it asks for a Google Docs access token, but specifically tells the system: “This token is for the Finance MCP Server (https://finance.mycompany.com/mcp) only.”
    • The token then gets a special “audience” tag saying it’s only for finance.mycompany.com/mcp.
    • If your agent then gets confused and accidentally tries to use this token with the HR MCP Server (https://hr.mycompany.com/mcp), the HR server (which also follows these new rules) will check the token. It will see that the token is not meant for it, and reject the request.
    • This prevents the HR server from ever seeing your sensitive financial documents, even if your agent made a mistake in routing.

    Example: Resource Server Payload and Token Parameters

    To make this more concrete, let’s look at how the resource parameter is used in a client’s request and how the aud (audience) claim appears in the access token that the Resource Server (your MCP server) then receives and validates.

    The Client’s Request (Client asking for a Token)

    When your MCP agent needs an access token to interact with, say, the Finance MCP Server, it makes a request to the Authorization Server. This request will include the resource parameter, as mandated by RFC 8707:

    HTTP

    GET /authorize?
      response_type=code
      &client_id=your_mcp_client_id
      &scope=read:document
      &resource=https://finance.mycompany.com/mcp  <-- THIS IS THE KEY PART
      &redirect_uri=https://your_mcp_client/callback
    
    

    (Note: This is a simplified authorization request. A full flow involves exchanging an authorization code for a token.)
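
    On the client side, adding the resource parameter is just one more query-string field. Here is a minimal sketch with the standard library that rebuilds the request above; the authorization endpoint URL is a hypothetical placeholder, and a real client would also include PKCE and state parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the parameter values mirror the example request above.
AUTHORIZATION_ENDPOINT = "https://auth.mycompany.com/authorize"

params = {
  "response_type": "code",
  "client_id": "your_mcp_client_id",
  "scope": "read:document",
  "resource": "https://finance.mycompany.com/mcp",  # RFC 8707 resource indicator
  "redirect_uri": "https://your_mcp_client/callback",
}

# urlencode percent-encodes each value, so the resource URI travels safely
# inside the query string.
authorization_url = f"{AUTHORIZATION_ENDPOINT}?{urlencode(params)}"
print(authorization_url)
```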

    Here, the resource parameter explicitly tells the Authorization Server: “I need a token specifically for the resource located at https://finance.mycompany.com/mcp.” A malicious look-alike server such as “https://finance.mycoy.com/mcp”, trying to impersonate https://finance.mycompany.com/mcp, won’t be able to request a token for that genuine resource.

    The Access Token Payload (What the MCP Server Receives)

    If the Authorization Server supports RFC 8707, it can bind the access token it issues to the requested resource. When that access token is a JSON Web Token (JWT), the binding appears as an aud (audience) claim in its payload, identifying its intended recipient.

    The payload of such an access token (after decoding, as JWT segments are Base64URL-encoded) would look something like this:

    JSON

    {
      "iss": "https://auth.mycompany.com",          // Issuer (the Authorization Server)
      "sub": "user_id_12345",                       // Subject (the user or client using the token)
      "aud": "https://finance.mycompany.com/mcp",   // Audience: THIS MUST MATCH THE RESOURCE SERVER
      "exp": 1717603200,                            // Expiration Time
      "iat": 1717602900,                            // Issued At Time
      "scope": "read:document"                      // Permissions granted
      // ... other claims
    }
    
    

    The Resource Server’s Validation (What your MCP Server Does)

    When the Finance MCP Server (https://finance.mycompany.com/mcp) receives this access token, it performs critical validation steps. As an OAuth Resource Server (and specifically following MCP’s requirements), it must check the aud claim in the token’s payload.

    • If aud is https://finance.mycompany.com/mcp: The token is for this server. The server can proceed to process the request (assuming other validations like signature, expiration, etc., also pass).
    • If aud is https://hr.mycompany.com/mcp (or anything else): The token is not for this server. The Finance MCP Server will reject the request, typically with an “Unauthorized” (401) error, because the token’s audience does not match its own identifier.

    This mechanism is what directly prevents the “confused deputy” problem we discussed, ensuring that even if an agent mistakenly tries to send a token to the wrong server, that server will identify that the token isn’t intended for it and deny access.

    Note-Worthy: Server-Side Elicitation

    The MCP Version 2025-06-18 also included specs for a new MCP capability where servers can initiate requests for more information or to confirm actions during a task.

    What it’s for: While it doesn’t directly solve the confused deputy problem, this allows MCP servers to dynamically clarify ambiguous requests or get your explicit consent before performing critical, sensitive, or ambiguous actions. It can, therefore, act as a crucial safety net by ensuring your actual intent aligns with the agent’s proposed action, providing a vital “human-in-the-loop” checkpoint.

    Important Considerations:

    GitHub discussion emphasizing that no sensitive information should be sent via elicitation
    • NEVER Send Sensitive Info Directly: It’s vital that users never send sensitive data (like passwords, credit card numbers, or PII) through an elicitation prompt. This information is likely to be logged, creating a major security risk. The MCP specification strictly prohibits servers from requesting such data via elicitation.
    • Not for Authentication: Elicitation is not a way for servers to ask for your login credentials. Authentication is handled securely and separately by trusted Identity Providers (IdPs). For scenarios requiring authentication or increased permissions, elicitation might trigger a redirect to a secure browser-based flow (as detailed in the upcoming GitHub Pull Request #475), ensuring sensitive login data never directly passes through the MCP channel.

    Final Notes

    Broader Security & Community Notes

    The MCP update also clarifies general security rules and offers new best practices to help developers build safer MCP systems. (See https://modelcontextprotocol.io/specification/2025-06-18/basic/security_best_practices)

    In terms of adoption of the 2025-06-18 MCP spec, as of June 22nd, pull requests for these changes have been opened against the official SDKs but have not yet been merged. VS Code is reportedly looking into it; see https://github.com/microsoft/vscode/issues/248418.

    It’s also worth noting that the 2025-06-18 MCP specification isn’t publicly mentioned in the cline GitHub repository or in the Cursor forum.

    Your Role in Keeping Things Secure

    While these new MCP features are powerful, your carefulness is still crucial. No technology can completely replace user vigilance.

    Even with these updates, the “confused deputy” problem can still arise if you unknowingly make the initial connection to an MCP server that is truly malicious or an unintended target. The protocol’s security features, like RFC 8707, are designed to prevent the misuse of tokens after they are issued for a specific resource. However, if a user is tricked (e.g., through a convincing phishing attack that directs them to configure a malicious look-alike server address), they might legitimately authorize the wrong server from the outset. This is why:

    • Choose Trusted Servers: Always use MCP servers from reputable sources and meticulously verify their exact URLs.
    • Be Aware: Understand what your agent is doing and what permissions it has.
    • Review Requests: Pay close attention to any questions or confirmations your agent asks you through elicitation, especially for sensitive actions.

    In short, MCP gives you stronger tools, but using them safely means staying aware and making smart choices about who you let your agent interact with.

    Have feedback or want to discuss your experience with MCP 2025-06-18? Leave a comment or reach out!

  • MCP is not a fad, it is certain to happen

    MCP is not a fad, it is certain to happen

    Imagine you are building a factory to manufacture cars. You need complex, specialized components like the engine, or parts that require a multi-step process to manufacture, like the tyres.

    If the engine supplier were to simply dump a pallet of raw, unassembled parts on your factory floor, your car assembly line would have to stop while your workers frantically tried to figure out how to build the engine, test it, and prepare it for installation. Your factory would have to become an expert in engine assembly, a job it was never designed to do.

    Similarly, tyre manufacturing is a multistep process requiring high heat for vulcanization. That’s an environment you won’t want to accommodate in your factory space.

    In both cases, your factory is forced to do the hard, specialized work of preparing raw materials. You will not be efficient.

    AI Applications – An Assembly Of Model, Tools, and Data

    Building an AI application is much like designing a modern assembly line. In both cases, you assemble different components to generate your ultimate product. For today’s AI, this means integrating three key components: the core model (like an LLM), a set of external tools, and a constant flow of data.

    To make this assembly truly functional, models are now trained in “tool calling.” This technique allows the LLM to overcome its static knowledge by recognizing when a query requires help from an external tool—like an API or a database. The model learns to call the necessary tool and integrate its response, transforming itself from a simple text generator into a dynamic agent that can act on live data.

    Agent and the Brain

    However, this powerful feature creates a significant, hidden burden for the developer. While the AI calls the tool, the developer is responsible for building and maintaining the entire environment for it. For every single tool, they must manage its specific dependencies, understand its unique authentication and data formats, and write brittle “glue code” to make it compatible with the main application. This is like designing a chaotic assembly line where every machine needs a different power outlet, its own specialized mechanic, and a unique instruction manual. It is highly inefficient and bloats the core application with third-party complexity.

    This is precisely the problem the Model Context Protocol (MCP) is designed to solve.

    MCP: Solving Complexity Through Standardization and Compartmentalization

    The Model Context Protocol (MCP) solves this problem by simultaneously introducing two powerful concepts: a universal standard for integration (the equivalent of a universal utility port for every workstation, or USB-C for AI apps) and a decoupled architecture where each tool operates in its own self-contained environment. This combination of a standard interface and a compartmentalized structure is what provides the key benefits.

    1. Simplified Integration Through Standardization

    First, MCP provides a common language for how an AI application communicates with any tool. While API specifications have existed before, MCP standardizes the layer above that: the protocol and context an AI agent needs to reliably call a tool, understand its capabilities, and use its output. This dramatically simplifies the initial work of integrating a new tool into the assembly line.

    2. Independent Evolution Through Compartmentalization

    Second, and arguably more powerful, is how MCP forces a clean separation between the application and the tool (or prompt, or resources). An MCP Server is a self-contained application. This means the AI application doesn’t need to know anything about the tool’s internal environment, its programming language, or its software dependencies. All of that complexity is managed entirely within the MCP Server.

    Hidden Complexity

    This creates a clear and powerful division of labor, which brings us back to the assembly line. The engine manufacturer can completely re-design their own factory—using new machines, new software, and new processes. But as long as the final engine they ship has the same standard mounting points and data connectors, your car factory’s assembly line doesn’t need to change at all. The engine can evolve independently.

    Similarly, a tool provider can completely update their service and its dependencies within their own MCP Server. As long as the MCP interface remains consistent, the AI application that calls it requires no modification. This decoupling is what allows for a truly scalable and maintainable ecosystem, where developers can build complex applications by assembling robust, independent components without inheriting their internal complexity.
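
    To make the decoupling idea tangible, here is a toy sketch in Python. It is not the MCP SDK and not the real wire protocol: ToolServer is a hypothetical interface whose two methods are loosely modelled on MCP’s tools/list and tools/call requests, and the unit-converter tool is made up. The point is only that the application code stays the same when the tool’s internals are rebuilt.

```python
from typing import Any, Dict, List, Protocol


class ToolServer(Protocol):
  """Hypothetical stand-in for the standard interface (not the MCP SDK)."""

  def list_tools(self) -> List[str]: ...

  def call_tool(self, name: str, arguments: Dict[str, Any]) -> str: ...


class UnitConverterV1:
  """Original tool implementation: a single hard-coded factor."""

  def list_tools(self) -> List[str]:
    return ["km_to_miles"]

  def call_tool(self, name: str, arguments: Dict[str, Any]) -> str:
    return f"{arguments['km'] * 0.621371:.2f}"


class UnitConverterV2:
  """Reworked internals (lookup table plus validation), same interface."""

  _FACTORS = {"km_to_miles": 0.621371}

  def list_tools(self) -> List[str]:
    return sorted(self._FACTORS)

  def call_tool(self, name: str, arguments: Dict[str, Any]) -> str:
    if name not in self._FACTORS:
      raise ValueError(f"unknown tool: {name}")
    return f"{arguments['km'] * self._FACTORS[name]:.2f}"


def run_app(server: ToolServer) -> str:
  """The application only knows the interface, never the internals."""
  tool = server.list_tools()[0]
  return server.call_tool(tool, {"km": 10})


result_v1 = run_app(UnitConverterV1())
result_v2 = run_app(UnitConverterV2())
```

    Both calls go through the same run_app function, which is the assembly line in the analogy: the supplier changed its factory, the mounting points stayed the same.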

    MCP is not a fad, it is certain to happen

    This brings us to a concluding thought, one that could serve as the title for this entire post: The Model Context Protocol is not a temporary fad; it is an evolutionary concept for building complex systems.

    The reason for this is simple: if MCP didn’t exist, something like it would have to be invented. The principles it embodies—compartmentalization and specialization—are fundamental to solving complexity. We see this pattern repeated everywhere, in both natural and man-made systems.

    Complex systems – Life, Software, Trade

    We see this most profoundly in biology, a system that has been self-optimizing for millions of years. Evolution itself discovered that the most robust path to creating complex organisms was through compartmentalization: specialized cells form tissues, and tissues form organs. Each component hides its immense internal complexity, communicating and collaborating through standardized biological and chemical signals.

    We see it in modern software architecture, where developers have moved from monolithic applications to microservices—small, independent services that communicate through standardized APIs, allowing each one to evolve without breaking the entire system. We even see it in global trade with the invention of the simple shipping container. Before this standard interface, logistics were a nightmare of custom work. The container allowed the entire global system of ships, cranes, and trucks to specialize and scale.

    In every case, a standard interface enables a clear division of labor, allowing a system to grow in sophistication without collapsing under its own weight.

    MCP is the application of this universal, time-tested principle to the assembly line of AI. This is not theoretical. The value of this compartmentalized approach is why major industry players like Google, Microsoft, OpenAI, and Anthropic are rapidly adopting MCP.

    Credits: https://x.com/sundarpichai/status/1906484930957193255

    All That Glitters Is Not Gold

    While the Model Context Protocol (MCP) presents a compelling vision, its initial design appears to prioritize functionality at the expense of a robust security framework. The current specification leaves key implementation details ambiguous, and it mandates OAuth 2.1 for authorization, a protocol that has yet to see wide industry adoption. Furthermore, researchers have identified critical risks, including prompt injection, the potential exposure of credentials from MCP servers, and supply chain attacks via malicious third-party tools. As the ecosystem evolves to mitigate these threats, security must remain a paramount consideration for developers adopting the protocol.

    P.S. This is a high-level opinion on MCP. Stay tuned for future articles with actual technical examples and descriptions!