The Conference
AgentCon Silicon Valley is a free, one-day, in-person conference for developers building with AI agents. This post covers my personal experience of the event and my thoughts on it. It is the second part of a two-part series; see my first post at https://unraveledstrands.com/2026/05/10/agentcon-silicon-valley-2026-part-1/. A major theme of the event was sharing the skills needed to be an agent boss: a builder who creates tools and frameworks to delegate to and control agents.
In this post, I will focus on the talks from the afternoon (PM) session of the conference:
- Lessons from a No-Code Library – Drew Breunig
- Securing Coding Agents: Sandboxes, Guardrails, and Real-World Attacks – Dan Ndombe
- From One-Shot to Agentic: Optimizing Shop Intelligence with DSPy – Kshetrajna Raghavan
- GitHub Agentic Workflows – Peli de Halleux
- Client side Web AI Agents for the agentic internet of the future – Jason Mayes
Lessons from a No-Code Library – Drew Breunig
This talk was probably my favorite of the conference. It addressed the challenges of Spec-Driven Development (SDD) with AI coding—a workflow that is now practiced everywhere in the industry in one form or another. In my team, we use Confluence and Jira as the source of truth, with spec files guiding the agent during implementation.
However, as many others have found, the difficulty is that agents generate code faster than we can review it. Since specs are never 100% perfect, an agent will inevitably make assumptions to fill in the gaps. Without a feedback loop, these “silent decisions” are never documented or tested. This is exactly why code review subagents and concepts like “ultrareview” have become so necessary.
The Framework: The Spec-Driven Development Triangle
Drew Breunig explored this problem while developing whenword, a library containing almost no manual code, only specs and tests. The implementation was left entirely to an agent, which often results in “spec drift” as the code evolves away from the original documentation.
To manage this, Drew introduced the Spec-Driven Development Triangle, which balances Spec, Test, and Code. He uses an LLM as a judge to compare the final implementation against the original spec. The model identifies exactly where the agent filled in gaps or deviated from the requirements, flagging those points for the developer to review.
This approach closely resembles Behavior-Driven Development (BDD). Combining BDD to define the ground truth with an LLM judge to verify the implementation is a practical way to maintain oversight.
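To make the judge step concrete, here is a minimal sketch of how I picture it, not Drew's actual implementation. The `Finding` type, `review_against_spec`, and `toy_judge` are names I made up for illustration; in practice the judge would wrap an LLM call, while here it is a plain function so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """One place where the implementation goes beyond or against the spec."""
    kind: str      # "gap_filled" or "deviation"
    summary: str


# A judge takes (spec_text, code_text) and returns findings.
# Assumption: a real judge would call an LLM; it is injected here so the
# sketch stays runnable without any model or network access.
Judge = Callable[[str, str], List[Finding]]


def review_against_spec(spec: str, code: str, judge: Judge) -> List[Finding]:
    """Run the judge and return findings with deviations listed first."""
    findings = judge(spec, code)
    return sorted(findings, key=lambda f: 0 if f.kind == "deviation" else 1)


def toy_judge(spec: str, code: str) -> List[Finding]:
    """Hypothetical stand-in judge: flags a default the spec never mentions."""
    findings = []
    if "timezone" not in spec and "UTC" in code:
        findings.append(
            Finding("gap_filled", "Code assumes UTC; spec is silent on timezones.")
        )
    return findings
```

Calling `review_against_spec(spec, code, toy_judge)` returns the list a developer would review. Sorting deviations ahead of filled gaps is my own choice, on the theory that contradicting the spec is more urgent than extending it.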

The Tool: plumb
To automate this feedback loop, Drew built plumb. The tool integrates directly into the development workflow via git hooks—specifically pre-commit and post-commit hooks.
When a developer attempts to commit code, the pre-commit hook triggers an LLM analysis of the staged changes and the agent’s conversation history. It identifies any “silent decisions” made during implementation and blocks the commit if there are undocumented changes. Once the developer approves the findings, plumb automatically syncs those decisions back into the specs and tests, ensuring the documentation remains an accurate reflection of the software.
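The blocking behaviour relies on a standard git mechanism: a pre-commit hook that exits with a non-zero status aborts the commit. The sketch below shows that general pattern only. It is not plumb's code or API; `gate_commit` and `toy_analyzer` are hypothetical names, and the analyzer is a trivial stand-in for the LLM analysis.

```python
import subprocess
import sys
from typing import Callable, List


def staged_diff() -> str:
    """Return the staged changes, i.e. the output of `git diff --cached`."""
    result = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def gate_commit(diff: str, find_decisions: Callable[[str], List[str]]) -> int:
    """Return a hook exit code: 0 allows the commit, 1 blocks it."""
    if not diff.strip():
        return 0  # nothing staged, nothing to review
    decisions = find_decisions(diff)
    for decision in decisions:
        print(f"undocumented decision: {decision}", file=sys.stderr)
    return 1 if decisions else 0


def toy_analyzer(diff: str) -> List[str]:
    """Hypothetical stand-in for the LLM analysis: flags added retry logic."""
    if "+    retry" in diff:
        return ["added retry behaviour not covered by the spec"]
    return []
```

Used as an executable `.git/hooks/pre-commit` script, the last line would be `sys.exit(gate_commit(staged_diff(), toy_analyzer))`. Keeping the analyzer as a parameter means the gate logic can be tested without a repository or a model.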
I find it really inspiring how Drew took the philosophical lessons from his whenword experiment and translated them into a pragmatic, usable tool that solves a real engineering bottleneck.
Securing Coding Agents: Sandboxes, Guardrails, and Real-World Attacks – Dan Ndombe
Dan Ndombe is from Docker, and he was there to talk about Docker sandboxes. Providing a secure environment is definitely a very important aspect of agentic workflows and workloads. You are giving the agent autonomy to drive, but there must be guardrails and a safety net. This is similar to my talk at the 2026 Background Agent Summit, where I spoke about using Ona as an environment for agents to run background implementation.
Docker sandboxes are MicroVM-isolated environments. By using a hardware-level hypervisor, each agent gets its own dedicated Linux kernel. This is in contrast to regular Docker containers, which share the host’s kernel: there, a “jailbroken” agent could theoretically escape to the host machine via a kernel exploit.

Features Dan highlighted include:
- Hypervisor Isolation: Each sandbox runs in a lightweight MicroVM (using Apple Hypervisor, Windows WHP, or KVM), isolating the agent from the host processes entirely.
- Network Guardrails: All egress is routed through a proxy that enforces strict domain allow-lists, preventing agents from exfiltrating secrets.
- Private Docker Daemons: The sandboxes include their own private Docker engine, allowing agents to run docker build or docker compose (Docker-in-Docker) without needing dangerous “privileged” access to the host.
I was most interested in the egress proxy. For long-running tasks and agents operating ‘in the wild,’ preventing the agent from having direct access to secrets stored in environment variables or accessible files is going to be extremely relevant and important.
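The core of a domain allow-list is a small check on the request's host. Here is a minimal sketch of that idea. It says nothing about how Docker's proxy is actually implemented, and the domains in `ALLOWED_DOMAINS` are hypothetical examples.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; a real deployment would load this from policy config.
ALLOWED_DOMAINS = {"github.com", "pypi.org", "files.pythonhosted.org"}


def is_egress_allowed(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Allow a request only if its host is an allowed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if not host:
        return False  # no parseable host: deny by default
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Matching on the exact domain or a dot-prefixed suffix matters: a naive substring or `endswith("github.com")` check would also admit look-alike hosts such as `evilgithub.com` or `github.com.attacker.io`.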
Dan Ndombe’s core message provided a powerful summary of the “Agent Boss” era: “An AI agent is only as safe as we want it to be.” It isn’t about taking away the agent’s tools, but about ensuring those tools are used within a secure, kernel-level boundary that the human ‘Boss’ controls.
From One-Shot to Agentic: Optimizing Shop Intelligence with DSPy – Kshetrajna Raghavan
Kshetrajna Raghavan, a Principal Engineer at Shopify, delivered what I found to be the best case study for scaling agents in production. He shared the journey of “Shop Intelligence”—Shopify’s system for extracting structured data from millions of highly customized, non-standard merchant stores.
His talk perfectly illustrated a lesson that I have learnt and am applying at work: being an Agent Boss requires intense pragmatism.
Given enough attempts, compute, and liberty, an advanced AI could probably solve the majority of the technical issues we face. But at what cost? If a task costs more to execute than the value it generates, it ceases to be a practical solution. Are we building Lamborghinis to do the job of a tow truck?

Shopify’s journey highlighted this exact tension:
- The “One-Shot” Wall
Initially, Shopify used single, large API calls to OpenAI models to analyze store content. While this worked for simple cases, it was unsustainable. Not only was the cost of processing millions of stores astronomical, but a single prompt couldn’t handle the messy reality of diverse store layouts.
- Moving to the ReAct Loop
The first major shift was moving from a static “blob” of text to an autonomous agent using a ReAct (Reasoning and Action) loop. Instead of guessing based on a single snapshot, the agent could explore the store, deciding which pages to visit and which data points were missing.
- The Swarm of Sub-Agents
To manage the complexity of exploration, they broke the “do-it-all” agent into a swarm of specialized sub-agents. One agent might focus on brand identity, while another focuses on product categorization. This modularity doubled their precision, but relying on third-party APIs for this many distinct agent interactions was still prohibitively expensive.
- Self-Hosting and Compiling with DSPy
The final, most impressive step was using DSPy to programmatically optimize the entire pipeline so they could bring it in-house. Instead of manually tuning prompts for OpenAI, they treated their workflows as code that could be compiled against specific metrics.

This optimization allowed them to move away from third-party APIs entirely. By spinning up their own H100 clusters and leveraging self-hosted Qwen models, the economics of the system fundamentally changed. The results were incredible:
- 75x Cost Reduction: The combination of an optimized agentic architecture and self-hosted models drove the cost down by a factor of 75, while actually improving data quality.
- Universal Coverage: This efficiency allowed Shopify to scale the system to analyze every single store on the platform, something that would have been financially impossible on a per-token API model.
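For readers who have not built one, the ReAct loop from the second step can be sketched in a few lines. This is a generic illustration under my own assumptions, not Shopify's system and not DSPy code: `toy_policy` stands in for the LLM that decides the next action, and `FAKE_STORE` stands in for a real crawler.

```python
from typing import Callable, Dict, Tuple

# An action is (tool_name, argument); a tool_name of "finish" ends the loop.
Action = Tuple[str, str]
Policy = Callable[[Dict[str, str]], Action]


def react_loop(policy: Policy, tools: Dict[str, Callable[[str], str]],
               max_steps: int = 5) -> Dict[str, str]:
    """Alternate reasoning (policy) and acting (tools) until done or out of budget."""
    observations: Dict[str, str] = {}
    for _ in range(max_steps):
        tool_name, arg = policy(observations)       # reason: pick the next action
        if tool_name == "finish":
            break
        observations[arg] = tools[tool_name](arg)   # act, then record the observation
    return observations


# Hypothetical store pages standing in for a real crawler.
FAKE_STORE = {
    "/": "Welcome to Acme Candles",
    "/about": "Hand-poured soy candles since 2015",
}


def fetch_page(path: str) -> str:
    """Tool: return the text of a page, or an empty string if it does not exist."""
    return FAKE_STORE.get(path, "")


def toy_policy(observations: Dict[str, str]) -> Action:
    """Stand-in for the LLM: visit the home page, then /about, then stop."""
    for path in ("/", "/about"):
        if path not in observations:
            return ("fetch_page", path)
    return ("finish", "")
```

`react_loop(toy_policy, {"fetch_page": fetch_page})` visits both pages and returns what it saw. The `max_steps` budget is where the cost discipline from the talk shows up: every extra step is another model call, so the loop has a hard ceiling.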
Raghavan’s core message was that “architecture compounds.”
Success didn’t come from a single breakthrough, but from the steady evolution from one-shot API calls to a self-hosted swarm of sub-agents, all programmatically optimized for the real world.
I am also applying this approach in my own work. We started using agentic workflows for basic debugging and have since built on those guardrails and architecture to allow agents to handle more complex tasks, like background agents performing code implementation and self-improvement. While we haven’t found a need to self-host models yet, switching to cheaper models (like MiniMax or AWS Nova Lite) for basic tasks has helped manage costs and made the system much more practical for everyday use.
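The model-switching idea boils down to a routing rule in front of the agent. Below is a minimal sketch of such a rule. The tier labels are placeholders rather than real model identifiers, and the `Task` fields and the 20,000-token threshold are assumptions chosen for illustration, not values from my team's setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    needs_code_changes: bool
    context_tokens: int


# Hypothetical tier labels, not real model identifiers.
CHEAP, STRONG = "cheap-tier-model", "frontier-tier-model"


def pick_model(task: Task, token_threshold: int = 20_000) -> str:
    """Send simple, small-context tasks to the cheap tier; everything else to the strong tier."""
    if task.needs_code_changes or task.context_tokens > token_threshold:
        return STRONG
    return CHEAP
```

For example, `pick_model(Task("summarize log", False, 3_000))` returns the cheap tier, while any task that edits code goes to the strong tier regardless of size.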
GitHub Agentic Workflows – Peli de Halleux
I find GitHub Agentic Workflows to be an easy and practical way for engineering teams already using GitHub to adopt agentic workloads. This is especially true for large enterprises, which often have more restrictions and compliance requirements to follow. Also, applying AI in a more controlled environment like a CI/CD process should have less severe consequences when things go wrong than a customer-facing agent would.
The Agent Factory in Action
Peli described a setup that feels a bit wild but makes total sense: AI writing software that writes AI software. In Peli’s Agent Factory, they’ve stopped trying to build one giant, monolithic agent. Instead, they have over 100 little “bite-sized” workflows doing highly specific jobs.
Of course, when you have over 100 agents running around a repo, no human can read all their outputs. Peli’s solution to this is to just add more agents. They use “meta-agents” whose entire job is to watch the other agents and make sure they are behaving.

Something that ties back perfectly to Drew Breunig’s talk: GitHub Agentic Workflows (GH AW) has moved to a spec-only contribution model. This is the whenword experiment playing out at an enterprise scale. If you want to contribute to the Agent Factory, you don’t write code. You write a Markdown file (a spec) that describes exactly what you want the agent to do. The system then “compiles” that natural language into a secure GitHub Action.
It’s an interesting look at the future. Seeing a giant like GitHub bet so heavily on spec-driven contributions makes me think this isn’t just a neat trick; it’s probably the only way we are going to safely manage these systems at scale.
I managed to speak with Peli after his talk to get some extra tips and tricks for improving agentic CI/CD workflows. He recommended checking out qmd (https://github.com/tobi/qmd) as a code wiki. We also briefly discussed how to avoid reward hacking in the CI/CD process—a conversation that eventually inspired me to write my post on
Specs vs. Code for Security
This also made me think a lot about security. In a normal open-source or enterprise repo, you have to read every line of code to make sure someone hasn’t introduced a vulnerability. But with spec-driven development, you’re just auditing the intent. Because the actual implementation is generated within strict, pre-defined guardrails, a spec-driven contribution might actually be way more secure than a human-written one.
Client side Web AI Agents for the agentic internet of the future – Jason Mayes
I caught the final session of the conference, which focused on client-side Web AI agents. The speaker used flight booking as an example. Finding a flight usually requires navigating rigid, static filters—picking dates, checkboxes, and price sliders. The demo showed how a user could instead use voice to surface flights, with the UI changing to fit the user’s intent. This shift to client-side LLMs highlights a few distinct points for running AI in production:
- Latency: Processing the prompt locally on the device’s hardware removes the round-trip delay to cloud APIs.
- Task-Appropriate Models: As the Shopify talk noted, you don’t need a state-of-the-art model for every single task. A webpage doesn’t require a massive, generalized model; it just needs one capable of mapping user intent to specific local functions.
- Production Economics: Moving inference to the client device removes the cloud infrastructure costs of running agents at scale.
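“Mapping user intent to specific local functions” is essentially tool calling with a small registry. The sketch below shows the dispatch side only, using the flight example. In a browser this would be JavaScript; I use Python here purely for illustration. The JSON call shape, `filter_flights`, and the sample data are all my own assumptions, not the demo's code or the WebMCP format.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical flight data the page already has on the client.
FLIGHTS = [
    {"id": "A1", "price": 320, "stops": 0},
    {"id": "B2", "price": 210, "stops": 1},
    {"id": "C3", "price": 450, "stops": 0},
]


def filter_flights(max_price: int, nonstop_only: bool = False) -> List[Dict[str, Any]]:
    """Local tool the page exposes: the same filter the UI controls would apply."""
    return [
        f for f in FLIGHTS
        if f["price"] <= max_price and (not nonstop_only or f["stops"] == 0)
    ]


# Registry of tools the page chooses to expose to the local model.
TOOLS: Dict[str, Callable[..., Any]] = {"filter_flights": filter_flights}


def dispatch(model_output: str) -> Any:
    """Parse a JSON tool call and run it, rejecting tools the page did not expose."""
    call = json.loads(model_output)
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))
```

A spoken request like “nonstop flights under 350” only needs the model to emit `{"tool": "filter_flights", "arguments": {"max_price": 350, "nonstop_only": true}}`. That is a much narrower job than open-ended generation, which is why a small local model can be enough.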
This architecture could change how we approach the specificity paradox in web design. Currently, developers and designers spend time building custom user experiences for every edge case, trying to predict what a user might do next. With native cognitive capability in the browser, websites could simply expose their tools and data structures via protocols like WebMCP, allowing a local model to parse the page and handle the specific workflow a user requests in that moment. This also points to a practical reason behind Google’s open-source strategy with Gemma 4. If lightweight models are going to run natively within the browser environment (like Chrome’s built-in AI architecture), the model weights must live on local consumer devices. Making Gemma open-weight aligns with a framework where rendering and agent orchestration happen entirely on the client side.
Closing thoughts
AgentCon was timely and highly relevant. It’s clear that being an “Agent Boss” will be a mandatory skillset in the AI era. This means taking responsibility for the agent’s environment, whether that’s a secure sandbox, a CI/CD pipeline, or a web browser. Ultimately, our success will be defined by how well we provide the right context and guardrails to turn autonomous actions back into human-led intent.