Notes on AI coding workflows: two recent Armin Ronacher videos

Armin Ronacher is quickly becoming one of my favorite sources for AI developer workflow advice. It started with this video where he uses Claude Code to solve two actual issues in the minijinja templating engine.
Armin is known in the Python world for creating tools like Flask (a web framework), Jinja2 (a templating engine) and more.

This post is a set of notes from two recent videos:

  1. A 3-hour screencast where Claude tries to build a Sentry clone.
  2. A talk packed with advice on how to make agentic coding work.

Screencast - Claude Vibe Codes a Sentry clone

This video is 3h23, no audio. I found it oddly entertaining and ended up skimming through the entire thing.
Notes, with approximate timestamps if you're curious:

  • Sentry-clone Claude mostly works on its own for the whole thing. Armin doesn't prompt it a lot.
    I believe it's running directly on the host with --dangerously-skip-permissions (Armin is using a claude-yolo alias).

  • 14:00: Armin uses another Claude Code to create an agent-watcher tool to follow the changes the main agent makes.

  • Agent-watcher Claude starts by struggling with uv init.

    I see it created a subdirectory... Let me move files. [1]

    It goes on for quite a while; credit for persistence, I guess:

    the actual files I created might be in a different location. Let me find them...

  • 24:26: Armin opens VSCode, sees the AI created agent-watcher and agent_watcher... kills it, rm -rf * 💥.
    Opens another Claude Code and prompts it again, asking it to use uv script dependencies and put everything in a single file.
    This worked a lot better.

  • 1:15:28: Fun bit, Armin notices a bug with the agent watcher occasionally printing garbage to the terminal. He prompts Claude with:

    the display kinda screwed up. This is what I saw: <pastes broken display>

    Claude's response:

    [Screenshot: Claude Code saying 'I ran bash clear']


    Love how Claude is like "don't worry, I got you".

  • 2:54:50: Armin is testing on port 3000, in the background Claude opens a window on port 3002 using playwright. Then it goes and tries to debug the signup process on its own. That's using the playwright MCP (see .mcp.json in the repo).

  • The Sentry clone is not quite working.
    Claude later makes some progress (fixes login/signup), but ultimately does not succeed in building the Sentry clone.
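
For the 24:26 note above: "uv script dependencies" refers to inline script metadata (PEP 723), where a comment block at the top of a single file declares its dependencies and `uv run` sets them up. Below is a minimal sketch of what a one-file watcher could look like, assuming a simple polling approach. This is not Armin's actual agent-watcher; `snapshot`, `diff_snapshots` and `watch` are hypothetical names, and since the sketch only uses the standard library the dependency list is empty.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Hypothetical single-file watcher: reports files that changed under a directory."""
import os
import sys
import time


def snapshot(root):
    """Map each file path under root to (mtime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(before, after):
    """Return sorted lists of (added, removed, modified) paths."""
    added = sorted(p for p in after if p not in before)
    removed = sorted(p for p in before if p not in after)
    modified = sorted(p for p in after if p in before and after[p] != before[p])
    return added, removed, modified


def watch(root, interval=1.0):
    """Poll root forever and print changes as they appear."""
    previous = snapshot(root)
    while True:
        time.sleep(interval)
        current = snapshot(root)
        added, removed, modified = diff_snapshots(previous, current)
        for label, paths in (("+", added), ("-", removed), ("~", modified)):
            for path in paths:
                print(label, path, flush=True)
        previous = current


# Only start watching when a directory is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    watch(sys.argv[1])
```

Saved as, say, `watcher.py`, it could be started with `uv run watcher.py path/to/project`. Third-party packages would go in the `dependencies` list of the header.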

The repository shows that Armin is also using a customized Claude setup (see the commands/ folder) to have it use "subagents" (each gets its own context window). Some more details on this issue.

Talk - Agentic Coding: The Future of Software Development with Agents

The talk echoes some of the points made in his blog post. There is a lot in this talk.

General remarks:

  • Armin is very excited about this and feels it is a paradigm shift.

  • He has had successful results letting Claude Code run for more than 4 hours.

  • 100% of the people he knows who are hooked on this run Claude in "yolo mode" all the time. [2]
    I was surprised to hear Armin say things like "in practice so far it works really well" and "it hasn't destroyed my computer"!
    I expected people of his skill to care more about this.

  • Claude Code de-emphasizes the role of the editor. You mostly review and program much less.

  • Use cases he mentions: investigate issues, setup/debug CI (the agent uses gh to create draft PRs), create a presentation (the one from the video)...

Quality of the dev environment is key:

  • Works best with stable languages, Go, PHP, basic Python. Low ecosystem churn works best (breaking changes in library versions don't help LLMs).

  • Long function names > namespaces.

  • Conflicting patterns in the codebase are not good for agents.

  • Log everything to one big file and tell the agent how to read it (a few lines, and more if needed).

  • Speed matters for agent iteration; the agent might kill a slow tool!

  • Your dev commands should be helpful when misused (~defensive programming).

    Cool: Go test caching. Run with no arguments, and only the relevant tests run.
    Not cool: test runners that don't error when the selection is empty - the agent might assume the tests passed.

  • Reduce friction by enabling quick tool creation and execution: tell Claude where it should place its bespoke tools.
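
The "not cool" case above can be guarded against in a wrapper script. Here is a minimal sketch, assuming tests are selected by a glob pattern on their names. `select_tests`, `main` and the exit code are illustrative choices, not something shown in the talk.

```python
import fnmatch
import sys


class EmptySelectionError(Exception):
    """Raised when a test filter matches nothing."""


def select_tests(available, pattern):
    """Return the test names matching pattern, failing loudly if none match."""
    selected = [name for name in available if fnmatch.fnmatchcase(name, pattern)]
    if not selected:
        known = ", ".join(sorted(available)[:5]) or "(none)"
        raise EmptySelectionError(
            f"pattern {pattern!r} matched 0 of {len(available)} tests; "
            f"known tests include: {known}"
        )
    return selected


def main(argv, available):
    """Entry point for a hypothetical wrapper: returns a process exit code."""
    pattern = argv[1] if len(argv) > 1 else "*"
    try:
        selected = select_tests(available, pattern)
    except EmptySelectionError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 2  # non-zero, so the agent cannot mistake this for a pass
    print(f"running {len(selected)} test(s): {', '.join(selected)}")
    return 0
```

The error message lists a few known test names, so a misused command still tells the agent what to try next.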

MCP is not good for coding agents: [3]

  • He finds MCP servers pollute the context, and CLIs just work better (Claude can use them in scripts, which you can also run yourself if needed).
  • Armin only uses the playwright MCP at this time.

Tip - Unified logging:

  • Combine console.log + server logs + everything else
    Patch console.log in browser to forward to server via API call.
    Armin says the agent is pretty good at figuring out the order of logs (can be misleading since we merge frontend and backend).
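
Here is a minimal sketch of the server side of this idea, assuming every source appends timestamped lines to one shared file and the agent reads only the tail. The file name, line format and function names are assumptions for illustration. The browser side (patching console.log to call the server) is not shown.

```python
import time
from collections import deque

LOG_PATH = "combined.log"  # hypothetical location of the single log file


def append_log(source, message, path=LOG_PATH):
    """Append one line tagged with its source (e.g. 'server', 'browser')."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    # Collapse newlines so one event always stays on one line.
    text = " ".join(str(message).splitlines())
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{stamp} [{source}] {text}\n")


def tail_log(lines=20, path=LOG_PATH):
    """Return the last `lines` lines; the agent can ask for more if needed."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]
```

Tagging each line with its source keeps merged frontend and backend output distinguishable even when timestamps are close together.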

Multi-process guidance: make it clear what processes should be running, provide healthcheck endpoints/easy status access.
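
One way to give the agent easy status access is a single command that reports which expected processes are reachable. Here is a minimal sketch, assuming each process listens on a known local TCP port. The service names and ports are made up, and a real healthcheck endpoint would check more than "the port accepts connections".

```python
import socket

# Hypothetical services and ports; adjust to the project.
EXPECTED = {"frontend": 3000, "api": 8000}


def is_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def status_report(expected=EXPECTED):
    """Return one 'name :port up/DOWN' line per expected process."""
    return [
        f"{name:<10} :{port:<6} {'up' if is_listening(port) else 'DOWN'}"
        for name, port in expected.items()
    ]
```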

Something Armin explores - synchronization points:

Goal: make async operations observable.
Armin has a utility library that the agent can use:

# In application code (Python): emit an event
from .agentsupport import reached
reached(point="event-preprocessing-done")

# Then, from the shell, a command Claude can run (blocks until reached or timeout)
make await POINT=event-preprocessing-done

The agent can have one thing running in the background, do some other work, and then wait from the outside until execution reaches this point.
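
The talk does not show how `reached` or `make await` are implemented. Here is one way such a pair could work, as a sketch under assumptions: `reached` drops a marker file, and the waiting side polls for it until a timeout. The directory name and `wait_for` are hypothetical, and this is not the code from Armin's library.

```python
import os
import tempfile
import time

# Hypothetical location for marker files, one per synchronization point.
SYNC_DIR = os.path.join(tempfile.gettempdir(), "agent-sync-points")


def reached(point, sync_dir=SYNC_DIR):
    """Called by application code: record that `point` was reached."""
    os.makedirs(sync_dir, exist_ok=True)
    with open(os.path.join(sync_dir, point), "w", encoding="utf-8") as handle:
        handle.write(str(time.time()))


def wait_for(point, timeout=30.0, poll=0.1, sync_dir=SYNC_DIR):
    """Called from outside: block until `point` is reached or timeout expires.

    Returns True if the point was reached, False on timeout.
    """
    marker = os.path.join(sync_dir, point)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(marker):
            os.remove(marker)  # consume it so the next wait starts fresh
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A command like `make await` could then be a thin wrapper that calls `wait_for` and exits non-zero on timeout, so the agent sees a clear failure instead of hanging.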

Armin mentions this isn't perfect, he'd like something that runs 'in lockstep, like a debugger'. I don't fully understand what it enables. Discussion is at 31:06.

Preserving context is key:

  • Prevent the agent from spelunking: tools to navigate the codebase efficiently, tail 20 lines of combined logs...
  • Consider sub tasks/sub agents to conserve context.
  • When you need to /compact, you've lost: "from that moment, everything is random". Armin aborts and starts from scratch.

Example: a tool make go-methods. Lists all the methods that exist (with grep), saves the AI a lot of time/tokens navigating.
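
The talk describes this tool as grep-based and does not show it. Here is a comparable sketch in Python, assuming a regular expression over Go method declarations (functions with a receiver) is good enough. The function names and output format are illustrative, and the pattern only handles declarations that start on a single line.

```python
import os
import re

# Matches Go methods such as: func (s *Server) Handle(w http.ResponseWriter) error {
# Group 1 is the receiver type, group 2 the method name.
GO_METHOD = re.compile(
    r"^func\s+\(\s*(?:\w+\s+)?\*?(\w+)(?:\[[^\]]*\])?\s*\)\s+(\w+)\s*\("
)


def methods_in_source(source):
    """Return 'Receiver.Method' for each method declaration in Go source text."""
    found = []
    for line in source.splitlines():
        match = GO_METHOD.match(line)
        if match:
            found.append(f"{match.group(1)}.{match.group(2)}")
    return found


def list_go_methods(root="."):
    """Walk root and return 'path:line: Receiver.Method' entries in file order."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".go"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    match = GO_METHOD.match(line)
                    if match:
                        hits.append((path, number, f"{match.group(1)}.{match.group(2)}"))
    hits.sort()  # by path, then by line number
    return [f"{path}:{number}: {method}" for path, number, method in hits]
```

The compact `path:line: Receiver.Method` output lets the agent jump straight to a definition instead of opening files to look for it.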

Conclusion

There's something refreshing about the quality of the dev environment being key to the success of agents.
I'm curious to see if AI coding will put a focus on improving dev tooling for AI. It seems like a reasonable investment: worst case scenario, this still benefits your human developers.

Happy hacking 🤖

Footnotes

  1. This made Claude very relatable. I always create a project before I uv init and end up with nested folders. ↩

  2. Running claude with the --dangerously-skip-permissions flag. I understand Armin to mean running it outside Docker too, though I'm not sure. ↩

  3. Armin just wrote a new piece on scripts over MCP. His take resonates: LLMs are just good with bash. ↩