Notes on AI coding workflows: two recent Armin Ronacher videos
Armin Ronacher is quickly becoming one of my favorite sources for AI developer workflow advice.
It started with this video where he uses Claude Code to solve two actual issues in the minijinja templating engine.
Armin is known in the Python world for creating tools like Flask (a web framework), Jinja2 (a templating engine) and more.
This post is a set of notes from two recent videos:
- A 3-hour screencast where Claude tries to build a Sentry clone.
- A talk packed with advice on how to make agentic coding work.
Screencast - Claude Vibe Codes a Sentry clone
This video is 3h23m long, with no audio.
I found it oddly entertaining and ended up skimming through the entire thing.
Notes, with approximate timestamps if you're curious:
- Sentry-clone Claude mostly works on its own for the whole thing; Armin doesn't prompt it much. I believe it's running directly on the host with `--dangerously-skip-permissions` (Armin is using a `claude-yolo` alias).
- 14:00: Armin uses another Claude Code instance to create an `agent-watcher` tool that follows the changes the main agent makes.
- Agent-watcher Claude starts by struggling with `uv init`: "I see it created a subdirectory... Let me move files"[1]. It goes on for quite a while (credit for persistence, I guess): "the actual files I created might be in a different location. Let me find them..."
- 24:26: Armin opens VS Code and sees the AI created both `agent-watcher` and `agent_watcher`... He kills it and runs `rm -rf *` 💥. He opens another Claude Code session and prompts it again, asking it to use `uv` script dependencies and put everything in a single file. This worked a lot better.
- 1:15:28: Fun bit: Armin notices a bug where the agent watcher occasionally prints garbage to the terminal. He prompts Claude with: "the display kinda screwed up. This is what I saw: <pastes broken display>". I love how Claude's response is basically "don't worry, I got you."
- 2:54:50: While Armin is testing on port 3000, Claude opens a browser window on port 3002 in the background using Playwright, then tries to debug the signup process on its own. It does this through the Playwright MCP server (see `.mcp.json` in the repo).
- The Sentry clone doesn't quite work. Claude later makes some progress (it fixes login/signup), but ultimately doesn't succeed in building the Sentry clone.
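The "uv script dependencies" request at 24:26 refers to inline script metadata (PEP 723): a commented TOML block at the top of a single file that `uv run` reads to set up the environment. Here is a minimal sketch of what such a one-file watcher could look like. It is hypothetical, not the tool from the video: it polls file modification times with the standard library only, so its `dependencies` list is empty.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []  # third-party packages would be listed here
# ///
"""Hypothetical single-file "agent watcher": reports files added, removed, or modified."""
import sys
import time
from pathlib import Path


def snapshot(root: Path) -> dict[str, float]:
    """Map every file under root (as a relative path) to its modification time."""
    result = {}
    for path in root.rglob("*"):
        try:
            if path.is_file():
                result[str(path.relative_to(root))] = path.stat().st_mtime
        except FileNotFoundError:
            continue  # deleted between listing and stat; the next poll will notice
    return result


def diff(old: dict[str, float], new: dict[str, float]) -> dict[str, list[str]]:
    """Classify the differences between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
    }


def watch(root: Path, interval: float = 1.0) -> None:
    """Poll root forever, printing one line per change (stop with Ctrl-C)."""
    previous = snapshot(root)
    while True:
        time.sleep(interval)
        current = snapshot(root)
        for kind, paths in diff(previous, current).items():
            for path in paths:
                print(f"{kind}: {path}")
        previous = current


if __name__ == "__main__":
    if len(sys.argv) == 2:
        watch(Path(sys.argv[1]))
    else:
        print("usage: uv run agent_watcher.py <directory>")
```

Running it with `uv run agent_watcher.py .` lets uv read the metadata block; a package can be added to that block with `uv add --script agent_watcher.py <package>`. Keeping everything in one file also avoids the nested-directory confusion `uv init` caused.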
The repository shows that Armin is also using a customized Claude setup (see the `commands/` folder) to have it use "subagents", each with its own context window.
More details in this issue.
Talk - Agentic Coding: The Future of Software Development with Agents
The talk echoes some of the points made in his blog post, and there is a lot packed into it.
General remarks:
- Armin is very excited about this and feels it's a paradigm shift.
- He has had successful results letting Claude Code run for more than 4 hours.
- 100% of the people he knows who are hooked on this run Claude in "yolo mode" all the time[2]. I was surprised to hear Armin say things like "in practice so far it works really well" and "it hasn't destroyed my computer"! I expected people of his skill level to care more about this.
- Claude Code de-emphasizes the role of the editor: you review more and program much less.
- Use cases he mentions: investigating issues, setting up and debugging CI (the agent uses `gh` to create draft PRs), creating a presentation (the one from the video)...
The quality of the dev environment is key:
- It works best with stable languages: Go, PHP, basic Python. Low ecosystem churn works best (breaking changes between library versions don't help LLMs).
- Prefer long function names over namespaces.
- Conflicting patterns in the codebase are not good for agents.
- Log everything to one big file and tell the agent how to read it (a few lines at first, more if needed).
- Speed matters for agent iteration; the agent might kill a tool that is too slow!
- Your dev commands should be helpful when misused (a form of defensive programming). Cool: Go test caching; you run it with no arguments and only the relevant tests run. Not cool: test runners that don't error when a selection matches no tests, since the agent might assume they passed.
- Reduce friction by enabling quick tool creation and execution: tell Claude where it should place its bespoke tools.
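The "helpful when misused" point can be illustrated with a small guard around test selection. This is a sketch using Python's standard `unittest`, not a tool from the talk: a hypothetical `select_tests` helper filters tests by a substring of their id and, when nothing matches, exits with an error that lists the valid test names, so an agent can't mistake an empty run for a pass.

```python
import unittest
from collections.abc import Iterator


def iter_tests(suite: unittest.TestSuite) -> Iterator[unittest.TestCase]:
    """Flatten a nested unittest suite into individual test cases."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def select_tests(suite: unittest.TestSuite, pattern: str) -> unittest.TestSuite:
    """Keep tests whose id contains pattern; exit with an error if none match."""
    all_tests = list(iter_tests(suite))
    selected = [test for test in all_tests if pattern in test.id()]
    if not selected:
        available = "\n  ".join(sorted(test.id() for test in all_tests))
        # A non-zero exit plus the list of valid names tells the agent what went wrong.
        raise SystemExit(f"error: no tests match {pattern!r}. Available tests:\n  {available}")
    return unittest.TestSuite(selected)
```

A hypothetical `make test PATTERN=...` target could build the suite with `unittest.defaultTestLoader.discover("tests")`, pass it through `select_tests`, and run the result with `unittest.TextTestRunner().run(...)`. Listing the available names turns a typo into a one-step fix for the agent.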
MCP is not good for coding agents[3]:
- He finds MCP servers pollute the context, and CLIs just work better (Claude can use them in scripts, which you can also run yourself if needed).
- Armin only uses the Playwright MCP at this time.
Tip - Unified logging:
- Combine `console.log` output, server logs and everything else into one log.
- Patch `console.log` in the browser to forward messages to the server via an API call.
Armin says the agent is pretty good at figuring out the order of logs, even though ordering can be misleading since frontend and backend logs are merged.
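A minimal server-side sketch of the unified logging tip, using Python's standard `logging` and `http.server`. Everything specific here is an assumption for illustration: the `/__log` path, the JSON shape `{"level": ..., "args": [...]}`, and the helper names. The browser half (patching `console.log` to POST its arguments to that endpoint) isn't shown.

```python
import json
import logging
from http.server import BaseHTTPRequestHandler

LOG_FORMAT = "%(asctime)s %(name)s %(levelname)s %(message)s"

# Map browser console methods to logging levels (assumed payload field "level").
CONSOLE_LEVELS = {
    "debug": logging.DEBUG,
    "log": logging.INFO,
    "info": logging.INFO,
    "warn": logging.WARNING,
    "error": logging.ERROR,
}


def setup_unified_logging(path: str) -> logging.Handler:
    """Route every logger (backend and forwarded frontend) into one file."""
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.DEBUG)
    return handler


def record_frontend_log(payload: dict) -> None:
    """Write one forwarded console call under the 'frontend' logger name."""
    level = CONSOLE_LEVELS.get(str(payload.get("level", "log")), logging.INFO)
    message = " ".join(str(arg) for arg in payload.get("args", []))
    logging.getLogger("frontend").log(level, message)


class LogForwardHandler(BaseHTTPRequestHandler):
    """Accepts POST /__log with a JSON body such as {"level": "warn", "args": ["slow", 42]}."""

    def do_POST(self) -> None:
        if self.path != "/__log":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        record_frontend_log(json.loads(self.rfile.read(length)))
        self.send_response(204)
        self.end_headers()
```

Standalone, `ThreadingHTTPServer(("127.0.0.1", 8000), LogForwardHandler).serve_forever()` from `http.server` would serve it; in a real app the same logic would more likely live as a route in the existing web server. Tagging each line with the logger name (`backend`, `frontend`) helps the agent tell the two sources apart in the merged file.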
Multi-process guidance: make it clear which processes should be running, and provide healthcheck endpoints or other easy access to their status.
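One low-tech form of "easy status access" is a status command the agent can run to see which expected processes are up. This sketch only checks that each expected port accepts TCP connections; the process names and ports in `EXPECTED` are hypothetical, and a real setup might instead query each service's healthcheck endpoint.

```python
import socket

# Hypothetical processes the agent should expect; adjust names and ports to your project.
EXPECTED = {"web": ("127.0.0.1", 3000), "api": ("127.0.0.1", 8000)}


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def status_report(expected: dict[str, tuple[str, int]]) -> str:
    """One line per expected process, marking it up or DOWN."""
    lines = []
    for name, (host, port) in expected.items():
        state = "up" if is_listening(host, port) else "DOWN"
        lines.append(f"{name:<8} {host}:{port:<6} {state}")
    return "\n".join(lines)
```

A hypothetical `make status` target could simply `print(status_report(EXPECTED))`. The short timeout keeps the command fast, which matters given how quickly an agent gives up on slow tools.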
Something Armin explores - synchronization points:
Goal: make async operations observable.
Armin has a utility library that the agent can use:
```python
from .agentsupport import reached

# Emit an event when execution gets here
reached(point="event-preprocessing-done")
```

Then a command Claude can run:

```shell
# Block until the point is reached or a timeout expires
make await POINT=event-preprocessing-done
```
This lets the agent start something in the background, do other work, and then wait from the outside until execution reaches this point.
Armin mentions this isn't perfect: he'd like something that runs "in lockstep, like a debugger". I don't fully understand what it enables; the discussion is at 31:06.
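To make the idea more concrete, here is one way such a utility could work. This is purely a guess, not Armin's `agentsupport` library: `reached` drops a marker file named after the point, and a hypothetical `await_point` polls for that file until a timeout. The marker directory and the `AGENT_POINTS_DIR` environment variable are assumptions.

```python
import os
import tempfile
import time
from pathlib import Path

# Assumed shared location so the running program and the CLI agree on where markers live.
POINTS_DIR = Path(os.environ.get("AGENT_POINTS_DIR", os.path.join(tempfile.gettempdir(), "agent-points")))


def reached(point: str) -> None:
    """Record that the program got to a named synchronization point."""
    POINTS_DIR.mkdir(parents=True, exist_ok=True)
    (POINTS_DIR / point).write_text(str(time.time()))


def reset(point: str) -> None:
    """Forget an earlier hit so the next wait only sees a fresh one."""
    (POINTS_DIR / point).unlink(missing_ok=True)


def await_point(point: str, timeout: float = 30.0, poll: float = 0.05) -> bool:
    """Block until point is reached or timeout expires; return whether it was reached."""
    marker = POINTS_DIR / point
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if marker.exists():
            return True
        time.sleep(poll)
    return False
```

A `make await` target could call `await_point` and exit non-zero on timeout, which tells the agent the code never got there. Calling `reset` before triggering the operation matters: otherwise a marker left by an earlier run makes the wait return immediately.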
Preserving context is key:
- Prevent the agent from spelunking: give it tools to navigate the codebase efficiently, to tail the last 20 lines of the combined logs, and so on.
- Consider subtasks/subagents to conserve context.
- Once you need to `/compact`, you've lost: "from that moment, everything is random". Armin aborts and starts from scratch instead.
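A "tail the combined logs" helper can be very small. This sketch (hypothetical helper name `tail`, standard library only) streams the file and keeps only the last `n` matching lines in a bounded `deque`, so the agent gets a short, predictable slice of output instead of the whole file.

```python
from __future__ import annotations

from collections import deque
from pathlib import Path


def tail(path: Path, n: int = 20, grep: str | None = None) -> list[str]:
    """Return the last n lines of a log file, optionally only lines containing grep."""
    with path.open(encoding="utf-8", errors="replace") as f:
        lines = (line.rstrip("\n") for line in f)
        if grep is not None:
            lines = (line for line in lines if grep in line)
        # A bounded deque keeps memory flat even for very large log files.
        return list(deque(lines, maxlen=n))
```

For example, `tail(Path("combined.log"), 20, grep="frontend")` returns only recent frontend lines; the agent can raise `n` when it needs more context, matching the "a few lines, more if needed" advice.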
Example: a `make go-methods` tool that lists all the methods that exist (using grep), saving the AI a lot of time and tokens navigating the codebase.
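`make go-methods` is Go-specific; a comparable command for a Python codebase could use the standard `ast` module instead of grep, which avoids matching `def` inside strings or comments. This is a hypothetical sketch, not from the talk: it prints `path:line name` for every function and method, with methods qualified by their class (nested functions get the same dotted form).

```python
import ast
from collections.abc import Iterator
from pathlib import Path


def _definitions(node: ast.AST, prefix: str = "") -> Iterator[tuple[int, str]]:
    """Yield (line, qualified name) for functions and methods nested under node."""
    for child in ast.iter_child_nodes(node):
        if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            name = prefix + child.name
            if not isinstance(child, ast.ClassDef):
                yield child.lineno, name
            yield from _definitions(child, name + ".")
        else:
            yield from _definitions(child, prefix)


def list_definitions(root: Path) -> list[str]:
    """Return 'path:line name' for every function and method in .py files under root."""
    results = []
    for path in sorted(root.rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError:
            continue  # skip files that don't parse instead of aborting the listing
        for lineno, name in _definitions(tree):
            results.append(f"{path.relative_to(root)}:{lineno} {name}")
    return results
```

A hypothetical `make py-defs` target could run `print("\n".join(list_definitions(Path("."))))`, giving the agent a compact map of the codebase in one call.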
Conclusion
There's something refreshing about the quality of the dev environment being key to the success of agents.
I'm curious to see whether AI coding will bring more focus to improving dev tooling for AI.
It seems like a reasonable investment: worst case scenario, this still benefits your human developers.
Happy hacking 🤖
Footnotes
1. This made Claude very relatable. I always create a project directory before I run `uv init` and end up with nested folders.
2. Running `claude` with the `--dangerously-skip-permissions` flag. My understanding is that Armin means running it outside Docker too, though I'm not sure.
3. Armin just wrote a new piece on scripts over MCP. His take resonates: LLMs are just good with bash.