My Take on Ralph Loop

Konstantin Burkalev
March 4, 2026
Hi there! These days it seems like every developer and their dog is talking about AI coding — Claude, Cursor, and countless other tools. Most of us are already using them in one form or another. But there is one specific approach that has been gaining a lot of traction over the past few months: the Ralph Loop. I want to share my experience with it — what it is, how I use it, and what I’ve built around it.

What Is a Ralph Loop?

I first came across the concept through the original article by Geoffrey Huntley — I highly recommend reading it if you haven’t already. But here’s the gist for those short on time.

We all know that AI coding agents tend to be… lazy. Give them a large, complex task and they’ll start analyzing, maybe write some code, and then at some point tell you: “This is getting complicated, you’ll also need to do X, Y, and Z over here, and this other part needs rethinking too…” — and that’s it. Session over.

The Ralph Loop idea is straightforward: instead of relying on a single agent session to do everything, you run the agent in a loop. First, you generate a detailed plan — broken down into user stories, each with clear acceptance criteria — and store it inside your repository. Since each agent session is essentially stateless, this plan file becomes the persistent context. The agent reads it at the start of every iteration, sees which stories are done and which one to tackle next, does the work, updates progress, and exits. Then the loop picks the next story and repeats. When all stories are complete, the loop stops. Task done.
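To make the mechanics concrete, here is a toy version of the loop as a shell script. Everything in it is illustrative: the plan is a markdown checklist, and `run_agent` is a stub — a real loop would start a fresh, stateless agent session (Claude Code, OpenCode, etc.) with a prompt telling it to read the plan and complete the next open story.

```shell
#!/bin/sh
# Toy Ralph Loop over a markdown checklist of user stories.
PLAN="plan.md"

cat > "$PLAN" <<'EOF'
- [ ] US-1: add the users table migration
- [ ] US-2: implement POST /users
- [ ] US-3: cover POST /users with tests
EOF

run_agent() {
    # Stand-in for one stateless agent session: check off the first
    # open story, exactly as the real agent would after finishing it.
    awk '!done && sub(/\[ \]/, "[x]") { done = 1 } 1' "$PLAN" > "$PLAN.tmp" \
        && mv "$PLAN.tmp" "$PLAN"
}

i=0
while grep -q '\[ \]' "$PLAN"; do       # any stories still open?
    i=$((i + 1))
    echo "iteration $i: working on the next open story"
    run_agent                           # one fresh session per story
done
echo "all $i stories done"
```

The plan file is the only state shared between iterations — each session reads it, does one story’s worth of work, updates it, and exits.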

Finding the Right Tooling

When I first got interested in the Ralph Loop, I naturally went looking for existing tools rather than building my own. And there’s no shortage of options — npm alone has a million packages for this. Anthropic’s Claude even has a plugin for it in their marketplace. But none of them worked quite the way I wanted.

So, as developers often do, I decided to build something myself. I found a decent starting point — a shell-script-based project called snarktank/ralph. A simple shell script felt like exactly the right level of complexity: no npm/py/go/cargo dependencies, no build steps, just drop it into your PATH and go. The result is ralphy — my extended version with automated planning, git worktree support, session management, and a terminal dashboard. More on all of that below.

What I Changed

The core value of any Ralph Loop setup lives in the skills — the prompts that guide the agent through planning, implementation, verification, and committing code. In the original project I forked from, most of the process was manual: you had to set things up, generate the plan, convert it, and kick off the loop yourself.

I’m too lazy for that — especially if I’m doing it multiple times a day. So I automated the whole flow. Now you just run the script, which opens an interactive agent session. In the prompt you type something like “use PRD skill to work on …” and describe the task you want to tackle. The agent uses the PRD skill to generate a detailed plan — a PRD broken into user stories — and you can discuss it right there: adjust priorities, refine scope, add details. Once you’re happy with the plan, you simply end the session. No need to manually start the next step. The script automatically invokes another skill to convert the PRD into a JSON progress file, creates a git worktree for the task, and kicks off the Ralph Loop in it.
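For illustration, a JSON progress file in that spirit might look something like this — the field names here are made up for the example, not ralphy’s actual schema:

```json
{
  "task": "user-auth",
  "branch": "ralph/user-auth",
  "stories": [
    { "id": "US-1", "title": "Add users table migration", "status": "done" },
    { "id": "US-2", "title": "Implement signup endpoint", "status": "in_progress" },
    { "id": "US-3", "title": "Cover signup with tests", "status": "open" }
  ]
}
```

A structured file like this is easier for the loop to query mechanically (which story is next, are we done) than free-form markdown.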

The one rough edge that remains is that very first prompt: you still have to manually type “use PRD skill” at the beginning. I haven’t found a way around this yet. It would be great if tools like Claude Code or OpenCode supported pre-filling the prompt in an interactive session — they do allow running a fully defined command non-interactively (via -c), but there’s no option to open an interactive session with a pre-populated prompt. Hopefully that will be addressed at some point.

Parallel Sessions with Git Worktrees

Working on a single task in a loop is simple enough — create a branch and go. But the real beauty of the Ralph Loop approach is that you can run multiple tasks in parallel. The problem is obvious: you can’t have several agents modifying the same working directory on different branches simultaneously.

The naive solution is to clone the repo multiple times, but that means duplicating all dependencies, installing everything separately, and constantly switching between directories and repositories. That felt clunky.

So I went with git worktree instead. It lets you have multiple working trees from a single repository, each checked out to a different branch in a separate folder. My script uses worktrees by default: it creates one next to the original repo, named after the task’s short identifier, and all work happens there. Clean, isolated, no duplication. But I left an option to use in-place branching if needed.
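The underlying git commands are simple. This runnable demo sets up a throwaway repo first so it works as-is; in practice you would run just the two `git worktree` commands from your real checkout, and the `US-42` task identifier is of course illustrative.

```shell
#!/bin/sh
# Throwaway repo for demonstration purposes only.
tmp=$(mktemp -d) && cd "$tmp"
git init -q main && cd main
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

# One extra working tree per task, on its own branch, in a sibling
# folder named after the task's short identifier.
git worktree add -q ../US-42 -b ralph/US-42

branch=$(git -C ../US-42 rev-parse --abbrev-ref HEAD)
echo "worktree ../US-42 is on $branch"   # -> ralph/US-42

# After merging, clean up the tree and the branch.
git worktree remove ../US-42
git branch -q -d ralph/US-42
```

All worktrees share one object database, so there is no duplication of history — only the checked-out files exist twice.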

Lessons from Two Months of Use

I’ve been using the Ralph Loop approach regularly for almost two months now, and it works remarkably well.

The main challenge, though, isn’t the tooling — it’s writing good prompts. For simple tasks like “write tests for this function”, there isn’t much to specify. But for larger features, the quality of your initial task description makes or breaks the result.

You need to think architecturally: what patterns to use, which libraries, how to structure the code, what the API should look like, how components interact. The more detail you put into the prompt, the less disappointed you’ll be with the output. The skill shifts from writing code to formulating requirements, specifying architecture, and designing the solution — writing code is really just the final step. And that shift is arguably the harder part.

That said, don’t go to the other extreme either. Writing “build me a food delivery app” and expecting production-ready code is… optimistic. The agent will happily generate something, but the result will likely leave you underwhelmed.

Not for Every Task — and Not for Every Developer

I think the Ralph Loop approach is best suited for substantial, multi-step tasks: a large-scale refactoring, building a complete CRUD API, designing a user management system, and so on. Not just adding a button or a small endpoint that reads from a database — for those, a simple prompt to any agent is more than enough. Ralph Loop shines when the task involves multiple interconnected pieces: a group of API endpoints, proper data types, data models, database migrations, tests — many steps that need to be planned and executed coherently.

But here’s the thing: to plan all of that effectively, you need to spend serious time in the planning phase, carefully shaping the PRD with the agent. And that requires real architectural thinking and development experience. For this reason, I believe the Ralph Loop approach is harder to adopt for junior developers. They tend to work on smaller, more localized tasks — fix a bug, tweak a UI element, adjust a workflow — and those tasks simply don’t need a Ralph Loop. Any agent handles them fine with a straightforward prompt. Juniors also typically don’t yet have the breadth of experience to anticipate architectural pitfalls, define clear API contracts, or think through data model implications upfront — and that’s exactly what the planning phase demands.

AI Doesn’t Replace Expertise — It Demands It

Some people argue that AI coding tools are making developers lazy, that even seniors will lose their deep technical skills over time. Honestly, when I first started hearing about AI-driven development and hadn’t used it much myself, I thought the same thing. I don’t anymore.

Deep technical knowledge and expertise remain essential — for two reasons.

First, you need expertise to write good specs. Formulating precise, detailed requirements — the kind that lead to good results — requires understanding the domain, the tech stack, the trade-offs. You can’t specify what you don’t understand.

Second, you need expertise to review the output. In my experience, not a single large task has ever been completed by an agent to a point where I had nothing to change. Not once. Say a task is broken down into 20 user stories, and the agent works through all of them overnight. I come in the next morning, the console says “all done” — and then the real work begins. I review the code, and I always find things that need fixing: inconsistencies, suboptimal patterns, missed edge cases, or just code that doesn’t match how I want the system to look. What follows is another round of iterations — sometimes even more than the original user stories — to get things right. Some fixes are small enough that I just do them by hand. Others are easier to describe in a prompt and let the agent handle. But the point is: the output is never perfect, and the review phase is a critical part of the process.

In fact, the review phase is currently the longest part of my workflow. I’ve managed to significantly optimize the initial phase — formulating requirements using voice dictation and iterating on the PRD with the agent is fairly quick now. The Ralph Loop execution itself is straightforward. But reviewing and refining the generated code takes real time and attention. I’m still experimenting with ways to improve this — writing more precise initial requirements, including code references and examples — but for now, this phase remains the most time-consuming.

Realistic Productivity Gains

This brings me to a common misconception. If you can run five Ralph Loop sessions in parallel, does that make you five times more productive? No. Because each one still requires your careful review and follow-up work. Based on my experience, my realistic productivity gain with the Ralph Loop is roughly 1.6–1.8x — meaningful, but far from the “10x developer” fantasy.

The real benefit is the ability to work on multiple tasks concurrently — kick off several sessions, then review and refine them as they complete. But if you take your work seriously and understand that every commit goes under your name — regardless of who or what wrote the code — then you treat AI-generated code with the same scrutiny you’d apply to code you wrote yourself. And that review takes time.

But here’s the upside: if you approach it this way, there’s no degradation of skills. Quite the opposite — you stay technically sharp, and you might even pick up new patterns and techniques from the agent’s output, thanks to the vast training data it draws from. The role evolves, but the depth of knowledge stays — and arguably grows.

Speeding Up Task Descriptions with Voice

I mentioned optimizing the initial phase above — a big part of that is how you actually compose the task description. Writing detailed requirements by hand, with references to files, URLs, and various resources, takes a surprising amount of time. So I started looking for ways to speed that up, and landed on voice dictation.

I can honestly say it made a huge difference. If you work exclusively in English, the built-in macOS dictation is more than enough. In my case, though, I’m not a native English speaker, and I find it easier to think and explain things in my native language. The built-in macOS dictation doesn’t handle that well — it tries to transliterate mixed-language input and produces gibberish.

After trying a few tools, I settled on MacWhisper with the large multi-language model. On modern Apple Silicon Macs (M3, M4), it runs beautifully. I bound it to a hotkey — I press it, say what I need, and the transcribed text appears at my cursor. Modern AI agents have no trouble understanding dictated text, even when it’s a bit rough around the edges.

Of course, some things are still easier to type or paste: URLs from the browser, file paths, code references. And sometimes I drag-and-drop a screenshot to show what’s wrong, because explaining a visual bug in words alone is painful. So my actual workflow for composing Ralph Loop task descriptions is a mix of voice, quick typed phrases, pasted links, and the occasional screenshot. It sounds chaotic, but it’s genuinely productive.

The Admin Dashboard

Since I often run several Ralph Loop sessions in parallel, I found myself juggling terminal windows to check on their progress. Each session runs in its own shell, sometimes across different terminal tabs or windows, and switching between them to see what’s done and what’s still running got tedious.

[Screenshot: multiple terminal windows, each running its own Ralph Loop session]

So I built a small terminal UI to monitor everything. I updated the script to write session state to a simple registry file — no IPC, no server, just file-based updates. Then I wrote a Go-based TUI (ralphy-admin) that reads this registry and shows you all running sessions at a glance: which iteration they’re on, what task they’re working on, which project and branch, and their current progress.
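A minimal sketch of that file-based approach — the registry path, field layout, and function name here are illustrative, not ralphy’s actual schema. Each loop rewrites its session’s line on every iteration, and the TUI just re-reads the file on a timer:

```shell
#!/bin/sh
# Illustrative file-based session registry: one space-separated line
# per session. Fields: id project branch iteration status
REGISTRY="./sessions.registry"

update_session() {
    # update_session <id> <project> <branch> <iteration> <status>
    id=$1
    # Drop this session's old line (if any), then append the new one.
    { [ -f "$REGISTRY" ] && grep -v "^$id " "$REGISTRY"; } > "$REGISTRY.tmp"
    printf '%s %s %s %s %s\n' "$@" >> "$REGISTRY.tmp"
    mv "$REGISTRY.tmp" "$REGISTRY"   # rename is atomic: readers never
}                                    # see a half-written file

# Two loops reporting progress; the first later updates its own line.
update_session US-42 shop ralph/US-42 3 running
update_session US-7 blog ralph/US-7 9 done
update_session US-42 shop ralph/US-42 4 running

cat "$REGISTRY"
```

The write-to-temp-then-rename step is what makes this safe without any IPC or server: the dashboard can poll the file at any moment and always sees a complete snapshot.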

It’s not something everyone needs — especially if you’re just getting started with the approach. But if you regularly run multiple sessions, having a single dashboard to see everything is genuinely useful.

The Tool

The project is called ralphy and it consists of two parts:

  • ralphy — a self-contained shell script that drives the agent loop. Supports both branching mode (work in the current repo) and git worktree mode (default). Works with OpenCode, Claude Code, and Amp.
  • ralphy-admin — a terminal UI for monitoring running sessions, viewing logs and progress in real time, and managing sessions (pause, resume, kill). Vim-style navigation.

Check it out, give it a try, and let me know what you think!