Finding the Sweet Spot: My Claude Code Workflow for Structured Autonomy

Most people using AI coding assistants are getting real value. But there's a nagging feeling that we're leaving something on the table.

Maybe you're still manually reviewing more than you need to—not every line, but enough that it feels like overhead. Or maybe you've embraced autonomy, letting Claude run and accepting results that are "close enough." It works. It ships. But it's not quite your vision.

I've been on both ends of this spectrum.

Early on, I wanted to watch everything Claude did. It was slow. Then I decided to experiment—I let Claude build the first version of a campaign management app largely unsupervised. Detailed documentation, clear goals, full autonomy. What I got was technically impressive and completely unsalvageable. Features worked but broke each other. The architecture was too complex to refactor back to what I actually wanted.

Version two is on epic 13 of 61. I use it daily. It matches my vision because I'm involved at the right moments—not all of them.

The difference isn't Claude getting smarter. It's me figuring out where to be involved and what to delegate.

I found the sweet spot between hawkish oversight and complete autonomy, and I want to show you what it looks like. If you want to explore the ideas here in more detail, check out the companion repository, which contains several of the files, agents, and commands I mention here.


The Journey to Structured Autonomy

Let me tell you about version one.

I'd been using Claude Code to write substantial code—productively. But I was managing it intensely. Skimming most diffs to keep a handle on the architecture. Reading through Claude's thought processes. And constantly fighting context management issues—Claude would run out of thinking space, and I'd cobbled together disparate approaches for it to continue: memory files scattered in different places, no consistent system.

So when I decided to build a campaign management app for my tabletop RPG hobby, I thought: let's try something different. More autonomy. More trust.

I wrote detailed documentation. Claude helped me brainstorm the design doc and create an implementation plan—that part actually looked similar to what I do now. Then I set Claude loose to build it.

And it did. For days.

Here's the thing about watching an AI build for days: you can't meaningfully interact with the result until pieces come together. The backend had to be complete before anything was testable. Then auth issues blocked progress. So I waited for the frontend too. By the time I could actually use anything, it was way too late.

The workflow itself was the problem. I couldn't tweak along the way.

Some tests were genuinely useful—Claude had written them TDD-style, so they were structurally correct. But they tested what Claude built, not what I envisioned. And they'd become incredibly complicated working around database issues that reflected architectural decisions I'd never approved.

Features worked but broke each other. The data model anticipated features I never asked for. When I tried to simplify, the abstractions were so intertwined that I couldn't untangle them.

I scrapped it. Because the process prevented me from staying involved where it mattered, not because Claude failed at coding.

Version two started differently.

The brainstorming and planning phases looked similar. But now, I work through stages one at a time. I pick the design approach at the beginning of each stage. I test manually before Claude reviews or tests anything—so my feedback drives iteration instead of Claude validating its own work.

Features become usable after each epic. Subfeatures become usable after individual stages. I never wait days to interact with what's being built.

Thirteen epics in, the app is already something I use. Not "close enough"—actually what I wanted.

The difference isn't the planning. It's the granularity of implementation and where I'm involved during it.


The Workflow

My system has three main phases: creating design documents, breaking them into epics and stages, and working through those stages one at a time. Each phase involves Claude doing the heavy lifting while I stay involved at decision points.

Design Documents First

Every significant piece of work starts with a design document. Not a quick prompt—a real document that captures what we're building, why, and how.

Claude writes most of it, but only after interrogating me.

I use the Superpowers brainstorming skill—my prompt just needs to include "brainstorm with me" and Claude shifts into a structured question-and-answer mode. One question at a time. Detailed. Technical.

These are deep, technical questions:

  • "What technology do you want to use for rendering maps?"
  • "What database solution fits your needs?"
  • "Should we propagate changes via websockets or polling?"
  • "How do you want to handle authentication—session-based or JWT?"

By the time Claude starts writing, it has genuinely understood my technical vision. The document that comes out isn't Claude's interpretation of a vague prompt—it's a detailed spec that reflects dozens of my specific answers synthesized into something coherent.

The brainstorming session is one Claude session—usually 30-60 minutes of back-and-forth. At the end, I have a document I actually agree with, because I was consulted on every significant decision before it was written.

For personal projects, this starts from my own ideas. I might say "I want to build a map editor for my campaign app" and Claude helps me think through what that actually means at a technical level.

For work projects, this often starts from a Jira ticket or Confluence doc. Claude reads the source material, asks clarifying questions, and helps me create an implementation-focused design document.

Either way, the output is the same: a document that captures my vision, written by Claude, validated through conversation. This document becomes the foundation for creating epics and stages—though both the document and the resulting plan can evolve as I work through the project.

The temptation is to skip this step. "I know what I want, just build it." That's exactly what I did with version one. The design doc existed, but I hadn't been interrogated about it. Claude filled in the gaps with its own assumptions, and those assumptions compounded through the entire build.

Now the rule is simple: no implementation without a design document, and no design document without brainstorming.

Epic & Stage Creation

With a design document, the next step is breaking it into implementable pieces. This is another brainstorming session—Claude reads the design doc and helps me create epics and stages.

Epics are chunks of work, roughly ticket-sized. Not traditional "epics" that span multiple sprints—these are focused units. "Map marker rendering." "Entity inspector panel." "Campaign selector modal."

Stages are the discrete features that complete an epic. Each epic has 5-13 stages: "Create marker component." "Button to create an entity." "Implement multiple selection." "Add drag-and-drop reordering."

Each stage is a complete vertical slice—frontend, backend, tests, mobile styling, whatever it needs. Not "write the component, then write tests, then do mobile." One stage, one feature, fully implemented.
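
To make the granularity concrete, here's an illustrative sketch of what one epic's breakdown might look like; the IDs, stage names, and status labels are invented for the example rather than copied from my actual plan:

```text
EPIC-012: Entity Inspector Panel

  STAGE-012-001  Inspector panel layout (desktop + mobile)   [complete]
  STAGE-012-002  Display entity attributes from the API      [complete]
  STAGE-012-003  Inline editing of entity attributes         [refinement]
  STAGE-012-004  Button to create an entity                  [not started]
  STAGE-012-005  Multiple selection                          [not started]
  STAGE-012-006  Drag-and-drop reordering                    [not started]
```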

This structure evolved through the project.

When I first created the plan for my campaign app, I had about 13 "epics." But each one represented multiple tickets' worth of work: "Mapping" was a single epic. So was "Entity Management." They were far too broad to work through systematically.

As I started implementing, I realized the problem. With Claude's help, I broke "Mapping" into around a dozen actual epics: marker rendering, layer management, map controls, entity inspector, drag-and-drop, and so on. Each became a focused chunk I could complete and verify.

Once I had a feel for what the right granularity looked like, I used agents to help me break down the remaining original "epics" into properly-sized ones. What started as 13 broad sections became 61 feature chunks, each with 5-13 discrete stages.

The plan evolves.

This is crucial: epics and stages aren't a rigid contract. As I work through the project, I constantly re-analyze what's ahead.

Sometimes a stage gets completed early as a side effect of other work—I close it and move on. Sometimes I realize mid-implementation that we need a new epic I hadn't anticipated—I add it. Sometimes the order needs to change based on what I learn.

The structure provides direction without becoming a straitjacket. It's a map, not a mandate.

For smaller work, the same pattern scales down.

When I'm working on a Jira ticket at work rather than a greenfield project, I don't need 61 epics. Usually 1-3 epics cover the scope, with their stages representing the discrete features to build.

The brainstorming process is the same: Claude reads the ticket and any linked design docs, asks clarifying questions, and helps me create a structured breakdown. The result is a clear path through the work with explicit checkpoints.

Why this matters:

The epic/stage structure is what enables strategic involvement. Instead of "build this feature" with no visibility until it's done, I have dozens of small completion points. Each stage is a chance to verify, adjust, or redirect.

It also enables multi-session work. Stage files track what's been done and what's next. When I start a new session, Claude can pick up exactly where we left off without me re-explaining context.


Working Through Stages

With epics and stages defined, the actual work begins. Each stage follows a four-phase cycle: Design, Build, Refinement, and Finalize. These are checkpoints where I stay involved, not bureaucracy.

The Four Phases

Design Phase: Claude presents 2-3 implementation approaches. Not just "here's what I'll build"—actual options with tradeoffs. For frontend work, this might be different UI layouts or interaction patterns. For backend work, it might be different data structures, storage approaches, or calculation methods.

I pick one. Or I offer a different approach entirely. This is where I fine-tune the architecture rather than letting Claude choose what it thinks is best.

Build Phase: Claude implements the chosen approach. This is where most of the autonomous coding happens.

Refinement Phase: I test manually. I use what was built myself, not relying on automated tests. For frontend, I check desktop and mobile. For backend work, I might write scripts to validate the underlying implementation.

My feedback drives iteration. Sometimes it's visual: "The modal is too small on mobile." Sometimes it's functional: "Mobile tap-to-drag isn't working." "This calculation doesn't handle edge case X." Claude debugs and adjusts based on what I find, treating nothing as done just because it compiles.

For UI work, I require dual sign-off: both desktop and mobile explicitly approved before moving on.

Finalize Phase: Quality gates. First code review checks the implementation. Then tests are written—both unit tests and e2e tests using Playwright for actual browser interactions, with seed data and test databases as needed. Second code review verifies the tests are comprehensive. Documentation updated, clean commit. The stage is complete.

How Sessions Map to Phases

A stage typically spans multiple Claude sessions:

  • Session 1: Design + Build (pick approach, initial implementation)
  • Session 2+: Refinement (my testing, Claude debugging, iteration)
  • Final session: Finalize (first code review, tests, second code review, docs, commit)

Sometimes a simple stage completes in one session. Complex stages might need several refinement sessions as Claude debugs issues I find. The structure flexes to fit the work.

Multi-Session Continuity

Here's what makes this sustainable: stage and epic files are templated.

I have a Claude Skill that generates these files with structured sections: design notes, implementation options offered, summary of what was built, open questions, and status tracking. When I start a new session—maybe the next day, maybe a week later—Claude reads this file and picks up exactly where we left off.

The state lives in the file, not in conversation history.
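
As a rough sketch of what one of these generated stage files can look like (the section names and content here are illustrative, not the literal output of my skill):

```markdown
# STAGE-012-003: Inline editing of entity attributes

## Status
Design: complete | Build: complete | Refinement: in progress | Finalize: pending

## Design Notes
Options offered: (A) edit-in-place per field, (B) modal edit form.
Chosen: A, per-field save with rollback if the API call fails.

## Implementation Summary
- InlineField component wraps each editable attribute
- PATCH per field with optimistic update and error rollback

## Open Questions
- Validate numeric fields client-side, or rely on API errors?

## Refinement Sign-off
- [x] Desktop approved
- [ ] Mobile: tap target for entering edit mode is too small
```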

Navigation Commands

Two slash commands orchestrate this workflow:

/next_task — Begins work on a phase. A task-navigator subagent scans the epic and stage files, identifies the current stage and phase, and reports back. The subagent does the searching, keeping the main conversation context minimal.

/finish_phase — Updates the stage and epic documentation to mark the current phase (and stage or epic if complete) as finished. This is what enables /next_task to figure out what comes next.

These commands mean every session starts the same way: run /next_task, see where I am, continue working.
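
Custom slash commands in Claude Code are markdown prompt files under .claude/commands/. A simplified sketch of what a /next_task command file could look like (the directory layout and wording are illustrative):

```markdown
<!-- .claude/commands/next_task.md -->
Spawn the task-navigator subagent to scan the epic and stage files under
docs/plan/ and identify the first stage that is not marked complete, along
with the phase it is currently in.

Have the subagent report back only: the epic ID, the stage ID, the current
phase, and one sentence on what that phase needs next. Do not read the plan
files in the main conversation.

Summarize the result for me and wait for my go-ahead before delegating any
further work.
```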

Quality Gates Built In

The finalize phase enforces explicit quality checks:

Dual Code Reviews: A code-reviewer subagent checks the implementation before tests are written, looking for issues. After tests are written, a second review verifies the tests are actually comprehensive. Because these are subagents rather than the main agent reviewing its own work, they're less prone to self-approval bias.

Testing: Both unit and e2e tests are created. E2e tests use Playwright for real browser interactions, with seed data and test databases as needed. This provides automated verification that the work is complete AND hasn't broken anything from previous stages.
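
To give a flavor of that e2e layer, here is a minimal Playwright sketch; the route, labels, and seeded entity are hypothetical placeholders rather than the app's real selectors:

```typescript
// e2e/entity-inspector.spec.ts (illustrative sketch)
import { test, expect } from '@playwright/test';

test('inline entity rename persists across a reload', async ({ page }) => {
  // Assumes the test database is seeded with one campaign and one entity.
  await page.goto('/campaigns/test-campaign/entities');

  // Open the inspector and rename the entity inline.
  await page.getByRole('listitem', { name: 'Goblin Chief' }).click();
  await page.getByLabel('Name').fill('Goblin Warlord');
  await page.keyboard.press('Enter');

  // Reload to confirm the change made the round trip to the backend.
  await page.reload();
  await expect(page.getByRole('listitem', { name: 'Goblin Warlord' })).toBeVisible();
});
```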

Regression Checklists: For frontend work, Claude generates a regression checklist—items to manually run through, with checkboxes for mobile and/or desktop as applicable. The goal is to run through the entire checklist at the end of each epic (not each stage), covering all work done so far. This catches when new work has invalidated something from earlier stages.
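
A checklist might look something like this (items invented for illustration):

```markdown
## Regression checklist: Entity Inspector epic

- [ ] Desktop  [ ] Mobile   Inspector opens from a marker click
- [ ] Desktop  [ ] Mobile   Inline rename saves and persists after reload
- [ ] Desktop  [ ] Mobile   Drag-and-drop reordering survives a refresh
- [ ] Desktop only          Multi-select via shift-click
- [ ] Mobile only           Long-press opens the context menu
```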

The Rhythm

Once you're in this workflow, it develops a rhythm:

  1. /next_task — Claude finds Stage 3 of the Entity Inspector epic, Refinement phase, and reports it
  2. Test what was built, provide feedback
  3. Claude debugs and iterates based on feedback
  4. Approve when satisfied
  5. /finish_phase — marks Refinement complete, updates docs
  6. /clear — fresh session
  7. /next_task — Claude finds Stage 3, Finalize phase
  8. Claude runs first code review, writes tests, runs second code review, commits
  9. /finish_phase — marks stage complete
  10. /clear — fresh session
  11. /next_task — Claude finds Stage 4, Design phase

Features become usable after each epic. Subfeatures become usable after individual stages. The app grows incrementally, and I'm involved at every meaningful checkpoint without watching every line of code.

Why This Works

The phase structure enforces a specific collaboration pattern:

  • Vision decisions (Design phase) — me
  • Execution (Build phase) — Claude
  • Validation (Refinement phase) — me
  • Quality verification (Finalize phase) — Claude with my final approval

I'm not absent, and I'm not micromanaging. I'm present at the moments where my judgment matters, and absent during the mechanical work where Claude excels.


The Secret Sauce: Subagents for Everything

Here's the configuration choice that makes everything else work: the main Claude agent does almost nothing directly.

No file reading. No code writing. No test running. No codebase searching. All of that gets delegated to subagents.

The main agent's job is singular: coordinate. Talk to me. Present options. Summarize results. Decide what to delegate next.

This sounds like bureaucratic overhead. It's actually essential: it's what lets sessions run to their full potential.

Why This Matters

Claude Code has a context window. Every file read, every search result, every test output consumes space in that window. In a complex project, you can burn through context shockingly fast just exploring the codebase, leaving no room for actual implementation.

By delegating all tool use to subagents, the main conversation stays clean. When I'm discussing design options with Claude, the context contains our conversation—not thousands of lines of code Claude read to understand the codebase.

The task-navigator subagent can search through 61 epics and hundreds of stages without polluting the main session. A code-reviewer subagent can read every changed file without using main context. A test-runner subagent can capture verbose output without me ever seeing it unless something fails.

The main agent sees summaries. "The task-navigator found Stage 4 of the Entity Inspector epic, currently in Design phase." "The code-reviewer found two issues: a potential null reference and a missing error handler." Clean, actionable, context-efficient.

The Origin Story

This required explicit configuration. When I first tried delegating to subagents, they'd refuse to do things. They'd claim they couldn't edit files directly, or that running bash commands was too dangerous.

The subagents didn't realize they were subagents. They thought they were the main agent, bound by the same conservative defaults. So they'd try to spawn their own Claude instances to do the work, creating a bizarre recursive delegation chain that accomplished nothing.

The fix was explicit: tell subagents what they are and what they can do. Now every subagent prompt includes clear instructions about their role as executors who can and should read files, edit code, and run commands directly.
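
In Claude Code, subagents are defined as markdown files with YAML frontmatter under .claude/agents/. A trimmed-down sketch of that kind of role framing (the name, tool list, and wording are illustrative):

```markdown
---
name: implementer
description: Executes a concrete implementation task delegated by the main agent.
tools: Read, Write, Edit, Bash, Grep, Glob
---

You are a subagent, not the main coordinating agent. The task you receive
has already been scoped; your job is to complete it directly.

- DO read files, edit code, and run commands yourself.
- DO NOT spawn further agents or hand work back to the coordinator.
- Finish by returning a concise summary: what changed, what was run, what failed.
```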

Enforcing This Pattern

I enforce delegation automatically through two mechanisms:

Prompt Injection: Every single prompt I send gets enhanced by a hook script before Claude sees it. The script appends guidelines including "You MUST use subagents for ANY exploration, ANY implementation, or ANY execution."

This isn't a suggestion Claude might forget. It's injected into every interaction. The main agent cannot escape the coordination role because the instructions are always present.
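
One way to wire this up is Claude Code's UserPromptSubmit hook, whose standard output gets added to the prompt as extra context. The settings snippet and script path below are an illustrative sketch, not my exact configuration:

```json
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/inject-delegation.sh" }
        ]
      }
    ]
  }
}
```

The script itself can be as simple as echoing the delegation rule, for example: echo "You MUST use subagents for ANY exploration, ANY implementation, or ANY execution."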

CLAUDE.md Guidelines: My global CLAUDE.md file establishes the philosophy: main agent coordinates, subagents execute. It defines what each role should and shouldn't do. It explains why—context isolation, parallel execution, failure containment.
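
A condensed sketch of the kind of guidance that file carries (the wording here is illustrative):

```markdown
## Collaboration model

- The main agent COORDINATES: talk to the user, present options, spawn
  subagents, and summarize their results. It does not read or edit files itself.
- Subagents EXECUTE: file reads, edits, searches, test runs, and commands.
- Why: context isolation, parallel execution, and failure containment.
- Present 2-3 design options before implementing anything; wait for a choice.
- The user tests manually before automated tests are written.
```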

Between prompt injection and CLAUDE.md, the delegation pattern is structural, not aspirational.

What Gets Delegated

Essentially everything:

  • File operations: Reading code, writing code, editing files
  • Codebase exploration: Searching for patterns, finding implementations, understanding architecture
  • Testing: Running tests, debugging failures, verifying fixes
  • Code review: Analyzing changes, checking for issues
  • Documentation: Updating tracking files, writing docs
  • Navigation: Finding the current task, scanning epic/stage files

The main agent's toolkit is basically: talk to me, spawn subagents, summarize what they found.

The Benefits Compound

This pattern enables things that wouldn't otherwise be practical:

  • Parallel work: Multiple subagents can run simultaneously—implementing frontend and backend at the same time
  • Failure isolation: A stuck or failing subagent leaves the main session intact. If one fails, I simply spawn a fresh one.
  • Specialized agents: Different subagents can have different instructions. The code-reviewer has a detailed checklist. The test-runner knows the project's test commands. The task-navigator knows the epic/stage file format.
  • Session longevity: Complex multi-hour sessions are possible because context doesn't run out

The overhead of delegation is real but small. The benefits are substantial.


This Works Beyond Frontend

Everything I've described so far has used my campaign management app as the example—a frontend-heavy personal project. But this workflow isn't frontend-specific. I use the same patterns at work on backend systems, data pipelines, and infrastructure code.

Design Phase Looks Different

For backend work, the design options focus on architecture:

  • "What type structure should we use for this new entity?"
  • "How should we store this data—normalized tables, JSON blob, or separate service?"
  • "What's the best method to calculate this metric—batch job, real-time aggregation, or materialized view?"

Claude presents 2-3 approaches with real tradeoffs—drawing from existing code structure and context it finds while exploring the codebase. I pick one or suggest an alternative. Same pattern, different domain.

This matters because it gives me control over technical decisions that compound. If Claude just picks what it thinks is best, I might end up with an architecture I don't want and didn't choose. By presenting options, Claude ensures I'm making the calls on decisions that matter.

Refinement Phase Looks Different

For frontend work, refinement means clicking through the UI on desktop and mobile. For backend work, it often means writing scripts.

I'll write a test script that exercises the implementation with various inputs. Or I'll query the testing database or data files directly to verify data is being stored correctly. Or I'll trace through logs to confirm the right things are happening.
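
For instance, a throwaway verification script might look like the sketch below; the endpoint, field names, and expected values are hypothetical stand-ins for whatever the stage actually built:

```typescript
// scripts/verify-session-stats.ts (illustrative; run with something like `npx tsx`)
// Exercises the new stats endpoint against the local test database and
// prints PASS/FAIL per case so I can eyeball the behavior myself.

const cases = [
  { campaignId: 'test-campaign', expectedSessions: 3 },
  { campaignId: 'empty-campaign', expectedSessions: 0 },
];

async function main() {
  for (const c of cases) {
    const res = await fetch(`http://localhost:3000/api/campaigns/${c.campaignId}/stats`);
    const body = await res.json();
    const ok = res.ok && body.sessionCount === c.expectedSessions;
    console.log(
      `${ok ? 'PASS' : 'FAIL'} ${c.campaignId}: got ${body.sessionCount}, expected ${c.expectedSessions}`
    );
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```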

The principle is the same: I verify manually before Claude does its own testing. My feedback drives iteration—not Claude assuming its code is correct because it compiled.

The Pattern Holds

Whether I'm building React components or data transformation pipelines:

  • Vision + options → I choose
  • Execution → Claude
  • Verification → me first, then Claude's automated tests and code reviews

The specific tools change. The collaboration pattern doesn't.

Smaller Scope, Same Structure

At work, I'm usually implementing Jira tickets rather than building from scratch. The scope is smaller—typically 1-3 epics rather than 61.

But the structure is identical. Claude reads the ticket and any linked design docs. We brainstorm to create an implementation plan. I work through stages with the same phase cycle.

The overhead of creating epics and stages is minimal for small work, and the benefits—structured checkpoints, multi-session continuity, clear progress tracking—apply regardless of project size.


The Leadership Mindset

The mindset that changed how I work with Claude: treat it like a junior developer. Or maybe an intern.

This isn't dismissive—junior developers and interns can be incredibly productive. They can produce substantial amounts of work. They can learn quickly and execute tasks you'd never have time for yourself.

But you wouldn't hand an intern a vague spec and disappear for a week. You'd check in. You'd review their work. You'd course-correct when they headed in the wrong direction. You'd catch mistakes before they compound.

Claude Is Capable But Inexperienced

Claude can write substantial code, explore codebases, debug issues, and implement complex features—the raw capability exists.

Yet Claude lacks your vision. Claude misses the subtle preferences that make something feel right versus technically correct but wrong. It will make reasonable decisions that diverge from your vision. It will implement features you didn't ask for because they seemed logical. It will veer off course without clear direction.

This reflects the nature of working with any assistant external to your thinking.

What This Means in Practice

Claude can write documentation, plans, and designs. But only if it first understands what's in your mind. That's why brainstorming asks you questions before producing anything. Claude needs to extract your vision before it can articulate it.

Claude can proceed autonomously on implementation. But it needs clear checkpoints where you verify the work. That's why stages have phases with explicit human gates. Claude builds, you test, Claude continues.

Claude's code needs double-checking—both through code reviews and through actually using what it built. Even good code can miss the point, and even good coders produce errors.

You're the Leader

The best Claude Code users don't "use a tool." They lead a capable but inexperienced team member.

Your job:

  • Vision and direction
  • Decisions at key points
  • Validation that work matches intent

Claude's job:

  • Details you don't have time for
  • Grunt work you'd rather not do
  • Asking clarifying questions to pull the vision out of your head

When you adopt this frame, the workflow makes sense. You're not being controlling—you're being a responsible leader. You're not slowing things down—you're preventing costly mistakes.

The goal isn't to replace yourself. It's to multiply yourself by giving Claude the parts you can delegate while keeping the parts only you can do.


The Professional Code Review Process

At work, before I create a merge request, I run Claude through an intensive review loop. This goes beyond the dual code reviews in each stage—it's a final quality gate before any human sees my code.

Multiple Parallel Reviews

I spawn multiple code-reviewer subagents simultaneously—typically around five—each focused on a different concern. Claude decides which areas to target based on the changeset. One run might focus on:

  • Code quality
  • Security
  • Ticket completeness
  • Performance
  • Documentation

Another might prioritize different areas based on what the code touches. The specific reviewers vary from task to task, even iteration to iteration. The point isn't which categories get reviewed—it's that multiple parallel perspectives catch issues a single reviewer would miss.

Aggregated Results

Claude combines the findings into tables organized by severity:

  • Critical — must fix before merge
  • High — should fix before merge
  • Medium — fix if time permits
  • Nitpick — optional improvements
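
The aggregated output reads roughly like the table below; the findings are invented for illustration:

```markdown
| Severity | Finding                                          | Reviewer      |
| -------- | ------------------------------------------------ | ------------- |
| Critical | campaignId interpolated into SQL without a bind  | Security      |
| High     | N+1 query when loading markers for large maps    | Performance   |
| Medium   | New aggregation service has no docstring         | Documentation |
| Nitpick  | `tmpData` could use a more descriptive name      | Code quality  |
```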

I scan through the results. Some items are legitimate issues. Some are false positives—the subagent misunderstood the context or flagged something intentional. I make a quick judgment call on each.

Then Claude addresses every legitimate issue.

The Iteration Loop

Here's the key: after Claude makes the fixes, I clear the session and run the five reviews again.

Fresh session. No memory of the previous review. No bias toward approving work it already "passed." The new reviewers see the code as if for the first time.

I repeat this cycle until Claude's findings stay at Nitpick level or below (excluding false positives). Usually two or three rounds.

Then Human Review

Only after Claude's multi-round review do I manually review the MR myself. I'm checking architecture, approach, and anything Claude might have missed. But the obvious stuff—the typos, the missing null checks, the inefficient queries—has already been caught and fixed.

Then it goes to my team.

Why This Works

My teammates see only clean code. They can focus on the interesting questions—"Is this the right approach? Does this fit our architecture? Are there edge cases we haven't considered?"—instead of burning review cycles on "You forgot to handle the error case" or "This variable name is confusing."

Review fatigue is real. Human reviewers get tired, especially on large changes. They start skimming. They miss things. Claude doesn't fatigue. It can run five thorough reviews on a thousand-line change without losing focus.

The entire multi-round process takes a fraction of the time it would take humans to achieve the same level of scrutiny. And because Claude catches the mechanical issues, human reviewers can apply their judgment where it matters most.

The Meta-Point

I'm using AI to QA the AI's work. Multiple perspectives, fresh sessions to prevent self-bias, human judgment as the final gate. It's not perfect, but it catches far more than any single review pass would.


Integration with Real Tools

This workflow doesn't exist in isolation. It connects to the tools I already use for project management and memory.

Jira and Confluence

At work, most projects start with a Jira ticket or a Confluence design doc. Claude reads these directly through MCP integrations—no copy-pasting, no manual summarization.

When I start a new ticket, Claude pulls the description, acceptance criteria, and any linked documentation. Claude feeds that context into the brainstorming session. The resulting implementation plan explicitly traces back to ticket requirements, so I can verify coverage.

For my personal project, the equivalent is my own design documents and notes. Same pattern, different source.

Episodic Memory

The Episodic Memory plugin gives Claude semantic search across past conversations. It indexes sessions locally using vector embeddings and SQLite, then enables natural language queries to find relevant discussions.

This matters because work spans many sessions. "What did we decide about the marker component architecture?" "Why did we reject the websocket approach?" Instead of hunting through files, Claude can search its own history.

Here's where the epic/stage structure pays off: every piece of work carries a codified prefix—EPIC-007, STAGE-007-003—and those identifiers appear in the relevant conversations. When Claude searches for context about the entity inspector, the semantic search naturally surfaces sessions that discussed EPIC-012 and its stages. The structure makes memory retrieval more precise.

Superpowers Plugin

The brainstorming skill I mentioned comes from the Superpowers plugin—a collection of Claude Code skills for structured workflows. I also use its verification skills and code review patterns.

The plugin isn't required for this workflow, but it provides pre-built implementations of patterns I'd otherwise have to create myself.


How to Start

You don't need to implement everything at once. This system evolved over months. Start small and add complexity as you feel the need.

Step 1: Enforce subagent usage

This has the highest impact for the least effort. Add a prompt injection hook or update your CLAUDE.md to require that the main agent delegates all file operations, searches, and executions to subagents.

You'll immediately notice longer, more productive sessions because tool outputs no longer consume context.

Step 2: Add a CLAUDE.md with your philosophy

Write down how you want Claude to work. What should it assume? What should it ask about? What patterns should it follow?

Even a simple CLAUDE.md that says "always ask before making architectural decisions" and "test manually before writing automated tests" will change behavior meaningfully.

Step 3: Try brainstorming for your next feature

Install the Superpowers plugin and start your next piece of work with "brainstorm with me." Let Claude ask you questions before it writes anything.

Notice how different the output is when Claude has extracted your vision first.

Step 4: Create one epic with stages

Take a medium-sized feature and break it into an epic with 5-8 stages. Work through them using the phase cycle: design options, build, manual testing, finalize.

See how the checkpoints change the collaboration.

Step 5: Scale as needed

Once you've felt the benefits on one feature, extend the pattern. Add more epics. Create slash commands for navigation. Set up the iteration-based code review process.

Start simple

The system I have now would have been overwhelming to set up from scratch. It grew organically as I hit pain points and found solutions.

Start with subagent enforcement. Add structure when you feel the need. Let your workflow evolve.


Closing

The goal isn't to replace yourself but to work with an AI that handles details while you stay in the driver's seat.

Claude can write code, explore codebases, debug issues, and produce documentation—the capability is real.

Yet capability without structure produces chaos, as my app's first version proved.

Structure without capability is just bureaucracy. The phases and checkpoints only matter because Claude can actually execute between them.

The sweet spot is both: Claude's execution capacity, guided by your vision, verified at checkpoints you control.

That's the workflow. I hope it helps you find yours.