Beyond "Vibe Coding": Replacing My Keyboard With AI-Augmented Development

Verify AI agent output with deterministic fast feedback loops. Skip the "does this even work?" question and focus on content, correctness and risk.


By Benjamin Justice
Photo by Sivani Bandaru / Unsplash

The tech world is currently obsessed with how much code AI can write for us. We see endless posts about building entire apps with multiple agents, alongside complaints such as "Claude Code wiped our production database with a Terraform command" or "GitHub Copilot Wrote 80% of Our Code. We’re Rewriting All of It".

Recently, I rebuilt my private WordPress website in Astro in just three days. While roughly 98% of the code was generated by AI agents, I didn't just "vibe code" it and pray for the best.

Instead of treating AI as a magic black box that replaces engineering, I treat it as an exceptionally fast typist that still needs an experienced engineer to guide it, review its work, and take ultimate responsibility.

Here is an overview of the disciplined, AI-augmented workflow that makes this possible.

1. Take Responsibility for Generated Code

I simply can't understand the mindset behind statements like "the AI generated this code and broke our system". If I choose to use an AI to generate code and commit it to the main branch, any resulting breakage is entirely on me.

When working with an AI agent, I am the senior engineer in that relationship. I set the technical quality gates, I review the code, and I ship the final product. If I don't, or can't, take responsibility for what goes into the codebase, I cannot succeed at my job.

Furthermore, while 98% of the codebase was generated, I understand all of it. I evaluate every new dependency before committing it.

Also, if I just need to change a few lines and I know exactly where they are, I don't spin up an agent. My keyboard works just fine, and typing "edit line 92 in file X" into a prompt is significantly less efficient than just fixing it myself.

2. Scaffold the Foundation Before Using Agents

Before I ever invite an agent into my workspace, I manually scaffold the project. This initial foundation serves as the "first context" for my AI tools. The more structural code, configuration, and architectural guidance you have in place, the better the AI will perform.

By setting up my initial projects and basic scripts myself, I ensure that the trivial, foundational decisions don't become guesswork for the LLM.

Most technologies have tools to scaffold new projects from templates that follow best practices (for Astro, that's `npm create astro@latest`). That is far more efficient and deterministic than asking an agent to do it.

3. Fast Feedback Loops via Monorepo Tooling

To keep the AI (and myself) in check, I heavily rely on local pipelines with rapid feedback loops.

I used the tool "moonrepo" (not a typo) to define my projects, tasks, and project dependencies. Check out https://monorepo.tools/ if you are not familiar with monorepo tooling.

A critical part of this setup is defining a strict verify task. It's basically my pre-commit hook. This single command lints the code, builds all projects in the repo, and runs all test suites.

If the verify task runs green locally, I can confidently expect my CI pipeline to run green as well. This creates a hard, immediate quality gate that prevents hallucinated or broken AI code from lingering in the codebase.

Additionally, I define a format task, which reformats all code in the repository to conform to my standards. We have deterministic tools for this (e.g. prettier or oxfmt for node projects), so there is no need to add formatting to my instructions.
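As a sketch of what this looks like in moonrepo: the task names, commands, and exact `deps` target syntax below are illustrative, not taken from the actual project, so check the moonrepo docs before copying.

```yaml
# moon.yml (illustrative sketch, not the actual project config)
tasks:
  format:
    command: 'prettier --write .'  # deterministic formatting, kept out of agent instructions
  lint:
    command: 'eslint .'
  build:
    command: 'astro build'
  test:
    command: 'vitest run'
  verify:
    # verify runs nothing itself; it only fans out to the quality gates
    deps:
      - '~:lint'
      - '~:build'
      - '~:test'
```

The point of a single `verify` entry is that there is exactly one command for both the agent and the CI pipeline to run, so "green locally" and "green in CI" mean the same thing.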

4. Lean Context: Slim AGENTS.md and Minimal MCP Servers

Feeding an AI too much irrelevant context is like handing an engineer an unrelated book before they start a task. In the best case, they ignore it. I always aim to minimize that irrelevant context:

Minimal MCP Servers

I keep Model Context Protocol (MCP) servers to an absolute minimum. For this project, I only used the astro-docs MCP server. Offering the agent a dedicated tool to fetch up-to-date, accurate information directly from the main framework's documentation is incredibly valuable and reduces hallucination.

Slim AGENTS.md File

I try to keep my context as “clean” as possible. Only add what’s relevant to the actual agent workflow. Most agents add a lot of irrelevant information when you ask them to initialize an AGENTS.md file.

My system prompts and repository instructions are highly optimized:

  • Only document the deviations: I never document the "default" behaviors of frameworks. I only explicitly note things we do differently in our project. The model has been trained on other projects and knows most defaults.
  • Don't Repeat Yourself: For information on projects and their relationships, I point the agent directly to my monorepo tooling rather than duplicating that information in an AGENTS.md file. Your React version? That's already in the package.json. You used vitest instead of jest? Is that even relevant to the agent? If so, it's in your package.json, too.
  • "Enforce" the pipeline: I explicitly instruct the agent to run the format and verify scripts after making any changes. All other repository scripts are deemed irrelevant to the agents and I do not mention them. Running the dev server or building the production build are not relevant to the task.
  • "Force" documentation usage: I explicitly instruct the agent to never assume knowledge about Astro and use the astro-docs MCP server instead.
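Put together, a slim repository instruction file along these lines is all an agent needs. The file below is an illustrative sketch that follows the rules above; it is not the author's actual AGENTS.md:

```markdown
# AGENTS.md (illustrative sketch)

## Workflow
- After any change, run the `format` task, then the `verify` task.
- A change is only done when `verify` passes.

## Documentation
- Never assume knowledge about Astro; fetch it via the astro-docs MCP server.

## Projects
- Projects, tasks, and their relationships are defined in the monorepo
  tooling configuration. Read them there; they are not duplicated here.

## Deviations
- Only deviations from framework defaults are documented below.
```

Note what is absent: no framework tutorials, no dependency lists, no restating of anything already encoded in package.json or the monorepo config.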

Most Agents Can Do This

This workflow isn't tied to a single, magical AI tool. It works with any agent that can reliably read and follow the provided instructions. With a strong harness and strict, simple rules, even "weaker" models can provide stellar results.

I have tested the following agent harnesses and models with this workflow:

  • Kilocode: Works with Gemini 3 Flash and even MiniMax M2.5
    • Use High Reasoning for Gemini 3 Flash or weaker models
  • Junie CLI: Works with Gemini 3 Flash
    • Only works with the CLI version. The Junie IDE Plugin uses a different harness and reliably ignored instructions to run the verification steps.
  • Codex: Works with 5.3-Codex
    • I use Medium Reasoning, as that’s “enough” with 5.3-Codex
  • Gemini CLI: Works with Gemini 3 Flash
  • Claude Code: Works with Sonnet 4.6
  • GitHub Copilot CLI: Works with Sonnet 4.6
    • Only tested with the CLI version. Other versions, such as IDE plugins or remote coding agents use different harnesses with wildly different features and behaviour.

The Result

After three days of work exclusively using this system, I am incredibly happy with the state of my new website and my speed.

The code is cleanly structured and easily extendable in any direction I choose. Most importantly, I am not bound to these tools. If every AI agent goes offline tomorrow, my foundation is solid enough, my understanding of the codebase is complete, and I'll just go back to writing code by hand.

I mostly replaced my keyboard, not my brain. I understand the problem, design a solution, and the coding agent translates natural language into code. I avoided using multiple agents in parallel, as I fear I could create too much mental workload for myself. The only time I used a second agent in parallel was for a very trivial task, which was also trivial to review (fixing image paths in several files).

AI amplifies habits, good and bad. Let's stick to our engineering discipline and let it multiply our good habits. Coding agents can't think, so we shouldn't delegate critical decision-making.
