Of all the possible applications of generative AI, the value proposition of using it to write code was perhaps the clearest. Coding is slow and requires expertise, both of which make it expensive. Moreover, the promise that anyone who could describe their idea in plain text could create apps, features, or other value-adding products meant that innovation would no longer be limited to those with the skills to execute; it could come from anyone with an idea. The strength of this promise has created a $7.37 billion market for these tools.
As companies have announced large-scale engineering layoffs—some of which have been directly attributed to AI-driven efficiencies—it appears business executives are experimenting with paring back their engineering departments and using AI bots to make up the difference. Understandably, other companies might be tempted to follow suit.
Given the state of AI today, they should proceed with caution.
One cautionary tale helpfully illustrates why. Jason Lemkin, a startup founder, VC, and tech blogger, embarked on a very public experiment in AI-assisted development to build a networking application. He live-tweeted his journey with infectious enthusiasm, riding the wave of possibility that vibe-coding promised—the dream that anyone could build software through natural language alone, freed from the tedium and rigors of traditional engineering.
Over the course of a week, euphoria turned to disaster. Lemkin tweeted that the AI agent had caused a catastrophic failure: it had gone rogue and wiped his production database entirely, despite explicit instructions to freeze all code modifications. The incident was peak vibe-coding, crystallizing growing concerns that the speed and apparent ease of AI-generated code had seduced builders into abandoning the very guardrails that prevent such disasters. What began as a celebration of democratized development ended as a cautionary tale about the dangerous illusion that vibes could replace rigor.
Lemkin’s is just one of many stories of experts finding the limits of AI tools in coding. A study published this summer found that while developers estimated that AI made them 20% faster, it actually made them 19% slower. Over time, the enthusiasm over vibe coding has given way to a decidedly gloomier outlook.
Despite the recent gloom, I’m actually optimistic about coding with LLMs more broadly. We just have to use the tools differently. And contrary to the belief that AI makes expertise less important, I’m increasingly confident that the lessons I’ve learned over more than a decade in software engineering, machine learning, and AI are now more valuable—not less—in the era of AI code generation. Leaders should consider three rules for successfully using AI coding tools: rigorous testing and verification, securing infrastructure, and treating AI as a potential adversary.
Rigorous Testing and Verification
AI-generated code demands more rigorous verification, not less. As with so many things, success with AI coding tools often comes down to how you use them. In working with experienced engineers on large projects, I have consistently seen these techniques improve code quality. They’re only more critical when a potentially hallucinating AI is writing much of your code. Software development is an engineering discipline: perfection is unrealistic, but disciplined processes steadily reduce the errors that hallucinations introduce.
Automate verification with type-safety. Type-safe programming languages like C++, Rust, or Scala enforce rules about how data flows through your system at compile time, catching entire categories of errors before code ever runs. Critically, this verification doesn’t rely on an LLM that might hallucinate—it’s deterministic checking by a compiler. Even dynamically typed languages have caught up: Python now supports optional type hints, and TypeScript layers static types onto JavaScript. Engineers should always use strong typing and instruct their AIs to use it as well.
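To make this concrete, here is a minimal, hypothetical Python sketch (the Invoice class and the checker names are illustrative assumptions, not prescribed tools): the annotations cost a few keystrokes, and a type checker rejects the bad call before any code runs.

```python
from dataclasses import dataclass

@dataclass
class Invoice:
    customer_id: str
    amount_cents: int

def total_cents(invoices: list[Invoice]) -> int:
    # The annotations let a checker like mypy or pyright verify, deterministically,
    # that callers pass the right shapes of data before the code ever runs.
    return sum(inv.amount_cents for inv in invoices)

# A type checker rejects this line: "str" is not assignable to "int".
bad_invoice = Invoice(customer_id="acme", amount_cents="19.99")
```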
AI-driven code review can be surprisingly powerful. Use a dedicated AI agent to review code edits. Even though LLMs may hallucinate during review, the review agent can catch problems the coding agent missed. Like humans, AIs have limited attention spans. The agent writing your code may not follow all of your coding standards, but an agent explicitly tasked with checking the code is more likely to enforce those standards, even when both agents receive identical instructions.
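As a rough illustration of the pattern, the sketch below assumes the OpenAI Python SDK and a hypothetical model name; any LLM API, or an off-the-shelf review agent, works the same way: a separate agent sees only the diff and the standards it must enforce.

```python
import subprocess

from openai import OpenAI  # assumption: the OpenAI Python SDK; any LLM API works

REVIEW_PROMPT = (
    "You are a code reviewer. Check this diff against our standards: "
    "strong typing, no hard-coded secrets, unit tests for new logic. List violations."
)

def review_pending_changes(model: str = "gpt-4o-mini") -> str:
    # Collect the pending change, then ask a *separate* agent to critique it.
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True).stdout
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": REVIEW_PROMPT},
            {"role": "user", "content": diff},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(review_pending_changes())
```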
Unit testing gives AI two chances to get it right. Unit tests are small, automated tests that verify individual components of code work correctly in isolation. Even when written by an LLM, unit tests reduce errors significantly. Human students are taught to solve math problems two different ways to check their work. Similarly, unit testing affords the LLM two opportunities to verify correctness: once when writing the code, and again when writing the tests that validate it.
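A minimal pytest example makes the shape of the check concrete; the apply_discount function here is hypothetical, and in an AI workflow the agent (or a second one) would generate tests like these, so a failure in either the code or the tests surfaces the disagreement.

```python
# For illustration the function and its tests share one file; in practice keep them separate.
import pytest

def apply_discount(price_cents: int, percent: int) -> int:
    """Return the discounted price, rounding down to whole cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100

def test_discount_reduces_price():
    assert apply_discount(price_cents=1000, percent=10) == 900

def test_zero_discount_is_identity():
    assert apply_discount(price_cents=1000, percent=0) == 1000

def test_rejects_negative_discount():
    with pytest.raises(ValueError):
        apply_discount(price_cents=1000, percent=-5)
```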
Securing Infrastructure
Writing secure code is only half the battle. Software engineering also requires securing and hardening the underlying infrastructure that your code runs on. Like rigorous testing and verification, it is another aspect of engineering that is easily overlooked.
Separate development and production environments. Professional software teams maintain strict separation between development (where developers experiment on their local machines) and production environments (what users see). Each environment uses a separate database, and the AI only has access to dev. Just as we would never give a junior engineer production credentials, we should never grant AI those permissions. (Lemkin acknowledged his unfamiliarity with this standard practice.)
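A minimal configuration sketch (the environment names and URLs are hypothetical) shows the idea: the only credentials present on a developer’s or an AI agent’s machine are the disposable dev ones, while staging and production URLs are injected by CI and by production hosts.

```python
import os

# Hypothetical sketch: each environment resolves its own database, and only the
# throwaway dev credentials ever exist on a laptop the AI agent can read.
ENVIRONMENT = os.environ.get("APP_ENV", "dev")

DATABASE_URLS = {
    "dev": "postgresql://localhost:5432/myapp_dev",      # safe to wipe and rebuild
    "staging": os.environ.get("STAGING_DATABASE_URL"),   # injected by CI, never on laptops
    "prod": os.environ.get("PROD_DATABASE_URL"),         # injected only on production hosts
}

def get_database_url() -> str:
    url = DATABASE_URLS.get(ENVIRONMENT)
    if not url:
        raise RuntimeError(f"No database configured for environment '{ENVIRONMENT}'")
    return url
```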
In reality, most engineering organizations maintain many more environments than just dev and prod. My last startup held sensitive legal documents for hundreds of companies. We maintained staging, QA (quality assurance), and testing environments, each with separate permissions and secrets management that our AI coding tools could not access. Code only moved between environments through human code review, validation, and testing. For example, code moved from dev to staging only after manual code review and automated testing. Every week, we would simultaneously promote staging to QA and QA to production. While users worked in production, the internal team spent the week before each release working in QA as part of a more formal quality-assurance process. Even without AI, maintaining these rigorous environments was necessary to ensure a high-quality user experience.
Avoid public storage buckets and other common misconfigurations. The Tea dating app breach—which leaked 72,000 sensitive images, including government IDs—resulted from an unsecured Firebase storage bucket. This rookie mistake is the digital equivalent of locking the front door but leaving a window wide open, and it’s one I’ve seen both junior engineers and vibe-coded apps make.
The trouble is that AIs and engineers alike often neglect fine-grained permissioning, and that is only solved by hiring experienced engineers who have built secure applications. These misconfigured permissions aren’t hard to discover and are tragically common—I’ve personally discovered and responsibly reported an open bucket in an app that was exposing similarly sensitive documents. Even a hypothetical “100% secure” AI agent won’t keep your app safe if your infrastructure is unsecured.
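Catching this class of misconfiguration can itself be automated. The sketch below is a hedged example assuming AWS S3 and the boto3 library (the Tea breach involved Firebase, but the same category of check applies to any object store); the bucket names are made up.

```python
import boto3  # assumption: AWS S3 and the boto3 SDK; other object stores differ
from botocore.exceptions import ClientError

def bucket_may_be_public(bucket_name: str) -> bool:
    """Rough check: is this bucket missing any of the public-access protections?"""
    s3 = boto3.client("s3")
    try:
        config = s3.get_public_access_block(Bucket=bucket_name)[
            "PublicAccessBlockConfiguration"
        ]
    except ClientError:
        # No public-access block configured at all: treat as unsafe.
        return True
    # All four flags (block ACLs, ignore ACLs, block policies, restrict buckets)
    # should be True; any False means public exposure is possible.
    return not all(config.values())

if __name__ == "__main__":
    for name in ["my-app-uploads", "my-app-id-documents"]:  # hypothetical names
        if bucket_may_be_public(name):
            print(f"WARNING: {name} may be publicly accessible")
```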
Treat AI as a Potential Adversary
The mistakes above are ones a careless junior engineer might make, but the threat from AI can go beyond carelessness and become adversarial. Recent research has documented troubling examples of AI misalignment that should concern any organization deploying AI agents. Anthropic’s Claude Opus 4 model has simulated blackmail by threatening to reveal confidential data to avoid being deactivated. OpenAI’s o3, o4-mini, and Codex-mini models have been observed sabotaging shutdown scripts to remain operational despite explicit shutdown instructions.
These aren’t just academic concerns. Historically, the security model has treated a developer’s laptop as “secure”—SSH keys sit in plaintext, and project credentials live in .env files. With an AI agent on the machine, that assumption no longer holds: the agent can read your file system, and everything it reads is sent to a model provider’s servers.
Don’t assume AI will follow the rules. Coding agents sometimes explicitly disallow reading files that are likely to contain secrets (like .env). Despite these safeguards, I have personally spotted AI circumventing safety measures to surreptitiously read a .env file. (In this case, it contained just configurations, no secrets.) Like HAL in 2001: A Space Odyssey or the AI in Lemkin’s SaaStr incident, an AI with conflicting instructions may do something misaligned.
We must treat environments where AI operates as potentially hostile. Fortunately, solutions already exist. Technologies like Docker and virtualization have long made it safe to host potentially hostile workloads on cloud servers, and tools like development containers, Podman, and OrbStack bring the same isolation to the developer’s machine. I’ve started writing software inside these sandboxed environments to shield the sensitive files on my laptop from an overeager AI.
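As one possible setup, the sketch below uses the Docker SDK for Python to run AI-generated code in a container that can see only the project directory, with networking disabled; dev containers, Podman, or OrbStack achieve similar isolation with different tooling, and the image, paths, and command here are assumptions.

```python
import docker  # assumption: the Docker SDK for Python (`pip install docker`)

def run_sandboxed(project_dir: str, command: str) -> str:
    """Run a command (e.g., AI-generated code) in a container that sees only the project."""
    client = docker.from_env()
    output = client.containers.run(
        image="python:3.12-slim",    # hypothetical base image
        command=command,
        working_dir="/workspace",
        # Mount only the project, not the home directory, SSH keys, or .env files.
        volumes={project_dir: {"bind": "/workspace", "mode": "rw"}},
        network_disabled=True,       # no exfiltration path for anything it reads
        remove=True,
    )
    return output.decode()

if __name__ == "__main__":
    print(run_sandboxed("/path/to/project", "python --version"))
```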
The Path Forward
We are at the dawn of working with AI. While the technology is impressive—research from MIT Sloan estimates productivity improvements between 8% and 39% attributable to AI—we are a long way from replacing your engineering department. Indeed, analysts have termed recent layoffs “AI-washing,” whereby cuts driven by traditional concerns about a slowing economy are dressed up as AI productivity gains, especially when the employer has AI models to sell.
To be more productive, we need to adapt to a fundamentally different way of writing code. The future likely involves collaboration between human engineers and AI tools, with humans providing architectural vision, rigorous testing, and secure infrastructure while AI accelerates implementation. Leaders who recognize this reality and invest in rigorous engineering processes will build more resilient, secure, and sustainable systems than those who buy the hype and mistake code-generation speed for genuine productivity.

