Vibe Engineering in the Age of Vibe Coding
In the streets of San Francisco and Mountain View, a remarkable sight has become commonplace: Waymo's autonomous vehicles confidently navigating complex urban environments, carrying passengers to their destinations without human drivers. This achievement, unimaginable just a decade ago, represents a triumph of incremental progress in autonomous driving technology. Starting from basic driver assistance features like cruise control and lane-keeping, the industry methodically advanced through various levels of automation, learning and adapting at each stage.
The world of software development is undergoing a similar transformation. Just as autonomous vehicles evolved from simple driver assistance to full self-driving capabilities, AI-assisted coding is progressing from basic autocompletion to increasingly sophisticated code generation and problem-solving. We're witnessing the emergence of AI tools that can handle increasingly complex programming tasks, yet still require human oversight and expertise — much like how autonomous vehicles still operate within carefully defined parameters and safety frameworks.
At TRM Labs, we're not just observers of the AI revolution — we're active participants, pushing the boundaries of what's possible with these emerging technologies. As a company at the forefront of blockchain intelligence and cryptocurrency compliance, we understand that staying ahead means embracing and mastering new tools and methodologies as they emerge. This is especially true when it comes to AI-assisted development.
Our engineering team has been aggressively experimenting with and adopting AI coding tools in some form — namely Cursor, CoPilot and ChatGPT — across various projects. We're learning in real-time how to effectively integrate these tools into our development workflow, discovering both their immense potential and their current limitations. This hands-on experience has given us valuable insights into how to maximize the benefits of AI assistance while maintaining the high standards of code quality and security that our platform demands.
AI tools were liberally used in writing this article, but we are signing off on the words just like we own the code we check in.
It's crucial to note that the AI landscape is evolving at an unprecedented pace. With the emergence of Model Context Protocol (MCP), sophisticated AI agents, and standardized agent-to-agent communication protocols, the levels of autonomy and capabilities described in this article could become outdated within weeks, if not days. While the fundamental principles of human-AI collaboration outlined here remain valuable, we have to stay actively informed about the latest developments and be prepared to adapt our workflows as new capabilities emerge.
This article captures our learnings and best practices, drawn from real-world implementation experiences. It's designed to help both individual developers and organizations navigate the exciting but sometimes challenging landscape of AI-assisted development. Whether you're just starting to explore these tools or looking to optimize your existing AI-enhanced workflow, these insights will help you make the most of this transformative technology.
Levels of autonomy
The world of autonomous driving has given us a useful framework for understanding increasing levels of automation, from simple driver assistance (L1) to fully autonomous vehicles (L5). A similar evolution is happening in software development, powered by Large Language Models (LLMs) and tools like CoPilot, Cursor, Windsurf, Augment, Claude Code, etc. that embed AI directly into our IDEs or CLIs. Understanding these "levels" of AI coding assistance can help us maximize our productivity, adapt our workflows, and prepare for a future where our relationship with code generation is fundamentally collaborative.

L1: AI as cruise control (Enhanced autocompletion and snippets)
At this level, Cursor acts like an intelligent cruise control or lane-keeping assist. It offers advanced code completions, suggests small, contextual snippets, and helps with boilerplate. The developer is fully in command, making all micro-decisions, but benefits from AI "smoothing out the ride" for common, repetitive tasks.
Example: Asking Cursor for a specific regex pattern for parsing alert priorities (like r":\w+?: (\w+) priority issue is active") or a boilerplate try-except block for an API call.
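For instance, the kind of snippet this level produces looks roughly like the following. The regex is the one quoted above; the requests-based boilerplate is a generic illustration (our own code is async), not code from our project.

```python
import re
import requests

# The regex quoted above, ready to use for pulling the priority out of an alert.
PRIORITY_PATTERN = re.compile(r":\w+?: (\w+) priority issue is active")

def fetch_alert_details(url: str) -> dict:
    # Generic boilerplate try/except around an API call, the kind of snippet
    # you accept or reject inline at this level.
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.RequestException as exc:
        raise RuntimeError(f"API call failed: {exc}") from exc
```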
L2: AI as partial autopilot (Function and block generation)
Here, you're still the pilot, but you can delegate more significant tasks. You define a function's purpose, inputs, and outputs, and Cursor generates the implementation. You monitor its work closely, integrate it, and test it. The AI handles both "steering" (logic) and "acceleration" (code volume) for discrete components.
Example: Our request, "Write a Python function using aiohttp to execute an NRQL query against the New Relic Insights API...," started building individual nodes like node_execute_initial_newrelic_query for the New Relic triage system; providing a specification and getting a functional draft for a single, well-defined piece of logic.
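To make the L2 hand-off concrete, here is a minimal sketch of the kind of draft such a prompt can yield. The Insights endpoint, the X-Query-Key header, and the config and error field names are assumptions for illustration only, not the project's actual code.

```python
import aiohttp

# A minimal sketch of a drafted node, assuming a Pydantic GraphState that
# carries the NRQL query, the New Relic config, and a place for results.
INSIGHTS_URL = "https://insights-api.newrelic.com/v1/accounts/{account_id}/query"

async def node_execute_initial_newrelic_query(state: "GraphState") -> "GraphState":
    config = state.prompts.new_relic_config          # API key + account ID
    url = INSIGHTS_URL.format(account_id=config.account_id)
    headers = {"X-Query-Key": config.api_key, "Accept": "application/json"}
    params = {"nrql": state.initial_nrql_query}
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(url, headers=headers, params=params) as resp:
                resp.raise_for_status()
                payload = await resp.json()
        state.initial_newrelic_logs = payload.get("results", [])
    except aiohttp.ClientError as exc:
        state.error = f"New Relic query failed: {exc}"  # hypothetical error field
    return state
```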
L3: AI as conditional automation (Guided feature implementation)
This is where things get truly exciting with modern AI. You define a larger feature or a multi-step process, and Cursor can help plan and execute it, with you providing guidance and making key decisions at crucial junctures. The AI operates more autonomously within a defined "operational design domain" (e.g. a specific module or workflow). But you're ready to "take the wheel" if it veers off course or needs clarification.
Example: From an initial prompt, "Figure out a plan to add a New Relic node to the graph and use it for the triage slack message shortcut and command... How would this orchestration be done elegantly with langgraph?" — this led to a comprehensive, multi-node plan which was then executed node-by-node. The AI drafted the code for each node, and after review and acceptance ("proceed"), it moved to the next, effectively guiding the AI through a complex feature implementation.
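To give a feel for what executing such a plan looks like, here is a rough, hypothetical LangGraph wiring of the nodes discussed in this article. Only the node function names come from our project; the import path, routing flag, and edge layout are assumptions for illustration.

```python
from langgraph.graph import StateGraph, END

# Hypothetical module path; node names are the ones discussed in this article.
from triage.nodes import (
    GraphState,
    node_check_if_newrelic_alert_and_parse,
    node_fetch_slack_thread_context,
    node_generate_initial_nrql,
    node_execute_initial_newrelic_query,
)

builder = StateGraph(GraphState)
builder.add_node("check_and_parse", node_check_if_newrelic_alert_and_parse)
builder.add_node("fetch_thread_context", node_fetch_slack_thread_context)
builder.add_node("generate_initial_nrql", node_generate_initial_nrql)
builder.add_node("execute_initial_query", node_execute_initial_newrelic_query)
# ...ID extraction, detailed querying, and response-formatting nodes follow.

builder.set_entry_point("check_and_parse")
builder.add_conditional_edges(
    "check_and_parse",
    lambda state: "continue" if state.is_newrelic_alert else "end",  # hypothetical flag
    {"continue": "fetch_thread_context", "end": END},
)
builder.add_edge("fetch_thread_context", "generate_initial_nrql")
builder.add_edge("generate_initial_nrql", "execute_initial_query")
builder.add_edge("execute_initial_query", END)  # later nodes omitted for brevity

triage_app = builder.compile()
```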
L4: AI as high automation (Agent-driven scaffolding and complex sequences)
While still emerging, this level involves the AI taking on more complex, end-to-end tasks within well-defined parameters. Imagine defining a new microservice's API contract, and an AI agent scaffolds the project, writes boilerplate, implements CRUD operations, sets up basic CI/CD, and even drafts initial tests based on established patterns within your organization's codebase. The developer sets the strategic goals, provides access to relevant internal libraries and standards, and reviews the complete "trip" segment, but doesn't micromanage the "driving" within that segment.
L5: AI as full autonomy (Autonomous software generation)
The dream of AI independently taking a high-level business problem ("reduce customer churn by improving onboarding") and delivering a fully coded, tested, deployed, and even self-monitoring and self-healing software solution. Like L5 driving, this is largely aspirational for general software development today, but the progress towards more autonomous agents is undeniable.
Currently, developers using Cursor are typically operating between L1 and L3, with L2 and L3 delivering the most significant productivity boosts for complex tasks. The key to success is not to relinquish control, but to become skilled at working alongside an AI co-pilot: knowing when to delegate, when to guide, and when to take direct command. This involves mastering the art of the prompt, understanding the AI's capabilities and limitations, and maintaining rigorous development practices.
Practical guidelines
We will use an example of building an automated New Relic alert triage system for Slack to illustrate the principles that we learnt. This project involved creating a multi-node workflow that could parse alerts, gather context, generate queries, and provide intelligent responses — making it an ideal case study for exploring different levels of AI assistance and best practices. The complexity of the task, requiring both technical precision and domain understanding, showcases how developers can effectively collaborate with AI tools while maintaining control over the development process. Most examples in this article will refer to Cursor — but in our opinion, the same principles apply to most code assistants.
1. Steering the AI: The power of clear, contextual prompts
The "garbage in, garbage out" principle applies forcefully to LLMs. The clearer and more context-rich your prompts, the better the AI can understand your needs and generate relevant, useful responses. Think of your prompt as the destination and critical waypoints you provide to a GPS: the more precise, the better the route.
Be specific
Vague requests lead to vague or incorrect results. Instead of "Write a function for New Relic," a much better prompt is, "Write an asynchronous Python function named node_execute_initial_newrelic_query that takes a GraphState object. It should use aiohttp to execute an NRQL query (found in state.initial_nrql_query) against the New Relic Insights API. It needs to retrieve the API key and account ID from state.prompts.new_relic_config, handle potential HTTP errors, and store the results in state.initial_newrelic_logs."
Provide context
- Language/framework/libraries: Always specify crucial elements of your stack (e.g. "in Python 3.11, using LangGraph, Pydantic for models, and aiohttp for async HTTP calls").
- Existing code and patterns: If you're modifying existing code or want the AI to follow a certain design pattern already present in your project, provide snippets or reference the relevant files. Cursor's ability to "see" your current file and attached context is invaluable here.
- Desired outcome and intent: Explain what you're trying to achieve (the "why") not just how you think it should be done. This allows the AI to potentially suggest alternative, perhaps more elegant or efficient, approaches.
- Input/output examples: This is often the most powerful way to convey requirements. For instance, when I started the New Relic triage project, the initial idea was to parse PagerDuty alerts. However, I quickly realized and clarified, "actually instead of using pagerduty alert, we can use the New relic alert in slack itself as the source message. Here is a sample message: ..." and provided the exact New Relic alert text. This single piece of input context was a game-changer, allowing Cursor to build accurate parsing logic for node_check_if_newrelic_alert_and_parse immediately (sketched below).
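As an illustration of why that sample message mattered, here is a hedged sketch of what the resulting parsing node can look like. The regex is the one quoted earlier; the state field names are hypothetical, since the real sample message and state model are not reproduced here.

```python
import re

# Sketch only: the real parsing logic came from the sample alert pasted into
# the prompt (elided above).
PRIORITY_RE = re.compile(r":\w+?: (\w+) priority issue is active")

async def node_check_if_newrelic_alert_and_parse(state: "GraphState") -> "GraphState":
    text = state.source_message or ""
    match = PRIORITY_RE.search(text)
    state.is_newrelic_alert = match is not None
    if match:
        state.alert_priority = match.group(1)  # e.g. "Critical"
        # ...condition name, entity, and links would be parsed similarly,
        # guided by the structure of the sample message.
    return state
```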
2. Iterate and conquer: The step-by-step approach to complex features
For anything beyond trivial tasks, don't expect Cursor to generate a perfect, complete, multi-component feature in one shot. Complex software is best built iteratively, like constructing a building floor by floor, not all at once.
Break down problems
Divide the larger goal into smaller, logically coherent, manageable chunks. Ask Cursor to help with one component or function at a time. This makes it easier to provide focused context, review the output, and test.
Iterative refinement
Get a first draft for a small piece, review it, test it (even mentally or with a quick local run if possible), and then ask for modifications or the next piece. The entire process of building the New Relic triage workflow exemplified this. We didn't try to build the whole LangGraph flow at once. Instead, we focused on one node at a time:
- node_check_if_newrelic_alert_and_parse (identifying and parsing the alert)
- node_fetch_slack_thread_context (gathering more information)
- node_generate_initial_nrql (crafting the first query with LLM help)
- And so on, through execution, ID extraction, detailed querying, and response formatting.
Each "proceed" from us was a signal to move to the next well-defined step, with the AI generating the code for that step. This also allowed us to catch issues early, like the linter error with string formatting in one of the generated nodes.
3. Show, don't just tell: The unmistakable value of concrete examples
When you need the AI to generate code that processes specific data structures, interacts with external APIs, or follows a particular output format, providing concrete examples of inputs and expected outputs is far more effective than lengthy descriptions.
Sample data
As mentioned, the sample New Relic alert message was key for parsing. Later, when discussing how to extract a request_id or trace.id from logs for the node_extract_request_id function, I provided a sample JSON log: "here is a sample new relic log json. check for requestId and trace.id in this: {...}". This immediately allowed Cursor to confirm that the proposed known_id_fields in the parsing logic were appropriate and would work with the actual data structure, making the direct field extraction more robust and reducing reliance on the LLM fallback.
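A minimal sketch of what that direct extraction can look like, using only the two field names confirmed against the sample log. This is a simplified helper, not the project's actual node_extract_request_id, and the log shape is illustrative.

```python
# Try the known ID fields directly; only fall back to the LLM when nothing is found.
KNOWN_ID_FIELDS = ("requestId", "trace.id")

def extract_request_id(logs: list[dict]) -> str | None:
    for log in logs:
        for field in KNOWN_ID_FIELDS:
            value = log.get(field)
            if value:
                return str(value)
    return None  # caller falls back to the LLM-based extraction
```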
Desired format
If you want output in a specific format (e.g. a particular JSON structure for an API response, a specific NRQL query pattern, a certain style of documentation), show an example. For NRQL generation, while we relied on the LLM's expertise, providing examples of good vs. bad queries for your specific New Relic setup could further refine its output.
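For instance, a contrast like the following could be pasted straight into a prompt. The NRQL here is purely illustrative and not from our actual New Relic setup.

```python
# Illustrative contrast only, not queries from our environment.
GOOD_NRQL = (
    "SELECT message, level FROM Log "
    "WHERE `trace.id` = 'abc123' AND level = 'ERROR' "
    "SINCE 30 minutes ago LIMIT 100"
)
BAD_NRQL = "SELECT * FROM Log"  # unbounded: no filter, no time window, no limit
```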
4. Leverage AI for planning and design: Your architectural sounding board
Before you dive into implementing a new feature, use Cursor as a brainstorming partner and architectural consultant. Its ability to process vast amounts of information can help you think through design choices.
High-level planning
The very first prompt of the project is a perfect example: "Figure out a plan to add a New Relic node to the graph and use it for the triage message shortcut and command... How would this orchestration be done elegantly with langgraph?" This set the stage for a structured, multi-node approach that we then fleshed out. The AI proposed a sequence of nodes, state variables, and conditional logic, which was then refined.
Discussing alternatives
You can ask Cursor about different architectural approaches (e.g. "Should I use a webhook or poll for this data?"), library choices ("What are the best Python libraries for interacting with the New Relic API for logs?"), or the pros and cons of various design patterns for a given problem.
5. Specify constraints and configuration details: Grounding the AI in your reality
LLMs are trained on general knowledge; they don't inherently know the specific constraints, conventions, or operational details of your project or environment unless you explicitly tell them.
Libraries, frameworks, and versions
Explicitly state which ones to use (e.g., aiohttp for async calls) or avoid. Specify versions if compatibility is critical.
Configuration management
This is crucial for operational code. When discussing API keys for New Relic, I specified, "New Relic API key should be same as any other key, environment loaded if mock_slack and secret manager if not." Similarly, for the New Relic bot identifier in Slack, prompting "Does this look like a bot id - D123AB1A1AB? Make this a configurable variable instead of hard coding it" was vital. These details ensure the generated code for accessing configuration (e.g. in node_check_if_newrelic_alert_and_parse or node_execute_initial_newrelic_query) aligns with your project's deployment and security practices.
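To show how those constraints land in code, here is a hedged sketch of configuration handling that follows the rules above. The settings.mock_slack flag, the fetch_secret helper, and the environment variable names are hypothetical stand-ins for the project's own helpers.

```python
import os

def fetch_secret(name: str) -> str:
    """Placeholder for the project's secret-manager client (hypothetical)."""
    raise NotImplementedError

def load_new_relic_api_key(settings) -> str:
    # Mirrors the constraint above: environment-loaded when mock_slack is on,
    # secret manager otherwise.
    if settings.mock_slack:
        return os.environ["NEW_RELIC_API_KEY"]
    return fetch_secret("new-relic-api-key")

# The bot identifier is configurable rather than hard-coded (e.g. "D123AB1A1AB").
NEW_RELIC_BOT_ID = os.environ.get("NEW_RELIC_BOT_ID", "")
```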
Error handling and edge cases
Prompt the AI to consider error handling. "How should this function handle a timeout from the New Relic API?" or "What if the expected request_id field is missing from the logs?"
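As one possible answer to the timeout question, here is a hedged sketch using aiohttp. The 30-second budget and the fallback payload are arbitrary choices for illustration.

```python
import asyncio
import aiohttp

# Bound the request and degrade gracefully instead of hanging the workflow.
QUERY_TIMEOUT = aiohttp.ClientTimeout(total=30)

async def query_newrelic(session: aiohttp.ClientSession, url: str, **kwargs) -> dict:
    try:
        async with session.get(url, timeout=QUERY_TIMEOUT, **kwargs) as resp:
            resp.raise_for_status()
            return await resp.json()
    except asyncio.TimeoutError:
        return {"results": [], "error": "New Relic API timed out"}
```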
6. The developer is still in charge: Review, test, and own the code
Cursor is an incredibly powerful assistant, but it's still just an assistant. You, the developer, are ultimately responsible for the code that gets committed and deployed. This is perhaps the most critical best practice.
Review generated code meticulously
Carefully scrutinize any code produced by the AI. Check for:
- Correctness: Does it do what you intended?
- Efficiency: Are there performance bottlenecks?
- Security: Are there any potential vulnerabilities (e.g. improper input sanitization, insecure API calls)?
- Readability and maintainability: Does it adhere to your project's coding standards? Is it overly complex?
Test thoroughly
Integrate AI-generated code and test it with the same rigor you would apply to code you wrote entirely yourself. Unit tests, integration tests, and end-to-end tests are all essential. When we added the New Relic test case to test_local.sh, the next step was, implicitly, for us to run it and verify the behavior through logs and application output.
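A sketch of the kind of unit tests worth wrapping around AI-generated nodes, building on the hypothetical GraphState and parse node sketched earlier. It assumes pytest with the pytest-asyncio plugin, and the import path is made up.

```python
import pytest

# Hypothetical import path; point it at wherever the state model and node live.
from triage.graph import GraphState, node_check_if_newrelic_alert_and_parse

SAMPLE_ALERT = ":red_circle: Critical priority issue is active"

@pytest.mark.asyncio
async def test_parse_node_recognises_newrelic_alert():
    state = GraphState(source_message=SAMPLE_ALERT)
    result = await node_check_if_newrelic_alert_and_parse(state)
    assert result.is_newrelic_alert is True
    assert result.alert_priority == "Critical"

@pytest.mark.asyncio
async def test_parse_node_ignores_other_messages():
    state = GraphState(source_message="deploy finished :tada:")
    result = await node_check_if_newrelic_alert_and_parse(state)
    assert result.is_newrelic_alert is False
```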
Understand, don't just copy-paste
If the AI generates code you don't fully understand, ask it for an explanation before integrating it. "Can you explain how this regex in node_check_if_newrelic_alert_and_parse works?" or "Why did you choose to use aiohttp.ClientSession() here?" This not only helps you vet the code but also improves your own knowledge.
Force yourself to accept one change or edit at a time instead of clicking "Accept" on the full file or using "Accept All." Your understanding of the generated code will multiply, and you will spot where the AI did what you said rather than what you wanted, or sometimes what it simply hallucinated.

7. Use AI for debugging and explanations: Your interactive troubleshooter
Cursor can be a great debugging partner, helping you understand and resolve issues faster.
Error messages
Paste error messages and stack traces and ask for insights or potential causes.
Code explanation
If you encounter a complex piece of code — whether it's legacy code, from a teammate, or even a particularly dense block generated by the AI itself — ask Cursor to explain it in plain terms.
Troubleshooting and linter errors
During our session, a linter error surfaced: "String literal is unterminated" after a code generation step. It would be easy to write the tool off over such a simple slip, but we worked through it collaboratively — with the AI proposing a fix (switching to triple-quoted strings for prompts), which ultimately resolved the issue. This interactive debugging is a common and highly effective use case.
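The shape of that fix looked roughly like the following; the prompt text is invented, and only the before/after pattern matters.

```python
# Before: the generated prompt string was split across lines, which the
# linter flagged as "String literal is unterminated".
# NRQL_PROMPT = "You are an expert in NRQL.
#     Generate a query for the alert described below..."

# After: the fix the AI proposed, a triple-quoted string for multi-line prompts.
NRQL_PROMPT = """You are an expert in NRQL.
Generate a query for the alert described below..."""
```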
8. Embrace and guide the context window
Modern AI coding tools, including Cursor, often have an awareness of your open files and project context. Use this to your advantage. When discussing changes, Cursor often "knows" which file you're talking about because it's the active one in your editor. The contextual information it attaches to your prompts (such as the contents of the current file and any explicitly attached context) helps the LLM stay grounded in the relevant part of your codebase, leading to more accurate and contextually appropriate suggestions. You can further guide this by explicitly referencing file paths or specific function names when your query spans multiple files.
Just like prompt engineering for ChatGPT, you will have to invest in .cursorrules (or equivalent for other editors) if you really want to unlock higher levels.
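As a starting point, a .cursorrules file for a project like this one might capture the constraints discussed throughout this article. The rules below are illustrative, not our actual file.

```text
# .cursorrules (illustrative)
- Use Python 3.11 with type hints; all I/O is async via aiohttp.
- Model workflow state with Pydantic; LangGraph nodes take and return GraphState.
- Load secrets from the secret manager in production; environment variables only when mock_slack is set.
- Never hard-code Slack IDs, API keys, or account IDs; make them configurable.
- Follow the existing node naming convention (node_<action>), one node per step.
```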
A mini case study revisited: New Relic alert triage
Our interaction to build the New Relic alert triage workflow in Slack serves as a practical demonstration of these best practices:
- Planning and design (L3): We started with our high-level goal and request for a LangGraph plan. The AI outlined a multi-node architecture.
- Clear examples (input context): We provided a sample New Relic alert message and a sample JSON log to Cursor, which were pivotal for accurate parsing and ID extraction logic.
- Step-by-step implementation (iterative refinement): We built the workflow node by node, with the AI drafting the code for each. After each step, we read the plan Cursor proposed and then used a "proceed" directive as both approval and the signal for the next step.
- Configuration and constraints: API key handling, Slack bot ID configurability, and library choices (aiohttp, LangChain) were discussed and incorporated.
- Iterative adaptation: The plan was adapted (e.g. switching from PagerDuty to direct New Relic alerts) based on our evolving requirements and new information.
- Debugging: We jointly addressed and fixed a linter error in AI-generated code.
- Code generation and implied review: For each node, the AI generated code. The implicit next step — always — is developer review, testing (like the test_local.sh addition), and integration.
The future is collaborative
AI coding assistants like Cursor are not here to replace developers, but to augment and empower them. By moving up the "levels of automation" responsibly — from simple completions to guided feature implementation — we can offload tedious work, accelerate development, and focus more on the creative, architectural, and problem-solving aspects of software engineering.
The key to unlocking this potential lies in treating the AI as a true collaborator. Master the art of the prompt, provide rich context, iterate intelligently, and always maintain ownership and critical oversight of the code. As these tools evolve, so too will our partnership with them, leading to new heights of productivity and innovation.