Why Development AI Agents Are Not (Yet) Ready for Production

The NoCode Guy

AI coding agents are impressive in demos but remain fragile in production environments. Their ability to generate code hides structural limitations: limited context on large monorepos, risky refactoring, no real understanding of infra/OS, security blind spots, and uncertain maintainability.
⚙️ This article analyzes these issues from a CIO/CTO perspective and proposes a pragmatic approach: treat agents as delivery copilots, integrate them into a robust CI/CD pipeline, and combine them with APIs, no‑code/low‑code, and AI governance to strengthen digital transformation without jeopardizing production.


1. Why AI agents remain fragile in real-world contexts

AI coding agents in enterprise monorepos

Pros

  • Strong acceleration on local, well-scoped coding tasks (single class, service, or test)
  • Helpful for generating scripts, boilerplate, and snippets to prototype quickly
  • Can increase local developer velocity when changes are constrained and well‑validated
  • Useful assistance for CI/CD updates and routine code edits under close supervision

Cons

  • Cannot reliably handle cross-cutting changes in large monorepos due to limited context windows and service limits
  • Partial view of codebases leads to broken refactors and missed dependencies in real-world systems
  • Lack of OS, infra, and environment awareness creates fragile automation and high supervision overhead
  • Prone to hallucinations, repetition loops, and risky automated refactors that increase technical debt
  • Often misses enterprise-grade practices (security, modern SDKs, maintainability), requiring strong human review

1.1 Limited context and massive monorepos

Practical limits on a large monorepo

```bash
# Naïve attempt to fully index a giant monorepo
# ('agent index' is an illustrative CLI, not a specific product)
agent index --repo . \
  --max-files 2500 \
  --max-file-size 500k
# Most agents cap out around this file count, and larger files are ignored.
# Result: a partial view of the code, invisible dependencies, broken refactors
```

📚 Core issue: a model’s context window and service limits (indexing caps, file-size ceilings, latency) fall far short of the scale of legacy information systems.

In a monorepo or enterprise codebase:

  • Thousands to hundreds of thousands of files.
  • Large files (old frameworks, generators, legacy).
  • Business logic spread across code, scripts, batch jobs, infra-as-code config, internal wikis.

Consequences for AI agents:

  • Partial view of the code: truncated indexing, ignored files, invisible dependencies.
  • Broken refactors: an agent changes a module without seeing secondary usages, implicit tests, or effects on data pipelines.
  • Limited understanding of architecture: the agent sees files, not a distributed system with API contracts, performance constraints, SLAs, data schemas, and compliance rules.

🔎 Result: very helpful for local tasks (a single class, service, or test), but insufficiently reliable for cross-cutting changes in a critical monorepo.

1.2 Lack of system, infra, and OS context

🖥️ AI coding agents interact with tools (shell, package managers, runtimes), but without a real mental model of the machine and execution environment.

Typical examples:

  • Running Linux commands in PowerShell or vice versa.
  • Ignorance of project environment specifics (conda/venv, Node version, Java runtime, custom Docker images).
  • Difficulty dealing with command execution time (incomplete logs, timeouts, premature interpretation of partial results).

Impact for IT management:

  • High supervision overhead: an engineer must constantly monitor the agent’s actions.
  • Risk of “half‑good” scripts: updated CI/CD pipelines or integration jobs that are not fully reliable, especially in multi‑cloud or hybrid environments.
  • Complexity in observability: the agent does not understand Prometheus, OpenTelemetry, or dashboards like an SRE; it merely extrapolates from textual logs.

Agents remain approximate operators, useful for generating scripts and snippets, but not for orchestrating infra or diagnosing complex production behavior.

1.3 Hallucinations, repetition, and risky automated refactoring

Repeated error loop with an agent in PowerShell

```bash
# The agent tries to run Linux commands inside PowerShell
ls /var/log
# -> Repeated error: the command is not recognized on Windows

# The agent treats a config value containing special characters
# (parentheses, dot, asterisk) as dangerous and blocks generation
# on every attempt
python_function_version="(1.0.*)"  # standard boilerplate version pin

# The developer has to work around it:
# - ask the agent to ignore this file
# - apply the configuration manually, then relaunch the agent
```

🎯 Code generation is generally correct on a small scale, but agents show weaknesses in:

  • Long sequences of multi‑file changes.
  • Repeated error loops (same wrong assumptions reproduced).
  • Automated refactoring beyond the current file.

Observed problems:

  • Creating new helpers or services similar to existing ones instead of factoring, increasing technical debt.
  • Rewriting fragile business logic without understanding edge cases captured only by integration tests or tacit team knowledge.
  • Repeating a bad security pattern (e.g., naïve secret handling) at scale.

🧩 For a CTO, this means:

  • Local velocity gains, but systemic risk if AI‑driven refactors are not constrained to very well‑controlled scopes.
  • The need to enforce refactoring guardrails (branch policies, limits on change surface, systematic CI/CD validation).

2. Security, maintainability, and governance: the blind spots

2.1 Security practices often lagging behind

🔐 Agents optimize for a solution that “works,” not necessarily for the safest or most maintainable solution.

Common examples:

  • Using secrets in clear text or client secrets instead of managed/federated identities (contrasted in the sketch after this list).
  • Generating code based on obsolete SDKs or outdated examples.
  • Poor error handling, overly verbose logs including sensitive data.
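
To make the first example concrete, here is a minimal bash contrast between the clear-text anti-pattern and a vault-based alternative (the `vault kv get` call assumes HashiCorp Vault; the secret path and variable name are illustrative):

```bash
# ❌ Anti-pattern agents often produce: a secret baked into the script
export DB_PASSWORD="s3cr3t-in-clear-text"

# ✅ Pattern to standardize: resolve the secret at runtime from a vault
# (assumes HashiCorp Vault; the secret path is illustrative)
export DB_PASSWORD="$(vault kv get -field=password secret/app/db)"
```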

In an application security and compliance context:

  • Agents do not inherently integrate internal security policies (secret rotation, systematic encryption, GDPR constraints, data localization).
  • The MLOps/DevSecOps chain must explicitly bake in controls: SAST, DAST, dependency checking, mandatory human review for any change to the attack surface (auth, user input, file handling, data access).

⚠️ Without these guardrails, AI agents can rapidly standardize bad patterns, making subsequent security remediation campaigns harder.

2.2 Maintainability: the hidden cost of code generation

🧱 Massive code generation can create an illusion of progress. The backlog appears to move, but:

  • Uneven readability between generated modules.
  • Mismatch between AI code style and internal standards.
  • Proliferation of pattern variants (logging, errors, validation).

Maintainability suffers when:

  • Agents ignore internal conventions (linters, architecture standards, naming, ADRs).
  • Generated code does not match the team’s maturity (patterns too complex or too simplistic).
  • Documentation is generated afterwards, not aligned with sources of truth (API catalog, data schemas, MDM repositories).

An IT department should treat AI agents as:

  • An accelerator for standardized tasks (boilerplate, scaffolding, scripts, tests).
  • Not as an autonomous engine for target architecture design or deep legacy overhaul.

2.3 AI governance: guardrails, secrets, branches

🛡️ Agent‑specific governance becomes necessary. Some recurring patterns:

Technical guardrails

  • Controlled use of MCP / tool calling: limit accessible tools (read‑only on certain folders, no direct access to secrets, no destructive commands without validation).
  • Constraints on diffs: maximum change surface, number of files, forbidden write access to certain directories (critical infra, core banking, pricing engine, etc.), as sketched after this list.
  • Mandatory integration with CI/CD: no agent commit without passing automated tests.
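
A minimal sketch of such a diff guard, assuming a CI checkout where origin/main is the merge target; the file limit and protected paths are illustrative policy values, not prescriptions:

```bash
#!/usr/bin/env bash
# Pre-merge gate for agent branches: cap the change surface and
# block writes to protected directories (values are examples).
set -euo pipefail

MAX_FILES=25
PROTECTED='^(infra/core/|billing/pricing-engine/)'

CHANGED="$(git diff --name-only origin/main...HEAD)"
COUNT="$(echo "$CHANGED" | sed '/^$/d' | wc -l)"

if [ "$COUNT" -gt "$MAX_FILES" ]; then
  echo "Agent change touches $COUNT files (limit: $MAX_FILES)." >&2
  exit 1
fi

if echo "$CHANGED" | grep -Eq "$PROTECTED"; then
  echo "Agent change writes to a protected directory." >&2
  exit 1
fi

echo "Diff guard passed: $COUNT files, no protected paths touched."
```

Failing fast here keeps oversized or out-of-scope agent changes from ever reaching human review.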

Secrets policies

  • No automatic generation or copying of secrets into code.
  • Standardized use of vaults and managed identities, with documented patterns the agent can replicate.
  • Regular repo audits to detect secrets and credentials generated by AI (one automation sketched below).
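
One way to automate those audits, as a sketch: run gitleaks when it is available, with a crude grep fallback whose patterns are examples rather than a complete ruleset:

```bash
#!/usr/bin/env bash
# Repo audit for hardcoded credentials. Assumes gitleaks is installed;
# the grep patterns below are rough illustrative fallbacks.
set -euo pipefail

if command -v gitleaks >/dev/null 2>&1; then
  gitleaks detect --source . --redact
elif grep -RInE "(api[_-]?key|client[_-]?secret|password)[[:space:]]*[:=]" \
    --include="*.py" --include="*.ts" --include="*.yaml" .; then
  echo "Potential hardcoded credentials found; review before merging." >&2
  exit 1
else
  echo "No obvious hardcoded credentials found."
fi
```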

Branching strategies

  • Dedicated branches for agent changes, clearly labeled (see the check sketched below).
  • “AI‑only branches” strategy: no direct merge to main/master without:
    • Human code review.
    • Automated tests (unit, integration, SAST).
    • Optionally feature flags to limit production impact.
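
A possible CI check for that labeling convention, as a sketch; the ai/ branch prefix and the ai-agent-bot author name are assumed conventions, not standards:

```bash
#!/usr/bin/env bash
# Enforce that agent-authored commits land only on clearly labeled branches.
set -euo pipefail

BRANCH="${BRANCH_NAME:-$(git rev-parse --abbrev-ref HEAD)}"
AUTHOR="$(git log -1 --pretty=%an)"

if [ "$AUTHOR" = "ai-agent-bot" ] && [[ "$BRANCH" != ai/* ]]; then
  echo "Agent-authored commits must land on ai/* branches, not '$BRANCH'." >&2
  exit 1
fi
echo "Branch policy check passed for '$BRANCH'."
```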

This governance allows organizations to leverage agent productivity while maintaining traceability and risk control.


3. The role of AI agents in a modern information system: APIs, no‑code, and CI/CD

From AI-enabled architecture to CI/CD integration

1. 🔗 Stabilize APIs & services: expose core business capabilities via stable, versioned, well-documented APIs.
2. 🧩 Empower no‑code/low‑code: let business teams orchestrate internal/SaaS APIs and data components through no‑code workflows.
3. 🤖 Introduce development AI agents: use agents to generate adapters, tests, and documentation around API contracts in well‑bounded areas.
4. 🚦 Integrate into CI/CD: make the agent a pipeline contributor for code, tests, and YAML changes, guarded by automated checks.
5. 📈 Operationalize via DevOps/MLOps: have agents propose IaC and data scripts while infra and data teams validate, harden, and monitor.

3.1 Towards “API + no‑code + AI agents” architectures

🏗️ In an application modernization strategy, several layers stack up:

  • Backends and services exposed via APIs

    • Stabilized, well‑tested business services.
    • Clear, versioned, documented API contracts.
  • No‑code/low‑code layers

    • Tools used by business teams to assemble workflows, automate tasks, integrate data.
    • Orchestration of internal APIs, SaaS APIs, and data components (ETL/ELT, data warehouse, data lake).
  • Development AI agents

    • Generate code around those API contracts (adapters, mappers, wrappers).
    • Produce integration scripts, automated tests, technical and functional documentation.

AI agents are more effective when the architecture:

  • Provides well‑bounded areas where the agent can act without breaking the rest (microservices, lambdas, data transformation modules).
  • Exposes stable contracts (APIs, event schemas, message formats), as the smoke-test sketch below illustrates.
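
As an illustration, a minimal contract smoke test in bash; the BASE_URL default, the /api/v1/orders path, and the response fields are all hypothetical:

```bash
#!/usr/bin/env bash
# Verify that a versioned API still returns the agreed response shape.
set -euo pipefail

BASE_URL="${BASE_URL:-http://localhost:8080}"
RESPONSE="$(curl -fsS "$BASE_URL/api/v1/orders/123")"

# jq -e exits non-zero when the expression evaluates to false or null
if echo "$RESPONSE" | jq -e 'has("id") and has("status") and has("total")' >/dev/null; then
  echo "Contract smoke test passed."
else
  echo "Contract smoke test failed: response shape changed." >&2
  exit 1
fi
```

Running such a check in CI turns the API contract into an executable boundary the agent cannot silently break.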

3.2 Controlled integration into the CI/CD and DevOps/MLOps pipeline

🔄 The agent becomes a contributor to the pipeline, not a replacement:

  • CI/CD:

    • The agent generates or modifies code, tests, YAML pipelines.
    • CI runs (the gates are sketched after this list):
      • Unit and integration tests.
      • SAST and dependency checks.
      • Linting/auto‑formatting.
  • DevOps:

    • The agent suggests IaC playbooks (Terraform, Bicep, CloudFormation, Ansible).
    • Infra/SRE teams validate, harden, and factor before deployment.
  • MLOps:

    • Generation of ingestion, feature engineering, and evaluation scripts.
    • Orchestration of ETL/ELT or data pipelines remains supervised by existing systems (Airflow, Dagster, etc.), not by the agent itself.
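
A sketch of those CI gates condensed into one script, assuming a Python service; ruff, pytest, pip-audit, and bandit are illustrative stand-ins for whatever linter, test runner, dependency scanner, and SAST tool your stack uses:

```bash
#!/usr/bin/env bash
# Gates an agent contribution must pass before human review (example tools).
set -euo pipefail

ruff check .        # lint / formatting rules
pytest -q           # unit and integration tests
pip-audit           # known-vulnerability check on dependencies
bandit -r src/ -q   # static application security testing (SAST)

echo "All gates passed: the change is eligible for human review."
```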

📋 Example of roles in the chain:

| Pipeline stage | Ideal role of the AI agent | Required guardrails |
| --- | --- | --- |
| Service scaffolding | Generate API skeleton, DTOs, handler, basic tests | Internal templates, linter, mandatory reviews |
| Adding tests | Generate unit/contract tests for APIs | Systematic execution in CI |
| Documentation | OpenAPI docs, README, code comments | Validation by product and dev teams |
| Data pipelines | Generate transformations, mappings, validation scripts | Data quality checks, production monitoring |
| Automation scripts | Generate CI/CD or migration scripts | Infra review + sandbox before deployment |

4. Pragmatic use cases with real ROI (without risking production)

4.1 Migrating a module to a new API service

🎯 Objective: extract a business module from a monolith into a microservice exposed via API, without breaking the existing system.

Role of the AI agent

  • Analyze a subset of the code (well‑circumscribed module).
  • Propose:
    • Interfaces for the new service (API contract, DTOs, schemas).
    • Initial implementation of the service (HTTP/gRPC exposure, internal mapping).
    • Adapters on the monolith side to call the new service.
  • Generate:
    • Unit tests for the extracted business logic.
    • Contract (API) tests and possibly basic integration tests.

Guardrails

  • Strict limitations on the analyzed code perimeter to avoid cross‑cutting refactors.
  • Design review by an application architect before final implementation.
  • Use of feature flags to gradually route traffic to the new service.

🔁 Benefit: mechanical rewriting and API boilerplate generation are largely automated, allowing teams to focus on architectural choices and migration scenarios.

4.2 Modernizing serverless functions and automating tests

☁️ Scenario: an IT department has many serverless functions (e.g., for data integration tasks or business events), using obsolete SDKs or patterns misaligned with modern best practices.

Role of the AI agent

  • Propose a function‑by‑function migration:
    • Update to a newer runtime or SDK.
    • Simplify signatures and bindings.
    • Add structured logs for observability.
  • Generate:
    • Simple unit tests.
    • Mocks for external services (internal APIs, queues, blobs, etc.).

Interactions with no‑code/low‑code

  • No‑code teams trigger these functions via workflows (for example to orchestrate SaaS integrations, CRM/ERP syncs).
  • The agent helps:
    • Generate connectors on the code side.
    • Align input/output schemas to ease consumption by the no‑code platform.

Guardrails

  • Dedicated sandbox and pre‑prod environment to validate the new function version.
  • Improved observability (traces, metrics, logs) to compare old and new versions during canary releases; a comparison sketch follows.
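
A minimal sketch of such a canary comparison, assuming two hypothetical function URLs exposed in the sandbox:

```bash
#!/usr/bin/env bash
# Compare old and new function versions on the same payload in a sandbox.
# OLD_FN_URL and NEW_FN_URL are hypothetical endpoints you must export first.
set -euo pipefail

PAYLOAD='{"orderId": 123}'

# jq -S sorts object keys so the diff is stable across versions
curl -fsS -X POST "$OLD_FN_URL" -H "Content-Type: application/json" -d "$PAYLOAD" | jq -S . > old.json
curl -fsS -X POST "$NEW_FN_URL" -H "Content-Type: application/json" -d "$PAYLOAD" | jq -S . > new.json

if diff old.json new.json; then
  echo "Old and new versions agree on this payload."
else
  echo "Divergence detected: review before widening the canary." >&2
fi
```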

📈 Benefit: significant ROI on technical upgrades (runtimes, SDKs, logging) and test quality, with limited risk if migrations are segmented and well‑observed.

4.3 Test automation and validation of data integration workflows

📊 Scenario: an organization has data integration pipelines (ETL/ELT, system syncs, event flows) with few automated tests and poor documentation.

Role of the AI agent

  • Based on:
    • SQL transformations or scripts (Python, Scala, etc.).
    • Table descriptions or schemas.
    • Sample data.
  • Generate:
    • Unit tests for transformations (expected inputs/outputs), as sketched after this list.
    • Integration tests on anonymized samples.
    • Technical documentation of mappings (e.g., source field → target field, quality rules, constraints).
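
A sketch of what one such generated test could look like; run_transform.sh and the CSV fixture files are placeholders for the actual transformation and its samples:

```bash
#!/usr/bin/env bash
# Compare a transformation's output against an expected fixture.
set -euo pipefail

./run_transform.sh sample_input.csv > actual_output.csv

# Sorting makes the comparison order-independent for set-like outputs
if diff <(sort expected_output.csv) <(sort actual_output.csv); then
  echo "Transformation matches the documented mapping."
else
  echo "Output drifted from the expected mapping." >&2
  exit 1
fi
```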

No‑code/low‑code synergies

  • Business teams, via a low‑code platform, define or adjust business transformation rules.
  • The agent produces the corresponding transformation code and tests, then integrates them into the data CI/CD pipeline.

Guardrails

  • Data engineers validate:
    • Sensitive business rules (financial rounding, source‑of‑truth prioritization).
    • Data quality aspects (missing values, duplicates, alert thresholds).
  • Monitoring implemented in pipelines (drift indicators, error rates, processing SLAs).

🔗 Benefit: faster test coverage and documentation for data pipelines, strengthening operational reliability without delegating critical business decisions to the agent.


5. A pragmatic approach to digital transformation

AI coding agents are not yet ready to run production autonomously, but they are becoming delivery copilots that can structure a digital transformation strategy.

Some guiding principles for a CIO/CTO:

  • Define the agents’ role:

    • Prototyping, boilerplate, documentation, tests, targeted technical migrations.
    • No massive multi‑system changes without reinforced supervision.
  • Leverage the target architecture:

    • Stable APIs + autonomous services + no‑code/low‑code layers.
    • Agents focused on the edges: glue code, adapters, tests, ops scripts.
  • Strengthen AI governance:

    • Tooled guardrails (MCP/tooling, sandboxes, access controls, change surface limits).
    • Strict secret management policies.
    • Clear branching model for AI contributions, always filtered through CI/CD.
  • Align with DevOps/MLOps:

    • Agents integrated into the pipeline, never outside it.
    • Observability as a safety net: logs, traces, metrics, alerts.

Key Takeaways

  • Development AI agents remain fragile on large monorepos, cross‑cutting refactors, and infra/OS understanding.
  • Without guardrails, they can amplify application security and maintainability issues at scale.
  • Their current sweet spot: boilerplate generation, targeted technical migrations, documentation, and test automation.
  • An effective target model combines robust APIs, no‑code/low‑code layers, and tightly governed AI agents within a strict CI/CD pipeline.
  • Clear governance (guardrails, secret policies, branching strategies) is essential to achieve ROI without putting production at risk.
