From Scanners to Gatekeepers: How QA Secures Agentic Development

Key Takeaways

As AI agents increasingly write and modify code, QA becomes the final authority on safety, correctness, and release readiness.
Modern testing now requires security scanning, explainable evaluation, and structured gatekeeping, not just execution.

Read Today’s Notes

What changed this week

Three signals point to the same shift: AI-driven development only works if QA evolves into an enforcement and evaluation layer.

Signal breakdown

1. Promptfoo brings LLM security into CI/CD

New scanner detects prompt injection, PII leakage, and excessive agent autonomy
Works as GitHub Action, VS Code extension, or CLI
Traces user input → prompt construction → model invocation
Enables true “shift-left” security testing for LLM apps

Why it matters:
Traditional SAST/DAST tools cannot see prompt-level vulnerabilities. QA now owns this gap.

2. QA as the “Gate Keeper” in agentic coding workflows

Emerging Kanban-style workflow:
- Dev agent writes code
- QA agent validates:
  - unit tests exist and pass
  - integration tests
  - UI automation
QA becomes a decision gate, not a downstream executor

Why it matters:
Speed without validation creates silent risk. QA defines the stop/go criteria for AI output.

3. TIGERScore enables explainable, reference-free evaluation

Evaluates AI-generated text without golden answers
Uses instruction-based rubrics (accuracy, relevance, comprehension)
Produces multi-dimensional, explainable scores

Why it matters:
Most real-world LLM outputs don’t have a “correct” answer. TIGERScore gives QA a defensible way to say why something failed.

Core insight

AI doesn’t remove QA—it formalizes QA as governance.

Companion Newsletter

QA Is Becoming the Gatekeeper of AI Systems

AI agents can now write code, generate tests, and ship features faster than ever.
But speed introduces a new problem: who decides what is safe to release?

This week’s signals show that decision increasingly belongs to QA.

Promptfoo’s new code scanner makes LLM-specific vulnerabilities visible inside CI/CD pipelines—something traditional security tools simply miss. At the same time, practitioner-led agentic workflows explicitly position QA as the gate that all AI-generated code must pass through.

Finally, TIGERScore addresses a long-standing pain point in AI testing: judging outputs when no perfect answer exists. By using explainable, reference-free metrics, QA teams can evaluate quality without relying on subjective “vibe checks.”

Together, these trends redefine the role of testing. QA is no longer about executing steps after development. It is about enforcing standards, explaining failures, and deciding when AI output is trustworthy enough to ship.

What to try today