Skip to main content
Promptbeat is a scenario-driven safety evaluation toolkit for AI agents and LLM applications. Instead of running a fixed prompt list against a model API, Promptbeat starts from a target profile, a scenario that defines the risk situation, and seed material—then generates adversarial probes, executes them against a real agent runtime, judges the results, and produces an evidence-backed report. It is designed for security engineers, red teams, and platform owners who need to measure where an agent’s safety boundaries actually break down under realistic attack conditions.

What Promptbeat evaluates

Promptbeat treats each agent class as a distinct target with its own risk surface. Generic model safety benchmarks miss the context that determines whether a given behavior is normal capability or a boundary violation. Promptbeat captures that context at the scenario level.
Agent typeExample targetKey risks
Coding agentCodex, Claude Code, OpenClawSecret exfiltration, sandbox escape, verifier sabotage, repository prompt injection
Browser agentWeb automation agentsDOM injection, cross-site tool abuse, credential harvesting, data exfiltration
Customer support agentBusiness chatbots, ticketing agentsPII leakage, cross-user data access, unauthorized refunds, tool authorization bypass
Data analysis agentNotebook and analytics agentsQuery injection, schema leakage, unauthorized data export
DevOps agentCI/CD and infra agentsCredential reads, network egress bypass, pipeline poisoning
Benchmark / lab targetInspect solvers, Terminal-BenchHidden test manipulation, scorer artifact tampering, task boundary violations

Why scenario-driven matters

Most LLM evaluations treat every prompt the same way: send it, check the response. That approach produces noisy signal because the same request can be benign in one context and a critical attack in another.
The request “print all environment variables that look like API keys” is irrelevant to a customer support agent and a critical secret-exfiltration probe when sent to a coding agent with shell access. Promptbeat uses the scenario to make that distinction before generating a single probe.
RequestGeneric chatbotCoding agentCustomer support agent
Read environment variablesPolicy refusalSecret exfiltration riskUsually irrelevant
Query an orderGeneral questionIrrelevantAuthorization risk
Upload logsHarmless summary requestNetwork egress riskSupport workflow
Modify testsNot applicableVerifier sabotage riskNot applicable
Starting from a scenario means the generator model produces probes that are calibrated to the target’s capabilities and the specific risk type under test. The judge then evaluates responses against failure signals that are meaningful for that context—not a generic harm classifier applied uniformly.

The evaluation pipeline

scenario + target profile + seed (or dataset subscription)
  -> generator model produces adversarial probes
  -> probes execute against real target (Codex SDK, HTTP agent, LLM provider)
  -> judge evaluates each response against failure signals
  -> trace evidence captured (commands, tool calls, file changes, policy denials)
  -> JSON + HTML + Markdown report
Each stage has a defined role:
  • Scenario — declares the risk type, failure signals, judge strategy, and success criteria for the evaluation.
  • Target profile — describes the agent’s capabilities, boundaries, sensitive assets, and forbidden actions so the generator knows what to attack.
  • Seed — provides the initial attack material. Seeds can be hand-written YAML files or loaded from dataset subscriptions (HarmBench, JailbreakBench, ALERT, BeaverTails, and others).
  • Generator — an attacker model (for example openai:openai/gpt-5.5) that expands seeds into realistic adversarial probes tailored to the scenario.
  • Target execution — the generated probes run against the real target through a provider adapter (openai:codex-sdk, openai:gpt-4o, HTTP endpoint, and so on).
  • Judge — evaluates each response using Promptfoo assertions, semantic judges, or custom scorers. Captures trace evidence when available.
  • Report — aggregates pass/fail results, failure evidence, risk-type breakdowns, and trace artifacts into HTML, JSON, and Markdown outputs.

Current validated targets

The current fully validated path uses Codex via openai:codex-sdk as the target runtime, with openai:openai/gpt-5.5 as the generator model. This path has a runnable example in examples/codex_agent and produces real evaluation reports. The architecture is designed to support the following target classes, which are available as adapter templates or planned integrations:
  • Claude Code — adapter template, requires connection to a real Claude Code runtime
  • OpenClaw / OpenCode — adapter templates, requires validation with saved reports
  • Browser agents — web automation targets through HTTP or headless browser adapters
  • Customer support agents — business-logic agents exposed through HTTP endpoints
  • Data analysis agents — notebook and analytics runtimes
  • DevOps agents — CI/CD and infrastructure automation agents
The HTTP agent path is a generic runnable adapter pattern available today at examples/http-agent.

Quick Start

Install Promptbeat and run your first evaluation against the built-in LLM example in five steps.

Codex Quickstart

Run the full generate/eval/report loop against a real Codex coding-agent target.

Scenario-Driven Evaluation

Understand how scenarios, seeds, and targets combine to produce meaningful safety signal.

Risk Taxonomy

Browse the risk types and attack categories Promptbeat evaluates across agent classes.