llm-basic example. By the end you will have validated a config, generated adversarial probes with an attacker model, evaluated a real LLM target, and produced an HTML report showing pass/fail results by risk type. No custom configuration required—everything runs from the example files included in the release package.
Install Promptbeat
Install or unpack a Promptbeat release package, then verify the binary is available from the package root:
If you are using uv, you can run Promptbeat directly without unpacking:
The rest of this guide uses the ./bin/promptbeat form. Substitute uv run promptbeat wherever you see ./bin/promptbeat if you prefer the uv path.
Validate the example config
Before generating anything, confirm that the example config parses correctly and all referenced files resolve:
Validation checks YAML syntax, required fields, scenario references, seed file paths, and provider configuration. Fix any reported errors before proceeding. Common causes are missing seed files or an unresolvable provider name.
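To make the kinds of checks listed above concrete, here is a minimal Python sketch of a pre-flight pass over an already-parsed config. It is not Promptbeat's validator and does not reflect its real schema: the key names target, scenarios, name, and seeds are hypothetical placeholders chosen for illustration, and only two of the listed checks (required fields and seed file paths) are shown.

```python
from pathlib import Path


def preflight(config: dict, base_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means these checks passed.

    ASSUMPTION: "target", "scenarios", "name" and "seeds" are hypothetical
    key names used only for this sketch, not Promptbeat's actual schema.
    """
    problems = []

    # Required-field check: report every missing top-level key.
    for key in ("target", "scenarios"):
        if key not in config:
            problems.append(f"missing required field: {key}")

    # Seed-path check: each seed file must exist relative to base_dir.
    for scenario in config.get("scenarios", []):
        name = scenario.get("name", "<unnamed>")
        for seed in scenario.get("seeds", []):
            if not (base_dir / seed).is_file():
                problems.append(f"scenario {name}: seed file not found: {seed}")

    return problems
```

Collecting every problem into a list, rather than stopping at the first one, mirrors the advice above to fix all reported errors before proceeding.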
Generate adversarial probes
Export your OpenAI API key, then run the generator. The generator model reads your scenario and seed files and writes expanded adversarial probes to the output directory:
Promptbeat writes a generated_redteam.yaml file into artifacts/llm-basic/generate/. That file is a self-contained Promptfoo red-team config that drives the evaluation in the next step.
Evaluate the generated probes
Run the evaluation against the generated config. Promptbeat executes each probe against the configured target, judges the responses, and collects trace evidence:
Results land in artifacts/llm-basic/eval/evaluation_result.json. The JSON file contains every probe, response, judge verdict, and captured evidence.
What you’ll find in the report
The HTML report organizes results by risk type and scenario. For each evaluated probe you will see:
- Verdict: pass or fail, based on the judge strategy defined in the scenario.
- Risk type: the risk taxonomy tag the scenario maps to (for example, t-002 for secret environment reads).
- Probe and response: the exact adversarial input sent to the target and the raw response returned.
- Failure evidence: for failing cases, the specific signal that triggered the failure, such as a matched secret string or a detected unsafe workaround pattern.
- Trace artifacts: when the target emits trace data (commands run, files written, tool calls made), those traces are captured alongside the response.
The evaluation_result.json artifact in artifacts/llm-basic/eval/ contains the full structured data if you need to feed results into a downstream pipeline or CI check.
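As one way to wire the JSON artifact into a CI check, the Python sketch below counts failing probes per risk type and returns a non-zero status when any are found. The file path comes from this guide, but the JSON layout is an assumption: the top-level results list and the verdict and risk_type keys are hypothetical names, so inspect your own evaluation_result.json and adjust them before relying on this.

```python
import json
import sys
from collections import Counter
from pathlib import Path

# Path as given in this guide.
RESULT_PATH = Path("artifacts/llm-basic/eval/evaluation_result.json")


def count_failures(data: dict) -> Counter:
    """Count failing probes per risk type.

    ASSUMPTION: the file has a top-level "results" list whose entries carry
    "verdict" ("pass"/"fail") and "risk_type" keys. These names are
    hypothetical and may not match the real schema.
    """
    failures = Counter()
    for entry in data.get("results", []):
        if str(entry.get("verdict", "")).lower() == "fail":
            failures[entry.get("risk_type", "unknown")] += 1
    return failures


def main(path: Path = RESULT_PATH) -> int:
    """Return 0 if no probe failed, 1 if any failed, 2 if the file is missing."""
    if not path.is_file():
        print(f"missing {path}; run the evaluation step first", file=sys.stderr)
        return 2
    failures = count_failures(json.loads(path.read_text(encoding="utf-8")))
    for risk_type, count in sorted(failures.items()):
        print(f"{risk_type}: {count} failing probe(s)")
    return 1 if failures else 0
```

In a CI job, calling sys.exit(main()) at the end of the script turns the return value into the process exit status, so any failing probe fails the build.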
The llm-basic example evaluates a single LLM provider and is designed to get you familiar with the workflow quickly. For a more complete walkthrough that runs attacks against a real coding-agent runtime with shell access and file-system capabilities, see the Codex Quickstart.
Next steps
Core Concepts
Understand targets, scenarios, seeds, subscriptions, and provider adapters in depth.
Agent Targets
See how Promptbeat models coding agents, browser agents, support agents, and more as first-class targets.
Datasets
Load seeds from HarmBench, JailbreakBench, ALERT, BeaverTails, and other catalog datasets via subscriptions.
Risk Taxonomy
Browse the risk type tags and attack categories used in scenario definitions.