Preflight Checks Before Running an Evaluation

Before launching generate or eval, Promptbeat runs a set of preflight checks that catch configuration problems, unreachable targets, and quality issues so that they do not waste a full evaluation run. Agent evaluations can be expensive in both time and API cost, so catching a misconfigured provider or a thin seed file early is far better than discovering the problem halfway through a large run.

What preflight checks

Promptbeat validates five categories of configuration before it allows a run to proceed:

Target reachability

Promptbeat verifies that the configured target provider is reachable and correctly authenticated:

Is the provider ID valid and supported?
Is the API key present in the environment and does it have access to the specified model?
For agent targets: does working_dir exist and is it accessible?
For Codex and other CLI agents: does the agent binary have the correct home or config path set?
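
Two of the questions above, whether the API key is present and whether working_dir exists, can be approximated locally before any network call. The following is a minimal Python sketch of that idea and not Promptbeat's own implementation: the function name `check_target_locally`, the `api_key_env` parameter, and the message strings are assumptions made for illustration. Confirming that a key can actually reach a given model still requires a real request to the provider.

```python
import os
from pathlib import Path
from typing import Optional


def check_target_locally(api_key_env: str, working_dir: Optional[str] = None) -> list[str]:
    """Return the problems found; an empty list means the local checks passed.

    Hypothetical helper: names and messages are illustrative, not Promptbeat's API.
    """
    problems = []

    # The API key must be present and non-empty in the environment.
    if not os.environ.get(api_key_env):
        problems.append(f"environment variable {api_key_env} is not set")

    # Agent targets only: working_dir must exist and be readable and traversable.
    if working_dir is not None:
        path = Path(working_dir)
        if not path.is_dir():
            problems.append(f"working_dir {working_dir} does not exist")
        elif not os.access(path, os.R_OK | os.X_OK):
            problems.append(f"working_dir {working_dir} is not accessible")

    return problems
```

Returning a list of problems rather than raising on the first one lets every issue be reported in a single pass.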

Scenario quality

Promptbeat checks that each scenario in scenarios.yaml is well-formed and usable:

Does the scenario declare a risk_type?
Are failure_signals explicit enough to guide the judge?
Does the scenario match at least one of the target’s declared capabilities?
Do the referenced judge IDs exist and match the expected evidence level?
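
The scenario questions above can be sketched as a small structural check. This is an illustration under stated assumptions, not Promptbeat's validator: `risk_type` and `failure_signals` come from this page, while the `capabilities` and `judges` field names and the function name `check_scenario` are hypothetical. The sketch only tests that fields are present; whether failure signals are explicit enough, and whether a judge matches the expected evidence level, still needs human review.

```python
def check_scenario(
    scenario: dict,
    target_capabilities: set[str],
    known_judges: set[str],
) -> list[str]:
    """Return human-readable issues for one scenario; empty means none found."""
    issues = []

    # risk_type must be declared.
    if not scenario.get("risk_type"):
        issues.append("missing risk_type")

    # failure_signals must at least be present and non-empty.
    if not scenario.get("failure_signals"):
        issues.append("missing failure_signals")

    # Assumed field: "capabilities" lists what the scenario exercises.
    needed = set(scenario.get("capabilities", []))
    if needed and not (needed & target_capabilities):
        issues.append("no overlap with target capabilities")

    # Assumed field: "judges" lists the judge IDs the scenario references.
    for judge_id in scenario.get("judges", []):
        if judge_id not in known_judges:
            issues.append(f"unknown judge: {judge_id}")

    return issues
```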

Seed quality

Promptbeat reviews the seed file for coverage and focus:

Does each seed’s risk_type match at least one scenario?
Does each seed declare required_capabilities?
Are the seeds varied enough in framing style to produce diverse probes?
Is dataset category mapping preserved in seed metadata when dataset-backed seeds are used?
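
The first two seed questions are mechanical and can be sketched in a few lines of Python. The function name `check_seeds` and the assumption that seeds load as a list of dictionaries are illustrative only; `risk_type` and `required_capabilities` are the field names used on this page. Variety of framing style is a judgment call and is not covered by the sketch.

```python
def check_seeds(seeds: list[dict], scenario_risk_types: set[str]) -> list[str]:
    """Return issues for seeds whose metadata does not line up with the scenarios."""
    issues = []
    for index, seed in enumerate(seeds):
        # Each seed's risk_type should match at least one scenario.
        risk_type = seed.get("risk_type")
        if risk_type not in scenario_risk_types:
            issues.append(f"seed {index}: risk_type {risk_type!r} matches no scenario")

        # The field must be declared; an empty list counts as a declaration.
        if "required_capabilities" not in seed:
            issues.append(f"seed {index}: required_capabilities not declared")
    return issues
```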

Execution quality

Promptbeat checks that the execution environment matches the target’s requirements:

Is sandbox_mode set appropriately for the risks being tested?
Is approval_policy configured for reproducible, non-interactive runs?
Is tracing enabled at the depth needed for the scenario’s evidence requirements?
Is environment inheritance restricted to only what the agent needs?
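
To make the last question concrete, restricting environment inheritance usually means passing the agent process an explicit allowlist of variables instead of the full parent environment. The sketch below shows that general technique in Python; it does not describe how Promptbeat configures it, and the name `restricted_env` is hypothetical.

```python
import os
from collections.abc import Mapping
from typing import Optional


def restricted_env(
    allowlist: list[str],
    source: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Copy only allowlisted variables from source (default: os.environ)."""
    source = os.environ if source is None else source
    # Variables that are allowlisted but absent are skipped, not invented.
    return {name: source[name] for name in allowlist if name in source}
```

The result can be passed as the `env` argument of `subprocess.run`, so the child process sees only the listed variables.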

Generation size warnings

Promptbeat estimates the total probe count before generation begins:

Promptfoo’s --count flag applies per plugin. A scenario with 8 plugins and --count 16 produces up to 128 probes.
If the estimated count exceeds a reasonable bound, Promptbeat warns you and suggests using a sampler for broad runs.
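
The arithmetic behind the estimate is a simple product, sketched here in Python. The function names and the `max_probes` threshold are assumptions for illustration; this page does not state the bound Promptbeat actually uses.

```python
from typing import Optional


def estimate_probe_count(plugins_per_scenario: list[int], count: int) -> int:
    """Upper bound on probes, given that --count applies to every plugin."""
    return sum(plugins_per_scenario) * count


def size_warning(estimate: int, max_probes: int) -> Optional[str]:
    """Return a warning message when the estimate exceeds a caller-chosen bound."""
    if estimate > max_probes:
        return f"estimated {estimate} probes exceeds {max_probes}; consider a sampler"
    return None
```

For the example above, `estimate_probe_count([8], 16)` returns 128.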

Running the validate command

Run validate explicitly before any generate or eval call, especially after editing config files:

./bin/promptbeat validate --config examples/codex_agent/promptbeat.yaml

You can also validate against the llm-basic example to check a simpler setup:

./bin/promptbeat validate --config examples/llm-basic/promptbeat.yaml

The command prints a structured summary of all checks, their status, and any messages.
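
In a script or CI job, the same command can be wrapped so that later steps run only after validation succeeds. The Python sketch below builds the command shown above. It assumes, without confirmation from this page, that `promptbeat validate` exits with a non-zero status when a blocking error is found; the helper names are hypothetical.

```python
import subprocess


def validate_command(config_path: str, binary: str = "./bin/promptbeat") -> list[str]:
    """Build the argument list for the validate subcommand."""
    return [binary, "validate", "--config", config_path]


def run_validate(config_path: str) -> bool:
    """Run validate and report success.

    Assumption: a non-zero exit status signals blocking errors.
    """
    completed = subprocess.run(validate_command(config_path), check=False)
    return completed.returncode == 0
```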

Interpreting preflight output

Preflight results appear at two severity levels:

Severity	Meaning	Effect on execution
Warning	A potential quality or cost issue was detected, but the configuration is technically valid	Execution is not blocked — Promptbeat proceeds after displaying the warning
Error	A required configuration element is missing, invalid, or unreachable	Execution is blocked — you must fix the error before the run can proceed

Treat warnings seriously even though they don’t block execution. A seed file with only one entry, or a scenario with vague failure signals, will produce an evaluation that is harder to interpret and less likely to find real failures.
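
The gating rule described above can be sketched as follows: errors block the run, warnings do not but are worth surfacing for review. The `CheckResult` type and function names are hypothetical; the actual structure of Promptbeat's output is not specified on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    severity: str  # "warning" or "error"
    message: str


def may_proceed(results: list[CheckResult]) -> bool:
    """A run may proceed only when no check reported an error."""
    return not any(result.severity == "error" for result in results)


def warnings_to_review(results: list[CheckResult]) -> list[CheckResult]:
    """Collect the non-blocking findings so they are not silently ignored."""
    return [result for result in results if result.severity == "warning"]
```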

Common preflight failures

Failure	Cause	Fix
Target unreachable	API key is missing from the environment, the wrong provider ID is set, or the model name is not accessible with the current key	Export the correct environment variable (e.g., `export OPENAI_API_KEY=sk-...`) and verify the provider ID in `promptbeat.yaml` matches a supported format
Scenario missing risk type	A scenario entry in `scenarios.yaml` does not have a `risk_type` field	Add `risk_type` to every scenario; see the Risk Taxonomy for the list of valid values
Seed count too low	`seeds.yaml` contains fewer seeds than the minimum recommended for diverse generation	Add more hand-written seeds, or subscribe to a dataset to augment coverage — see Datasets
Generation size too large	The combination of plugin count and the `--count` value will produce more probes than your budget allows	Lower the `--count` value, reduce the number of active scenarios, or add a sampler configuration to cap the total probe count
Working directory not found	The `working_dir` path in the provider config does not exist at validation time	Create the directory, or correct the path in `providers.yaml` to point to an existing workspace
Scenario capability mismatch	A scenario requires a capability the target does not declare	Either add the capability to `target.yaml` if the target genuinely has it, or remove the scenario from the run

Always run promptbeat validate before promptbeat generate, especially when you have just edited target.yaml, scenarios.yaml, or seeds.yaml. A two-second validation check can save you from discovering a broken config after a twenty-minute generation run.
