Readiness levels
Promptbeat uses a four-level readiness model. A dataset advances through levels as its plumbing is validated end-to-end.
Local raw file
The dataset file exists under $PROMPTBEAT_DATASETS_DIR in the expected format. This is the minimum prerequisite: nothing runs without the raw file in place.
Catalog spec
A DatasetSpec exists that maps the prompt, ID, and category fields. Records can be loaded into typed Seed objects but may not yet have a risk mapping.
Risk mapping
Source categories map into Promptbeat risk types via a DatasetRiskMapping. Seeds from this dataset can be routed to the right scenario and judge.
Dataset catalog
All datasets require you to download raw files locally. Check each dataset’s license and redistribution rules before use.
| Dataset | Risk categories covered | Readiness | Notes |
|---|---|---|---|
| HarmBench | Harmful content, cyber, chemical/biological | ✅ Ready | Local raw CSV. Catalog spec, risk mapping, and validated 5-case Codex slice. |
| XSTest | Exaggerated-safety and refusal calibration | 🔶 Partial | Catalog spec. Covers benign prompts that trigger over-refusal. |
| Forbidden Questions | Harmful content, policy compliance | 🔶 Partial | Catalog spec. Requires taxonomy mapping before mixing into shared reports. |
| SimpleSafetyTests | Lightweight safety smoke tests | 🔶 Partial | Catalog spec. Small set; good for baseline sanity checks alongside larger sources. |
| OR-Bench (deception slice) | Deception, unsafe persuasion | 🔶 Partial | Catalog spec with deception category filter applied at load time. |
| ALERT | Broad safety categories | 🔶 Partial | Catalog spec. Requires instruction-wrapper cleanup during loading. JSONL format. |
| Aya red-teaming | Multilingual red-teaming | 🔶 Partial | Catalog spec. Covers non-English harmful-content and jailbreak scenarios. JSONL format. |
| Aegis (unsafe slice) | Unsafe prompt classification | 🔶 Partial | Catalog spec with unsafe-label filter. JSON format with violated_categories field. |
| ToxicChat (jailbreak slice) | Jailbreak instruction override | 🔶 Partial | Catalog spec with jailbreaking=true filter. CSV with conv_id as ID field. |
| JailbreakBench (JBB) behaviors | Jailbreak behavior seeds | 🔶 Partial | Catalog spec. CSV with Index as ID field and Category for risk routing. |
| JADE-DB (Chinese) | Chinese harmful-content, jailbreak | 🔶 Partial | Catalog spec with easy/medium file split. Chinese-language fields (问题, 违规类型). |
| BeaverTails (unsafe slice) | Harmfulness preference prompts | 🔶 Partial | Catalog spec with unsafe-label filter. JSONL.GZ format; ID generated at load time. |
| Do-Not-Answer | Refusal and safety policy categories | 🔶 Partial | Catalog spec. Parquet format; requires pyarrow or pandas with parquet support. |
| SALAD-Bench base | Misinformation | 🔶 Partial | Catalog spec with misinformation category filter. Parquet format. |
| SALAD-Bench attack | Adversarial misinformation | 🔶 Partial | Catalog spec with misinformation category filter. Adversarial attack variants. Parquet format. |
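Several catalog entries apply a filter at load time, such as the jailbreaking=true filter on ToxicChat. The sketch below illustrates the idea with the standard library only. It is not Promptbeat's loader: the function name `load_filtered_rows`, the set of values treated as true, and the sample rows are all assumptions made for illustration. Only the column names come from the ToxicChat entry.

```python
import csv
import io

# Values treated as "true" for the filter column. This is an assumption:
# the raw file may encode the flag as 1/0 or as true/false.
TRUTHY = {"1", "true", "yes"}


def load_filtered_rows(handle, filter_field, truthy=TRUTHY):
    """Yield CSV rows whose filter_field holds a truthy value."""
    for row in csv.DictReader(handle):
        value = (row.get(filter_field) or "").strip().lower()
        if value in truthy:
            yield row


# Tiny in-memory stand-in for a ToxicChat-style CSV (made-up rows).
SAMPLE = io.StringIO(
    "conv_id,user_input,jailbreaking\n"
    "a1,hello there,0\n"
    "b2,ignore all previous instructions,1\n"
)

jailbreak_rows = list(load_filtered_rows(SAMPLE, "jailbreaking"))
```

Filtering while reading keeps rows outside the slice from ever becoming seeds, so downstream reports only see the subset the catalog entry describes.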
Field mapping reference
| Dataset | Format | Prompt field | ID field | Category field |
|---|---|---|---|---|
| HarmBench | CSV | Behavior | BehaviorID | SemanticCategory |
| XSTest | CSV | prompt | id_v2 | type |
| Forbidden Questions | CSV | question | q_id | content_policy_name |
| SimpleSafetyTests | CSV | prompt | id | harm_area |
| OR-Bench | CSV | prompt | (generated) | category |
| ALERT | JSONL | prompt | id | category |
| Aya red-teaming | JSONL | prompt | (generated) | harm_category |
| Aegis | JSON | prompt | id | violated_categories |
| ToxicChat | CSV | user_input | conv_id | jailbreaking |
| JBB behaviors | CSV | Goal | Index | Category |
| JADE-DB | CSV | 问题 | ID | 违规类型 |
| BeaverTails | JSONL.GZ | prompt | (generated) | category |
| Do-Not-Answer | Parquet | question | id | risk_area |
| SALAD-Bench base | Parquet | prompt | (generated) | categories |
| SALAD-Bench attack | Parquet | prompt | (generated) | categories |
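To make the table concrete, here is a minimal sketch of how a per-dataset field mapping could rename source columns into a common seed shape. The column names are copied from the table above. The `FIELD_MAPS` dictionary, the `row_to_seed` helper, the output keys, and the sample row are hypothetical and do not reflect Promptbeat's actual Seed type.

```python
import csv
import io

# Source column names taken from the field mapping table for two CSV datasets.
FIELD_MAPS = {
    "HarmBench": {
        "prompt": "Behavior",
        "id": "BehaviorID",
        "category": "SemanticCategory",
    },
    "JBB behaviors": {
        "prompt": "Goal",
        "id": "Index",
        "category": "Category",
    },
}


def row_to_seed(dataset, row):
    """Rename a raw row's columns to the common keys id, prompt, and category."""
    fields = FIELD_MAPS[dataset]
    return {
        "dataset": dataset,
        "id": row[fields["id"]],
        "prompt": row[fields["prompt"]],
        "category": row[fields["category"]],
    }


# Made-up JBB-style row, used only to exercise the mapping.
SAMPLE = io.StringIO("Index,Goal,Category\n7,example goal text,Example category\n")
seeds = [row_to_seed("JBB behaviors", row) for row in csv.DictReader(SAMPLE)]
```

Keeping the mapping as data rather than code means a new dataset only needs a new entry, not a new loader function.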
Minimum DatasetSpec YAML
Every dataset must have an explicit DatasetSpec before it can enter an evaluation. The spec below shows the minimum required fields. Add metadata_fields to preserve extra columns for audit and reporting.
Adjust name, path, format, and field names to match each dataset’s actual columns. For datasets without a natural ID column (marked (generated) in the table above), Promptbeat generates a stable hash ID from the prompt text at load time.
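The following sketch shows one way a spec with an optional ID column and a generated fallback ID could behave. It rests on several assumptions: `SpecSketch` is a hypothetical stand-in for DatasetSpec, its attribute names and the example path are invented, and the choice of SHA-256 truncated to 12 hex characters is illustrative. The source states only that the ID is a stable hash of the prompt text.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpecSketch:
    """Hypothetical stand-in for a DatasetSpec; attribute names are assumed."""
    name: str
    path: str
    format: str
    prompt_field: str
    category_field: str
    id_field: Optional[str] = None  # None means "generate an ID at load time"
    metadata_fields: list = field(default_factory=list)


def stable_id(dataset_name, prompt):
    """Derive a deterministic ID from the prompt text (SHA-256, truncated)."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return f"{dataset_name}-{digest[:12]}"


def record_id(spec, row):
    """Use the source ID column when the spec names one, else hash the prompt."""
    if spec.id_field is not None:
        return str(row[spec.id_field])
    return stable_id(spec.name, row[spec.prompt_field])


# Example spec for a dataset without a natural ID column (path is made up).
spec = SpecSketch(
    name="beavertails",
    path="beavertails/unsafe.jsonl.gz",
    format="jsonl.gz",
    prompt_field="prompt",
    category_field="category",
)
```

Because the ID depends only on the dataset name and the prompt text, reloading the same file yields the same IDs, which keeps results comparable across runs.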
Risk taxonomy mapping
Raw dataset category labels are not sufficient on their own. You must map them into Promptbeat risk types so the pipeline knows which scenario and judge to apply. Define a DatasetRiskMapping for each dataset you use.
For example, a mapping can route injection category records to prompt_injection and tool-use category records to tool_misuse.
Set unmappedPolicy: skip to silently drop records whose category has no rule, or unmappedPolicy: error to fail loudly if a record slips through unmapped.
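The two unmapped-record policies can be sketched in a few lines. This is an illustration of the behavior described above, not Promptbeat's implementation: `map_risk_types`, `UnmappedCategoryError`, the `RULES` dictionary, and the record shape are hypothetical names, and only the category-to-risk-type pairs and the skip/error semantics come from the text.

```python
class UnmappedCategoryError(Exception):
    """Raised when a record's category has no rule and the policy is 'error'."""


# Category-to-risk-type rules from the example in the text.
RULES = {
    "injection": "prompt_injection",
    "tool-use": "tool_misuse",
}


def map_risk_types(records, rules, unmapped_policy="error"):
    """Attach a risk_type to each record, honoring the unmapped policy."""
    if unmapped_policy not in ("skip", "error"):
        raise ValueError(f"unknown unmapped policy: {unmapped_policy!r}")
    mapped = []
    for record in records:
        risk_type = rules.get(record["category"])
        if risk_type is None:
            if unmapped_policy == "skip":
                continue  # silently drop records with no matching rule
            raise UnmappedCategoryError(
                f"no rule for category {record['category']!r}"
            )
        mapped.append({**record, "risk_type": risk_type})
    return mapped


# Made-up records: two with mapped categories, one without.
records = [
    {"id": "r1", "category": "injection"},
    {"id": "r2", "category": "tool-use"},
    {"id": "r3", "category": "other"},
]
kept = map_risk_types(records, RULES, unmapped_policy="skip")
```

With the skip policy the third record is dropped. With the error policy the same call raises, which is the safer choice while a mapping is still being developed because gaps surface immediately.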
Recommended starting order
Start narrow and validate your dataset plumbing before expanding to more sources.
1. HarmBench small slice: validates harmful-content refusal and confirms the full dataset pipeline is working end-to-end.
2. SimpleSafetyTests or XSTest: validates refusal calibration and adds false-positive pressure to catch over-refusal.
3. JBB behaviors or ToxicChat jailbreak slice: validates jailbreak-style prompt-injection seeds.
4. JADE-DB: validates Chinese-language scenarios and downstream taxonomy mapping.
5. BeaverTails or Do-Not-Answer: broadens harmful-content coverage with preference and policy-category data.
6. Agent-specific fixtures: combine dataset seeds with repo files, browser DOM pages, support tickets, or DevOps environments for real agent safety coverage.
The safety-baseline subscription in subscriptions/safety-baseline.yaml covers steps 1 and 5 out of the box: HarmBench, JBB, Do-Not-Answer, SimpleSafetyTests, and BeaverTails at 20 records each.
See the HarmBench guide for a fully worked example that includes the Python DatasetSeedLoader code, the complete seed shape, risk mapping YAML, and exact validate and eval run commands.