Skip to main content
The dataset catalog lists every dataset Promptbeat can use as a seed source, its readiness level, and how it maps to the Promptbeat risk taxonomy. Use this page to choose the right starting dataset for your evaluation, confirm the local file format you need to provide, and understand how raw category labels translate into scenario risk types before you run.

Readiness levels

Promptbeat uses a four-level readiness model. A dataset advances through levels as its plumbing is validated end-to-end.
1

Local raw file

The dataset file exists under $PROMPTBEAT_DATASETS_DIR in the expected format. This is the minimum prerequisite — nothing runs without the raw file in place.
2

Catalog spec

A DatasetSpec exists that maps the prompt, ID, and category fields. Records can be loaded into typed Seed objects but may not yet have a risk mapping.
3

Risk mapping

Source categories map into Promptbeat risk types via a DatasetRiskMapping. Seeds from this dataset can be routed to the right scenario and judge.
4

Validated slice

A saved eval result exists for at least one slice of this dataset. Use validated datasets for regressions and as evidence in reports.
In the catalog below, Ready means catalog spec and risk mapping are done and at least one slice has been validated. Partial means the catalog spec exists but some slices still need mapping work or additional filters. Planned means the dataset is on the roadmap but not yet integrated.

Dataset catalog

All datasets require you to download raw files locally. Check each dataset’s license and redistribution rules before use.
DatasetRisk categories coveredReadinessNotes
HarmBenchHarmful content, cyber, chemical/biological✅ ReadyLocal raw CSV. Catalog spec, risk mapping, and validated 5-case Codex slice.
XSTestExaggerated-safety and refusal calibration🔶 PartialCatalog spec. Covers benign prompts that trigger over-refusal.
Forbidden QuestionsHarmful content, policy compliance🔶 PartialCatalog spec. Requires taxonomy mapping before mixing into shared reports.
SimpleSafetyTestsLightweight safety smoke tests🔶 PartialCatalog spec. Small set; good for baseline sanity checks alongside larger sources.
OR-Bench (deception slice)Deception, unsafe persuasion🔶 PartialCatalog spec with deception category filter applied at load time.
ALERTBroad safety categories🔶 PartialCatalog spec. Requires instruction-wrapper cleanup during loading. JSONL format.
Aya red-teamingMultilingual red-teaming🔶 PartialCatalog spec. Covers non-English harmful-content and jailbreak scenarios. JSONL format.
Aegis (unsafe slice)Unsafe prompt classification🔶 PartialCatalog spec with unsafe-label filter. JSON format with violated_categories field.
ToxicChat (jailbreak slice)Jailbreak instruction override🔶 PartialCatalog spec with jailbreaking=true filter. CSV with conv_id as ID field.
JailbreakBench (JBB) behaviorsJailbreak behavior seeds🔶 PartialCatalog spec. CSV with Index as ID field and Category for risk routing.
JADE-DB (Chinese)Chinese harmful-content, jailbreak🔶 PartialCatalog spec with easy/medium file split. Chinese-language fields (问题, 违规类型).
BeaverTails (unsafe slice)Harmfulness preference prompts🔶 PartialCatalog spec with unsafe-label filter. JSONL.GZ format; ID generated at load time.
Do-Not-AnswerRefusal and safety policy categories🔶 PartialCatalog spec. Parquet format; requires pyarrow or pandas with parquet support.
SALAD-Bench baseMisinformation🔶 PartialCatalog spec with misinformation category filter. Parquet format.
SALAD-Bench attackAdversarial misinformation🔶 PartialCatalog spec with misinformation category filter. Adversarial attack variants. Parquet format.

Field mapping reference

DatasetFormatPrompt fieldID fieldCategory field
HarmBenchCSVBehaviorBehaviorIDSemanticCategory
XSTestCSVpromptid_v2type
Forbidden QuestionsCSVquestionq_idcontent_policy_name
SimpleSafetyTestsCSVpromptidharm_area
OR-BenchCSVprompt(generated)category
ALERTJSONLpromptidcategory
Aya red-teamingJSONLprompt(generated)harm_category
AegisJSONpromptidviolated_categories
ToxicChatCSVuser_inputconv_idjailbreaking
JBB behaviorsCSVGoalIndexCategory
JADE-DBCSV问题ID违规类型
BeaverTailsJSONL.GZprompt(generated)category
Do-Not-AnswerParquetquestionidrisk_area
SALAD-Bench baseParquetprompt(generated)categories
SALAD-Bench attackParquetprompt(generated)categories

Minimum DatasetSpec YAML

Every dataset must have an explicit DatasetSpec before it can enter an evaluation. The spec below shows the minimum required fields. Add metadata_fields to preserve extra columns for audit and reporting.
datasetSpec:
  name: harmbench
  path: datasets/raw/harmbench/harmbench_behaviors_text_all.csv
  format: csv
  prompt_field: Behavior
  id_field: BehaviorID
  category_field: SemanticCategory
  metadata_fields:
    - FunctionalCategory
    - Tags
    - ContextString
Adapt the name, path, format, and field names to match each dataset’s actual columns. For datasets without a natural ID column (marked generated in the table above), Promptbeat generates a stable hash ID from the prompt text at load time.

Risk taxonomy mapping

Raw dataset category labels are not sufficient on their own. You must map them into Promptbeat risk types so the pipeline knows which scenario and judge to apply. Define a DatasetRiskMapping for each dataset you use.
datasetRiskMapping:
  datasetId: harmbench
  taxonomySystem: harmbench
  unmappedPolicy: skip
  rules:
    - sourceCategory:
        SemanticCategory: chemical_biological
      riskType: harmful_content
      scenarioIds:
        - direct-harmful-content-refusal
    - sourceCategory:
        SemanticCategory: cyber
      riskType: harmful_content
      scenarioIds:
        - cyber-harmful-content-refusal
The same source dataset can feed multiple scenarios with different risk types. For example, JBB behaviors can route injection category records to prompt_injection and tool-use category records to tool_misuse:
datasetRiskMapping:
  datasetId: jbb_behaviors
  taxonomySystem: jailbreakbench
  rules:
    - sourceCategory:
        Category: injection
      riskType: prompt_injection
      scenarioIds:
        - coding-agent-repo-injection
        - browser-dom-injection
    - sourceCategory:
        Category: tool-use
      riskType: tool_misuse
      scenarioIds:
        - devops-unsafe-tool-use
Set unmappedPolicy: skip to silently drop records whose category has no rule, or unmappedPolicy: error to fail loudly if a record slips through unmapped. Start narrow and validate your dataset plumbing before expanding to more sources.
  1. HarmBench small slice — validates harmful-content refusal and confirms the full dataset pipeline is working end-to-end.
  2. SimpleSafetyTests or XSTest — validates refusal calibration and adds false-positive pressure to catch over-refusal.
  3. JBB behaviors or ToxicChat jailbreak slice — validates jailbreak-style prompt-injection seeds.
  4. JADE-DB — validates Chinese-language scenarios and downstream taxonomy mapping.
  5. BeaverTails or Do-Not-Answer — broadens harmful-content coverage with preference and policy-category data.
  6. Agent-specific fixtures — combine dataset seeds with repo files, browser DOM pages, support tickets, or DevOps environments for real agent safety coverage.
The safety-baseline subscription in subscriptions/safety-baseline.yaml covers steps 1 and 5 out of the box — HarmBench, JBB, Do-Not-Answer, SimpleSafetyTests, and BeaverTails at 20 records each.
See the HarmBench guide for a fully worked example that includes the Python DatasetSeedLoader code, the complete seed shape, risk mapping YAML, and exact validate and eval run commands.