Data Models - qebench

All data models are defined in src/qebench/models.py using Pydantic v2.

Entry Types

Term

The simplest entry type — a single English term and its Chinese translation.

class Term(BaseModel):
    id: str          # "term-001" — auto-generated by qebench add
    en: str          # "Bellman equation"
    zh: str          # "贝尔曼方程"
    domain: str      # "dynamic-programming"
    difficulty: Difficulty  # basic | intermediate | advanced
    alternatives: list[str] = []   # ["贝尔曼等式"]
    contexts: list[TermContext] = []  # usage sentences from lectures
    source: str = ""               # "quantecon/dp-intro"

Validation rules:

id must match pattern ^term-\d{3,}$
en and zh must be non-empty
domain must be non-empty (checked against config at runtime)
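
The rules above can also be checked on raw entries without importing the models. The following is a minimal standard-library sketch, not the project's implementation: the helper name check_term_fields is hypothetical, and the runtime check of domain against the config is left out.

```python
import re

# Pattern quoted from the validation rules above.
TERM_ID_PATTERN = re.compile(r"^term-\d{3,}$")


def check_term_fields(entry: dict) -> list[str]:
    """Return the rule violations found in a raw term entry (hypothetical helper)."""
    problems = []
    if not TERM_ID_PATTERN.match(str(entry.get("id", ""))):
        problems.append("id must match ^term-\\d{3,}$")
    # en, zh and domain only need to be non-empty strings here;
    # the config lookup for domain is intentionally not modelled.
    for field in ("en", "zh", "domain"):
        value = entry.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"{field} must be non-empty")
    return problems
```

An empty list means the entry passes these three rules; anything else names the offending field.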

TermContext

A supporting model that stores a usage sentence from a QuantEcon lecture. Populated by qebench update when it scans lecture repositories.

class TermContext(BaseModel):
    text: str    # "Dynamic programming is a powerful technique for solving..."
    source: str  # "lecture-python-intro/intro.md"

Up to 5 context sentences are stored per term. Selection is deterministic, so repeated runs produce the same version-controlled output. During qebench translate, one context is chosen at random and shown to help the translator understand how the term is used.
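
The two behaviours described here, a stable capped selection and a random pick at display time, can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and sorting by (source, text) is only one way to make the selection deterministic; the rule qebench actually applies is not documented on this page.

```python
from __future__ import annotations

import random

MAX_CONTEXTS = 5  # limit stated in the text above


def select_contexts(candidates: list[dict]) -> list[dict]:
    """Pick up to MAX_CONTEXTS contexts independently of the order they were scanned in."""
    # Assumption: de-duplicate on (source, text), then sort on the same key.
    unique = {(c["source"], c["text"]): c for c in candidates}
    return [unique[key] for key in sorted(unique)][:MAX_CONTEXTS]


def pick_context_for_display(
    contexts: list[dict], rng: random.Random | None = None
) -> dict | None:
    """Return one randomly chosen context, or None when the term has none."""
    if not contexts:
        return None
    # Passing a seeded random.Random makes the choice reproducible in tests.
    return (rng or random).choice(contexts)
```

Because the stored list does not depend on scan order, rescanning the same lectures should leave the data file unchanged in version control.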

Sentence

A complete sentence with optional human evaluation scores.

class Sentence(BaseModel):
    id: str          # "sent-042"
    en: str
    zh: str
    domain: str
    difficulty: Difficulty
    key_terms: list[str] = []      # ["term-001", "term-005"]
    human_scores: HumanScores | None = None
    source: str = ""

key_terms links sentences to the terms they contain — useful for measuring whether term-level accuracy translates to sentence-level quality.
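
One way to use this link is to build a reverse index from term ids to the sentences that mention them. The sketch below works on plain dictionaries shaped like the Sentence model; the function name sentences_by_term is hypothetical and not part of qebench.

```python
from collections import defaultdict


def sentences_by_term(sentences: list[dict]) -> dict[str, list[str]]:
    """Map each term id to the ids of the sentences that list it in key_terms."""
    index = defaultdict(list)
    for sentence in sentences:
        # key_terms defaults to an empty list in the model, so a missing key is treated the same way.
        for term_id in sentence.get("key_terms", []):
            index[term_id].append(sentence["id"])
    return dict(index)
```

With such an index, the sentence-level scores for every sentence containing a given term can be gathered and compared with that term's own accuracy.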

Paragraph

The richest entry type — paragraphs can contain math, code, directives, roles, and mixed content. Feature flags describe the structural complexity of each paragraph.

class Paragraph(BaseModel):
    id: str          # "para-007"
    en: str
    zh: str
    domain: str
    difficulty: Difficulty
    key_terms: list[str] = []
    contains_math: bool = False
    contains_code: bool = False
    contains_directives: bool = False    # has MyST directives ({note}, {warning}, etc.)
    contains_roles: bool = False         # has MyST roles ({doc}, {ref}, {math}, etc.)
    contains_mixed_fencing: bool = False # has both $$ and ```{math} markers
    human_scores: HumanScores | None = None
    source: str = ""

The three MyST feature flags (contains_directives, contains_roles, contains_mixed_fencing) are set when paragraphs are seeded from lecture repos (see scripts/seed_from_lectures.py) or contributed via qebench add. They can be used for filtering and analysis, but the formatting validators in scoring/formatting.py do not currently consume them.
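
As an illustration of the filtering use case, the sketch below selects paragraphs by flag values. It operates on plain dictionaries shaped like the Paragraph model, and filter_by_flags is a hypothetical helper rather than a qebench function.

```python
# Boolean fields taken from the Paragraph model above.
FEATURE_FLAGS = (
    "contains_math",
    "contains_code",
    "contains_directives",
    "contains_roles",
    "contains_mixed_fencing",
)


def filter_by_flags(paragraphs: list[dict], **required: bool) -> list[dict]:
    """Keep paragraphs whose flags equal the requested values; missing flags count as False."""
    unknown = set(required) - set(FEATURE_FLAGS)
    if unknown:
        raise ValueError(f"unknown feature flags: {sorted(unknown)}")
    return [
        p
        for p in paragraphs
        if all(p.get(flag, False) == wanted for flag, wanted in required.items())
    ]
```

For example, filter_by_flags(paragraphs, contains_math=True, contains_directives=False) keeps paragraphs that contain math but no MyST directives.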

Supporting Types

Difficulty

from enum import Enum

class Difficulty(str, Enum):
    basic = "basic"
    intermediate = "intermediate"
    advanced = "advanced"

HumanScores

class HumanScores(BaseModel):
    accuracy: int  # 1-10
    fluency: int   # 1-10

Both fields are constrained to the range [1, 10] via Field(ge=1, le=10).
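
A sketch of how this constraint behaves in practice. The exact spelling of the field definitions in src/qebench/models.py is an assumption here, and is_valid_scores is a hypothetical helper added only for the demonstration.

```python
from pydantic import BaseModel, Field, ValidationError


class HumanScores(BaseModel):
    # Assumed form of the constraint described above.
    accuracy: int = Field(ge=1, le=10)
    fluency: int = Field(ge=1, le=10)


def is_valid_scores(accuracy: int, fluency: int) -> bool:
    """Return True when both scores pass the range check."""
    try:
        HumanScores(accuracy=accuracy, fluency=fluency)
    except ValidationError:
        return False
    return True
```

Values of 0 or 11 raise a ValidationError at construction time, so out-of-range scores never reach a data file that was written through the model.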

JSON Schema Generation

Pydantic models auto-generate JSON Schema for CI validation:

from qebench.models import Term
schema = Term.model_json_schema()

The generated schema can be used with the jsonschema library to validate data files in CI without importing the full Python package.
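
Note that Term.model_json_schema() describes a single entry, while the data files hold arrays of entries. The sketch below wraps an item schema for array validation. The helper wrap_as_array_schema is hypothetical, and it assumes the item schema keeps its shared definitions under a top-level "$defs" key, as Pydantic v2 does for nested types such as Difficulty.

```python
def wrap_as_array_schema(item_schema: dict) -> dict:
    """Build a schema for a JSON array whose items follow item_schema (hypothetical helper)."""
    item = dict(item_schema)  # shallow copy so the caller's schema is left untouched
    # References such as "#/$defs/Difficulty" resolve from the document root,
    # so the shared definitions have to move up together with the wrapper.
    definitions = item.pop("$defs", None)
    wrapped = {"type": "array", "items": item}
    if definitions:
        wrapped["$defs"] = definitions
    return wrapped
```

In CI, the wrapped schema could be exported to a JSON file once and then passed, together with the parsed data file, to jsonschema.validate(instance=data, schema=wrapped).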

Storage Format

Data files are stored as bare JSON arrays:

[
  {
    "id": "term-001",
    "en": "inflation",
    "zh": "通货膨胀",
    "domain": "economics",
    "difficulty": "basic",
    "alternatives": ["通胀"],
    "source": ""
  }
]

The loader also supports a wrapped format (for future versioning):

{
  "version": "1.0",
  "entries": [...]
}
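
Accepting both layouts can be sketched as below. This is not the qebench loader itself: the function name load_entries is hypothetical, and the "version" field is ignored because its handling is not described here.

```python
import json


def load_entries(text: str) -> list[dict]:
    """Return the entry list from either a bare JSON array or a wrapped object."""
    data = json.loads(text)
    if isinstance(data, list):
        # Bare format: the file is the entry array itself.
        return data
    if isinstance(data, dict) and isinstance(data.get("entries"), list):
        # Wrapped format: entries sit under the "entries" key.
        return data["entries"]
    raise ValueError("expected a JSON array or an object with an 'entries' array")
```

Callers receive the same list of dictionaries in both cases, so downstream validation does not need to know which layout a file used.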