Data Models

All data models are defined in src/qebench/models.py using Pydantic v2.

Entry Types

Term

The simplest entry type — a single English term and its Chinese translation.

class Term(BaseModel):
    id: str          # "term-001" — auto-generated by qebench add
    en: str          # "Bellman equation"
    zh: str          # "贝尔曼方程"
    domain: str      # "dynamic-programming"
    difficulty: Difficulty  # basic | intermediate | advanced
    alternatives: list[str] = []   # ["贝尔曼等式"]
    contexts: list[TermContext] = []  # usage sentences from lectures
    source: str = ""               # "quantecon/dp-intro"

Validation rules:

TermContext

A supporting model that stores a usage sentence from a QuantEcon lecture. Populated by qebench update when it scans lecture repositories.

class TermContext(BaseModel):
    text: str    # "Dynamic programming is a powerful technique for solving..."
    source: str  # "lecture-python-intro/intro.md"

Up to 5 context sentences are stored per term. Selection is deterministic, so the version-controlled output stays stable from one run to the next. During qebench translate, one of the stored contexts is picked at random and shown to help the translator understand how the term is used.

Sentence

A complete sentence with optional human evaluation scores.

class Sentence(BaseModel):
    id: str          # "sent-042"
    en: str
    zh: str
    domain: str
    difficulty: Difficulty
    key_terms: list[str] = []      # ["term-001", "term-005"]
    human_scores: HumanScores | None = None
    source: str = ""

key_terms links each sentence to the IDs of the terms it contains. This is useful for measuring whether term-level accuracy carries over to sentence-level quality.
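One way to use the link is to resolve each ID in key_terms and check whether a candidate translation contains an accepted rendering of that term. The sketch below uses plain dicts. The entry "term-005", the sample sentences, and the helper key_term_hits are hypothetical, and substring matching is a deliberately crude assumption rather than the metric qebench uses.

```python
# Illustrative data only: term-001 follows the Term example, term-005 is made up.
terms = {
    "term-001": {"en": "Bellman equation", "zh": "贝尔曼方程", "alternatives": ["贝尔曼等式"]},
    "term-005": {"en": "value function", "zh": "值函数", "alternatives": ["价值函数"]},
}
sentence = {
    "id": "sent-042",
    "en": "The value function satisfies the Bellman equation.",
    "zh": "值函数满足贝尔曼方程。",
    "key_terms": ["term-001", "term-005"],
}


def key_term_hits(candidate_zh: str, sentence: dict, terms: dict) -> dict[str, bool]:
    """Map each key term ID to whether an accepted rendering appears in candidate_zh."""
    hits = {}
    for term_id in sentence["key_terms"]:
        term = terms[term_id]
        accepted = [term["zh"], *term.get("alternatives", [])]
        hits[term_id] = any(form in candidate_zh for form in accepted)
    return hits


# The first candidate uses an accepted alternative; the second drops the term.
hits = key_term_hits("值函数满足贝尔曼等式。", sentence, terms)
miss = key_term_hits("值函数满足该方程。", sentence, terms)
```

Comparing these per-term hits with the sentence's human_scores is one possible way to see whether correct terminology goes along with higher-rated sentences.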

Paragraph

The richest entry type — paragraphs can contain math, code, and mixed content.

class Paragraph(BaseModel):
    id: str          # "para-007"
    en: str
    zh: str
    domain: str
    difficulty: Difficulty
    key_terms: list[str] = []
    contains_math: bool = False
    contains_code: bool = False
    human_scores: HumanScores | None = None
    source: str = ""

Supporting Types

Difficulty

class Difficulty(str, Enum):
    basic = "basic"
    intermediate = "intermediate"
    advanced = "advanced"

HumanScores

from pydantic import BaseModel, Field

class HumanScores(BaseModel):
    accuracy: int = Field(ge=1, le=10)  # 1-10
    fluency: int = Field(ge=1, le=10)   # 1-10

Both fields are constrained to the range [1, 10] via Field(ge=1, le=10).
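A short sketch of what the range constraint means in practice, using a stand-in model that mirrors the description above rather than importing from qebench. Out-of-range scores raise a ValidationError, and each error names the offending field.

```python
from pydantic import BaseModel, Field, ValidationError


class HumanScores(BaseModel):
    # Stand-in that mirrors the documented constraints.
    accuracy: int = Field(ge=1, le=10)
    fluency: int = Field(ge=1, le=10)


ok = HumanScores(accuracy=9, fluency=7)

try:
    HumanScores(accuracy=11, fluency=0)
    rejected_fields = []
except ValidationError as exc:
    # Each error dict carries the field name as the first item of "loc".
    rejected_fields = sorted(err["loc"][0] for err in exc.errors())
```

Pydantic reports both violations in a single exception, so a reviewer entering scores sees every problem at once.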

JSON Schema Generation

Pydantic models auto-generate JSON Schema for CI validation:

from qebench.models import Term
schema = Term.model_json_schema()

The generated schema can be used with jsonschema to validate data files in CI without loading the full Python package.
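Since data files hold an array of entries, CI would likely need a schema for the whole list and not for a single Term. The sketch below builds one with Pydantic's TypeAdapter, using a reduced stand-in for Term (contexts omitted) so that it runs on its own. The stand-in and the idea of exporting the schema as text are assumptions, not part of qebench.

```python
import json
from enum import Enum

from pydantic import BaseModel, TypeAdapter


class Difficulty(str, Enum):
    basic = "basic"
    intermediate = "intermediate"
    advanced = "advanced"


class Term(BaseModel):
    # Reduced stand-in for qebench.models.Term (contexts omitted).
    id: str
    en: str
    zh: str
    domain: str
    difficulty: Difficulty
    alternatives: list[str] = []
    source: str = ""


# Schema for one entry, and for a whole data file (a JSON array of entries).
term_schema = Term.model_json_schema()
file_schema = TypeAdapter(list[Term]).json_schema()

# Serialized form that a CI job could read without importing the models.
schema_text = json.dumps(file_schema, ensure_ascii=False, indent=2)
```

Fields with defaults, such as alternatives and source, are left out of the schema's required list. With the schema saved to a file, a CI step could pass it and the parsed data file to jsonschema.validate.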

Storage Format

Data files are stored as bare JSON arrays:

[
  {
    "id": "term-001",
    "en": "inflation",
    "zh": "通货膨胀",
    "domain": "economics",
    "difficulty": "basic",
    "alternatives": ["通胀"],
    "source": ""
  }
]

The loader also supports a wrapped format (for future versioning):

{
  "version": "1.0",
  "entries": [...]
}
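A loader that accepts both layouts only needs to check the top-level JSON type. The function below is a minimal sketch under that assumption. load_entries is a hypothetical name, not the actual qebench loader, and it returns plain dicts without validating them against the models.

```python
import json


def load_entries(raw_text: str) -> list[dict]:
    """Return the entry list from either the bare or the wrapped format."""
    data = json.loads(raw_text)
    if isinstance(data, list):
        # Bare format: the file is the array of entries.
        return data
    if isinstance(data, dict) and isinstance(data.get("entries"), list):
        # Wrapped format: entries sit under the "entries" key.
        return data["entries"]
    raise ValueError("expected a JSON array or an object with an 'entries' list")


bare = '[{"id": "term-001", "en": "inflation", "zh": "通货膨胀"}]'
wrapped = '{"version": "1.0", "entries": [{"id": "term-001", "en": "inflation", "zh": "通货膨胀"}]}'

entries_from_bare = load_entries(bare)
entries_from_wrapped = load_entries(wrapped)
```

Both calls yield the same list, so downstream code does not need to know which layout a file uses.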