SNAG-Bench

The Quality Certifier. A temporal reasoning benchmark for LLMs: 60 tasks, 5 scoring axes, 3 difficulty tiers. Designed to stay hard through 2030. Measures Causal Resolution: Coverage × Convergence.

5 Scoring Axes

Axis  Name                      Source                 Status
1     GSR (Grounding)           Flash API              Live
2     TCS (Temporal Coherence)  Pro subprocess/API     Live
3     WMNED (Predictive)        Proteus markets        Stubbed
4     HTP (Human Judgment)      OpenRouter LLM judges  Live
5     GCQ (Graph Coverage)      Clockchain stats       Stubbed

Usage

SNAG-Bench is a local CLI tool — not a deployed service.
git clone https://github.com/timepoint-ai/timepoint-snag-bench.git
cd timepoint-snag-bench
pip install -e .

snag-bench run --tier 1

Task Tiers

Tier  Difficulty   Tasks
1     Standard     20
2     Hard         20
3     Adversarial  20

Causal Resolution

The composite metric: Coverage × Convergence
  • Coverage — how much of the relevant temporal space is represented
  • Convergence — how consistent the rendered outputs are across runs
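The product above can be sketched in a few lines. This is a minimal illustration, assuming both components are normalized to [0, 1]; the function and argument names are illustrative, not taken from the SNAG-Bench codebase.

```python
def causal_resolution(coverage: float, convergence: float) -> float:
    """Composite metric: Coverage × Convergence.

    coverage    -- fraction of the relevant temporal space represented
    convergence -- run-to-run consistency of the rendered outputs

    Both inputs are assumed to be normalized to [0, 1], so the
    composite also lands in [0, 1].
    """
    for name, value in (("coverage", coverage), ("convergence", convergence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return coverage * convergence

# A model that covers most of the temporal space but renders
# inconsistently across runs is penalized multiplicatively:
print(causal_resolution(0.8, 0.5))  # 0.4
```

Because the two components are multiplied rather than averaged, a weakness on either axis caps the composite: broad coverage cannot compensate for unstable outputs, and vice versa.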