# SNAG-Bench

The Quality Certifier: a temporal reasoning benchmark for LLMs. 60 adversarial tasks, 5 scoring axes, 3 difficulty tiers, designed to stay hard through 2030. Measures Causal Resolution: Coverage x Convergence.

- GitHub: timepoint-ai/timepoint-snag-bench (Apache-2.0, Python, Click CLI)
- Detailed Docs: full task reference, scoring methodology, and evaluation docs
## 5 Scoring Axes
| Axis | Name | Source | Status |
|---|---|---|---|
| 1 | GSR (Grounding) | Flash API | Live |
| 2 | TCS (Temporal Coherence) | Pro subprocess/API | Live |
| 3 | WMNED (Predictive) | Proteus markets | Stubbed |
| 4 | HTP (Human Judgment) | OpenRouter LLM judges | Live |
| 5 | GCQ (Graph Coverage) | Clockchain stats | Stubbed |
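The table above mixes live and stubbed axes. A minimal sketch of how that status could be modeled and filtered in code; the `Axis` dataclass and the aggregation are illustrative assumptions, not the benchmark's actual API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    """One scoring axis from the table above (illustrative model)."""
    key: str
    name: str
    source: str
    live: bool  # True = Live, False = Stubbed


AXES = [
    Axis("GSR", "Grounding", "Flash API", True),
    Axis("TCS", "Temporal Coherence", "Pro subprocess/API", True),
    Axis("WMNED", "Predictive", "Proteus markets", False),
    Axis("HTP", "Human Judgment", "OpenRouter LLM judges", True),
    Axis("GCQ", "Graph Coverage", "Clockchain stats", False),
]

# Only live axes contribute scores today; stubbed axes are skipped.
live_axes = [a.key for a in AXES if a.live]
print(live_axes)  # ['GSR', 'TCS', 'HTP']
```

Keeping stubbed axes in the list (rather than omitting them) makes it trivial to flip WMNED or GCQ to live later without touching the scoring loop.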
## Usage

SNAG-Bench is a local CLI tool, not a deployed service.

## Task Tiers
| Tier | Difficulty | Tasks |
|---|---|---|
| 1 | Standard | 20 |
| 2 | Hard | 20 |
| 3 | Adversarial | 20 |
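Since the Usage note above describes a local Click CLI, here is a minimal hypothetical sketch of what a tier-selecting entry point could look like. The command name, option, and task-ID format are illustrative assumptions, not SNAG-Bench's actual interface:

```python
# Hypothetical Click entry point; names and output format are
# illustrative, not the real SNAG-Bench CLI.
import click

TIERS = {1: "Standard", 2: "Hard", 3: "Adversarial"}
TASKS_PER_TIER = 20  # 3 tiers x 20 tasks = 60 tasks total


@click.command()
@click.option(
    "--tier",
    type=click.IntRange(1, 3),
    default=1,
    help="Difficulty tier: 1=Standard, 2=Hard, 3=Adversarial.",
)
def run(tier: int) -> None:
    """List the task IDs in the selected tier."""
    for i in range(TASKS_PER_TIER):
        click.echo(f"tier{tier}-task{i:02d} ({TIERS[tier]})")


if __name__ == "__main__":
    run()
```

Invoked as `python snag.py --tier 2`, a sketch like this would print the 20 Hard-tier task IDs; the real tool's commands are documented in the repository.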
## Causal Resolution
The composite metric: Coverage x Convergence.

- Coverage — how much of the relevant temporal space is represented
- Convergence — how consistent the rendered outputs are across runs
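The composite can be made concrete with a toy computation. A minimal sketch, assuming both factors are normalized to [0, 1]; the normalization and the function name are assumptions, not the benchmark's published formula:

```python
def causal_resolution(coverage: float, convergence: float) -> float:
    """Composite metric: Coverage x Convergence, both assumed in [0, 1]."""
    if not (0.0 <= coverage <= 1.0 and 0.0 <= convergence <= 1.0):
        raise ValueError("coverage and convergence must be in [0, 1]")
    return coverage * convergence


# E.g. covering 80% of the relevant temporal space with 90% consistent
# outputs across runs yields a composite of 0.72.
print(round(causal_resolution(0.8, 0.9), 2))  # 0.72
```

A multiplicative composite penalizes imbalance: a run that covers everything but renders inconsistently (1.0 x 0.2) scores no better than one that is consistent over a narrow slice (0.2 x 1.0).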