MASAR — Strategic Executive Intelligence

Riyadh EXPO 2030 — Strategic Executive Intelligence

3 alerts · Fri, 15 May 2026 08:27:00 UTC

Model quality measurement · guards against drift over time

Eval Harness

How it works

For every published entry flagged as an eval case, we re-ask the question through the full /ask pipeline (RAG + few-shot + LLM) and compare the response against the canonical answer. The composite score, on a 0-100 scale, is a weighted sum of two parts: token recall (60%) and the Jaccard similarity of the cited pages (40%). A score of 60 or higher passes.
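The scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not the harness's actual code: the function names, the lowercase word tokenizer, the use of unique tokens for recall, and the handling of empty inputs are all assumptions. Only the 60/40 weights, the 0-100 scale, and the pass threshold of 60 come from the description above.

```python
import re


def tokenize(text: str) -> set[str]:
    # Assumption: lowercase word tokens, de-duplicated into a set.
    return set(re.findall(r"\w+", text.lower()))


def token_recall(canonical: str, answer: str) -> float:
    # Fraction of canonical-answer tokens that also appear in the new answer.
    reference = tokenize(canonical)
    if not reference:
        return 1.0  # Assumption: an empty canonical answer is trivially recalled.
    return len(reference & tokenize(answer)) / len(reference)


def page_jaccard(canonical_pages: set[int], answer_pages: set[int]) -> float:
    # Jaccard similarity: |intersection| / |union| of the cited page numbers.
    if not canonical_pages and not answer_pages:
        return 1.0  # Assumption: no citations on either side counts as a match.
    return len(canonical_pages & answer_pages) / len(canonical_pages | answer_pages)


def composite_score(
    canonical: str,
    answer: str,
    canonical_pages: set[int],
    answer_pages: set[int],
) -> float:
    # Weights from the description: 60% token recall, 40% citation-page Jaccard.
    recall = token_recall(canonical, answer)
    jaccard = page_jaccard(canonical_pages, answer_pages)
    return 100.0 * (0.6 * recall + 0.4 * jaccard)


def passes(score: float, threshold: float = 60.0) -> bool:
    # A composite score of 60 or higher passes.
    return score >= threshold


score = composite_score(
    "The pavilion opens in October 2030",
    "The pavilion opens in 2030",
    {12, 13},
    {12},
)
print(round(score, 2), passes(score))
```

In this made-up example the answer recovers 5 of the 6 canonical tokens and cites 1 of the 2 canonical pages, so the composite is 100 × (0.6 × 5/6 + 0.4 × 0.5) = 70 and the case passes.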

Scope