Benchmarks
Goals
LUPA tracks these baseline metrics:
- Index N files without failures.
- Execute 10 queries and report p95 latency.
- Keep warm-index search typically under 50 ms p95 on SSD.
For Doc Chat (optional), also track:
- first-answer latency for extractive and local_model.
Benchmark Script
PowerShell helper:
./scripts/bench.ps1 -Root . -Queries @(
"error","TODO","config","database","index","search","rust","panic","fn","impl"
)
Recommended stable run:
./scripts/bench.ps1 `
-Root . `
-Release `
-Warmup `
-Runs 5 `
-Queries @("error","TODO","config","database","index","search","rust","panic","fn","impl") `
-OutJson ./.lupa/bench/latest.json
Regression Gate
./scripts/bench.ps1 -Root . -Release -Warmup -Runs 3 -MaxP95Ms 50
If overall.p95_ms exceeds the -MaxP95Ms value, the script exits with code 2.
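The same gate can be applied to a saved report outside PowerShell, for example in a CI step. The following is a minimal sketch, not the script's own logic. It assumes the file written by -OutJson is a JSON object with an overall object containing p95_ms (inferred from the field name above) and that it is UTF-8 encoded, with or without a BOM. The function names are hypothetical.

```python
import json


def gate_exit_code(report, max_p95_ms):
    """Return 2 when the report's overall p95 exceeds the budget, else 0."""
    # Assumed layout: {"overall": {"p95_ms": <number>, ...}, ...}
    p95 = report["overall"]["p95_ms"]
    # "Exceeds" is read as strictly greater than the budget.
    return 2 if p95 > max_p95_ms else 0


def check_report(path, max_p95_ms):
    """Load a saved benchmark report and apply the gate to it."""
    # utf-8-sig tolerates a leading BOM, which some PowerShell versions write.
    with open(path, encoding="utf-8-sig") as fh:
        report = json.load(fh)
    return gate_exit_code(report, max_p95_ms)
```

A wrapper script could call sys.exit(check_report("./.lupa/bench/latest.json", 50)) so the process exit code mirrors the PowerShell gate.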
What Is Measured
- index build duration.
- Per-query latency for search --json.
- Stats per run: min, max, avg, stddev, p50, p95, p99.
- Aggregate stats across runs.
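To make the per-run statistics concrete, here is a minimal sketch of how they could be computed from one run's latencies. The percentile method (nearest rank) and the population form of the standard deviation are assumptions, since this document does not state which definitions bench.ps1 uses. Nearest rank does reproduce the overall.p95_ms of 25 reported in the Local Reference Run section.

```python
import math
import statistics


def percentile_nearest_rank(sorted_values, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    if not sorted_values:
        raise ValueError("no samples")
    # Rank is 1-based: the smallest rank covering pct percent of the samples.
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms):
    """Compute min, max, avg, stddev, p50, p95 and p99 for one run."""
    ordered = sorted(latencies_ms)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "avg": statistics.fmean(ordered),
        # Population standard deviation; the script may use the sample form.
        "stddev": statistics.pstdev(ordered),
        "p50": percentile_nearest_rank(ordered, 50),
        "p95": percentile_nearest_rank(ordered, 95),
        "p99": percentile_nearest_rank(ordered, 99),
    }


if __name__ == "__main__":
    # Latencies taken from the local reference run in this document.
    run = [15, 14, 15.01, 13, 25, 15.1, 14, 14.01, 14.01, 14]
    for name, value in summarize(run).items():
        print(f"{name}: {value:.2f}")
```

With only 10 queries per run, p95 and p99 both land on the slowest sample under nearest rank, so a single outlier sets the tail numbers. Averaging over several runs (-Runs) dampens that effect.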
Desktop UX Perf Checks (manual)
When validating lupa-desktop-tauri, also check:
- Result list scroll smoothness under large hit sets.
- Selection latency (click/keyboard) without scroll jump.
- Right panel preview load behavior (image preview on demand).
- Search-to-render perceived delay with snippets enabled.
QA Latency Checks (Doc Chat)
Use a fixed document and prompts in both modes:
- extractive: target sub-second warm response.
- local_model (small GGUF): expect higher latency; track p50/p95.
Example prompts:
- Summarize this document in 3 bullets.
- How many times does the word "casa" appear?
- When was this file modified?
Track:
- startup-to-first-answer time
- warm-answer time
- answer consistency
- repetition rate
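The tracked values above could be collected with a small harness like the sketch below. Everything here is an assumption for illustration: ask is a hypothetical callable that sends one prompt to Doc Chat and returns the answer text, and repetition rate is defined as the share of repeated word trigrams, which is one plausible reading rather than a definition from this project. The timer wraps only the call itself, so startup cost is captured only if ask launches the backend on first use.

```python
import time


def time_answers(ask, prompts):
    """Time each prompt; the first call is treated as the cold sample."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    timings_ms = []
    for prompt in prompts:
        start = time.perf_counter()
        ask(prompt)  # hypothetical: sends the prompt, returns the answer
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {"first_answer_ms": timings_ms[0], "warm_ms": timings_ms[1:]}


def repetition_rate(answer, n=3):
    """Fraction of word n-grams in an answer that duplicate another n-gram."""
    words = answer.lower().split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    return 1.0 - len(set(grams)) / len(grams)
```

Running the same prompts several times and comparing the answers gives a rough consistency check. A rising repetition rate in local_model mode can indicate the model is looping.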
Notes
- Always benchmark with a warm index for realistic daily usage.
- Use -Release for production-grade numbers.
- For large datasets (100k+ files), tune:
  - include_extensions
  - max_file_size_bytes
  - max_structured_file_size_bytes
- For local model mode, keep max_tokens moderate (120-256) to reduce latency spikes.
Local Reference Run (2026-03-05)
Environment: Windows, repository root as benchmark dataset.
Results:
- index_build_ms: 338.78
- query_count: 10
- overall.p95_ms: 25
- latencies_ms: [15, 14, 15.01, 13, 25, 15.1, 14, 14.01, 14.01, 14]