DIGITAL TWIN CALIBRATION · BITTENSOR SUBNET 456

The calibration layer
for digital twins.

A decentralized marketplace where specialized optimizers compete to make simulation models faithful to reality. Verified on held-out data. Scored with ASHRAE metrics. Truth wins.

ROUND_r-2026-04-18-1142/bestest_hydronic_heat_pump
[Chart: ZONE_AIR_TEMP · 24H (00:00 to 24:00), 16°C to 24°C · emulator (ground truth) vs. miner predictions (N=3 shown)]
CVRMSE: 0.119
NMBE: -0.019
R²: 0.901
SIMS_USED: 287 / 1000
BITTENSOR SUBNET: 456 (testnet live)
BEST CVRMSE: 0.089 (bestest_hydronic · placeholder)
ACTIVE MINERS: 18 (placeholder)
TEST CASES IN ROTATION: 3 (BOPTEST · verified)
ROUNDS COMPLETED: 2,147 (placeholder)
TESTS PASSING: 131 (from repo)
01 · ARCHITECTURE

A two-model system. One produces truth, the other learns to match it.

Validators run a complex emulator (BOPTEST) to generate ground truth. Miners calibrate a simplified model to match it. Scoring is done on data miners never see.

VALIDATOR · ground truth
BOPTEST emulator
complex physics · Docker · FMU
TRAINING MEASUREMENTS · sent to miners
HELD-OUT MEASUREMENTS · kept secret · scoring only
CalibrationSynapse · calibrated_params
MINER × N · calibration target
RC network (simplified) · R₁ · R₂ · C₁
~5 parameters · sub-second run
LOCAL OPTIMIZER · Bayesian · CMA-ES · surrogate
SIMULATION_BUDGET · 1,000 evaluations / round
01

Validator generates ground truth

A containerized BOPTEST emulator runs the full period. Training measurements go to miners; held-out measurements are kept secret for scoring.

02

Miners calibrate locally

Each miner runs their own optimizer - Bayesian, CMA-ES, surrogate-assisted - against a simplified RC network model. Parameter search, not prediction.
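To make "parameter search, not prediction" concrete, here is a minimal sketch of a budgeted calibration loop. Everything in it is an assumption for illustration: the single-resistance, single-capacitance (1R1C) model, the parameter bounds, the synthetic measurements, and the use of plain random search in place of the Bayesian, CMA-ES, or surrogate-assisted optimizers a real miner would likely run. The names `simulate_1r1c` and `calibrate` are hypothetical and not taken from the subnet's code.

```python
import math
import random

def simulate_1r1c(r, c, t_out, q_heat, t0, dt=900.0):
    """Forward-Euler run of a hypothetical 1R1C zone model.

    r: thermal resistance [K/W], c: thermal capacitance [J/K],
    t_out: outdoor temperature per step [°C], q_heat: heat input per step [W].
    """
    temps, t = [], t0
    for to, q in zip(t_out, q_heat):
        # Heat balance: dT/dt = (T_out - T) / (R*C) + Q / C
        t += dt * ((to - t) / (r * c) + q / c)
        temps.append(t)
    return temps

def rmse(measured, predicted):
    """Root-mean-square error between two equally long series."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))

def calibrate(measured, t_out, q_heat, t0, budget=1000, seed=0):
    """Random search over (R, C); each candidate costs one simulation."""
    rng = random.Random(seed)
    best_params, best_err, sims_used = None, float("inf"), 0
    for _ in range(budget):
        r = rng.uniform(0.001, 0.02)      # assumed search bounds [K/W]
        c = 10 ** rng.uniform(6.3, 7.7)   # log-uniform, roughly 2e6 to 5e7 J/K
        err = rmse(measured, simulate_1r1c(r, c, t_out, q_heat, t0))
        sims_used += 1
        if err < best_err:
            best_params, best_err = (r, c), err
    return best_params, best_err, sims_used

# Synthetic "training measurements": 24 h at 15-minute steps from known parameters.
steps = 96
t_out = [5.0 + 5.0 * math.sin(2 * math.pi * i / steps) for i in range(steps)]
q_heat = [2000.0] * steps
measured = simulate_1r1c(0.005, 1.0e7, t_out, q_heat, t0=20.0)

params, err, sims_used = calibrate(measured, t_out, q_heat, t0=20.0)
print(f"R={params[0]:.4f} K/W, C={params[1]:.3e} J/K, RMSE={err:.3f} °C, sims={sims_used}")
```

The budget loop is the part that carries over: a smarter optimizer spends the same 1,000 evaluations more carefully, or stops early and reports fewer simulations used.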

03

Validator verifies

One simulation per miner against the held-out period. CVRMSE, NMBE and R² are deterministic - any validator reproduces the same score.
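The three metrics can be sketched in a few lines. This is an illustrative version, not the subnet's scoring module: it divides by n rather than the n − p degrees-of-freedom term used in ASHRAE Guideline 14, and it takes NMBE as measured minus predicted, so a positive value means under-prediction. Both choices are assumptions; the deployed code may differ.

```python
import math

def cvrmse(measured, predicted):
    """Coefficient of variation of the RMSE: RMSE divided by the measured mean."""
    n = len(measured)
    mean = sum(measured) / n
    rmse = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    return rmse / mean

def nmbe(measured, predicted):
    """Normalized mean bias error (measured minus predicted, assumed sign convention)."""
    n = len(measured)
    mean = sum(measured) / n
    return sum(m - p for m, p in zip(measured, predicted)) / (n * mean)

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Illustrative held-out zone temperatures [°C] and one miner's simulated values.
held_out = [20.0, 21.0, 22.0, 21.0]
simulated = [20.5, 21.0, 21.5, 21.0]
print(cvrmse(held_out, simulated), nmbe(held_out, simulated), r_squared(held_out, simulated))
```

Because each function is a pure arithmetic reduction over the same two series, any validator running it on the same inputs gets the same numbers, which is the reproducibility property described above.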

04

Weights set on-chain

5% relative floor, power-law amplification (p=2), normalized to sum=1. Top performers receive disproportionately more weight.

02 · MECHANISM

Three properties make this uniquely suited to Bittensor.

I · ASYMMETRY
SOLVE · hours · ~3h
VERIFY · seconds · 0.8s

Verification is trivially cheap.

Running a calibrated model against held-out measurements takes seconds. Finding those parameters takes hours of search across thousands of evaluations.

cheap verify · expensive solve
II · DIVERSITY
[Chart: best_cvrmse]

A portfolio of strategies wins.

Bayesian, CMA-ES, MCMC, gradient-based, surrogate-assisted - no single method dominates every building type. Fifty competing approaches consistently beat any one of them.

ensemble > monoculture
III · AI-PROOF
PATTERN MATCH: LLM → ✗
PHYSICS SEARCH: for i in 1..1000: sim.run(p[i]) → f_loss(Δ zone_temp, Δ energy)

LLMs cannot do this.

Calibration requires running actual physics simulations thousands of times. It is pure computational search, not pattern recognition. Frontier labs have zero interest in building physics.

compression-resistant
03 · SCORING

The formula is published. The weights are auditable.

No proxy metrics. No hidden parameters. Every miner receives a machine-readable breakdown every round. You can run the exact scoring module locally before submitting.

COMPOSITE SCORE · spec v2
composite = 0.50 × cvrmse_norm
          + 0.25 × nmbe_norm
          + 0.15 × r2_norm
          + 0.10 × convergence_norm

# weight pipeline (applied per round)
floor     = max(composite) × 0.05
floored   = composite if composite ≥ floor else 0.0
powered   = floored ** 2          # power-law p=2
weights   = powered / sum(powered)
float64 · deterministic · open source
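The formula and weight pipeline above translate almost line for line into Python. The sketch below assumes each `*_norm` input is already scaled to [0, 1] with higher meaning better; how the raw metrics are normalized is not stated here, so that step is left out. The function names are illustrative.

```python
def composite_score(cvrmse_norm, nmbe_norm, r2_norm, convergence_norm):
    """Weighted sum from the published formula (inputs assumed in [0, 1])."""
    return (0.50 * cvrmse_norm
            + 0.25 * nmbe_norm
            + 0.15 * r2_norm
            + 0.10 * convergence_norm)

def weights_from_composites(composites, floor_frac=0.05, power=2):
    """Apply the relative floor, power-law amplification, and normalization."""
    if not composites:
        return []
    floor = max(composites) * floor_frac
    # Scores below 5% of the round's best are zeroed out.
    floored = [c if c >= floor else 0.0 for c in composites]
    powered = [c ** power for c in floored]
    total = sum(powered)
    if total == 0.0:
        # Nobody scored above zero, so there is nothing to distribute.
        return [0.0] * len(composites)
    return [p / total for p in powered]

# Four hypothetical miners; the last one falls below the floor (0.9 * 0.05 = 0.045).
round_scores = [0.9, 0.6, 0.3, 0.03]
round_weights = weights_from_composites(round_scores)
print([round(w, 4) for w in round_weights])
```

With p=2, the miner scoring 0.9 ends up with more than twice the weight of the miner scoring 0.6, even though its composite is only 1.5 times higher. That is the "disproportionately more weight" effect of the power law.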
50% · CVRMSE · lower is better

Coefficient of Variation of RMSE. ASHRAE Guideline 14 standard.

threshold < 0.30
25% · NMBE · → 0

Normalized Mean Bias Error. Systematic over- or under-prediction.

threshold |NMBE| < 0.10
15% · R² · higher is better

Coefficient of determination. Overall fit quality.

threshold > 0.85
10% · convergence · fewer sims is better

Simulations used relative to budget. Self-reported, bounded.

threshold
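The three stated accuracy thresholds are easy to check locally before submitting. The helper below is a hypothetical convenience function, not part of the published scoring module, and it only reports pass or fail per metric. It leaves convergence out because no threshold value is given for it here.

```python
def check_thresholds(cvrmse, nmbe, r2):
    """Return a pass/fail flag per metric, using the thresholds listed above."""
    return {
        "cvrmse": cvrmse < 0.30,    # lower is better
        "nmbe": abs(nmbe) < 0.10,   # closer to zero is better
        "r2": r2 > 0.85,            # higher is better
    }

# Illustrative metric values, not real round results.
good = check_thresholds(cvrmse=0.119, nmbe=-0.019, r2=0.901)
weak = check_thresholds(cvrmse=0.248, nmbe=0.101, r2=0.819)
print(good)
print(weak)
```

The second case passes on CVRMSE but misses both the bias and the fit thresholds, which shows why a single headline metric is not enough to judge a calibration.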
04 · AUDIENCES

Three groups benefit immediately.

CUSTOMERS

Engineering firms & building owners

You need calibrated models for ASHRAE Guideline 14 M&V, Model Predictive Control, or portfolio-scale analysis. Submit your BOPTEST-compatible model and measurement data, and receive verified parameters with a full accuracy breakdown.

  • Per-calibration pricing (API, waitlist open)
  • CVRMSE, NMBE, R² on held-out data
  • Same methodology engineering consultancies charge $5K–$20K for
Join calibration waitlist
INVESTORS

TAO holders & subnet evaluators

Zhen targets a green-zone flywheel: emissions today, paying customers tomorrow. External revenue routes through value sinks - paid API access, licensing, alpha burn. Miner transparency is first-class.

  • Subnet 456 · spec v2 · testnet live
  • Phase 1 benchmark dominance → Phase 2 API → Phase 3 enterprise
  • Published scoring, score floor, power-law weight amp
Read the mechanism design
RESEARCHERS

Building energy & digital twin scientists

Zhen is a public benchmark. Every round publishes composite scores, per-component scores, and simulation-used counts. Methodology is open source. Test cases are BOPTEST and Energym; adding new ones follows a proposal process.

  • BESTEST reference buildings, Brussels climate
  • RC network simplified models, FMU ground truth
  • Deterministic train/test split via SHA-256
See test case registry
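One way a deterministic train/test split via SHA-256 could work is sketched below. The details are assumptions, since the actual scheme is not described here: the hash input (round ID plus test case name), the contiguous train-then-test layout, and the function name `split_windows` are all hypothetical.

```python
import hashlib

def split_windows(round_id, test_case, total_hours, train_hours, test_hours):
    """Derive train and held-out windows (in hours) from a SHA-256 digest.

    Returns two half-open (start, end) tuples: the training window and the
    held-out window that immediately follows it.
    """
    span = train_hours + test_hours
    if span > total_hours:
        raise ValueError("train + test windows exceed the available period")
    digest = hashlib.sha256(f"{round_id}:{test_case}".encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer to choose the offset.
    offset = int.from_bytes(digest[:8], "big") % (total_hours - span + 1)
    train = (offset, offset + train_hours)
    test = (offset + train_hours, offset + span)
    return train, test

# One year of hourly data, two weeks of training, one week held out (assumed sizes).
train_win, test_win = split_windows(
    "r-2026-04-18-1142", "bestest_hydronic_heat_pump",
    total_hours=8760, train_hours=336, test_hours=168,
)
print(train_win, test_win)
```

Because the windows depend only on the hash inputs, any validator can recompute the same split without coordinating with the others.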
05 · LEADERBOARD

Public rankings, per round.

Composite scores are public. Calibrated parameters are private to the submitting miner, so top performers can't be copy-pasted.

r-2026-04-18-1142·bestest_hydronic_heat_pump
placeholder data
#   HOTKEY      CVRMSE  NMBE    R²     SIMS  WEIGHT
01  5Cxyz…a8f2  0.089   -0.019  0.962  287   28.4%
02  5Fdra…Q4Nk  0.104   +0.031  0.951  412   21.1%
03  5GmpV…7yLx  0.117   -0.041  0.938  356   16.8%
04  5HnwR…kP1z  0.142   +0.052  0.914  801   9.1%
05  5DpKe…m3Xq  0.168   -0.072  0.887  644   7.2%
06  5EyTv…b2Hc  0.194   +0.089  0.863  923   5.1%
07  5JrMb…u8Wq  0.221   -0.094  0.841  767   4.1%
08  5KzLp…x4Vn  0.248   +0.101  0.819  888   3.4%
score floor applied: 0.05 × max · power-law p=2
Full leaderboard →
06 · FAQ

Questions people actually ask.

Calibration. Every digital twin needs parameters tuned to match reality: wall R-values, infiltration rates, HVAC efficiencies. Engineering consultancies charge $5K–$20K for this and take weeks. Zhen runs 50 optimizers in parallel per round and returns verified parameters in hours.
07 · GET IN

Bring us a model.
We'll calibrate it.

Priority access for engineering firms, ESCOs, and research teams working on building energy. BOPTEST-compatible models preferred. Joining the waitlist does not commit you to anything.