DIGITAL TWIN CALIBRATION · BITTENSOR SUBNET 456

The calibration layer
for digital twins.

A decentralized marketplace where specialized optimizers compete to make simulation models faithful to reality. Verified on held-out data. Scored with ASHRAE metrics. Truth wins.

Request a calibration → · Read the protocol

ROUND_r-2026-04-18-1142/bestest_hydronic_heat_pump

Legend: emulator (ground truth) · miner predictions (N=3 shown)

CVRMSE 0.119 · NMBE -0.019 · R² 0.901 · SIMS_USED 287 / 1000

BITTENSOR SUBNET: 456 (testnet live)
BEST CVRMSE: 0.089 (bestest_hydronic · placeholder)
ACTIVE MINERS: placeholder
TEST CASES IN ROTATION: BOPTEST · verified
ROUNDS COMPLETED: 2,147 (placeholder)
TESTS PASSING: 131 (from repo)

01 · ARCHITECTURE

A two-model system. One produces truth, the other learns to match it.

Validators run a complex emulator (BOPTEST) to generate ground truth. Miners calibrate a simplified model to match it. Scoring is done on data miners never see.

VALIDATOR · ground truth

BOPTEST emulator

complex physics · Docker · FMU

TRAINING MEASUREMENTS · sent to miners

HELD-OUT MEASUREMENTS · kept secret · scoring only

MINER × N · calibration target

RC network (simplified)

~5 parameters · sub-second run

LOCAL OPTIMIZER · Bayesian · CMA-ES · surrogate

SIMULATION_BUDGET · 1,000 evaluations / round

Validator generates ground truth

A containerized BOPTEST emulator runs the full period. Training measurements go to miners; held-out measurements are kept secret for scoring.

Miners calibrate locally

Each miner runs their own optimizer - Bayesian, CMA-ES, surrogate-assisted - against a simplified RC network model. Parameter search, not prediction.
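As a rough illustration of "parameter search, not prediction", the sketch below runs a seeded random search over a toy one-resistor, one-capacitor (1R1C) zone model under a fixed simulation budget. Everything in it is an assumption made for illustration: the 1R1C model, the function names `simulate_1r1c` and `random_search`, the parameter bounds, and the choice of random search itself. The subnet's actual RC network and miner optimizers are not described here.

```python
import math
import random


def simulate_1r1c(r, c, t_out, q_heat, t0, dt=3600.0):
    """Toy zone model, explicit Euler: C * dT/dt = (T_out - T) / R + Q.

    r in K/W, c in J/K, dt in seconds. Hypothetical stand-in for the
    simplified RC network that miners calibrate.
    """
    temps, t = [], t0
    for to, q in zip(t_out, q_heat):
        t = t + dt * ((to - t) / r + q) / c
        temps.append(t)
    return temps


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def random_search(measured, t_out, q_heat, t0, bounds, budget=1000, seed=0):
    """Sample (r, c) uniformly within bounds; keep the best fit.

    Every model run counts against the budget, mirroring the idea of a
    per-round simulation budget.
    """
    rng = random.Random(seed)
    best_params, best_err, sims_used = None, float("inf"), 0
    while sims_used < budget:
        r = rng.uniform(*bounds["r"])
        c = rng.uniform(*bounds["c"])
        err = rmse(simulate_1r1c(r, c, t_out, q_heat, t0), measured)
        sims_used += 1
        if err < best_err:
            best_params, best_err = (r, c), err
    return best_params, best_err, sims_used


# Synthetic "training measurements" generated from known parameters.
hours = 48
t_out = [5.0 + 5.0 * math.sin(2 * math.pi * k / 24) for k in range(hours)]
q_heat = [2000.0] * hours
measured = simulate_1r1c(0.005, 2e7, t_out, q_heat, t0=20.0)

bounds = {"r": (0.002, 0.02), "c": (5e6, 5e7)}
params, err, used = random_search(measured, t_out, q_heat, 20.0, bounds, budget=200)
print(f"best (r, c) = {params}, training RMSE = {err:.3f}, sims used = {used}")
```

A Bayesian or CMA-ES optimizer would replace the sampling step inside `random_search`; the budget accounting would stay the same.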

Validator verifies

One simulation per miner against the held-out period. CVRMSE, NMBE and R² are deterministic - any validator reproduces the same score.
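The three metrics can be sketched in a few lines of plain Python. This is a minimal version under stated assumptions, not the subnet's scoring module: it divides by n (some ASHRAE Guideline 14 formulations use n minus a degrees-of-freedom adjustment), and it takes the NMBE sign as predicted minus measured, so a positive value means over-prediction. The function names are illustrative.

```python
import math


def cvrmse(measured, predicted):
    """RMSE of the residuals divided by the mean of the measurements."""
    n = len(measured)
    mean = sum(measured) / n
    mse = sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n
    return math.sqrt(mse) / mean


def nmbe(measured, predicted):
    """Mean residual divided by the mean measurement.

    Assumed sign convention: predicted - measured (positive = over-prediction).
    """
    n = len(measured)
    mean = sum(measured) / n
    return sum(p - m for m, p in zip(measured, predicted)) / (n * mean)


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


held_out = [10.0, 20.0, 30.0]
model = [11.0, 19.0, 33.0]
print(cvrmse(held_out, model), nmbe(held_out, model), r_squared(held_out, model))
```

For the three-point example, the residuals are 1, -1 and 3, which gives an NMBE of 0.05, a CVRMSE of about 0.096, and an R² of 0.945.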

Weights set on-chain

5% relative floor, power-law amplification (p=2), normalized to sum=1. Top performers receive disproportionately more weight.
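The three steps named above (relative floor, power-law amplification, normalization) can be sketched as a single function. The name `round_weights` and the handling of the empty and all-zero cases are assumptions; composite scores are taken to be non-negative.

```python
def round_weights(composites, floor_frac=0.05, power=2.0):
    """Map per-miner composite scores to weights that sum to 1.

    1. Zero out any score below floor_frac * max(score).
    2. Raise the survivors to `power` (p=2) to amplify the leaders.
    3. Normalize so the weights sum to 1.
    """
    if not composites:
        return []
    floor = max(composites) * floor_frac
    floored = [c if c >= floor else 0.0 for c in composites]
    powered = [c ** power for c in floored]
    total = sum(powered)
    if total == 0.0:
        # Assumed behavior when every score is zero: no weight assigned.
        return [0.0] * len(composites)
    return [p / total for p in powered]


print(round_weights([0.8, 0.4, 0.03]))
```

With scores of 0.8, 0.4 and 0.03, the floor is 0.04, so the third miner drops to zero. Squaring then turns a 2:1 score ratio into a 4:1 weight ratio, giving roughly 0.8 and 0.2.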

02 · MECHANISM

Three properties make this uniquely suited to Bittensor.

I · ASYMMETRY

Verification is trivially cheap.

Running a calibrated model against held-out measurements takes seconds. Finding those parameters takes hours of search across thousands of evaluations.

cheap verify · expensive solve

II · DIVERSITY

A portfolio of strategies wins.

Bayesian, CMA-ES, MCMC, gradient-based, surrogate-assisted - no single method dominates every building type. Fifty competing approaches consistently beat any one of them.

ensemble > monoculture

III · AI-PROOF

LLMs cannot do this.

Calibration requires running actual physics simulations thousands of times. It is pure computational search, not pattern recognition. Frontier labs have zero interest in building physics.

compression-resistant

03 · SCORING

The formula is published. The weights are auditable.

No proxy metrics. No hidden parameters. Every miner receives a machine-readable breakdown every round. You can run the exact scoring module locally before submitting.

COMPOSITE SCORE · spec v2

composite = 0.50 × cvrmse_norm
          + 0.25 × nmbe_norm
          + 0.15 × r2_norm
          + 0.10 × convergence_norm

# weight pipeline (applied per round)
floor     = max(composite) × 0.05
floored   = composite if composite ≥ floor else 0.0
powered   = floored ** 2          # power-law p=2
weights   = powered / sum(powered)

float64 · deterministic · open source
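A runnable version of the composite line, for checking a breakdown by hand. The page does not say how each raw metric is normalized, so this sketch takes already-normalized components as input and assumes each lies in [0, 1] with higher meaning better. `COMPONENT_WEIGHTS` and `composite_score` are illustrative names.

```python
# Component weights as published in the formula above.
COMPONENT_WEIGHTS = {
    "cvrmse_norm": 0.50,
    "nmbe_norm": 0.25,
    "r2_norm": 0.15,
    "convergence_norm": 0.10,
}


def composite_score(components):
    """Weighted sum of normalized components.

    `components` maps each key in COMPONENT_WEIGHTS to a value in [0, 1].
    The [0, 1] range is an assumption, not something the spec text states.
    """
    if set(components) != set(COMPONENT_WEIGHTS):
        raise ValueError("expected exactly the four normalized components")
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(COMPONENT_WEIGHTS[k] * components[k] for k in COMPONENT_WEIGHTS)


example = {
    "cvrmse_norm": 0.8,
    "nmbe_norm": 0.9,
    "r2_norm": 0.7,
    "convergence_norm": 0.5,
}
print(composite_score(example))
```

For the example values the terms are 0.400, 0.225, 0.105 and 0.050, for a composite of about 0.78.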

50%

CVRMSE · lower

Coefficient of Variation of RMSE. ASHRAE Guideline 14 standard.

threshold < 0.30

25%

NMBE · → 0

Normalized Mean Bias Error. Systematic over- or under-prediction.

threshold |NMBE| < 0.10

15%

R² · higher

Coefficient of determination. Overall fit quality.

threshold > 0.85

10%

convergence · fewer sims

Simulations used relative to budget. Self-reported, bounded.

threshold —
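The three published thresholds are easy to check for any set of metrics. The sketch below only tests the thresholds listed above; how a pass or fail feeds into the normalized components is not stated here, so it is not modeled. `check_thresholds` is an illustrative name, and the two input rows are taken from the placeholder leaderboard on this page.

```python
def check_thresholds(cvrmse, nmbe, r2):
    """Return a pass/fail flag per metric, using the thresholds listed above."""
    return {
        "cvrmse": cvrmse < 0.30,   # CVRMSE threshold < 0.30
        "nmbe": abs(nmbe) < 0.10,  # |NMBE| threshold < 0.10
        "r2": r2 > 0.85,           # R² threshold > 0.85
    }


# First and last rows of the placeholder leaderboard.
top = check_thresholds(0.089, -0.019, 0.962)
last = check_thresholds(0.248, 0.101, 0.819)
print(top)
print(last)
```

The first row passes all three checks. The last row passes on CVRMSE but misses both the NMBE and R² thresholds.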

04 · AUDIENCES

Three groups benefit immediately.

CUSTOMERS

Engineering firms & building owners

You need calibrated models for ASHRAE Guideline 14 M&V, Model Predictive Control, or portfolio-scale analysis. Submit your BOPTEST-compatible model and measurement data, and receive verified parameters with a full accuracy breakdown.

Per-calibration pricing (API, waitlist open)
CVRMSE, NMBE, R² on held-out data
Same methodology engineering consultancies charge $5K–$20K for

Join calibration waitlist →

INVESTORS

TAO holders & subnet evaluators

Zhen targets a green-zone flywheel: emissions today, paying customers tomorrow. External revenue routes through value sinks - paid API access, licensing, alpha burn. Miner transparency is first-class.

Subnet 456 · spec v2 · testnet live
Phase 1 benchmark dominance → Phase 2 API → Phase 3 enterprise
Published scoring, score floor, power-law weight amp

Read the mechanism design →

RESEARCHERS

Building energy & digital twin scientists

Zhen is a public benchmark. Every round publishes composite scores, per-component scores, and simulations-used counts. The methodology is open source. Test cases come from BOPTEST and Energym; adding new ones follows a proposal process.

BESTEST reference buildings, Brussels climate
RC network simplified models, FMU ground truth
Deterministic train/test split via SHA-256
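One way a SHA-256-based split can be made deterministic is to derive the position of a contiguous held-out window from a hash of the round identifier. The sketch below is hypothetical: the page does not say what is hashed or how the digest maps to a split, and `held_out_window` and `split_indices` are invented names. The point is only that any validator hashing the same input recovers the same split.

```python
import hashlib


def held_out_window(round_id, n_steps, holdout_len):
    """Pick a contiguous held-out window [start, stop) from a SHA-256 digest.

    Hypothetical scheme: the first 8 bytes of the digest, read as a
    big-endian integer, choose the start index.
    """
    if not 0 < holdout_len < n_steps:
        raise ValueError("holdout_len must be between 1 and n_steps - 1")
    digest = hashlib.sha256(round_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % (n_steps - holdout_len + 1)
    return start, start + holdout_len


def split_indices(round_id, n_steps, holdout_len):
    """Return (train_indices, test_indices) for one round."""
    start, stop = held_out_window(round_id, n_steps, holdout_len)
    test = list(range(start, stop))
    train = [i for i in range(n_steps) if i < start or i >= stop]
    return train, test


train, test = split_indices("r-2026-04-18-1142", n_steps=336, holdout_len=72)
print(len(train), len(test), test[0], test[-1])
```

Because the window depends only on the round identifier and the series length, no random seed needs to be shared between validators.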

See test case registry →

05 · LEADERBOARD

Public rankings, per round.

Composite scores are public. Calibrated parameters are private to the submitting miner, so top performers can't be copy-pasted.

r-2026-04-18-1142 · bestest_hydronic_heat_pump

placeholder data

HOTKEY       CVRMSE   NMBE     R²      SIMS   WEIGHT
5Cxyz…a8f2   0.089    -0.019   0.962   287    28.4%
5Fdra…Q4Nk   0.104    +0.031   0.951   412    21.1%
5GmpV…7yLx   0.117    -0.041   0.938   356    16.8%
5HnwR…kP1z   0.142    +0.052   0.914   801     9.1%
5DpKe…m3Xq   0.168    -0.072   0.887   644     7.2%
5EyTv…b2Hc   0.194    +0.089   0.863   923     5.1%
5JrMb…u8Wq   0.221    -0.094   0.841   767     4.1%
5KzLp…x4Vn   0.248    +0.101   0.819   888     3.4%

score floor applied: 0.05 × max · power-law p=2

Full leaderboard →

06 · FAQ

Questions people actually ask.

01 · What problem does Zhen actually solve?

Calibration. Every digital twin needs parameters tuned to match reality: wall R-values, infiltration rates, HVAC efficiencies. Engineering consultancies charge $5K–$20K for this and take weeks. Zhen runs 50 optimizers in parallel per round and returns verified parameters in hours.

02 · How do you prevent miners from gaming scores?

03 · Is the scoring formula reproducible?

04 · Why isn't this just an LLM task?

05 · When will paying customers arrive?

06 · What are the current test cases?

07 · GET IN

Bring us a model.
We'll calibrate it.

Priority access for engineering firms, ESCOs, and research teams working on building energy. BOPTEST-compatible models preferred. Joining the waitlist does not commit you to anything.

or read the protocol on GitHub →
