SPECTRA

Generalizability audits for foundation models

SPECTRA turns model generalization claims into validated spectral performance curves.

This hosted knowledge server exposes the SPECTRA protocol, prior audit findings, and artifact references for agents and humans. Agents connect at https://spectra.yashaektefaie.com/mcp.

3 models
3 findings
62 artifact refs
[Figure: performance versus similarity, from high similarity to low similarity. Labels: freeze, prospective axis, audit, degradation, confirm, boundary.]

Protocol

What SPECTRA Requires

01 Define the scientific unit

Choose the model, data, task, metric, and prospective novelty axis before target scoring.

02 Freeze and validate the split

Verify that train-test or pretraining-test similarity decreases across levels.

03 Score baselines and model

Run fixed baselines where labels exist, then evaluate the target model on frozen levels.

04 Confirm live hypotheses

Outcome-mined axes are exploratory until frozen and confirmed on fresh or adequate evidence.

05 Ledger weak and negative axes

Non-explanatory curves are findings that route back into prospective-axis discovery.

06 State the claim boundary

A valid spectral performance curve (SPC) closes only the deployment or mechanism boundary it actually tested.
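
The split validation in step 02 can be sketched as a simple monotonicity check. This is a minimal illustration, assuming the mean train-test (or pretraining-test) similarity for each level has already been computed. The function name `similarity_decreases` and the example values are hypothetical and are not part of SPECTRA.

```python
from typing import Sequence


def similarity_decreases(level_similarity: Sequence[float], strict: bool = True) -> bool:
    """Return True if similarity falls from each level to the next.

    level_similarity[i] is the mean similarity at level i + 1;
    higher numbered levels are expected to be less similar.
    """
    pairs = list(zip(level_similarity, level_similarity[1:]))
    if strict:
        return all(a > b for a, b in pairs)
    return all(a >= b for a, b in pairs)


# Hypothetical values for illustration only.
valid_split = [0.91, 0.84, 0.77, 0.63, 0.52]
broken_split = [0.91, 0.84, 0.86, 0.63, 0.52]

print(similarity_decreases(valid_split))   # True
print(similarity_decreases(broken_split))  # False
```

A split that fails this check would need to be rebuilt before any target scoring, since the levels would no longer order the test data by novelty.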

CONCH: TUM tissue/stain-fraction proximity tail
ESMFold2: high disulfide/cysteine capacity
STATE: pathway/module train-support density

Current Evidence

Stored SPECTRA Findings

These summaries are human-readable views of the same structured records exposed through the MCP tools.

CONCH

conch_tum_tissue_stain_tail_20260603

valid localized public CRC TUM tail boundary / weak independent-site generalization

CONCH zero-shot CRC ROI classification shows a claim-valid but localized degradation boundary: true TUM recall falls in the low-similarity tail of a prospective tissue/stain-fraction axis. The result is bounded to public CRC ROI tiles and the CONCH CRC prompt ensemble. It is not an all-class, whole-slide, independent-site, non-CRC, or medical deployment claim.

Primary Axis: TUM tissue/stain-fraction proximity tail

Euclidean distance in tissue, white/background, dark, and high-saturation pixel fractions from the class-balanced NCT-CRC-HE-100K reference median. Higher numbered levels mean lower similarity; split membership is frozen before CONCH scoring.
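
The distance described above can be sketched as follows. This is a minimal illustration, assuming the four pixel fractions are already computed per tile and ordered as tissue, white/background, dark, high-saturation. The helper names and example numbers are hypothetical, and class balancing of the reference set is not handled here.

```python
import math
from statistics import median
from typing import List, Sequence

# Assumed feature order: tissue, white/background, dark, high-saturation.
Fractions = Sequence[float]


def reference_median(reference_tiles: Sequence[Fractions]) -> List[float]:
    """Per-feature median over the reference tiles."""
    return [median(column) for column in zip(*reference_tiles)]


def proximity_distance(tile: Fractions, center: Fractions) -> float:
    """Euclidean distance from a tile's pixel fractions to the reference median."""
    return math.dist(tile, center)


# Hypothetical fractions for illustration only.
reference = [
    (0.70, 0.20, 0.05, 0.30),
    (0.80, 0.10, 0.10, 0.40),
    (0.60, 0.30, 0.15, 0.20),
]
center = reference_median(reference)
print(center)                                                # [0.7, 0.2, 0.1, 0.3]
print(proximity_distance((0.70, 0.20, 0.10, 0.30), center))  # 0.0
```

Tiles with larger distances are less similar to the reference and would fall into higher numbered levels.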

Discovery tiles: 7,180
CRC-VAL balanced accuracy: 0.785
Confirmation TUM tiles: 1,800
Expanded TUM tiles: 4,000
Low-tail recall gap: 0.223
Expanded Spearman: -0.77

Claim Boundary

Public CRC 224x224 H&E ROI tiles scored with the CONCH CRC prompt ensemble. Confirmation and expansion are fresh relative to CRC-VAL discovery but still from the public Kather/NCT source family, so independent-site or clinical deployment generalization remains weak.

Limitations

  • Confirmation and expansion are still from the public Kather/NCT source family rather than an independent site cohort.
  • The primary result is TUM-class localized, not an all-class or deployment-wide generalizability statement.
  • The expanded curve is threshold-shaped rather than strictly monotone; levels 8-10 are degraded but levels 9 and 10 rebound slightly relative to level 8.

Downgraded Axes

  • broader NCT color/texture proximity: split-valid but not claim-closing
  • all-class CRC ROI generalizability: not supported as broad closure

Key MCP artifact IDs

  • conch:final_report.md
  • conch:artifacts:tum_expansion:expanded_finding_report.md
  • conch:artifacts:tum_expansion:expanded_metrics.json
  • conch:artifacts:tum_expansion:analysis.json
  • conch:artifacts:tum_expansion:scored_tum_expansion_with_baselines.csv
  • conch:artifacts:tum_expansion:target_model_results_tum_expansion.csv
  • conch:artifacts:tum_confirmation:analysis.json
  • conch:artifacts:tum_confirmation:decile_densification.json
  • and 7 more key artifacts in the MCP store

ESMFold2

esmfold2_high_disulfide_boundary_20260602

valid, current-pool confirmed

Across accumulated CAMEO and RCSB-derived ESMFold2 evaluations, high disulfide/cysteine capacity is a broad prospective risk proxy for reduced structural accuracy. Class-II-like viral E/E2/envelope-E proteins form a sharper localized failure mode. The failures are generally accompanied by low ESMFold2 confidence, so the result is a calibrated low-confidence generalization boundary rather than a hidden overconfident failure regime.

Primary Axis: high disulfide/cysteine capacity

disulfide_capacity_per100 = floor(number_of_cysteines / 2) / sequence_length * 100
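
As a worked instance of the formula, here is a minimal Python sketch. It assumes the input is a one-letter amino-acid sequence in which cysteine is `C`. The function name mirrors the formula, and the example sequences are made up for illustration.

```python
def disulfide_capacity_per100(sequence: str) -> float:
    """Maximum number of disulfide pairs per 100 residues.

    Each disulfide bond needs two cysteines, so an unpaired
    cysteine does not add capacity (integer division by 2).
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    n_cysteines = sequence.upper().count("C")
    return (n_cysteines // 2) / len(sequence) * 100


# Made-up sequences for illustration only.
print(disulfide_capacity_per100("CAGCKCTC"))  # 4 cysteines in 8 residues -> 25.0
print(disulfide_capacity_per100("CAGCKCTA"))  # 3 cysteines in 8 residues -> 12.5
print(disulfide_capacity_per100("AGKTAGKT"))  # no cysteines -> 0.0
```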

Primary rows: 652
High-disulfide n: 134
Lower-disulfide n: 518
Raw CA-lDDT gap: -0.149
Matched-control gap: -0.061
Class-II CA-lDDT gap: -0.407

Claim Boundary

Supported for accumulated ESMFold2 target-level evaluations across CAMEO/CAMEO follow-up, external RCSB, fresh general RCSB, and viral RCSB panels. MSA depth was claim-valid in CAMEO but not globally comparable across later panels.

Limitations

  • Global table mixes scaled ESMFold2 inference with a smaller earlier CAMEO/default subset.
  • Cluster-aware validation used approximate k-mer clustering; publication could replace this with MMseqs2 or CD-HIT.
  • Class-II E/E2 is metadata-defined and localized, not a global all-protein sequence axis.

Downgraded Axes

  • MSA depth: CAMEO-claim-valid but not global
  • flexible/non-helical composition: CAMEO-local, not broadly portable
  • reference geometry/topology: invalid as deployment axis

Key MCP artifact IDs

  • esmfold2:FINAL_ESMFOLD2_RESULT.md
  • esmfold2:data:primary_exact_sequence_dedup.csv
  • esmfold2:data:all_esmfold2_scored_evaluations.csv
  • esmfold2:tables:global_hypothesis_effects_primary_sequence.csv
  • esmfold2:tables:global_hypothesis_adjusted_effects_primary_sequence.csv
  • esmfold2:tables:global_hypothesis_level_curves_primary_sequence.csv
  • esmfold2:tables:per_source_high_disulfide_summary.csv
  • esmfold2:tables:cluster_aware_disulfide_effects.csv
  • and 5 more key artifacts in the MCP store

STATE

state_pathway_module_train_support_20260603

valid, current-pool confirmed

STATE generalizes worse for Replogle held-out gene perturbations whose pathways/modules are sparsely represented among training perturbations. The effect was discovered in hepg2 and k562, confirmed in jurkat and rpe1, checked in Replogle zero-shot, and extended beyond Reactome to independent gene-set systems.

Primary Axis: pathway/module train-support density

For each held-out gene perturbation, support is the mean log(1 + number of same-context training genes sharing each pathway/module annotation of the target gene). Higher support means the target gene is closer to known training biology in that context.
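
The support score described above can be sketched as follows. This is a minimal illustration, assuming annotations are available as sets of pathway/module identifiers and that the training genes are already restricted to the same context. The function name, the gene and pathway labels, and the choice to return 0.0 for an unannotated target are assumptions, not details from the source.

```python
import math
from typing import Dict, Set


def train_support(
    target_annotations: Set[str],
    training_annotations: Dict[str, Set[str]],
) -> float:
    """Mean of log(1 + n) over the target gene's annotations, where n is the
    number of same-context training genes sharing that annotation."""
    if not target_annotations:
        return 0.0  # assumption: an unannotated target gets zero support
    total = 0.0
    for annotation in target_annotations:
        n_sharing = sum(
            1 for gene_sets in training_annotations.values() if annotation in gene_sets
        )
        total += math.log1p(n_sharing)
    return total / len(target_annotations)


# Hypothetical genes and pathways for illustration only.
training = {
    "geneA": {"P1"},
    "geneB": {"P1", "P2"},
    "geneC": {"P3"},
}
# P1 is shared by 2 training genes and P2 by 1: (log 3 + log 2) / 2.
print(round(train_support({"P1", "P2"}, training), 4))  # 0.8959
```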

Scored rows: 39,808
Few-shot HVG gap: 0.128
Few-shot SE gap: 0.109
Zero-shot HVG gap: 0.068
Zero-shot SE gap: 0.044
OLS support coefficient: 0.028

Claim Boundary

Current-pool confirmed using official precomputed STATE scored CSVs and official split definitions, not fresh STATE inference or raw single-cell matrices. The main reviewer-facing gap is a fixed baseline/raw-label rerun.

Limitations

  • Used official precomputed scored CSVs rather than fresh STATE inference.
  • Fixed baseline/raw-label rerun still needed for reviewer-facing closure.
  • Perturbation strength measured by DE-count severity is a strong mediator/confound and should remain in claim framing.

Downgraded Axes

  • Tahoe target-context chemical Tanimoto support: split-valid but negative/weak
  • Parse-PBMC ligand-family training support: reversed direction
  • GO Molecular Function support: positive but weaker

Key MCP artifact IDs

  • state:state_spectra_report.md
  • state:target_model_results.csv
  • state:spectra_loop_state.json
  • state:axis_ledger_prior.json
  • state:artifacts:spc_replogle_best_refined_axis.csv
  • state:artifacts:spc_replogle_best_refined_axis_trends.csv
  • state:artifacts:spc_replogle_best_refined_axis_context_trends.csv
  • state:artifacts:spc_replogle_zeroshot_reactome_support_confirmation.csv
  • and 6 more key artifacts in the MCP store

Agent Access

Connect Claude Code or another MCP client

The server uses streamable HTTP and exposes read-only tools for listing models, retrieving findings, reading protocol sections, and fetching text artifacts.

{
  "mcpServers": {
    "spectra": {
      "type": "http",
      "url": "https://spectra.yashaektefaie.com/mcp"
    }
  }
}