Methodology

Math & Thinking Behind Genome

Genome treats every industrial company as an unknown dynamic system. The only inputs are observable signals — public macro data, financial filings, commodity prices. The goal is to infer internal behavior without ever seeing inside the company. This page explains exactly how that inference works.

1. The Core Premise

In control theory, a system's behavior is characterized by how it responds to inputs over time. Give a system a step input — does the output overshoot and oscillate, settle slowly, or saturate at a ceiling? The response shape encodes the system's internal dynamics even when the internals are completely opaque.

Industrial companies behave this way too. Feed in a demand surge (input) and observe revenue, inventory, and operating income (output). The response pattern — lag, amplification, damping, oscillation — is the behavioral fingerprint. Companies with similar fingerprints share structural properties regardless of industry, size, or geography.

SYSTEM MODEL
Inputs  → [ Unknown Industrial System ] → Outputs
(FRED macro,          (Black box)           (EDGAR: revenue,
 commodity prices,                           inventory,
 sector IPI)                                 operating income)
                          ↓
                   Behavioral Fingerprint
                   {latency, gain, damping,
                    oscillation, saturation,
                    volatility_transmission}

2. Signal Layer

Every fingerprint starts with two time series: an input signal and an output signal. Genome auto-selects signals per company:

Signal Priority Order
  1. EDGAR company-specific: quarterly revenue, inventory, operating income from SEC filings. Used when ≥2 periods are available. ~300–400 US public companies.
  2. FRED sector outputs: Industrial Production Index, PPI, employment by sector. Used as fallback for foreign/private companies.
  3. FRED macro inputs: fed funds rate, 10Y yield, credit spreads, commodity prices (oil, copper, steel). These are universal: all companies use them as inputs.

Companies with EDGAR data get fingerprints anchored to their actual financials. Companies using sector fallback share structural signal but are differentiated by how that sector signal interacts with the macro inputs.
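
The fallback between priority 1 and priority 2 can be sketched as a small selection function. This is a minimal illustration, not Genome's actual code: the function name, the argument names, and the "edgar" / "fred_sector" labels are hypothetical. Only the ≥2-period threshold comes from the list above.

```python
from typing import List, Optional, Sequence, Tuple

MIN_EDGAR_PERIODS = 2  # the ">= 2 periods" threshold from the priority list


def select_output_signal(
    edgar_series: Optional[Sequence[float]],
    sector_series: Sequence[float],
) -> Tuple[str, List[float]]:
    """Pick the output series: EDGAR if enough periods exist, else sector fallback.

    The name and the returned labels are hypothetical, for illustration only.
    """
    if edgar_series is not None and len(edgar_series) >= MIN_EDGAR_PERIODS:
        return "edgar", list(edgar_series)
    return "fred_sector", list(sector_series)


# A US filer with three quarters of revenue uses its own financials.
print(select_output_signal([10.2, 11.0, 10.7], [101.3, 102.0]))
# A private company with no filings falls back to the sector series.
print(select_output_signal(None, [101.3, 102.0]))
```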

3. The Six Fingerprint Algorithms

Given an input time series X and output time series Y, Genome extracts six dimensions. These are implemented in core/system_id.py.

3.1 Latency — How long does the system take to respond?

Cross-correlation lag
lag = argmax( Σ X(t) · Y(t + τ) )  over τ ∈ [-T/2, T/2]

The lag τ* that maximizes cross-correlation is the latency.
Converted from quarters to days: latency_days = τ* × 91

High latency = slow-moving system. A company with 3-quarter latency takes ~9 months to reflect input changes in output — typical of capital-intensive industrials with long procurement cycles.
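
A minimal sketch of the lag search, using only the standard library. It is not the code in core/system_id.py. Two choices here are assumptions: both series are mean-centered before the products are summed, and ties go to the smallest lag. The function names are hypothetical.

```python
DAYS_PER_QUARTER = 91  # conversion factor from the formula above


def cross_correlation_lag(x, y, max_lag=None):
    """Return the lag tau (in samples) that maximizes sum_t x[t] * y[t + tau]."""
    n = min(len(x), len(y))
    if max_lag is None:
        max_lag = n // 2  # search tau in [-T/2, T/2]
    mean_x = sum(x[:n]) / n
    mean_y = sum(y[:n]) / n
    xc = [v - mean_x for v in x[:n]]  # mean-centering is an assumption
    yc = [v - mean_y for v in y[:n]]
    best_lag, best_score = 0, float("-inf")
    for tau in range(-max_lag, max_lag + 1):
        # Only sum over indices where both x[t] and y[t + tau] exist.
        score = sum(xc[t] * yc[t + tau] for t in range(n) if 0 <= t + tau < n)
        if score > best_score:
            best_lag, best_score = tau, score
    return best_lag


def latency_days(x, y):
    """Convert the best lag from quarters to days."""
    return cross_correlation_lag(x, y) * DAYS_PER_QUARTER


# Input pulse in quarter 3, output pulse in quarter 5: a 2-quarter delay.
x = [0.0] * 12
y = [0.0] * 12
x[3] = 1.0
y[5] = 1.0
print(latency_days(x, y))  # 182
```

A positive lag means the output trails the input. A negative lag would mean the chosen "output" actually leads the input, which is usually a hint that the signal pairing should be revisited.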

3.2 Gain — How much does the system amplify?

Output / Input amplitude ratio
gain = std(Y_detrended) / std(X_detrended)

Both series are linearly detrended before computing std.
gain > 1 → system amplifies (fragile to shocks)
gain < 1 → system absorbs (resilient)
gain ≈ 1 → proportional response

A fragile company with gain=2.5 sees revenue swings 2.5× larger than the macro inputs that drove them. This often indicates leverage, low inventory buffers, or just-in-time exposure.
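
The gain formula can be sketched with a hand-rolled least-squares detrend. This is an illustrative version under stated assumptions: population standard deviation is used (the ratio is the same for sample standard deviation), and the function names are hypothetical rather than taken from core/system_id.py.

```python
from statistics import pstdev


def detrend_linear(series):
    """Subtract the least-squares straight line from a series."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]


def gain(x, y):
    """std(Y_detrended) / std(X_detrended)."""
    return pstdev(detrend_linear(y)) / pstdev(detrend_linear(x))


# Output has a steeper trend AND 2.5x the fluctuation of the input.
wiggle = [1.0 if t % 2 == 0 else -1.0 for t in range(20)]
x = [t + w for t, w in enumerate(wiggle)]
y = [3 * t + 2.5 * w for t, w in enumerate(wiggle)]
print(round(gain(x, y), 6))  # 2.5
```

Because the trend is removed first, the steeper slope of y does not inflate the result: only the fluctuation around the trend counts.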

3.3 Damping — Does the system stabilize or persist?

Autocorrelation decay rate
ρ(k) = Corr(Y_t, Y_{t-k})   for k = 1..8

Fit exponential: ρ(k) ≈ A · e^{-λk}
damping = λ  (estimated via log-linear regression)

damping > 0.3  → fast decay, system self-corrects
damping ≈ 0    → slow decay, shocks persist
damping < 0    → explosive / non-stationary (rare)

An oscillatory company with near-zero damping continues swinging long after an input shock ends. High damping means the system returns to baseline quickly — operationally resilient.
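
The damping estimate can be sketched in two steps: compute the autocorrelations, then fit a straight line to their logarithms. The source does not say how non-positive autocorrelations are handled, since their logarithm is undefined. This sketch assumes the fit stops at the first non-positive lag and returns None when fewer than two lags remain. All function names are hypothetical.

```python
import math


def autocorr(y, k):
    """Sample autocorrelation of y at lag k."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y)
    cov = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
    return cov / var


def fit_decay_rate(rhos):
    """Fit ln rho(k) = ln A - lambda * k by least squares; rhos[0] is rho(1)."""
    ks = list(range(1, len(rhos) + 1))
    logs = [math.log(r) for r in rhos]
    k_mean = sum(ks) / len(ks)
    l_mean = sum(logs) / len(logs)
    slope = (sum((k - k_mean) * (l - l_mean) for k, l in zip(ks, logs))
             / sum((k - k_mean) ** 2 for k in ks))
    return -slope  # lambda is the negated slope


def damping(y, max_lag=8):
    """Estimate lambda from lags 1..max_lag, or None if the fit is not possible."""
    rhos = []
    for k in range(1, max_lag + 1):
        rho = autocorr(y, k)
        if rho <= 0:  # assumption: truncate where the log is undefined
            break
        rhos.append(rho)
    if len(rhos) < 2:
        return None
    return fit_decay_rate(rhos)


# An exactly exponential autocorrelation profile recovers its own rate.
profile = [0.9 * math.exp(-0.3 * k) for k in range(1, 9)]
print(round(fit_decay_rate(profile), 6))  # 0.3
```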

3.4 Oscillation — Is the system cycling?

Dominant frequency via FFT
Y_detrended → FFT → power spectrum P(f)

oscillation_frequency = f at argmax P(f),   f > 0
oscillation_period_weeks = (1 / oscillation_frequency) × 13
(frequency in cycles per quarter; 13 weeks per quarter)

spectral_power_fraction = P(f*) / Σ P(f)
(fraction of total variance in the dominant frequency)

Companies with spectral_power_fraction > 0.4 have strong cyclical behavior — demand or inventory cycles dominate their output. Low spectral power means noise-dominated, non-cyclical response.
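
A minimal sketch of the dominant-frequency extraction. To stay dependency-free it uses a direct O(n²) discrete Fourier transform in place of an FFT, which gives the same spectrum for short quarterly series. Assumptions: the input is already detrended, the spectrum is one-sided (positive frequencies only), and frequency is measured in cycles per quarter. The function names are hypothetical.

```python
import cmath
import math


def power_spectrum(y):
    """Return [(frequency, power)] for positive frequencies, in cycles per sample."""
    n = len(y)
    spectrum = []
    for k in range(1, n // 2 + 1):  # skip k = 0 (the mean), so f > 0
        coeff = sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k / n, abs(coeff) ** 2))
    return spectrum


def dominant_cycle(y_detrended):
    """Dominant frequency, its period in weeks, and its share of total power."""
    spectrum = power_spectrum(y_detrended)
    total = sum(power for _, power in spectrum)
    if total == 0:
        return None  # flat series: no cycle to report
    f_star, p_star = max(spectrum, key=lambda fp: fp[1])
    return {
        "oscillation_frequency": f_star,
        "oscillation_period_weeks": 13 / f_star,  # 13 weeks per quarter
        "spectral_power_fraction": p_star / total,
    }


# A clean 8-quarter cycle sampled for 32 quarters.
y = [math.sin(2 * math.pi * t / 8) for t in range(32)]
result = dominant_cycle(y)
print(result["oscillation_frequency"], result["oscillation_period_weeks"])
```

For this input nearly all power sits in one bin, so spectral_power_fraction is close to 1. Real quarterly series are far noisier, which is why the thresholds in the text sit well below 1.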

3.5 Saturation — Where does throughput cap?

Slope change under increasing load
Sort X ascending. Split into 4 quartiles.
Compute Δ(output) / Δ(input) in each quartile.

saturation_threshold = fraction of input range where
slope drops to ≤ 50% of lower-quartile slope

saturation = 1.0 → full saturation (hits hard ceiling)
saturation = 0.0 → no saturation detected

A saturated company shows strong output response at low input levels but the response flattens as inputs grow — classic capacity constraint. This maps directly to the X1 Physical Constraint node.
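
The quartile-slope procedure can be sketched as follows. Several details are assumptions, since the text above does not fix them: each quartile's slope is taken between its first and last point, the threshold is reported at the start of the first quartile whose slope falls to half the lower-quartile slope, and None means no saturation was detected. How the threshold is turned into the 0 to 1 saturation score is not covered here. Function names are hypothetical.

```python
def quartile_slopes(x, y):
    """Sort by input, split into 4 quartiles, return [(quartile_start_x, slope)]."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    if n < 8:
        raise ValueError("need at least 8 points (2 per quartile)")
    slopes = []
    for q in range(4):
        chunk = pairs[q * n // 4:(q + 1) * n // 4]
        dx = chunk[-1][0] - chunk[0][0]
        dy = chunk[-1][1] - chunk[0][1]
        slopes.append((chunk[0][0], dy / dx if dx else 0.0))
    return slopes


def saturation_threshold(x, y, drop=0.5):
    """Fraction of the input range at which slope falls to <= drop * base slope."""
    slopes = quartile_slopes(x, y)
    base = slopes[0][1]  # lower-quartile slope
    lo, hi = min(x), max(x)
    if base <= 0 or hi == lo:
        return None
    for start, slope in slopes[1:]:
        if slope <= drop * base:
            return (start - lo) / (hi - lo)
    return None  # response never flattens


# Output tracks input one-for-one, then hits a ceiling at 10.
x = list(range(20))
y = [min(v, 10) for v in x]
print(saturation_threshold(x, y))
```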

3.6 Volatility Transmission — Does the system amplify or absorb shocks?

Conditional volatility ratio
vol_trans = std(ΔY) / std(ΔX)

where ΔX = first difference of input (shock proxy)
      ΔY = first difference of output

vol_trans > 1 → amplifies volatility (fragile)
vol_trans < 1 → absorbs volatility (resilient)
vol_trans ≈ 1 → passes through unchanged

Volatility transmission is the single best fragility indicator. High vol_trans companies respond to macro noise as if it were signal — their operations lack the buffers to absorb routine variation.
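
A minimal sketch of the ratio, assuming population standard deviation (the ratio is unchanged under sample standard deviation) and hypothetical function names:

```python
from statistics import pstdev


def first_difference(series):
    """Period-over-period changes: s[t] - s[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def volatility_transmission(x, y):
    """std(delta Y) / std(delta X)."""
    return pstdev(first_difference(y)) / pstdev(first_difference(x))


# Output swings twice as hard as input from one quarter to the next.
x = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
y = [0.0, 2.0, 0.0, 2.0, 0.0, 2.0]
print(volatility_transmission(x, y))
```

Unlike gain, which compares fluctuation around a fitted trend, this ratio compares quarter-to-quarter changes, so it is more sensitive to short-run noise.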

4. Archetype Classification

The six-dimensional fingerprint is classified into one of seven archetypes using threshold-based rules in core/archetypes.py. Rules are applied in priority order — the first match wins.

fragile: gain > 1.8 AND vol_trans > 1.5

High amplification + high volatility transmission. Operationally leveraged, brittle to shocks. Any supply disruption or demand spike cascades through the system without buffering.

constrained: saturation > 0.6 AND gain < 1.2

Throughput hits a ceiling at moderate input levels. Revenue doesn't scale proportionally with demand — a physical, workforce, or capital constraint is binding.

oscillatory: spectral_power > 0.35 AND damping < 0.25

Dominated by cycles that don't self-correct quickly. Demand cycles, inventory cycles, or procurement cycles run through the system with low damping — the pattern repeats.

saturated: saturation > 0.5 AND damping > 0.3

Capacity-limited but stable. The system is hitting a ceiling but handles it gracefully — no fragility signature. Classic late-stage growth or mature capacity-constrained industrial.

decoupled: gain < 0.5 AND latency > 180 days

Slow to respond, low amplification. Appears insensitive to macro inputs — either highly vertically integrated, long-contract revenue, or operating in a niche with structural demand insulation.

resilient: damping > 0.4 AND vol_trans < 0.8 AND gain in [0.7, 1.3]

Absorbs shocks, self-corrects, proportional response. Strong operational buffers (inventory, workforce flexibility, diversified customer base). The benchmark all others are compared against.

unknown: no rule matches

Signal insufficient to classify, or fingerprint dimensions are contradictory. Typically means too few data points or a company in structural transition between archetypes.
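
The first-match-wins logic can be sketched as an ordered rule table. The thresholds are the ones listed above. Two things are assumptions: the priority order is taken to be the order in which the archetypes are listed, and the dictionary keys (gain, vol_trans, saturation, spectral_power, damping, latency_days) are hypothetical names, not necessarily those used in core/archetypes.py.

```python
# Ordered (archetype, predicate) pairs; earlier rules take priority.
RULES = [
    ("fragile", lambda f: f["gain"] > 1.8 and f["vol_trans"] > 1.5),
    ("constrained", lambda f: f["saturation"] > 0.6 and f["gain"] < 1.2),
    ("oscillatory", lambda f: f["spectral_power"] > 0.35 and f["damping"] < 0.25),
    ("saturated", lambda f: f["saturation"] > 0.5 and f["damping"] > 0.3),
    ("decoupled", lambda f: f["gain"] < 0.5 and f["latency_days"] > 180),
    ("resilient", lambda f: f["damping"] > 0.4 and f["vol_trans"] < 0.8
                  and 0.7 <= f["gain"] <= 1.3),
]


def classify(fingerprint):
    """Return the first archetype whose rule matches, else 'unknown'."""
    for name, rule in RULES:
        if rule(fingerprint):
            return name
    return "unknown"


fp = {"gain": 1.0, "vol_trans": 0.9, "saturation": 0.7,
      "spectral_power": 0.1, "damping": 0.5, "latency_days": 90}
# This fingerprint satisfies both "constrained" and "saturated";
# "constrained" is returned because it comes first.
print(classify(fp))  # constrained
```

Keeping the rules in a list makes the priority order explicit and easy to audit: reordering two entries changes which archetype wins when a fingerprint satisfies both.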

5. Constraint Taxonomy (D1–X3)

After classification, Genome identifies the binding constraint node: the specific point in the industrial value chain where the fingerprint signal is strongest. The 21 nodes listed below are organized into 6 families, each mapping to observable signal combinations.

D — Demand
D1 | End Market Demand | IPI sector output, revenue YoY
D2 | Channel Mix | Revenue segment variance, dealer/direct ratio proxies
D3 | Demand Stability | Revenue coefficient of variation, backlog-to-book
S — Supply
S1 | Raw Material Supply | PPI commodities, ISM supplier delivery, FRED industrial materials
S2 | Production Capacity | Capacity utilization rate, OEE proxies from margin compression
S3 | Supplier Concentration | COGS variance, gross margin volatility vs. peer
S4 | Workforce | BLS sector employment, overtime rate, labor cost per unit
C — Cost
C1 | Input Cost | PPI, commodity indices, energy prices (EIA)
C2 | Conversion Cost | COGS/revenue ratio trend, labor cost per unit
C3 | Margin Realization | Gross margin vs. sector, operating leverage
F — Flow
F1 | Inventory | Days inventory outstanding, inventory/revenue ratio
F2 | Order Velocity | Revenue growth vs. backlog, book-to-bill proxies
F3 | Lead Time | ISM delivery times, inventory build relative to sales
F4 | Logistics | Freight indices (Freightos FBX), port congestion, diesel prices
R — Risk
R1 | Supply Fragility | Volatility transmission, supplier concentration × lead time
R2 | Regulatory Risk | EPA enforcement proxies, carbon pricing trajectory
R3 | Financial Risk | FRED credit spreads, debt/EBITDA, interest coverage
R4 | Geopolitical Risk | FRED uncertainty index, trade flow proxies, sector exposure to affected regions
X — Hard Limits
X1 | Physical Constraint | Saturation threshold, capacity utilization ceiling
X2 | Regulatory Cap | Permit-linked output limits, emissions ceiling proxies
X3 | Capital Constraint | FRED credit tightening, capex/depreciation ratio, interest expense growth

6. Judgment Engine

The judgment layer converts a fingerprint into a structured operational brief. It runs in core/judgment.py and is called by the API after every fingerprint.

Trajectory inference

Given a sequence of fingerprints over time, trajectory is inferred by comparing the current fingerprint against the prior N snapshots. Direction of change in gain, damping, and volatility_transmission determines the label: improving / deteriorating / stable / inflecting_up / inflecting_down / volatile. Confidence scales with the number of prior snapshots and the consistency of direction.

Peer comparison

Cohort percentiles are computed against all companies in the same industry with the same archetype. Gain, damping, latency, and volatility_transmission are each expressed as a percentile rank within the cohort. A company at the 90th percentile of gain within its cohort is a strong positive signal — or a warning flag, depending on archetype.
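
A minimal sketch of the cohort percentile computation. The percentile-rank definition used here (share of cohort values at or below the company's value) is an assumption, as are the record fields and function names. The actual logic in core/judgment.py may differ.

```python
def cohort_values(companies, industry, archetype, dimension):
    """Collect one fingerprint dimension across the same-industry, same-archetype cohort."""
    return [c[dimension] for c in companies
            if c["industry"] == industry and c["archetype"] == archetype]


def percentile_rank(value, cohort):
    """Percent of cohort values at or below `value`, from 0 to 100."""
    if not cohort:
        raise ValueError("empty cohort")
    return 100.0 * sum(1 for v in cohort if v <= value) / len(cohort)


# Hypothetical records: ten fragile machinery companies plus one outsider.
gains = [0.5, 0.8, 1.0, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5, 3.0]
companies = [{"industry": "machinery", "archetype": "fragile", "gain": g}
             for g in gains]
companies.append({"industry": "chemicals", "archetype": "fragile", "gain": 9.9})

cohort = cohort_values(companies, "machinery", "fragile", "gain")
print(percentile_rank(2.5, cohort))  # 90.0
```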

Implications

Implications are generated rule-by-rule from the fingerprint dimensions and constraint node. Each implication has a node ID, severity (high / medium / low), probability (0–1), and time horizon in days. Probabilities are calibrated to dimension thresholds — they are heuristic, not statistical, and should be read as relative severity signals rather than absolute forecasts.
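
The four fields of an implication map naturally onto a small record type. The sketch below is a hypothetical shape with basic validation, not the structure used in core/judgment.py. The class and field names are illustrative.

```python
from dataclasses import dataclass

SEVERITIES = ("high", "medium", "low")


@dataclass(frozen=True)
class Implication:
    node_id: str         # constraint node, e.g. "X1"
    severity: str        # one of SEVERITIES
    probability: float   # heuristic relative signal in [0, 1], not a forecast
    horizon_days: int    # time horizon in days

    def __post_init__(self):
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must be within [0, 1]")
        if self.horizon_days < 0:
            raise ValueError("horizon_days must be non-negative")


imp = Implication(node_id="X1", severity="high", probability=0.7, horizon_days=90)
print(imp)
```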

Recommended actions

Actions are mapped from archetype + constraint node using a lookup table of intervention patterns. They are bucketed into Now (<30 days), Near (30–90 days), and Horizon (>90 days) by their estimated time-to-effect. Priority ordering within each bucket reflects expected impact on the binding constraint.
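
The bucketing step can be sketched as below. The boundaries follow the definitions above, with 30 and 90 days both falling in Near. The action fields (name, days_to_effect, impact) and the function names are hypothetical, and sorting by a single impact score is an assumed stand-in for the priority ordering.

```python
def bucket_for(days_to_effect):
    """Map estimated time-to-effect in days to Now / Near / Horizon."""
    if days_to_effect < 30:
        return "Now"
    if days_to_effect <= 90:
        return "Near"
    return "Horizon"


def bucket_actions(actions):
    """Group actions by bucket, highest expected impact first within each."""
    buckets = {"Now": [], "Near": [], "Horizon": []}
    for action in actions:
        buckets[bucket_for(action["days_to_effect"])].append(action)
    for items in buckets.values():
        items.sort(key=lambda a: a["impact"], reverse=True)
    return buckets


actions = [
    {"name": "expedite critical supplier", "days_to_effect": 14, "impact": 0.6},
    {"name": "rebalance safety stock", "days_to_effect": 45, "impact": 0.8},
    {"name": "add second shift", "days_to_effect": 21, "impact": 0.9},
    {"name": "commission new line", "days_to_effect": 365, "impact": 0.7},
]
for bucket, items in bucket_actions(actions).items():
    print(bucket, [a["name"] for a in items])
```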

7. Confidence, Limitations & What This Is Not

What Genome is not
  • Not a financial model. Fingerprints are behavioral signals, not earnings forecasts.
  • Not a causal model. Correlation between input and output does not prove mechanism.
  • Not a substitute for field observation. Public signals are proxies; Ops Maturity field data replaces them.
  • Not a trading signal. Time horizons are quarters to years, not days.

Confidence scores reflect data availability and internal consistency of the fingerprint — not the probability that the archetype or constraint diagnosis is "correct." A company with 40 quarters of EDGAR data will have high confidence scores; a private company using sector fallback signals will have low confidence. The diagnosis may still be right in both cases.

The algorithms are deterministic given the same input signals. Running the same company twice on the same day produces the same fingerprint. Variation over time reflects genuine changes in the underlying signals — not model instability.

Industrial Genome Platform · ryan.cahalane@lns-global.com · April 2026