How Mars Score Weights Were Calibrated — The Methodology Behind the Number

Every product on vsMars carries a Mars Score from 0–100. It is not a vibe. The formula is published in our methodology and the per-category weights are version-controlled in the repo. Here is how those weights are chosen.

The Mars Score formula

For any product P in category C:

MarsScore(P) = Σ_i (weight_i × normalize(spec_i(P), category_C))

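Each category defines (1) which spec keys are scored, (2) the weight of each, and (3) the normalization curve that maps each raw spec value to a 0-100 sub-score (linear, log, threshold, or piecewise). In every published category the weights sum to 1, so the weighted sum stays on the same 0-100 scale as the sub-scores.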
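A minimal TypeScript sketch of the formula, assuming each normalizer already returns a 0-100 sub-score and the category's weights sum to 1. The names `CategoryConfig`, `ScoredSpec`, and `marsScore`, the choice to throw on a missing spec, and the toy category are all hypothetical illustrations, not the contents of src/lib/score/mars.ts.

```typescript
// Hypothetical shapes for illustration; not the actual vsMars types.
type SpecValue = number | boolean;
type Normalizer = (value: SpecValue) => number; // returns a 0-100 sub-score

interface ScoredSpec {
  weight: number;        // weight_i
  normalize: Normalizer; // the category's curve for this spec
}

type CategoryConfig = Record<string, ScoredSpec>;

// MarsScore(P) = sum over scored specs of weight_i * normalize(spec_i(P)).
// Assumes the weights sum to 1, so the result stays on the 0-100 scale.
function marsScore(specs: Record<string, SpecValue>, category: CategoryConfig): number {
  let total = 0;
  for (const [key, { weight, normalize }] of Object.entries(category)) {
    const value = specs[key];
    if (value === undefined) {
      // Assumption: a missing spec is an error rather than a silent zero.
      throw new Error(`missing spec: ${key}`);
    }
    total += weight * normalize(value);
  }
  return total;
}

// Toy two-spec category (invented): battery capped at 30 h, ANC as a binary.
const toyCategory: CategoryConfig = {
  battery_life_hours: { weight: 0.6, normalize: (v) => Math.min(100, (Number(v) / 30) * 100) },
  has_anc: { weight: 0.4, normalize: (v) => (v ? 100 : 0) },
};

console.log(marsScore({ battery_life_hours: 15, has_anc: true }, toyCategory)); // 70
```

Here 15 hours normalizes to 50, so the score is 0.6 × 50 + 0.4 × 100 = 70.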

How weights are chosen

For each category we:

Survey reviewer consensus. Read 30+ professional reviews of category flagships; tabulate which specs are described as "deal-breakers", "important", or "nice-to-have".
Survey buyer regret data. Reddit threads, Amazon reviews, return-rate surveys — what do buyers wish they'd weighted higher?
Run sensitivity analysis. For a sample of 20 products in the category, vary each weight ±20% and check whether rankings shift in ways that contradict consensus.
Lock weights, publish, accept feedback. When the score for a flagship feels meaningfully off from consensus, the disagreement reveals either a missing spec or a miscalibrated weight.
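The sensitivity-analysis step can be sketched as follows, under stated assumptions: products carry precomputed 0-100 sub-scores, each weight is scaled by 0.8 and 1.2 one at a time, and the weights are then renormalized to sum to 1 (the renormalization is our assumption; the step above only says "vary each weight ±20%"). All names and the two demo products are hypothetical. The code only flags ranking shifts; judging whether a shift contradicts consensus stays a human call.

```typescript
// Hypothetical shapes: each product carries precomputed 0-100 sub-scores.
interface Product {
  name: string;
  subScores: Record<string, number>;
}
type Weights = Record<string, number>;

// Weighted sum of sub-scores (a missing sub-score counts as 0 here).
function score(p: Product, w: Weights): number {
  let s = 0;
  for (const [key, weight] of Object.entries(w)) s += weight * (p.subScores[key] ?? 0);
  return s;
}

// Product names ordered from highest to lowest score.
function ranking(products: Product[], w: Weights): string[] {
  return [...products].sort((a, b) => score(b, w) - score(a, w)).map((p) => p.name);
}

// Scale one weight, then renormalize so the weights still sum to 1 (assumption).
function perturb(w: Weights, key: string, factor: number): Weights {
  const scaled: Weights = { ...w, [key]: w[key] * factor };
  const total = Object.values(scaled).reduce((a, b) => a + b, 0);
  const out: Weights = {};
  for (const [k, v] of Object.entries(scaled)) out[k] = v / total;
  return out;
}

interface Shift {
  key: string;
  factor: number;
  ranking: string[];
}

// Every single-weight perturbation that changes the baseline order,
// returned for a reviewer to judge against consensus.
function sensitivity(products: Product[], w: Weights, delta = 0.2): Shift[] {
  const baseline = ranking(products, w).join("|");
  const shifts: Shift[] = [];
  for (const key of Object.keys(w)) {
    for (const factor of [1 - delta, 1 + delta]) {
      const r = ranking(products, perturb(w, key, factor));
      if (r.join("|") !== baseline) shifts.push({ key, factor, ranking: r });
    }
  }
  return shifts;
}

// Two invented products whose baseline scores are 1 point apart (70 vs 69).
const demoProducts: Product[] = [
  { name: "Phone A", subScores: { battery: 80, camera: 60 } },
  { name: "Phone B", subScores: { battery: 60, camera: 78 } },
];
const demoWeights: Weights = { battery: 0.5, camera: 0.5 };

console.log(sensitivity(demoProducts, demoWeights).map((s) => s.key)); // [ 'battery', 'camera' ]
```

Lowering the battery weight or raising the camera weight flips these two near-tied products, which is expected; a flip between products far apart in consensus would be the warning sign.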

Example: smartphones (May 2026)

Spec key	Weight	Rationale
battery_life_hours	0.18	Top regret driver in surveys
chipset_score	0.14	Predictive of multi-year longevity
camera_main_score	0.13	Primary purchase driver
display_score	0.11	High visibility daily
build_water_resistance	0.07	Asymmetric downside
update_years_promised	0.07	Mid-cycle value
charging_w_wired + wireless	0.06	Convenience but ceiling effect
price_value	0.10	Cross-tier normalization
other (12 specs)	0.14	Long tail
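As a worked example of the table, the sketch below encodes the smartphone weights and scores one phone. The keys `charging` (standing in for the "charging_w_wired + wireless" row) and `other` (the 12 long-tail specs) are shorthand we chose, and the sub-scores are invented for illustration, not measurements of any real product.

```typescript
// Smartphone weights from the May 2026 table.
// `charging` and `other` are hypothetical shorthand keys for two table rows.
const smartphoneWeights: Record<string, number> = {
  battery_life_hours: 0.18,
  chipset_score: 0.14,
  camera_main_score: 0.13,
  display_score: 0.11,
  build_water_resistance: 0.07,
  update_years_promised: 0.07,
  charging: 0.06,
  price_value: 0.10,
  other: 0.14,
};

// Invented 0-100 sub-scores for a hypothetical phone (not real product data).
const exampleSubScores: Record<string, number> = {
  battery_life_hours: 80,
  chipset_score: 90,
  camera_main_score: 85,
  display_score: 88,
  build_water_resistance: 100,
  update_years_promised: 70,
  charging: 60,
  price_value: 50,
  other: 75,
};

// Per-spec contribution = weight x sub-score; the Mars Score is their sum.
function contributions(
  weights: Record<string, number>,
  subScores: Record<string, number>,
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [key, w] of Object.entries(weights)) out[key] = w * (subScores[key] ?? 0);
  return out;
}

const parts = contributions(smartphoneWeights, exampleSubScores);
const weightSum = Object.values(smartphoneWeights).reduce((a, b) => a + b, 0);
const total = Object.values(parts).reduce((a, b) => a + b, 0);

console.log(weightSum.toFixed(2)); // 1.00
console.log(total.toFixed(2));     // 78.73
```

Battery alone contributes 0.18 × 80 = 14.4 points, the largest single share, which is what a top weight of 0.18 is meant to produce.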

These weights changed twice in 2026: chipset_score weight rose after the iPhone 17 Pro Max thermal-throttling story (sustained performance proved more predictive of multi-year experience than peak); charging weight fell after every flagship hit 80W+ (no longer a differentiator).

Normalization curves — the second half of the formula

A weight tells you how much a spec matters. A normalization curve tells you how a given spec value maps to a 0-100 sub-score. We use four curves:

Linear (minmax): maps the category's min-to-max range to 0-100 linearly, inverted for lower-is-better specs such as response time. Used for refresh rate, response time, peak brightness: specs where every unit of improvement is worth roughly the same.
Logarithmic: used for storage capacity and RAM, where each doubling is worth the same number of points, so going from 8 to 16 GB counts as much as going from 16 to 32 GB.
Threshold (binary): used for boolean specs (has_anc, supports_dolby_vision) — present or absent.
Piecewise: used for specs whose value changes non-linearly, like camera megapixels (gains between 12 and 48 MP count strongly; above 48 MP, returns diminish).
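A minimal sketch of the four curves, assuming every curve clamps its output to 0-100. The function names, the ranges in the examples, and the megapixel breakpoints are hypothetical illustrations, not the published vsMars parameters.

```typescript
// Keep every sub-score inside 0-100.
const clamp = (x: number, lo = 0, hi = 100): number => Math.min(hi, Math.max(lo, x));

// Linear (minmax): category min -> 0, category max -> 100.
// Pass higherIsBetter = false for specs like response time.
function linear(value: number, min: number, max: number, higherIsBetter = true): number {
  const t = (value - min) / (max - min);
  return clamp(100 * (higherIsBetter ? t : 1 - t));
}

// Logarithmic: every doubling above `min` adds the same number of points.
function logarithmic(value: number, min: number, max: number): number {
  return clamp((100 * Math.log2(value / min)) / Math.log2(max / min));
}

// Threshold (binary): present = 100, absent = 0.
function threshold(present: boolean): number {
  return present ? 100 : 0;
}

// Piecewise linear: interpolate between [specValue, subScore] breakpoints,
// which must be sorted by specValue; values outside the range are held flat.
function piecewise(value: number, points: [number, number][]): number {
  if (value <= points[0][0]) return points[0][1];
  for (let i = 1; i < points.length; i++) {
    const [x0, y0] = points[i - 1];
    const [x1, y1] = points[i];
    if (value <= x1) return y0 + ((value - x0) / (x1 - x0)) * (y1 - y0);
  }
  return points[points.length - 1][1];
}

// Assumed megapixel breakpoints: steep gains from 12 to 48 MP, shallow after.
const megapixelCurve: [number, number][] = [[12, 20], [48, 90], [200, 100]];

console.log(linear(90, 60, 120));           // 50  (refresh rate, Hz)
console.log(logarithmic(256, 64, 1024));    // 50  (storage, GB: two of four doublings)
console.log(threshold(true));               // 100 (has_anc)
console.log(piecewise(48, megapixelCurve)); // 90
```

With these assumed breakpoints, the 36 MP between 12 and 48 are worth 70 points, while the next 152 MP are worth only 10, which is the diminishing-returns shape the piecewise curve exists to express.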

The normalization curves are version-controlled alongside weights. Changes are documented in the score-recompute worker's audit log.

Per-category weight examples

Headphones (May 2026):

ANC effectiveness: 0.17
Sound quality (driver size + frequency response): 0.16
Battery life: 0.12
Codec support: 0.10
Multipoint: 0.08
Comfort (weight + clamp): 0.07
Build / water resistance: 0.06
Other: 0.24

Laptops (May 2026):

CPU/SoC score: 0.16
Battery life (real-world): 0.15
Display quality (panel + resolution + brightness): 0.13
Build quality + weight: 0.10
Memory (RAM): 0.08
Storage (SSD): 0.07
Port selection (USB-C, Thunderbolt): 0.06
Other: 0.25
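Each published list sums to 1.00, which is what keeps scores on the 0-100 scale. A check like the one below could guard that invariant when weights change; the snake_case keys are hypothetical renderings of the list labels above (the real spec keys may differ), and `invalidCategories` is an illustrative name.

```typescript
// Headphone and laptop weights from the May 2026 lists, with hypothetical keys.
const categoryWeights: Record<string, Record<string, number>> = {
  headphones: {
    anc_effectiveness: 0.17,
    sound_quality: 0.16,
    battery_life: 0.12,
    codec_support: 0.10,
    multipoint: 0.08,
    comfort: 0.07,
    build_water_resistance: 0.06,
    other: 0.24,
  },
  laptops: {
    cpu_soc_score: 0.16,
    battery_life_real_world: 0.15,
    display_quality: 0.13,
    build_quality_weight: 0.10,
    memory_ram: 0.08,
    storage_ssd: 0.07,
    port_selection: 0.06,
    other: 0.25,
  },
};

// Returns the categories whose weights are negative or do not sum to 1
// (within a floating-point tolerance).
function invalidCategories(
  all: Record<string, Record<string, number>>,
  tolerance = 1e-9,
): string[] {
  return Object.entries(all)
    .filter(([, weights]) => {
      const values = Object.values(weights);
      const sum = values.reduce((a, b) => a + b, 0);
      return values.some((w) => w < 0) || Math.abs(sum - 1) > tolerance;
    })
    .map(([name]) => name);
}

console.log(invalidCategories(categoryWeights)); // []
```

The tolerance matters: adding decimal fractions such as 0.17 + 0.16 in floating point rarely lands on exactly 1, so an exact `=== 1` comparison would reject valid tables.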

What we explicitly don't do

No reviewer sentiment scores. "Feel" doesn't enter the formula. Subjective ratings would make the score irreproducible.
No popularity bonus. A flagship that sold a million units doesn't outrank a better niche product. Sales velocity is a market signal, not a quality signal.
No brand bias. Brand is not a spec. The Mars Score doesn't know whether the device is from Apple, Samsung, or an unknown OEM.
No price discount unless it materially changes a category's normalization. Price is included via a price_value weight but normalized within the category; a $200 phone won't beat a $1,200 phone on absolute score even if its price/spec ratio is better.

Why 92.4 → 87.1 doesn't mean "much worse"

Mars Scores cluster between 60 and 95 for in-production products in any given category. A 5-point gap is meaningful (a different sub-tier); a 1.3-point gap is within methodology noise (effectively a tie). Treat the number as a sorting key, not a ranking truth. The 87.1 product is not "5.7% worse" than the 92.4 product: because scores sit in a narrow band, a point gap is not a percentage, and the 5.3-point difference puts the two products in adjacent sub-tiers of the same neighborhood, distinguished by specific trade-offs.
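The reading rule above can be sketched as a comparison that reports the gap in points and a coarse verdict. The 5-point sub-tier threshold comes from the text; the 1.5-point tie threshold is our assumption (the text only says a 1.3-point gap is noise), and `compareScores` is a hypothetical name.

```typescript
// Assumed thresholds: 5 points from the methodology text, 1.5 points chosen
// here as an illustrative noise band around the quoted 1.3-point example.
const TIE_THRESHOLD = 1.5;
const SUB_TIER_THRESHOLD = 5;

type GapVerdict = "tie" | "close" | "different sub-tier";

// Report the absolute gap in points (never as a percentage) plus a verdict.
function compareScores(a: number, b: number): { points: number; verdict: GapVerdict } {
  const points = Math.abs(a - b);
  let verdict: GapVerdict;
  if (points < TIE_THRESHOLD) verdict = "tie";
  else if (points >= SUB_TIER_THRESHOLD) verdict = "different sub-tier";
  else verdict = "close";
  return { points, verdict };
}

const { points, verdict } = compareScores(92.4, 87.1);
console.log(points.toFixed(1), verdict); // 5.3 different sub-tier
```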

Why we publish the math

Other comparison sites publish opaque "expert ratings." The Mars Score is intentionally the opposite: every weight, every normalization curve, every spec key is in version control. Anyone can fork the repo, change the weights, re-run the score, and see how rankings shift. That's the only honest way to do a comparison platform — your weights are different from ours, and that's fine; the math should be open enough that you can substitute yours.

The full per-category weights live in src/lib/score/mars.ts. Anyone can audit, reproduce, or critique the math. See our methodology page for the broader testing-and-scoring approach.