vsMars
Field Report

How Mars Score Weights Were Calibrated — The Methodology Behind the Number

Every product on vsMars has a Mars Score from 0 to 100. The formula is transparent and category-specific. Here's how the per-category weights were chosen and why they keep changing.

Buğra Sözeri

Every product on vsMars carries a Mars Score from 0 to 100. It is not a vibe. The formula is published in our methodology and the per-category weights are version-controlled in the repo. Here is how those weights are chosen.

The Mars Score formula

For any product P in category C:

MarsScore(P) = Σ_i (weight_i × normalize(spec_i(P), category_C))

Each category defines (1) which spec keys are scored, (2) the weight of each, (3) the normalization curve (linear, log, threshold, piecewise).
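As a rough illustration of the formula above, here is a minimal TypeScript sketch of the weighted sum. The type names (SpecRule, CategoryConfig, ProductSpecs), the marsScore function, and the toy category are hypothetical and are not taken from src/lib/score/mars.ts. Treating a missing spec as contributing 0 is also an assumption of this sketch.

```typescript
// Hypothetical shapes for illustration; the real code in the repo may differ.
type Normalizer = (raw: number) => number; // maps a raw spec value to a 0-100 sub-score

interface SpecRule {
  weight: number;        // share of the final score, e.g. 0.18
  normalize: Normalizer; // category-specific curve for this spec
}

type CategoryConfig = Record<string, SpecRule>;
type ProductSpecs = Record<string, number>;

// MarsScore(P) = sum over scored specs of weight_i * normalize(spec_i(P)).
// Assumption: a spec the product does not report contributes 0.
function marsScore(specs: ProductSpecs, category: CategoryConfig): number {
  let total = 0;
  for (const key of Object.keys(category)) {
    const raw = specs[key];
    if (raw === undefined) continue;
    total += category[key].weight * category[key].normalize(raw);
  }
  return total;
}

// Toy two-spec category with made-up weights and ranges.
const clamp = (x: number): number => Math.min(100, Math.max(0, x));
const toyCategory: CategoryConfig = {
  battery_life_hours: { weight: 0.6, normalize: (h) => clamp((h / 20) * 100) },
  display_score: { weight: 0.4, normalize: (s) => clamp(s) },
};

// 0.6 * 50 + 0.4 * 80 = 62
const toyScore = marsScore({ battery_life_hours: 10, display_score: 80 }, toyCategory);
```

Because every sub-score is on a 0-100 scale, the final score stays in 0-100 as long as the weights sum to 1.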

How weights are chosen

For each category we:

  1. Survey reviewer consensus. Read 30+ professional reviews of category flagships; tabulate which specs are described as "deal-breakers", "important", or "nice-to-have".
  2. Survey buyer regret data. Reddit threads, Amazon reviews, return-rate surveys — what do buyers wish they'd weighted higher?
  3. Run sensitivity analysis. For a sample of 20 products in the category, vary each weight ±20% and check whether rankings shift in ways that contradict consensus.
  4. Lock weights, publish, accept feedback. When the score for a flagship feels meaningfully off from consensus, the disagreement reveals either a missing spec or a miscalibrated weight.
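Step 3 can be sketched roughly as follows. This is an illustrative version, not the script we run: the Product shape, the function names, and the two toy phones are hypothetical, and sub-scores are assumed to be already normalized to 0-100. The sketch does not renormalize the other weights after a perturbation, since scaling all weights by a common factor leaves the ranking unchanged.

```typescript
type Weights = Record<string, number>;

interface Product {
  name: string;
  subScores: Record<string, number>; // assumed already normalized to 0-100
}

function score(p: Product, w: Weights): number {
  let total = 0;
  for (const key of Object.keys(w)) {
    total += w[key] * (p.subScores[key] ?? 0);
  }
  return total;
}

// Product names ordered from highest to lowest score.
function rank(products: Product[], w: Weights): string[] {
  return products
    .slice()
    .sort((a, b) => score(b, w) - score(a, w))
    .map((p) => p.name);
}

interface Shift {
  spec: string;
  factor: number;
  ranking: string[];
}

// Scale each weight by (1 - delta) and (1 + delta) in turn and collect
// every perturbation whose ranking differs from the baseline ranking.
function sensitivity(products: Product[], w: Weights, delta = 0.2): Shift[] {
  const baseline = rank(products, w).join(">");
  const shifts: Shift[] = [];
  for (const spec of Object.keys(w)) {
    for (const factor of [1 - delta, 1 + delta]) {
      const perturbed: Weights = { ...w, [spec]: w[spec] * factor };
      const ranking = rank(products, perturbed);
      if (ranking.join(">") !== baseline) shifts.push({ spec, factor, ranking });
    }
  }
  return shifts;
}

// Toy data: two made-up phones that sit close together.
const toyWeights: Weights = { battery_life_hours: 0.5, camera_main_score: 0.5 };
const toyPhones: Product[] = [
  { name: "Phone A", subScores: { battery_life_hours: 90, camera_main_score: 60 } },
  { name: "Phone B", subScores: { battery_life_hours: 62, camera_main_score: 86 } },
];
const shifts = sensitivity(toyPhones, toyWeights);
```

In this toy case the baseline puts Phone A first (75 vs 74), and two of the four perturbations flip the order: lowering the battery weight and raising the camera weight. Each flip is then compared against reviewer consensus by hand.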

Example: smartphones (May 2026)

| Spec key | Weight | Rationale |
|---|---|---|
| battery_life_hours | 0.18 | Top regret driver in surveys |
| chipset_score | 0.14 | Predictive of multi-year longevity |
| camera_main_score | 0.13 | Primary purchase driver |
| display_score | 0.11 | High visibility daily |
| build_water_resistance | 0.07 | Asymmetric downside |
| update_years_promised | 0.07 | Mid-cycle value |
| charging_w_wired + wireless | 0.06 | Convenience but ceiling effect |
| price_value | 0.10 | Cross-tier normalization |
| other (12 specs) | 0.14 | Long tail |
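The weights in the table sum to 1.00, which is what keeps the final score on a 0-100 scale. A minimal sketch of that check is below. The helper names (weightSum, assertNormalized) and the tolerance are hypothetical, and the combined charging row and the 12 long-tail specs are each collapsed into a single key for brevity.

```typescript
// Smartphone weights as listed in the table (May 2026).
const smartphoneWeights: Record<string, number> = {
  battery_life_hours: 0.18,
  chipset_score: 0.14,
  camera_main_score: 0.13,
  display_score: 0.11,
  build_water_resistance: 0.07,
  update_years_promised: 0.07,
  charging: 0.06, // charging_w_wired + wireless, one combined row in the table
  price_value: 0.10,
  other: 0.14,    // 12 long-tail specs, lumped together here
};

function weightSum(weights: Record<string, number>): number {
  let sum = 0;
  for (const key of Object.keys(weights)) sum += weights[key];
  return sum;
}

// Throws if the weights do not sum to 1 within a small floating-point tolerance.
function assertNormalized(weights: Record<string, number>, tolerance = 1e-9): void {
  const sum = weightSum(weights);
  if (Math.abs(sum - 1) > tolerance) {
    throw new Error("weights sum to " + sum.toFixed(4) + ", expected 1.0000");
  }
}

assertNormalized(smartphoneWeights); // passes: the table adds up to 1.00
```

A check like this is useful whenever a single weight is raised or lowered, because the difference has to be absorbed by other rows for the total to stay at 1.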

These weights changed twice in 2026. The chipset_score weight rose after the iPhone 17 Pro Max thermal-throttling story, when sustained performance proved more predictive of multi-year experience than peak performance. The charging weight fell after every flagship hit 80W+, which meant charging speed was no longer a differentiator.

Normalization curves — the second half of the formula

A weight tells you how much a spec matters. A normalization curve tells you how a given spec value maps to a 0-100 sub-score. We use four curves:

  • Linear (minmax): maps min-to-max of the category to 0-100 linearly. Used for refresh rate, response time, peak brightness — specs where every unit of improvement is roughly equal.
  • Logarithmic: used for storage capacity, RAM — where each doubling adds roughly the same value, so the same absolute increase matters less at higher capacities.
  • Threshold (binary): used for boolean specs (has_anc, supports_dolby_vision) — present or absent.
  • Piecewise: used for specs with non-linear value, like camera megapixels (12-48 MP matters; above 48 MP shows diminishing returns).

The normalization curves are version-controlled alongside weights. Changes are documented in the score-recompute worker's audit log.
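The four curve types can be sketched in a few lines each. This is a simplified illustration, not the code in the repo: the function names, the worst/best parameters, and the example knot values are all assumptions. Passing worst and best instead of min and max lets the same function handle lower-is-better specs such as response time.

```typescript
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Linear (min-max): worst maps to 0, best maps to 100, straight line between.
function linear(value: number, worst: number, best: number): number {
  return 100 * clamp01((value - worst) / (best - worst));
}

// Logarithmic: every doubling between worst and best adds the same amount.
// Assumes value, worst, and best are all positive.
function logarithmic(value: number, worst: number, best: number): number {
  return 100 * clamp01(Math.log(value / worst) / Math.log(best / worst));
}

// Threshold (binary): the feature is present or it is not.
function threshold(present: boolean): number {
  return present ? 100 : 0;
}

// Piecewise linear through [rawValue, subScore] knots sorted by rawValue.
function piecewise(value: number, knots: [number, number][]): number {
  if (value <= knots[0][0]) return knots[0][1];
  for (let i = 1; i < knots.length; i++) {
    const [x0, y0] = knots[i - 1];
    const [x1, y1] = knots[i];
    if (value <= x1) return y0 + ((y1 - y0) * (value - x0)) / (x1 - x0);
  }
  return knots[knots.length - 1][1];
}

// Made-up knots for camera megapixels: steep from 12 to 48 MP, nearly flat above.
const megapixelKnots: [number, number][] = [[12, 0], [48, 90], [200, 100]];

const refreshSub = linear(90, 60, 120);          // halfway between 60 and 120 Hz -> 50
const responseSub = linear(1, 5, 1);             // lower is better: 1 ms is the best case -> 100
const storageSub = logarithmic(256, 128, 512);   // one doubling out of two -> about 50
const ancSub = threshold(true);                  // 100
const cameraSub = piecewise(30, megapixelKnots); // halfway up the steep segment -> 45
```

The knots make the diminishing-returns claim concrete: in this example, going from 12 to 48 MP is worth 90 points, while going from 48 to 200 MP is worth only 10.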

Per-category weight examples

Headphones (May 2026):

  • ANC effectiveness: 0.17
  • Sound quality (driver size + frequency response): 0.16
  • Battery life: 0.12
  • Codec support: 0.10
  • Multipoint: 0.08
  • Comfort (weight + clamp): 0.07
  • Build / water resistance: 0.06
  • Other: 0.24

Laptops (May 2026):

  • CPU/SoC score: 0.16
  • Battery life (real-world): 0.15
  • Display quality (panel + resolution + brightness): 0.13
  • Build quality + weight: 0.10
  • Memory (RAM): 0.08
  • Storage (SSD): 0.07
  • Port selection (USB-C, Thunderbolt): 0.06
  • Other: 0.25

What we explicitly don't do

  • No reviewer sentiment scores. "Feel" doesn't enter the formula. Subjective ratings would make the score irreproducible.
  • No popularity bonus. A million-unit-sold flagship doesn't beat a niche better product. Sales velocity is a market signal, not a quality signal.
  • No brand bias. Brand is not a spec. The Mars Score doesn't know whether the device is from Apple, Samsung, or an unknown OEM.
  • No price discount unless it materially changes a category's normalization. Price is included via a price_value weight but normalized within the category; a $200 phone won't beat a $1,200 phone on absolute score even if its price/spec ratio is better.

Why 92.4 vs. 87.1 does not mean "much worse"

Mars Scores cluster between 60 and 95 for in-production products in any given category. A 5-point gap is meaningful (different sub-tier); a 1.3-point gap is within methodology noise (effectively a tie). Treat the number as a sorting key, not a ranking truth. The 87.1 product is not "5.7% worse" than the 92.4 product — they're in the same neighborhood, distinguished by specific trade-offs.

Why we publish the math

Other comparison sites publish opaque "expert ratings." The Mars Score is intentionally the opposite: every weight, every normalization curve, every spec key is in version control. Anyone can fork the repo, change the weights, re-run the score, and see how rankings shift. That's the only honest way to do a comparison platform — your weights are different from ours, and that's fine; the math should be open enough that you can substitute yours.

The full per-category weights live in src/lib/score/mars.ts. Anyone can audit, reproduce, or critique the math. See our methodology page for the broader testing-and-scoring approach.
