Every product on vsMars carries a Mars Score from 0–100. It is not a vibe. The formula is published in our methodology and the per-category weights are version-controlled in the repo. Here is how those weights are chosen.
The Mars Score formula
For any product P in category C:
MarsScore(P) = Σ_i (weight_i × normalize(spec_i(P), category_C))
Each category defines (1) which spec keys are scored, (2) the weight of each key, and (3) the normalization curve applied to each key (linear, logarithmic, threshold, or piecewise).
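To make the formula concrete, here is a minimal sketch of the weighted sum in TypeScript. The type names (`SpecRule`, `CategoryConfig`, `ProductSpecs`), the toy category, and the choice to let a missing spec contribute nothing are assumptions for illustration; the actual shapes in the repo may differ.

```typescript
// Hypothetical types for illustration only.
type Curve = (value: number) => number; // maps a raw spec value to a 0-100 sub-score

interface SpecRule {
  weight: number; // share of the final score; weights in a category sum to 1
  normalize: Curve;
}

type CategoryConfig = Record<string, SpecRule>;
type ProductSpecs = Record<string, number>;

// MarsScore(P) = sum over scored specs of weight_i * normalize(spec_i(P))
function marsScore(specs: ProductSpecs, category: CategoryConfig): number {
  let total = 0;
  for (const [key, rule] of Object.entries(category)) {
    const raw = specs[key];
    // Assumption: a spec missing from the product adds nothing to the score.
    if (raw === undefined) continue;
    total += rule.weight * rule.normalize(raw);
  }
  return total;
}

// Toy category with two specs; the 10-30 hour range is made up.
const clamp = (x: number): number => Math.min(100, Math.max(0, x));
const toyCategory: CategoryConfig = {
  battery_life_hours: { weight: 0.6, normalize: (h) => clamp(((h - 10) / (30 - 10)) * 100) },
  display_score: { weight: 0.4, normalize: (s) => clamp(s) },
};

const score = marsScore({ battery_life_hours: 20, display_score: 80 }, toyCategory);
console.log(score.toFixed(1)); // 0.6 * 50 + 0.4 * 80 = 62.0
```

Because every sub-score is on a 0-100 scale and the weights sum to 1, the result stays on the same 0-100 scale.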
How weights are chosen
For each category we:
- Survey reviewer consensus. Read 30+ professional reviews of category flagships; tabulate which specs are described as "deal-breakers", "important", or "nice-to-have".
- Survey buyer regret data. Reddit threads, Amazon reviews, return-rate surveys — what do buyers wish they'd weighted higher?
- Run sensitivity analysis. For a sample of 20 products in the category, vary each weight ±20% and check whether rankings shift in ways that contradict consensus.
- Lock weights, publish, accept feedback. When the score for a flagship feels meaningfully off from consensus, the disagreement reveals either a missing spec or a miscalibrated weight.
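The sensitivity step can be sketched roughly as follows. This is an illustration, not the production tooling: the `Product` shape, the two-product sample, and the renormalization of perturbed weights are assumptions. Renormalizing keeps scores on a 0-100 scale but does not affect the ordering.

```typescript
type Weights = Record<string, number>;

interface Product {
  name: string;
  sub: Record<string, number>; // already-normalized 0-100 sub-scores per spec
}

// Rank products by weighted score, best first.
function rank(products: Product[], weights: Weights): string[] {
  const total = Object.values(weights).reduce((a, b) => a + b, 0);
  const score = (p: Product): number =>
    Object.entries(weights).reduce((s, [k, w]) => s + (w / total) * (p.sub[k] ?? 0), 0);
  return [...products].sort((a, b) => score(b) - score(a)).map((p) => p.name);
}

// Scale each weight down and up by `delta` and record every ranking change.
function sensitivity(products: Product[], weights: Weights, delta = 0.2) {
  const baseline = rank(products, weights);
  const shifts: { spec: string; factor: number; ranking: string[] }[] = [];
  for (const [spec, w] of Object.entries(weights)) {
    for (const factor of [1 - delta, 1 + delta]) {
      const perturbed: Weights = { ...weights, [spec]: w * factor };
      const ranking = rank(products, perturbed);
      if (ranking.join("|") !== baseline.join("|")) shifts.push({ spec, factor, ranking });
    }
  }
  return { baseline, shifts };
}

// Made-up sample: A leads narrowly on battery, B leads on camera.
const sample: Product[] = [
  { name: "A", sub: { battery: 90, camera: 60 } },
  { name: "B", sub: { battery: 60, camera: 88 } },
];
const report = sensitivity(sample, { battery: 0.5, camera: 0.5 });
console.log(report.baseline); // [ 'A', 'B' ]
console.log(report.shifts.map((s) => `${s.spec} x${s.factor}`)); // [ 'battery x0.8', 'camera x1.2' ]
```

Each entry in `shifts` is a candidate for manual review: if the flipped ranking contradicts reviewer consensus, the weight is too fragile at its current value.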
Example: smartphones (May 2026)
| Spec key | Weight | Rationale |
|---|---|---|
| battery_life_hours | 0.18 | Top regret driver in surveys |
| chipset_score | 0.14 | Predictive of multi-year longevity |
| camera_main_score | 0.13 | Primary purchase driver |
| display_score | 0.11 | High visibility daily |
| build_water_resistance | 0.07 | Asymmetric downside |
| update_years_promised | 0.07 | Mid-cycle value |
| charging_w_wired + wireless | 0.06 | Convenience but ceiling effect |
| price_value | 0.10 | Cross-tier normalization |
| other (12 specs) | 0.14 | Long tail |
These weights changed twice in 2026: chipset_score weight rose after the iPhone 17 Pro Max thermal-throttling story (sustained performance proved more predictive of multi-year experience than peak); charging weight fell after every flagship hit 80W+ (no longer a differentiator).
Normalization curves — the second half of the formula
A weight tells you how much a spec matters. A normalization curve tells you how a given spec value maps to a 0-100 sub-score. We use four curves:
- Linear (minmax): maps min-to-max of the category to 0-100 linearly. Used for refresh rate, response time, peak brightness — specs where every unit of improvement is roughly equal.
- Logarithmic: used for storage capacity, RAM — where doubling matters more than incremental gains.
- Threshold (binary): used for boolean specs (has_anc, supports_dolby_vision) — present or absent.
- Piecewise: used for specs with non-linear value, like camera megapixels (12-48 MP matters; above 48 MP shows diminishing returns).
The normalization curves are version-controlled alongside weights. Changes are documented in the score-recompute worker's audit log.
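As an illustration of the four curve types, here is a sketch in TypeScript. The function names, the clamping behavior, and every range or breakpoint in the examples are assumptions chosen for readability, not the values used in production.

```typescript
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Linear (min-max): every unit of improvement is worth the same.
function linear(value: number, min: number, max: number): number {
  return clamp01((value - min) / (max - min)) * 100;
}

// Logarithmic: each doubling is worth the same number of points.
function logarithmic(value: number, min: number, max: number): number {
  return clamp01(Math.log(value / min) / Math.log(max / min)) * 100;
}

// Threshold (binary): the feature is present or it is not.
function threshold(present: boolean): number {
  return present ? 100 : 0;
}

// Piecewise: linear interpolation between [rawValue, subScore] knots,
// which must be sorted by rawValue.
function piecewise(value: number, knots: Array<[number, number]>): number {
  const first = knots[0];
  const last = knots[knots.length - 1];
  if (first === undefined || last === undefined) throw new Error("need at least one knot");
  if (value <= first[0]) return first[1];
  if (value >= last[0]) return last[1];
  for (let i = 1; i < knots.length; i++) {
    const [x0, y0] = knots[i - 1]!;
    const [x1, y1] = knots[i]!;
    if (value <= x1) return y0 + ((value - x0) / (x1 - x0)) * (y1 - y0);
  }
  return last[1];
}

// Made-up breakpoints: steep from 12 to 48 MP, nearly flat above 48 MP.
const megapixelKnots: Array<[number, number]> = [[12, 0], [48, 90], [200, 100]];

console.log(linear(120, 60, 144).toFixed(1));       // refresh rate in Hz: 71.4
console.log(logarithmic(256, 64, 1024).toFixed(1)); // storage in GB: 50.0
console.log(threshold(true));                        // 100
console.log(piecewise(30, megapixelKnots));          // 45
console.log(piecewise(124, megapixelKnots));         // 95
```

In the piecewise example, going from 12 to 48 MP earns 90 points while going from 48 to 200 MP earns only 10, which is one way to encode diminishing returns.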
Per-category weight examples
Headphones (May 2026):
- ANC effectiveness: 0.17
- Sound quality (driver size + frequency response): 0.16
- Battery life: 0.12
- Codec support: 0.10
- Multipoint: 0.08
- Comfort (weight + clamp): 0.07
- Build / water resistance: 0.06
- Other: 0.24
Laptops (May 2026):
- CPU/SoC score: 0.16
- Battery life (real-world): 0.15
- Display quality (panel + resolution + brightness): 0.13
- Build quality + weight: 0.10
- Memory (RAM): 0.08
- Storage (SSD): 0.07
- Port selection (USB-C, Thunderbolt): 0.06
- Other: 0.25
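A quick sanity check on lists like these is that each category's weights add up to 1. The sketch below encodes the two lists above; the key names and the `isValid` helper are hypothetical, and only the numeric values come from this page.

```typescript
type Weights = Record<string, number>;

// Key names are hypothetical; the values are the May 2026 figures listed above.
const headphones: Weights = {
  anc_effectiveness: 0.17,
  sound_quality: 0.16,
  battery_life: 0.12,
  codec_support: 0.10,
  multipoint: 0.08,
  comfort: 0.07,
  build_water_resistance: 0.06,
  other: 0.24,
};

const laptops: Weights = {
  cpu_soc_score: 0.16,
  battery_life: 0.15,
  display_quality: 0.13,
  build_quality_weight: 0.10,
  memory_ram: 0.08,
  storage_ssd: 0.07,
  port_selection: 0.06,
  other: 0.25,
};

function weightTotal(w: Weights): number {
  return Object.values(w).reduce((sum, x) => sum + x, 0);
}

// Compare against 1 with a small tolerance, because sums of decimal
// fractions are rarely exact in floating point.
function isValid(w: Weights, epsilon = 1e-9): boolean {
  const allInRange = Object.values(w).every((x) => x >= 0 && x <= 1);
  return allInRange && Math.abs(weightTotal(w) - 1) < epsilon;
}

console.log(isValid(headphones), isValid(laptops)); // true true
```

A check like this is cheap to run whenever a weight is edited, so a change to one key cannot silently push the maximum score above or below 100.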
What we explicitly don't do
- No reviewer sentiment scores. "Feel" doesn't enter the formula. Subjective ratings would make the score irreproducible.
- No popularity bonus. A million-unit-sold flagship doesn't beat a niche better product. Sales velocity is a market signal, not a quality signal.
- No brand bias. Brand is not a spec. The Mars Score doesn't know whether the device is from Apple, Samsung, or an unknown OEM.
- No price discount unless it materially changes a category's normalization. Price is included via a price_value weight but normalized within the category; a $200 phone won't beat a $1,200 phone on absolute score even if its price/spec ratio is better.
Why 92.4 vs. 87.1 does not mean "much worse"
Mars Scores cluster between 60 and 95 for in-production products in any given category. A 5-point gap is meaningful (a different sub-tier); a 1.3-point gap is within methodology noise (effectively a tie). Treat the number as a sorting key, not a ranking truth. The 87.1 product is not "5.7% worse" than the 92.4 product: the 5.3-point gap places it a sub-tier lower, but both sit in the same neighborhood, distinguished by specific trade-offs.
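One way to apply this reading in code is to compare the absolute gap, not the percentage difference. The sketch below is an assumption-heavy illustration: the 1.3 and 5 defaults are taken from the two examples above and treated as hard cut-offs, and the text does not say how gaps between them should be read, so the middle label is a placeholder.

```typescript
type GapReading = "effective tie" | "unclassified middle" | "different sub-tier";

// Both defaults come from the examples in the text; using them as hard
// cut-offs is an assumption of this sketch.
function readGap(a: number, b: number, noiseBand = 1.3, tierGap = 5): GapReading {
  const gap = Math.abs(a - b);
  if (gap <= noiseBand) return "effective tie";
  if (gap >= tierGap) return "different sub-tier";
  return "unclassified middle";
}

console.log(readGap(92.4, 87.1)); // different sub-tier (gap of about 5.3 points)
console.log(readGap(91.0, 90.2)); // effective tie (gap of about 0.8 points)
```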
Why we publish the math
Other comparison sites publish opaque "expert ratings." The Mars Score is intentionally the opposite: every weight, every normalization curve, every spec key is in version control. Anyone can fork the repo, change the weights, re-run the score, and see how rankings shift. That's the only honest way to do a comparison platform — your weights are different from ours, and that's fine; the math should be open enough that you can substitute yours.
The full per-category weights live in src/lib/score/mars.ts. Anyone can audit, reproduce, or critique the math. See our methodology page for the broader testing-and-scoring approach.