Why we stopped using eNPS (and what we measure instead)
We used eNPS for about a year. Every quarter, we'd send one question — "How likely are you to recommend this company as a place to work?" — collect the scores on a 0-10 scale, subtract the percentage of detractors from the percentage of promoters, and get a number between -100 and +100.
It felt like we were measuring something. The number went into a slide deck. People nodded. Leadership asked "are we up or down?" and someone said a number, and then we moved on to the next agenda item.
But when the score dropped from +32 to +18, nobody could explain why. Was it workload? A bad quarter? A manager problem? One toxic team dragging the average? The number didn't say. When it climbed back to +29 the next quarter, we had no idea what we'd done right. We hadn't changed anything deliberately. Maybe it was seasonal. Maybe three detractors left the company. We were tracking a signal with no diagnostic value, and making no decisions from it. So we stopped.
This post explains what was wrong with eNPS as an organizational health metric, what we replaced it with, and why we made the specific measurement choices we did.
The case against eNPS as your headline metric
eNPS was adapted from the Net Promoter Score, which Frederick Reichheld introduced in a 2003 Harvard Business Review article as a customer loyalty metric (Reichheld, 2003). Someone took the consumer NPS question, swapped "product" for "workplace," and called it employee engagement measurement. The adaptation happened without peer-reviewed validation in the employment context. Qualtrics has written about this gap openly on their blog, and AIHR's eNPS guide acknowledges it too. The UWES (Utrecht Work Engagement Scale), by contrast, has been validated across 30+ languages and used in thousands of studies since Schaufeli, Bakker, and Salanova published the short-form version in 2006.
One question cannot capture a multi-dimensional construct. Schaufeli and colleagues established in 2002 that engagement by itself has three sub-dimensions: vigor, dedication, and absorption. A single "would you recommend?" question collapses all of these into one number. You can't distinguish between a team that's energized but unfocused and one that's dedicated but exhausted. Both might score a 7.
The scoring buckets are arbitrary. NPS groups responses into promoters (9-10), passives (7-8), and detractors (0-6). That means a score of 6 is treated identically to a score of 0. A team where every person gives a 6 — just above the scale's midpoint, arguably ambivalent — gets the same eNPS as a team where every person gives a 0. The buckets were designed for consumer purchase behavior and ported to employment without any revalidation of the cutoff points.
Passives are discarded. Scores of 7 and 8 are thrown out of the calculation entirely. In a 50-person company, that could be 20 or more people whose responses contribute nothing to the final number. These aren't edge cases — 7 and 8 are common scores. You're ignoring a large chunk of your workforce by design.
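To make the bucket and passive problems concrete, here's the standard eNPS arithmetic as a short sketch (the function is ours, written from the published formula, not from any library):

```python
def enps(scores: list[int]) -> int:
    """Standard eNPS: % promoters (9-10) minus % detractors (0-6).

    Passives (7-8) count toward the denominator but contribute
    nothing to the numerator.
    """
    n = len(scores)
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / n)

# A team of all 6s scores the same as a team of all 0s:
print(enps([6] * 10))  # -100
print(enps([0] * 10))  # -100

# The two 8s here are invisible; only the 9s and the 3 move the number:
print(enps([9, 9, 8, 8, 3]))  # 20
```

Ten people who all answered "6" and ten who all answered "0" produce an identical -100, which is the bucket problem in one line of output.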
It measures employer brand sentiment, not employee experience. The question asks whether you'd recommend the company to a friend. That's an advocacy question. It tells you something about external reputation perception — it says almost nothing about whether someone feels burned out, undervalued, or stuck in a role with no growth path. A person can love telling friends about their company while quietly dreading Monday mornings. Those are different constructs.
There's a subtler issue too. eNPS creates an illusion of precision. You get a number. Numbers feel rigorous. A score of +34 feels more scientific than "people seem generally okay." But the precision is false — the methodology behind that number is full of arbitrary choices that were never validated for this context. A bad instrument that produces a clean number is worse than no instrument at all, because it gives you confidence without justification.
None of this means you should never ask the recommendation question. It has value as one data point among many. The problem is treating it as the metric — the single number on your people dashboard, the thing you set targets against, the KPI your HR team is evaluated on. One question measuring one construct through one arbitrary scoring scheme cannot bear that weight.
What actually matters: four dimensions of organizational health
We replaced eNPS with a model built on four dimensions: engagement, satisfaction, well-being, and culture. Each is measured with two to four questions on a 1-5 Likert scale, scored monthly. Here's the research basis for each.
Engagement
Gallup's Q12 research, based on a meta-analysis by Harter, Schmidt, and Hayes (2002) covering nearly 8,000 business units, found that specific conditions — clarity of expectations, opportunity to use strengths daily, recognition in the last seven days, and access to development — predict productivity, retention, and profitability at the business-unit level. Gallup's subsequent updates of the meta-analysis, now spanning well over 100,000 business units, confirm the same patterns.
We don't use the Q12 items directly (Gallup holds the copyright). Our engagement questions measure the same underlying constructs: role clarity, strengths utilization, recognition frequency, and development opportunity. For example, "I have a clear understanding of what is expected of me" maps to Q12's expectations-clarity item. "I have the opportunity to do what I do best" maps to their strengths item. The idea is the same — engagement isn't a feeling, it's a set of workplace conditions that either exist or don't. You can act on "people don't feel recognized" in a way you can't act on "engagement is down."
Satisfaction
Work-life balance, growth opportunities, compensation fairness, tooling adequacy, and overall satisfaction. Most of these map to what Herzberg called hygiene factors in his two-factor theory (Herzberg, 1959); growth, in his scheme, is a motivator. Hygiene factors don't create motivation when present, but they create dissatisfaction when absent. You can have a highly engaged team that's simultaneously dissatisfied because the tools are broken and the pay is below market. Engagement and satisfaction are not the same thing, and measuring only one misses the other.
Well-being
This dimension draws on Maslach and Jackson's burnout research (1981). They identified three components of burnout: emotional exhaustion, depersonalization, and reduced personal accomplishment. The Maslach Burnout Inventory (MBI) is the standard instrument, but it's copyrighted and requires per-use licensing through Mind Garden.
We measure proxies for these three components. "I often feel drained at the end of my workday" maps to emotional exhaustion. "My workload is manageable" gets at the resource-demand imbalance that precedes burnout. "I feel motivated to do my best work" and "I feel disconnected from my work" address depersonalization and accomplishment. Negatively-framed items (feeling drained, feeling disconnected) are reverse-scored so that higher values always mean healthier.
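On a 1-5 scale, reverse-scoring is just 6 minus the raw score. A minimal sketch, with illustrative item keys of our own invention:

```python
# Items framed negatively ("I often feel drained...") are reverse-scored
# so that a higher number always means healthier. The keys below are
# illustrative, not our actual question identifiers.
NEGATIVE_ITEMS = {"feel_drained", "feel_disconnected"}

def normalize(item: str, score: int) -> int:
    """Return the health-aligned value for a raw 1-5 Likert response."""
    return 6 - score if item in NEGATIVE_ITEMS else score

# "Strongly Agree" (5) with "I often feel drained" becomes a 1 (unhealthy);
# "Strongly Agree" (5) with "My workload is manageable" stays a 5 (healthy).
print(normalize("feel_drained", 5))         # 1
print(normalize("workload_manageable", 5))  # 5
```

Doing the reversal before averaging means every downstream computation, from dimension scores to alerts, can assume one consistent direction.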
Culture
Values alignment, belonging, psychological safety, leadership trust, and inclusion. Psychological safety comes from Edmondson's 1999 study of learning behavior in work teams, where she found that teams with higher psychological safety were more likely to report errors and engage in learning behaviors. The "I feel comfortable speaking up" question is a direct operationalization of her construct.
The culture dimension is the hardest to get right because it's the most contextual. What "values alignment" means differs across organizations. A startup that values risk-taking and a hospital that values protocol compliance have very different healthy cultures. We kept the questions general enough to work across different company types while still measuring something concrete: do people feel they belong, can they speak up without fear of retaliation, and do they trust the people making decisions. These aren't soft questions. Edmondson's research showed that psychological safety predicted whether medical teams reported errors — the difference between catching mistakes and hiding them.
Computing overall health
Overall organizational health is the simple average of the four dimension scores. All dimensions are weighted equally. We considered differential weighting — maybe well-being should count for more than satisfaction, since burnout has more severe consequences than tool dissatisfaction. But weighting introduces a value judgment into the score itself, and different organizations would reasonably weight differently. Equal weights are transparent, defensible, and don't embed our priorities into your data. If you want to prioritize a dimension, do it in your response to the data, not by manipulating the score.
Why 1-5 Likert, not 0-10
eNPS uses a 0-10 scale. We use a 5-point Likert scale. This wasn't arbitrary.
Rensis Likert introduced the technique in 1932 specifically for measuring attitudes, and five-point agreement scales have been the default in organizational psychology for decades. The anchors — Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree — give each point a clear meaning. On a 0-10 scale, the difference between a 5 and a 6 is ambiguous. On a 1-5 scale with labeled anchors, each point corresponds to a distinct attitude.
Fewer options also reduce cognitive load. We send surveys monthly, not quarterly. A 10-second question with five clear choices gets higher response rates than an 11-point scale that forces people to decide whether they're a 6 or a 7.
Our questions are agreement statements: "I have a clear understanding of what is expected of me." Agreement anchors are the natural fit. The UWES-9, for comparison, uses a 7-point frequency scale (Never to Always), but that's because its items ask about frequency of experience ("At my work, I feel bursting with energy"). Different item formats call for different scale types. Agreement items get agreement scales.
The practical upside: scores are immediately interpretable. A dimension averaging 3.8 means people generally agree with the positive statements. A dimension at 2.3 means they generally disagree. You don't need a conversion table.
Privacy and sample size: when numbers become meaningful
Survey data is worthless if people don't trust the anonymity. We built the privacy model into the architecture.
All responses are stored without user IDs. There is no join table that links a response back to a person. This is anonymity by architecture, not by policy. A policy can be overridden; a missing foreign key cannot.
We apply display thresholds at read time. Dimension-level scores (the average of questions within a dimension) require at least 2 responses before they're shown. Per-question breakdowns require 5 or more responses. This follows the k-anonymity principle — aggregate data should not allow re-identification of individuals, and small group sizes make that possible.
The thresholds are applied when displaying data, not when collecting it. No response is ever discarded. If a team has 3 responses today and gets 2 more next week, all 5 contribute to the per-question view once the threshold is met. Data isn't lost, it's just not yet displayable.
For small teams, this means you might only see dimension averages for the first few survey cycles. That's the right tradeoff. The alternative — showing a per-question breakdown when only 3 people responded — would let a manager triangulate individual answers. If you know that Alice and Bob scored high, the low score belongs to Carol. That's not anonymous. We'd rather show less data than compromise trust, because the moment people suspect their answers can be traced, response honesty collapses and the entire system becomes useless.
This is a real problem with most survey tools. They promise anonymity in their marketing copy, then show breakdowns by team, tenure, and department to any admin who asks. Architecture-level anonymity means there's no data to leak even if someone tries.
Alert thresholds that tell you what to fix
A single number going down tells you something is wrong. Four dimension scores going down tell you what is wrong.
We set two static thresholds per dimension:
- Critical: below 2.5 out of 5. The average respondent is between "Disagree" and "Strongly Disagree" on the positive statements. Something is actively broken.
- Warning: below 3.0 out of 5. The average is below neutral. Things aren't bad enough to be a crisis, but the trend is negative.
We also flag a 15% or greater decline from the previous measurement period, regardless of absolute level. A dimension dropping from 4.2 to 3.5 in one month is a signal even though 3.5 is above the warning line. Something changed and it needs attention.
Because alerts fire per dimension, the response can be specific. A well-being alert means you look at workload and burnout indicators. A culture alert means you look at psychological safety and belonging. An engagement alert with stable satisfaction scores suggests the problem is about meaning and growth, not about pay or tools. You're not staring at a single number trying to guess what went wrong. The diagnostic structure is built into the measurement itself.
Compare this to eNPS dropping by 15 points. What do you do? Survey again with more questions? Run focus groups? With dimensioned scores, the alert tells you where to look before you've had a single conversation.
What we'd recommend to anyone starting from scratch
If you're building an employee health measurement system today, here's what we'd suggest based on doing it ourselves.
Use 10 to 12 questions grouped into dimensions. More than that and completion rates drop. Fewer and you lose diagnostic resolution. Monthly cadence strikes a balance between signal freshness and survey fatigue.
Use a 5-point Likert agreement scale with labeled anchors. Frame most questions as positive statements and reverse-score any negatively framed items before averaging, so that higher always means healthier across all dimensions.
Include at least one open-ended text question. Numbers tell you what moved; free text tells you why. Even a simple "Is there anything else you'd like to share?" surfaces context that scores cannot.
Track dimensions separately. Resist the urge to collapse everything into one number and optimize against it. The whole point of this approach — the entire reason we moved away from eNPS — is that a single number hides what matters. If your engagement is high but your well-being is tanking, you need to see both numbers, not their average. An overall health score is fine as a summary for an executive dashboard, but the dimensions are where decisions get made. When well-being drops, you investigate workload. When culture drops, you investigate psychological safety. You can't do that with one number.
References
- Edmondson, A. (1999). Psychological safety and learning behavior in work teams. Administrative Science Quarterly, 44(2), 350-383.
- Gallup (2024). Q12 Meta-Analysis Report. gallup.com
- Harter, J.K., Schmidt, F.L., & Hayes, T.L. (2002). Business-unit-level relationship between employee satisfaction, employee engagement, and business outcomes. Journal of Applied Psychology, 87(2), 268-279.
- Herzberg, F. (1959). The Motivation to Work. John Wiley & Sons.
- Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 1-55.
- Maslach, C. & Jackson, S.E. (1981). The measurement of experienced burnout. Journal of Organizational Behavior, 2(2), 99-113.
- Mind Garden. Maslach Burnout Inventory.
- Reichheld, F.F. (2003). The one number you need to grow. Harvard Business Review, 81(12), 46-54.
- Schaufeli, W.B., Bakker, A.B., & Salanova, M. (2006). The measurement of work engagement with a short questionnaire. Educational and Psychological Measurement, 66, 701-716.
- Schaufeli, W.B., Salanova, M., González-Romá, V., & Bakker, A.B. (2002). The measurement of engagement and burnout. Journal of Happiness Studies, 3, 71-92.
- Qualtrics. Is Employee Net Promoter Score (eNPS) a Good Measure of Engagement?
- AIHR. Employee Net Promoter Score Guide.
- Wikipedia. k-anonymity.