The case against gamified habits — and what works better
TL;DR. Gamification adds external rewards — XP, gold, character avatars, leaderboards — to habit-tracking. Decades of self-determination theory research show that external rewards can crowd out the intrinsic motivation that long-term habit formation actually needs. When the points stop, the behavior often stops too. The science-backed alternative is automaticity-based progression: track what's becoming automatic, celebrate when willpower drops to zero, and skip the points entirely.
What "gamification" actually means
The word gets used loosely. Before the case can be made, the term needs to be sharpened.
Origins (Foursquare 2009, Khan Academy 2010)
The term gamification was coined around 2003 but didn't enter mainstream product design until Foursquare's 2009 launch made checking into coffee shops feel like collecting Pokémon. Khan Academy's 2010 redesign added energy points and badges to math lessons. Within three years, "gamify your X" had become a standard pitch in the productivity, fitness, education, and corporate-training markets.
The pitch is straightforward: human attention is hard to win. Game mechanics are designed to win attention. Therefore, applying game mechanics to non-game activities will increase engagement.
The premises are true, and so is the conclusion if engagement is all you want. The implied next step, "and therefore produce the long-term behavior change you want," is where the design fails for habits specifically.
The promise: make hard things fun
Gamification promises that hard things become easier when wrapped in a game layer. Drinking enough water is a chore; hitting a daily hydration goal to unlock a tropical-fish avatar is fun. Practicing French verbs is dull; earning XP and seeing your fluency rank climb is engaging. The feedback loop is fast, the reward is immediate, the experience is pleasurable.
For one-time goals and limited-duration challenges, this works well. For habit formation specifically, it works less well, and for a specific reason that the next sections walk through.
The mechanics: XP, badges, leaderboards, streaks, avatars
The standard toolkit, more or less ranked by how external the reward feels:
- XP / experience points — abstract numerical reward that accumulates indefinitely.
- Levels — milestone unlocks tied to XP thresholds.
- Badges / achievements — discrete trophies for specific accomplishments.
- In-game currency (gold, coins) — tradeable for character upgrades, items, perks.
- Avatars and characters — visual representations of the user that grow / level / customize.
- Leaderboards — public ranking against other users.
- Strict streaks — counters that reset to zero on a single missed day.
The further down the list, the more external the reward feels and, on the crowding-out account described later in this article, the greater the risk to intrinsic motivation.
Where gamification works
It's worth being precise: gamification isn't useless. It works well in three contexts, none of which resembles habit formation.
One-time tasks (signups, profile completion)
Onboarding flows, profile-completion progress bars, and "complete these 5 setup steps" flows benefit from gamification. The behavior you're trying to drive is a single completion, not a long-term repeated action. The external reward gets the one-time job done and then the user moves on. Crowding-out doesn't apply because there was no underlying motivation to crowd out — it was a chore.
Discovery and exploration phases
When a user is exploring what an app or system can do, gamified prompts ("try this feature to earn 50 XP") effectively guide attention. This is the original Foursquare insight: the first 90 days of using a new app are an exploration phase where game mechanics meaningfully shape what you discover. Once you've been using the app for a year, the same mechanics become annoying.
Social-comparison environments
Public, social, achievement-oriented contexts (charity fitness challenges, classroom-wide vocabulary competitions, sales-team performance dashboards) have always run on social comparison, and gamification fits naturally. The activity is already about external recognition, so adding more external structure isn't a change in kind.
Daily personal habits are the opposite of all three of these contexts: they're long-term, repeated, internal, and private. That is why the technique that works in those settings stops working, or becomes actively counterproductive, for habits.
Where gamification fails: long-term behavior change
Self-determination theory: intrinsic vs extrinsic motivation
The most important framework here is self-determination theory (SDT), developed by Edward Deci and Richard Ryan starting in the 1970s. SDT distinguishes between two kinds of motivation:
- Intrinsic motivation — doing something because the activity itself is rewarding, meaningful, or aligned with your identity.
- Extrinsic motivation — doing something because of an external reward or punishment.
Both can produce action. They produce different kinds of long-term behavior. Intrinsic motivation is robust — it survives changes in environment, removal of rewards, periods of low energy. Extrinsic motivation is brittle — it requires the external reward structure to keep functioning, and tends to collapse when that structure is removed.
For habits, which are defined by the behavior eventually no longer requiring willpower or external prompting, intrinsic motivation isn't merely preferable. It's the goal. A habit that requires ongoing extrinsic reinforcement isn't a habit yet — it's a managed behavior.
The Deci 1985 over-justification effect
The empirical heart of the case against gamified habits is what's called the over-justification effect. It was first demonstrated in Deci's experiments in the early 1970s and in Lepper, Greene, and Nisbett (1973), and later folded into the broader theory set out by Deci and Ryan (1985). The setup:
Take a group of people who already enjoy doing something. Give half of them an external reward for doing it. Continue for some weeks. Then remove the reward.
The typical finding, most reliable when the reward is tangible and expected in advance: the rewarded group does the activity less afterward than the never-rewarded control group. The external reward retroactively reframes the activity in the participant's mind from "something I do because I want to" into "something I do for the reward." When the reward goes away, so does the perceived reason.
This is the central problem with gamifying behaviors that you want to become long-term habits. The XP, the gold, the avatar — these are exactly the kind of external rewards that the over-justification literature warns against. They can drive engagement during the reward period and then leave the user with weaker intrinsic motivation than they started with.
When the points stop, the behavior stops too
The practical consequence is the user lifecycle that gamified habit apps consistently produce: an intense first 6–8 weeks of engagement, followed by a sharp dropoff as the novelty of the rewards fades.
The sharp dropoff isn't because users want to stop the behavior. It's because the gamified system trained them to associate the behavior with the reward, and the reward's marginal value has decayed below the threshold needed to justify the action. Without an underlying intrinsic motivation that the system has helped strengthen, there's nothing to fall back on.
The Habitica case study
2 million users, beloved by some, abandoned by most
Habitica is the canonical gamified habit app — released in 2013 as HabitRPG, built around an explicit RPG metaphor where habits are quests, you have a character with HP and mana, you earn gold and gear, and you can join parties to fight bosses together. It is genuinely well-designed within its frame. It has a passionate community.
User-written reflections, together with what usage data is public, point to a consistent pattern.
The pattern: 6 weeks of intense engagement, then drop-off
The dominant Habitica experience: an intense onboarding period of six to eight weeks during which users level their characters, hit the early gear unlocks, and feel real progress. Then a tapering — the reward density drops as later levels take longer, the novelty of the avatar fades, and the question "wait, why am I doing 30 push-ups for in-game gold?" starts to surface.
A meaningful subset stays. Most don't. The users who stay tend to be those who genuinely enjoy the RPG layer as a thing in itself, independent of the habit-formation outcome.
Why some people thrive (ADHD, novelty-seeking) and most don't
The Habitica subreddit, App Store reviews, and habit-research papers that include gamified apps in their samples consistently identify a few profiles that do thrive:
- People with ADHD for whom the constant novelty and immediate reward feedback genuinely helps initiate hard-to-start activities.
- High-novelty-seeking personalities who enjoy the RPG layer as entertainment in itself.
- People who explicitly enjoy game mechanics and would happily play a Habitica-equivalent even without the habit layer.
For these users, gamification works long-term because the gamification is itself a source of intrinsic motivation. The over-justification effect doesn't apply when the rewards aren't experienced as external — they're experienced as part of why they're playing in the first place.
For everyone else — most people — the rewards are a brittle scaffold that holds for a few weeks and then collapses, taking the underlying behavior with it.
What works better — automaticity-based design
Track the journey from "willpower required" to "no thought required"
The science-backed alternative to gamification is to track automaticity directly — the actual mental state that defines a habit. Instead of measuring how many points you've accumulated, you measure how reflexive the behavior has become.
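This isn't a slogan; it's a measurable construct. The Self-Report Habit Index (Verplanken & Orbell 2003) gives a numerical score for how automatic a behavior feels. Lally et al. (2010) modeled how that automaticity rises over time: an asymptotic curve that, in their sample, took a median of 66 days to plateau, with individual participants ranging from 18 to 254 days. Keelify's strength score is a continuous proxy for the same underlying state.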
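To make "measurable construct" concrete, here is a minimal sketch of scoring a self-report habit questionnaire in the style of the SRHI. The function names, the 1 to 7 agreement scale, and the example responses are all illustrative assumptions, not the published scoring protocol and not Keelify's implementation.

```python
from statistics import mean


def habit_index_score(responses, scale_min=1, scale_max=7):
    """Average Likert-style item responses into one habit-strength score.

    Assumption: every item is worded so that higher agreement means
    "more automatic", so no reverse-scoring is applied.
    """
    if not responses:
        raise ValueError("at least one item response is required")
    for r in responses:
        if not scale_min <= r <= scale_max:
            raise ValueError(f"response {r} is outside {scale_min}-{scale_max}")
    return mean(responses)


def normalized(score, scale_min=1, scale_max=7):
    """Rescale a mean score onto 0-1 so it can be compared across scales."""
    return (score - scale_min) / (scale_max - scale_min)


# Hypothetical answers from the same person, early and late in formation.
week_1 = [2, 1, 2, 3, 2, 1, 2, 2, 1, 2, 3, 2]
week_10 = [6, 5, 6, 6, 7, 5, 6, 6, 5, 6, 6, 6]

print(normalized(habit_index_score(week_1)))
print(normalized(habit_index_score(week_10)))
```

The point of the sketch is only that "how automatic does this feel?" can be turned into a number that is expected to rise as a habit forms, which is a different quantity from a points total.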
The user feedback loop is: "I started this habit. It used to take willpower. It takes less willpower now. Soon it won't take any." That loop is genuinely intrinsic — the reward is the relief from effort, which is permanent and useful, not a digital trinket that decays in value.
Celebrate the disappearance of effort, not the accumulation of points
The right milestone isn't "you reached level 10." It's "this behavior is now automatic — you no longer need this tracker for this habit. Archive it and start the next one."
That milestone is genuinely meaningful in a way no XP threshold can be. It corresponds to a real and durable change in the user's life. Gamification can't reach for that milestone because the system depends on the user remaining engaged with the system. Automaticity-based design celebrates the user's ability to leave it behind.
Lally 2010: this is what habit formation actually IS
The framing matters. The goal of habit formation, as understood by one of the most-cited papers in the modern literature (Lally et al. 2010), is to reach the state where the behavior happens without conscious deliberation. That's the destination. Anything that delays or distracts from that destination, including reward structures that condition the behavior on receiving points, works against the actual goal.
How Keelify thinks about this
Strength score (0–1) instead of XP
Keelify's habit tracker uses a continuous strength score from 0 to 1, weighted toward recent adherence. There's no XP, no gold, and no levels that pile up indefinitely. The score is intended as a proxy for automaticity, which is what we actually want to measure.
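One simple way to get a 0 to 1 score that leans toward recent adherence is an exponentially weighted moving average. The sketch below is an assumption about how such a score could work, not Keelify's actual formula; `alpha` and both function names are hypothetical.

```python
def update_strength(score, completed, alpha=0.1):
    """Apply one day's result to an exponentially weighted adherence score.

    Each day moves the score a fraction `alpha` of the way toward 1.0
    (habit done) or 0.0 (habit missed), so recent days count the most.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    target = 1.0 if completed else 0.0
    return score + alpha * (target - score)


def strength_from_history(history, alpha=0.1):
    """Fold a list of daily booleans (oldest first) into one score."""
    score = 0.0
    for completed in history:
        score = update_strength(score, completed, alpha)
    return score


# Same number of completions, different recency.
improving = [False] * 10 + [True] * 10
lapsing = [True] * 10 + [False] * 10
print(strength_from_history(improving) > strength_from_history(lapsing))  # True
```

Unlike XP, a score like this cannot grow without bound and it falls when adherence does, so it describes the current state of the habit rather than the user's lifetime activity.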
Four milestones (Spark/Foundation/Integration/Mastery) instead of levels
The four milestones map to the typical progression observed in Lally's data. They're descriptive of where you are in the formation process, not arbitrary thresholds tied to engagement metrics. Mastery isn't something you keep climbing past — it's the destination, and reaching it means the work is done.
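As an illustration of milestones that describe a position rather than reward activity, the sketch below maps a 0 to 1 strength score onto the four named stages. The cut points are hypothetical placeholders chosen for the example; the article does not state Keelify's actual thresholds.

```python
from bisect import bisect_right

# Hypothetical cut points between stages; not Keelify's real values.
THRESHOLDS = [0.25, 0.55, 0.85]
MILESTONES = ["Spark", "Foundation", "Integration", "Mastery"]


def milestone_for(score):
    """Return the stage name for a strength score between 0 and 1."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    # bisect_right counts how many thresholds the score has reached.
    return MILESTONES[bisect_right(THRESHOLDS, score)]


for s in (0.1, 0.4, 0.7, 0.95):
    print(s, milestone_for(s))
```

Because the list has a last entry, there is nothing to climb toward after Mastery, which matches the idea that reaching it means the work is done.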
No leaderboards, no characters, no gold
Keelify is single-player. Your habits aren't ranked against your friends'. There are no avatars to grow, no tropical fish to unlock, no in-game currency. The whole UI is designed to feel like calm personal infrastructure, not a game.
The streak counter and the milestone progression are the closest things to game-like elements, and both are designed to track the underlying behavioral reality rather than amplify external incentives.
When gamification is the right answer
The case against gamification isn't absolute. Beyond the product contexts covered earlier, three further settings suit it:
Children's behavior (developmentally different)
Children's reward learning is genuinely different from adults': external reinforcement plays a more central role in early skill acquisition. The over-justification effect still applies (the classic Lepper study was run with preschoolers), so rewards are safest for skills a child does not already enjoy. Within that limit, gamified reading apps, math apps, and chore-tracking apps for kids rest on firmer ground than the equivalent products for adults.
Specific medical-rehab contexts
Stroke rehabilitation, post-surgical physical therapy, and structured behavioral interventions for specific conditions sometimes benefit from gamified compliance tracking. The behavior is bounded in duration, the goal is recovery rather than long-term habit, and the reward structure can be unwound when the rehab period ends.
One-time habit-spike challenges (30-day photo)
Short-duration challenges — 30-day photo projects, no-spend months, 75-Hard-style structured programs — operate more like one-time tasks than long-term habits. Gamification works fine for these because the goal is completion, not consolidation.
The error is generalizing from those contexts to "and therefore gamified daily-habit-tracking apps work for adult lifelong personal habits." That's the leap the evidence doesn't support.
Frequently asked questions
Isn't gamification just "making things fun"?
That's the marketing version. The behavioral version is more specific: gamification adds external reward structures (points, badges, levels, in-game currency, characters) to activities that wouldn't otherwise have them. It's a particular technique, not a synonym for fun. A well-designed habit tracker can be enjoyable to use without being gamified — and the distinction matters because the two approaches produce different long-term behavioral outcomes.
What does "crowding out" intrinsic motivation actually mean?
It means that when you reward someone with external incentives for doing something they already wanted to do, their internal reasons for doing it can weaken. Once the external reward is removed, the behavior often stops, even though it would have continued had the reward never been offered. The classic study is Lepper et al. 1973: preschoolers who already liked drawing were promised a gold-star "Good Player" award for drawing; a week or two later, with no reward on offer, they spent significantly less free time drawing than children who had never been promised one.
But Habitica works for me — what's wrong with that?
Nothing — for some people, gamification works long-term. The pattern in the data is that people with strong novelty-seeking traits, ADHD, or a personal affinity for game mechanics often thrive with gamified systems. They're a real and meaningful subset of users. The case against gamified habits isn't that it never works; it's that it works less reliably than automaticity-based design across a general population, and that the failure mode (sudden total dropout when novelty fades) is harsher.
Doesn't Keelify have streaks too? Aren't streaks gamification?
Streaks are a borderline case. They're a count of consecutive days, not an external reward: there's no in-game currency, no level-up, no character unlock. Where Keelify departs from typical streak design is the grace day rule: a single missed day doesn't reset the count, because Lally et al. (2010) found that missing one opportunity did not materially affect habit formation. We treat streaks as informational, not punitive. That's a meaningful distinction from gamified streaks that exist primarily to weaponize loss aversion.
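The difference between a strict streak and a grace-day streak can be sketched in a few lines. This is one plausible reading of the rule, with assumptions labeled: a missed day does not add to the count, and only a run of misses longer than `grace_days` resets it. The function names and the `grace_days` parameter are hypothetical.

```python
def streak_with_grace(history, grace_days=1):
    """Current streak length from daily booleans (oldest first).

    Assumption: up to `grace_days` consecutive misses are tolerated;
    a longer gap resets the streak to zero.
    """
    streak = 0
    consecutive_misses = 0
    for completed in history:
        if completed:
            streak += 1
            consecutive_misses = 0
        else:
            consecutive_misses += 1
            if consecutive_misses > grace_days:
                streak = 0
    return streak


def strict_streak(history):
    """Classic streak: any single miss resets the count."""
    return streak_with_grace(history, grace_days=0)


week = [True, True, False, True]
print(streak_with_grace(week))  # 3
print(strict_streak(week))      # 1
```

With the same four days of data, the strict counter reports that almost everything was lost while the grace-day counter reports three completions, which is closer to what actually happened.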
What about competition with friends? Does that count as gamification?
Yes, and the literature on social-comparison-based motivation is mixed. It can be effective for short bursts (30-day fitness challenges, group goals) and corrosive for long-term personal habits, where it shifts the activity from "because I want to" to "because I don't want to lose to Jen." Keelify is deliberately single-player. Habits are personal infrastructure, not a competition.
Are streaks better than levels?
Slightly, in our view, because streaks at least correlate with actual habit consolidation — a 60-day streak is a real proxy for whether the behavior is becoming automatic. Levels and XP correlate with how much you've been opening the app, which is a much weaker proxy for whether the underlying habit is sticking. But neither is a substitute for measuring automaticity directly, which is what the strength score in Keelify is designed to do.
Sources
- Deci, E. L., & Ryan, R. M. (1985). Intrinsic Motivation and Self-Determination in Human Behavior. Plenum Press.
- Lepper, M. R., Greene, D., & Nisbett, R. E. (1973). Undermining children's intrinsic interest with extrinsic reward: A test of the "overjustification" hypothesis. Journal of Personality and Social Psychology, 28(1), 129–137.
- Lally, P., van Jaarsveld, C. H. M., Potts, H. W. W., & Wardle, J. (2010). How are habits formed: Modelling habit formation in the real world. European Journal of Social Psychology, 40(6), 998–1009. https://doi.org/10.1002/ejsp.674
- Verplanken, B., & Orbell, S. (2003). Reflections on past behavior: A self-report index of habit strength. Journal of Applied Social Psychology, 33(6), 1313–1330.
- Wood, W. (2019). Good Habits, Bad Habits: The Science of Making Positive Changes That Stick. Farrar, Straus and Giroux.
- Hamari, J., Koivisto, J., & Sarsa, H. (2014). Does gamification work? — A literature review of empirical studies on gamification. Proceedings of the 47th Hawaii International Conference on System Sciences.
Last updated: 26 April 2026. Reviewed by the Keelify team.