
Defending Game Metrics

In a comment replying to The Craft of Game Design Cannot Be Measured By Any Metric, Dan Cook, game designer and Chief Creative Officer at Spry Fox, gave such a sterling, thorough rebuttal that I’ve reposted it here in full.

 

Empiricism in Game Design

When I design I have a mental model of how I imagine my game will be played by players. This includes predictions about player emotions, learning, buying behaviors and a dozen other factors necessary to make a self-sustaining game in one of today’s various markets. I also make predictions about how markets will act. Platform desires, player desires, press desires.

Then we build the game, or at least we build an initial version of it.

Then we playtest the game to see if my predictions worked out. Most of the time they don’t. In the best cases I’m only off by a factor of two. In the worst cases I’m off by several orders of magnitude. However, I may also find that players behaved in a manner that was actually more interesting than I predicted.

So we build another iteration of the game. Somehow, we need to connect the empirical reality of what the playtest suggests with what we predict will happen. This usually involves updating our models, sometimes radically. Often incrementally.

For some designers, this process can be frustrating. The reality of player behavior imposes constraints on their mostly imaginary vision. But I tend to see constraints as necessary to the process of design. And constraints based on observing real people playing the game tend, more often than not, to yield opportunities to impact the real shared world of many people vs the isolated imaginary world of a single person. We find new ways of playing that are more vibrant and interesting.

 

How are metrics useful when iterating on a game?

Game designers are information starved. With writing, we have an imperfect but competent mechanism for imagining how someone might feel reading a bit of text. In order to write, you must read. And thus you are forced to process a work in a somewhat similar fashion to how a potential reader might process it. Game developers do not have this luxury. We build systems multiple times removed from a player’s experience. Write some code. Do a dozen other steps. Build an executable that someone somewhere runs. Knowing how people will react to what we make is hard.

So we use crutches. We create complex models of how players think. We use ‘proven’ patterns. We watch players and try to imagine what they are feeling. Then we try to trace all that far-removed information back to whether or not a number in the bowels of a broken machine should be 2 or 4.

There are certainly classes of information we can extract more easily. Surface player emotions on individual playthroughs. Awesome. We can do that. But human behavior is broad. We see the need to sample behaviors across populations and discover central tendencies or outliers.

So metrics or analytics are that tool. They let us understand statistical patterns of behavior. Do they let us see inside the minds of our players? No. Nothing does yet. Do they replace in-person playtests? No. Smart designers use multiple sources of insight.

But metrics do provide an amazing range of insight by allowing us to look at hard problems from a different direction. If players in an MMO are flooding forums with complaints about a change, how many people are impacted? How did playstyles change?

When balancing economies and progression systems, metrics are essential. You can’t do an in-person playtest of someone playing a game for 90 days. The old tools don’t work. And various forms of data collection do.
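As a rough illustration of the kind of long-horizon measurement a playtest cannot provide, here is a minimal sketch of day-N retention computed from session logs. The function name, the log format and the sample data are all hypothetical, chosen only for the example; real analytics pipelines define retention in several different ways.

```python
from datetime import date

def day_n_retention(first_seen, sessions, n):
    """Fraction of players who played again exactly n days after their first session.

    first_seen: dict mapping player id -> date of first session (hypothetical format)
    sessions:   iterable of (player id, date) pairs, one per session
    """
    if not first_seen:
        return 0.0
    returned = set()
    for player, day in sessions:
        start = first_seen.get(player)
        if start is not None and (day - start).days == n:
            returned.add(player)
    return len(returned) / len(first_seen)

# Made-up log: three players whose first session was on 1 January 2016.
first_seen = {"a": date(2016, 1, 1), "b": date(2016, 1, 1), "c": date(2016, 1, 1)}
sessions = [
    ("a", date(2016, 1, 1)), ("b", date(2016, 1, 1)), ("c", date(2016, 1, 1)),
    ("a", date(2016, 1, 2)), ("b", date(2016, 1, 2)),
    ("a", date(2016, 3, 31)),  # 90 days after the first session
]

print(day_n_retention(first_seen, sessions, 1))   # two of three players returned on day 1
print(day_n_retention(first_seen, sessions, 90))  # one of three players returned on day 90
```

The same log answers the day-1 and the day-90 question, which is the point: the data keeps accumulating long after any playtest session would have ended.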


Maybe all this doesn’t need to be said. Maybe you are worried about something else entirely.

Are you worried about how metrics shine a light on bullshit design? Because a lot of design is unsubstantiated bullshit. We imagine people will play a game a certain way and then they don’t. Such an ego buster. Metrics beat us with bully numbers. They bluntly state our initial idea was flawed. Or even worse, the thing people have spent years praising us for doesn’t actually apply to anyone but some weird elite group of outliers that happens to give out chintzy feel-good awards. Reality can be cruel when you live in a fantasy. But it also acts as a constraint that forces us to up our game and make something that works. Versus wandering blindly off a cliff in a feel-good haze. Which I’ve done. (Lovely until you fall.)

Are you worried that Bad Men use metrics in a reductive fashion to emphasize making money over art? Bad Men have been emphasizing making money over art for a very long time. For any golden era of games there were penny pinchers micromanaging creative decisions at a level that destroyed souls. Might I suggest that a new tool for getting data is not the actual problem. The team sets their goals. The tools just get them there.

Are you worried that we are using Dumb Metrics? That the dumb patterns dumbly followed by dumb practitioners result in dumb ideas and dumb games? Well it is true. And the solution is one that applies to all complex instruments used in the pursuit of art and beauty: Get Good.


I actually see metrics, competent design and building something positive that meets player needs as three complementary pursuits. I’ve asked “Well, what do players want and how does that align with business? And how does that align with art or craft?”

Here’s one answer. Many players want connection with meaning and community. They want mastery and agency. This leads to them enjoying an activity for a long period of time. That results in great retention metrics. And when deep needs are being met, people are willing to spend. Will I spend a buck on Pokemon lures to enhance a relaxing afternoon with my wife at the coffee shop? Yes. It makes for joyful light conversation. The game improves our relationship by creating a shared playful space.

Metrics track and tune all this. Is that evil? Just the opposite. I consider it doing great good for the world through competent design practices.

I have made minor edits to the text to make it read as a standalone post: the original comment is still available under the original post.

Comments


Thanks Dan! Just two disagreements.

I disagree that nothing lets you see into the minds of players: we use emotions of play for our blind trials, and this is effectively seeing into the minds of players in a very distinctive way. When the face shows the micro-expression for frustration, the player is frustrated. When the fist bump shows fiero, the player is experiencing triumph. You can then talk to the player after the trial, which may not always be reliable testimony, but it's closer to seeing into their mind than metrics. This for me is the gold standard for design iteration... but I totally take Brian's earlier point (on Twitter) that this is slow, expensive high-grade data, whereas metrics are quick, cheap, high volume data. And I can see a role for both.

However, I disagree with defaulting to seeing tools as morally neutral. This is a side effect of the Enlightenment split into minds and matter that is still causing problems today. Agency is more distributed than that; a landmine is not a morally neutral object... Every tool affects our affordances, and thus our agency, and hence has moral impact, and the impact the tool has depends upon the nature of the tool. Metrics are no different. But this is an issue that goes far beyond metrics in games, and one I've written about plenty in other places.

Many thanks for advancing the debate! It's greatly appreciated.

Chris.

I of course agree with you on both these points!

In-person observation is the current gold standard for moment-to-moment understanding of how players are reacting to the game. We may eventually improve upon that with biometrics, but we aren't there yet.

In addition to the cost of playtesting, it also has some rather serious limitations. It is good for high engagement sessions where someone comes in to a game playing environment and plays a game for a period of time; 15 to 60 minutes tends to be the sweet spot. Games that have shorter play sessions spread over longer periods of time often resist playtesting. And games that are played in non-conference room setups are harder to test. The toilet game. The subway game. The relaxing after taking care of my screaming kids game.

Where playtesting is common, we get a lot of games that playtest well. Local multiplayer games test amazingly well so indie teams make a bunch of them. Yet they flop in the broader world because that play context is rare in our modern life.

I tend not to traffic too much in morality (such a flexible topic in the hands of a rhetorician!) but I would agree that tools are *specialized*. They've got limits and strengths. And these drive biases. Data analysis can hide assumptions and mistakes behind opaque numbers. "It says 5" needs to always be followed by "What chain of things were measured to get to 5?" "What does 5 mean relative to other things?" "Does 5 matter with regards to our goals?" "Is there a different knob that we should be setting to E that might work better?"

Metrics and playtests are both specialized tools to be wielded with care and craft.

Hey Dan,
I confess to never once thinking about the challenges involved in testing 'toilet games' before. :)

70 minutes is about the ideal length for the kind of blind play testing we do. I have never been hired to test an Augmented Reality game, but I think we could do it - but boy, that could be expensive, as it would take hours in each case. I sometimes think I should be pushing this side of our business more. ;)

The local multiplayer problem is an interesting one. Indie devs often manage to build games they love playing, and as you say they'll often test well - but getting people to play in that mode can be a challenge. Yet both Mario Kart and Worms managed it to great commercial success. (Worms managed it in hotseat mode - I don't know of another game that managed this!). There are always surprises.

As for morality, I rather doubt that you don't traffic in it. Your previous comment is full of it, and very sensitively constructed. Probably you see what you're doing in terms of utility arguments... but these are also moral arguments. I suspect what you mean is that you tend to avoid making absolute moral judgements. But there's a lot more to morality than moralism, and I think everyone underestimates both the extent that they are involved with moral decisions on a daily basis, and the futility of reducing morality to mere absolutes.

Many thanks for getting involved in this discussion! I shall be hitting up the States next year, but I don't know if I'll make it as far as the West coast... might just be Tennessee and Texas. But would be great to catch up if we ever end up in the same corner of space-time again. ;)

*waves enthusiastically*

Chris.

Metrics are a fantastic tool for measuring player behaviour across large sets of users and long periods. Detractors often point to the less tangible measurements such as enjoyment and emotion that metrics can't measure, but that doesn't mean metrics are useless - they're just useful for tangible measurements like understanding the times of day people play and how long a normal session might last. Playtesting of any kind will never be able to tell you how players in general behave in their natural environment - testers come in for a set period of time and have to play when they're told to.

Particularly when it comes to casual games and especially mobile games the only way to get actionable insight into the real playerbase (rather than selected participants or eager volunteers) is through metrics.

You can playtest and focus test and interview or whatever as much as you like but without metrics you're never going to know what demographics are actually playing your game or how, or for how long or how far they're getting.

Hi Ben,
While I agree that the narrow field of vision metrics provide doesn't make them useless, I disagree with the claim that metrics let you know "what demographics are actually playing your game"... No game metric system I've seen provides demographic data, and when it comes to games - as indeed 21st Century Game Design argued - the demographics that matter are not so much age and gender (which can be sourced - although I wouldn't call acquiring this data 'metrics', personally!). The important demographics for games are play styles, and this is something no automated metric measures. (Although I did apply for a grant for a metric system to bridge this gap, but that's another story...)

I'll say more in your other comment. Many thanks for wading in!

Chris.

Age and location information can be useful particularly in order to tailor future events and content to your audience. Though not easily available through many platforms, Facebook games for instance have access to a wealth of demographic detail.

As for play styles, actually this is something I'm working on currently. The only possible way to find out play styles is by looking at metrics for a large population. This can include (but is not limited to) cluster segmentation based on things like missions completed per session, session length, play time of day, upgrades per level and many others.
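To make the idea of cluster segmentation concrete, here is a minimal sketch that groups players by two of the features mentioned above. The comment does not name an algorithm, so plain k-means is used here purely as an assumption; the function names, the choice of two features and the sample numbers are all hypothetical.

```python
import math
import random

def standardize(rows):
    """Rescale each column to zero mean and unit variance so no feature dominates."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means: returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen players
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each player to the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Made-up players: [missions completed per session, session length in minutes].
players = [
    [1, 5], [1, 6], [2, 4],      # brief, low-commitment sessions
    [8, 45], [9, 50], [7, 40],   # long, mission-heavy sessions
]

labels = kmeans(standardize(players), k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which players end up grouped together. Whether the resulting groups correspond to meaningful play styles is a separate question that the clustering alone does not answer.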
