Which metrics to use when picking an intervention

“What gets measured gets improved.” - Robin Sharma

When selecting which projects to implement, you should also try to maximize your impact. Your ethical beliefs will dictate what ‘good achieved’ means to you personally, and that will be the metric you use to measure your charity’s success. In this chapter we will discuss the pros and cons of some common metrics, but ultimately your values will underpin your decision: do you value education as an intrinsic good? How about income? Is it worse to be blind for a year or deaf for a year? Read on to find out why we think wellbeing is often the best metric of all.

THE SOVIET NAIL FACTORY: A PARABLE ON THE IMPORTANCE OF METRICS

A metric is a unit of measurement. People usually think of things like centimeters or kilograms, but a metric can be anything that can be measured, such as income per capita, health, education, or happiness. Which metrics you choose to pay attention to, and specifically how you measure them, will dramatically change the effectiveness of your charity.

Take, for example, the following (probably apocryphal) story from the Soviet Union. During Stalin’s reign, the government set a quantity quota for a nail factory. The manager responded by ordering workers to churn out thousands of tiny, useless nails. The frustrated government then set a weight quota instead, so that the nails could not be so minuscule as to be useless. The factory promptly started producing big, heavy nails weighing a tonne each.

The moral of the story? In both cases, the factory was held to unhelpful metrics (first quantity, then weight) when ‘number of functional nails’, perhaps alongside other metrics such as employee satisfaction and health, would have been a better indicator of success. Unfortunately, this same mistake is very common in the nonprofit sector. To understand how successful a program really is, you need to be able to recognize whether a metric is useful. This chapter covers how to do so, as well as which particular metrics are more or less helpful.

HOW TO SPOT AN INCOMPLETE METRIC

Identify the core goal

It is hugely important to clarify exactly what your target is early on, so that you can identify the key metric for your charity or project. Choose a metric that has a thoroughly proven causal link with your end goal. Once you know what your metric is, you’ll be able to monitor whether you’re hitting it, and so will your donors.

Would it be possible to cheat this metric?

Ask yourself whether it is possible to game this metric, which would make it less valuable. For example, if Kate wanted to drive traffic to her charity’s website, it would be quite easy for her to buy non-targeted online ads. Although this would boost her website traffic, it would be very unlikely to cause any real increase in welfare, which is what Kate really cares about.

Causality chains

A causality chain is a representation of the causal steps linking an intervention to its intended impact. Good metrics measure the end result, so they should sit at the end of the causality chain. Take a look at the two examples below: one measures the endline metric, while the other measures an input and blindly trusts that the chain to impact will work in practice.

First, consider an unconditional cash transfer charity that uses self-reported happiness as its metric. Its causality chain looks like this:

Cash is sent to the global poor with no strings attached
↓
The global poor spend the money on what they desperately need
↓
The global poor don’t need to worry about being able to afford basic necessities
↓
Not having to worry about basic necessities makes the global poor happier
↓
☆ When people are happier, they say that they are happier when asked. ☆

Measuring happiness after the transfers is a good way to test the validity of the causal chain: if any of the links is broken, the global poor will not become happier, which makes this a strong measure of the intervention’s effectiveness.

Now consider an organization that lobbies the local government to provide homeless people with free suits for job interviews. It measures success by the number of petitions sent:

Many petitions are sent lobbying for suits for the homeless
↓
The government feels that it should do what the public wants
↓
The government states that it will start a program to provide suits
↓
The government gets the bill passed
↓
The program is functional enough that it’s relatively easy for the homeless to get suits
↓
The homeless go to the program and get suits
↓
The homeless get job interviews
↓
The homeless wear the suits to their interviews
↓
The homeless get jobs they wouldn’t have otherwise
↓
The jobs prevent them from being homeless
↓
☆ The homeless are happier working at their new jobs than they were on the street ☆

This metric is not a good measure of the intervention’s effectiveness because ‘petitions sent’ occurs at the start of the causality chain, so any weak links further down will go unnoticed. There is a mathematical explanation for this. The probability that two events A and B both happen is:

P(A and B) = P(A) × P(B given A)

which reduces to P(A) × P(B) when the two events are independent. For example, to find the odds of flipping two heads in a row, you multiply 0.5 by 0.5. The free suit lobbyists’ chain has 10 links, so even if each link has a 70% chance of holding given that the previous one did, which is fairly high, the chance of the whole chain holding is 0.7 to the power of 10, or about 2.8%. In other words, there is a 97.2% chance that the plan will fail, and that assumes a 70% success rate at every step, which is probably optimistic.

Of course, sometimes you cannot measure the endline metric, so you will have to make do with something further up the chain. In that case, evidence helps: if multiple studies support the causal connection at each link, we can be more confident in the validity of the chain. The cash transfer chain is backed up by evidence, whereas the suit chain rests on debatable assumptions. Do petitions affect legislation? Is a new suit enough to get a homeless person a job? At each step there are plenty of ways for the plan to go wrong. Generally, the more steps there are, the more confidence you need to have in each one.
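The chain arithmetic above can be sketched in a few lines of Python. This is a minimal illustration that assumes each number is the probability that a link holds given that the earlier links held, so the chance of the whole chain holding is their product. The 70% figure comes from the text; the function names are hypothetical. The second function inverts the calculation to show how reliable each link would need to be for the chain to have an even chance of working.

```python
import math
from typing import Sequence


def chain_success_probability(link_probs: Sequence[float]) -> float:
    """Chance that every link in a causality chain holds.

    Each entry is the probability that a link holds given that all the
    earlier links held, so the overall chance is the product.
    """
    return math.prod(link_probs)


def required_link_probability(target: float, n_links: int) -> float:
    """Per-link probability needed for an n-link chain to reach `target`,
    assuming every link is equally reliable."""
    return target ** (1 / n_links)


# Suit-lobbying chain: 10 links, each assumed to hold 70% of the time.
p = chain_success_probability([0.7] * 10)
print(f"chain holds: {p:.1%}, chain fails: {1 - p:.1%}")
# chain holds: 2.8%, chain fails: 97.2%

# How reliable must each of 10 links be for a 50% overall chance?
print(f"{required_link_probability(0.5, 10):.1%}")  # 93.3%
```

Even with about 93% confidence in every link, a ten-link chain works only half the time, which is why metrics at the end of the chain are so much more informative.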

EXAMPLES OF INCOMPLETE METRICS

Let’s look at some real-life examples of ‘incomplete’ metrics that don’t tell the whole story.

Work hours

Imagine that you are starting a charity and deciding which intervention to implement. You work really hard: last week, you completed 54 hours of research. Was it effective? It’s hard to know without knowing what you actually spent those hours doing. Your inputs matter somewhat, and working more hours may well translate into finding a better intervention, but it’s much more informative to measure your outputs. How much did you accomplish during those hours, and how much more effective is your new best option than your original one?

Organization growth

Imagine that you’re the CEO of a charity. The organization was founded in 2011 with a budget of $30K. Now it’s 2016 and its budget is $300K. That’s tenfold growth in just five years. But is the charity effective? The amount your organization spends, like the amount of time you spend working, is only a means to an end. Again, what matters is what you accomplish with that budget. It’s possible to do more good with $30K than with $300K.

WHICH METRICS ARE USEFUL AND WHEN?

Generally speaking, the best metric to focus on is happiness, as measured by self-reports, because that metric is closest to what people truly care about and is relatively easy to measure. People are surprisingly good at assessing their level of happiness and reporting it in questionnaires. If that isn’t available, the next best metrics, depending on your intervention, are DALYs or income.

How do you measure happiness?

The short answer is: you ask. Since happiness is subjective, even if you put someone in an MRI scanner and recorded their brain activity while they watched a funny movie, the only way to tell whether that activity actually represented happiness would be to ask them. The long answer is that it depends on what sort of happiness you are concerned with: moment-to-moment emotions, or an overall assessment of one’s life. The former is formally called affect balance and the latter life satisfaction.

“Affect” in psychology refers to emotions, so affect balance refers to the balance of positive and negative emotions a person experiences throughout the day. To measure it, psychologists ask people to what extent they felt various positive and negative emotions over the last day or week. For example, they might ask: “How often did you feel sad yesterday? How about excited? Contented? Scared?” They then add up the positive and the negative responses separately, giving one score for positive experiences and one for negative experiences. They don’t subtract the negative from the positive, as one might expect, because positive and negative emotions are not always negatively correlated. One person might feel neutral most of the time, while another might experience extreme highs and lows over the course of a single day (Harmon-Jones & Harmon-Jones, 2010, p.120). You probably know people of both sorts yourself. The most common instrument for this is the Positive and Negative Affect Schedule (PANAS).
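To make the scoring rule concrete, here is a minimal Python sketch. It assumes each item is rated on a 1-5 frequency scale and uses the example emotions from the questions above as placeholder items; the item sets and the function name are hypothetical, and this is not the official PANAS item list or scoring procedure.

```python
from typing import Dict, Tuple

# Illustrative items taken from the example questions above.
# These are NOT the official PANAS items.
POSITIVE_ITEMS = {"excited", "contented"}
NEGATIVE_ITEMS = {"sad", "scared"}


def affect_scores(ratings: Dict[str, int]) -> Tuple[int, int]:
    """Return separate (positive, negative) totals from 1-5 frequency ratings."""
    for item, rating in ratings.items():
        if item not in POSITIVE_ITEMS | NEGATIVE_ITEMS:
            raise KeyError(f"unknown item: {item!r}")
        if not 1 <= rating <= 5:
            raise ValueError(f"rating for {item!r} must be 1-5, got {rating}")
    positive = sum(r for item, r in ratings.items() if item in POSITIVE_ITEMS)
    negative = sum(r for item, r in ratings.items() if item in NEGATIVE_ITEMS)
    return positive, negative


# Two hypothetical respondents: one mostly neutral, one with strong highs and lows.
steady = {"excited": 2, "contented": 3, "sad": 2, "scared": 1}
volatile = {"excited": 5, "contented": 4, "sad": 4, "scared": 3}
print(affect_scores(steady))    # (5, 3)
print(affect_scores(volatile))  # (9, 7)
```

Subtracting would give both respondents the same net score of 2, hiding the fact that the second person’s day contained far more intense emotion in both directions.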
The PANAS gives more fine-grained information on a person’s experiences, but takes longer to administer.

Life satisfaction is the answer to the question, “Looking at your entire life, how good would you say it is?” It is easier to gather because the question is shorter, but it gives a less fine-grained picture. For example, a revolutionary may be captured and tortured for decades before dying. Because of their dedication to the cause and their pride at not having given up their comrades, they might say that they are satisfied with their life, yet their affect balance during the torture was probably very poor. This is not to say that maximizing affect balance always leads to political complacency: it might be good in expectation to fight for your cause and risk torture for the sake of the country’s overall affect balance. But the example does highlight the need for a more detailed metric.

One issue with these metrics is that people can be overly influenced by the most recent event. For example, if you had an otherwise good day but stubbed your toe just before a psychologist asked you how your day was, it might color your response. The Experience Sampling Method may well be the best way to correct for this (source, p.2). Researchers using this technique ask participants to stop at certain intervals and record their experiences in real time, thereby avoiding the usual biases associated with recall. The Day Reconstruction Method is another relatively strong approach: participants describe the experiences they had on a given day through a systematic reconstruction conducted the following day (Kahneman et al., 2004, p.1776). However, this method still relies on memory recall.

What are DALYs?

Disability-Adjusted Life Years (DALYs) measure how many years of healthy life are lost due to early death or a debilitating condition.
According to the WHO, “the sum of these DALYs across the population, or the burden of disease, can be thought of as a measurement of the gap between current health status and an ideal health situation where the entire population lives to an advanced age, free of disease and disability.” Years of life lost are calculated relative to a standard life expectancy at the age of death. Each disability is given a weight, which is multiplied by the years spent living with the condition. For example, blindness has a disability weight of 0.6, so being blind for twenty years means losing 0.6 × 20 = 12 DALYs. To get the disability weights, researchers asked a panel of judges to rate each condition from 0 to 1. A weight of 1 meant the judges would be indifferent between having the condition and being dead, while 0 meant they would be indifferent between having the condition and having perfect health.

Benefits

1. Comparison

DALYs give a single metric for all the different diseases, which allows you to find the most cost-effective ways to help a population. This helps policy-makers and donors make the difficult decision of whom to prioritize with limited resources.

2. Pain is bad

Health is an important cause to focus on. Much research has found that ill people, and countries with high disease burdens, are unhappier than their healthier counterparts. Additionally, most illness is inherently bad. People inevitably bring up masochists as a counterexample, but masochists are rarely, if ever, into getting cancer. The “pain” they seek is actually pleasant for them, so in a sense it does not fall into the pain category. It is similar to the experience of people who like spicy food: it hurts, but in a good way. Of course, not all illness is necessarily intrinsically bad. Illness, as with all words, has ambiguous borders. For example, being deaf is usually considered an illness, but there is a growing movement arguing that it is not.
Despite such edge cases, the “core” instances of illness are bodily experiences that are undesirable and intrinsically bad, whether because they prevent normal participation in society or because they cause direct and unwelcome pain. You need only recall your own experiences of being sick to confirm this.

Potential flaws

Environmental context

Being blind in Nigeria is worse than being blind in the Netherlands. This is because the Netherlands has support systems in place, such as braille and audiobooks, while most blind people in Nigeria do not get this support. This means that the disability weight assigned to blindness should be different in Nigeria and the Netherlands. However, in the interests of fairness, disability weights are the same everywhere. This is not a deal-breaker, as all social metrics are somewhat flawed; it is simply something to keep in mind. Additionally, it is still probably the case that treating blindness in Nigeria is much less costly than treating blindness in the Netherlands.

What about non-health issues?

DALYs do not take into account the pain of social ostracization. For example, a cleft lip is a split in the upper lip that is present from birth. It can make eating and talking slightly harder, but compared to cancer or a heart attack it is not so bad, so its disability weight is very small. What that misses is that in many cultures people with cleft lips are ostracized. They are not accepted by their families, they are thought to be cursed, and they can find it impossible to make friends or start families of their own. This is devastating: social isolation is one of the saddest, loneliest things that can happen to a human being. This is an important limitation of DALYs, but you can compensate for it by adjusting disability weights yourself based on your values and the context you will be working in.
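The DALY arithmetic described above, together with the kind of context adjustment just suggested, can be sketched as follows. This is a simplified illustration rather than the full WHO methodology (it leaves out refinements that some versions have used, such as discounting) and only applies the two rules stated in the text: disability weight times years for living with a condition, and deaths times remaining life expectancy for early death. The 0.6 blindness weight comes from the text; the 0.7 “little support” weight and the 30-year remaining life expectancy are made-up placeholders, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    name: str
    # 0 = as good as perfect health, 1 = as bad as being dead
    disability_weight: float


def years_lived_with_disability(condition: Condition, years: float) -> float:
    """YLD: disability weight multiplied by years spent with the condition."""
    return condition.disability_weight * years


def years_of_life_lost(deaths: int, remaining_life_expectancy: float) -> float:
    """YLL: deaths multiplied by the years of life expectancy remaining at death."""
    return deaths * remaining_life_expectancy


def dalys(yld: float, yll: float) -> float:
    """Total DALYs are the sum of YLD and YLL."""
    return yld + yll


blindness = Condition("blindness", 0.6)
print(round(years_lived_with_disability(blindness, 20), 2))  # 12.0

# Add one hypothetical early death with 30 years of life expectancy remaining.
print(round(dalys(years_lived_with_disability(blindness, 20),
                  years_of_life_lost(1, 30)), 2))  # 42.0

# Hypothetical context adjustment: a higher weight where support such as
# braille and audiobooks is scarce. 0.7 is a placeholder, not a published weight.
blindness_low_support = Condition("blindness (little support)", 0.7)
print(round(years_lived_with_disability(blindness_low_support, 20), 2))  # 14.0
```

Keeping the weight as an explicit, per-context parameter makes any adjustment you choose visible and easy to revisit.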
Despite these issues with the DALY, it is much more widely used than affect balance or life satisfaction, and it is decidedly better than most other alternatives. We recommend using it if you are considering health interventions.

WOULD YOU FEEL SICK FOR A YEAR TO DOUBLE YOUR SALARY THE NEXT?

When comparing interventions, sometimes there will be a clear ‘winner’ in terms of happiness, but often you will have to make trade-offs between happiness and other metrics. For example, one intervention might double somebody’s income for a year, whereas another, using the same resources, might prevent a year of blindness. Which is better? There’s no objective answer to this (yet), but a good interim solution is to think about your own personal trade-offs. Look at the example questions below and come up with your own to work out your personal trade-offs between health, happiness, and income.

Money to health trade-off examples

All of these examples are health conditions that score around 0.5 DALYs per year lived with them, so you can use these numbers to calculate your personal DALY-to-income trade-off. Make sure to measure income by how much it increases relatively, not absolutely, because happiness rises with each doubling of income rather than linearly with each extra dollar.

How much extra money would you have to be paid to live with Parkinson’s disease for a year?
How much money would you pay to not live with AIDS for a year?
If you had to have your legs amputated for the last year of your life, but were otherwise in perfect health, how much extra money would you need to make up for the loss?
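Once you have answered these questions, your answers can be turned into a rough personal exchange rate between DALYs and income doublings. The sketch below assumes, as the text suggests, that each condition costs about 0.5 DALYs per year and that income should be measured in doublings; the $20,000 baseline, the $40,000 answer, and the function names are all hypothetical.

```python
import math


def income_doublings(baseline: float, new_income: float) -> float:
    """Number of doublings between a baseline income and a new income."""
    if baseline <= 0 or new_income <= 0:
        raise ValueError("incomes must be positive")
    return math.log2(new_income / baseline)


def doublings_per_daly(baseline: float, required_income: float,
                       dalys: float) -> float:
    """Personal exchange rate: income doublings judged equivalent to one DALY."""
    return income_doublings(baseline, required_income) / dalys


# Hypothetical answer to the Parkinson's question: someone earning $20,000
# would need $40,000 for the year to accept a condition costing ~0.5 DALYs.
rate = doublings_per_daly(20_000, 40_000, 0.5)
print(rate)  # 2.0

# Apply that rate to the comparison in the text, using the 0.6 blindness weight.
value_of_doubling_income = 1.0  # one doubling of income for one year
value_of_preventing_blindness = 0.6 * rate
print("prevent blindness" if value_of_preventing_blindness > value_of_doubling_income
      else "double income")  # prevent blindness
```

Someone who would accept the condition for a smaller raise would end up with a lower rate, and the comparison could flip; the point is to make your own trade-off explicit, not to settle it for you.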

Is income a useful metric?

Money is a means to an end, not an end in and of itself. Money clearly does not guarantee happiness, as countless stories of celebrities and billionaires attest. In fact, you would be forgiven for believing the frequently recited maxim that “money doesn’t buy happiness.” However, that statement is too simplistic. The literature suggests that the relationship between income and life satisfaction is one of diminishing returns (p.3, p.4, p.10, p.12). In other words, an increase in income is correlated with an increase in life satisfaction, but that increase gets smaller and smaller the richer you get. A useful heuristic is that doubling income increases subjective well-being by about one point on a ten-point scale (p.7). So the poorer you are, the more extra income matters to your happiness. For example, if you are struggling to feed your family on $300 a year and you get a job that pays $600 a year, so that you never have to go to sleep hungry, your day-to-day well-being will change substantially. However, if you already make $70,000 a year and get a small bonus that brings your annual income to $70,300, this will have virtually no impact on your happiness. This might be why the expression is so common in the developed world: most people there are at a level of wealth where more money does not substantially affect them.
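The doubling heuristic amounts to treating life satisfaction as rising with the base-2 logarithm of income. Here is a minimal sketch, assuming exactly one point per doubling on a 0-10 scale and ignoring the scale’s ceiling and individual variation; the function name is hypothetical.

```python
import math


def life_satisfaction_change(baseline: float, new_income: float,
                             points_per_doubling: float = 1.0) -> float:
    """Approximate change on a 0-10 life satisfaction scale under the
    'one point per doubling of income' heuristic."""
    return points_per_doubling * math.log2(new_income / baseline)


# The two examples from the text: the same extra $300 in both cases.
print(round(life_satisfaction_change(300, 600), 3))        # 1.0
print(round(life_satisfaction_change(70_000, 70_300), 3))  # 0.006
```

Under this heuristic the same $300 is worth over 150 times as much, in life satisfaction points, to the family living on $300 a year as to the person earning $70,000.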

In summary, absolute changes in income are not a good metric, but relative increases compared to baseline are. Like all the other metrics, this one has flaws, but it is well correlated with well-being.

Is education a useful metric?

Is education an inherent good?

Many charities focus on increasing the number of years that children attend school. But why is education useful? Most people don’t stop to think about why they value education or why they spend years in the classroom. If you ask someone why high-schoolers should finish school, they might say “getting a high school diploma allows you to go to college,” which just pushes the question back a step: why is going to college valuable?

Keeping children in the classroom for as long as possible isn’t a good thing in itself. In fact, the number of years a child has spent in education tells us very little about how much their well-being has improved or how much they learnt. Several years in a very poor school might not be equal to a single year in a good one. Several years learning something that will not help in life outside of school is not nearly as valuable as several years spent learning the most relevant possible content. Because of this, we have to ask ourselves, what is the point of education?

Future Income

Education often increases long-term income and opens up more enjoyable job opportunities. We’re getting closer to the heart of why we care about education, but as we discussed earlier in this chapter, it’s a long causality chain and thus runs into problems. In fact, it is not clear whether education does indeed lead to higher income.

Unfortunately, most of the data on the issue are observational, and studies rarely control for important explanatory variables. For example, those who complete high school or tertiary education often come from higher socioeconomic backgrounds in the first place, so rich adults may have become wealthy simply because they had rich parents. Alternatively, people with high innate cognitive ability or high amounts of grit may stay in school longer because of these qualities. Of course, these same qualities also make them better employees, so they are more likely to be promoted to higher-paying jobs.

Additionally, the returns to education can vary enormously depending on the economic climate. If you become highly educated but there are not enough jobs requiring academic qualifications, you may be just as badly off as before, but with more student debt. In fact, some studies have found that once you take into account the costs of education and the income forgone during those years, education might actually reduce lifetime earnings.
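The point about costs and forgone income can be illustrated with a deliberately simple model. The sketch below compares undiscounted earnings over a fixed working horizon with and without extra years of schooling. All figures and the function name are hypothetical, and a real analysis would also discount future earnings and account for the selection effects described above.

```python
def net_gain_from_schooling(
    extra_years: int,
    annual_cost: float,
    wage_without: float,
    wage_with: float,
    horizon_years: int,
) -> float:
    """Undiscounted earnings with extra schooling minus earnings without it.

    Both paths cover the same horizon. The schooling path earns nothing and
    pays annual_cost during extra_years, then earns wage_with for the rest.
    """
    if not 0 <= extra_years <= horizon_years:
        raise ValueError("extra_years must lie within the horizon")
    without = wage_without * horizon_years
    with_school = wage_with * (horizon_years - extra_years) - annual_cost * extra_years
    return with_school - without


# Hypothetical figures: 40-year horizon, 4 extra years at $500/year,
# $1,000/year baseline wage.
print(net_gain_from_schooling(4, 500, 1_000, 1_100, 40))  # -2400
print(net_gain_from_schooling(4, 500, 1_000, 1_200, 40))  # 1200
```

In this toy example a 10% wage premium leaves the student worse off, while a 20% premium makes the schooling pay, which is one way the economic climate can decide whether education raises lifetime income.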

This is why we do not recommend years of education as a metric to measure your intervention by. It is better to simply measure the increase in income resulting from the education, or better yet, the increased happiness resulting from said income. This prevents the issues of working with a metric that relies on long causality chains and makes sure you are actually making a positive difference in people’s lives.

Is IQ a useful metric?

Is IQ correlated with well-being?

IQ can cut both ways: it can give you advantages in life, but it can also make you feel isolated from many of your peers and feel the big problems in the world more keenly. It is perhaps not surprising, then, that studies have found no or only small connections between childhood cognitive ability and later life satisfaction when controlling for childhood socioeconomic status (source, source).

Is IQ correlated with income?

Even if IQ doesn’t matter much for individual well-being, it might still be a useful metric if it’s correlated with income (source, source). It seems intuitively plausible that a higher IQ might mean increased educational ability, better access to jobs, increased job performance, and thus higher income.

Some evidence does suggest that IQ is correlated with income. However, the returns to IQ appear to be lower for manual jobs (e.g. farming) than for non-manual jobs (e.g. banking). In many developing countries, a large percentage of the rural poor work in agriculture (source, source), so increasing the IQ of those people is less likely to increase their income. This is borne out by many studies finding that IQ has a weak or no relationship with income in the developing world. For example, Jolliffe (1998) found that cognitive skills raised non-farm income and total income in Ghana, but not farm income, which suggests that education may be less helpful to the many people who work in agriculture. Additionally, Vijverberg (1999) found that the effects of education on the self-employed in Ghana were weak or nonexistent.

What about intellectual disability?

Some argue that a gain of four IQ points would be enough to allow some people to function independently, as opposed to needing special assistance, and that gains in IQ would reduce the prevalence of intellectual disability (source). Likewise, establishing basic literacy and numeracy in disabled children would have a sizable impact on their employability and future income potential (source, source).

But would reducing intellectual disability have any effect on subjective well-being (SWB)? A survey in England found that mean overall SWB was only slightly lower among intellectually disabled people than in the rest of the population (source), and a survey in Kentucky produced similar findings (source). Of course, these data could be wildly unrepresentative of experiences in other countries, depending on the culture and the quality of care institutions.