Orazio Attanasio and Elisa Cavatorta | 8th August, 2022
To evaluate interventions effectively, it is important to understand the mechanisms that generate the intervention’s outcomes and, to achieve that, to look carefully at what one is measuring and what it tells us about the determinants of individual behaviour and outcomes, write Orazio Attanasio and Elisa Cavatorta
Measurement is crucial to evaluation; without measurement it is not possible to quantify the effects of a policy intervention.
But the importance of measurement goes beyond this obvious point; establishing what should be measured, and how, is key to good evaluations. When evaluating an intervention, the temptation is to measure only the specific behavioural outcomes of interest in a given evaluation exercise. Often this temptation is coupled with the inclination to use established measures, sometimes developed in very different contexts. These temptations should be resisted in favour of a detailed appraisal of the drivers underlying those outcomes and in favour of measurement innovation. A proper evaluation exercise goes beyond estimating the effects of an intervention and identifies the mechanisms behind a given set of outcomes.
A deep understanding of the broader behavioural process that gives rise to the observed behavioural changes is important; only then can the evaluation be used to improve an intervention, if necessary, and inform future ones. The drivers of behavioural outcomes can be many. They may be elements of subjects’ decision-making process, such as their preferences, constraints, information, expectations about the future, and cultural and social norms. Behavioural outcomes may also be driven by characteristics of the individuals targeted by the intervention, by their interactions with those around them, and by characteristics of the environment in which they live.
Overlooking behavioural drivers can mask side effects
The cost of overlooking the drivers of observed outcomes is that we may remain ignorant of important side effects of the intervention. Take a simple hypothetical example: what if parents of children who receive breakfast at school reduce the food they provide to them at home and give more to their siblings? We would not know unless we develop a theory of the parents’ decision-making process. These problems can be exacerbated when the intervention is scaled up.
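The logic of this hypothetical can be made concrete with a short sketch. The numbers below are invented purely for illustration, as are the variable names; the point is only that the gross effect seen at school can overstate the net effect on the treated child, while the sibling’s gain goes unrecorded unless home allocation is also measured.

```python
# Toy sketch with hypothetical numbers: a school breakfast is partly
# offset if parents reallocate food at home towards siblings.

def net_effect(transfer_kcal, home_reduction_kcal):
    """Net change in the treated child's intake: the programme transfer
    minus the reduction in food the parents provide at home."""
    return transfer_kcal - home_reduction_kcal

programme = 300      # kcal provided by the school breakfast (assumed)
reallocated = 180    # kcal parents shift from the child to siblings (assumed)

gross = programme                         # what is visible at school
net = net_effect(programme, reallocated)  # what the child actually gains

print(f"gross effect: {gross} kcal, net effect: {net} kcal")
# The sibling gains the reallocated 180 kcal -- a side effect that stays
# invisible unless home food allocation is measured too.
```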
Striving to develop better measurement of behaviour and its multiple drivers is key for overcoming these problems and developing flexible models that are useful to policy.
Perfect measurement does not exist. Yet at the core of measurement innovation are strategies to engineer data collection so as to tackle the most common causes of inappropriate measurement directly. These include traditional measurement error (usually referred to as the errors-in-variables problem: attempts to measure a given quantity are subject to error), the ability to measure factors that are not directly observable (e.g. beliefs about the future, preferences, cognitive biases), and the ability to make meaningful interpersonal comparisons across individuals (in so far as aggregate statistics are of interest).
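A minimal simulation can illustrate the errors-in-variables problem mentioned above. In this assumed setup, noise in a measured regressor biases the estimated slope of a simple regression towards zero (the classical attenuation result); the true slope and noise scale are arbitrary choices for the sketch.

```python
# Sketch of classical errors-in-variables: measuring a regressor with
# noise attenuates the OLS slope towards zero.
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 2.0

x_true = rng.normal(size=n)                        # true regressor
y = beta * x_true + rng.normal(size=n)             # outcome
x_noisy = x_true + rng.normal(scale=1.0, size=n)   # regressor measured with error

def ols_slope(x, y):
    """Univariate OLS slope: cov(x, y) / var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b_clean = ols_slope(x_true, y)    # close to the true slope, 2.0
b_noisy = ols_slope(x_noisy, y)   # attenuated: roughly beta * var(x)/(var(x)+var(noise)) = 1.0
print(round(b_clean, 2), round(b_noisy, 2))
```

With equal variances for signal and noise, the attenuation factor is one half, so the noisy estimate sits near 1.0 rather than 2.0.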
Disentangling beliefs and preferences
One example that illustrates well the gains of measurement innovation is the elicitation of beliefs and their distinction from individual preferences. This endeavour is of broad interest across the social sciences, and it is especially important in evaluation work. Take, for example, early child development interventions that have been shown to increase parental engagement with their children (Attanasio et al, 2021). We can ask: why did parents increase their investment? One possibility is that the intervention changed parental beliefs about the process of child development and about the usefulness of parental investment in this process. Alternatively, it is possible that the intervention changed parental preferences, for example, by changing the parents’ enjoyment of spending time with their children. Clearly the measurement of such beliefs (and preferences) can be key to understanding the mechanisms behind the intervention’s outcomes and, eventually, to changing its design and focus.
Consider another example from the field of conflict resolution (Cavatorta and Groom, 2021): what drives support (or the lack of it) for peace agreements in ongoing conflicts? One possibility is that the willingness to support a peace agreement depends on its perceived returns; support might be low because the returns on the things that matter to people are perceived as low. These explanations can be unpacked if measurement can separate what “matters” to people from the perceived benefits of these outcomes.
Detailed measurement of beliefs and expectations can be achieved using different techniques. Recent work has demonstrated the benefits of ‘counterfactual’ scenario questions, in which the same respondent evaluates different hypothetical scenarios (some of which may be unrealised) and indicates their choices in each set of circumstances. This approach is also called ‘strategic survey questions’ (e.g. Ameriks et al, 2020; Caplin, 2021; Bernheim et al, 2022; Cavatorta and Groom, 2021). Because these measures ask directly about what might happen in different contingencies, they can reveal preferences that are otherwise hidden but would be visible if different states of the world existed. These types of predictions are at the heart of what models of individual behaviour aim to do. Such measures then allow theories to be tested against data, instead of resting on untestable assumptions.
Measurement innovation also looks at how we ask questions. One example concerns the widespread use of Likert scales to measure the likelihood of events in traditional surveys, where respondents are asked to assess whether certain events are more or less likely to occur, or to report their preferences for certain situations. These measures suffer from interpersonal comparability problems, as different people have different interpretations of what is ‘likely’ or ‘unlikely’, or of whether they like something a little or very much. Recent work has shown how probabilistic assessments can offer improvements without increasing respondents’ confusion (Caplin, 2021).
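The comparability problem can be sketched in a few lines. In this illustrative (entirely hypothetical) setup, two respondents hold the same subjective probability but report different Likert labels because they use different private cut-points for words like “likely”; the thresholds and labels are assumptions of the sketch, not measured quantities.

```python
# Illustrative sketch with hypothetical cut-points: identical beliefs,
# different Likert answers, because verbal scales are interpreted differently.

LABELS = ["very unlikely", "unlikely", "likely", "very likely"]

def likert_label(prob, thresholds):
    """Map a subjective probability to a Likert label using a
    respondent-specific set of upper cut-points (one per label)."""
    for label, cut in zip(LABELS, thresholds):
        if prob <= cut:
            return label
    return LABELS[-1]

belief = 0.55                    # both respondents hold the same belief
resp_a = (0.2, 0.5, 0.8, 1.0)    # respondent A: "likely" covers 0.5-0.8
resp_b = (0.3, 0.6, 0.9, 1.0)    # respondent B: "unlikely" extends to 0.6

print(likert_label(belief, resp_a))  # "likely"
print(likert_label(belief, resp_b))  # "unlikely"
# Eliciting the probability (0.55) directly removes this ambiguity,
# which is the point of probabilistic assessments.
```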
The need to improve measurement is ubiquitous across disciplines, and the process requires multi- and cross-disciplinary input. What to measure, how to measure it and how to create new measures is a discussion of wide interest that goes hand-in-hand with the development of new, more informative theories of human behaviour.
Cover image: catd_mitchell