Parachutes, apples and the overstated limits to generalisability

The Nobel prize-winning economist Angus Deaton has criticised the development impact community’s insistence on randomised controlled trials (RCTs) as the only source of credible evidence for interventions, declaring that we don’t need an RCT to know we’d rather be wearing a parachute when jumping out of a plane. In fact, some years earlier a tongue-in-cheek systematic review had been published in the British Medical Journal on the effectiveness of parachutes in preventing injury. But it was an ‘empty review’: the authors could find no RCTs and so were unable to conclude whether parachutes are effective or not. Now, however, there is such an RCT.

This week the BMJ published a study in which volunteers were randomly assigned to jump from a plane with or without a parachute. Unfortunately, volunteers could only be found to jump out of a small plane whilst it was on the ground: that is, a jump of a few feet. The authors warn that their finding that no trauma is experienced from jumping out of a plane without a parachute may not apply to planes flying at 40,000 feet.

Meanwhile, in a more policy-relevant example, the Nurse-Family Partnership (NFP) is a home visitation programme for mothers from disadvantaged backgrounds. A nurse visits the mothers to give childrearing advice. Three RCTs from different cities in the United States show the programme to be effective, so NFP is promoted as an evidence-based programme. However, a study from the UK found no effects from NFP. A plausible explanation lies in what happens to the control group. The control group gets ‘usual services’. In the US, usual services offering childrearing advice to disadvantaged mothers amount to next to nothing. But in the UK the National Health Service provides free antenatal care, which includes childcare advice, free hospital delivery and, once the mother goes home with the child, weekly visits from a health visitor who monitors the child’s development and advises the mother. That sounds a lot like the Nurse-Family Partnership. So it is not surprising that there is no difference between outcomes in the treatment and control groups.

A lot of attention is being paid to generalisability at present. It is claimed that the value of studies is undermined by the limits to generalisability. Yet both the examples above are convincing cases of generalisation. In one case we conclude that the results won’t apply to real-life situations; in the second we can explain why a programme which appears effective in one place is not effective in another. The two examples suggest that perhaps generalisability isn’t such a big deal after all. We rely on generalisability all the time in our daily lives. Wherever in the world I am, when I turn on a tap I expect water to come out. If I hold up an apple and let it go, I expect it to fall to the ground.

I use the apple example deliberately. Newton’s principle was that if we have seen something work in one place we should assume it will work everywhere. Newton went from the apocryphal apple falling on his head to laws of physics which apply across the universe. That seems a big jump. But it is the basis for much scientific progress over the last 300 years and seems to be working out pretty well for us. (This example comes from Pedro Domingos’s discussion of teaching computers to generalise in his excellent book on machine learning, The Master Algorithm.)

Generalisability is a central concern of the DFID-funded CEDIL programme. The paper by Calum Davey and colleagues from the CEDIL consortium makes the distinction between generalisability and transferability. A generalisable finding is one which is universally true. A transferable finding is one which can apply in other contexts but may not be universal. They propose using mid-level (or mid-range) theory as a way of assessing transferability. Similarly, researchers at J-PAL have proposed a generalisability framework based on an analysis of the mechanisms in the theory of change. A slightly different approach comes from the Campbell-supported TRANSFER project, in which a team from the Norwegian Institute of Public Health proposes a stakeholder-based approach to identifying and testing the conditions for transferability in systematic reviews.

We should not rule out the possibility of generalisable findings. Two stylised findings support the argument that parenting programmes will be effective in improving child development outcomes. First, stimulation of babies and infants promotes child cognitive development. This is a biological fact. Second, parents around the world believe there is no point in talking to babies since they can’t talk back. This is a fairly universal stylised fact from anthropology. So programmes which get parents to talk to, and provide other stimulation to, babies and infants will work. The International Rescue Committee took evidence of programme effectiveness in developed countries and implemented similar programmes in developing countries. The effectiveness of these programmes has been demonstrated with RCTs in Liberia, in Burundi, and amongst Burmese migrants.

One of the greatest success stories of transferability has been the spread of conditional cash transfers (CCTs). Following the demonstrated success of CCTs in Mexico and other Latin American countries, they are now common across much of the developing world, with further studies confirming their effectiveness in countries as diverse as the Philippines and Zambia.

We would like more examples like this, so it matters a great deal to determine which findings are transferable or generalisable. The Centre of Excellence for Development Impact and Learning (CEDIL), a DFID-funded Centre established to innovate and improve evaluations, is issuing a request for proposals which invites research ideas to develop mid-level theories for development interventions, and offers funding for impact evaluations and systematic reviews which will test these theories. Details of the RFP can be found on the website funding page. Please take a look and distribute it to others who may be interested.

Howard White

CEDIL Research Director

December 2018

One Response

  1. I agree that some limitations on generalizability are overstated, but there is a critical difference between parachutes and apples on the one hand and evaluations of complex goal-directed interventions like parenting programmes and CCTs on the other. In the former case, there are remarkably few degrees of freedom, and no confounding factors that could differ between the initial evaluation and the subsequent implementations. While RCTs can control for many things, some problems that limit generalizability are still present, and critical. To give a couple of obvious reasons why this is true: the sites for an RCT are rarely chosen at random from the set of sites to which the intervention could be applied, and the most promising reported RCTs are a biased sample from the set of RCTs performed.

    Given this, despite agreeing that we should rely on RCTs to a large extent, and agreeing that Deaton is wrong in his simplified and sarcastic responses, I think it behooves us to be a bit less dismissive of the criticisms of over-reliance on RCTs.
