Big data for development: bridging the gap between machine learning and human learning



Co-hosted with 3ie

Topic: Big data for development: bridging the gap between machine learning and human learning

Date: 3 March

Big data presents new opportunities for innovative research in international development. In this webinar, a panel of experts will draw on real-world applications in this area to reflect on key questions for the field, such as: Is the term “big data” even useful? When do the benefits of big data outweigh its blind spots? Will low-touch measurement become the new normal? 


Douglas Glandon, 3ie

Alessandra Garbero, IFAD

Michael Bamberger, Independent Consultant

Maria Ruth Jones, DIME, World Bank


Tom Wilkinson, Head of Data Science, International Development, FCDO

Presentations from the webinar:

Douglas Glandon’s Presentation

Maria Ruth Jones’s Presentation

Useful resources:

Blog post mentioned during the webinar: four principles for data quality checks

3ie Big Data Systematic Map

Questions answered in the Q&A function:

Is big data the solution to the problem of a lack of data for evidence and for improving public service delivery, especially in Africa?

An opportunity, yes, but not a “solution” per se. The growing ability to gather, process and analyze diverse data sources (e.g., administrative data on public service outputs, citizen reporting, and satellite data for certain types of programs) can help fill some gaps in the evidence base. At the same time, as we discussed, big data sources can also present a misleading or incomplete picture if their ‘blind spots’ are not taken into account. For these reasons, it remains important to consider mixed methods approaches to “ground truth” the variables estimated with big data sources and to understand contextual factors and mechanisms related to program implementation.

Michael Bamberger mentioned larger samples, population coverage and comparison of groups. Which forms of big data are appropriate for this? Social media and mobile phone data are not really samples of “people”.

This is one area where satellite data can be very useful, because it is often possible to obtain high-resolution imagery over geographic areas much larger than conventional data collection methods (e.g., household surveys) can cover. A growing number of measurement and validation studies have demonstrated how satellite imagery can be used to estimate social variables of interest (e.g., estimating local economic activity by combining daytime imagery with nighttime lights). Satellite data can also help identify potential comparison areas for intervention areas, e.g., by matching on a variety of observable characteristics. One obvious drawback is that satellites are much more useful as a data source for certain types of interventions (e.g., infrastructure development, forest management, agricultural productivity) than for others (e.g., worker training programs, women’s empowerment).
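To illustrate the matching idea, here is a minimal sketch of 1:1 nearest-neighbour matching (with replacement) of intervention areas to candidate comparison areas on standardized observable characteristics. The feature names (`night_light`, `cropland`) and values are hypothetical placeholders for satellite-derived variables; a real evaluation would also check covariate balance and might use propensity scores or other matching methods instead.

```python
import math
from typing import Dict, List

# An area is a dict of hypothetical satellite-derived features,
# e.g. {"night_light": 5.0, "cropland": 0.6}. Names are illustrative only.
Area = Dict[str, float]


def standardize(areas: List[Area], keys: List[str]) -> List[List[float]]:
    """Z-score each feature across all areas so no single unit dominates the distance."""
    cols = []
    for k in keys:
        vals = [a[k] for a in areas]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        cols.append([(v - mean) / sd for v in vals])
    # Transpose back to one row of z-scores per area.
    return [list(row) for row in zip(*cols)]


def match_nearest(treated: List[Area], pool: List[Area], keys: List[str]) -> List[int]:
    """For each intervention area, return the index (into pool) of the closest
    candidate comparison area by Euclidean distance on standardized features."""
    z = standardize(treated + pool, keys)
    z_treated, z_pool = z[: len(treated)], z[len(treated):]
    matches = []
    for zt in z_treated:
        dists = [math.dist(zt, zp) for zp in z_pool]
        matches.append(min(range(len(z_pool)), key=dists.__getitem__))
    return matches


if __name__ == "__main__":
    treated = [{"night_light": 5.0, "cropland": 0.6},
               {"night_light": 1.0, "cropland": 0.2}]
    pool = [{"night_light": 0.8, "cropland": 0.25},
            {"night_light": 4.7, "cropland": 0.55},
            {"night_light": 9.0, "cropland": 0.1}]
    print(match_nearest(treated, pool, ["night_light", "cropland"]))  # [1, 0]
```

Standardizing first matters because raw features are on very different scales (light intensity versus a cropland share); without it, the distance would be driven almost entirely by the larger-valued feature.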

Social media and mobile phone data can still be useful, but it is critical for researchers to understand and articulate the characteristics of the group represented by the data (and those not represented).

Really interesting work presented by Alessandra – what training data was used to identify SDG targets? Was this only IFAD documentation, and was the training set manually classified? (And does the model have to go via Goals to get to Target level?)

The model identifies targets and goals using separate taxonomies, based on IFAD documentation. Human validation is conducted ex post to assess accuracy.

The ATENA project is impressive. Does the system also summarize causal mechanisms of interventions if they are reported in the text? Would it be helpful to have this feature?

The causal mechanism should appear in the lessons learned, if it is identified in the project documents.

Can you say more about the ethical considerations around using big data in fragile and conflict-affected states, which are an ever-increasing focus of development? In particular, what about the conflict and gender sensitivity of using big data, and the potential negative impacts of using it?

You are right: algorithmic bias is a real risk, and specific groups may suffer from exclusion.