Apply research skills to new data, transform developmental effectiveness


The following article is authored by Arianna Legovini.



For most of our history, data has been scarce and expensive to obtain, and economic research has advanced slowly, especially in data-poor countries (The World Bank, 2021). The high cost of impact evaluations, for example, has mainly been driven by the cost of collecting data. This is rapidly changing because of recent advances in technology. Smartphones, mobile networks, e-services, and e-government have changed the process through which data is generated and have created massive amounts of real-time, transaction-level data: a phantasmagorial 1.145 trillion megabytes per day. It is hard to fathom that more data has been generated in the last two years than in all prior human history combined.


Even though data use is expanding fast to tackle anything from profit maximization to optimizing athletic performance, most data goes unused, especially in the public sector and in low-capacity environments. The scarcity we now face is of a different nature: high-quality research skills and institutional capabilities to make sense of all the data and turn it into a resource for human progress are in short supply.


Building knowledge takes time, and evidence, especially in development economics, is scarce and hard to get. Realistically, evidence informs only a tiny proportion of the policy decisions being made every day. The question is whether we can substantially increase the speed of generating evidence by taking advantage of what we can safely call a data revolution. Even before the COVID pandemic, researchers were investing in innovative tools to leverage the explosion in non-traditional sources of data and apply them to development problems. COVID-19 stimulated a flurry of innovation by heightening the demand for understanding the impact of the crisis and for shaping policy responses targeted to differently affected populations, sectors, and geographies. Investments in real-time data tools made before the crisis were quickly repurposed.


One of the questions asked before the crisis was whether an openly available data source like Twitter can be transformed into a resource for urban planning and development. Can bystanders’ reports be used to create location data for urban events like car crashes, data that is scarce in most developing countries but essential for addressing the number one cause of mortality for children over five and young adults? To find out, we at the World Bank scraped 874,588 traffic-related tweets in Nairobi, Kenya, developed machine learning algorithms to identify and geolocate crashes, and deployed a motorcycle delivery service to verify crashes and locations. The exercise helped identify the 1% of the road network that hosts 50% of the crashes, allowing road safety authorities to direct road safety investments to where they matter most (Milusheva et al., 2021).
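As a rough illustration of the kind of concentration calculation behind a finding like this one, the sketch below computes the smallest share of road segments that together account for a given share of crashes. It is not the method used in the study: the function name `share_of_segments_covering`, the per-segment counts, and the equal weighting of segments are all assumptions made for the example.

```python
from typing import Sequence


def share_of_segments_covering(crash_counts: Sequence[int],
                               target_share: float = 0.5) -> float:
    """Smallest fraction of segments that holds at least `target_share` of crashes.

    `crash_counts` is a hypothetical list with one crash count per road segment.
    """
    if not 0 < target_share <= 1:
        raise ValueError("target_share must be in (0, 1]")
    total = sum(crash_counts)
    if total == 0:
        raise ValueError("no crashes recorded")

    running = 0
    # Visit the most crash-prone segments first and stop once the target is met.
    for rank, count in enumerate(sorted(crash_counts, reverse=True), start=1):
        running += count
        if running >= target_share * total:
            return rank / len(crash_counts)
    return 1.0


# Made-up data: 100 segments, one of which holds half of all crashes.
counts = [50] + [1] * 50 + [0] * 49
share = share_of_segments_covering(counts, target_share=0.5)
print(f"{share:.0%} of segments account for at least half of the crashes")
```

In practice, segments differ in length and traffic volume, so a real analysis would likely weight them accordingly rather than count each segment equally.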




Image 1: Nairobi Road Safety Data Action Plan



By merging the data with administrative records, Uber, maps, Google, and weather data, we could create a high-frequency, real-time resource for urban planning. During COVID, the data was used to monitor the effect of curfews on urban mobility and car crashes, and to measure both the decrease in traffic and the peaks in crashes and mortality around curfew times as people rushed to return home.




Figure 1: Crash Data Based on Curfew in Nairobi


Similarly, investments in data integration created an unusually rich dataset for Rwanda, including tax data from electronic billing and crowdsourced information on market prices and quantities (Byrne et al., 2021). This allowed us to monitor sectoral heterogeneity in the economic impact of the initial and recovery stages of the COVID crisis on both workers and firms. The information was critical in helping shape government recovery policies.




Figure 2: Market Prices and Road Connectivity in Rwanda



Visualizations speak to people. Whether it is a map of satellite night lights, a live map of the speed at which vehicles move in a city, or a representation of the flow of homeless people from city to city in the US, visualizations of data are watched, downloaded, and shared. Sometimes they are even exhibited at the Museum of Modern Art, like the maps of prison admissions from District 16 in Brooklyn by Sarah Williams, an MIT professor and a co-author of our Nairobi study.


It is the very process of querying and transforming data into information, especially actionable information, that gives data meaning and purpose. The greater the strategic value of the questions asked, the more research can itself add value. The greater the scientific rigor used to get to the right answer, the more we can move the decision needle in the direction of human progress. In development economics, structures like the World Bank’s Development Impact Evaluation (DIME) that create collaborations between highly skilled researchers and policy makers are the necessary institutions for surfacing and answering questions that help address the most pressing development problems. These collaborations focus on co-generating the evidence and building the local capacities that turn answers into policy action.


This requires building trust, the first step of which is to help governments understand how they can use the data their administrative systems generate to do better. Recent investments in e-government systems are generating massive amounts of transaction-level data that can be used for research. DIME’s Data and Evidence for Justice Reform (DE JURE) leverages investments in smart courts and associated data systems to increase the economies of scale of knowledge generation on the quality and efficiency of judicial proceedings (Legovini and Ruth Jones, 2020). In Rwanda, electronic receipt data was used to help the tax administration monitor economic activity during the crisis and tailor recovery policies.





Figure 3: Electronic Receipt Data in Rwanda



The questions shape the content of the data we need. For the most part, data is not ready for use. Identifying sources, obtaining permissions, integrating data from different sources, and triangulating and ground-truthing to understand biases in coverage and representativeness are all necessary steps toward understanding what the data can be used for and how results should be interpreted. This calls for highly specialized technical skills and multidisciplinary teams that combine economics with data science and sector-specific knowledge. Collecting primary data to supplement existing data requires still more human and financial resources.
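One simple way to picture the coverage check described above is to compare how often each group appears in a new data source with its share in a trusted reference, such as a census. The sketch below is a minimal illustration under stated assumptions: the function `coverage_ratios`, the region labels, and all figures are hypothetical and do not come from any of the projects discussed here.

```python
from collections import Counter
from typing import Dict, Iterable


def coverage_ratios(source_regions: Iterable[str],
                    reference_shares: Dict[str, float]) -> Dict[str, float]:
    """Ratio of each region's share in the source to its share in the reference.

    A ratio below 1 suggests the region is under-represented in the source;
    a ratio above 1 suggests it is over-represented.
    """
    counts = Counter(source_regions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("source contains no records")

    ratios = {}
    for region, ref_share in reference_shares.items():
        if ref_share <= 0:
            raise ValueError(f"reference share for {region!r} must be positive")
        source_share = counts.get(region, 0) / total
        ratios[region] = source_share / ref_share
    return ratios


# Made-up example: crowdsourced reports skew urban relative to the population.
reports = ["urban"] * 8 + ["rural"] * 2
population_shares = {"urban": 0.5, "rural": 0.5}
for region, ratio in coverage_ratios(reports, population_shares).items():
    print(region, round(ratio, 2))
```

A ratio far from 1 does not by itself invalidate a source, but it signals where ground-truthing or reweighting may be needed before results are interpreted.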


The effort of properly preparing and synthesizing data is worthwhile only when a team of researchers is able and ready to use the resulting datasets to extract high-value information and is empowered to impose exogenous variation on the data generation process to narrow down causal pathways. It is also worthwhile only when policy counterparts have the interest, understanding, and power to put the evidence into action.


How to do this at a scale large enough to address the universe of policy problems in the world remains uncharted. The main constraint on scaling up a data- and research-driven approach that can make a transformative difference in development policy is lack of financing. The resources for research are minute. At the World Bank, for example, we are spending 1% of project financing on impact evaluation for only 1% of those projects. These sums are small relative to the gains that impact evaluation research delivers in the returns to investment of those projects. In Mozambique, a research solution to water usage cut water scarcity in half (Christian et al., 2018). In Ghana, research optimized farmer incentives for tree planting and increased project effectiveness by eighty percent (Legovini, 2018). In Senegal, an analysis-informed regulatory reform decreased court delays by a third (Kondylis and Stein, 2018). In Côte d’Ivoire, a research-informed retargeting of a social protection program increased effectiveness by seventy percent (Bertrand et al., 2021).




Bertrand, M., Crépon, B., Marguerie, A. and Premand, P., 2021. Do Workfare Programs Live Up to Their Promises? Experimental Evidence from Côte d’Ivoire. World Bank Policy Research Working Paper, (9611).


Byrne, K., Karpe, S., Kondylis, F., Lang, M. and Loeser, J., 2021. Sectoral heterogeneity in the Covid-19 recovery: Evidence from Rwanda. Shaping Africa’s Post-Covid Recovery.


Christian, P., Kondylis, F., Mueller, V.M., Zwager, A.M.T. and Siegfried, T., 2018. Water when it counts: reducing scarcity through irrigation monitoring in Central Mozambique. World Bank Policy Research Working Paper, (8345).


Kondylis, F. and Stein, M., 2018. The speed of justice. World Bank Policy Research Working Paper, (8372).


Legovini, A., 2018. Maybe Money does Grow on Trees. [online] World Bank Blogs.


Legovini, A., and Ruth Jones, M., 2020. Administrative Data in Research at the World Bank: The Case of Development Impact Evaluation (DIME), in: Handbook on Using Administrative Data for Research and Evidence-Based Policy. Abdul Latif Jameel Poverty Action Lab, pp. 503–549.


Milusheva, S., Marty, R., Bedoya, G., Williams, S., Resor, E. and Legovini, A., 2021. Applying machine learning and geolocation techniques to social media data (Twitter) to develop a resource for urban planning.


The World Bank, 2021. Data for Better Lives. World Development Report 2021.




Arianna Legovini is the head of the World Bank’s Development Impact Evaluation (DIME) department.


The author is responsible for the facts contained in the article and the opinions expressed therein, which are not necessarily those of UNESCO and do not commit the Organization.