Measuring the effect of COVID19 lockdowns on air quality and the impact on child health using Machine Learning

Subho Majumdar
8 min read · Dec 28, 2020

Introduction

This article opens a three-part series by UNICEF and Solve for Good on the effect of local lockdowns during the COVID19 crisis on air pollution levels and child health. The series aims to answer several questions. Why are young children particularly vulnerable to the effects of air pollution (Rees 2017)? Is air quality data widely available on a global scale and, if so, at what resolution, and how feasible is it to use this data layer for pollution monitoring? In the absence of fine-grained global air quality data, how can we use openly available satellite data sources and machine learning to build a global model for estimating pollutant concentrations? Did COVID19 lockdowns contribute to a considerable decrease in air pollution levels? Borneman (2020) observed considerable changes in NO2 levels in China (Figure 1), so what about changes in the principal pollutant, PM2.5, at a global scale? We start our exploration with the aim of providing operationalizable answers to these questions.

Figure 1: Comparison between pre- and post-lockdown NO2 concentrations in north-eastern China (Source: NASA, Borneman 2020)

Objective

UNICEF and Solve for Good have partnered to analyze various aspects of changes in air pollution, especially those related to COVID-19, with the long-term goal of building a platform for air pollution monitoring with a strong emphasis on UNICEF’s operations. The partnership has three objectives:

  • A model to measure the exposure of children to PM2.5 concentrations that exceed the WHO standard in the present COVID scenario.
  • An understanding of air quality levels around the globe, with the target of attaining a fine-grained estimate by combining openly available ground sensor measurements with remote sensing.
  • Enabling citizen scientists to delve deeper into air quality measurement globally.

Specifically, we aim to (a) model the impact of COVID19 lockdowns on air quality, and (b) test whether this has led to an improvement in children’s health. Following in the footsteps of existing work such as Borneman (2020) and Mahato et al. (2020), we expect to find evidence of positive effects on air quality resulting from a large-scale modal shift to low-emission vehicles after lockdown, and we will develop an exploratory visualization platform for local program managers. The platform will provide functions to manage, analyze and visualize changes in air pollution data at different locations, preferably in countries and cities where UNICEF operates and has an interest in air pollution monitoring for children’s health.

Why should we care?

Globally, 93% of children live in environments where air pollution levels exceed WHO guidelines. There is a strong link between human health and exposure to high levels of air pollution. Long-term exposure to fine particulate matter with a diameter of less than 2.5 µm (PM2.5) is estimated to cause ~8 million excess deaths annually, while nitrogen dioxide (NO2) results in 4 million new pediatric asthma cases annually (Venter et al. 2020). The impact of air pollution is felt more acutely by the young: one in every four deaths of children under 5 years is directly or indirectly related to environmental risks (WHO 2018).

However, the distribution of the child population does not correspond well to the global distribution of air quality sensors. Figure 2 illustrates both distributions: panel (a) shows the locations of air quality sensors and panel (b) the global concentration of child populations. Using only the locations of ground-based air quality sensors to infer the effects of exposure to high levels of air pollution on global child populations is therefore difficult, particularly in highly populous locations such as Western Africa and the Great Rift Valley.

Figure 2a: Global distribution of air quality sensors
Figure 2b: Global distribution of child population (data source: Silent Suffocation in Africa, UNICEF 2019).

Further, during the COVID19 lockdowns, fossil fuel consumption decreased due to lower mobility levels in general, as well as a shift to low-emission modes of transport (such as walking and cycling). This renders previous models of the global distribution of air pollution inaccurate, as they are unable to represent the current changes in air pollution levels due to COVID19 lockdown events (Health Effects Institute 2019).

Figure 3: In locations where air quality data is available, historic data do not capture COVID-19 related air quality trends (source: OpenAQ).

The resulting decrease in the concentration of harmful emissions has the potential to significantly improve cardiovascular health, particularly for children, who are more vulnerable to the impact of air pollution (UNICEF 2017). At UNICEF and Solve for Good, we wanted to put the question of children’s health to the global air quality data UNICEF collected during lockdown, to identify whether a mass transition to low-carbon energy sources for industry and vehicles brings a significant improvement in child health.

Our Approach

We take a three-step approach to our analysis:

  1. Develop a large-scale model aimed at providing air quality predictions across geographic regions using widely available data sources.
  2. Develop a geospatial visualization platform to aid in exploration of results.
  3. Fine-tune the results of the large-scale model using location-specific heterogeneous data sources for more accurate local-level predictions, and correlate the results with child health indicators.

We focus on the first two steps in this article series. In this first article, we discuss the importance of air quality monitoring, the data sources available, the data setup and some exploratory results in the current context. The second article will discuss details and results pertaining to the global-level model. The third article will cover visualization and discussion of results, as well as pointers towards future work on local models and on incorporating child health data.

Data sources

We use ground-level PM2.5 concentrations measured by 1600 air quality stations from January 2019 to September 2020 as our target variable. To predict PM2.5 concentrations, we extract weekly averaged time-series values for AOD, NO2, land use and precipitation from satellite data in Google Earth Engine. We import data from 1st January 2019 to 11th September 2020 from the following satellites:

Table 1: Details on the datasets used.

Data Processing

The satellites collect data at different temporal and spatial granularities (Table 1). To standardize the satellite data, we aggregate it by weekly average (Figure 4). To generate these statistics, we built a VRT file (VRT is a GDAL format driver that allows a virtual GDAL dataset to be composed from other GDAL datasets), which helped us process the data faster.
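The weekly aggregation can be pictured as a per-pixel average over all scenes that fall in the same week. The following is a minimal sketch in plain Python, not the pipeline used in this project: it assumes each scene is a small grid (a list of rows) with None marking missing pixels, and that weeks start on Monday. The function name weekly_composites and the toy grids are hypothetical, and a real workflow would operate on GDAL rasters rather than lists.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_composites(scenes):
    """Average same-shaped grids per week, pixel by pixel, skipping gaps.

    scenes: iterable of (day, grid); grid is a list of rows and None marks
    a missing pixel (for example, cloud cover).
    Returns {monday: grid_of_means}; a pixel with no valid observation in a
    given week stays None.
    """
    by_week = defaultdict(list)
    for day, grid in scenes:
        monday = day - timedelta(days=day.weekday())  # assumed Monday week start
        by_week[monday].append(grid)

    composites = {}
    for monday, grids in by_week.items():
        rows, cols = len(grids[0]), len(grids[0][0])
        out = []
        for r in range(rows):
            out_row = []
            for c in range(cols):
                valid = [g[r][c] for g in grids if g[r][c] is not None]
                out_row.append(sum(valid) / len(valid) if valid else None)
            out.append(out_row)
        composites[monday] = out
    return composites


# Toy data: two scenes in the week of 2 March 2020, one in the following week.
scenes = [
    (date(2020, 3, 2), [[1.0, None], [3.0, None]]),
    (date(2020, 3, 5), [[3.0, 2.0], [None, None]]),
    (date(2020, 3, 10), [[5.0, 5.0], [5.0, 5.0]]),
]
composites = weekly_composites(scenes)
```

Averaging only the valid observations per pixel lets sensors with different revisit times share one weekly grid, at the cost of some weekly values resting on a single observation.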

Using the VRT file, we extracted weekly averaged data at two different geographic levels: the local level and the city-wide level. At the local level, we extract weekly averaged satellite data values using a local mask defined as a 75m buffer around a ground sensor location. At the city-wide level, we mask the images with city polygons from the GADM shapefiles (GADM 2020) and obtain weekly averages of each satellite variable per city. In addition to the satellite data used as features in the machine learning model, we include a number of secondary data sources at different spatial and temporal resolutions, such as the COVID-19 “Stringency Index”. See Table 1 for details on the data sources.
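To make the local mask concrete, here is a simplified sketch of averaging raster values inside a 75m buffer around a sensor. It is an illustration under stated assumptions rather than the masking code used here: pixels are represented by their centre coordinates, membership in the buffer is decided by great-circle distance to the pixel centre, and the names haversine_m and local_mean as well as the sample coordinates and values are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def local_mean(sensor, pixels, radius_m=75.0):
    """Mean of valid pixel values whose centres lie within radius_m of sensor.

    sensor: (lat, lon); pixels: list of (lat, lon, value), value None = missing.
    Returns None when no valid pixel centre falls inside the buffer.
    """
    inside = [
        value
        for lat, lon, value in pixels
        if value is not None
        and haversine_m(sensor[0], sensor[1], lat, lon) <= radius_m
    ]
    return sum(inside) / len(inside) if inside else None


# Hypothetical sensor and pixel centres (offsets of roughly 33 m, 56 m, 222 m).
sensor = (-12.0464, -77.0428)
pixels = [
    (-12.0461, -77.0428, 0.25),
    (-12.0469, -77.0428, 0.75),
    (-12.0484, -77.0428, 0.90),
    (-12.0464, -77.0428, None),
]
local_aod = local_mean(sensor, pixels)
```

When satellite pixels are much coarser than 75m, a buffer like this typically selects a single pixel or none, so a production pipeline would more likely intersect the buffer polygon with pixel footprints than test pixel centres.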

Furthermore, we preprocessed the PM2.5 station data to ensure we had a reliable training dataset. We removed PM2.5 values greater than or equal to 3000 and values below 0, then resampled the remaining values to weekly averages (with weeks starting on Monday). Using the 75m buffer and the city-wide polygons, we averaged the weekly PM2.5 values within a 75m radius of the sensor location, and averaged the weekly PM2.5 values over the city extent if the point lies within a GADM city. Our final preprocessing step was to round the latitude and longitude of each sensor location to four decimal places and merge the city and country identifiers into the sensor point data.
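The station cleaning steps can be sketched as follows. This is a minimal illustration in plain Python, assuming readings arrive as (lat, lon, day, pm25) tuples; the tuple layout, the function names and the sample readings are hypothetical, and only the thresholds, the Monday week start and the four-decimal rounding come from the description above.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

MAX_VALID_PM25 = 3000.0  # readings at or above this value are dropped


def week_start(day: date) -> date:
    """Return the Monday that starts the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_pm25(readings):
    """Filter implausible readings, then average per sensor and week.

    readings: iterable of (lat, lon, day, pm25) tuples (assumed layout).
    Returns {(lat, lon, monday): mean_pm25}, coordinates rounded to 4 decimals.
    """
    buckets = defaultdict(list)
    for lat, lon, day, pm25 in readings:
        if pm25 < 0 or pm25 >= MAX_VALID_PM25:
            continue  # outside the accepted range
        key = (round(lat, 4), round(lon, 4), week_start(day))
        buckets[key].append(pm25)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical readings from one sensor.
readings = [
    (-12.04637, -77.04279, date(2020, 3, 2), 30.0),    # Monday
    (-12.04637, -77.04279, date(2020, 3, 4), 20.0),    # same week
    (-12.04637, -77.04279, date(2020, 3, 5), 3000.0),  # dropped: >= 3000
    (-12.04637, -77.04279, date(2020, 3, 6), -1.0),    # dropped: below 0
    (-12.04637, -77.04279, date(2020, 3, 9), 12.0),    # following Monday
]
weekly = weekly_pm25(readings)
```

Rounding the coordinates before grouping means that readings whose reported positions differ only beyond the fourth decimal place are treated as the same sensor.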

Figure 4a: “Local” extraction of AOD* from a 75m buffer around a point. *AOD (aerosol optical depth) is a well-known proxy for PM2.5 (Kumar et al. 2011).
Figure 4b: “City-level” extraction of AOD for each city globally.

Exploratory analysis

We conclude this article with some exploratory comparisons of OpenAQ PM2.5 readings to contextualize our work. As the two panels of Figure 5 show, pre- and post-lockdown air quality readings for Lima, Peru replicate known patterns, demonstrating an improvement in air quality after lockdown. Note that the PM2.5 reading locations are heterogeneous: the sets of air quality monitoring stations for which we have data on these two dates are not identical. Comparing observations from two different sources (OpenAQ vs. drone) in Figure 6, we see a high degree of agreement, with discrepancies at the tails. These discrepancies, and the data heterogeneity mentioned above, are some of the challenges we shall attempt to tackle through our machine learning model.

Figure 5: Ground-level air quality comparison in Lima, Peru: PM2.5 levels (a) before and (b) after COVID19 lockdown. Sources: QAIRA (UNICEF Venture Fund), PlumeLab, OpenAQ.
Figure 6: Comparison of PM2.5 readings from two different sources: Remote Sensing (OpenAQ) and ground-level (Drone).

In this article, we have introduced the problem of developing a global-level machine learning model for accurately predicting air quality, especially in the wake of COVID19-imposed lockdowns, with multiple operational objectives. In the next article, we shall incorporate the bi-level satellite data and features extracted from the secondary data sources in Table 1 into an XGBoost model to estimate PM2.5 globally. These secondary data sources include additional features, such as population density and 2020 COVID19 lockdown data, that are known to significantly affect ground-level PM2.5 concentrations (Hale et al. 2020).

References

Borneman, E. Rebounding Pollution Levels Mark End of Coronavirus Lockdowns. 2020.
Rees, N. Danger in the air: How air pollution may be affecting the brain development of young children around the world, Data, Research and Policy Working Paper, November 2017 (Accessed 22nd November 2020).
GADM data. version 3.6, 2020.
Hale, T.; Angrist, N.; Boby, T. et al. Variation in government responses to COVID-19. Working paper BSG-WP-2020/032, version 10, 2020.
Health Effects Institute. State of Global Air 2019, 2019.
Kumar, N.; Chu, A.; Foster, F. An empirical relationship between PM2.5 and aerosol optical depth in Delhi Metropolitan. Atmos Environ 2011, 41(21):4492–4503.
Mahato, S.; Pal, S.; Ghosh, K. G. Effect of lockdown amid COVID-19 pandemic on air quality of the megacity Delhi, India. Science of The Total Environment 2020, 730:139086.
Rees, N. Danger in the air: How air pollution can affect brain development in young children, UNICEF Division of Data, Research and Policy, December 2017.
World Health Organization. Air pollution and child health: prescribing clean air, WHO reference number: WHO/CED/PHE/18.01, 2018.
Venter, Z. S.; Aunan, K.; Chowdhury, S.; Lelieveld, J. COVID-19 lockdowns cause global air pollution declines. Proc Natl Acad Sci USA 2020, 117(32):18984–18990.
