Daniel Fink is an applied statistician and senior research associate at the Cornell Lab of Ornithology, where he directs research for the eBird Status and Trends project, a source of high-resolution global biodiversity data describing where bird populations occur and how they change through time. Much of his work has focused on using statistics and machine learning to model complex spatiotemporal ecological signals from volunteer-collected observational data, often in a production-scale environment.
Talk: Estimating Population Change with Citizen Science Data
Abstract: Information on species’ distributions and abundances, and on how they change over time, is central to the study of wildlife populations and their conservation. For many taxa, this information is challenging to obtain at relevant geographic scales. Birds, however, are conspicuous, occur in all habitats, and are enjoyed by tens of millions of people around the world. Begun in 2002, the citizen science project eBird was designed to harness the excitement and passion of birdwatchers to help fill these information gaps. Today, eBird engages more than 1 million volunteers worldwide to function as an on-the-ground avian sensor network. The eBird database currently contains over 1.6 billion bird observations, providing an unparalleled source of fine-scale, year-round biodiversity data.
These species-observation data have great potential for monitoring populations and identifying drivers of population change across broad geographic scales. However, realizing this potential requires methods that can 1) account for heterogeneous patterns of population change that arise when multiple drivers (e.g., changes in land use and climate) affect species populations simultaneously, and 2) control for the many potential confounding sources of variation common in citizen science data sets.
In this presentation we investigate the use of Causal Forests and the R-learner, two-step algorithms designed for heterogeneous effect estimation in observational studies using off-the-shelf statistical and machine learning models. In the first step, nuisance functions (e.g., propensity scores and marginal outcomes) are estimated to isolate the effect of interest, here population change. In the second step, population change is estimated by conditioning on features that capture sources of heterogeneity. The algorithmic flexibility of this approach is attractive because models can be selected to meet inferential objectives, e.g., using machine learning models and large feature sets to estimate complex propensity scores, or using statistical models to investigate hypotheses about population change.
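The two steps can be illustrated with a minimal sketch of an R-learner on simulated data. This is not the speaker's implementation and it does not use eBird data or Causal Forests. Everything in it is an assumption made for illustration: the "treatment" w is a hypothetical indicator for an earlier versus later survey period, x is a single hypothetical site covariate, binned means stand in for the machine learning models that would estimate the nuisance functions, and the second step fits a simple linear form tau(x) = a + b * x. The second step minimizes the squared difference between the outcome residual y - m(x) and the treatment residual w - e(x) multiplied by tau(x), with nuisance estimates cross-fitted on a held-out fold.

```python
import random


def simulate(n, seed=0):
    """Simulate (x, w, y) rows with confounded, spatially varying change."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()                   # hypothetical site covariate in [0, 1)
        e = 0.2 + 0.6 * x                  # later-period surveys favor high-x sites
        w = 1 if rng.random() < e else 0   # 1 = later period, 0 = earlier period
        tau = 1.0 - 2.0 * x                # simulated change: intercept 1, slope -2
        y = 2.0 * x + tau * w + rng.gauss(0.0, 0.5)
        rows.append((x, w, y))
    return rows


def fit_nuisances(rows, n_bins=20):
    """Step 1: binned means of w and y stand in for e(x) and m(x)."""
    totals = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for x, w, y in rows:
        b = min(int(x * n_bins), n_bins - 1)
        totals[b][0] += w
        totals[b][1] += y
        totals[b][2] += 1
    w_bar = sum(w for _, w, _ in rows) / len(rows)
    y_bar = sum(y for _, _, y in rows) / len(rows)
    # Empty bins fall back to the overall means.
    table = [(sw / k, sy / k) if k else (w_bar, y_bar) for sw, sy, k in totals]

    def predict(x):
        return table[min(int(x * n_bins), n_bins - 1)]

    return predict


def r_learner_linear(rows, n_bins=20):
    """Step 2: fit tau(x) = a + b * x on cross-fitted residuals."""
    folds = [rows[0::2], rows[1::2]]
    s11 = s12 = s22 = t1 = t2 = 0.0
    for k in (0, 1):
        predict = fit_nuisances(folds[1 - k], n_bins)  # fit on the other fold
        for x, w, y in folds[k]:
            e_hat, m_hat = predict(x)
            rw, ry = w - e_hat, y - m_hat
            # Accumulate normal equations for regressing ry on (rw, rw * x).
            s11 += rw * rw
            s12 += rw * rw * x
            s22 += rw * rw * x * x
            t1 += rw * ry
            t2 += rw * x * ry
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


def naive_difference(rows):
    """Unadjusted late-minus-early mean difference, ignoring confounding."""
    late = [y for _, w, y in rows if w == 1]
    early = [y for _, w, y in rows if w == 0]
    return sum(late) / len(late) - sum(early) / len(early)


if __name__ == "__main__":
    data = simulate(20000, seed=1)
    a_hat, b_hat = r_learner_linear(data)
    print(f"estimated tau(x) = {a_hat:.2f} + {b_hat:.2f} * x")
    print(f"naive late-minus-early difference = {naive_difference(data):.2f}")
```

In this simulation the fitted coefficients should land near the simulated values a = 1 and b = -2, while the naive difference mixes the change signal with the fact that later surveys over-sample high-x sites. In practice the binned means would be replaced by flexible learners and the linear form by whatever second-step model suits the inferential goal.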
To illustrate the approach, we estimate trends in population abundance for a variety of species using eBird data. We use a simulation study to evaluate the empirical performance of the approach in estimating spatially varying trends in the face of real-world confounding. Then we use eBird data to estimate spatially explicit trends in species abundance and study recent changes in the populations of North American birds. Finally, we will discuss some outstanding ecological and analytical challenges.