The Statistics Seminar speaker for Wednesday, February 5, 2020, is Jonathan Stewart, a Ph.D. candidate in the Department of Statistics at Rice University, advised by Dr. Michael Schweinberger. His research interests include statistical methods and theory for dependent data in complex systems, with a focus on statistical network analysis and network data. His applications have included social network analysis, brain networks, and, most recently, HIV-1 transmission networks.
Talk: A probabilistic framework for models of dependent network data, with statistical guarantees
Abstract: The statistical analysis of network data has attracted considerable attention since the turn of the twenty-first century, fueled by the rise of the internet and social networks and applications in public health (e.g., the spread of infectious diseases through contact networks), national security (e.g., networks of terrorists and cyberterrorists), economics (e.g., networks of financial transactions), and more. While substantial progress has been made on exchangeable random graph models and random graph models with latent structure (e.g., stochastic block models and latent space models), these models make explicit or implicit independence or weak dependence assumptions that may not be satisfied by real-world networks, because network data are dependent data. How to construct models of random graphs with dependent edges without sacrificing computational scalability and statistical guarantees is an important question that has received scant attention.
In this talk, I present recent advancements in models, methods, and theory for modeling networks with dependent edges. On the modeling side, I introduce a novel probabilistic framework for specifying edge interactions that allows dependence to propagate throughout the population graph, with applications to brokerage in social networks. On the statistical side, I obtain the first consistency results in settings where dependence propagates throughout the population graph and the number of parameters increases with the number of population members. The key to my approach lies in establishing a direct link between the convergence rate of maximum likelihood estimators and the scaling of the Fisher information matrix. Last, but not least, on the computational side, I demonstrate how the conditional independence structure of models can be exploited for local computing on subgraphs, which facilitates parallel computing on multi-core computers or computing clusters.