By Louis DiPietro
“The best thing about being a statistician is that you get to play in everyone’s backyard.”
These timeless words from John Tukey, one of the most distinguished scholars in the field of statistics, have crystallized into an axiom among modern data scientists like Y. Samuel Wang. One of the newest faculty members in Cornell University's Department of Statistics and Data Science, Wang references the oft-cited quote when articulating what interests him most about statistics.
The diversity of problems that statistics can help address and its applicability to almost any other field are what ultimately drew Wang to become a statistician and, as of this fall, an assistant professor in Statistics and Data Science (SDS).
“Statistics is used everywhere and in every area of scientific inquiry,” said Wang, a scholar in graphical models and causal discovery. “We’re not tied to any scientific problem. We can be experts in data.”
Wang is one of two hires made by the department this recruiting season, along with Dana Yang, most recently a postdoctoral researcher at the Fuqua School of Business at Duke University. Wang and Yang are among the 14 new faculty added this fall to the department's home college, the Cornell Ann S. Bowers College of Computing and Information Science. Wang arrives at Cornell after completing a postdoctoral appointment at the University of Chicago's Booth School of Business. He earned a PhD in Statistics at the University of Washington in 2018 and worked as a management consultant prior to pursuing a doctoral degree.
After earning his bachelor's degree in applied mathematics and economics at Rice University, Wang got an initial taste of what statistics was and what data could reveal while working in industry. As a management consultant, he said, he was routinely dropped into assorted projects and tasked with finding solutions; that was the nature of the job.
“I was often tasked with discovering patterns in the data and unlocking insights that could lead to business recommendations. Initially, you’re not expected to have any expertise in a specific business domain, but you do have to be curious, hardworking, and able to analyze solutions,” he said. “We would start with a basic analytical framework, and adapt it to the specifics of the industry or company we were working with. When done well, statistics can be similar.”
Statisticians typically think about the world in a probabilistic framework, Wang continued, but when analyzing data, the particular context in which the data was generated shapes the specific tools and methods used.
His experience working with data spurred the decision to pursue doctoral studies in statistics at the University of Washington, where he was advised by Mathias Drton.
Wang's primary research area is graphical models and a subfield called causal discovery. Practitioners of causal discovery take observational data on multiple variables (say, proteins in a cell, impulses measured in different areas of the brain, or even different stocks in the stock market) and test which variables might have a direct effect on other variables.
“Oftentimes, scientists aren’t just concerned with correlations, but want to know what variables have a causal effect on other variables. For instance, if I intervene in a cell on this protein, what downstream effects will that result in?” he explained. “Of course, randomized experiments are the most reliable way to uncover causal relationships.”
Unfortunately, he continued, there are many situations where observational data is the best we can get because experiments are too expensive or infeasible.
“A lot of my work seeks to prove what assumptions are helpful in teasing apart simple correlation from causation when you only have observational data,” Wang said.
His applied interests vary but tend to bend toward the social sciences. One of his most recent research projects looks at gender homophily in the co-authorship of research papers across fields. Why is there a tendency for researchers to co-author papers with individuals of the same gender?
“When observing the scientific corpus, we see that men disproportionately coauthor with other men,” he said. “One hypothesis is that men tend to study the same types of problems.”
For instance, historically, electrical engineers have been disproportionately men, whereas sociology has a relatively higher proportion of women, he said. So if electrical engineers tend to co-author with other electrical engineers and sociologists tend to co-author with other sociologists, we would expect this to explain gender homophily in co-authorships.
“However, our findings suggest that, no, this hypothesis does not fully explain what we’re seeing,” he said.
Wang chose Cornell Statistics and Data Science after an unconventional interview process done completely online due to Covid precautions. From Chicago, Wang completed his Statistics Seminar remotely and interviewed with Cornell faculty via Zoom. He’s yet to see the Cornell campus firsthand as a professor, relying on YouTube videos and distant memories of a childhood visit to campus as a 12-year-old (“I remember the Dairy Bar,” he said).
But despite a fully remote introduction to Ithaca, NY, Cornell University, and Statistics and Data Science, Wang said all three came highly regarded, which partly informed his decision to join SDS.
"The Stats department has a broad breadth of research that isn't narrowed to just theory or applied work. There are scholars across the department who do a wide range of theory and applied work," said Wang, who will remotely lead an introductory course for new SDS graduate students this fall. "It's hard to get a real sense of a place through Zoom, but I just felt really comfortable about Cornell and the department. I could tell via Zoom that faculty enjoyed being around each other. It feels like a collegial place, where people are supportive and happy to be in the department."
While currently working remotely, Wang will be on campus periodically this semester. His office is in Comstock Hall, room 1186.
Louis DiPietro is the communications specialist for the departments of Statistics and Data Science and Information Science.