I have a number of research areas and active projects. For a complete description, along with my CV and links to publications, please see my website at the Department of Computational Biology.
In brief, I am interested in the following areas:
Machine Learning: In particular, I focus on the use of heuristic predictive algorithms within more classical statistical methods. Many successful methods in machine learning produce "black box" models that make accurate predictions but cannot be readily interpreted. I am interested in using these methods to answer questions like "Which variables are important in this prediction?" and "How do these two variables combine to produce this prediction?" I want both to give answers to these questions and to provide measures of the strength of evidence behind those answers.
I am also interested in using these methods within statistical models to predict quantities that we can measure only indirectly. For example, in my work with the Laboratory of Ornithology we would like to predict bird migration, but we can observe only where birds are, not their migratory movements.
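One generic way to approach "Which variables are important in this prediction?" for a black-box model is permutation importance. You shuffle one input variable, leave the others alone, and measure how much the prediction error grows. The following is a minimal standard-library sketch of that idea only, not the specific methodology of my research. It assumes the fitted model is any Python callable mapping a row of inputs to a number. The model, data, and helper names (`mse`, `permutation_importance`) are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def mse(model: Callable[[Row], float], X: List[Row], y: List[float]) -> float:
    """Mean squared prediction error of `model` on the data (X, y)."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average increase in MSE when each column of X is randomly shuffled.

    Larger values suggest the model relies more heavily on that variable.
    """
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with column j replaced by its shuffled value.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical "black box": strong dependence on x0, weak on x1, none on x2.
    black_box = lambda x: 3.0 * x[0] + 0.5 * x[1]
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
    y = [black_box(x) for x in X]
    print(permutation_importance(black_box, X, y))
```

Permutation importance only ranks variables by how much the fitted model uses them. It gives no measure of the strength of evidence on its own, so that part of the question needs additional statistical machinery.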
Nonlinear Dynamics: Statistical models are generally built to explain observations and are typically linear models or variants of them. Much of applied mathematics, by contrast, has been developed through first-principles modelling, producing dynamical systems described by ordinary differential equations as well as more complex models. I am interested in developing statistical techniques to interface these dynamical models with data. The problems in this field include parameter estimation, constructing confidence intervals and tests for parameters, assessing goodness of fit and finding ways to improve it, and designing experiments for systems governed by such dynamics.
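The most basic version of the parameter-estimation problem for an ordinary differential equation works like this: solve the equation for a candidate parameter, compare the trajectory with noisy observations, and pick the parameter that minimises the squared error. The sketch below is plain nonlinear least squares under stated assumptions, not any particular method from my research. It assumes a hypothetical logistic-growth model dx/dt = r x (1 - x) with a known initial value, a fixed-step Runge-Kutta solver, and a golden-section search over the single rate parameter r.

```python
import math


def rk4(f, x0, t_grid, theta):
    """Integrate dx/dt = f(x, theta) on t_grid with classical 4th-order Runge-Kutta."""
    xs, x = [x0], x0
    for t0, t1 in zip(t_grid, t_grid[1:]):
        h = t1 - t0
        k1 = f(x, theta)
        k2 = f(x + 0.5 * h * k1, theta)
        k3 = f(x + 0.5 * h * k2, theta)
        k4 = f(x + h * k3, theta)
        x = x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        xs.append(x)
    return xs


def logistic(x, r):
    """Hypothetical example system: logistic growth with rate r and capacity 1."""
    return r * x * (1 - x)


def sse(r, t_grid, y, x0):
    """Sum of squared differences between the ODE trajectory for rate r and data y."""
    return sum((xi - yi) ** 2 for xi, yi in zip(rk4(logistic, x0, t_grid, r), y))


def golden_section(fun, lo, hi, tol=1e-6):
    """Minimise a unimodal scalar function on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if fun(c) < fun(d):
            b, d = d, c            # minimum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d            # minimum lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    t_grid = [0.1 * i for i in range(101)]
    truth = rk4(logistic, 0.1, t_grid, 1.5)        # simulated "true" trajectory
    y = [x + rng.gauss(0, 0.01) for x in truth]     # noisy observations
    r_hat = golden_section(lambda r: sse(r, t_grid, y, 0.1), 0.1, 5.0)
    print(f"estimated r = {r_hat:.3f}")
```

Every evaluation of the objective requires solving the differential equation again. That cost is one of the things that makes confidence intervals, goodness-of-fit assessment, and experimental design harder for these models than for linear ones.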
Robust Statistics: I work on a class of estimators called disparity estimates. These involve first estimating a non-parametric version of the model and then comparing this estimate to a parametric description. The extra smoothness gained from the non-parametric estimate allows you to use a comparison metric (Hellinger distance is the best known of these) that makes your parameter estimates insensitive to outlying data points without giving up statistical precision. These procedures can be readily examined in simple cases (univariate, i.i.d. data); my research aims to extend them to commonly used models: regression, generalized linear models, time series, random effects models, etc.
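The simplest case of this idea is minimum Hellinger distance estimation for univariate i.i.d. count data. Here the empirical proportions serve as the non-parametric estimate; for continuous data, a kernel density estimate would play that role. The sketch below assumes a Poisson model and a plain grid search over the rate. The data and function names are hypothetical, made up to show how two gross outliers affect the sample mean (the Poisson MLE) compared with the Hellinger-based estimate.

```python
import math
from collections import Counter


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability P(X = k) for rate lam > 0, computed on the log scale."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def hellinger_sq(data, lam):
    """Squared Hellinger distance sum_k (sqrt(d_k) - sqrt(f_k))^2 between the
    empirical pmf d of `data` and the Poisson(lam) pmf f.

    Both pmfs sum to one, so this equals 2 - 2 * sum_k sqrt(d_k * f_k),
    and only values k that actually occur in the data contribute.
    """
    n = len(data)
    affinity = sum(math.sqrt(c / n * poisson_pmf(k, lam))
                   for k, c in Counter(data).items())
    return 2.0 - 2.0 * affinity


def min_hellinger_poisson(data, grid=None):
    """Minimum-Hellinger-distance estimate of a Poisson rate by grid search."""
    if grid is None:
        grid = [0.01 * i for i in range(1, 6001)]  # lam in (0, 60]
    return min(grid, key=lambda lam: hellinger_sq(data, lam))


if __name__ == "__main__":
    clean = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7]
    contaminated = clean + [40, 45]  # two gross outliers (illustrative)
    print("MLE (sample mean):", sum(contaminated) / len(contaminated))
    print("Min. Hellinger   :", min_hellinger_poisson(contaminated))
```

The outliers pull the sample mean far from the bulk of the data. They barely change the Hellinger estimate, because an observed value that the fitted model considers nearly impossible contributes almost nothing to the affinity sum.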
Functional Data Analysis: Functional data are high-resolution measurements of repeated processes. Think of motion-capture data from several people performing the same activity (walking, reaching, writing...) or from one person repeating it many times. There are many other areas of application; I work on satellite imaging data and vehicular emissions as particular examples. I maintain the fda library in R, which provides tools to analyze such data. I also develop new methods, including methods for latent functional data, for functional data measured over spatial domains, and for regression models that predict an outcome from functional covariates.
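A regression with a functional covariate, y_i = alpha + integral of x_i(t) beta(t) dt, can be sketched by expanding the coefficient function beta(t) in a small basis. This turns the integral into an ordinary linear regression on basis-integrated features. The sketch below is a minimal pure-Python illustration, not the fda package's interface. It assumes curves sampled on a common grid, a hypothetical polynomial basis, trapezoidal integration, and, for brevity, no smoothing penalty.

```python
import math
import random
from typing import Callable, List, Sequence


def trapezoid(values: Sequence[float], grid: Sequence[float]) -> float:
    """Trapezoidal-rule integral of sampled values over the grid."""
    return sum(0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_functional_regression(curves: List[List[float]], grid: Sequence[float],
                              y: List[float],
                              basis: List[Callable[[float], float]]):
    """Least-squares fit of y_i = alpha + int x_i(t) beta(t) dt,
    with beta(t) = sum_j c_j * basis_j(t). Returns (alpha, [c_j])."""
    phi = [[f(t) for t in grid] for f in basis]
    # Design row: intercept, then the integral of x_i * phi_j for each basis function.
    Z = [[1.0] + [trapezoid([xv * pv for xv, pv in zip(x, p)], grid) for p in phi]
         for x in curves]
    p = len(Z[0])
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    Zty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(p)]
    coef = solve(ZtZ, Zty)  # normal equations; adequate for a small basis
    return coef[0], coef[1:]


if __name__ == "__main__":
    grid = [i / 100 for i in range(101)]
    basis = [lambda t: 1.0, lambda t: t, lambda t: t * t]
    true_beta = lambda t: 1.0 - 2.0 * t + 3.0 * t * t  # hypothetical coefficient function
    rng = random.Random(0)
    curves = []
    for _ in range(40):
        a, b, c = (rng.gauss(0, 1) for _ in range(3))
        curves.append([a + b * t + c * t * t for t in grid])
    y = [0.5 + trapezoid([xv * true_beta(t) for xv, t in zip(x, grid)], grid)
         for x in curves]
    print(fit_functional_regression(curves, grid, y, basis))
```

With real, noisy curves, one would normally smooth the curves first and penalise roughness in beta(t). This sketch leaves both steps out and shows only how the basis expansion reduces the problem to least squares.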
Consulting and Other Applications: In addition to the areas above, part of my job involves providing statistical consulting to the Cornell community, and I have been involved in a wide range of applied problems. I have also worked on Item Response Theory and its applications to analyzing web-browsing behavior, educational testing and medical diagnostics.
Please see my webpage at the Department of Computational Biology for an up-to-date list of publications. You might be interested in:
- Functional Data Analysis with R and MATLAB (with James Ramsay and Spencer Graves), a book on the fda tools.
For a bit of amusement, some creativity from long ago: "The Stanford Statistics Songbook: A Musical Tribute", Technical Report, Department of Statistics, Stanford University, with Armin Schwartzman and Matthew Finkelman.