Cornell University
Cornell Statistics and Data Science

Jacob Bien

Jacob Bien is an assistant professor of data sciences and operations at the University of Southern California-Marshall. Dr. Bien's research focuses on statistical machine learning and, in particular, the development of novel methods that balance flexibility and interpretability for analyzing complex data. He combines ideas from convex optimization and statistics to develop methods that are of direct use to scientists and others with large datasets. His work has been supported by an NSF CAREER award and a three-year NSF grant on high-dimensional covariance estimation. He serves as an associate editor of Biometrika and Biostatistics. Before joining USC, he was an assistant professor at Cornell.

Talk: High-Dimensional Variable Selection When Features are Sparse

Abstract: It is common in modern prediction problems for many predictor variables to be counts of rarely occurring events. This leads to design matrices in which a large number of columns are highly sparse. The challenge posed by such "rare features" has received little attention despite its prevalence in diverse areas, ranging from biology (e.g., rare species) to natural language processing (e.g., rare words). We show, both theoretically and empirically, that not explicitly accounting for the rareness of features can greatly reduce the effectiveness of an analysis. We next propose a framework for aggregating rare features into denser features in a flexible manner that creates better predictors of the response. An application to online hotel reviews demonstrates the gain in accuracy achievable by proper treatment of rare words. This is joint work with Xiaohan Yan.
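
As a rough illustration of what "aggregating rare features into denser features" can mean, the sketch below sums groups of sparse count columns and compares column densities before and after. This is not the framework proposed in the talk: the toy matrix, the hand-picked grouping, and the helper names `column_density` and `aggregate_columns` are all hypothetical and chosen only to show the effect on sparsity.

```python
# Illustrative sketch only: the grouping below is a hypothetical, hand-picked
# assumption, not the aggregation framework proposed in the talk.

def column_density(X):
    """Fraction of nonzero entries in each column of a row-major count matrix."""
    n_rows = len(X)
    n_cols = len(X[0])
    return [sum(1 for row in X if row[j] != 0) / n_rows for j in range(n_cols)]

def aggregate_columns(X, groups):
    """Sum the columns listed in each group into one aggregated count column."""
    return [[sum(row[j] for j in group) for group in groups] for row in X]

# Toy review-by-word count matrix: 6 reviews, 4 rare words (made-up data).
X = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 2],
    [0, 1, 0, 0],
]

# Hypothetical grouping: words 0-1 and words 2-3 are assumed to be related.
groups = [[0, 1], [2, 3]]

X_agg = aggregate_columns(X, groups)
density_before = column_density(X)
density_after = column_density(X_agg)

print("density before aggregation:", density_before)
print("density after aggregation: ", density_after)
```

In this toy case each aggregated column is nonzero in more rows than any of the columns it replaces. How the groups should be chosen is exactly the question the abstract's framework addresses, and it is left open here.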
