Each MPS student completes a two-semester project, which is supported by core courses. The project involves large-scale data analysis and is often completed in collaboration with a private company.
Five projects from the 2018-2019 academic year
1. Characterizing Patients for Optimal Lung Transplant Strategies
Project team: Eunyoung Kim, Elly Kipkogei, and Yiran Wang
Sponsor: Weill Cornell Medicine
Advisors: Drs. Arindam RoyChoudhury and Xiaolong Yang
Recognition: Best Project Award recipient
Abstract: Lung transplant procedure preference (bilateral, unilateral, or no preference) may affect overall survival. To determine whether such a preference is associated with better survival, we conducted a retrospective cohort study of 8,766 adults aged 18 years or older with interstitial lung disease (ILD) who were listed for lung transplantation in the United States between 2005 and 2014. The cohort was divided into three groups: "restricted" for bilateral lung transplantation only, "unrestricted single" for left or right single lung transplantation only, and "no preference." We used a stratified mixed-effects Cox regression with transplant center as a random effect and with age, race, body mass index, height, ABO blood group, lung allocation score (LAS), and use of mechanical support (mechanical ventilation or extracorporeal support) as time-independent covariates. We also performed separate subgroup analyses by pulmonary artery pressure (PA) and LAS. In the full-cohort analysis, we found no evidence of a difference in survival between the unrestricted preference and bilateral-only preference groups. However, among patients with pulmonary hypertension (PH ≥35 mm Hg), outcomes under unrestricted preference were significantly better than under bilateral-only preference (p ≈ 0.04), corresponding to a 20% reduction in overall hazard. This is an important finding, because transplant centers often prefer to list patients for bilateral lung transplantation based on the general belief that it leads to better survival.
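In a Cox model, a percentage reduction in hazard is read directly off the hazard ratio (HR) between two groups, where HR = exp(beta) for a fitted coefficient beta. A reduction of r% corresponds to HR = 1 - r/100, so the reported 20% reduction implies a hazard ratio of roughly 0.80 for unrestricted versus bilateral-only preference. The sketch below shows only that conversion; the coefficient value is illustrative and is not taken from the study's fitted model.

```python
import math

def hazard_ratio(beta: float) -> float:
    """Hazard ratio implied by a Cox regression coefficient: HR = exp(beta)."""
    return math.exp(beta)

def percent_hazard_reduction(hr: float) -> float:
    """Percentage reduction in hazard relative to the reference group."""
    return (1.0 - hr) * 100.0

# Illustrative only: a hypothetical coefficient of log(0.8) for
# "unrestricted" vs. "bilateral only" (the reference group).
beta = math.log(0.8)
hr = hazard_ratio(beta)
print(round(hr, 2), round(percent_hazard_reduction(hr), 1))  # 0.8 20.0
```

A coefficient of zero gives HR = 1 (no difference), and negative coefficients give HR < 1, that is, lower hazard than the reference group.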
2. Scheduled Maintenance Time Period Analysis for Counterbalance Models
Project team: Zhenjie Shen, Yash Shah and Yiting Zhong
Sponsor: The Raymond Corporation
Advisor: Dr. John Bunge
Recognition: Best Project Award recipient
Abstract: Maintenance is a critical concern over the lifetime of a truck. It includes both scheduled maintenance (SM) and parts repairs. It is important to determine whether a truck should participate in a scheduled maintenance plan and, if so, what the optimal maintenance interval is. In this project, two datasets are processed and analyzed: one containing regular scheduled maintenance records for counterbalance forklifts, and one containing other parts and service records. Trucks are divided into SM and non-SM groups according to whether they participate in an SM plan, and the two groups are compared. Next, for SM trucks, baseline maintenance intervals (such as 30, 60, and 90 days) are identified from the actual number of days between services. The distribution of interval lengths is found to be close to a Burr distribution, and the intervals are then classified into seven groups. Customer characteristics and truck models are further compared across interval groups. The total cost of ownership is also calculated for each truck and compared across truck demographics and interval groups. A Kolmogorov-Smirnov (K-S) test comparing the frequency of failure types with and without scheduled maintenance shows that failure types ELE, LIF, and TVL occur less frequently under a scheduled maintenance plan. Finally, the service interval that minimizes cost for each truck is predicted from truck demographics. A random forest regression model is used to predict the minimum cost per hour; after parameter tuning, the model yields a roughly 54-60% reduction in cost per hour. To produce more realistic results, interval groups 1 and 7 are excluded from an improved model, which yields a 54% reduction in cost per hour, with 60% of trucks assigned to the normal interval groups 2 and 3.
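The two-sample K-S statistic is the largest vertical gap between the empirical cumulative distribution functions (ECDFs) of two samples. The minimal standard-library sketch below computes that statistic, assuming each sample is a list of per-truck counts of one failure type; the counts are made up for illustration and are not the Raymond data. In practice a library routine such as SciPy's `ks_2samp` would also supply the p-value.

```python
from bisect import bisect_right

def ecdf(sorted_sample, x):
    """Fraction of observations in sorted_sample that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_statistic(a, b):
    """Two-sample K-S statistic D = max over x of |F_a(x) - F_b(x)|."""
    sa, sb = sorted(a), sorted(b)
    # Both ECDFs are step functions that only jump at observed values,
    # so the largest gap is attained at one of the pooled observations.
    return max(abs(ecdf(sa, x) - ecdf(sb, x)) for x in sa + sb)

# Hypothetical per-truck counts of a single failure type (e.g. ELE)
with_sm = [0, 0, 1, 1, 2]
without_sm = [1, 2, 2, 3, 4]
print(ks_statistic(with_sm, without_sm))
```

A larger D means the two failure-count distributions differ more; whether the difference is significant depends on the sample sizes, which is what the test's p-value accounts for.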
3. Planted Area Prediction
Project team: Marielle Jurist, Zhixiang Wang and Yige Yin
Sponsor: Gro Intelligence
Advisor: Yang Ning
Abstract: This project aimed to predict 2019 corn planted area using data available on the Gro Intelligence online data platform. It consisted of two parts: building a complete dataset and fitting predictive models. Collecting the data involved setting up a database, handling missing values, and selecting features. Once a comprehensive and usable dataset was built, three types of models were fit: regularized linear regression (Lasso, Ridge, Elastic Net), regression trees (boosting), and time series (multivariate state space model). The regression methods used the entire database, with county name as a high-level categorical variable, to obtain predictions for every county. The results of these two model types showed that the complex, hierarchical dataset could not be fit well by a single model. Fitting a separate model for each county is more time-consuming but more practical. Multivariate state space models with a univariate response and covariates in the observation and process equations were successfully fit. One-year-ahead forecasts from this model showed substantial improvement, and the mean squared errors were low enough to report 2019 predictions for a single county with confidence.
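The per-county strategy and its one-year-ahead evaluation can be illustrated with a deliberately simplified stand-in. Instead of a multivariate state space model, the sketch below fits an ordinary least-squares linear trend to each county's history separately, holds out each county's last year, and scores the resulting one-year-ahead forecasts by mean squared error. The county names and acreages are hypothetical.

```python
from collections import defaultdict

def fit_trend(years, values):
    """Ordinary least-squares fit of value = a + b * year; returns (a, b)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def one_year_ahead_mse(records):
    """Fit a separate trend per county on all but its last year (needs at
    least two training years), forecast that last year, and return the mean
    squared error of those one-year-ahead forecasts across counties."""
    by_county = defaultdict(list)
    for county, year, area in records:
        by_county[county].append((year, area))
    sq_errors = []
    for rows in by_county.values():
        rows.sort()
        *train, (test_year, actual) = rows
        a, b = fit_trend([y for y, _ in train], [v for _, v in train])
        sq_errors.append((a + b * test_year - actual) ** 2)
    return sum(sq_errors) / len(sq_errors)

# Hypothetical (county, year, planted acres) records
records = [
    ("County A", 2015, 100.0), ("County A", 2016, 110.0),
    ("County A", 2017, 120.0), ("County A", 2018, 130.0),
    ("County B", 2015, 200.0), ("County B", 2016, 190.0),
    ("County B", 2017, 180.0), ("County B", 2018, 175.0),
]
print(one_year_ahead_mse(records))  # 12.5
```

Keeping one model per county means each county's level and trend are estimated only from its own history, which is the trade-off the abstract describes: more models to fit, but no need to force heterogeneous counties into one specification.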
4. Disease Analysis using Medicare Data
Project team: Peixin Gao, Haoxuan Guo and Yan Zhang
Sponsor: Trinity Partners
Advisor: Dr. Martin Wells
Abstract: Medicare patient data are a valuable source for disease analysis. Companies and research institutes often seek to better understand a particular disease and how it is coded in Medicare data, for instance how frequently a patient needs services and how much is spent on the patient. In this analysis, we examined a Medicare dataset containing records of patients with Friedreich’s ataxia (FA) as well as patients with various other diseases, in order to identify characteristics that distinguish FA from other diseases. The analysis includes graphs comparing selected statistics for FA patients against those for patients with other forms of ataxia and for the general patient population. The results also describe the steps for building a predictive model to identify the variables in the dataset most strongly associated with whether a patient has FA.
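One simple way to rank variables by their association with a binary diagnosis is the point-biserial correlation, which is the Pearson correlation between each variable and a 0/1 indicator. The sketch below assumes that approach as an illustrative first screen; it is not necessarily the model the team built, and the variable names and values are hypothetical rather than drawn from the Medicare data.

```python
import math

def pearson(x, y):
    """Pearson correlation; with a 0/1 y this is the point-biserial correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_by_abs_correlation(features, label):
    """Return (name, r) pairs sorted by |r| with the label, strongest first."""
    scores = {name: pearson(values, label) for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical patient-level data: 1 = FA, 0 = other diagnosis
has_fa = [1, 1, 1, 0, 0, 0]
features = {
    "visits_per_year": [12, 10, 11, 3, 4, 2],
    "annual_spend_k": [30, 25, 40, 20, 35, 28],
    "age": [25, 30, 22, 70, 65, 68],
}
for name, r in rank_by_abs_correlation(features, has_fa):
    print(f"{name}: {r:+.2f}")
```

Ranking by absolute value keeps strongly negative associations (here, the hypothetical `age`) alongside strongly positive ones; a full predictive model would then account for correlations among the variables themselves.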
5. Statistical Analysis of Financial Planning for Hangar Theatre
Project team: Ryan Nickerson, Linxing Yao and Jialu You
Sponsor: Hangar Theatre
Advisor: Dr. John Bunge
Abstract: Hangar Theatre is a local theater based in Ithaca, New York. The goal of the project was to provide financial analysis of the theatre's sales data to aid future budgeting and schedule planning. The sales dataset provided was very limited, both in the number of entries and in usable covariates; however, after data cleaning and structuring, a multiple linear regression was successfully fit with a high R-squared value and all model terms statistically significant. The results show that musicals generate more gross income than any other type of show at Hangar Theatre, that shows with longer runs generate higher average gross income, and that Hangar Theatre has room to raise its ticket prices. Before settling on the final model, several other techniques were tried, including time series analysis and ridge regression. Based on the final linear model, an Excel template containing the model and key visualizations was built for the theatre to use in future forecasting.
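R-squared measures the share of variance in the response that a fitted model explains, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. A small sketch of that calculation, assuming fitted values from some regression are already available; the gross-income figures are hypothetical.

```python
def r_squared(actual, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical gross income per show (in $1,000s) and fitted values from a regression
actual = [10.0, 12.0, 15.0, 11.0, 17.0]
fitted = [10.5, 11.5, 14.5, 11.5, 17.0]
print(round(r_squared(actual, fitted), 3))  # 0.971
```

With few observations, a high R-squared can partly reflect the number of predictors, which is one reason small datasets like this one call for caution when adding covariates.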