MPS projects further prepare students for work in industry.

Each MPS student completes a two-semester project, which is supported by core courses.  The project involves large-scale data analysis and is often completed in collaboration with a private company.

Four projects from the 2019-2020 academic year

Virtual Reality (VR) Simulator Operator Behavior Analysis

Project team: Yukun Cheng, Zackary Downey, Shan Lu, and Yangyang Wang

Sponsor: Raymond Corporation

Advisor: Dr. Xiaolong Yang

Recognition: Best Project Award recipient

In January of 2018, Raymond launched the Raymond VR Simulator, a product that enables operators to learn how to operate a Raymond forklift in a simulated warehouse environment while standing on a real truck.  The system connects directly to the lift truck through the Simulation Port, disabling all vehicle motion so that manipulating the controls drives a simulated truck in virtual reality.  Operators complete a series of guided lessons in VR and receive a score at the end of each lesson.  A recording of the operator completing the lessons is available for replay so it can be reviewed with a trainer for further coaching.

One major appeal of the VR simulator is the amount of data it can collect on operators and the insights this data can provide. It enables customers to continuously improve operator performance by identifying trends and correlations between leading indicators, such as an operator's performance on the simulator, and the operator's behavior in the real warehouse. The objective of this project is to analyze the data in the replay files to determine whether any of it correlates with operator behavior.

In our paper, we introduce The Raymond Corporation, a major forklift manufacturer located in Greene, NY; its VR Simulator; and the data analysis performed on the simulator's output. To meet the requirements of the project, we used a variety of methods to answer a list of potential questions provided by The Raymond Corporation.

First, we describe some of the questions provided. Then, we describe the data sets provided, both core and supplementary, as well as updates made over the semester. Next, we walk through some initial data exploration techniques and results, followed by an initial look at using supervised machine learning to answer one of the main questions. We propose a new variable called the "Aggressiveness Index", which allows us to quantify the activity of a user at a specific time and to simplify the data set. Our supervised machine learning methods include (but are not limited to) LDA, Naive Bayes, k-nearest neighbors (KNN), Hidden Markov Models, and XGBoost.

Because of class imbalances in the data, described later, we propose an alternative data grouping technique to improve the algorithm inputs. Using these new inputs, we present improved supervised machine learning methods for predicting whether a user incurs penalties at specific times. Finally, we use unsupervised machine learning methods to answer some of the other originally proposed questions. These methods center mostly on clustering and on visualizing the proposed clusters to map certain characteristics to types of users.

We summarize our work, provide final conclusions, and describe some possible future enhancements to our efforts. In summary, we answer four of the main questions provided by Raymond Corporation:

  • Is there a relationship between head rotation, vehicle position, and penalties?
  • Can we use request variables to cluster users?
  • Is there a relationship between horn use and the total score for a user?
  • Can we predict success or failure from all provided variables?

The proposed answers to these questions may inform the company's future data analysis and data usage.

Default Risk: Swapping how we Measure Loan to Values

Project team: Junyi Bao, Yutong Hou, Jacob Schoifet, and Yuyang Ye

Sponsor: Home Diversification Corp.

Advisor: Dr. Sumanta Basu

Recognition: Best Project Award recipient

Home ownership is one of the most important ways for most families to create wealth; however, it carries risks. Mortgage defaults negatively impact both lenders and homeowners. One of the most meaningful indicators of default likelihood is loan-to-value ("LTV"), which is the loan amount owed by the homeowner as a percentage of the value of the house. More specifically, while the LTV on the date of purchase is a strong indicator of risk, the LTV at every point going forward from the purchase date (the "mark-to-market LTV" or "MLTV") indicates at any time whether the homeowner has more value in the home than they owe the lender.

Our report investigates how MLTV can impact default rates and how swapping the mechanism used to measure MLTV can help lower the overall risk of default. We analyzed Freddie Mac loan-level data from 2008 to 2010 and built graphs showing the risk-reducing potential of swapping the measuring mechanism from local home price indices to a national one. We also used graphs to show the potential to reduce residential mortgage default credit losses through the swap, and we then built a logistic regression model to capture the effect of MLTV on default rates. We tested our model against data from 2014 to 2016. The model incorporated LTV and combined loan-to-value (CLTV) at origination, MLTV, origination channel, debt-to-income ratio, credit score, and a first-time homebuyer flag. Finally, we discussed further studies that could explore more datasets and determine whether swapping to the national index is ideal.
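As a minimal illustration of the quantities described above (not the team's actual code), LTV and MLTV can be computed as follows. The sketch assumes that MLTV re-estimates the home's current value by scaling its value at origination by the ratio of a home price index (HPI) today to the HPI at origination. The function names and all numbers below are hypothetical and chosen only to show how a local index and a national index can yield different MLTVs for the same loan.

```python
def ltv(loan_balance: float, home_value: float) -> float:
    """Loan-to-value: amount owed as a percentage of the home's value."""
    return 100.0 * loan_balance / home_value


def mark_to_market_ltv(current_balance: float,
                       origination_value: float,
                       hpi_at_origination: float,
                       hpi_now: float) -> float:
    """Hypothetical MLTV: current balance over an HPI-adjusted home value.

    Assumption: the current value is estimated by scaling the origination
    value by how much the chosen home price index has moved since then.
    """
    estimated_value = origination_value * hpi_now / hpi_at_origination
    return ltv(current_balance, estimated_value)


if __name__ == "__main__":
    # Hypothetical loan: $180,000 borrowed on a $200,000 home.
    print(f"LTV at origination: {ltv(180_000, 200_000):.2f}%")  # 90.00%

    # Later, $170,000 is still owed. Suppose the local HPI fell from 100 to 80
    # while the national HPI fell only from 100 to 95 (made-up numbers).
    local = mark_to_market_ltv(170_000, 200_000, 100, 80)
    national = mark_to_market_ltv(170_000, 200_000, 100, 95)
    print(f"MLTV with local HPI:    {local:.2f}%")     # 106.25%
    print(f"MLTV with national HPI: {national:.2f}%")  # 89.47%
```

In this made-up case the local index marks the homeowner as owing more than the home is worth (MLTV above 100%), while the national index does not, which is the kind of difference the swap between index types is meant to capture.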

End-To-End Analysis of Energy Data

Project team: Hyun Do Cha, Ruoqi Ge, Shan Huang, and Jian Shi 

Sponsor: Ursa Space Systems

Advisor: Dr. Sumanta Basu

Recognition: Honorable Mention Award recipient

Abstract: Understanding the movements of global oil flow is critical to forecasting trends in the energy industry and its associated economic sectors. This project assessed the viability of using satellite-imaged oil inventory figures, energy analytics data on offtakes and loads, and vessel traffic records of nearby ships to compute pipeline flows and investigate oil terminal operations. Spreadsheet modelling methods were applied to reconfigured data to estimate the total crude oil flow through the Sumed pipeline over 2019. Additionally, geospatial visualization techniques and canonical correlation analyses were employed to assess whether licensing or ownership relations exist between individual oil tanks and shipowning entities. The results indicate that tank fill data can yield a reliable lower-bound estimate of total oil flow, with robustness improving with either higher data granularity or less frequent internal fluctuations in port inventory totals. Furthermore, most ships or shipowning entities were found to be unlikely to be extracting crude oil from licensed tanks. These conclusions offer insights into the Sumed Company's Egyptian closed oil system and illuminate the potential of a cost-effective, high-frequency imaging project over a shortened period to accurately model oil flow and refine a list of ship-tank pairs.

Financial Markets and Housing Sector: A Cross-Country Empirical Study

Project team: Harshita Garg, Jiayin Liu, Shutong Wu, and Yi Zhu

Sponsor: Dr. Yunhui Zhao (IMF Economist)

Advisor: Dr. Yang Ning

Recognition: Honorable Mention Award recipient

Housing is by far the most important asset on households' balance sheets across the world, so it is extremely important to understand the driving forces behind high housing prices. Yet despite forceful and frequent government interventions, housing prices in some countries, such as China, have continued to rise rapidly. Motivated by this dilemma, the project applies a variety of empirical approaches to a cross-country panel dataset, including panel regressions (with variables selected by Lasso regressions), difference-in-differences models (including fixed effect and random effect models), and machine-learning models that ensure higher similarity among the restricted subsample. Results from all these approaches support the findings in Bayoumi, Xie and Zhao (2020), which studies the housing market in China, and suggest that countries with "underdeveloped" or incomplete financial markets (such as shallow bond markets and stock markets plagued by insider trading) tend to experience higher housing price growth, after controlling for other key supply-side and demand-side factors in the housing market. The results imply that to eradicate the root causes of high housing prices, policymakers need to go beyond the housing market itself; instead, it may be desirable to deepen the financial markets, because these markets would help channel financial resources to productive sectors instead of to housing speculation and help enhance the overall efficiency of the economy.

Disclaimer: The views expressed here are those of the author(s) and do not necessarily represent the views of the IMF, its Executive Board, or IMF management.

Five of the best projects from the 2018-2019 academic year
The best project from the 2017-2018 academic year