Demographic Imputation from Online Activity

Students - Xinran Li, Anthony Maers, Mingjue Yin, and Zhun Wang

Advisor - David Matteson

Spring 2014

comScore is a leading internet analytics company that provides marketing data and analytics to many of the world's largest enterprises, agencies, and publishers. It maintains a panel of over one million panelists in the United States who have agreed to let comScore track all of their online activity. In this project, the MPS students used the panelists’ browsing activity to infer their age and gender. The resulting model can be generalized to the broader online population, whose demographics are unknown, for future marketing purposes.

The heavy programming and analytics of this project were carried out in the statistical software R. Because of the nature and large volume of the data, the data were restructured at the outset. Various data mining techniques were explored during modeling, including advanced tree and regression methods. The final regression model, with a built-in selection operator, successfully resolved the problem of having a very large number of variables and produced satisfactory predictions. The results were reasonable and easily interpreted. In addition, constructive suggestions on data utilization were made for the client’s future analysis.
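
The two steps described above, reshaping the browsing data and fitting a regression that selects variables as it fits, can be illustrated with a minimal sketch. It assumes the "built-in selection operator" is an L1 (lasso-style) penalty, which the summary does not state, and it is written in Python for illustration even though the project used R. The site names, panelist IDs, labels, and function names are all invented for this example and are not the project's data or code.

```python
import math

def reshape_long_to_wide(visits, panelists):
    """Pivot (panelist, site, count) rows into one feature row per panelist."""
    sites = sorted({site for _, site, _ in visits})
    column = {site: j for j, site in enumerate(sites)}
    row = {p: [0.0] * len(sites) for p in panelists}
    for panelist, site, count in visits:
        row[panelist][column[site]] += count  # repeated visits are summed
    return sites, [row[p] for p in panelists]

def soft_threshold(value, amount):
    """Shrink value toward zero; anything within `amount` of zero becomes 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def fit_l1_logistic(X, y, penalty, step=0.1, iterations=2000):
    """Proximal gradient descent for L1-penalized logistic regression.

    The intercept is left unpenalized; the soft-threshold step is what
    sets the weights of uninformative sites to exactly zero.
    """
    n, p = len(X), len(X[0])
    intercept, weights = 0.0, [0.0] * p
    for _ in range(iterations):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            z = intercept + sum(w * x for w, x in zip(weights, xi))
            error = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += error / n
            for j in range(p):
                grad_w[j] += error * xi[j] / n
        intercept -= step * grad_b
        weights = [soft_threshold(w - step * g, step * penalty)
                   for w, g in zip(weights, grad_w)]
    return intercept, weights

# Hypothetical toy data: one row per (panelist, site, visit count).
visits = [
    ("p1", "sports.example", 1), ("p1", "sports.example", 1),
    ("p1", "news.example", 1),
    ("p2", "sports.example", 2),
    ("p3", "news.example", 1),
]
# Hypothetical binary demographic label per panelist (p4 has no visits).
labels = {"p1": 1, "p2": 1, "p3": 0, "p4": 0}

panelists = sorted(labels)
sites, X = reshape_long_to_wide(visits, panelists)
y = [labels[p] for p in panelists]
intercept, weights = fit_l1_logistic(X, y, penalty=0.25)

# Sites whose weight survived the penalty are the "selected" variables.
selected = [site for site, w in zip(sites, weights) if w != 0.0]
print(selected)
```

On this toy data the penalty drops `news.example`, which is visited equally by both groups, and keeps only `sports.example`. With thousands of candidate websites, the same mechanism would leave a short, interpretable list of predictive sites.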

In recognition of its thorough analysis, productive results, and well-organized presentation, the project received one of two Best MPS Project Awards for 2014.

Figure below: Selected websites from the final model that match common stereotypes carry substantial predictive power.