The following is a breakdown of a project I completed at the Metis Data Science Bootcamp.

Goal: To web scrape sneaker data and use linear regression to predict sneaker prices

Tools: Python, Pandas, Numpy, Selenium, BeautifulSoup, Scikit-Learn, Statsmodels, Matplotlib, Seaborn

Features: Age, # of Sales, Volatility, Price Premium, Brand Name

Target: Sale Price

Data Cleaning

The data came in very messy. All of the columns were the wrong type, there were missing values (NaNs), and there were stray placeholder characters (such as ‘?’ and ‘/’), so all of that had to be cleaned up. Rows with NaNs and outliers were then removed.
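
As a rough illustration of this kind of cleanup (not the exact notebook code), a minimal pandas sketch might look like the following. The column names `sale_price` and `num_sales`, the sample values, the helper name `clean_sneaker_frame`, and the 1.5 × IQR outlier rule are all assumptions made for the example.

```python
import pandas as pd


def clean_sneaker_frame(raw, numeric_cols):
    """Coerce string columns to numbers, then drop NaN rows and IQR outliers."""
    df = raw.copy()
    for col in numeric_cols:
        # Strip currency symbols, thousands separators, and percent signs.
        stripped = df[col].astype(str).str.replace(r"[$,%]", "", regex=True)
        # errors="coerce" turns placeholders such as '?' or '/' into NaN.
        df[col] = pd.to_numeric(stripped, errors="coerce")

    # Drop any row that is missing a value in one of the numeric columns.
    df = df.dropna(subset=numeric_cols)

    # Keep rows that lie within 1.5 * IQR of the quartiles in every column.
    q1 = df[numeric_cols].quantile(0.25)
    q3 = df[numeric_cols].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    in_range = ((df[numeric_cols] >= lower) & (df[numeric_cols] <= upper)).all(axis=1)
    return df[in_range].reset_index(drop=True)


# Hypothetical scraped rows: everything arrives as text.
raw = pd.DataFrame({
    "sale_price": ["$120", "?", "$135", "$128", "$140", "$131", "$5,000"],
    "num_sales": ["10", "12", "/", "11", "13", "12", "11"],
})
cleaned = clean_sneaker_frame(raw, ["sale_price", "num_sales"])
print(cleaned)
```

The IQR rule is only one common way to flag outliers; a z-score cutoff or a domain-specific price cap would slot into the same place.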

Exploratory Data Analysis

Here, I explored the data to discover correlations. I also transformed the Brand Name column into dummy variables.
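
To make these two steps concrete, here is a small sketch of a correlation check and a dummy-variable encoding in pandas. The column names (`brand`, `age_days`, `sale_price`) and the values are made up for illustration, and dropping the first dummy level is my assumption rather than something the write-up specifies.

```python
import pandas as pd

# Hypothetical cleaned data with one categorical and two numeric columns.
df = pd.DataFrame({
    "brand": ["Nike", "Adidas", "Jordan", "Nike", "Adidas", "Jordan"],
    "age_days": [30, 400, 120, 60, 900, 15],
    "sale_price": [220.0, 150.0, 310.0, 205.0, 140.0, 330.0],
})

# Pearson correlation of every numeric column with the target.
numeric = df.select_dtypes("number")
corr_with_target = numeric.corr()["sale_price"].sort_values(ascending=False)
print(corr_with_target)

# One-hot encode the brand column. drop_first=True removes one level so the
# dummies are not perfectly collinear with the regression intercept.
encoded = pd.get_dummies(df, columns=["brand"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
```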

Linear Regression Analysis

The following models were used in the analysis: Simple OLS, Ridge Regression, Polynomial Regression, and Polynomial with Ridge Regression.

All of these models were evaluated using 5-fold cross-validation.
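
A comparison along these lines can be sketched with Scikit-Learn as follows. This is an illustration on synthetic data, not the notebook's code: the polynomial degree of 2, the Ridge `alpha` of 1.0, the scaling step, and the R² scoring are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the sneaker features and sale price.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(scale=0.1, size=200)

# The four model families, each wrapped so preprocessing is refit per fold.
models = {
    "ols": LinearRegression(),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "poly": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False), LinearRegression()
    ),
    "poly_ridge": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        StandardScaler(),
        Ridge(alpha=1.0),
    ),
}

# Reuse the same shuffled 5-fold split for every model so scores are comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```

Putting the scaler and polynomial expansion inside the pipeline keeps each fold's held-out data from leaking into the preprocessing fit.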

Feature Engineering: I included a multiplicative interaction between # of Sales and Price Premium. All of the above models were tested before and after the inclusion of feature engineering.
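
A multiplicative interaction is simply the element-wise product of the two columns, added as a new feature. In the sketch below the column names `num_sales`, `price_premium`, and `sales_x_premium` and the values are placeholders chosen for the example.

```python
import pandas as pd

# Hypothetical values for the two features being interacted.
df = pd.DataFrame({
    "num_sales": [10, 25, 40],
    "price_premium": [0.5, 1.25, -0.25],
})

# The interaction term lets the effect of price premium vary with sales volume.
df["sales_x_premium"] = df["num_sales"] * df["price_premium"]
print(df)
```

The "before and after" comparison then amounts to fitting each model once on the original columns and once with `sales_x_premium` included.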


Please see the Jupyter Notebook below for the code.

Data Scientist with a background in banking and finance. I love statistics, programming, and machine learning.