The Sneaker Game: Datafied
The following is a breakdown of a project I completed at the Metis Data Science Bootcamp.
Goal: Scrape sneaker data from www.StockX.com and use linear regression to predict sneaker prices
Tools: Python, Pandas, NumPy, Selenium, BeautifulSoup, Scikit-Learn, Statsmodels, Matplotlib, Seaborn
Features: Age, # of Sales, Volatility, Price Premium, Brand Name
Target: Sale Price
Data Cleaning
The scraped data came in very messy. Every column had the wrong type, there were missing values and NaNs, and some cells held stray characters (such as ‘?’, ‘/’, etc.). I converted the columns to the correct types, cleaned up the stray characters, and then removed the rows containing NaNs or outliers.
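As a rough illustration of this kind of cleanup, here is a minimal sketch in Pandas. The column names, the sample values, and the IQR rule for outliers are all assumptions made for the example; the write-up does not say which outlier rule was actually used.

```python
import pandas as pd

# Hypothetical raw rows; column names and values are illustrative only.
raw = pd.DataFrame({
    "sale_price": ["$220", "?", "$1,150", "$95", None],
    "num_sales": ["1,204", "310", "/", "57", "12"],
})

def to_numeric_clean(series):
    """Strip '$' and ',' then coerce to float.

    Anything that still cannot be parsed (such as '?' or '/') becomes NaN.
    """
    stripped = series.astype(str).str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(stripped, errors="coerce")

def drop_iqr_outliers(df, column, k=1.5):
    """Keep rows whose value lies within k * IQR of the quartiles.

    The IQR rule is an assumption here, not necessarily the project's rule.
    """
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]

# Fix the types column by column, then drop every row that contains a NaN.
clean = raw.apply(to_numeric_clean).dropna().reset_index(drop=True)
print(clean)
```

Coercing with `errors="coerce"` turns every unparseable cell into NaN, so a single `dropna()` handles both the missing values and the stray characters.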
Exploratory Data Analysis
Here, I explored the data to discover correlations. I also transformed the Brand Name column into dummy variables.
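The two steps above can be sketched roughly as follows. The DataFrame, its column names, and the brand values are made up for illustration, and dropping the first dummy column is my assumption (a common way to avoid perfectly collinear columns in a linear model), not something stated in the project.

```python
import pandas as pd

# Hypothetical cleaned data; names and values are illustrative only.
df = pd.DataFrame({
    "sale_price": [220.0, 95.0, 310.0, 150.0],
    "num_sales": [1204, 57, 860, 300],
    "brand": ["Jordan", "Adidas", "Nike", "Adidas"],
})

# Correlation of each numeric column with the target, strongest first.
numeric = df[["sale_price", "num_sales"]]
corr = numeric.corr()["sale_price"].sort_values(ascending=False)

# One 0/1 column per brand; the first (alphabetical) brand becomes the baseline.
dummies = pd.get_dummies(df["brand"], prefix="brand", drop_first=True, dtype=int)
model_df = pd.concat([df.drop(columns="brand"), dummies], axis=1)
print(corr)
print(model_df)
```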
Linear Regression Analysis
The following models were used in the analysis: Simple OLS, Ridge Regression, Polynomial Regression, and Polynomial with Ridge Regression.
All of these models were evaluated using 5-fold cross-validation.
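A comparison like this can be set up in Scikit-Learn along the following lines. This is only a sketch on synthetic data: the polynomial degree, the Ridge `alpha`, the scaling step, and the R² scoring are assumptions, since the write-up does not list the settings that were used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the sneaker features and sale price.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] ** 2
     + rng.normal(scale=0.1, size=200))

# The four model families named above; hyperparameters are assumed values.
models = {
    "ols": LinearRegression(),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "poly": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False), LinearRegression()
    ),
    "poly_ridge": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        StandardScaler(),
        Ridge(alpha=1.0),
    ),
}

# 5-fold cross-validation: mean R^2 across the folds for each model.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Wrapping the scaler and polynomial expansion in a pipeline means they are refit inside each fold, so no information from the held-out fold leaks into training.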
Feature Engineering: I added a multiplicative interaction term between # of Sales and Price Premium. Each of the above models was tested both with and without this interaction term.
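For reference, a multiplicative interaction is just the element-wise product of the two columns, added as a new feature. In the sketch below, the helper `add_interaction`, the column names, and the values are all hypothetical.

```python
import pandas as pd

def add_interaction(df, col_a, col_b, name):
    """Return a copy of df with a new column equal to col_a * col_b."""
    out = df.copy()
    out[name] = out[col_a] * out[col_b]
    return out

# Hypothetical feature table; names and values are illustrative only.
features = pd.DataFrame({
    "num_sales": [1204, 57, 860],
    "price_premium": [0.35, 1.10, -0.05],
})
with_interaction = add_interaction(
    features, "num_sales", "price_premium", "sales_x_premium"
)
print(with_interaction)
```

Fitting the same model on `features` and on `with_interaction` gives the before-and-after comparison described above.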
Results
Please see the Jupyter Notebook below for the code.