Forecasting Google’s Stock Price with ARIMA Modeling
The GitHub repo for this project can be found here.
After the GameStop fiasco with the subreddit r/wallstreetbets, I became very intrigued by the stock market. It promises the opportunity of extreme wealth: if you could be right about just one stock, you could change your life forever. So many people have pursued this dream, and yet so many have had their dreams crushed, much like Tom Brady crushed the Kansas City Chiefs.
Why?
The number of variables that go into the price of a stock is countless. The average human cannot possibly discern how all of them interact; the human mind simply cannot process tens of thousands of data points. But perhaps a machine can…
Disclaimer: before I begin, the obligatory statement. This article is in no way trading advice of any kind; it is merely a simple data science project. Accurately predicting stock price movements requires highly complex models, which this is not.
Objective: To create a simple time-series model to forecast Google’s stock price.
Methodology:
- Retrieve data using TD Ameritrade’s API
- Clean and visualize the data
- Calculate returns
- Test for stationarity: Dickey-Fuller Test
- Choose parameters using ACF and PACF charts
- Build an ARIMA time-series model
- Plot predictions against actual prices
- Make a forecast
- Analyze results
Import Libraries
I will be using the following Python libraries to perform my analysis:
```python
# Libraries for handling data
from information import client_id  # local module (not shown) holding the TD Ameritrade API key
import requests
import numpy as np
import pandas as pd
from datetime import datetime
import random

# For visualizations
import plotly.graph_objects as go
import matplotlib.pyplot as plt

# For time series modeling
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# statsmodels.tsa.arima_model.ARMA was removed in statsmodels 0.13;
# an ARMA(p, q) model is now fit with ARIMA using order=(p, 0, q)
from statsmodels.tsa.arima.model import ARIMA
```
Retrieve Data Using TD Ameritrade’s API
TD Ameritrade has a decent stock price API. It’s not super robust, but it’s enough to get the data we need. For a good tutorial on getting started with TD Ameritrade’s API, check out this one:
My custom function to request and pull the data from TD Ameritrade’s API:
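The original function isn't reproduced here. As a rough sketch of what such a helper can look like, the code below requests daily candles and converts them into a DataFrame. The endpoint URL, the query parameters (`periodType`, `period`, `frequencyType`, `frequency`) and the `candles` response layout are assumptions based on TD Ameritrade's price-history endpoint, so check them against the current documentation; `get_price_history` and `candles_to_dataframe` are hypothetical names. It uses the standard library's `urllib` so it runs without extra packages, though `requests.get(url, params=...)` works just as well.

```python
import json
import urllib.parse
import urllib.request

import pandas as pd

# Assumed URL of TD Ameritrade's price-history endpoint (verify against the docs)
PRICE_HISTORY_URL = "https://api.tdameritrade.com/v1/marketdata/{symbol}/pricehistory"


def candles_to_dataframe(payload: dict) -> pd.DataFrame:
    """Turn a price-history JSON payload into a date-indexed DataFrame.

    Assumes a payload shaped like {"candles": [{"open": ..., "high": ...,
    "low": ..., "close": ..., "volume": ..., "datetime": <epoch ms>}, ...]}.
    """
    df = pd.DataFrame(payload["candles"])
    # Candle timestamps are assumed to be epoch milliseconds
    df["date"] = pd.to_datetime(df["datetime"], unit="ms")
    return df.drop(columns="datetime").set_index("date").sort_index()


def get_price_history(symbol: str, api_key: str, years: int = 20) -> pd.DataFrame:
    """Hypothetical helper: request daily candles for `symbol` and parse them."""
    params = urllib.parse.urlencode({
        "apikey": api_key,
        "periodType": "year",
        "period": years,
        "frequencyType": "daily",
        "frequency": 1,
    })
    url = PRICE_HISTORY_URL.format(symbol=symbol) + "?" + params
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return candles_to_dataframe(payload)
```

For example, `dataframe = get_price_history("GOOG", client_id)`, where `client_id` is the key imported from the `information` module. Keeping the parsing in its own function makes it testable without a network call.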
Clean and Visualize the Data
Check for missing, or NaN (not a number), values:
```python
# Show any rows that contain at least one NaN value
dataframe[dataframe.isna().any(axis=1)]
```
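If that check returns any rows, they need handling before modeling. One simple option, sketched below as an assumption since the article doesn't say how missing values were treated, is to print and drop them; forward-filling with `df.ffill()` is a common alternative for price data. `drop_missing_rows` is a hypothetical helper.

```python
import pandas as pd


def drop_missing_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Print rows that contain NaN values, then return a copy without them."""
    missing = df[df.isna().any(axis=1)]
    if not missing.empty:
        print(f"Dropping {len(missing)} row(s) with missing values:")
        print(missing)
    return df.dropna()
```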
Plot the data:
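The original chart isn't reproduced here. A minimal matplotlib sketch of a closing-price chart, assuming the DataFrame has a date index and a `close` column; `plot_close` is a hypothetical helper, and a plotly `go.Candlestick` chart would be an alternative.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_close(df: pd.DataFrame, title: str = "Google closing price"):
    """Line chart of the 'close' column against the DataFrame's date index."""
    fig, ax = plt.subplots(figsize=(12, 5))
    ax.plot(df.index, df["close"], label="Close")
    ax.set_title(title)
    ax.set_xlabel("Date")
    ax.set_ylabel("Price (USD)")
    ax.legend()
    return fig, ax
```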
Calculate Returns
Time-series models like ARIMA require the data to be stationary, meaning its mean and standard deviation do not depend on time. Raw stock prices typically trend over time, so we take a first-order difference of the data (that is, we calculate daily returns).
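A minimal pandas sketch of this step, assuming the closing prices sit in a Series named `close`. `pct_change()` gives the percentage daily return, while `diff()` is the literal first-order difference in dollars; both remove the trend, and this sketch treats percentage returns as the series to model. Both helper names are hypothetical.

```python
import pandas as pd


def daily_returns(close: pd.Series) -> pd.Series:
    """Percentage change from one close to the next; the leading NaN is dropped."""
    return close.pct_change().dropna()


def first_difference(close: pd.Series) -> pd.Series:
    """Plain first-order difference (price change in dollars)."""
    return close.diff().dropna()
```

For example, `returns = daily_returns(dataframe["close"])`.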
Plot returns:
Test for Stationarity: Dickey-Fuller Test
Stationarity means the mean and standard deviation of the returns do not change over time. This matters because it gives the forecasting model stable statistical properties to learn from, and therefore some level of certainty in its forecasts.
To check this visually, we first calculate the rolling mean and standard deviation of the returns.
Plot the returns along with their rolling mean and standard deviation:
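A sketch of the rolling statistics and the chart, assuming a 30-day window (the article doesn't state the window it used) and a `returns` Series; `plot_rolling_stats` is a hypothetical helper. If the returns are stationary, both rolling lines should stay roughly flat.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_rolling_stats(returns: pd.Series, window: int = 30):
    """Plot returns with their rolling mean and rolling standard deviation."""
    # The first window-1 values of each rolling series are NaN
    rolling_mean = returns.rolling(window=window).mean()
    rolling_std = returns.rolling(window=window).std()

    fig, ax = plt.subplots(figsize=(12, 5))
    ax.plot(returns.index, returns.values, label="Daily returns", alpha=0.5)
    ax.plot(rolling_mean.index, rolling_mean.values, label=f"{window}-day rolling mean")
    ax.plot(rolling_std.index, rolling_std.values, label=f"{window}-day rolling std")
    ax.legend()
    return fig, ax, rolling_mean, rolling_std
```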
Now we can perform the augmented Dickey-Fuller test:
With a p-value < 0.05, we can reject the null hypothesis that the series has a unit root, which is a fancy way of saying that it is non-stationary (like a random walk, whose variance grows over time and whose shocks never fade). Since we reject the null hypothesis, we conclude that the returns are stationary! We can now proceed with the time-series modeling process.
Choosing Parameters using ACF and PACF Charts
The parameters of an ARIMA model are defined as follows:
- p: The number of lag observations included in the model, also called the lag order.
- d: The number of times that the raw observations are differenced, also called the degree of differencing. Our data was differenced once (daily returns), so d = 1.
- q: The size of the moving average window, also called the order of moving average.
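For reference, in the standard ARIMA formulation the differenced series $y'_t$ (here, the daily returns) is modeled as a linear combination of its own p past values and q past forecast errors (sign conventions for $\theta_j$ vary between textbooks):

$$
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
$$

where $c$ is a constant, $\phi_i$ are the autoregressive coefficients, $\theta_j$ are the moving-average coefficients, and $\varepsilon_t$ is white noise. With d = 1 on raw prices, $y'_t = y_t - y_{t-1}$; percentage returns are a closely related transformation.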
Plotting the autocorrelation function (ACF) and the partial autocorrelation function (PACF) helps us choose parameters for the ARIMA model.
The ACF measures how strongly each “y” value is correlated with the “y” value n steps earlier, for each lag n.
The PACF measures the correlation between two “y” values separated by n lags after removing the effect of all the “y” values in between them.
ACF
PACF
The charts above suggest an AR order “p” of 6 (read from the PACF) and an MA order “q” of 6 (read from the ACF).
Build ARIMA Time-Series Model
Plot Predictions Against Actual Prices
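The in-sample predictions from a fitted model (for example `results.fittedvalues`) are returns, not prices. Assuming percentage returns, one way to turn them into one-step-ahead price predictions is to grow the previous day's actual close by the predicted return. The helpers below are hypothetical and may differ from how the original predictions were built.

```python
import matplotlib.pyplot as plt
import pandas as pd


def predicted_prices(close: pd.Series, predicted_returns: pd.Series) -> pd.Series:
    """One-step-ahead price predictions: yesterday's close times (1 + predicted return).

    Assumes percentage returns aligned on the same index as `close`.
    """
    return close.shift(1) * (1 + predicted_returns)


def plot_predictions(close: pd.Series, predictions: pd.Series):
    """Overlay predicted closes on actual closes."""
    fig, ax = plt.subplots(figsize=(12, 5))
    ax.plot(close.index, close.values, label="Actual close")
    ax.plot(predictions.index, predictions.values, label="Predicted close", alpha=0.7)
    ax.legend()
    return fig, ax
```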
Make Forecast
And now for the moment we’ve all been waiting for…
Analyze Results
Let’s take a look at our predictions. How close were we to the actual stock price? Let’s plot the distribution of our residuals (errors):
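A sketch of the histogram, assuming `predictions` and `actual` are price Series on the same date index. `plot_error_distribution` is a hypothetical helper, and errors are defined as prediction minus actual close.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_error_distribution(predictions: pd.Series, actual: pd.Series, bins: int = 100):
    """Histogram of prediction errors (prediction minus actual close)."""
    errors = (predictions - actual).dropna()
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.hist(errors, bins=bins)
    ax.axvline(0, color="black", linestyle="--")  # reference line at zero error
    ax.set_xlabel("Prediction error")
    ax.set_ylabel("Count")
    return fig, ax, errors
```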
Although the bulk of the errors are close to 0, too many of them lie far from zero, meaning many predictions were far off the actual price. This could mean large losses (which is why this model is for educational purposes only and should not be used for trading).
- The AIC of our model is -20964.701. That looks small, but AIC is only meaningful for comparing models fit to the same data; its absolute value says nothing about forecast accuracy. So does this equate to a good model? Probably not.
- If we check the errors (prediction minus close price), the mean absolute error is approximately 6.8, which could lead to significant losses.
- At one point the model’s prediction was off by 144; trades based on a miss like that would have resulted in enormous losses.
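The error figures above can be computed along these lines, assuming `predictions` and `actual` are price Series on the same index; `error_summary` is a hypothetical helper. The mean error shows whether the model is biased high or low, which the mean absolute error alone hides.

```python
import numpy as np
import pandas as pd


def error_summary(predictions: pd.Series, actual: pd.Series) -> dict:
    """Summary statistics of prediction errors (prediction minus actual)."""
    errors = (predictions - actual).dropna()
    return {
        "mean_error": errors.mean(),  # positive means predictions run high
        "mae": errors.abs().mean(),
        "max_abs_error": errors.abs().max(),
        "rmse": float(np.sqrt((errors ** 2).mean())),
    }
```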
Questions? Comments? Contact me:
email: md.ghsd@gmail.com
LinkedIn: https://www.linkedin.com/in/marcosdominguez2018/
Twitter: https://twitter.com/mdcruz2010