Forecasting Google’s Stock Price with ARIMA Modeling

Marcos Dominguez
Published in Analytics Vidhya
5 min read · Feb 16, 2021

The GitHub repo for this project can be found here.

After the GameStop fiasco with the subreddit r/wallstreetbets, I became very intrigued by the stock market. It promises the opportunity of extreme wealth: if you could be right on just one stock, you could change your life forever. So many people have pursued this dream, and yet so many have had their dreams crushed, the way Tom Brady crushed the Kansas City Chiefs.

Why?

Countless variables go into the price of a stock. The average human cannot possibly discern how all of them interact, and the human mind cannot process tens of thousands of data points. But perhaps a machine can…

Disclaimer: Before I begin, the obligatory statement — this article is in no way trading advice of any kind, but merely a simple data science project. Accurately predicting stock price movements requires highly complex models, which this is not.

Objective: To create a simple time-series model to forecast Google’s stock price.

Methodology:

  1. Retrieve data using TD Ameritrade’s API
  2. Clean and visualize the data
  3. Calculate returns
  4. Test for stationarity: Dickey-Fuller Test
  5. Choose parameters using ACF and PACF charts
  6. Build ARIMA time-series model
  7. Plot predictions against actual prices
  8. Make forecast
  9. Analyze results

Import Libraries

I will be using the following Python libraries to perform my analysis:

# Libraries for handling data
# client_id is the API key, presumably kept in a local information.py (not a published package)
from information import client_id
import requests
import numpy as np
import pandas as pd
from datetime import datetime
import random
# For visualizations
import plotly.graph_objects as go
import matplotlib.pyplot as plt
# For time series modeling
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
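# statsmodels.tsa.arima_model (ARMA) was removed in statsmodels 0.13;
# ARIMA with d=0 is the equivalent ARMA model
from statsmodels.tsa.arima.model import ARIMA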

Retrieve Data Using TD Ameritrade’s API

TD Ameritrade has a decent stock price API. It’s not super robust, but it’s enough to get the data we need. For a good tutorial on how to get started with TD Ameritrade’s API, check out this:

My custom function to request and pull the data from TD Ameritrade’s API:
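The original function is not reproduced in this text, so here is a minimal sketch of what such a helper could look like. It assumes TD Ameritrade's price-history endpoint and a JSON response with a "candles" list (open, high, low, close, volume, and a millisecond "datetime"); the endpoint may have changed or been retired since this was written. The names `get_price_history` and `candles_to_dataframe` are hypothetical, and the sample payload is made up so the parsing step can run offline.

```python
import pandas as pd
import requests

# Assumed endpoint for TD Ameritrade's price-history API; verify before use.
BASE_URL = "https://api.tdameritrade.com/v1/marketdata/{symbol}/pricehistory"


def candles_to_dataframe(payload):
    """Turn a price-history JSON payload into a DataFrame indexed by date."""
    frame = pd.DataFrame(payload.get("candles", []))
    if frame.empty:
        return frame
    # Assumption: "datetime" is a Unix timestamp in milliseconds.
    frame["datetime"] = pd.to_datetime(frame["datetime"], unit="ms")
    return frame.set_index("datetime").sort_index()


def get_price_history(symbol, api_key, period_type="year", period=10,
                      frequency_type="daily", frequency=1):
    """Request daily candles for `symbol` (hypothetical helper, not the author's)."""
    params = {
        "apikey": api_key,
        "periodType": period_type,
        "period": period,
        "frequencyType": frequency_type,
        "frequency": frequency,
    }
    response = requests.get(BASE_URL.format(symbol=symbol), params=params, timeout=30)
    response.raise_for_status()
    return candles_to_dataframe(response.json())


# Offline demo with a made-up payload in the assumed response shape.
sample_payload = {
    "candles": [
        {"open": 101.0, "high": 103.0, "low": 100.0, "close": 102.0,
         "volume": 1200, "datetime": 1609804800000},
        {"open": 100.0, "high": 102.0, "low": 99.0, "close": 101.0,
         "volume": 1000, "datetime": 1609718400000},
    ]
}
dataframe = candles_to_dataframe(sample_payload)
print(dataframe)
```

With a valid key, a call such as `get_price_history("GOOG", client_id)` would replace the sample payload.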

Clean and Visualize the Data

Check for unknown, or NaN (not a number) values:

dataframe[dataframe.isna().any(axis=1)]

Plot the data:

Retrieved from TD Ameritrade API

Calculate Returns

A proper time-series model requires the data to be stationary. This means the mean and standard deviation are independent of time. To achieve this, we take a first-order difference (calculate daily returns) of the data.
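As a small illustration of this step, the sketch below computes both the plain first-order difference and the daily percentage return with pandas. The closing prices are made up, and which of the two the original code used is an assumption on my part; the two are related but not identical.

```python
import pandas as pd

# Made-up closing prices, only to illustrate the calculation.
close = pd.Series([100.0, 102.0, 101.0, 105.0], name="close")

# First-order difference: change in price from one day to the next.
price_diff = close.diff().dropna()

# Daily return: that change expressed as a fraction of the previous close.
returns = close.pct_change().dropna()

print(price_diff.tolist())
print(returns.round(4).tolist())
```

The first row is dropped in both cases because there is no previous day to compare against.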

Plot returns:

Test for Stationarity: Dickey-Fuller Test

Stationarity implies the mean and standard deviation of the returns have no correlation with time. This is important because it allows for stability and some level of certainty in the forecasting model.

To check this visually, we must first calculate the rolling mean and standard deviation.
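A rough sketch of the rolling statistics, using pandas. The 20-day window is my assumption (roughly one trading month), and the returns here are synthetic stand-ins rather than Google's actual data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the daily returns series.
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0, 0.01, size=250))

window = 20  # assumed window size, about one trading month

# The first window - 1 values are NaN because the window is not yet full.
rolling_mean = returns.rolling(window=window).mean()
rolling_std = returns.rolling(window=window).std()

print(rolling_mean.dropna().head())
print(rolling_std.dropna().head())
```

If the series is stationary, both lines should stay roughly flat when plotted over the returns.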

Plot the returns together with the rolling mean and standard deviation:

Now we can perform our Dickey-Fuller Test:

With a p-value < 0.05, we can reject the null hypothesis that the series has a unit root, which is a fancy way of saying the series is non-stationary (its behavior drifts over time). Since we reject the null hypothesis, we can treat the data as stationary! We can now proceed with the time-series modeling process.

Choosing Parameters using ACF and PACF Charts

The parameters of an ARIMA model are defined as follows:

  • p: The number of lag observations included in the model, also called the lag order.
  • d: The number of times that the raw observations are differenced, also called the degree of differencing. Our data was differenced once (daily returns).
  • q: The size of the moving average window, also called the order of moving average.

Making autocorrelation and partial autocorrelation charts helps us choose parameters for the ARIMA model.

The ACF (autocorrelation function) measures how strongly each “y” value is correlated with the “y” value n lags before it.

The PACF (partial autocorrelation function) gives us (a sample estimate of) the correlation between two “y” values separated by n lags, excluding the influence of all the “y” values in between them.

ACF

PACF

Reading the charts above, the ACF and PACF suggest a lag “p” of 6 and a lag “q” of 6.

Build ARIMA Time-Series Model

Plot Forecast with Actual

Make Forecast

And now for the moment we’ve all been waiting for…

Analyze Results

Let’s take a look at our predictions. How close were we to the actual stock price? Let’s plot the distribution of our residuals (errors).

Although the bulk of the errors are close to 0, too many of them lie far from zero, meaning many predictions were way off from the actual. This can potentially mean large losses (which is why you shouldn’t use this model for trading purposes, but only for educational purposes).

  • The AIC of our model is low at -20964.701. But does this equate to a good model? Probably not: AIC is only meaningful when comparing candidate models fit to the same data, so on its own it says little about accuracy.
  • If we check the errors (predictions minus close price), the mean absolute error is approximately 6.8, which may lead to significant losses.
  • At one point the model’s prediction was off by 144. Trades based on that prediction would have resulted in enormous losses.
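The error figures above can be computed along these lines. The prices below are made up purely to show the arithmetic; they are not the article's data.

```python
import numpy as np

# Made-up actual and predicted closing prices.
actual_close = np.array([100.0, 102.0, 101.0, 105.0])
predicted_close = np.array([101.0, 100.0, 101.0, 109.0])

# Error as defined in the article: prediction minus close price.
errors = predicted_close - actual_close

mean_absolute_error = np.mean(np.abs(errors))
max_absolute_error = np.max(np.abs(errors))

print(errors)
print(mean_absolute_error)
print(max_absolute_error)
```

A histogram of `errors` (for example with `plt.hist`) gives the residual distribution discussed above.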

Questions? Comments? Contact me:

email: md.ghsd@gmail.com

LinkedIn: https://www.linkedin.com/in/marcosdominguez2018/

Twitter: https://twitter.com/mdcruz2010

Data Scientist with a background in banking and finance. I love statistics, programming, and machine learning.