Using ARIMA Models and Sentiment Analysis To Predict Stock Prices

The Scraper Guy
3 min read · Jan 14, 2021

Can we beat the market using two of the most common ML practices?

Another stock market article on Medium? Unfortunately, yes.

In the past year I have embarked on a journey to beat the stock market. I have tested every approach imaginable, yet here I am, no closer to finding alpha.

Luckily, I find myself in the final year of a Computer Science degree, which perfectly intersects with this stock market predicament.

As part of my studies, we are tasked with developing a fully fledged project to demonstrate our skills to potential employers after graduation. Can you guess what my project revolves around?

Over the next 4 months I will attempt to develop an investment strategy built around ARIMA modelling, one of the most popular methods for time series analysis, combined with sentiment analysis to, in theory, boost returns.

This article outlines the process by which I plan to tackle this project rather than a finished implementation. Hopefully we can master the market together.

ARIMA

Since I am slightly constrained in which models I can use, the obvious choice for building this system is an ARIMA model.

I do not expect to revolutionize this machine learning approach in any significant way, but merely to optimize my ARIMA model as much as possible so that the other areas of my strategy can become the focal points.
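To make the idea concrete, here is a minimal sketch of the simplest useful case, ARIMA(1,1,0) with a constant: difference the prices once, fit a single autoregressive coefficient on the differences by least squares, and add the predicted differences back onto the last price. The function names and the prices are made up for illustration, and this is not the model I will end up using; a real project would more likely lean on a library such as statsmodels and tune the order properly.

```python
# Minimal ARIMA(1,1,0) sketch in pure Python (illustrative only).
# "I": difference the series once.
# "AR": fit d[t] = c + phi * d[t-1] on the differences by least squares.
# "MA": omitted here (q = 0) to keep the example short.

def difference(series):
    """First-order differencing: each value minus the one before it."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(diffs):
    """Least-squares estimate of (c, phi) in d[t] = c + phi * d[t-1]."""
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = cov_xy / var_x if var_x else 0.0  # constant differences: no AR signal
    c = mean_y - phi * mean_x
    return c, phi

def forecast(series, steps=1):
    """Predict future differences, then re-integrate them into price levels."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    last_value, last_diff = series[-1], diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = c + phi * last_diff
        last_value += last_diff
        predictions.append(last_value)
    return predictions

# Made-up closing prices, purely for illustration.
closes = [100.0, 101.0, 102.5, 103.0, 104.5, 105.0, 106.5, 107.0]
print(forecast(closes, steps=3))
```

The series needs at least three prices so that there is one pair of consecutive differences to fit on.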

My initial plan is to first model the S&P 500 through SPY, the ETF that tracks it. Taking long positions in most stocks when the market is forecast to fall is inherently not a favorable approach. There will be exceptions to this rule, which I will discuss.

For each company I intend to calculate its correlation to SPY. This process is essentially a linear regression that will show whether a stock, e.g. TSLA, is positively or negatively correlated with SPY.
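A rough sketch of that calculation, assuming we work with simple period-over-period returns rather than raw prices (my assumption, since price levels tend to trend together and inflate correlation). The helper names and the numbers are hypothetical.

```python
from math import sqrt

def simple_returns(prices):
    """Period-over-period percentage change."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def regress_on_market(stock_returns, market_returns):
    """Ordinary least squares of stock returns on market returns.

    Returns (alpha, beta, r): intercept, slope, and Pearson correlation.
    The sign of beta and r tells us whether the stock tends to move with
    or against the market.
    """
    n = len(market_returns)
    mean_m = sum(market_returns) / n
    mean_s = sum(stock_returns) / n
    sxx = sum((m - mean_m) ** 2 for m in market_returns)
    syy = sum((s - mean_s) ** 2 for s in stock_returns)
    sxy = sum((m - mean_m) * (s - mean_s)
              for m, s in zip(market_returns, stock_returns))
    beta = sxy / sxx
    alpha = mean_s - beta * mean_m
    r = sxy / sqrt(sxx * syy)
    return alpha, beta, r

# Made-up prices, purely for illustration.
spy_prices = [400.0, 404.0, 398.0, 402.0, 410.0]
tsla_prices = [700.0, 720.0, 690.0, 705.0, 740.0]
alpha, beta, r = regress_on_market(simple_returns(tsla_prices),
                                   simple_returns(spy_prices))
print(f"beta={beta:.2f}, correlation={r:.2f}")
```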

If a stock is significantly correlated with the market, this added layer of market forecasting could meaningfully strengthen our strategy.
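One way this layer could be wired in is as a simple gate on long positions. This is only a sketch of the idea: the function name is hypothetical and the 0.3 correlation threshold is an arbitrary placeholder, not a tested value.

```python
def allow_long(market_forecast_return, correlation, min_corr=0.3):
    """Block a long position only when the market is forecast to fall
    and the stock tends to move with the market.

    min_corr is an assumed cut-off for "significantly correlated".
    """
    market_falling = market_forecast_return < 0
    moves_with_market = correlation >= min_corr
    return not (market_falling and moves_with_market)

print(allow_long(-0.01, 0.8))   # market down, stock follows the market
print(allow_long(-0.01, -0.5))  # market down, stock moves against it
```

Negatively correlated stocks pass the gate even when the market is forecast to fall, which is one kind of exception to the rule above.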

This method is probably not especially advanced, but it is a great starting point to hone my quantitative skills.

Sentiment

A quick search on Medium returns endless results applying some form of sentiment analysis to a myriad of sources, from Twitter to the Business Times.

Most approaches focus on one or two sources but neglect almost everything else. My approach will revolve around news articles from Google, which aggregates essentially every noteworthy source on each company in the S&P 500.

But inspecting only articles about the company itself is a naive approach. We will also have to filter the articles about the wider market for those that could have an effect on conditions.
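The crudest possible version of such a filter is a keyword match on headlines. The keyword list below is invented for illustration and would need real thought; a proper classifier may well replace it later.

```python
# Hypothetical list of macro terms; not a researched or complete set.
MARKET_KEYWORDS = {"inflation", "interest rate", "federal reserve",
                   "recession", "unemployment"}

def is_relevant(headline, company_terms, market_keywords=MARKET_KEYWORDS):
    """Keep a headline if it mentions the company or a market-wide topic."""
    text = headline.lower()
    about_company = any(term.lower() in text for term in company_terms)
    about_market = any(keyword in text for keyword in market_keywords)
    return about_company or about_market

headlines = [
    "Tesla recalls cars over software fault",
    "Federal Reserve hints at rate cut",
    "Local bakery wins award",
]
kept = [h for h in headlines if is_relevant(h, ["Tesla", "TSLA"])]
print(kept)
```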

Scraping articles at one-minute intervals seems the most efficient approach, as the computational cost of running the program constantly has to be balanced against the effect of news on the market.
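A polling loop along these lines might look like the sketch below. The `fetch` and `handle` callables are placeholders for the real scraper and the sentiment step, and I am assuming each article is a dict with a "url" key so that repeats can be skipped. The `sleep` parameter exists only so the loop can be exercised without waiting.

```python
import time

def poll(fetch, handle, interval_seconds=60, max_cycles=None, sleep=time.sleep):
    """Call fetch() every interval_seconds and pass unseen articles to handle().

    fetch  -- hypothetical callable returning a list of {"url": ...} dicts
    handle -- hypothetical callable run once per new article
    max_cycles -- stop after this many polls; None means run forever
    """
    seen = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for article in fetch():
            if article["url"] not in seen:
                seen.add(article["url"])
                handle(article)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            sleep(interval_seconds)
    return seen

# Demo with canned data and no real waiting.
batches = iter([[{"url": "a"}], [{"url": "a"}, {"url": "b"}]])
handled = []
poll(lambda: next(batches), handled.append, max_cycles=2, sleep=lambda s: None)
print([article["url"] for article in handled])
```

Raising or lowering `interval_seconds` is the knob that trades compute cost against how quickly news is picked up.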

A strategy of this sort can inform intraday trading strategies, but we will couple this sentiment analysis of overall economic conditions with an analysis of Twitter sentiment from both regular users and VIPs.

"VIPs" is a nice way of saying people who can post a tweet and alter the course of a company's stock price, or of the market itself. We will have to weigh the volume of regular users' tweets and their polarity against those of our important accounts.
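As a first pass, that weighting could be a weighted average of tweet polarity. In this sketch the polarity scores are assumed to come from some upstream sentiment model and to lie in [-1, 1]; the VIP weight of 10 is a placeholder I have not calibrated, and the account names are invented.

```python
def combined_sentiment(tweets, vip_accounts, vip_weight=10.0):
    """Weighted mean polarity over (account, polarity) pairs.

    Tweets from accounts in vip_accounts count vip_weight times as much
    as a regular user's tweet. Returns 0.0 when there are no tweets.
    """
    total = 0.0
    weight_sum = 0.0
    for account, polarity in tweets:
        weight = vip_weight if account in vip_accounts else 1.0
        total += weight * polarity
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0

# Two mildly positive regular tweets against one strongly negative VIP tweet.
tweets = [("user_a", 0.5), ("user_b", 0.5), ("big_account", -1.0)]
print(combined_sentiment(tweets, vip_accounts={"big_account"}))
```

With these numbers the single VIP tweet outweighs both regular users and the combined score comes out negative.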

Conclusion

To conclude, this process is going to be challenging, to say the least. I expect to encounter issues by the day, but I am confident we can create some semblance of a working quantitative investment strategy that can perform honorably!

Check back every week, as I will post an update on my progress and any insights I gain.

Until next time, traders.
