Download All S&P500 Company Data The Easy Way

The Scraper Guy
3 min read · Jan 25, 2021

yfinance is a blessing

Photo by Burak K from Pexels

yfinance is the library we will use to make downloading all of the data easy and efficient.

If you are unfamiliar with this package, I recommend reading its documentation. It is easy to understand and use, and it is a powerful resource for stock trading.

In this article I will explain how to download historical stock price data for every S&P 500 company over a chosen period and format it in a dataframe you can manipulate to create trading strategies!

CODE

import yfinance as yf
import pandas as pd
from pandas_datareader import data as pdr

# Route pandas_datareader's Yahoo download calls through yfinance
yf.pdr_override()

Here are all the packages we will need. If you don't have them installed, it's very straightforward to do so with pip, for example: pip install yfinance pandas pandas-datareader

stocks = pd.read_csv('CSV DIRECTORY LINK')
stocks.head()

Next, you will need a CSV file listing all of the S&P 500 companies, with their tickers in a column named Symbol. In my case I copied and pasted the list from Wikipedia into a CSV file. Once done, just replace ‘CSV DIRECTORY LINK’ with the path to your file.

We then display the first five rows with head() to check that the file loaded correctly.
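Before going further, it can help to confirm the CSV really has the column the next step relies on. Here is a minimal sketch, assuming the file has a header row with a Symbol column as in the Wikipedia table; load_constituents is a hypothetical helper name, and the three rows below are an illustrative stand-in for the real file.

```python
import io

import pandas as pd


def load_constituents(source) -> pd.DataFrame:
    """Read the S&P 500 constituents CSV and check it has a 'Symbol' column."""
    stocks = pd.read_csv(source)
    if "Symbol" not in stocks.columns:
        raise ValueError(f"expected a 'Symbol' column, found {list(stocks.columns)}")
    # Copy-pasting from a web page can leave stray spaces around the tickers.
    stocks["Symbol"] = stocks["Symbol"].astype(str).str.strip()
    return stocks


# Illustrative three-row sample standing in for the full CSV file.
sample_csv = io.StringIO(
    "Symbol,Security\n"
    " MMM ,3M\n"
    "AOS,A. O. Smith\n"
    "ABT,Abbott\n"
)
stocks = load_constituents(sample_csv)
print(stocks.head())
```

With the real file you would call load_constituents('CSV DIRECTORY LINK') instead of passing the in-memory sample.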

tickers = list(stocks['Symbol'])

Here we create a list of the symbols (tickers) in our CSV file, as it will be easier to pass them to yfinance in that form.
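One catch when the list comes from Wikipedia: share-class tickers are written with a dot (BRK.B, BF.B), while Yahoo Finance lists them with a dash (BRK-B, BF-B), so they may fail to download as-is. A small sketch of a clean-up step, where to_yahoo_symbols is a hypothetical helper:

```python
def to_yahoo_symbols(symbols):
    """Convert Wikipedia-style tickers to the form Yahoo Finance uses.

    Share-class tickers such as 'BRK.B' become 'BRK-B'. Blank entries and
    duplicates are dropped, keeping the original order.
    """
    cleaned = []
    seen = set()
    for symbol in symbols:
        yahoo_symbol = str(symbol).strip().replace(".", "-")
        if yahoo_symbol and yahoo_symbol not in seen:
            seen.add(yahoo_symbol)
            cleaned.append(yahoo_symbol)
    return cleaned


tickers = to_yahoo_symbols(["MMM", "BRK.B", "BF.B", "MMM"])
print(tickers)  # ['MMM', 'BRK-B', 'BF-B']
```

In the article's flow this would be tickers = to_yahoo_symbols(stocks['Symbol']).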

data = pdr.get_data_yahoo(tickers, period="1mo")

This call downloads all of the historical data. pdr.get_data_yahoo is a pandas_datareader function, but because we called yf.pdr_override() earlier, the request is handled by yfinance, which returns the prices for every ticker in a single pandas dataframe.

tickers is our list of all the companies we want prices for.

I have set the period to 1mo (one month), but this can be adjusted depending on your needs.
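The yfinance documentation says to use either the period parameter or start and end dates, and lists the valid periods as 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd and max. A hypothetical helper that checks your choice before kicking off a long download might look like this; download_kwargs is an illustrative name, and refusing to mix period with start/end is a choice made here for clarity.

```python
# Period strings listed in the yfinance documentation.
VALID_PERIODS = {"1d", "5d", "1mo", "3mo", "6mo", "1y", "2y", "5y", "10y", "ytd", "max"}


def download_kwargs(period=None, start=None, end=None):
    """Build keyword arguments for the price download.

    Pass either a period string, or a start date (and optionally an end
    date) as "YYYY-MM-DD" strings.
    """
    if period is not None and (start is not None or end is not None):
        raise ValueError("use either period or start/end, not both")
    if period is not None:
        if period not in VALID_PERIODS:
            raise ValueError(f"unknown period {period!r}; choose from {sorted(VALID_PERIODS)}")
        return {"period": period}
    if start is None:
        raise ValueError("a start date is required when no period is given")
    kwargs = {"start": start}
    if end is not None:
        kwargs["end"] = end
    return kwargs


print(download_kwargs(period="6mo"))
print(download_kwargs(start="2020-01-01", end="2021-01-01"))
```

You would then unpack the result into the call, e.g. data = pdr.get_data_yahoo(tickers, **download_kwargs(period="6mo")).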

We then wait for the prices to download; with around 500 tickers this can take a little while.
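Asking Yahoo for around 500 tickers in one request can be slow, and some tickers may fail along the way. One common workaround is to download in smaller batches and join the pieces. The sketch below assumes each batch comes back with the same (field, ticker) column layout; download_in_batches and fake_fetch are hypothetical names, and fake_fetch returns made-up numbers so the example runs offline.

```python
import pandas as pd


def download_in_batches(tickers, fetch, batch_size=100):
    """Call `fetch` on slices of `tickers` and join the results column-wise.

    `fetch` takes a list of tickers and returns a wide dataframe indexed by
    date with (field, ticker) columns, e.g.
    lambda batch: pdr.get_data_yahoo(batch, period="1mo").
    """
    tickers = list(tickers)
    frames = [
        fetch(tickers[i:i + batch_size])
        for i in range(0, len(tickers), batch_size)
    ]
    # Outer-join on the date index, then sort so each field's tickers sit together.
    return pd.concat(frames, axis=1).sort_index(axis=1)


# Offline stand-in for the real download: same layout, made-up numbers.
def fake_fetch(batch):
    dates = pd.to_datetime(["2021-01-04", "2021-01-05"])
    columns = pd.MultiIndex.from_product([["Close", "Volume"], batch])
    return pd.DataFrame(1.0, index=dates, columns=columns)


data = download_in_batches(["AAPL", "MSFT", "AMZN"], fake_fetch, batch_size=2)
print(data["Close"].columns.tolist())
```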

data

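We can check the formatting of our data by evaluating the dataframe's name in a notebook cell. It comes back in wide format: one row per date, with a two-level column header that repeats each price field (such as Open, High, Low, Close and Volume) for every ticker.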

This wide layout is awkward to work with, so our dataframe needs to be cleaned up.
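Even in the wide layout, the two-level header can be indexed directly: the first level picks a price field and a (field, ticker) pair picks a single column. The sketch below builds a tiny table in that layout with dummy values (not real prices) to show both selections.

```python
import pandas as pd

# Dummy values laid out like the multi-ticker download:
# one row per date, columns keyed by (price field, ticker).
dates = pd.to_datetime(["2021-01-04", "2021-01-05"])
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAPL", "MSFT"]])
data = pd.DataFrame(
    [[10.0, 20.0, 1000, 2000],
     [11.0, 21.0, 1100, 2100]],
    index=dates,
    columns=columns,
)

closes = data["Close"]                  # date x ticker table of closing prices
msft_volume = data[("Volume", "MSFT")]  # one ticker's volume as a Series
print(closes)
print(msft_volume)
```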

newdataframe = (data.stack()
                .rename_axis(["Date", "Symbol"])
                .reset_index()
                .sort_values(["Symbol", "Date"]))

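We can do this with a single chained statement, as seen above: stack() moves the ticker level of the column header into the row index, reset_index() turns the date and ticker index levels into ordinary columns (with the ticker column named Symbol), and sort_values() orders the rows by symbol and then by date.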
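The reshape can be tried offline on a tiny table with the same layout. This sketch names both index levels with rename_axis rather than renaming a level_1 column, on the assumption that some yfinance versions label the ticker level themselves (for example as Ticker), in which case no level_1 column would appear. to_long_format is a hypothetical name and the values are dummies.

```python
import pandas as pd


def to_long_format(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape a (field, ticker) wide price table into one row per symbol per date."""
    long = wide.stack()  # move the ticker level of the columns into the row index
    # Name both index levels explicitly, whatever the download called them.
    long = long.rename_axis(["Date", "Symbol"])
    return long.reset_index().sort_values(["Symbol", "Date"]).reset_index(drop=True)


# Dummy wide table: two dates, two tickers, two price fields.
dates = pd.to_datetime(["2021-01-04", "2021-01-05"])
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAPL", "MSFT"]])
wide = pd.DataFrame(
    [[10.0, 20.0, 1000, 2000],
     [11.0, 21.0, 1100, 2100]],
    index=dates,
    columns=columns,
)

newdataframe = to_long_format(wide)
print(newdataframe)
```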

Now each row of our dataframe holds the prices for one symbol on one date.

newdataframe

Success! Our dataframe is much cleaner and easier to manipulate now.
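As an example of the kind of manipulation the long format makes easy, here is a sketch that adds a per-symbol daily return column, a common first step for trading strategies. It assumes the Date, Symbol and Close column names from the cleaned dataframe; add_daily_returns and the sample prices are made up for illustration.

```python
import pandas as pd


def add_daily_returns(long: pd.DataFrame, price_col: str = "Close") -> pd.DataFrame:
    """Add a per-symbol daily return column to a long-format price table."""
    out = long.sort_values(["Symbol", "Date"]).copy()
    # pct_change within each symbol, so a return never spans two different tickers.
    out["Return"] = out.groupby("Symbol")[price_col].pct_change()
    return out


# Made-up prices in the long layout.
long = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-04", "2021-01-05"]),
    "Symbol": ["AAPL", "AAPL", "MSFT", "MSFT"],
    "Close": [10.0, 11.0, 20.0, 25.0],
})
returns = add_daily_returns(long)
print(returns)
```

The first row of each symbol has no previous close, so its return is NaN.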

Conclusion

This is a very simple tutorial, but I hope it helps beginners get their footing with developing quant trading strategies and using stock market data.

I hope you enjoyed it, and happy trading!
