Proper handling of intraday data to create an ML dataset in Python

Hello, I've been working with small datasets to create models with the tensorflow / keras functional API. My usual workflow is to preprocess, clean, manipulate, and visualize the data with pandas, then write a TFRecord that I use for training. Recently I started working with stock data at 1-minute frequency, so I currently have somewhere between 10,000 and 30,000 ticker symbols covering 10+ years of data. I pulled the data from the polygon API, wrote each ticker to its own .parquet file, and stored the files in a GCP bucket. Now, if I build a dataset that includes most of the tickers, I have the following concerns:
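To make the layout concrete, here is a minimal sketch of the per-ticker storage described above, with synthetic 1-minute OHLCV bars standing in for polygon data. The column names, the `make_minute_bars` helper, and the `gs://` path are my assumptions for illustration, not polygon's actual schema:

```python
import numpy as np
import pandas as pd

def make_minute_bars(symbol: str, start: str, periods: int) -> pd.DataFrame:
    """Synthetic 1-minute OHLCV bars standing in for real polygon data."""
    idx = pd.date_range(start, periods=periods, freq="min", name="timestamp")
    rng = np.random.default_rng(0)
    close = 100 + rng.standard_normal(periods).cumsum()
    return pd.DataFrame(
        {
            "open": close + rng.normal(0, 0.05, periods),
            "high": close + 0.1,
            "low": close - 0.1,
            "close": close,
            "volume": rng.integers(1_000, 50_000, periods),
        },
        index=idx,
    )

# One regular US trading session is 390 one-minute bars (09:30-16:00).
bars = make_minute_bars("AAPL", "2020-10-05 09:30", periods=390)

# With gcsfs installed, pandas can write straight to the bucket
# (bucket name and path are illustrative):
# bars.to_parquet("gs://my-bucket/minute/AAPL.parquet")
```

One parquet file per ticker keeps each symbol independently readable, which is convenient for streaming them one at a time into a TFRecord writer.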

  • The files vary in length and coverage. For example, the AAPL dataframe has 2,566,598 rows and AMZN has 1,928,479 rows over the same period, and some tickers have fewer than 10,000 rows. When I align them, what is a proper way of dealing with the resulting NaN values?
  • To compute technical indicators, lagged returns, and other features efficiently, I was thinking I could load the data into Google BigQuery and run the computations as SQL queries. Is there a better way? Should I store all tickers in one table with a multi-part index, or use one table per ticker?
  • For anyone who has worked with intraday stock data before: what bar frequency do you recommend to avoid overfitting and get good results?
  • Which technical indicators work well at this frequency (1 min)? I ask because moving averages, MACD, and many other indicators are usually computed over periods of days, and I'm not sure whether they carry over to minute bars.
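A hedged pandas sketch of the first, third, and fourth bullets, using synthetic close series in place of the AAPL/AMZN parquet files. The gap-fill limit and all window lengths are illustrative; the key point is that rolling windows on minute data are parameterised in bars, not days:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

def fake_close(start: str, periods: int) -> pd.Series:
    """Synthetic 1-minute close prices (random walk)."""
    idx = pd.date_range(start, periods=periods, freq="min")
    return pd.Series(100 + rng.standard_normal(periods).cumsum(), index=idx)

# Two tickers with different lengths/coverage, like AAPL vs AMZN:
aapl = fake_close("2020-10-05 09:30", 390)
amzn = fake_close("2020-10-05 10:00", 330)  # starts later, ends earlier

# Align onto the union of timestamps; missing minutes become NaN.
prices = pd.concat({"AAPL": aapl, "AMZN": amzn}, axis=1)

# One common policy for the NaN question: forward-fill short gaps
# (here at most 5 minutes), then drop rows that are still incomplete
# rather than inventing prices across long halts or missing sessions.
filled = prices.ffill(limit=5).dropna()

# Indicators measured in *bars*: a 20-bar SMA on 1-minute data spans
# 20 minutes, not 20 days. Lagged log-returns as example features:
features = pd.DataFrame(
    {
        "ret_1m": np.log(filled["AAPL"]).diff(),
        "ret_5m": np.log(filled["AAPL"]).diff(5),
        "sma_20": filled["AAPL"].rolling(20).mean(),
    }
).dropna()

# To experiment with coarser frequencies (third bullet), downsample
# to 5-minute bars instead of training on raw 1-minute data:
bars_5m = filled["AAPL"].resample("5min").last().dropna()
```

The same reindex/ffill/rolling logic translates directly to BigQuery SQL (window functions over a timestamp column), so the choice is mostly about data volume and cost rather than expressiveness.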

Submitted October 10, 2020 at 08:33AM by emadboctor
