Train/Test Sequential Split

I'm working on Hidden Markov Models in Python, and I saw this comment on an article about how to split the data:

"The only way your splitting could make some sense is if you prove that there is no autocorrelation in your series (which is very unlikely).

If I can suggest, I think the best train/test splitting would be a sequential split:

[ 75 % train ] → [ 5% “embargo” ] → [ 20% test ]

The embargo is data that you throw away and makes sure that you are not leaking information from the train to the test."

My question: is it as simple as running train_test_split with a 75% training set and a 20% test set that is NOT randomized? For context, the referenced article is "When to Buy the Dip" on TDS. Any help or input is appreciated, thank you.
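
To make the question concrete, here is a minimal sketch of how I read the 75% / 5% / 20% layout from the comment, using plain slicing on an ordered series. The function name sequential_split, its parameters, and the toy prices list are my own placeholders, not anything from the article. As far as I can tell, scikit-learn's train_test_split with shuffle=False keeps the order but places the test block directly after the training block, so it would not leave the embargo gap between the two by itself.

```python
def sequential_split(series, train_frac=0.75, embargo_frac=0.05):
    """Split an ordered sequence into (train, test), discarding an embargo gap.

    Hypothetical helper: the first `train_frac` of the observations go to
    train, the next `embargo_frac` are thrown away, and the rest go to test.
    """
    if not 0 < train_frac < 1 or not 0 <= embargo_frac < 1:
        raise ValueError("fractions must lie between 0 and 1")
    if train_frac + embargo_frac >= 1:
        raise ValueError("train and embargo leave no data for the test set")

    n = len(series)
    train_end = round(n * train_frac)                   # last train index + 1
    test_start = train_end + round(n * embargo_frac)    # skip the embargo

    train = series[:train_end]
    test = series[test_start:]    # everything after the embargo, in order
    return train, test


# Toy data standing in for a time-ordered series (oldest first).
prices = list(range(100))
train, test = sequential_split(prices)
print(len(train), len(test))
```

Because it only uses len() and slicing, the same sketch should also work on a NumPy array or a pandas Series that is already sorted by time. Nothing is shuffled, so every test observation comes after every training observation.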

Submitted October 22, 2020 at 08:38AM by jakecberry
