I'm working on Hidden Markov Models in Python, and I saw this comment in an article about how to split the data:
"The only way your splitting could make some sense is if you prove that there is no autocorrelation in your series (which is very unlikely).
If I can suggest, I think the best train/test splitting would be a sequential split:
[ 75 % train ] → [ 5% “embargo” ] → [ 20% test ]
The embargo is data that you throw away and makes sure that you are not leaking information from the train to the test."
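Here is a minimal sketch of the sequential split with an embargo described in that comment. It assumes the observations are already in time order in a NumPy array shaped (n_samples, n_features). The function name `sequential_split` and its parameters are hypothetical, chosen for illustration. The split is done with plain slicing, so no shuffling can happen.

```python
import numpy as np

def sequential_split(X, train_frac=0.75, embargo_frac=0.05):
    """Hypothetical helper: split time-ordered data into train and test,
    discarding an 'embargo' block between them.

    Train is the first train_frac of the rows, the next embargo_frac rows
    are thrown away, and everything after that is the test set (20% with
    the defaults).
    """
    n = len(X)
    n_train = int(round(n * train_frac))
    n_embargo = int(round(n * embargo_frac))
    if n_train + n_embargo >= n:
        raise ValueError("no observations left for the test set")
    train = X[:n_train]                  # earliest observations
    test = X[n_train + n_embargo:]       # skip the embargo rows entirely
    return train, test

# Stand-in for 100 time-ordered observations with one feature.
X = np.arange(100).reshape(-1, 1)
train, test = sequential_split(X)
print(train.shape, test.shape)   # (75, 1) (20, 1)
print(train[-1, 0], test[0, 0])  # 74 80 -> rows 75..79 are the embargo
```

This is not quite what scikit-learn's `train_test_split(X, train_size=0.75, test_size=0.20, shuffle=False)` does. With `shuffle=False` it takes the first 75% as train and the rows *immediately after* as test, and the unused 5% ends up at the end of the series rather than between train and test. To get the gap, the embargo has to be carved out explicitly, for example by slicing as above.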
My question is: is it as simple as running train_test_split with a 75% training set and a 20% test set that is NOT randomized (i.e. no shuffling)? For context, the article is "When to Buy the Dip" on TDS. Any help or input is appreciated, thank you.
Submitted October 22, 2020 at 08:38AM by jakecberry