What are the recommended strategies to combat data leakage on time-series OHLC data?
I wonder whether there is an industry-standard solution to prevent the model from being exposed during training to information it shouldn't see. I have heard of using cross-validators in pipelines, but other people say this is a risky strategy.
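To make the cross-validation idea concrete, here is a minimal sketch of a walk-forward split, where every test window lies strictly after its training window. The function name `walk_forward_splits` and the `gap` parameter are my own illustrative choices, not something from a specific library (scikit-learn's `TimeSeriesSplit` covers a similar idea). The gap is there on the assumption that labels are built from a few future bars, so the bars right before the test window are dropped from training.

```python
def walk_forward_splits(n_samples, n_splits, test_size, gap=0):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each test window comes strictly after its training window. `gap` bars
    are skipped between them so that labels computed from future bars
    cannot overlap the test window (hypothetical parameter, for illustration).
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start - gap <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_end = test_start - gap  # exclusive end of the training window
        train_idx = list(range(0, train_end))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# 20 bars, 3 folds of 4 bars each, 2 bars skipped before every test window
splits = list(walk_forward_splits(n_samples=20, n_splits=3, test_size=4, gap=2))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

The point of the design is that nothing is shuffled: a random train/test split on overlapping OHLC windows would put near-duplicate neighbours of each test sample into the training set.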
Status: The model (a Keras CNN) trains and reaches 95% accuracy on train/test. Deployment is set up over currency pairs, with WebSockets for performance and logging. On the deployment rig the algorithm has 52% accuracy, which means training is likely exposed to information that in no way resembles new, previously unseen data.
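One possible source of a gap like this, sketched below under assumptions, is preprocessing that sees the whole series. The sketch splits chronologically and fits the normalization statistics on the training part only, then reuses them for the later data. The helper names (`chronological_split`, `fit_scaler`, `apply_scaler`) and the close prices are made up for illustration; this is not my actual pipeline.

```python
from statistics import mean, pstdev


def chronological_split(rows, train_fraction=0.8):
    """Split without shuffling: the earliest rows train, the latest rows test."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_scaler(values):
    """Compute mean and standard deviation from the training data only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for a constant series
    return mu, sigma


def apply_scaler(values, mu, sigma):
    """Standardize values using statistics fitted elsewhere."""
    return [(v - mu) / sigma for v in values]


# Made-up close prices, oldest first (illustrative only)
closes = [1.10, 1.11, 1.12, 1.11, 1.13, 1.14, 1.13, 1.15, 1.30, 1.32]

train, test = chronological_split(closes, train_fraction=0.8)
mu, sigma = fit_scaler(train)                 # the test data is never seen here
train_scaled = apply_scaler(train, mu, sigma)
test_scaled = apply_scaler(test, mu, sigma)   # reuses the training statistics
```

Fitting the scaler on the full series instead would let the later price level leak into the training features, which is the kind of exposure live data never gets.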
Submitted October 21, 2020 at 07:05AM by Dream3r111