My question is on data cleaning. It would be nice if data were perfect, and the idea is that you pay for better, cleaner data, but you are always going to need some sort of filtering algorithm or the 'bad' ticks will get in there. I'm not concerned with the ticks that are 0.01 off. I'm never going to catch those, and they likely won't affect me anyway.
From my somewhat naive review of my data provider, it seems that the really bad ticks are easily identifiable – off by a factor of 100 or so, like this:
9/25/20 close 99
9/28/20 close 100
9/29/20 close 1.01
9/30/20 close 101
So on a chart view it's really obvious. But it's tedious to correct manually.
So far I have settled on coding a %B Bollinger band of about 7 periods with 20% bands to filter out the bad data points, which works pretty well except in rapidly moving markets (see: BA). I use this with a ratio rule whereby I compare the price to the prior close and the subsequent close (yes, it's a look-forward, but I'm not trading off of it or using it with a live feed) to find those anomalous spikes. Anomalous data points are deleted and replaced with the prior close.
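For what it's worth, the ratio-rule half of this can be sketched in a few lines of pandas. This is not my exact implementation – the `ratio=3.0` threshold, the function name, and the both-sides test are assumptions for illustration – but it captures the idea of comparing each close to both its prior and subsequent neighbor and replacing the flagged points with the prior value:

```python
import pandas as pd

def filter_spikes(close: pd.Series, ratio: float = 3.0) -> pd.Series:
    """Flag closes that differ by more than `ratio`x from BOTH the prior
    and the subsequent close (the look-forward described above), then
    replace them with the prior value. `ratio=3.0` is an assumed
    threshold, not a recommendation."""
    prev_r = close / close.shift(1)   # ratio to prior close
    next_r = close / close.shift(-1)  # ratio to subsequent close (look-forward)
    # A genuinely bad tick is far off in the same direction on both sides;
    # a real fast move only looks extreme relative to one neighbor.
    bad = ((prev_r > ratio) | (prev_r < 1 / ratio)) & \
          ((next_r > ratio) | (next_r < 1 / ratio))
    cleaned = close.copy()
    cleaned[bad] = float("nan")
    return cleaned.ffill()  # replace anomalies with the prior close

# The example series from above: the 1.01 print is off by ~100x on both sides.
closes = pd.Series([99.0, 100.0, 1.01, 101.0])
print(filter_spikes(closes).tolist())  # the 1.01 becomes the prior 100.0
```

Requiring the spike on both sides is what keeps a legitimate gap (which only looks extreme versus the prior close, not the next one) from being deleted, though a single threshold still won't save you in the fastest markets.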
It works OK, but I'm wondering if someone has written this stuff already and I'm reinventing the wheel. A brief GitHub search didn't yield anything useful. Is there some package out there everyone uses with this already done, or is everyone just doing their own thing? I've shown you mine – could you please comment and share what you're using to avoid obvious price anomalies? (edit: formatting)
Submitted October 04, 2020 at 09:57AM by drsxr