Python beginner, scraping economic calendar

Hello,

A quick disclaimer: I have little to no coding experience, so please bear with me.

I made a small scraper for the economic calendar on marketwatch.com.

The HTML structure is very basic (see attached): it's essentially table/tbody, and from there all the information sits in separate td cells.

I can scrape the table (code in attachment) into a CSV file, then use pandas to turn the whole thing into a formatted, usable dataframe.

My issue is that I want Scrapy to run at the time each announcement is supposed to come out. For example, if there's a release on October 13th at 10:00 AM, I want to schedule my scraper to launch 2 minutes later and get the latest number.

My thinking is as follows:

1/ Extract all the days and times and store them in some kind of scheduler table.

2/ Write a script that reads that table and launches the scraper at the wanted times.
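
To make the two steps concrete, here is a minimal sketch using only the Python standard library. It assumes the scraped dates and times have already been cleaned into strings like "2020-10-13" and "10:00 AM" (the real calendar text may need reformatting first), that those times are in the same timezone as the machine running the script, and that the spider is called "calendar". All of these are placeholders, not details taken from the attached code.

```python
from datetime import datetime, timedelta
import sched
import subprocess
import time

# Assumption: date and time text from the calendar has been cleaned into
# this shape, e.g. "2020-10-13 10:00 AM". Adjust to match the real data.
DATE_TIME_FORMAT = "%Y-%m-%d %I:%M %p"
DELAY = timedelta(minutes=2)  # launch 2 minutes after each release


def build_schedule(releases):
    """Step 1: turn (date, time) pairs into sorted, de-duplicated run times."""
    run_times = set()
    for date_text, time_text in releases:
        release = datetime.strptime(f"{date_text} {time_text}", DATE_TIME_FORMAT)
        run_times.add(release + DELAY)
    return sorted(run_times)


def launch_scraper():
    # Hypothetical spider name; replace "calendar" with your own spider.
    subprocess.run(["scrapy", "crawl", "calendar"], check=False)


def run_schedule(run_times, action=launch_scraper):
    """Step 2: block until every run time still in the future has fired."""
    scheduler = sched.scheduler(time.time, time.sleep)
    now = datetime.now()
    for run_at in run_times:
        if run_at > now:  # skip releases that have already passed
            scheduler.enterabs(run_at.timestamp(), 1, action)
    scheduler.run()


# Sample input: two releases share a slot, so only one run is scheduled for it.
releases = [
    ("2020-10-13", "10:00 AM"),
    ("2020-10-13", "10:00 AM"),
    ("2020-10-14", "08:30 AM"),
]
schedule = build_schedule(releases)
```

Calling run_schedule(schedule) from a script left running would then start the scraper at each time. An operating-system scheduler such as cron could do the same job if keeping a Python process alive is not convenient.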

But since the page's HTML isn't really structured, I can't tell Scrapy to look at one particular location only. So the only way I can think of is to scrape the whole table again, which is probably not efficient.
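
For what it's worth, fetching the page downloads the whole HTML document either way, so scraping the full table again and then picking out the one row of interest costs very little extra. A rough sketch of that filtering step follows. The keys "report" and "actual" and the sample rows are invented for illustration and would need to match whatever the scraper really outputs.

```python
def latest_value(rows, report, column="actual"):
    """Return the chosen column for the first row whose report name matches.

    Returns None when the report is missing or its value is still empty.
    """
    wanted = report.strip().lower()
    for row in rows:
        if row.get("report", "").strip().lower() == wanted:
            value = row.get(column, "").strip()
            return value or None
    return None


# Invented sample rows standing in for one full scrape of the table.
rows = [
    {"time": "8:30 AM", "report": "Consumer price index", "actual": "0.2%"},
    {"time": "10:00 AM", "report": "Job openings", "actual": ""},
]

published = latest_value(rows, "consumer price index")
pending = latest_value(rows, "Job openings")
```

Getting None back for a report means the number has not been published yet, which could be a signal to retry a minute later.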

What could the workaround be? I might be doing this wrong by using pandas when I don't have to.

I'd love it if you guys could point me in the right direction.

Best regards

scrapy code

table structure

Submitted October 12, 2020 at 08:19AM by crashbandishocks
via https://ift.tt/2SNNS7i
