Hi all,
I'm an epidemiologist by training, and my former colleagues at the University of Washington (healthdata.org) and collaborators around the world are currently doing a lot of work on Covid-19 data collation and analysis, which is being documented here:
GitHub repo: https://github.com/beoutbreakprepared/nCoV2019
Initial visualization: healthmap.org/covid-19/
Nature article: https://www.nature.com/articles/s41597-020-0448-0.pdf
Lancet article: https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(20)30119-5/fulltext
They're having a hard time keeping up with all of the new sources, which they're largely checking by hand and copying into a spreadsheet a few times a day. They're looking for help automating that process so they can focus on building models and delivering results to stakeholders in governments, health systems, and non-profits around the world.
Right now they're compiling a list of all of the websites they regularly gather data from, and I'd like to help them create Prefect Flows that run BeautifulSoup tasks a few times a day to check those sites for updates, parse the results, and add them to their data sources.
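To give a feel for what each per-site parser might look like, here's a minimal sketch. All names and the sample HTML structure are hypothetical (every real site will differ); the real tasks would use BeautifulSoup and be wrapped in Prefect Flows, but this stand-in uses only the standard library's `html.parser` so anyone can run it immediately.

```python
# Hypothetical sketch of a per-site parser task: extract rows from a
# case-count table in fetched HTML. Uses only the standard library;
# the real version would likely use requests + BeautifulSoup instead.
from html.parser import HTMLParser


class CaseTableParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # completed rows of cell text
        self._row = None      # cells of the row being parsed, or None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None


def parse_case_rows(html: str):
    """Return a list of rows, each a list of cell strings."""
    parser = CaseTableParser()
    parser.feed(html)
    return parser.rows


# Hypothetical sample of the kind of table a health-ministry page might serve.
sample = """
<table>
  <tr><td>Region A</td><td>12</td></tr>
  <tr><td>Region B</td><td>7</td></tr>
</table>
"""
print(parse_case_rows(sample))  # [['Region A', '12'], ['Region B', '7']]
```

In the full pipeline, a function like `parse_case_rows` would become one Prefect task among several (fetch, parse, append to the shared dataset), scheduled to run a few times a day per site.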
@Chris White and @David Abraham have generously offered up a free Prefect Cloud account for us to use, so it should be easy for us to get started.
I expect there to be about 100 different sites to write parsers for, so I'd love some help crowdsourcing that effort. If you'd like to be involved, please respond here or email me at kyleforeman@gmail.com. I'll aim to have a kickoff meeting this Saturday so we can figure out how best to tackle the problem.