
Kyle Foreman (Convoy)

03/25/2020, 1:19 AM
Hi all, I'm an epidemiologist by training, and my former colleagues at the University of Washington (healthdata.org) [and collaborators around the world] are currently doing a lot of work on Covid-19 data collation and analysis, which is being documented here:
Github repo: https://github.com/beoutbreakprepared/nCoV2019
Initial visualization: healthmap.org/covid-19/
Nature article: https://www.nature.com/articles/s41597-020-0448-0.pdf
Lancet article: https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(20)30119-5/fulltext
They're having a hard time keeping up with all of the new sources, which they're largely checking manually and copying into a spreadsheet a few times a day. They're looking for help automating that process so they can focus more on building models and delivering results to stakeholders in governments, health systems, and non-profits around the world.
They're currently compiling a list of all the websites they regularly gather data from, and I'd like to help them create Prefect Flows that run BeautifulSoup tasks a few times a day to check those sites for updates, parse the results, and add them to their data sources. @Chris White and @David Abraham have generously offered up a free Prefect Cloud account for us to use, so it should be easy for us to get started.
I expect there will be about 100 different sites to write parsers for, so I'd love some help crowdsourcing that effort. If you'd like to be involved, please respond here or email me at kyleforeman@gmail.com. I'll aim to have a kickoff meeting this Saturday so that we can figure out how best to tackle the problem.
🚀 2
👨‍⚕️ 6
👍 6
:upvote: 9
😍 5
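A minimal sketch of the kind of flow Kyle describes, assuming the Prefect 0.x API that was current in early 2020; the source URL, the generic table parser, and the save step are placeholders rather than anything from the actual project:
```python
from datetime import timedelta

import requests
from bs4 import BeautifulSoup
from prefect import Flow, task
from prefect.schedules import IntervalSchedule

# Placeholder list -- the real project would have ~100 site-specific entries.
SOURCES = [
    "https://example.com/health-ministry/daily-report",
]


@task(max_retries=3, retry_delay=timedelta(minutes=5))
def fetch(url: str) -> str:
    """Download the raw HTML for one source page."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.text


@task
def parse(html: str) -> list:
    """Very generic parser: pull the text out of every table row.
    In practice each site would need its own parsing logic."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        for row in soup.find_all("tr")
    ]


@task
def save(records: list) -> None:
    """Placeholder for appending new rows to the shared line list."""
    print(f"would append {len(records)} rows")


# Check the sites roughly three times a day.
schedule = IntervalSchedule(interval=timedelta(hours=8))

with Flow("covid-source-scraper", schedule=schedule) as flow:
    pages = fetch.map(SOURCES)
    records = parse.map(pages)
    save.map(records)

# flow.run() executes it locally on the schedule; registering the flow with the
# shared Prefect Cloud account is the plan described above.
```
Since each site would need its own parsing logic, the per-site parse functions are the part that is easy to crowdsource.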

Scott Zelenka

03/25/2020, 1:20 AM
Count me in
❤️ 1

Dylan

03/25/2020, 1:21 AM
Never written a parser but I’m in
❤️ 1

Kyle Foreman (Convoy)

03/25/2020, 1:23 AM
great! for anyone who is new to parsing, this is a good tutorial to get you started: https://realpython.com/beautiful-soup-web-scraper-python/
💯 1
👍 1
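For flavor, here is about the smallest possible scrape-and-parse example in the spirit of that tutorial; the URL is a placeholder, not one of the project's real sources:
```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL -- swap in a real source page.
html = requests.get("https://example.com/daily-case-report", timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Print the page title and any report links as a first sanity check.
print(soup.title.get_text(strip=True))
for link in soup.find_all("a", href=True):
    if "report" in link["href"]:
        print(link.get_text(strip=True), "->", link["href"])
```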

josh

03/25/2020, 1:51 AM
I spent almost all of December 2017 writing custom parsers to scrape data for a news event aggregation pipeline for the project I was on at the time. The biggest headache I ran into was when sites would change their layout, thus breaking the parser without us knowing that it happened. So I am definitely in because Prefect is built for this kind of stuff!
❤️ 2

bardovv

03/25/2020, 7:53 AM
Can you send me a link to where Prefect has a parser for this type of stuff?

abtrout

03/25/2020, 4:13 PM
👋 kyle, i can help with scraping too! about.trout@gmail.com
❤️ 1

Dylan

03/25/2020, 4:45 PM
@bardovv Prefect doesn’t have a parser built in. But, because it’s all :python:, you can import any Python parser (like, in this case, Beautiful Soup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/) and create flows to parse, transform, store, and analyze information. I hope this helps! Feel free to reach out if you have more questions.
👍 1
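To illustrate the point that any Python parser can be dropped into a task, here is a hypothetical variant that swaps Beautiful Soup for pandas.read_html (which needs lxml or html5lib installed); the URL and CSV output are placeholders:
```python
import pandas as pd
import requests
from prefect import Flow, task


@task
def scrape_tables(url: str) -> list:
    """Fetch a page and let pandas parse every HTML table into a DataFrame."""
    html = requests.get(url, timeout=30).text
    return pd.read_html(html)


@task
def store(tables: list) -> None:
    """Placeholder store step: dump the first table to CSV."""
    tables[0].to_csv("latest_report.csv", index=False)


with Flow("any-parser-works") as flow:
    store(scrape_tables("https://example.com/daily-report"))  # placeholder URL

# flow.run()  # quick local test
```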

Kyle Foreman (Convoy)

03/25/2020, 10:12 PM
Thanks to everyone who has expressed interest here or via email! Since we have interested parties who aren't in this Slack, I'll start an email chain tomorrow to coordinate a first video call. If you haven't already, please send me your email (either here, via DM, or at kyleforeman@gmail.com) - looking forward to it!
💯 1

Elliot

03/25/2020, 11:47 PM
@Kyle Foreman (Convoy) How does your effort differ from https://covidtracking.com/? Looks like they're just covering the US, but I'm wondering if it's worth combining efforts.

Andrew Theis

03/27/2020, 3:33 AM
Hello @Kyle Foreman (Convoy)! Happy to help where I can. :)
❤️ 1

Kyle Foreman (Convoy)

03/27/2020, 5:31 AM
@Elliot that project focuses primarily on US testing and reports aggregates. This project gathers microdata ("line lists", e.g. one row per case, so that you can do much more detailed epi analysis) for cases and deaths globally. Both important, but different use cases!
👍 1
just sent out an email - let me know if I missed anyone!

alvin goh

03/27/2020, 3:13 PM
I would be happy to help in any way too!