You have become familiar with two repositories of COVID-19 epidemic data, one for the USA (https://github.com/nytimes/covid-19-data) and one for Italy (https://github.com/pcm-dpc/COVID-19), and have been invited to explore other sources (https://data.ca.gov/group/covid-19). Indeed, there are many other publicly available datasets you could explore. Here are some more examples of data collections:
https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
https://github.com/Yu-Group/covid19-severity-prediction/blob/master/data/readme.md
The health department of your state or county of interest might also provide access to some of the data it collects and uses to set policy. And, as you know, reference information is helpful and easy to find.
Your task for this week is to formulate a causal hypothesis related to the COVID pandemic and to test it on the data.
Two clarifications might help. First, the goal here is not to carry out a formal statistical test: a graphical display or numerical summary of relevant data will be perfectly adequate. The goal is to practice making causal statements and "elaborate theories", thinking about the distinct patterns each one predicts in the data. This is an exercise in interrogating data precisely and relentlessly. So make sure you are not limiting yourself to the first obvious association that comes to mind; use your subtlety to look for patterns that cannot easily be explained in many different ways.
Second, don’t let the title of this note put you on the wrong path. You are not asked to identify the root causes of why we find ourselves living in isolation for so long, why over 600,000 people worldwide have died from COVID-19, and so on. Concentrate on manageable statements. For example, think of the role of hospital capacity, shelter-in-place measures, testing, the age of cases, etc.
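As an illustration of the kind of numerical summary that is enough here, the sketch below turns cumulative case counts into daily new cases and compares the average before and after a chosen date (for instance, the start of a shelter-in-place order), with an optional lag because infections take time to show up as reported cases. It assumes a CSV laid out like `us-states.csv` in the NYT repository (columns `date,state,fips,cases,deaths`, with `cases` cumulative); the helper names `daily_new_cases` and `mean_before_after` are hypothetical, and this is one possible approach, not a prescribed method.

```python
"""Sketch: before/after comparison of daily new cases.

Assumes an NYT-style CSV (date, state, fips, cases, deaths) in which
`cases` is a cumulative total. Helper names are hypothetical.
"""
import csv
import io
from collections import defaultdict
from datetime import date, timedelta


def daily_new_cases(csv_text):
    """Return {state: {date: new cases}} from cumulative case counts."""
    cumulative = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = date.fromisoformat(row["date"])
        cumulative[row["state"]][day] = int(row["cases"])

    new_cases = {}
    for state, series in cumulative.items():
        previous = 0  # assume zero cases before the first reported date
        new_cases[state] = {}
        for day in sorted(series):
            # Difference between consecutive *reported* dates; data
            # corrections can make this negative, and we keep such values.
            new_cases[state][day] = series[day] - previous
            previous = series[day]
    return new_cases


def _mean(values):
    """Arithmetic mean, or NaN when there are no values."""
    return sum(values) / len(values) if values else float("nan")


def mean_before_after(series, event_day, window=14, lag=0):
    """Mean daily new cases in the `window` days before `event_day`,
    and in the `window` days starting `lag` days after it."""
    before = [n for d, n in series.items()
              if event_day - timedelta(days=window) <= d < event_day]
    start = event_day + timedelta(days=lag)
    after = [n for d, n in series.items()
             if start <= d < start + timedelta(days=window)]
    return _mean(before), _mean(after)
```

With a downloaded copy of the file, you might call `new = daily_new_cases(open("us-states.csv").read())` and then `mean_before_after(new["California"], date(2020, 3, 19), lag=7)`, substituting the actual policy date you are studying (the date here is only for illustration). In the spirit of an elaborate theory, the interesting part is running the same comparison where the hypothesis predicts a different pattern: states without such an order, different lags, or deaths instead of cases.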
On Wednesday morning, you should be ready to share with the group a handout that documents: