- Write out the steps you'd use to clean/wrangle the data for your database. This can be a simple step-by-step process, pseudocode, SQL, Python, Perl, or another language. These steps can clean the data, remove unnecessary information, or normalize it. If the source is just a website, write down what data you'd scrape from it.
- Find one other dataset, public or private, that you could use in conjunction with this dataset.
- What inferences could be made by using these two datasets together?
Datasets:
a. 2014-2017 Candy Hierarchy Data - Data from a survey across 4 years showing people's preference in Halloween candy. https://www.scq.ubc.ca/so-much-candy-data-seriousl…
b. FDA's National Drug Code Directory - https://www.fda.gov/drugs/drug-approvals-and-datab…
c. The Avengers Death Database - https://github.com/fivethirtyeight/data/tree/maste…
d. Bachelor/Bachelorette Dataset - https://github.com/fivethirtyeight/data/blob/maste…
e. Daily Show Guests - https://github.com/fivethirtyeight/data/blob/maste…
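
As one possible shape for the cleaning steps asked for in the first task, here is a minimal Python sketch. It assumes the chosen dataset arrives as CSV text. The sample rows, the column names (name, year, occupation), and the helper clean_rows are all hypothetical and invented for illustration; none of them come from the datasets listed above. The sketch trims whitespace, normalizes header names, converts the year to an integer, and drops blank and duplicate rows.

```python
import csv
import io

# Hypothetical sample rows, invented for illustration only
# (not taken from any of the listed datasets).
RAW = """Name , Year,Occupation
 Alice Example ,2001,actor
Bob Sample,2002, Journalist
 Alice Example ,2001,actor
,,
"""


def clean_rows(text):
    """Return cleaned records from CSV text as a list of dicts."""
    reader = csv.reader(io.StringIO(text))
    # Step 1: normalize header names (trim whitespace, lowercase).
    header = [h.strip().lower() for h in next(reader)]
    seen = set()
    cleaned = []
    for row in reader:
        # Step 2: trim stray whitespace around every value.
        values = [v.strip() for v in row]
        # Step 3: skip rows that carry no information at all.
        if not any(values):
            continue
        record = dict(zip(header, values))
        # Step 4: normalize types and casing for the assumed columns.
        record["occupation"] = record["occupation"].lower()
        record["year"] = int(record["year"])
        # Step 5: drop exact duplicates of rows already kept.
        key = tuple(record.items())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


rows = clean_rows(RAW)
for r in rows:
    print(r)
```

A real dataset would likely need more steps than these, such as handling missing values or splitting repeated fields into separate tables, so treat this as a starting outline rather than a complete answer.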