Big Data Case Study

Select a dataset that you are interested in and can relate to on a personal level.
Dataset Link:
https://archive.ics.uci.edu/ml/datasets/Student+Performance
a. (5%) Introduction
b. (5%) Define and collect the data
i. Identify variables, data types, etc. Provide in-depth info on the data elements.
ii. Identify the size of the dataset, any missing values, any outliers, etc.
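As an illustration of the checks in part b, the following is a minimal sketch rather than a required approach. It assumes the dataset file is semicolon-delimited and has columns named `school`, `age`, `absences`, and `G3`. The sample rows are made up for the example and are not taken from the real dataset. The helper names `infer_type` and `iqr_outliers` are hypothetical, and the 1.5 * IQR rule is only one of several common outlier conventions.

```python
import csv
import io
import statistics

# Made-up rows for illustration only; NOT taken from the real dataset.
# Assumption: the real file uses ';' as its delimiter, like this sample.
SAMPLE = """\
school;age;absences;G3
GP;18;6;6
GP;17;4;6
GP;15;10;10
GP;15;2;15
GP;16;4;10
MS;16;0;11
MS;17;75;9
MS;18;2;12
MS;17;;14
"""


def infer_type(values):
    """Label a column 'numeric' if every non-empty value parses as a number."""
    present = [v for v in values if v.strip()]
    try:
        for v in present:
            float(v)
    except ValueError:
        return "categorical"
    return "numeric"


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
columns = list(rows[0].keys())

# Size of the dataset: number of rows and columns.
size = (len(rows), len(columns))

# Data type and missing-value count per column (empty cell = missing).
types = {c: infer_type([r[c] for r in rows]) for c in columns}
missing = {c: sum(1 for r in rows if not r[c].strip()) for c in columns}

# Outlier check on one numeric column, skipping missing cells.
absences = [int(r["absences"]) for r in rows if r["absences"].strip()]
outliers = iqr_outliers(absences)

print("size (rows, columns):", size)
print("types:", types)
print("missing:", missing)
print("absence outliers:", outliers)
```

To run the same checks on the real data, replace `io.StringIO(SAMPLE)` with an open file handle for the downloaded CSV, and confirm the delimiter and column names against the dataset's own documentation first.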
c. (10%) Organize and visualize the variables
i. Apply appropriate organization and visualization techniques to provide an effective understanding of the data.
ii. Interpret your findings succinctly in your report.
d. (20%) Build a research question with a full explanation. Identify a method to answer the question and
explain why you picked the method(s).
e. (20%) Apply at least one of the following methods to your dataset. The method should be linked to your
research question. Explain the results and why you selected the method(s):

  1. Confidence intervals
  2. Sampling distributions
  3. Correlation analysis
  4. Regression analysis
  5. ANOVA tests
  6. Hypothesis testing
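As an example of one option from the list above (correlation analysis), here is a minimal sketch of the Pearson correlation coefficient. It assumes a research question that relates two numeric variables, such as weekly study time and final grade. The variable names and the sample values are made up for illustration and are not results from the dataset. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only; NOT taken from the real dataset.
study_time = [1, 2, 2, 3, 4, 1, 3, 2]
final_grade = [8, 10, 11, 14, 15, 9, 12, 10]

r = pearson_r(study_time, final_grade)
print(f"Pearson r = {r:.3f}")
```

A coefficient near +1 or -1 indicates a strong linear relationship and a coefficient near 0 indicates a weak one. It does not establish causation on its own, so the report should still explain why correlation fits the research question.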
f. (30%) Implement 1 ML algorithm with your dataset. Explain why you chose this specific algorithm. The
rationale and approach are the most important parts of your answer to this question.
Implement the algorithm using a toolse