Graph Construction

• Where you are asked to construct a graph, a table or a sketch, you must
include it in your answer script. Your graphs should include a main title
and axis titles.
• You will need to refer to your lecture notes to complete the assignment.
Question 4 refers to Week 9 lecture notes.
• Where you are asked to perform a hypothesis test, show all six steps of
the procedure.
• When you submit your assignment it must be just one PDF document.
• You must submit your assignment online via Blackboard (all instructions
under Assignments tab). Please ensure your assignment is reasonably
well spaced so that it is easier to mark.
• Late assignments will incur a 5% penalty if received after the due date
but before midnight Wednesday 7 October 2020. Assignments cannot
be graded after this time.
• In case of illness, bereavement or significant technical problems, late
assignments may be accepted without incurring a penalty, but you must
email us on [email protected] to arrange this.
• You may work together on assignments, but you must write up your
answers on your own. Plagiarism is not acceptable.
——————————————————————————————————————–
STAT 193 2020 Tri 2 1 Project Assignment
INFORMATION ABOUT THE DATASET
The project assignment uses a computer generated dataset that is unique to each student.
The dataset represents information about a sample of 200 New Zealanders aged 15-45 and
is based on the New Zealand Income Survey conducted in 2007.
To generate your unique dataset, click on the Get project dataset link. A file named
project data username.csv (replace username with your own username) will be created and
you’ll be prompted to save it or open it. Save the file in a folder of your choice.
If you lose your dataset you can generate it again using the steps listed above. You will
get the exact same dataset each time you generate it.
The dataset contains the following variables:
Variable name    Description
Gender           Gender of subject
Age              Age of individual in complete years
Ethnicity        Ethnic group that individual belongs to
Marital          Marital status of individual
Qualification    Highest educational qualification obtained
PostSchool       Whether or not highest qualification is post school
Hours            Average weekly hours of work from all wage and salary jobs
                 (rounded to nearest whole number)
Income           Average weekly income from all sources, excluding investment
                 income (rounded to nearest whole number)
Use your sample data of 200 individuals to answer the following questions.

  1. The dataset [11 marks] For each variable in the dataset, classify whether it is
    categorical or numerical.
    If categorical:
    • List the possible values (levels) the variable takes. You can determine these
    from a pivot table of each categorical variable separately (ignoring any values
    listed as (blank)).
    • State whether the variable is nominal or ordinal.
    If numerical:
    • State the minimum and maximum values of the variable.
    • State whether the variable is discrete or continuous.
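As a cross-check on the pivot-table approach in Question 1, the levels and ranges can also be read straight from the CSV file. The sketch below is only illustrative: the three-row extract and its level names (female, school, degree and so on) are invented, and the helper name summarise is hypothetical. Deciding whether a variable is nominal or ordinal, or discrete or continuous, remains a judgement that the code does not make for you.

```python
import csv
import io

# Hypothetical three-row extract; the real file has 200 rows and eight columns.
# The level names used here are invented for illustration.
SAMPLE = """Gender,Age,Qualification,Income
female,23,school,540
male,41,degree,1120
female,35,,760
"""

def summarise(rows, column):
    """Return ('numerical', min, max) or ('categorical', sorted levels)."""
    values = [r[column] for r in rows if r[column] != ""]  # ignore blanks
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        # At least one value is not a number, so treat the column as categorical.
        return ("categorical", sorted(set(values)))
    return ("numerical", min(numbers), max(numbers))

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for name in rows[0]:
    print(name, summarise(rows, name))
```

To use this on your own data, replace io.StringIO(SAMPLE) with an open file handle for the dataset you saved.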
  2. Weekly Income [38 marks]
    (a) [6 marks] Create a histogram of weekly income in Excel. Comment on the
    main features of the distribution, including the skewness of the distribution.
    (b) [5 marks] Using Excel, calculate and give both a point estimate and an interval
    estimate (using the function CONFIDENCE.T) of the mean weekly income of
    the population of New Zealanders aged 15-45. Use a 99% level of confidence.
    State your value of s, the sample standard deviation for your dataset.
    (c) [5 marks] Write down the general distribution of X̄, the sample mean, using
    the notation X̄ ∼, defining any symbols you use. Explain why the distribution
    you have chosen applies in this case. You may wish to use the Week 7 lecture
    notes to help you.
    (d) [12 marks] According to the website https://tradingeconomics.com/australia/wages,
    average weekly income in Australia in 2007 was equivalent to NZ$986.
    Using your sample data and the distribution of the sample
    mean you identified in part (c), perform a hypothesis test at the 5% significance
    level to determine whether there is evidence that the mean weekly income of
    the New Zealand population aged 15-45 differed from that of the Australian
    population. Show the formula you use for calculating the test statistic
    and use Excel to find the p−value. Show all working, including Excel
    functions used.
    Remember that a test statistic that is far above or below zero will naturally lead
    to a very small p-value.
    (e) [4 marks] In 2007, the mean income for New Zealanders aged 15-45 years was
    given by µNZ = $667.00. Find P(X̄ > 710) for any sample of 200 individuals
    from this population, using your value of s. Give the Excel code you used to
    obtain your answer.
    (f) [6 marks] Suppose a randomly selected New Zealander was found to have an
    average weekly income of $850.00 and a randomly selected Australian had an
    average weekly income equivalent to NZ$950.00. Further suppose that weekly
    income is approximately normally distributed in both NZ and Australia, with
    population parameters given below. Which of the two individuals would then
    have a higher relative standing in their respective population? (Relative
    standings can be found by standardising scores.) Show your working and
    explain your answer briefly.
    New Zealand: µNZ = $667.00, σNZ = $237.50
    Australia (amounts in NZ dollars): µAus = $986.00, σAus = $245.70
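The interval estimate in part (b) and the test in part (d) of Question 2 both rest on the t distribution with n − 1 degrees of freedom. The sketch below is one way to check Excel results independently, not a replacement for the Excel working the question asks for. It assumes the margin returned by CONFIDENCE.T is t × s / √n and that the two-tailed p-value matches T.DIST.2T. The ten incomes are invented, and the helper names (t_pdf, t_cdf, t_quantile, t_interval, one_sample_t) are hypothetical. The t distribution is evaluated numerically because the Python standard library has no built-in t functions.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Value q with P(T <= q) = p, found by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(data, level=0.99):
    """Return (point estimate, s, (lower, upper)) for the population mean."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    margin = t_quantile(1 - (1 - level) / 2, n - 1) * s / math.sqrt(n)
    return xbar, s, (xbar - margin, xbar + margin)

def one_sample_t(data, mu0):
    """Return (test statistic, two-sided p-value) for H0: mu = mu0."""
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / math.sqrt(n))
    return t, 2 * (1 - t_cdf(abs(t), n - 1))

# Invented incomes for illustration only; use your own 200 values instead.
incomes = [520, 610, 700, 450, 880, 760, 930, 640, 580, 720]
print(t_interval(incomes, level=0.99))
print(one_sample_t(incomes, mu0=986))
```

Numerical integration keeps the example free of third-party packages; a statistics library would normally supply the t distribution directly.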
  3. Marital status and ethnicity [23 marks]
    (a) [3 marks] Does the graph below suggest that there is an association between
    Marital and Ethnicity? Explain your answer in one or two sentences. (You do
    not have to reproduce the graph in your answer script.)
    (b) [2 marks] Write down the null and alternative hypotheses you would use to
    test whether there is an association between Marital and Ethnicity.
    (c) [5 marks] Import your dataset into iNZight. Choose Ethnicity as Variable 1
    and Marital as Variable 2. Use Get Summary to obtain a table of observed
    counts. Include the table in your assignment. Based on this table, calculate
    the number of Māori who are married that we would expect in the sample if
    we assume there is no association between marital status and ethnicity. Show
    your working.
    (d) [4 marks] Use the output from Get Summary to answer the following:
    i. In the survey, what percentage of people who have never married are
    Māori?
    ii. What percentage of the Māori people surveyed have never married?
    (HINT: You may need to change the order of Variable 1 and Variable 2 when
    selecting variables)
    (e) [7 marks] Using output from Get Inference, give the test statistic value, degrees
    of freedom and p−value for testing the hypotheses in part (b). Draw a sketch
    of the distribution of the test statistic, showing the approximate position of
    your test statistic value and shading the area corresponding to the p−value.
    Give your conclusion, related to the variables tested, using α = 0.05.
    (f) [2 marks] Did your conclusion in part (e) surprise you, given your answer to
    part (a)? Explain briefly.
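The calculations behind Question 3 can be sketched on a small two-way table. Under the assumption of no association, the expected count in a cell is (row total × column total) / grand total. The two percentages in part (d) differ only in which total is used as the denominator. The table below is invented (labels and counts alike) and the function names are hypothetical. The p-value is left to iNZight, since it requires the chi-square distribution.

```python
# Hypothetical observed counts: outer keys are ethnicity groups, inner keys
# are marital statuses. Labels and numbers are invented for illustration.
observed = {
    "Group A": {"Married": 30, "Never married": 20},
    "Group B": {"Married": 10, "Never married": 40},
}

def row_total(table, row):
    return sum(table[row].values())

def col_total(table, col):
    return sum(cells[col] for cells in table.values())

def grand_total(table):
    return sum(row_total(table, r) for r in table)

def expected_count(table, row, col):
    """Expected count under no association: row total x column total / n."""
    return row_total(table, row) * col_total(table, col) / grand_total(table)

def chi_square(table):
    """Return (test statistic, degrees of freedom) for the table."""
    stat = 0.0
    for row, cells in table.items():
        for col, obs in cells.items():
            exp = expected_count(table, row, col)
            stat += (obs - exp) ** 2 / exp
    n_cols = len(next(iter(table.values())))
    return stat, (len(table) - 1) * (n_cols - 1)

def pct_of_column(table, row, col):
    """Percentage of the people in column `col` who belong to row `row`."""
    return 100 * table[row][col] / col_total(table, col)

def pct_of_row(table, row, col):
    """Percentage of the people in row `row` who fall in column `col`."""
    return 100 * table[row][col] / row_total(table, row)

print(expected_count(observed, "Group A", "Married"))
print(chi_square(observed))
print(pct_of_column(observed, "Group B", "Never married"))
print(pct_of_row(observed, "Group B", "Never married"))
```

The last two calls mirror parts (d)(i) and (d)(ii): the same cell count is divided first by its column total and then by its row total.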
  4. Highest Qualification and Income [28 marks] In iNZight, choose Income as
    Variable 1 and Qualification as Variable 2.
    (a) [2 marks] Do the side-by-side boxplots of Income by Qualification suggest that
    mean weekly income differs among qualification levels? Briefly justify your
    answer.
    (b) [2 marks] Write down the null and alternative hypotheses you would use to
    test whether mean weekly income differs among qualification levels.
    (c) [6 marks] Perform an ANOVA test at the 5% level of significance for the
    hypotheses in part (b). Give the test statistic, degrees of freedom, p-value and
    conclusion.
    (d) [2 marks] Show how the degrees of freedom you gave in part (c) were calculated.
    (e) [5 marks] For which pairs of qualification levels is there sufficient evidence
    of a difference in mean income at the 5% level of significance? Explain your
    answer briefly.
    (f) [3 marks] State the assumptions that must be met for the ANOVA test to be
    valid.
    (g) [6 marks] Create a residual plot and use it to comment on whether the
    assumptions in part (f) have been met. Include your residual plot in your
    answer script and justify your comments.
    (h) [2 marks] Explain in one or two sentences how residuals for an ANOVA test
    are calculated.
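The quantities asked about in Question 4 can be sketched with a small one-way ANOVA. With k groups and n observations in total, the degrees of freedom are k − 1 between groups and n − k within groups. A residual is an observation minus the mean of its own group. The incomes and level names below are invented, and the function names are hypothetical. The p-value is left to iNZight, since it requires the F distribution.

```python
from statistics import mean

# Hypothetical weekly incomes for three qualification levels (invented values).
groups = {
    "Level 1": [400, 500, 600],
    "Level 2": [600, 700, 800],
    "Level 3": [900, 1000, 1100],
}

def anova(groups):
    """Return (F statistic, (df between, df within)) for a one-way ANOVA."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2
                     for v in groups.values())
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)

def residuals(groups):
    """Residual = observation minus the mean of its own group."""
    return {name: [x - mean(v) for x in v] for name, v in groups.items()}

print(anova(groups))
print(residuals(groups))
```

Plotting the residuals against group or fitted value gives the kind of residual plot referred to in part (g).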