Descriptive statistics
Note 2: You might need to make more than one appointment with me or the TAs
to help you with some coding questions. Please make those appointments in advance.
Estimating your first model.
The purpose of this homework is to write the first draft of Sections 2-4
(maximum length: 8 pages of text, excluding graphs, tables, and the R code).
- The empirical model and hypothesis.
(a) Describe your research question in terms of the relationship(s) that you want to evaluate. (4 points)
(b) Write down the linear regression model that you plan to estimate. Include a clear definition of
the dependent variable and of each regressor. Describe the expected sign of
the parameters β. Also define your population of interest. Be careful with the notation. (7 points)
- Data issues.
(a) Describe the type of dataset that you are using and data sources. (5 points)
(b) Create a table to show the following descriptive statistics of the variables that you will use in
your LRM: Minimum value, maximum, sample mean, sample median, sample standard deviation,
and number of observations. Please follow this format (or a similar one) and use your
common sense to determine the number of decimal places. (6 points)
Table 1. Descriptive Statistics

                          Variable   Variable   Variable   Variable   Variable
                          name       name       name       name       name
Minimum
Maximum
Mean
Median
Standard Deviation
Number of observations

There are many functions in R that calculate descriptive statistics. Check the following link to
learn how to use some of them.
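
As one possible starting point, the six statistics requested for Table 1 can be computed with base R functions alone. The sketch below is only an illustration: the data frame mydata and its columns (taken here from R's built-in mtcars dataset) are placeholders for your own data, and the row labels simply mirror the table format above.

```r
# Hypothetical example data: three columns from R's built-in mtcars dataset.
# Replace this with the data frame that holds the variables of your own model.
mydata <- mtcars[, c("mpg", "hp", "wt")]

# Compute each statistic column by column, ignoring missing values (NA).
desc_stats <- sapply(mydata, function(x) {
  c("Minimum"                = min(x, na.rm = TRUE),
    "Maximum"                = max(x, na.rm = TRUE),
    "Mean"                   = mean(x, na.rm = TRUE),
    "Median"                 = median(x, na.rm = TRUE),
    "Standard Deviation"     = sd(x, na.rm = TRUE),
    "Number of observations" = sum(!is.na(x)))
})

# Statistics appear in rows and variables in columns, as in Table 1.
print(round(desc_stats, 2))
```

Counting the non-missing values with sum(!is.na(x)) rather than nrow(mydata) makes an uneven number of observations across variables visible in the table.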
If you prefer, you can use the R function stargazer() to create a table with descriptive statistics
and export it to a Word document. I suggest editing the exported table in Word afterwards
so that it looks professional. The function requires you to install the package stargazer.
You might need a statement like this:¹
    stargazer(mydata, type = "html", out = "myfile.doc", median = TRUE)
where mydata is the data frame that contains ONLY the variables that you want to describe in
your table, and myfile.doc is the name of the Word document to which the table will be exported.
¹ Check all the arguments of this function HERE.
(c) Describe the variables in the table (e.g. how the variables are defined or constructed, unit of
measure, etc). Include any problem or relevant issue you observed in the data (e.g. uneven
number of observations, potential outliers, etc.). Please be clear and precise. (6 points)
(d) Choose one (or two) of your regressors of interest and describe the relationship between
the regressor(s) and the dependent variable, using whichever of the following options
applies (8 points):
i. If your regressor of interest and the dependent variable are both quantitative, make a scatter plot.
Place the dependent variable on the vertical axis and the regressor on the horizontal axis.
Check this link to learn about R functions that create scatter plots.²
ii. If your dependent variable is quantitative and one of your regressors of interest is a dummy
variable, make a box plot. Place the dependent variable on the vertical axis. You can find an
example of a box plot here.
iii. If your dependent variable is a dummy variable and one of your regressors of interest is also
a dummy variable, prepare a row percentage table. Place the dummy regressor as the
row variable. You can find an example here.
Describe the relevant findings that you observe in the graph or table. Based on a visual
inspection of these graphs or tables, do they suggest a relationship between your
variables of interest? Explain.
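
The three options in (d) could be sketched in base R as follows. This is an illustration under stated assumptions, not a required approach: the data frame mydata and the variables mpg, wt, am, and vs come from R's built-in mtcars dataset and stand in for your own dependent variable and regressors. The axis and category labels are also only examples.

```r
# Hypothetical example data: R's built-in mtcars dataset.
mydata <- mtcars

# i. Scatter plot: quantitative dependent variable (mpg) on the vertical axis,
#    quantitative regressor (wt) on the horizontal axis.
plot(mydata$wt, mydata$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel efficiency and weight")

# ii. Box plot: quantitative dependent variable (mpg) on the vertical axis,
#     one box for each value of the dummy regressor (am: 0 = automatic, 1 = manual).
boxplot(mpg ~ am, data = mydata,
        names = c("Automatic", "Manual"),
        xlab = "Transmission", ylab = "Miles per gallon")

# iii. Row percentage table: dummy regressor (am) as the row variable,
#      dummy dependent variable (vs) as the column variable.
counts  <- table(Transmission = mydata$am, Engine = mydata$vs)
row_pct <- 100 * prop.table(counts, margin = 1)  # margin = 1: each row sums to 100
print(round(row_pct, 1))
```

In the row percentage table, each row shows how the dependent dummy is distributed within one category of the regressor, so comparing rows gives a first impression of whether the two variables are related.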