Intro to Econometrics FINAL EXAM Thursday 05/14/20

T. Christensen Time Allowed: 24 hours

Question 1. (25 points in total, each part is worth 5 points)

You wish to estimate the causal effect $\beta_1$ of $X$ on $Y$:

$$Y_i = \beta_0 + \beta_1 X_i + u_i . \tag{1}$$

You are concerned endogeneity bias might lead to inconsistency of the OLS estimate of $\beta_1$. You have a control variable $C_i$, which is not binary. The control variable satisfies conditional mean independence:

$$E[u_i \mid X_i, C_i] = E[u_i \mid C_i] . \tag{2}$$

However, the conditional mean of $u_i$ depends on $C_i$ in a nonlinear fashion:

$$E[u_i \mid C_i] = \gamma_0 + \gamma_1 C_i + \gamma_2 C_i^2 , \tag{3}$$

where each of the coefficients is non-zero.

You have data on $X_i$, $Y_i$ and $C_i$ drawn i.i.d. from their joint distribution. You also know that each of $Y_i$ and $X_i$ has finite nonzero fourth moments and $C_i$ has finite nonzero eighth moment.

Hint: conditioning on $C_i$ is the same as conditioning on $C_i$ and $C_i^2$. This is because $C_i^2$ contains no extra information beyond that contained in $C_i$. Therefore, $E[u_i \mid C_i] = E[u_i \mid C_i, C_i^2]$ and similarly for other conditional expectations.
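The endogeneity concern in this setup can be illustrated with a small simulation. The sketch below is not part of the exam data: the coefficient values and the names `b0`, `b1`, `g0`, `g1`, `g2` (standing for the coefficients in (1) and (3)) are made up, and the normal distributions are an assumption chosen only so that the limit of the short regression slope is easy to compute.

```r
# Simulated illustration of the endogeneity concern (all numbers are made up).
set.seed(1)
n  <- 100000
b0 <- 1;   b1 <- 2                 # hypothetical coefficients in (1)
g0 <- 0.5; g1 <- 1; g2 <- -0.5     # hypothetical non-zero coefficients in (3)

C <- rnorm(n)                      # non-binary control variable
X <- C + rnorm(n)                  # X is correlated with C
v <- rnorm(n)                      # noise independent of (X, C)
u <- g0 + g1 * C + g2 * C^2 + v    # so E[u | X, C] = E[u | C], as in (2) and (3)
Y <- b0 + b1 * X + u

# OLS of Y on X alone
short_slope <- unname(coef(lm(Y ~ X))["X"])

# In this design Cov(X, u) = g1 and Var(X) = 2, so the slope of the
# short regression converges to b1 + g1 / 2 rather than to b1.
print(c(estimate = short_slope, limit = b1 + g1 / 2, truth = b1))
```

In this simulated design, the slope from regressing $Y$ on $X$ alone settles near `b1 + g1 / 2` instead of `b1`, which is the inconsistency the question is concerned with.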

(a) Propose an approach for consistently estimating $\beta_1$ from data on $X_i$, $Y_i$ and $C_i$.

Be sure to clearly describe the model you would estimate. You should state what the dependent and explanatory variable/s are and the method you would use to estimate $\beta_1$.

(b) Write the model from (a) in BLP form. In answering, clearly relate the BLP coefficients to your parameter of interest $\beta_1$. Show your working to receive full credit.

(c) Show that the procedure you describe in part (a) will produce a consistent and unbiased estimate of $\beta_1$.

You do not need to provide a formal proof of consistency and unbiasedness, but you should be able to show whether or not the relevant key assumption is satisfied.

(d) Briefly explain and distinguish the concepts of consistency and unbiasedness. In answering, give an example of an estimator we’ve used this semester which is consistent but not unbiased.

(e) How, if at all, would your answer to (a) change if $C_i$ was binary? Explain.


Question 2. (15 points in total, each part is worth 5 points)

You wish to investigate whether an individual’s previous union membership status influences their current status. You have panel data on individuals’ union membership over 4 years ($t = 1, 2, 3, 4$) on the variable $M_{it}$, which takes the value 1 if individual $i$ was a union member in year $t$ and 0 otherwise. You model individual $i$’s utility from choosing to be a union member ($U_1$) or not ($U_0$) in year $t$ as a function of previous membership status $M_{i,t-1}$, a fixed effect $\alpha_i$, and random components $\varepsilon_{it,1}$ and $\varepsilon_{it,0}$:

$$U_1(M_{i,t-1}, \alpha_i, \varepsilon_{it,1}) = u_1(M_{i,t-1}, \alpha_i) + \varepsilon_{it,1} , \tag{4}$$

$$U_0(M_{i,t-1}, \alpha_i, \varepsilon_{it,0}) = u_0(M_{i,t-1}, \alpha_i) + \varepsilon_{it,0} . \tag{5}$$

The $\varepsilon_{it,0}$ and $\varepsilon_{it,1}$ terms represent the parts of individual $i$’s utility from each choice in year $t$ that are not explained by previous membership status and the fixed effect. These are drawn randomly each year whereas the fixed effect is constant over time. You assume

$$u_1(M_{i,t-1}, \alpha_i) - u_0(M_{i,t-1}, \alpha_i) = \beta_1 M_{i,t-1} + \alpha_i . \tag{6}$$

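You also assume that, for each year $t$, the conditional distribution of $\varepsilon_{it,1} - \varepsilon_{it,0}$ given $M_{i,t-1}$ and $\alpha_i$ is a logistic distribution:

$$(\varepsilon_{it,1} - \varepsilon_{it,0}) \mid M_{i,t-1}, \alpha_i \text{ has cdf } \Lambda, \quad \text{where } \Lambda(u) = \frac{1}{1 + e^{-u}} . \tag{7}$$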
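For reference, the standard logistic cdf in (7) is available in base R as `plogis`. The short sketch below writes it out by hand under the illustrative name `Lambda` (not a name used elsewhere in this exam) and checks that the two agree, along with the symmetry property of the logistic distribution.

```r
# Logistic cdf written out explicitly; `Lambda` is an illustrative name.
Lambda <- function(u) 1 / (1 + exp(-u))

u <- c(-2, 0, 1.5)
print(Lambda(u))
print(plogis(u))                  # built-in logistic cdf from the stats package

# The hand-written version matches the built-in one
same_as_builtin <- isTRUE(all.equal(Lambda(u), plogis(u)))

# Symmetry of the logistic distribution: Lambda(-u) = 1 - Lambda(u)
symmetric <- isTRUE(all.equal(Lambda(-u), 1 - Lambda(u)))

print(c(same_as_builtin = same_as_builtin, symmetric = symmetric))
```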

(a) Derive an expression for $\Pr(M_{it} = 1 \mid M_{i,t-1}, \alpha_i)$.

(b) Explain the role of the individual fixed effects in this model. What is it that we are attempting to control for by the inclusion of individual fixed effects?

(c) Unlike panel regression models, here there is no obvious way to difference out the individual fixed effect $\alpha_i$ from the expression you obtained in (a). After some algebra, you deduce

$$\Pr(M_{i2} = 1 \mid M_{i4}, M_{i2} + M_{i3} = 1, M_{i1}, \alpha_i) = \frac{1}{1 + e^{-\beta_1 (M_{i1} - M_{i4})}} , \tag{8}$$

$$\Pr(M_{i2} = 0 \mid M_{i4}, M_{i2} + M_{i3} = 1, M_{i1}, \alpha_i) = \frac{e^{-\beta_1 (M_{i1} - M_{i4})}}{1 + e^{-\beta_1 (M_{i1} - M_{i4})}} . \tag{9}$$

Describe how you could use these expressions to estimate $\beta_1$. Be sure to clearly describe the model you would estimate. You should state what the dependent and explanatory variable/s are, the (subset of) data you would use, and the method you would use to estimate $\beta_1$.

Hint: You might want to consider only “switchers”: these are individuals who change union membership status between dates 2 and 3 (i.e., for whom $M_{i2} + M_{i3} = 1$).


Question 3. (15 points in total, each part is worth 5 points)

Two economists wish to investigate whether news about COVID-19 related hospitalizations triggers consumers to seek face masks, disinfectant, and the like, in response. Together, they assemble a data set of COVID-19 related hospitalizations in New York and a Google Trends index of searches for “face mask” in New York. The data are daily and span the period February 29 to May 7, 2020.

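Figure 1: Google Trends index. [Time-series plot of cd$gti (vertical axis, ticks 20 to 100) against Time (horizontal axis, ticks 0 to 70).]

Figure 2: Hospitalizations. [Time-series plot of cd$hosp (vertical axis, ticks 0 to 1500) against Time (horizontal axis, ticks 0 to 70).]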

The first economist runs a regression of the Google Trends index $gti_t$ on the total number of hospitalizations the previous day $hosp_t$ (note: $hosp_t$ represents the total number of hospitalizations on day $t-1$, since this is only known at the end of date $t-1$) and obtains the following R output:

> fm0 <- lm(gti ~ hosp, data = cd)
> summary(fm0)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 21.284007   3.460304   6.151 4.84e-08 ***
hosp         0.018346   0.004189   4.380 4.27e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.82 on 67 degrees of freedom
Multiple R-squared: 0.2226, Adjusted R-squared: 0.211
F-statistic: 19.18 on 1 and 67 DF, p-value: 4.272e-05

> coeftest(fm0, df = Inf, vcov = vcovHAC)

z test of coefficients:

              Estimate Std. Error z value Pr(>|z|)
(Intercept) 21.2840070  6.7613407  3.1479 0.001644 **
hosp         0.0183465  0.0059299  3.0939 0.001976 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
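As a reading aid for the `coeftest` tables, the z value and p-value columns can be reproduced from the estimate and standard error alone. The sketch below does this for the `hosp` row of the HAC output for `fm0`. The numbers are copied from that output, and the variable names `est`, `se`, `z` and `p` are illustrative.

```r
# Numbers copied from the hosp row of the coeftest(fm0, ...) output.
est <- 0.0183465
se  <- 0.0059299

z <- est / se                  # z statistic: estimate divided by its standard error
p <- 2 * pnorm(-abs(z))        # two-sided p-value from the standard normal distribution

print(round(c(z = z, p = p), 6))
```

With `df = Inf`, `coeftest` reports a z test, so the p-value comes from the standard normal distribution rather than from a t distribution.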

The second economist runs a regression of $\Delta gti_t = gti_t - gti_{t-1}$ on the change in hospitalizations $\Delta hosp_t = hosp_t - hosp_{t-1}$ and obtains:

> fmd0 <- lm(diff(gti) ~ diff(hosp), data = cd)
> summary(fmd0)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.26005    1.35401   0.192   0.8483
diff(hosp)   0.02231    0.01230   1.814   0.0742 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.17 on 66 degrees of freedom
Multiple R-squared: 0.04748, Adjusted R-squared: 0.03305
F-statistic: 3.29 on 1 and 66 DF, p-value: 0.07425

> coeftest(fmd0, df = Inf, vcov = vcovHAC)

z test of coefficients:

            Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.260051   1.416218  0.1836   0.8543
diff(hosp)  0.022314   0.015189  1.4690   0.1418
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(a) How does the interpretation of the slope coefficient differ across the two economists’ models? Which of the two interpretations seems more relevant to the economists’ research question?

(b) Which of the two sets of results provides more reliable evidence of the causal effect of news about hospitalizations on consumer behavior? In answering, be sure to state whether the effect is significant or not.

The two economists notice that the second spike in the Google Trends index around day 47 coincides with the announcement by Governor Cuomo that face masks would be mandatory in New York. They define a dummy variable $D_t$ that takes the value 0 before April 15 and 1 on and after April 15.
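A dummy of this kind could be constructed in R along the following lines. This is only a sketch: it assumes the observations are ordered by calendar day over the sample period stated in the question (February 29 to May 7, 2020), and the name `dates` is hypothetical, since the exam output does not show how the variable was built.

```r
# Daily dates for the sample period stated in the question (assumed ordering).
dates <- seq(as.Date("2020-02-29"), as.Date("2020-05-07"), by = "day")

# Dummy: 0 before April 15, 1 on and after April 15.
D <- as.integer(dates >= as.Date("2020-04-15"))

n_days      <- length(dates)       # number of daily observations
first_D_day <- which(D == 1)[1]    # position of April 15 within the sample

print(c(n_days = n_days, first_D_day = first_D_day, days_with_D = sum(D)))
```

Under that assumption the sample has 69 days, which agrees with the 67 residual degrees of freedom reported for `fm0`, and April 15 falls on observation 47, in line with the spike described in the text.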

The first economist performs a Chow test for a structural break on April 15 and obtains:

> fm1 <- lm(gti ~ hosp + D + D:hosp, data = cd)
> summary(fm1)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.208618   3.084119   3.310  0.00152 **
hosp         0.022826   0.003187   7.162 8.97e-10 ***
D            2.826095   6.360400   0.444  0.65828
hosp:D       0.060502   0.013699   4.417 3.88e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13.09 on 65 degrees of freedom
Multiple R-squared: 0.635, Adjusted R-squared: 0.6181
F-statistic: 37.69 on 3 and 65 DF, p-value: 3.122e-14

> coeftest(fm1, df = Inf, vcov = vcovHAC)

z test of coefficients:

              Estimate Std. Error z value  Pr(>|z|)
(Intercept) 10.2086176  1.2444053  8.2036 2.333e-16 ***
hosp         0.0228260  0.0042302  5.3960 6.815e-08 ***
D            2.8260955  5.9497960  0.4750    0.6348
hosp:D       0.0605016  0.0139306  4.3431 1.405e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> waldtest(fm0, fm1, test = "Chisq", vcov = vcovHAC)

Wald test

Model 1: gti ~ hosp
Model 2: gti ~ hosp + D + D:hosp
  Res.Df Df  Chisq Pr(>Chisq)
1     67
2     65  2 111.88  < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(c) Explain what components of this output provide evidence of a structural break in the relation between search behavior for face masks and hospitalizations at April 15. In answering, be sure to state the null hypothesis you are testing and whether or not you reject the null hypothesis.


Question 4. (15 points in total, each part is worth 5 points)

You have been tasked with the following consulting project by a firm. The firm would like individuals to be remunerated for how they perform. However, the firm is worried that there may be unconscious bias, through which workers who currently earn high wages may be more likely to earn high wages in future, irrespective of their performance on the job.

The firm decides to run an experiment to investigate this issue. For an incoming cohort of graduates, the firm randomly assigns a wage $W_{i1}$ to each individual $i$ for their first year. Each individual’s wages are then recorded for the subsequent two years. It is hypothesized that wages for the subsequent two years ($t = 2, 3$) evolve according to the model

$$W_{it} = \beta_1 W_{i,t-1} + \alpha_i + u_{it} , \tag{10}$$

where $\beta_1 < 1$ is an unknown parameter to be estimated, $\alpha_i$ is an individual fixed effect, and $u_{it}$ is drawn independently each year.

The firm has given you a balanced panel of wages for years t = 1, 2, 3 for a large cohort of individuals.

(a) Explain whether or not you can estimate the model (10) by a panel regression of $W_{it}$ on $W_{i,t-1}$ using either of our two approaches for panel regression.

Hint: check if any of our assumptions for the fixed effects model are violated. If your answer is negative, you should provide some reasoning for why the relevant assumption fails.

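(b) You notice that

$$\Delta W_{i3} = \beta_1 \Delta W_{i2} + \Delta u_{i3} , \tag{11}$$

where $\Delta W_{i3} = W_{i3} - W_{i2}$, $\Delta W_{i2} = W_{i2} - W_{i1}$, and $\Delta u_{i3} = u_{i3} - u_{i2}$.

Calculate $\mathrm{Cov}(\Delta W_{i2}, W_{i1})$ and $\mathrm{Cov}(\Delta u_{i3}, W_{i1})$.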
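For completeness, the differenced equation (11) can be obtained from model (10) by writing (10) for $t = 3$ and $t = 2$ and subtracting, so that the fixed effect cancels. The steps below are a sketch of that algebra in the notation of (10) and (11), with no further assumptions:

```latex
\begin{align*}
W_{i3} &= \beta_1 W_{i2} + \alpha_i + u_{i3}, \\
W_{i2} &= \beta_1 W_{i1} + \alpha_i + u_{i2}, \\
W_{i3} - W_{i2} &= \beta_1 (W_{i2} - W_{i1}) + (u_{i3} - u_{i2}), \\
\Delta W_{i3} &= \beta_1 \Delta W_{i2} + \Delta u_{i3}.
\end{align*}
```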

(c) Using your answer to (b), propose an estimator of $\beta_1$ and show it is consistent.
