3 9/30 | Lab II: Dummy Variables & Career Data
## Packages
library(knitr)
library(car)
library(AER)
library(stargazer)
## Data
load("~/GOVT702/Data/Ch6_Lab_CareerHappiness.RData")
We will use General Social Survey data for this lab. The key variables we will use are
- happy: 3 very happy, 2 pretty happy, 1 not too happy
- married: 1 for currently married (based on dta$marital, which is 1 for married, 2 for widowed, 3 for divorced, 4 for separated, 5 for never married)
- sex: 1 for men, 2 for women
- edcat: 1 for less than high school, 2 for high school, 3 for some college and 4 for college graduate
- race: 1 for white, 2 for black, 3 for other races (self-identified, first race mentioned)
3.1 Use OLS to estimate the difference in happiness (the “happy” variable) for married versus unmarried people. Then use the t test function to assess the difference in happiness by gender (see the Computing Corner in Chapter 6 of the book). Discuss similarities and differences in OLS model and t test function results
## Difference in Happiness with OLS
<- lm(happy~married, data=dta)
model_1 summary(model_1)
##
## Call:
## lm(formula = happy ~ married, data = dta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3227 -0.3227 -0.0249 0.6773 0.9751
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.024903 0.006033 335.64 <2e-16 ***
## married 0.297810 0.007621 39.08 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5984 on 26346 degrees of freedom
## (2442 observations deleted due to missingness)
## Multiple R-squared: 0.05478, Adjusted R-squared: 0.05475
## F-statistic: 1527 on 1 and 26346 DF, p-value: < 2.2e-16
confint(model_1)
## 2.5 % 97.5 %
## (Intercept) 2.013079 2.0367282
## married 0.282872 0.3127481
## With t.test()
t.test(dta$happy[dta$married==1], dta$happy[dta$married==0])
##
## Welch Two Sample t-test
##
## data: dta$happy[dta$married == 1] and dta$happy[dta$married == 0]
## t = 38.977, df = 20513, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.2828336 0.3127865
## sample estimates:
## mean of x mean of y
## 2.322714 2.024903
3.2 Use OLS with robust standard errors to estimate the difference in happiness for married and unmarried people. Then use the t test function with unequal variances to assess the difference in happiness by marital status. Discuss similarities and differences in OLS model and t test function results.
## model_1 with Robust Standard Errors
coeftest(model_1, vcov. = vcovHC(model_1, type = "HC1"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.0249034 0.0060709 333.541 < 2.2e-16 ***
## married 0.2978101 0.0076407 38.977 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# t-test, without assuming equal variance
t.test(dta$happy[dta$married==1], dta$happy[dta$married==0], var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: dta$happy[dta$married == 1] and dta$happy[dta$married == 0]
## t = 38.977, df = 20513, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.2828336 0.3127865
## sample estimates:
## mean of x mean of y
## 2.322714 2.024903
3.3 Create an interaction between married and age. Estimate a model that explains happiness as a function of age and marital status, allowing for the age effect to differ according to marital status. (For simplicity, we use only married and unmarried for marital status.) What is the effect of age for unmarried people? For married people?
## Interaction Term
<- dta$married*dta$age
dum_interact
## Model 2
<- lm(happy~age + married + dum_interact, data=dta)
model_2 summary(model_2)
##
## Call:
## lm(formula = happy ~ age + married + dum_interact, data = dta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.32925 -0.32408 -0.04825 0.67451 1.02224
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.1300310 0.0270265 78.813 < 2e-16 ***
## age -0.0028199 0.0007067 -3.990 6.61e-05 ***
## married 0.2109524 0.0351041 6.009 1.89e-09 ***
## dum_interact 0.0023504 0.0009036 2.601 0.0093 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5982 on 26344 degrees of freedom
## (2442 observations deleted due to missingness)
## Multiple R-squared: 0.05538, Adjusted R-squared: 0.05527
## F-statistic: 514.8 on 3 and 26344 DF, p-value: < 2.2e-16
## or
<- lm(happy~married*age, data=dta)
model_2b summary(model_2b)
##
## Call:
## lm(formula = happy ~ married * age, data = dta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.32925 -0.32408 -0.04825 0.67451 1.02224
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.1300310 0.0270265 78.813 < 2e-16 ***
## married 0.2109524 0.0351041 6.009 1.89e-09 ***
## age -0.0028199 0.0007067 -3.990 6.61e-05 ***
## married:age 0.0023504 0.0009036 2.601 0.0093 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5982 on 26344 degrees of freedom
## (2442 observations deleted due to missingness)
## Multiple R-squared: 0.05538, Adjusted R-squared: 0.05527
## F-statistic: 514.8 on 3 and 26344 DF, p-value: < 2.2e-16
The effect of age is \(-0.00281\) for unmarried people and \(-0.00281 + 0.002350 = -0.0004699\) for married people. The intercept is \(2.13\) for unmarried people and \(2.13 + 0.2109 = 2.3409\) for married people.
3.4 Estimate separate models explaining happiness in terms of age for married and unmarried people. Comment on similarities and differences compared to results immediately above for (i) the estimated effect of age and (ii) the intercept.
## Model 3a
<- lm(happy~age, data=subset(dta, married==1))
model_3a summary(model_3a)
##
## Call:
## lm(formula = happy ~ age, data = subset(dta, married == 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3293 -0.3250 -0.3189 0.6754 0.6844
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3409834 0.0223240 104.864 <2e-16 ***
## age -0.0004695 0.0005612 -0.837 0.403
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5961 on 16508 degrees of freedom
## (1385 observations deleted due to missingness)
## Multiple R-squared: 4.24e-05, Adjusted R-squared: -1.817e-05
## F-statistic: 0.7 on 1 and 16508 DF, p-value: 0.4028
## and
<- lm(happy~age, data=subset(dta, married==0))
model_3b summary(model_3b)
##
## Call:
## lm(formula = happy ~ age, data = subset(dta, married == 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.05953 -0.05389 -0.02570 0.01096 1.02224
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.1300310 0.0271844 78.355 < 2e-16 ***
## age -0.0028199 0.0007108 -3.967 7.32e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6017 on 9836 degrees of freedom
## (1053 observations deleted due to missingness)
## Multiple R-squared: 0.001598, Adjusted R-squared: 0.001496
## F-statistic: 15.74 on 1 and 9836 DF, p-value: 7.322e-05
The results here are the same as above: for unmarried people, the effect of age is \(-0.00281\) and the intercept is \(2.130\).
The effect of age for married people also the same: \(-0.000469\), with an intercept of \(2.3409\).
3.5 Marianne Bertrand wrote an article called ``Work on Women’s Work is Never Done: Career, Family, and the Well-Being of College-Educated Women’’ published in the American Economic Review: Papers & Proceedings 2013, 103(3): 244–250. She analyzed the effect of careers and family on college-educated women. She defines a career variable that is 1 if someone’s earnings are above the twenty-fifth percentile in the relevant year and age group. Estimate a model in which happiness is a function of career, being married and an interaction of career and married (“careermarried”). To match Betrand’s analysis, limit the data to only to college educated (dta$educat==4) women (dta$sex==2). Interpret the estimated average happiness for the four types of women implied by this analysis.
<- lm(happy ~ career*married, data=dta[dta$sex==2 & dta$educat==4,])
model_4 summary(model_4)
##
## Call:
## lm(formula = happy ~ career * married, data = dta[dta$sex ==
## 2 & dta$educat == 4, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4215 -0.4214 -0.1247 0.5786 0.8753
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.12470 0.01997 106.406 < 2e-16 ***
## career 0.09204 0.02985 3.084 0.00206 **
## married 0.29676 0.02503 11.855 < 2e-16 ***
## career:married -0.09923 0.04023 -2.466 0.01369 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5739 on 3595 degrees of freedom
## (400 observations deleted due to missingness)
## Multiple R-squared: 0.04765, Adjusted R-squared: 0.04685
## F-statistic: 59.96 on 3 and 3595 DF, p-value: < 2.2e-16
The four groups are unmarried women without careers, married women with careers, married women without careers, and unmarried women without careers.
College-educated women that have no careers and are not married have an average happiness of \(2.12\).
College-educated women with careers but not married have an average happiness of \(2.12 + 0.09 = 2.21\).
College-educated women with no career but are married have average happiness of \(2.12 + 0.29 = 2.41\).
College-educated women that are both married and have careers have average happiness of \(2.12 + 0.092 + 0.29 -0.099 = 2.4\).
3.7 Estimate a model similar to the above, but for a different group of your choice (e.g., limit by race or gender or education) and feel free to include different covariates.
No wrong answers here (mostly)
This model explores the how commute times affect happiness. I specifically think that time commuting will - conditionally - lower a person’s happiness if it means they are away from their families. Because of this, I create an interaction term between commuting and marriage and estimate the effects.
## Commuting Model on Adults
<- lm(happy ~ commute*married, data=dta[dta$age>18,])
model_6 summary(model_6)
##
## Call:
## lm(formula = happy ~ commute * married, data = dta[dta$age >
## 18, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.30339 -0.29480 -0.05136 0.69947 0.98484
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.0513587 0.0553465 37.064 < 2e-16 ***
## commute -0.0003732 0.0020973 -0.178 0.858816
## married 0.2520331 0.0681019 3.701 0.000231 ***
## commute:married -0.0001996 0.0026100 -0.076 0.939047
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5647 on 726 degrees of freedom
## (28060 observations deleted due to missingness)
## Multiple R-squared: 0.04227, Adjusted R-squared: 0.03831
## F-statistic: 10.68 on 3 and 726 DF, p-value: 7.076e-07
My theory is not supported by this model. I found no significant effect for those who commute longer and no significant conditional effect of happiness for those who commute long hours and are married. I did find support that marriage in general makes a person happier, though this should not be considered a causal effect.