3 9/30 | Lab II: Dummy Variables & Career Data

## Packages
library(knitr)
library(car)
library(AER)
library(stargazer)

## Data
load("~/GOVT702/Data/Ch6_Lab_CareerHappiness.RData")

We will use General Social Survey data for this lab. The key variables we will use are

happy: 3 very happy, 2 pretty happy, 1 not too happy
married: 1 for currently married (based on dta$marital, which is 1 for married, 2 for widowed, 3 for divorced, 4 for separated, 5 for never married)
sex: 1 for men, 2 for women
edcat: 1 for less than high school, 2 for high school, 3 for some college and 4 for college graduate
race: 1 for white, 2 for black, 3 for other races (self-identified, first race mentioned)

3.1 Use OLS to estimate the difference in happiness (the “happy” variable) for married versus unmarried people. Then use the t test function to assess the difference in happiness by gender (see the Computing Corner in Chapter 6 of the book). Discuss similarities and differences in OLS model and t test function results

## Difference in Happiness with OLS
model_1 <- lm(happy~married, data=dta)
summary(model_1)

## 
## Call:
## lm(formula = happy ~ married, data = dta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3227 -0.3227 -0.0249  0.6773  0.9751 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.024903   0.006033  335.64   <2e-16 ***
## married     0.297810   0.007621   39.08   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5984 on 26346 degrees of freedom
##   (2442 observations deleted due to missingness)
## Multiple R-squared:  0.05478,    Adjusted R-squared:  0.05475 
## F-statistic:  1527 on 1 and 26346 DF,  p-value: < 2.2e-16

confint(model_1)

##                2.5 %    97.5 %
## (Intercept) 2.013079 2.0367282
## married     0.282872 0.3127481

## With t.test()
t.test(dta$happy[dta$married==1], dta$happy[dta$married==0])

## 
##  Welch Two Sample t-test
## 
## data:  dta$happy[dta$married == 1] and dta$happy[dta$married == 0]
## t = 38.977, df = 20513, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2828336 0.3127865
## sample estimates:
## mean of x mean of y 
##  2.322714  2.024903

3.2 Use OLS with robust standard errors to estimate the difference in happiness for married and unmarried people. Then use the t test function with unequal variances to assess the difference in happiness by marital status. Discuss similarities and differences in OLS model and t test function results.

## model_1 with Robust Standard Errors
coeftest(model_1, vcov. = vcovHC(model_1, type = "HC1"))

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 2.0249034  0.0060709 333.541 < 2.2e-16 ***
## married     0.2978101  0.0076407  38.977 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# t-test, without assuming equal variance
t.test(dta$happy[dta$married==1], dta$happy[dta$married==0], var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  dta$happy[dta$married == 1] and dta$happy[dta$married == 0]
## t = 38.977, df = 20513, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2828336 0.3127865
## sample estimates:
## mean of x mean of y 
##  2.322714  2.024903

3.3 Create an interaction between married and age. Estimate a model that explains happiness as a function of age and marital status, allowing for the age effect to differ according to marital status. (For simplicity, we use only married and unmarried for marital status.) What is the effect of age for unmarried people? For married people?

## Interaction Term
dum_interact <- dta$married*dta$age

## Model 2
model_2 <- lm(happy~age + married + dum_interact, data=dta)
summary(model_2)

## 
## Call:
## lm(formula = happy ~ age + married + dum_interact, data = dta)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.32925 -0.32408 -0.04825  0.67451  1.02224 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.1300310  0.0270265  78.813  < 2e-16 ***
## age          -0.0028199  0.0007067  -3.990 6.61e-05 ***
## married       0.2109524  0.0351041   6.009 1.89e-09 ***
## dum_interact  0.0023504  0.0009036   2.601   0.0093 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5982 on 26344 degrees of freedom
##   (2442 observations deleted due to missingness)
## Multiple R-squared:  0.05538,    Adjusted R-squared:  0.05527 
## F-statistic: 514.8 on 3 and 26344 DF,  p-value: < 2.2e-16

## or
model_2b <- lm(happy~married*age, data=dta)
summary(model_2b)

## 
## Call:
## lm(formula = happy ~ married * age, data = dta)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.32925 -0.32408 -0.04825  0.67451  1.02224 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.1300310  0.0270265  78.813  < 2e-16 ***
## married      0.2109524  0.0351041   6.009 1.89e-09 ***
## age         -0.0028199  0.0007067  -3.990 6.61e-05 ***
## married:age  0.0023504  0.0009036   2.601   0.0093 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5982 on 26344 degrees of freedom
##   (2442 observations deleted due to missingness)
## Multiple R-squared:  0.05538,    Adjusted R-squared:  0.05527 
## F-statistic: 514.8 on 3 and 26344 DF,  p-value: < 2.2e-16

The effect of age is $-0.00281$ for unmarried people and $-0.00281 + 0.002350 = -0.0004699$ for married people. The intercept is $2.13$ for unmarried people and $2.13 + 0.2109 = 2.3409$ for married people.

3.4 Estimate separate models explaining happiness in terms of age for married and unmarried people. Comment on similarities and differences compared to results immediately above for (i) the estimated effect of age and (ii) the intercept.

## Model 3a
model_3a <- lm(happy~age, data=subset(dta, married==1))
summary(model_3a)

## 
## Call:
## lm(formula = happy ~ age, data = subset(dta, married == 1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3293 -0.3250 -0.3189  0.6754  0.6844 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.3409834  0.0223240 104.864   <2e-16 ***
## age         -0.0004695  0.0005612  -0.837    0.403    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5961 on 16508 degrees of freedom
##   (1385 observations deleted due to missingness)
## Multiple R-squared:  4.24e-05,   Adjusted R-squared:  -1.817e-05 
## F-statistic:   0.7 on 1 and 16508 DF,  p-value: 0.4028

## and
model_3b <- lm(happy~age, data=subset(dta, married==0))
summary(model_3b)

## 
## Call:
## lm(formula = happy ~ age, data = subset(dta, married == 0))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.05953 -0.05389 -0.02570  0.01096  1.02224 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.1300310  0.0271844  78.355  < 2e-16 ***
## age         -0.0028199  0.0007108  -3.967 7.32e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6017 on 9836 degrees of freedom
##   (1053 observations deleted due to missingness)
## Multiple R-squared:  0.001598,   Adjusted R-squared:  0.001496 
## F-statistic: 15.74 on 1 and 9836 DF,  p-value: 7.322e-05

The results here are the same as above: for unmarried people, the effect of age is $-0.00281$ and the intercept is $2.130$.

The effect of age for married people also the same: $-0.000469$, with an intercept of $2.3409$.

3.5 Marianne Bertrand wrote an article called ``Work on Women’s Work is Never Done: Career, Family, and the Well-Being of College-Educated Women’’ published in the American Economic Review: Papers & Proceedings 2013, 103(3): 244–250. She analyzed the effect of careers and family on college-educated women. She defines a career variable that is 1 if someone’s earnings are above the twenty-fifth percentile in the relevant year and age group. Estimate a model in which happiness is a function of career, being married and an interaction of career and married (“careermarried”). To match Betrand’s analysis, limit the data to only to college educated (dta$educat==4) women (dta$sex==2). Interpret the estimated average happiness for the four types of women implied by this analysis.

model_4 <- lm(happy ~ career*married, data=dta[dta$sex==2 & dta$educat==4,])
summary(model_4)

## 
## Call:
## lm(formula = happy ~ career * married, data = dta[dta$sex == 
##     2 & dta$educat == 4, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4215 -0.4214 -0.1247  0.5786  0.8753 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     2.12470    0.01997 106.406  < 2e-16 ***
## career          0.09204    0.02985   3.084  0.00206 ** 
## married         0.29676    0.02503  11.855  < 2e-16 ***
## career:married -0.09923    0.04023  -2.466  0.01369 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5739 on 3595 degrees of freedom
##   (400 observations deleted due to missingness)
## Multiple R-squared:  0.04765,    Adjusted R-squared:  0.04685 
## F-statistic: 59.96 on 3 and 3595 DF,  p-value: < 2.2e-16

The four groups are unmarried women without careers, married women with careers, married women without careers, and unmarried women without careers.

College-educated women that have no careers and are not married have an average happiness of $2.12$.

College-educated women with careers but not married have an average happiness of $2.12 + 0.09 = 2.21$.

College-educated women with no career but are married have average happiness of $2.12 + 0.29 = 2.41$.

College-educated women that are both married and have careers have average happiness of $2.12 + 0.092 + 0.29 -0.099 = 2.4$.

3.6 The GSS provides a race variable (dta$race). The variable equals 1 for white respondents, 2 for black respondents and 3 for everyone else. Add race to the above model and interpret the coefficients related to race. Estimate another model with a different reference category and explain coefficients across the two models.

## First Race Model with White as Reference
model_5 <- lm(happy ~ career*married + factor(race),
data=dta[dta$sex==2 & dta$educat==4,])
summary(model_5)

## 
## Call:
## lm(formula = happy ~ career * married + factor(race), data = dta[dta$sex == 
##     2 & dta$educat == 4, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4359 -0.4329 -0.1575  0.5641  1.0085 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     2.15746    0.02077 103.867  < 2e-16 ***
## career          0.08830    0.02974   2.969  0.00301 ** 
## married         0.27847    0.02518  11.060  < 2e-16 ***
## factor(race)2  -0.16593    0.03267  -5.079 3.98e-07 ***
## factor(race)3  -0.10051    0.04036  -2.491  0.01280 *  
## career:married -0.09132    0.04010  -2.277  0.02283 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5717 on 3593 degrees of freedom
##   (400 observations deleted due to missingness)
## Multiple R-squared:  0.05555,    Adjusted R-squared:  0.05424 
## F-statistic: 42.27 on 5 and 3593 DF,  p-value: < 2.2e-16

## Second Race Model with Other as the Reference Category
model_5b <- lm(happy ~ career*married + relevel(factor(race), ref = "3"),
data=dta[dta$sex==2 & dta$educat==4,])
summary(model_5b)

## 
## Call:
## lm(formula = happy ~ career * married + relevel(factor(race), 
##     ref = "3"), data = dta[dta$sex == 2 & dta$educat == 4, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4359 -0.4329 -0.1575  0.5641  1.0085 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        2.05695    0.04261  48.273  < 2e-16 ***
## career                             0.08830    0.02974   2.969  0.00301 ** 
## married                            0.27847    0.02518  11.060  < 2e-16 ***
## relevel(factor(race), ref = "3")1  0.10051    0.04036   2.491  0.01280 *  
## relevel(factor(race), ref = "3")2 -0.06542    0.04972  -1.316  0.18835    
## career:married                    -0.09132    0.04010  -2.277  0.02283 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5717 on 3593 degrees of freedom
##   (400 observations deleted due to missingness)
## Multiple R-squared:  0.05555,    Adjusted R-squared:  0.05424 
## F-statistic: 42.27 on 5 and 3593 DF,  p-value: < 2.2e-16

Both results indicated white college-educated women are both $0.16$ higher on the happiness variable than African-American college-educated women and $0.10$ higher than other college-educated women of other races. Even though the coefficient on the other race variables change across specifications, our interpretation of the results does not change; the change is due solely to the change in reference category.

3.7 Estimate a model similar to the above, but for a different group of your choice (e.g., limit by race or gender or education) and feel free to include different covariates.

No wrong answers here (mostly)

This model explores the how commute times affect happiness. I specifically think that time commuting will - conditionally - lower a person’s happiness if it means they are away from their families. Because of this, I create an interaction term between commuting and marriage and estimate the effects.

## Commuting Model on Adults
model_6 <- lm(happy ~ commute*married, data=dta[dta$age>18,])
summary(model_6)

## 
## Call:
## lm(formula = happy ~ commute * married, data = dta[dta$age > 
##     18, ])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.30339 -0.29480 -0.05136  0.69947  0.98484 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      2.0513587  0.0553465  37.064  < 2e-16 ***
## commute         -0.0003732  0.0020973  -0.178 0.858816    
## married          0.2520331  0.0681019   3.701 0.000231 ***
## commute:married -0.0001996  0.0026100  -0.076 0.939047    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5647 on 726 degrees of freedom
##   (28060 observations deleted due to missingness)
## Multiple R-squared:  0.04227,    Adjusted R-squared:  0.03831 
## F-statistic: 10.68 on 3 and 726 DF,  p-value: 7.076e-07

My theory is not supported by this model. I found no significant effect for those who commute longer and no significant conditional effect of happiness for those who commute long hours and are married. I did find support that marriage in general makes a person happier, though this should not be considered a causal effect.