8.1 Does a divorce (famstat==4) causally affect mental health (mcs)? Construct binary variables for the categories of famstat (tab famstat, gen(fam)) and a binary indicator of low mental health (gen mcs01=mcs<40 if mcs !=.) Estimate the effect of a divorce (fam4) on mental health (mcs01) in two different model specifications: Pooled Logit (logit, cluster()) and FE-Logit (xtlogit, fe). Add relevant third variables, if necessary. Summarize the estimates in a table (est tab)
8.2 Interpret your findings and the differences between the models.
set more off
capt clear
version 14
* Open the healthl file.
use "_data/healthl.dta", clear
* Does a divorce (famstat==4) causally affect mental health (mcs)?
/* Construct binary variables for the categories of famstat (tab famstat, gen(fam))
and a binary indicator of low mental health (gen mcs01=mcs<40 if mcs !=.) */
tab famstat, gen(fam)
tab occupation, gen(occ)
gen mcs01 = mcs<40 if !missing(mcs)
save "_data/health.dta", replace
. set more off
. capt clear
. version 14
.
. * Open the healthl file.
. use "_data/healthl.dta", clear
(HEALTH: 09/10/10 10:14:13-634 DB09)
.
. * Does a divorce (famstat==4) causally affect mental health (mcs)?
.
. /* Construct binary variables for the categories of famstat (tab famstat, gen
> (fam))
> and a binary indicator of low mental health (gen mcs01=mcs<40 if mcs !=.) */
.
. tab famstat, gen(fam)
famstat | Freq. Percent Cum.
-----------------------------+-----------------------------------
[1] Married | 17,566 65.22 65.22
[2] Married, But Separated | 529 1.96 67.18
[3] Single | 5,101 18.94 86.12
[4] Divorced | 2,014 7.48 93.60
[5] Widowed | 1,725 6.40 100.00
-----------------------------+-----------------------------------
Total | 26,935 100.00
. tab occupation, gen(occ)
occupation | Freq. Percent Cum.
--------------+-----------------------------------
Blue-Collar | 4,412 16.38 16.38
White-Collar | 7,804 28.97 45.35
Self-Employed | 1,721 6.39 51.74
Civil Service | 1,076 3.99 55.74
Pensioner | 6,939 25.76 81.50
Unemployed | 1,543 5.73 87.23
Not working | 3,440 12.77 100.00
--------------+-----------------------------------
Total | 26,935 100.00
.
. gen mcs01 = mcs<40 if !missing(mcs)
.
. save "_data/health.dta", replace
(note: file _data/health.dta not found)
file _data/health.dta saved
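#### load packages and dataset ####
library(haven)   # read_dta()
library(dplyr)   # mutate(), select(), group_by_all(), summarize()
library(tidyr)   # spread(), drop_na()
library(plm)     # pdata.frame(), plm()
library(lfe)     # felm()
library(broom)   # tidy()
health_raw <- read_dta("_data/healthl.dta")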
Construct binary variables for the categories of famstat and a binary indicator of low mental health (mcs01 = 1 if mcs<40, otherwise 0)
# Generate Binary Variables
health <- health_raw %>%
  # binary indicator of low mental health (NA in mcs stays NA)
  mutate(mcs01 = ifelse(mcs < 40, 1, 0)) %>%
  # generate multiple binary vars from categorical var
  tibble::rownames_to_column() %>%
  group_by_all() %>%
  dplyr::summarize(count = n()) %>%
  spread(famstat, count, sep = "_", fill = 0) %>%
  ungroup() %>%
  dplyr::select(-rowname) %>%
  # drop rows with a missing value in any column
  drop_na()
health %>%
  dplyr::select(mcs, mcs01, starts_with("famstat")) %>%
  head()
# transform into panel data frame
p.health <- pdata.frame(health, index = c("id", "year"))
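The dummy construction above can also be written more directly. The following is a minimal sketch in base R on a small made-up data frame (the object `toy` and all of its values are invented for illustration and are not taken from healthl.dta). It mirrors Stata's `tab famstat, gen(fam)` and `gen mcs01 = mcs<40 if !missing(mcs)`:

```r
# Toy data: invented values, for illustration only
toy <- data.frame(
  id      = c(1, 1, 2, 2, 3),
  famstat = c(1, 4, 3, 3, 4),
  mcs     = c(52.1, 38.4, NA, 45.0, 39.9)
)

# Binary indicator of low mental health; a missing mcs stays missing
toy$mcs01 <- as.integer(toy$mcs < 40)

# One 0/1 column per category (1 to 5), like Stata's tab famstat, gen(fam)
for (k in 1:5) {
  toy[[paste0("famstat_", k)]] <- as.integer(toy$famstat == k)
}

print(toy)
```

Unlike the pipeline above, this sketch does not drop any rows, so observations with missing values in other columns would be kept.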
Estimate the effect of a divorce (fam4) on mental health (mcs01) in two different model specifications: Pooled Logit (logit, cluster()) and FE-Logit (xtlogit, fe). Add relevant third variables, if necessary. Summarize the estimates in a table (est tab).
use "_data/health.dta", clear
/* Estimate the effect of a divorce (fam4) on mental health (mcs01) in two different model specifications: Pooled Logit (logit, cluster()) and FE-Logit (xtlogit, fe).*/
* Pooled Logit
logit mcs01 fam4, cluster(id)
est sto pool1
* Fixed-Effects Logit
xtset id year
xtlogit mcs01 fam4, fe
est sto fe1
* Add relevant third variables, if necessary.
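* Pooled Logit
* Note: as written, this repeats the bivariate model, so pool2 is identical to pool1.
* To include the same controls as the FE model below, the call would be:
*   logit mcs01 fam4 partner##i.gender i.occupation hhinc, cluster(id)
logit mcs01 fam4, cluster(id)
est sto pool2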
* Fixed-Effects Logit
xtlogit mcs01 fam4 partner##i.gender i.occupation hhinc, fe
est sto fe2
* Summarize the estimates in a table (est tab)
est tab pool1 pool2 fe1 fe2
. use "_data/health.dta", clear
(HEALTH: 09/10/10 10:14:13-634 DB09)
.
. /* Estimate the effect of a divorce (fam4) on mental health (mcs01) in two di
> fferent model specifications: Pooled Logit (logit, cluster()) and FE-Logit (x
> tlogit, fe).*/
.
. * Pooled Logit
. logit mcs01 fam4, cluster(id)
Iteration 0: log pseudolikelihood = -11798.301
Iteration 1: log pseudolikelihood = -11780.267
Iteration 2: log pseudolikelihood = -11780.028
Iteration 3: log pseudolikelihood = -11780.028
Logistic regression Number of obs = 26,935
Wald chi2(1) = 22.28
Prob > chi2 = 0.0000
Log pseudolikelihood = -11780.028 Pseudo R2 = 0.0015
(Std. Err. adjusted for 8,122 clusters in id)
------------------------------------------------------------------------------
| Robust
mcs01 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
fam4 | .3587721 .0760098 4.72 0.000 .2097957 .5077485
_cons | -1.69553 .0237543 -71.38 0.000 -1.742088 -1.648972
------------------------------------------------------------------------------
. est sto pool1
.
. * Fixed-Effects Logit
. xtset id year
panel variable: id (unbalanced)
time variable: year, 2002 to 2008, but with gaps
delta: 1 unit
. xtlogit mcs01 fam4, fe
note: multiple positive outcomes within groups encountered.
note: 5,937 groups (19,026 obs) dropped because of all positive or
all negative outcomes.
Iteration 0: log likelihood = -2924.6065
Iteration 1: log likelihood = -2921.7723
Iteration 2: log likelihood = -2921.7723
Conditional fixed-effects logistic regression Number of obs = 7,909
Group variable: id Number of groups = 2,185
Obs per group:
min = 2
avg = 3.6
max = 4
LR chi2(1) = 3.63
Log likelihood = -2921.7723 Prob > chi2 = 0.0569
------------------------------------------------------------------------------
mcs01 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
fam4 | -.3729736 .1967566 -1.90 0.058 -.7586095 .0126623
------------------------------------------------------------------------------
. est sto fe1
.
. * Add relevant third variables, if necessary.
.
. * Pooled Logit
. logit mcs01 fam4, cluster(id)
Iteration 0: log pseudolikelihood = -11798.301
Iteration 1: log pseudolikelihood = -11780.267
Iteration 2: log pseudolikelihood = -11780.028
Iteration 3: log pseudolikelihood = -11780.028
Logistic regression Number of obs = 26,935
Wald chi2(1) = 22.28
Prob > chi2 = 0.0000
Log pseudolikelihood = -11780.028 Pseudo R2 = 0.0015
(Std. Err. adjusted for 8,122 clusters in id)
------------------------------------------------------------------------------
| Robust
mcs01 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
fam4 | .3587721 .0760098 4.72 0.000 .2097957 .5077485
_cons | -1.69553 .0237543 -71.38 0.000 -1.742088 -1.648972
------------------------------------------------------------------------------
. est sto pool2
.
. * Fixed-Effects Logit
. xtlogit mcs01 fam4 partner##i.gender i.occupation hhinc, fe
note: multiple positive outcomes within groups encountered.
note: 5,937 groups (19,026 obs) dropped because of all positive or
all negative outcomes.
note: 1.gender omitted because of no within-group variance.
Iteration 0: log likelihood = -2907.5045
Iteration 1: log likelihood = -2892.8111
Iteration 2: log likelihood = -2892.7887
Iteration 3: log likelihood = -2892.7887
Conditional fixed-effects logistic regression Number of obs = 7,909
Group variable: id Number of groups = 2,185
Obs per group:
min = 2
avg = 3.6
max = 4
LR chi2(10) = 61.59
Log likelihood = -2892.7887 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
mcs01 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
fam4 | -.4471369 .2027485 -2.21 0.027 -.8445167 -.0497571
1.partner | -.8675788 .1879527 -4.62 0.000 -1.235959 -.4991983
|
gender |
female | 0 (omitted)
|
partner#|
gender |
1#female | .4970283 .2373957 2.09 0.036 .0317413 .9623153
|
occupation |
White-Col~r | .1982613 .1417361 1.40 0.162 -.0795363 .476059
Self-Empl~d | .6475259 .2176755 2.97 0.003 .2208897 1.074162
Civil Ser.. | .4108767 .3842811 1.07 0.285 -.3423004 1.164054
Pensioner | .204351 .1835927 1.11 0.266 -.155484 .5641861
Unemployed | .4446522 .1495026 2.97 0.003 .1516325 .7376719
Not working | .2796007 .1545024 1.81 0.070 -.0232184 .5824198
|
hhinc | -.0001125 .0000338 -3.33 0.001 -.0001789 -.0000462
------------------------------------------------------------------------------
. est sto fe2
.
. * Summarize the estimates in a table (est tab)
.
. est tab pool1 pool2 fe1 fe2
------------------------------------------------------------------
Variable | pool1 pool2 fe1 fe2
-------------+----------------------------------------------------
fam4 | .35877208 .35877208 -.37297361 -.44713692
|
partner |
1 | -.8675788
|
gender |
female | (omitted)
|
partner#|
gender |
1#female | .49702833
|
occupation |
White-Col~r | .19826133
Self-Empl~d | .64752592
Civil Ser.. | .41087672
Pensioner | .20435103
Unemployed | .44465219
Not working | .2796007
|
hhinc | -.00011253
_cons | -1.6955301 -1.6955301
------------------------------------------------------------------
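The drop in sample size in the FE logit follows from the conditional likelihood that `xtlogit, fe` maximizes. As a sketch in standard notation (the symbols y_it, x_it, beta, s_i and B_i are introduced here for illustration and do not appear in the original), the contribution of person i with T_i observations is:

```latex
\Pr\Big(y_{i1},\dots,y_{iT_i} \,\Big|\, s_i = \sum_{t} y_{it}\Big)
  = \frac{\exp\big(\sum_{t} y_{it}\, x_{it}'\beta\big)}
         {\sum_{d \in B_i} \exp\big(\sum_{t} d_{t}\, x_{it}'\beta\big)},
\qquad
B_i = \Big\{ d \in \{0,1\}^{T_i} : \sum_{t} d_t = s_i \Big\}
```

If s_i = 0 or s_i = T_i, the set B_i contains only the observed sequence, the ratio equals 1 for every beta, and the person contributes nothing to the estimate. This is why 5,937 of the 8,122 persons (19,026 of 26,935 observations) are dropped, leaving 2,185 persons and 7,909 observations. The person-specific intercept cancels from the ratio, which is also why a time-constant variable such as gender is omitted.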
Note: the pglm package fails for the within (fixed-effects) model on panel data, so it is not used here: https://stats.stackexchange.com/questions/146434/why-pglm-fails-for-within-model
Pooled Logit Model
# Pooled Logit Model
plogit <- glm(mcs01 ~ famstat_4, family = binomial(), data = health)
tidy(plogit)
summary(plogit)
##
## Call:
## glm(formula = mcs01 ~ famstat_4, family = binomial(), data = health)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.671 -0.574 -0.574 -0.574 1.941
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.7195 0.0185 -93.16 < 2e-16 ***
## famstat_4 0.3419 0.0607 5.64 1.7e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 21311 on 24640 degrees of freedom
## Residual deviance: 21281 on 24639 degrees of freedom
## AIC: 21285
##
## Number of Fisher Scoring iterations: 4
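The Stata pooled logit uses `cluster(id)`, whereas the `glm()` output above reports conventional standard errors. A minimal sketch of how cluster-robust standard errors could be obtained in R follows. It assumes the `sandwich` and `lmtest` packages, which the original script does not use, and it runs on simulated data (`sim` and its variables are invented), not on the health data:

```r
library(sandwich)  # vcovCL(): cluster-robust covariance matrix
library(lmtest)    # coeftest(): coefficient table with a custom covariance

set.seed(1)
n_id <- 200  # number of simulated persons
n_t  <- 4    # observations per person

# Simulated panel: a person-specific effect u induces within-person correlation
sim   <- data.frame(id = rep(seq_len(n_id), each = n_t))
u     <- rnorm(n_id)[sim$id]
sim$x <- rbinom(nrow(sim), 1, 0.3)
sim$y <- rbinom(nrow(sim), 1, plogis(-1 + 0.4 * sim$x + u))

# Pooled logit with standard errors clustered on id
fit <- glm(y ~ x, family = binomial(), data = sim)
ct  <- coeftest(fit, vcov. = vcovCL(fit, cluster = ~ id))
print(ct)
```

The point estimates are unchanged by the clustering; only the standard errors, z values and p values differ from `summary(fit)`.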
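Fixed Effects Model (linear probability model, not a logit)
# Fixed effects via felm(): a linear model with id fixed effects, used here in place of an FE logit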
felogit <- felm(mcs01 ~ famstat_4 | id , data = health)
tidy(felogit)
summary(felogit)
##
## Call:
## felm(formula = mcs01 ~ famstat_4 | id, data = health)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.788 0.000 0.000 0.000 0.775
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## famstat_4 -0.0505 0.0206 -2.45 0.014 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.29 on 17472 degrees of freedom
## Multiple R-squared(full model): 0.534 Adjusted R-squared: 0.343
## Multiple R-squared(proj model): 0.000343 Adjusted R-squared: -0.41
## F-statistic(full model):2.79 on 7168 and 17472 DF, p-value: <2e-16
## F-statistic(proj model): 5.99 on 1 and 17472 DF, p-value: 0.0144
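Because `felm()` fits a linear model, its coefficient is on the probability scale and is not directly comparable to the `xtlogit, fe` coefficient. A conditional logit closer to the Stata model can be sketched with `clogit()` from the `survival` package. This is an assumption: the original script does not use that package, and the example below runs on simulated data (`sim` and its variables are invented), not on the health data:

```r
library(survival)  # clogit(): conditional logistic regression

set.seed(42)
n_id <- 300  # number of simulated persons
n_t  <- 4    # observations per person

# Simulated panel with a person-specific intercept alpha
sim   <- data.frame(id = rep(seq_len(n_id), each = n_t))
alpha <- rnorm(n_id)[sim$id]
sim$x <- rbinom(nrow(sim), 1, 0.4)
sim$y <- rbinom(nrow(sim), 1, plogis(alpha + 0.8 * sim$x))

# Conditioning on the person (strata) removes alpha from the likelihood;
# persons whose y never changes do not contribute to the estimate
fit_cl <- clogit(y ~ x + strata(id), data = sim)
print(summary(fit_cl))
```

On the exercise data the analogous call would presumably be `clogit(mcs01 ~ famstat_4 + strata(id), data = health)`; that call has not been run here.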
Pooled Logit Model II
# Pooled Logit Model with control variables
plogit2 <- glm(mcs01 ~ famstat_4 + partner*gender + factor(occupation) + hhinc,
               family = binomial,
               data = health)
# Fixed Effects Models with control variables (felm() and plm() fit linear models, not a logit)
felogit2 <- felm(mcs01 ~ famstat_4 + partner + partner:gender + factor(occupation) + hhinc | id,
                 data = health)
felogit2.2 <- plm(mcs01 ~ famstat_4 + partner*gender + factor(occupation) + hhinc,
                  model = "within",
                  data = p.health)
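Overview of estimates in a table
# tidy() is applied to the model objects; tidying a bare coefficient vector is not supported in current broom versions
tidy(plogit2)
tidy(felogit2)
tidy(felogit2.2)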
Interpret your findings and the differences between the models.
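As a starting point for the interpretation, the logit coefficients for fam4 can be converted to odds ratios by exponentiating. A worked conversion, using the coefficients reported in the Stata output (rounded to four decimals):

```latex
\begin{aligned}
\text{pool1:}\quad & \exp(0.3588)  \approx 1.43 \\
\text{fe1:}\quad   & \exp(-0.3730) \approx 0.69 \\
\text{fe2:}\quad   & \exp(-0.4471) \approx 0.64
\end{aligned}
```

In the pooled model, divorced observations have about 43% higher odds of low mental health than non-divorced observations. In the FE models, which compare each person only with themselves, the odds are roughly 31% to 36% lower in divorced years than in that person's other years, although the fe1 estimate is not significant at the 5% level (p = 0.058). The sign reversal suggests that the pooled estimate may largely reflect time-constant differences between persons rather than an effect of the divorce itself.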