Exercise 8

Questions

8.1 Does a divorce (famstat==4) causally affect mental health (mcs)? Construct binary variables for the categories of famstat (tab famstat, gen(fam)) and a binary indicator of low mental health (gen mcs01=mcs<40 if mcs !=.) Estimate the effect of a divorce (fam4) on mental health (mcs01) in two different model specifications: Pooled Logit (logit, cluster()) and FE-Logit (xtlogit, fe). Add relevant third variables, if necessary. Summarize the estimates in a table (est tab)

8.2 Interpret your findings and the differences between the models.

Data Prep

Stata

set more off
capt clear
version 14

* Open the healthl file.
use "_data/healthl.dta", clear

* Does a divorce (famstat==4) causally affect mental health (mcs)?

/* Construct binary variables for the categories of famstat (tab famstat, gen(fam))
and a binary indicator of low mental health (gen mcs01=mcs<40 if mcs !=.) */
    
    tab famstat, gen(fam)
    tab occupation, gen(occ)

    gen mcs01 = mcs<40 if !missing(mcs)
    
save "_data/health.dta", replace

. set more off

. capt clear

. version 14

. 
. * Open the healthl file.
. use "_data/healthl.dta", clear
(HEALTH: 09/10/10 10:14:13-634 DB09)

. 
. * Does a divorce (famstat==4) causally affect mental health (mcs)?
. 
. /* Construct binary variables for the categories of famstat (tab famstat, gen
> (fam))
> and a binary indicator of low mental health (gen mcs01=mcs<40 if mcs !=.) */
.         
.         tab famstat, gen(fam)

                     famstat |      Freq.     Percent        Cum.
-----------------------------+-----------------------------------
                 [1] Married |     17,566       65.22       65.22
  [2] Married, But Separated |        529        1.96       67.18
                  [3] Single |      5,101       18.94       86.12
                [4] Divorced |      2,014        7.48       93.60
                 [5] Widowed |      1,725        6.40      100.00
-----------------------------+-----------------------------------
                       Total |     26,935      100.00

.         tab occupation, gen(occ)

   occupation |      Freq.     Percent        Cum.
--------------+-----------------------------------
  Blue-Collar |      4,412       16.38       16.38
 White-Collar |      7,804       28.97       45.35
Self-Employed |      1,721        6.39       51.74
Civil Service |      1,076        3.99       55.74
    Pensioner |      6,939       25.76       81.50
   Unemployed |      1,543        5.73       87.23
  Not working |      3,440       12.77      100.00
--------------+-----------------------------------
        Total |     26,935      100.00

. 
.         gen mcs01 = mcs<40 if !missing(mcs)

.     
. save "_data/health.dta", replace
(note: file _data/health.dta not found)
file _data/health.dta saved

R

Load Data

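#### load packages and dataset ####
library(tidyverse)  # dplyr, tidyr, tibble
library(haven)      # read_dta()
library(plm)        # pdata.frame(), plm()
library(lfe)        # felm()
library(broom)      # tidy()

health_raw <- read_dta("_data/healthl.dta")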

Construct binary variables for the categories of famstat and a binary indicator of low mental health (mcs01 = 1 if mcs<40, otherwise 0)

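# Generate binary variables
health <- health_raw %>% 
      # low mental health indicator: 1 if mcs < 40, 0 otherwise (NA if mcs is missing)
      mutate(mcs01 = ifelse(mcs < 40, 1, 0)) %>% 
      # one binary column per famstat category (e.g. famstat_4):
      # give each row a unique key, count it once, then spread famstat into columns
      tibble::rownames_to_column() %>%
      group_by_all() %>% 
      dplyr::summarize(count = n()) %>% 
      spread(famstat, count, sep = "_", fill = 0) %>% 
      ungroup() %>% 
      dplyr::select(-rowname) %>% 
      # caution: drops every row that has a missing value in any column
      drop_na()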


health %>% 
      dplyr::select(mcs, mcs01, starts_with("famstat")) %>% 
      head()
# transform into panel data frame
p.health <- pdata.frame(health, index = c("id", "year"))
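
The two indicators can also be built directly with mutate(), which avoids the reshaping step. This is a minimal sketch on a small invented data frame (toy and toy_bin are hypothetical names, and the values are made up for illustration); it assumes famstat is stored with the numeric codes 1 to 5, as in the Stata file.

```r
library(dplyr)

# Hypothetical toy data standing in for health_raw (values invented)
toy <- tibble::tibble(
  id      = c(1, 1, 2, 2, 3),
  famstat = c(1, 4, 3, 3, 4),
  mcs     = c(52.1, 38.4, NA, 45.0, 39.9)
)

toy_bin <- toy %>%
  mutate(
    # 1 if mcs < 40, 0 otherwise, NA where mcs is missing
    mcs01     = as.integer(mcs < 40),
    # 1 if divorced (famstat == 4), 0 otherwise
    famstat_4 = as.integer(famstat == 4)
  )

toy_bin
```

Unlike drop_na(), this keeps rows with missing values, so any listwise deletion is left to the model functions.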

Answers

8.1

Does a divorce (famstat==4) causally affect mental health (mcs)? Construct binary variables for the categories of famstat (tab famstat, gen(fam)) and a binary indicator of low mental health (gen mcs01=mcs<40 if mcs !=.)

  • Estimate the effect of a divorce (fam4) on mental health (mcs01) in two different model specifications:
    • Pooled Logit (logit, cluster()) and
    • FE-Logit (xtlogit, fe).
  • Add relevant third variables, if necessary.
  • Summarize the estimates in a table (est tab)

Stata

use "_data/health.dta", clear

/* Estimate the effect of a divorce (fam4) on mental health (mcs01) in two different model specifications: Pooled Logit (logit, cluster()) and FE-Logit (xtlogit, fe).*/
    
* Pooled Logit
logit mcs01 fam4, cluster(id)
est sto pool1
    
* Fixed-Effects Logit 
xtset id year
xtlogit mcs01 fam4, fe
est sto fe1
    
* Add relevant third variables, if necessary.

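* Pooled Logit
* Note: as run here, pool2 repeats the pool1 specification (no third variables added).
* A version with the same controls as fe2 would be, for example:
*   logit mcs01 fam4 partner##i.gender i.occupation hhinc, cluster(id)
logit mcs01 fam4, cluster(id)
est sto pool2

* Fixed-Effects Logit 
xtlogit mcs01 fam4 partner##i.gender i.occupation hhinc, fe
est sto fe2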
    
* Summarize the estimates in a table (est tab)

est tab pool1 pool2 fe1 fe2

. use "_data/health.dta", clear
(HEALTH: 09/10/10 10:14:13-634 DB09)

. 
. /* Estimate the effect of a divorce (fam4) on mental health (mcs01) in two di
> fferent model specifications: Pooled Logit (logit, cluster()) and FE-Logit (x
> tlogit, fe).*/
.     
. * Pooled Logit
. logit mcs01 fam4, cluster(id)

Iteration 0:   log pseudolikelihood = -11798.301  
Iteration 1:   log pseudolikelihood = -11780.267  
Iteration 2:   log pseudolikelihood = -11780.028  
Iteration 3:   log pseudolikelihood = -11780.028  

Logistic regression                             Number of obs     =     26,935
                                                Wald chi2(1)      =      22.28
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -11780.028               Pseudo R2         =     0.0015

                                 (Std. Err. adjusted for 8,122 clusters in id)
------------------------------------------------------------------------------
             |               Robust
       mcs01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fam4 |   .3587721   .0760098     4.72   0.000     .2097957    .5077485
       _cons |   -1.69553   .0237543   -71.38   0.000    -1.742088   -1.648972
------------------------------------------------------------------------------

. est sto pool1

.         
. * Fixed-Logit 
. xtset id year
       panel variable:  id (unbalanced)
        time variable:  year, 2002 to 2008, but with gaps
                delta:  1 unit

. xtlogit mcs01 fam4, fe
note: multiple positive outcomes within groups encountered.
note: 5,937 groups (19,026 obs) dropped because of all positive or
      all negative outcomes.

Iteration 0:   log likelihood = -2924.6065  
Iteration 1:   log likelihood = -2921.7723  
Iteration 2:   log likelihood = -2921.7723  

Conditional fixed-effects logistic regression   Number of obs     =      7,909
Group variable: id                              Number of groups  =      2,185

                                                Obs per group:
                                                              min =          2
                                                              avg =        3.6
                                                              max =          4

                                                LR chi2(1)        =       3.63
Log likelihood  = -2921.7723                    Prob > chi2       =     0.0569

------------------------------------------------------------------------------
       mcs01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fam4 |  -.3729736   .1967566    -1.90   0.058    -.7586095    .0126623
------------------------------------------------------------------------------

. est sto fe1

.         
. * Add relevant third variables, if necessary.
. 
. * Pooled Logit
. logit mcs01 fam4, cluster(id)

Iteration 0:   log pseudolikelihood = -11798.301  
Iteration 1:   log pseudolikelihood = -11780.267  
Iteration 2:   log pseudolikelihood = -11780.028  
Iteration 3:   log pseudolikelihood = -11780.028  

Logistic regression                             Number of obs     =     26,935
                                                Wald chi2(1)      =      22.28
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -11780.028               Pseudo R2         =     0.0015

                                 (Std. Err. adjusted for 8,122 clusters in id)
------------------------------------------------------------------------------
             |               Robust
       mcs01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fam4 |   .3587721   .0760098     4.72   0.000     .2097957    .5077485
       _cons |   -1.69553   .0237543   -71.38   0.000    -1.742088   -1.648972
------------------------------------------------------------------------------

. est sto pool2

. 
. * Fixed-Logit 
. xtlogit mcs01 fam4 partner##i.gender i.occupation hhinc, fe
note: multiple positive outcomes within groups encountered.
note: 5,937 groups (19,026 obs) dropped because of all positive or
      all negative outcomes.
note: 1.gender omitted because of no within-group variance.

Iteration 0:   log likelihood = -2907.5045  
Iteration 1:   log likelihood = -2892.8111  
Iteration 2:   log likelihood = -2892.7887  
Iteration 3:   log likelihood = -2892.7887  

Conditional fixed-effects logistic regression   Number of obs     =      7,909
Group variable: id                              Number of groups  =      2,185

                                                Obs per group:
                                                              min =          2
                                                              avg =        3.6
                                                              max =          4

                                                LR chi2(10)       =      61.59
Log likelihood  = -2892.7887                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
       mcs01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fam4 |  -.4471369   .2027485    -2.21   0.027    -.8445167   -.0497571
   1.partner |  -.8675788   .1879527    -4.62   0.000    -1.235959   -.4991983
             |
      gender |
     female  |          0  (omitted)
             |
     partner#|
      gender |
   1#female  |   .4970283   .2373957     2.09   0.036     .0317413    .9623153
             |
  occupation |
White-Col~r  |   .1982613   .1417361     1.40   0.162    -.0795363     .476059
Self-Empl~d  |   .6475259   .2176755     2.97   0.003     .2208897    1.074162
Civil Ser..  |   .4108767   .3842811     1.07   0.285    -.3423004    1.164054
  Pensioner  |    .204351   .1835927     1.11   0.266     -.155484    .5641861
 Unemployed  |   .4446522   .1495026     2.97   0.003     .1516325    .7376719
Not working  |   .2796007   .1545024     1.81   0.070    -.0232184    .5824198
             |
       hhinc |  -.0001125   .0000338    -3.33   0.001    -.0001789   -.0000462
------------------------------------------------------------------------------

. est sto fe2

.         
. * Summarize the estimates in a table (est tab)
. 
. est tab pool1 pool2 fe1 fe2

------------------------------------------------------------------
    Variable |   pool1        pool2         fe1          fe2      
-------------+----------------------------------------------------
        fam4 |  .35877208    .35877208   -.37297361   -.44713692  
             |
     partner |
          1  |                                         -.8675788  
             |
      gender |
     female  |                                         (omitted)  
             |
     partner#|
      gender |
   1#female  |                                         .49702833  
             |
  occupation |
White-Col~r  |                                         .19826133  
Self-Empl~d  |                                         .64752592  
Civil Ser..  |                                         .41087672  
  Pensioner  |                                         .20435103  
 Unemployed  |                                         .44465219  
Not working  |                                          .2796007  
             |
       hhinc |                                        -.00011253  
       _cons | -1.6955301   -1.6955301                            
------------------------------------------------------------------

R

Models

Note: the pglm package fails for the within (fixed-effects) model, see https://stats.stackexchange.com/questions/146434/why-pglm-fails-for-within-model. The fixed-effects models below use felm() and plm() instead.

Pooled Logit Model

# Pooled Logit Model
plogit <- glm(mcs01 ~  famstat_4, family = binomial(), data = health)


tidy(plogit)
summary(plogit)
## 
## Call:
## glm(formula = mcs01 ~ famstat_4, family = binomial(), data = health)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -0.671  -0.574  -0.574  -0.574   1.941  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.7195     0.0185  -93.16  < 2e-16 ***
## famstat_4     0.3419     0.0607    5.64  1.7e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 21311  on 24640  degrees of freedom
## Residual deviance: 21281  on 24639  degrees of freedom
## AIC: 21285
## 
## Number of Fisher Scoring iterations: 4
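
The Stata pooled logit uses cluster(id), whereas glm() reports conventional standard errors. This is one reason the standard errors differ between the two outputs (the estimation samples also differ in size). Cluster-robust standard errors can be obtained in R along the following lines. This is a minimal sketch that assumes the sandwich and lmtest packages are installed; the data are simulated and all names (toy, x, y) are hypothetical.

```r
library(sandwich)
library(lmtest)

# Simulated toy panel: 100 persons with 3 observations each (values invented)
set.seed(2)
toy <- data.frame(id = rep(1:100, each = 3))
toy$x <- rbinom(nrow(toy), 1, 0.3)
# person-specific intercepts induce correlation within id
u <- rep(rnorm(100), each = 3)
toy$y <- rbinom(nrow(toy), 1, plogis(-1 + 0.4 * toy$x + u))

# Pooled logit with conventional standard errors
fit <- glm(y ~ x, family = binomial(), data = toy)

# Same coefficients, standard errors clustered on id
robust <- coeftest(fit, vcov. = vcovCL(fit, cluster = ~id))
robust
```

Applied to the exercise data, the analogous call would pass plogit and cluster = ~id.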

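Fixed Effects Model

Note: felm() fits a linear model, so this is a linear probability model with individual fixed effects rather than a conditional (fixed-effects) logit. Its coefficient is a change in the probability of mcs01 = 1 and is not on the log-odds scale of Stata's xtlogit, fe.

# Fixed effects linear probability model (felm() is linear, not a logit)
felogit <- felm(mcs01 ~ famstat_4 | id , data = health)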

tidy(felogit)
summary(felogit)
## 
## Call:
##    felm(formula = mcs01 ~ famstat_4 | id, data = health) 
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.788  0.000  0.000  0.000  0.775 
## 
## Coefficients:
##           Estimate Std. Error t value Pr(>|t|)  
## famstat_4  -0.0505     0.0206   -2.45    0.014 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.29 on 17472 degrees of freedom
## Multiple R-squared(full model): 0.534   Adjusted R-squared: 0.343 
## Multiple R-squared(proj model): 0.000343   Adjusted R-squared: -0.41 
## F-statistic(full model):2.79 on 7168 and 17472 DF, p-value: <2e-16 
## F-statistic(proj model): 5.99 on 1 and 17472 DF, p-value: 0.0144
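
A conditional (fixed-effects) logit comparable to Stata's xtlogit, fe can be estimated in R with clogit() from the survival package, using the person identifier as the stratum. The following is a minimal sketch on simulated data, not the exercise data; the names toy, x, and y and all parameter values are invented for illustration.

```r
library(survival)

# Simulated toy panel: 200 persons with 4 observations each (values invented)
set.seed(1)
n_id <- 200
n_t  <- 4
toy <- data.frame(id = rep(seq_len(n_id), each = n_t))

# Person-specific intercepts that are correlated with the regressor
alpha <- rep(rnorm(n_id), each = n_t)
toy$x <- rbinom(nrow(toy), 1, plogis(alpha))
toy$y <- rbinom(nrow(toy), 1, plogis(alpha - 0.5 * toy$x))

# Conditional logit: strata(id) conditions out the person-specific intercepts.
# Persons whose outcome never changes contribute nothing to the likelihood.
fit <- clogit(y ~ x + strata(id), data = toy)
coef(fit)
```

For the exercise data, the analogous call would be clogit(mcs01 ~ famstat_4 + strata(id), data = health), which returns a coefficient on the log-odds scale.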

Pooled Logit and Fixed Effects Models II (with controls)

# Pooled logit model with controls
plogit2 <- glm(mcs01 ~  famstat_4 + partner*gender + factor(occupation)+ hhinc, 
               family = binomial, 
               data = health)

# Fixed effects linear probability model with controls (felm() is linear, not a logit)
felogit2 <- felm(mcs01 ~ famstat_4 + partner + partner:gender + factor(occupation) + hhinc | id , 
                   data = health)

# Same within (fixed effects) linear model, estimated with plm
felogit2.2 <- plm(mcs01 ~  famstat_4 + partner*gender + factor(occupation) + hhinc, 
                model = "within", 
                data = p.health)

Overview of estimates in a table

# tidy() expects the model object, not the coefficient vector
tidy(plogit2)
tidy(felogit2)
tidy(felogit2.2)

8.2

Interpret your findings and the differences between the models.
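
As a starting point for the interpretation, the logit coefficients on fam4 from the Stata table in 8.1 can be converted to odds ratios by exponentiating. The values below are rounded from the reported estimates:

```latex
\begin{align*}
\mathrm{OR}_{\text{pool1}} &= e^{0.3588} \approx 1.43 \\
\mathrm{OR}_{\text{fe1}}   &= e^{-0.3730} \approx 0.69 \\
\mathrm{OR}_{\text{fe2}}   &= e^{-0.4471} \approx 0.64
\end{align*}
```

Read this way, the pooled logit says that divorced person-years have about 43% higher odds of low mental health than non-divorced person-years. The fixed-effects logit uses only within-person change among the 2,185 persons whose mcs01 varies over time, and there the odds of low mental health are lower in divorced years (about 31% lower without controls, p = 0.058, and about 36% lower with controls, p = 0.027). The change in sign suggests that the pooled association reflects, at least in part, time-constant differences between persons who divorce and persons who do not. Whether the fixed-effects estimate can be read causally still depends on the absence of time-varying confounders.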