Exercise 6

Questions

6.1 Using SOEP data: Specify the fixed effects model of the Mincer equation and compare your results to the pooled OLS specification.

Data Prep

Stata

set more off
capt clear
version 14

use "_data/ex_mydf.dta", clear

. set more off

. capt clear

. version 14

. 
. use "_data/ex_mydf.dta", clear
(PGEN: Feb 12, 2017 13:00:53-1 DBV32L)

R

Load Data

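#### load packages ####
library(dplyr)   # %>%, filter(), mutate(), case_when(), select()
library(plm)     # pdata.frame(), plm(), plmtest()
library(broom)   # tidy()
library(lfe)     # felm()

#### load dataset ####
ex_mydf <- readRDS(file = "_data/ex_mydf.rds")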

asample <- ex_mydf %>% 
      filter(
            # Working Hours
            pgtatzeit >= 6,
            # age
            alter %>% dplyr::between(18, 65),
            # Employment Status
            pgemplst %in% c(1,2,4),   # full-time(1), part-time(2), marg.-empl(4)
            # population status
            pop < 3 # only private households
            ) %>%
      # filter implausible cases
      mutate(na = case_when(
              pid %in% c(1380202, 1380202, 607602, 2555301) ~ 1,
              pid == 8267202 & syear == 2007 ~ 1,
              pid == 2633801 & syear == 2006 ~ 1,
              pid == 2582901 & syear > 2006 ~ 1 )
             ) %>% 
      filter(is.na(na), 
             syear > 2002) %>% 
      select(pid, syear, lnwage, pgbilzeit, erf, ost, frau, pgexpue, pgallbet, phrf )

## Sample For Analysis for 2015
asample15 <- asample %>% filter(syear == 2015)

# panel data frame
p.asample <- pdata.frame(asample, index = c("pid", "syear"))
p.asample15 <- pdata.frame(asample15, index = c("pid", "syear"))

Answers

6.1

Using SOEP data: Specify the fixed effects model of the Mincer equation and compare your results to the pooled OLS specification.

Note: Because the dataset is large and fixed effects models take a long time to compute, the sample for this exercise is restricted to survey years after 2002 (syear > 2002, i.e. 2003 onward).
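As a reading aid, the two specifications compared in this exercise can be written out as follows. This is a sketch that mirrors the variable names in the code below; the interpretation of the variables (pgbilzeit as years of education, erf as work experience, pgexpue as unemployment experience, pgallbet as firm size) is an assumption based on their names and labels, and the vector notation x_it is introduced here only for brevity.

```latex
\begin{align}
% Pooled OLS: one common intercept, no person-specific effect
\ln w_{it} &= \beta_0 + \beta_1\,\mathrm{pgbilzeit}_{it} + \beta_2\,\mathrm{erf}_{it}
            + \beta_3\,\mathrm{erf}_{it}^2 + \beta_4\,\mathrm{pgexpue}_{it}
            + \beta_5\,\mathrm{ost}_{it} + \beta_6\,\mathrm{frau}_{i}
            + \sum_k \gamma_k\,\mathbf{1}[\mathrm{pgallbet}_{it} = k] + \varepsilon_{it} \\
% Fixed effects: person effect alpha_i, plus year effects lambda_t when i.syear is included
\ln w_{it} &= \alpha_i + \lambda_t + x_{it}'\beta + \varepsilon_{it} \\
% Within transformation: subtracting person means removes alpha_i and every time-constant regressor
\ln w_{it} - \overline{\ln w}_{i} &= (\lambda_t - \bar{\lambda}_i) + (x_{it} - \bar{x}_{i})'\beta
            + (\varepsilon_{it} - \bar{\varepsilon}_{i})
\end{align}
```

Because frau does not vary within a person, its demeaned value is zero in every row, which is why Stata reports it as omitted in the fixed effects output.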

The coefficients of the fixed effects regression are identical to those of the OLS dummy-variable (LSDV) regression.

  • The fixed effects regression shows no separate effect of the intercept or of sex on life satisfaction (Lebenszufriedenheit), because time-constant variables drop out.
  • The coefficient on sex therefore cannot be interpreted in a fixed effects regression.

The coefficients of the standard pooled OLS regression with intercept are all strongly significant. Compared with the fixed effects regression, the effects differ in size (gesund_org) and even in sign (anz_kind, bildung). This is partly because the pooled model includes an intercept and sex.

By ignoring the clustering in persnr, pooled OLS only estimates a pooled statistical association. The fixed effects panel regression instead evaluates changes within persnr, so we can speak of (causal) effects induced by changes in our chosen covariates.

  1. gesund_org is strongly significant. An increase of 1 in gesund_org (coded {1,2,3,4,5} in the dataset) leads to an average increase of 0.45 in lebensz_org.

  2. anz_kind is significant with a p-value of 0.015. One additional child (coded {1,2,3} in this dataset) leads to an average increase of 0.13 in lebensz_org.
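The equivalence between the fixed effects (within) estimator and the OLS dummy regression, and the fact that time-constant variables drop out, can be checked on a toy panel. The sketch below uses simulated data (hypothetical, not the SOEP) and base R only; all object names are made up for illustration.

```r
# Simulated toy panel (hypothetical data): 50 persons observed in 6 waves
set.seed(1)
n_id <- 50
n_t  <- 6
id   <- rep(seq_len(n_id), each = n_t)
u    <- rep(rnorm(n_id), each = n_t)            # unobserved person effect
frau <- rep(rbinom(n_id, 1, 0.5), each = n_t)   # time-constant regressor
x    <- 0.5 * u + rnorm(n_id * n_t)             # regressor correlated with u
y    <- 1 + 0.3 * x - 0.2 * frau + u + rnorm(n_id * n_t, sd = 0.5)

# (a) LSDV: OLS with one dummy per person; frau is collinear with the dummies
lsdv <- lm(y ~ x + factor(id) + frau)

# (b) within estimator: subtract person means, then OLS without intercept
demean <- function(v) v - ave(v, id)
within <- lm(demean(y) ~ demean(x) + demean(frau) - 1)

# (c) pooled OLS ignores the person effect
pooled <- lm(y ~ x + frau)

b_lsdv   <- unname(coef(lsdv)["x"])
b_within <- unname(coef(within)["demean(x)"])
b_pooled <- unname(coef(pooled)["x"])

print(c(lsdv = b_lsdv, within = b_within, pooled = b_pooled))
print(coef(lsdv)["frau"])            # NA: not identified next to person dummies
print(coef(within)["demean(frau)"])  # NA: the demeaned column is all zeros
```

The LSDV and within slopes agree up to numerical precision, while the pooled slope differs because x is correlated with the person effect by construction.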

Stata

  • rho := intraclass correlation := sigma_u² / (sigma_u² + sigma_e²)
  • a measure of how similar the observations within a cluster are
  • if rho = 1, all variation is due to the between level;
  • the observations within a cluster are then identical.
  • fe does not support time-varying weights -> workaround via areg, absorb()
  • FE estimates the ATET -> limited generalizability
  • two-way fixed effects
    • in addition to individual-specific unobserved effects, period-specific effects are now also controlled for
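As a worked check of the rho formula, the values of sigma_u and sigma_e reported in the first xtreg, fe output further down can be plugged in. The numbers are copied from that log; only the rounding is mine.

```latex
\begin{align}
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
      = \frac{0.48874503^2}{0.48874503^2 + 0.28535324^2} \\
     &\approx \frac{0.23887}{0.23887 + 0.08143} \approx 0.7458
\end{align}
```

This agrees with the rho of .74577917 printed by Stata: roughly three quarters of the variance not explained by the regressors is attributed to the person-specific effects u_i.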
use "_data/ex_mydf.dta", clear

* POLS-Model
reg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet if asample==1 & syear>2002
reg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet if asample==1 & syear>2002 [pw = phrf]

* FE-Model
xtset pid syear
xtreg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet i.syear if asample==1 & syear>2002,  fe cluster(pid)

areg  lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet i.syear  if asample==1 & syear>2002,  absorb(pid) cluster(pid)

* two-way-fixed-effect
xtreg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet i.syear  if asample==1 & syear>2002 & pgbilztch==0,  fe

. use "_data/ex_mydf.dta", clear
(PGEN: Feb 12, 2017 13:00:53-1 DBV32L)

. 
. * POLS-Model
. reg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet if asample==1 &
>  syear>2002

      Source |       SS           df       MS      Number of obs   =   156,029
-------------+----------------------------------   F(10, 156018)   =  10057.25
       Model |  21926.0046        10  2192.60046   Prob > F        =    0.0000
    Residual |   34013.772   156,018  .218011845   R-squared       =    0.3920
-------------+----------------------------------   Adj R-squared   =    0.3919
       Total |  55939.7766   156,028  .358523961   Root MSE        =    .46692

------------------------------------------------------------------------------
      lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   pgbilzeit |   .0802162   .0004426   181.26   0.000     .0793488    .0810836
         erf |   .0385538   .0003902    98.82   0.000     .0377891    .0393185
             |
 c.erf#c.erf |  -.0006474   9.60e-06   -67.46   0.000    -.0006662   -.0006286
             |
     pgexpue |  -.0530502   .0007337   -72.30   0.000    -.0544883   -.0516121
         ost |  -.2861231   .0029417   -97.26   0.000    -.2918888   -.2803574
        frau |  -.1677748   .0024721   -67.87   0.000      -.17262   -.1629296
             |
    pgallbet |
[2] GE 20..  |   .1263102   .0032532    38.83   0.000     .1199339    .1326864
[3] GE 20..  |   .2368372   .0035369    66.96   0.000      .229905    .2437694
[4] GE 2000  |   .2984117    .003429    87.03   0.000     .2916909    .3051325
[5] Selbs..  |   -.124032   .0065447   -18.95   0.000    -.1368594   -.1112046
             |
       _cons |   1.264246   .0070926   178.25   0.000     1.250345    1.278148
------------------------------------------------------------------------------

. reg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet if asample==1 &
>  syear>2002 [pw = phrf]
(sum of wgt is   4.0554e+08)

Linear regression                               Number of obs     =    153,014
                                                F(10, 153003)     =    3459.25
                                                Prob > F          =     0.0000
                                                R-squared         =     0.3639
                                                Root MSE          =     .46091

------------------------------------------------------------------------------
             |               Robust
      lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   pgbilzeit |   .0744084   .0007376   100.88   0.000     .0729628     .075854
         erf |   .0389242   .0006379    61.02   0.000      .037674    .0401744
             |
 c.erf#c.erf |  -.0006587    .000015   -43.99   0.000     -.000688   -.0006293
             |
     pgexpue |  -.0538709   .0016444   -32.76   0.000    -.0570938   -.0506479
         ost |  -.2862332   .0047512   -60.24   0.000    -.2955455    -.276921
        frau |  -.1700573   .0037989   -44.76   0.000    -.1775032   -.1626115
             |
    pgallbet |
[2] GE 20..  |   .1254696   .0052313    23.98   0.000     .1152164    .1357228
[3] GE 20..  |   .2324542    .005442    42.71   0.000      .221788    .2431204
[4] GE 2000  |   .2962349   .0052689    56.22   0.000     .2859081    .3065618
[5] Selbs..  |  -.1131356   .0163553    -6.92   0.000    -.1451916   -.0810796
             |
       _cons |   1.322194   .0118697   111.39   0.000      1.29893    1.345459
------------------------------------------------------------------------------

. 
. * FE-Model
. xtset pid syear
       panel variable:  pid (unbalanced)
        time variable:  syear, 1984 to 2015, but with gaps
                delta:  1 unit

. xtreg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet i.syear if as
> ample==1 & syear>2002,  fe cluster(pid)
note: frau omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =    156,029
Group variable: pid                             Number of groups  =     33,894

R-sq:                                           Obs per group:
     within  = 0.0401                                         min =          1
     between = 0.3541                                         avg =        4.6
     overall = 0.3156                                         max =         13

                                                F(21,33893)       =     123.40
corr(u_i, Xb)  = 0.1227                         Prob > F          =     0.0000

                               (Std. Err. adjusted for 33,894 clusters in pid)
------------------------------------------------------------------------------
             |               Robust
      lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   pgbilzeit |   .0529867   .0046515    11.39   0.000     .0438696    .0621039
         erf |    .045247    .002356    19.21   0.000     .0406292    .0498648
             |
 c.erf#c.erf |  -.0006661   .0000262   -25.43   0.000    -.0007174   -.0006147
             |
     pgexpue |  -.0332011   .0078562    -4.23   0.000    -.0485995   -.0178027
         ost |  -.1209915   .0243707    -4.96   0.000    -.1687589   -.0732241
        frau |          0  (omitted)
             |
    pgallbet |
[2] GE 20..  |   .0495245   .0058391     8.48   0.000     .0380796    .0609693
[3] GE 20..  |   .0771106   .0067836    11.37   0.000     .0638146    .0904067
[4] GE 2000  |   .0896249    .007137    12.56   0.000     .0756362    .1036137
[5] Selbs..  |  -.0717673   .0153061    -4.69   0.000    -.1017677   -.0417669
             |
       syear |
       2004  |  -.0172732   .0042352    -4.08   0.000    -.0255744    -.008972
       2005  |  -.0460263   .0053915    -8.54   0.000    -.0565939   -.0354587
       2006  |  -.0833928   .0067448   -12.36   0.000    -.0966128   -.0701729
       2007  |  -.1048617   .0080679   -13.00   0.000     -.120675   -.0890483
       2008  |  -.1178463   .0095065   -12.40   0.000    -.1364794   -.0992132
       2009  |   -.109741   .0111915    -9.81   0.000    -.1316767   -.0878053
       2010  |  -.1125573   .0126257    -8.91   0.000    -.1373041   -.0878104
       2011  |  -.1263725   .0142115    -8.89   0.000    -.1542276   -.0985174
       2012  |  -.1278882    .015645    -8.17   0.000    -.1585529   -.0972234
       2013  |  -.1120904    .016911    -6.63   0.000    -.1452365   -.0789443
       2014  |  -.0969339   .0182056    -5.32   0.000    -.1326174   -.0612503
       2015  |  -.0739467   .0197261    -3.75   0.000    -.1126105    -.035283
             |
       _cons |   1.564629   .0666481    23.48   0.000     1.433996    1.695261
-------------+----------------------------------------------------------------
     sigma_u |  .48874503
     sigma_e |  .28535324
         rho |  .74577917   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. 
. areg  lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet i.syear  if a
> sample==1 & syear>2002,  absorb(pid) cluster(pid)
note: frau omitted because of collinearity

Linear regression, absorbing indicators         Number of obs     =    156,029
                                                F(  21,  33893)   =      96.59
                                                Prob > F          =     0.0000
                                                R-squared         =     0.8222
                                                Adj R-squared     =     0.7729
                                                Root MSE          =     0.2854

                               (Std. Err. adjusted for 33,894 clusters in pid)
------------------------------------------------------------------------------
             |               Robust
      lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   pgbilzeit |   .0529867   .0052576    10.08   0.000     .0426817    .0632917
         erf |    .045247   .0026629    16.99   0.000     .0400276    .0504664
             |
 c.erf#c.erf |  -.0006661   .0000296   -22.50   0.000    -.0007241    -.000608
             |
     pgexpue |  -.0332011   .0088798    -3.74   0.000    -.0506057   -.0157965
         ost |  -.1209915   .0275459    -4.39   0.000    -.1749824   -.0670006
        frau |          0  (omitted)
             |
    pgallbet |
[2] GE 20..  |   .0495245   .0065999     7.50   0.000     .0365885    .0624605
[3] GE 20..  |   .0771106   .0076674    10.06   0.000     .0620823     .092139
[4] GE 2000  |   .0896249   .0080669    11.11   0.000     .0738136    .1054363
[5] Selbs..  |  -.0717673   .0173003    -4.15   0.000    -.1056764   -.0378582
             |
       syear |
       2004  |  -.0172732    .004787    -3.61   0.000    -.0266559   -.0078905
       2005  |  -.0460263    .006094    -7.55   0.000    -.0579707   -.0340819
       2006  |  -.0833928   .0076235   -10.94   0.000    -.0983352   -.0684505
       2007  |  -.1048617    .009119   -11.50   0.000    -.1227353   -.0869881
       2008  |  -.1178463   .0107451   -10.97   0.000    -.1389071   -.0967855
       2009  |   -.109741   .0126496    -8.68   0.000    -.1345347   -.0849473
       2010  |  -.1125573   .0142707    -7.89   0.000    -.1405284   -.0845862
       2011  |  -.1263725   .0160631    -7.87   0.000    -.1578568   -.0948882
       2012  |  -.1278882   .0176834    -7.23   0.000    -.1625482   -.0932282
       2013  |  -.1120904   .0191143    -5.86   0.000     -.149555   -.0746258
       2014  |  -.0969339   .0205776    -4.71   0.000    -.1372666   -.0566011
       2015  |  -.0739467   .0222961    -3.32   0.001    -.1176479   -.0302455
             |
       _cons |   1.564629   .0753316    20.77   0.000     1.416976    1.712281
-------------+----------------------------------------------------------------
         pid |   absorbed                                   (33894 categories)

. 
. * two-way-fixed-effect
. xtreg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet i.syear  if a
> sample==1 & syear>2002 & pgbilztch==0,  fe
note: frau omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =    137,206
Group variable: pid                             Number of groups  =     29,468

R-sq:                                           Obs per group:
     within  = 0.0424                                         min =          1
     between = 0.3784                                         avg =        4.7
     overall = 0.3328                                         max =         13

                                                F(21,107717)      =     227.32
corr(u_i, Xb)  = 0.0057                         Prob > F          =     0.0000

------------------------------------------------------------------------------
      lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   pgbilzeit |   .0842833   .0031764    26.53   0.000     .0780576     .090509
         erf |   .0450217   .0015773    28.54   0.000     .0419301    .0481132
             |
 c.erf#c.erf |  -.0006308   .0000188   -33.54   0.000    -.0006676   -.0005939
             |
     pgexpue |  -.0346334    .003403   -10.18   0.000    -.0413032   -.0279637
         ost |  -.1282438   .0146121    -8.78   0.000    -.1568832   -.0996043
        frau |          0  (omitted)
             |
    pgallbet |
[2] GE 20..  |     .05027   .0040706    12.35   0.000     .0422917    .0582483
[3] GE 20..  |   .0780984   .0047546    16.43   0.000     .0687794    .0874174
[4] GE 2000  |   .0889808   .0049328    18.04   0.000     .0793126    .0986491
[5] Selbs..  |  -.0771079   .0073701   -10.46   0.000    -.0915532   -.0626627
             |
       syear |
       2004  |  -.0173967   .0042274    -4.12   0.000    -.0256823    -.009111
       2005  |   -.048454   .0047254   -10.25   0.000    -.0577156   -.0391923
       2006  |  -.0860305   .0053577   -16.06   0.000    -.0965315   -.0755294
       2007  |  -.1113415   .0060158   -18.51   0.000    -.1231324   -.0995505
       2008  |  -.1251727   .0068083   -18.39   0.000    -.1385167   -.1118286
       2009  |  -.1195701   .0076648   -15.60   0.000    -.1345929   -.1045473
       2010  |  -.1225387    .008599   -14.25   0.000    -.1393925   -.1056849
       2011  |  -.1368962   .0093792   -14.60   0.000    -.1552793   -.1185132
       2012  |  -.1426543   .0102839   -13.87   0.000    -.1628107   -.1224979
       2013  |  -.1334428   .0110012   -12.13   0.000     -.155005   -.1118806
       2014  |  -.1116799   .0118414    -9.43   0.000     -.134889   -.0884709
       2015  |  -.0930661   .0128066    -7.27   0.000    -.1181668   -.0679654
             |
       _cons |   1.166358   .0445909    26.16   0.000     1.078961    1.253756
-------------+----------------------------------------------------------------
     sigma_u |  .47311258
     sigma_e |  .28070512
         rho |   .7396321   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(29467, 107717) = 9.19               Prob > F = 0.0000

. 

R

Model

Note 1: Difficulty with FE models in R. Computing fixed effects models with many individuals has proven to be very time-consuming in R and needs a lot of computing power. Most packages for fixed effects models are written to handle panel data on a few countries and are not well suited to panel data with several thousand individuals, such as the SOEP data. If you know of packages that are suited to population panel data, please let me know on GitHub or Twitter @_asilisa_ :).

Note 2: So far there is also the problem that the p-values for FE models do not seem to be calculated correctly with the plm package.

Pooled Model

# POLS- and FE-model ------------------------------------------------------
pols <- plm(lnwage ~ pgbilzeit + erf + I(erf^2) + pgexpue + ost + frau + factor(pgallbet),
                        data = p.asample, 
                        weights = phrf,
                        model = "pooling") 

# other way of writing
# pols <- lm(lnwage ~  pgbilzeit + erf + I(erf^2) + ost + frau, 
#            weights = phrf,
#            data = asample)

tidy(pols)

Fixed effect model

fe  <- plm(lnwage ~ pgbilzeit + erf + I(erf^2) + pgexpue + ost + frau + pgallbet, 
            data = p.asample,
            weights = phrf,
            model = "within")

fe
## 
## Model Formula: lnwage ~ pgbilzeit + erf + I(erf^2) + pgexpue + ost + frau + 
##     pgallbet
## 
## Coefficients:
## pgbilzeit       erf  I(erf^2)   pgexpue       ost  pgallbet 
##  0.068041  0.073147 -0.000722 -0.010591 -0.139414  0.008868
# t.fe   <- tidy(fe) # takes too long
# display results
      # smy_fe <- summary(fe) # takes too long
      # summary(fe, robust = T) # like the robust function in STATA

# just show general coefficients and no dummies
      # smy_fe$coef[1:5,] # takes too long

Fixed effects like areg in Stata, using felm() from the lfe package. You can find more information on different models in R and Stata at this awesome website

# fixed effects like areg in STATA
fe_areg <- felm(lnwage ~ pgbilzeit + erf + I(erf^2) + ost 
                         | pid | 0 | pid, data = asample, weights = asample$phrf)

fe_areg      
## pgbilzeit       erf  I(erf^2)       ost 
##  0.066675  0.072787 -0.000713 -0.140869
# Stata: areg y x1 [w=x3], a(id1) cl(id1)
# R:     felm(y ~ x1 | id1 | 0 | id1, data = df, weights = df$x3)
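
In the Stata log above, xtreg, fe and areg return identical coefficients but different clustered standard errors. The sketch below checks whether that gap is consistent with a pure degrees-of-freedom adjustment. The two small-sample factors used here are an assumption on my part (areg counting the absorbed pid dummies as estimated parameters, xtreg, fe not), not something stated in this document; all numbers are copied from the log.

```r
# Values copied from the Stata log above
n_obs    <- 156029   # Number of obs
n_groups <- 33894    # Number of groups / absorbed pid categories
k_slopes <- 21       # numerator df of the reported F statistic

# Clustered standard errors for three coefficients in both outputs
se_xtreg <- c(pgbilzeit = 0.0046515, erf = 0.0023560, ost = 0.0243707)
se_areg  <- c(pgbilzeit = 0.0052576, erf = 0.0026629, ost = 0.0275459)
observed_ratio <- se_areg / se_xtreg

# Assumed residual degrees of freedom (the G/(G-1) factor cancels in the ratio):
#   xtreg, fe : N - (slopes + constant)
#   areg      : N - (slopes + absorbed categories, which include the constant)
df_xtreg <- n_obs - (k_slopes + 1)
df_areg  <- n_obs - (k_slopes + n_groups)
implied_ratio <- sqrt(df_xtreg / df_areg)

print(round(observed_ratio, 4))
print(round(implied_ratio, 4))
```

If the observed and implied ratios match, the two commands differ only in how they correct for the number of estimated parameters, not in the estimator itself.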
Output

Output as a nice table (takes too long to render at the moment)

# stargazer(fe, pols, title="Results", align=TRUE) # takes too long
Tests
# Pooled Model
# LM test for random effects versus OLS
plmtest(pols) 
## 
##  Lagrange Multiplier Test - (Honda) for unbalanced panels
## 
## data:  lnwage ~ pgbilzeit + erf + I(erf^2) + ost + frau
## normal = 500, p-value <2e-16
## alternative hypothesis: significant effects
# FE Model
# coeftest(fe, vcov.=vcovHC(fe,type="HC1")) # takes too long

More information on fixed effects / LSDV regression: https://stats.stackexchange.com/questions/41916/within-model-with-plm-package