6.1 Using SOEP data: Specify the fixed effects model of the Mincer equation and compare your results to the pooled OLS specification.
set more off
capt clear
version 14
use "_data/ex_mydf.dta", clear
. set more off
. capt clear
. version 14
.
. use "_data/ex_mydf.dta", clear
(PGEN: Feb 12, 2017 13:00:53-1 DBV32L)
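#### load dataset ####
library(dplyr)  # filter(), mutate(), case_when(), select(), %>%
library(plm)    # pdata.frame(), plm(), plmtest()
library(lfe)    # felm()
library(broom)  # tidy()
ex_mydf <- readRDS(file = "_data/ex_mydf.rds")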
asample <- ex_mydf %>%
filter(
# Working Hours
pgtatzeit >= 6,
# age
alter %>% dplyr::between(18, 65),
# Employment Status
pgemplst %in% c(1,2,4), # full-time(1), part-time(2), marg.-empl(4)
# population status
pop < 3 # only private households
) %>%
# flag implausible cases (na = 1); all other rows get NA and are kept below
mutate(na = case_when(
pid %in% c(1380202, 1380202, 607602, 2555301) ~ 1,
pid == 8267202 & syear == 2007 ~ 1,
pid == 2633801 & syear == 2006 ~ 1,
pid == 2582901 & syear > 2006 ~ 1 )
) %>%
filter(is.na(na),
syear > 2002) %>%
select(pid, syear, lnwage, pgbilzeit, erf, ost, frau, pgexpue, pgallbet, phrf )
## Analysis sample for 2015
asample15 <- asample %>% filter(syear == 2015)
# panel data frame
p.asample <- pdata.frame(asample, index = c("pid", "syear"))
p.asample15 <- pdata.frame(asample15, index = c("pid", "syear"))
Note: Because the dataset is very large and fixed effects models take a long time to compute, the sample for this exercise is restricted to survey years after 2002 (2003 to 2015).
- The coefficients of the fixed effects regression are identical to those of the OLS dummy-variable (LSDV) regression.
- The fixed effects regression does not show any effect of an intercept or of sex on life satisfaction (Lebenszufriedenheit), since time-constant variables drop out.
- The coefficient on sex therefore cannot be interpreted in a fixed effects regression.
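The first two points can be checked with a minimal base-R sketch on simulated data. This is not the SOEP sample: the variable names `id`, `x`, `female`, `y` and all parameter values are assumptions made up for illustration. The sketch fits the LSDV regression with one dummy per person and the within regression on person-demeaned data, then compares the slopes. The time-constant regressor `female` is perfectly collinear with the person dummies, so `lm()` reports `NA` for it, much as Stata reports `frau` as omitted.

```r
# Simulated balanced panel: 200 persons observed for 5 years (illustrative values)
set.seed(1)
n_id <- 200
n_t  <- 5
id     <- rep(seq_len(n_id), each = n_t)
alpha  <- rep(rnorm(n_id), each = n_t)           # person-specific effect
female <- rep(rbinom(n_id, 1, 0.5), each = n_t)  # time-constant regressor
x      <- alpha + rnorm(n_id * n_t)              # time-varying regressor
y      <- 1 + 0.5 * x - 0.2 * female + alpha + rnorm(n_id * n_t)

# LSDV: OLS with one dummy per person (female is listed last so that it,
# not a person dummy, is the column lm() drops as collinear)
lsdv <- lm(y ~ x + factor(id) + female)

# Within estimator: demean y and x by person, then OLS without intercept
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
within <- lm(y_dm ~ 0 + x_dm)

b_lsdv   <- unname(coef(lsdv)["x"])
b_within <- unname(coef(within)["x_dm"])
print(c(lsdv = b_lsdv, within = b_within))

# female cannot be separated from the person dummies, so its coefficient is NA
print(coef(lsdv)["female"])
```

By the Frisch-Waugh-Lovell theorem the two slopes agree up to floating-point error; only the standard errors need a degrees-of-freedom adjustment for the absorbed dummies.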
The coefficients of the standard OLS/pooled regression with intercept are all strongly significant. The effects differ from the fixed effects regression in size (gesund_org) and also in direction (anz_kind, bildung). This is partly related to the inclusion of an intercept and of sex.
By ignoring the clustering by persnr, we only estimate a pooled statistical relationship. The fixed effects panel regression instead evaluates changes within persnr, so we can speak of (causal) effects induced by changes in our chosen covariates.
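The contrast between pooled and within variation can be illustrated with a second simulated sketch. The data and parameter values are hypothetical and are not SOEP results. The unobserved person effect `alpha` is deliberately correlated with the regressor. Pooled OLS then absorbs the between-person association into the slope, while the within estimator uses only changes within persons and recovers the true slope of 0.5. This holds under the assumption that the omitted person effect is constant over time.

```r
set.seed(42)
n_id <- 500
n_t  <- 5
id    <- rep(seq_len(n_id), each = n_t)
alpha <- rep(rnorm(n_id), each = n_t)              # unobserved person effect
x     <- alpha + rnorm(n_id * n_t)                 # regressor correlated with alpha
y     <- 0.5 * x + 2 * alpha + rnorm(n_id * n_t)   # true slope is 0.5

# Pooled OLS ignores the panel structure
b_pooled <- unname(coef(lm(y ~ x))["x"])

# Within estimator: OLS on person-demeaned data
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
b_within <- unname(coef(lm(y_dm ~ 0 + x_dm))["x_dm"])

print(round(c(pooled = b_pooled, within = b_within), 3))
```

With this design the pooled slope converges to 0.5 + 2 * cov(x, alpha) / var(x) = 1.5, while the within slope stays close to 0.5.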
gesund_org is strongly significant. An increase of 1 in gesund_org (values {1,2,3,4,5} in the dataset) leads on average to an increase of 0.45 in lebensz_org.
anz_kind is significant with a p-value of 0.015. One additional child (values {1,2,3} in this dataset) leads on average to an increase of 0.13 in lebensz_org.
use "_data/ex_mydf.dta", clear
* POLS-Model
reg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet if asample==1 & syear>2002
reg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet if asample==1 & syear>2002 [pw = phrf]
* FE-Model
xtset pid syear
xtreg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet i.syear if asample==1 & syear>2002, fe cluster(pid)
areg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet i.syear if asample==1 & syear>2002, absorb(pid) cluster(pid)
* two-way-fixed-effect
xtreg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet i.syear if asample==1 & syear>2002 & pgbilztch==0, fe
. use "_data/ex_mydf.dta", clear
(PGEN: Feb 12, 2017 13:00:53-1 DBV32L)
.
. * POLS-Model
. reg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet if asample==1 &
> syear>2002
Source | SS df MS Number of obs = 156,029
-------------+---------------------------------- F(10, 156018) = 10057.25
Model | 21926.0046 10 2192.60046 Prob > F = 0.0000
Residual | 34013.772 156,018 .218011845 R-squared = 0.3920
-------------+---------------------------------- Adj R-squared = 0.3919
Total | 55939.7766 156,028 .358523961 Root MSE = .46692
------------------------------------------------------------------------------
lnwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
pgbilzeit | .0802162 .0004426 181.26 0.000 .0793488 .0810836
erf | .0385538 .0003902 98.82 0.000 .0377891 .0393185
|
c.erf#c.erf | -.0006474 9.60e-06 -67.46 0.000 -.0006662 -.0006286
|
pgexpue | -.0530502 .0007337 -72.30 0.000 -.0544883 -.0516121
ost | -.2861231 .0029417 -97.26 0.000 -.2918888 -.2803574
frau | -.1677748 .0024721 -67.87 0.000 -.17262 -.1629296
|
pgallbet |
[2] GE 20.. | .1263102 .0032532 38.83 0.000 .1199339 .1326864
[3] GE 20.. | .2368372 .0035369 66.96 0.000 .229905 .2437694
[4] GE 2000 | .2984117 .003429 87.03 0.000 .2916909 .3051325
[5] Selbs.. | -.124032 .0065447 -18.95 0.000 -.1368594 -.1112046
|
_cons | 1.264246 .0070926 178.25 0.000 1.250345 1.278148
------------------------------------------------------------------------------
. reg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet if asample==1 &
> syear>2002 [pw = phrf]
(sum of wgt is 4.0554e+08)
Linear regression Number of obs = 153,014
F(10, 153003) = 3459.25
Prob > F = 0.0000
R-squared = 0.3639
Root MSE = .46091
------------------------------------------------------------------------------
| Robust
lnwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
pgbilzeit | .0744084 .0007376 100.88 0.000 .0729628 .075854
erf | .0389242 .0006379 61.02 0.000 .037674 .0401744
|
c.erf#c.erf | -.0006587 .000015 -43.99 0.000 -.000688 -.0006293
|
pgexpue | -.0538709 .0016444 -32.76 0.000 -.0570938 -.0506479
ost | -.2862332 .0047512 -60.24 0.000 -.2955455 -.276921
frau | -.1700573 .0037989 -44.76 0.000 -.1775032 -.1626115
|
pgallbet |
[2] GE 20.. | .1254696 .0052313 23.98 0.000 .1152164 .1357228
[3] GE 20.. | .2324542 .005442 42.71 0.000 .221788 .2431204
[4] GE 2000 | .2962349 .0052689 56.22 0.000 .2859081 .3065618
[5] Selbs.. | -.1131356 .0163553 -6.92 0.000 -.1451916 -.0810796
|
_cons | 1.322194 .0118697 111.39 0.000 1.29893 1.345459
------------------------------------------------------------------------------
.
. * FE-Model
. xtset pid syear
panel variable: pid (unbalanced)
time variable: syear, 1984 to 2015, but with gaps
delta: 1 unit
. xtreg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet i.syear if as
> ample==1 & syear>2002, fe cluster(pid)
note: frau omitted because of collinearity
Fixed-effects (within) regression Number of obs = 156,029
Group variable: pid Number of groups = 33,894
R-sq: Obs per group:
within = 0.0401 min = 1
between = 0.3541 avg = 4.6
overall = 0.3156 max = 13
F(21,33893) = 123.40
corr(u_i, Xb) = 0.1227 Prob > F = 0.0000
(Std. Err. adjusted for 33,894 clusters in pid)
------------------------------------------------------------------------------
| Robust
lnwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
pgbilzeit | .0529867 .0046515 11.39 0.000 .0438696 .0621039
erf | .045247 .002356 19.21 0.000 .0406292 .0498648
|
c.erf#c.erf | -.0006661 .0000262 -25.43 0.000 -.0007174 -.0006147
|
pgexpue | -.0332011 .0078562 -4.23 0.000 -.0485995 -.0178027
ost | -.1209915 .0243707 -4.96 0.000 -.1687589 -.0732241
frau | 0 (omitted)
|
pgallbet |
[2] GE 20.. | .0495245 .0058391 8.48 0.000 .0380796 .0609693
[3] GE 20.. | .0771106 .0067836 11.37 0.000 .0638146 .0904067
[4] GE 2000 | .0896249 .007137 12.56 0.000 .0756362 .1036137
[5] Selbs.. | -.0717673 .0153061 -4.69 0.000 -.1017677 -.0417669
|
syear |
2004 | -.0172732 .0042352 -4.08 0.000 -.0255744 -.008972
2005 | -.0460263 .0053915 -8.54 0.000 -.0565939 -.0354587
2006 | -.0833928 .0067448 -12.36 0.000 -.0966128 -.0701729
2007 | -.1048617 .0080679 -13.00 0.000 -.120675 -.0890483
2008 | -.1178463 .0095065 -12.40 0.000 -.1364794 -.0992132
2009 | -.109741 .0111915 -9.81 0.000 -.1316767 -.0878053
2010 | -.1125573 .0126257 -8.91 0.000 -.1373041 -.0878104
2011 | -.1263725 .0142115 -8.89 0.000 -.1542276 -.0985174
2012 | -.1278882 .015645 -8.17 0.000 -.1585529 -.0972234
2013 | -.1120904 .016911 -6.63 0.000 -.1452365 -.0789443
2014 | -.0969339 .0182056 -5.32 0.000 -.1326174 -.0612503
2015 | -.0739467 .0197261 -3.75 0.000 -.1126105 -.035283
|
_cons | 1.564629 .0666481 23.48 0.000 1.433996 1.695261
-------------+----------------------------------------------------------------
sigma_u | .48874503
sigma_e | .28535324
rho | .74577917 (fraction of variance due to u_i)
------------------------------------------------------------------------------
.
. areg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet i.syear if a
> sample==1 & syear>2002, absorb(pid) cluster(pid)
note: frau omitted because of collinearity
Linear regression, absorbing indicators Number of obs = 156,029
F( 21, 33893) = 96.59
Prob > F = 0.0000
R-squared = 0.8222
Adj R-squared = 0.7729
Root MSE = 0.2854
(Std. Err. adjusted for 33,894 clusters in pid)
------------------------------------------------------------------------------
| Robust
lnwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
pgbilzeit | .0529867 .0052576 10.08 0.000 .0426817 .0632917
erf | .045247 .0026629 16.99 0.000 .0400276 .0504664
|
c.erf#c.erf | -.0006661 .0000296 -22.50 0.000 -.0007241 -.000608
|
pgexpue | -.0332011 .0088798 -3.74 0.000 -.0506057 -.0157965
ost | -.1209915 .0275459 -4.39 0.000 -.1749824 -.0670006
frau | 0 (omitted)
|
pgallbet |
[2] GE 20.. | .0495245 .0065999 7.50 0.000 .0365885 .0624605
[3] GE 20.. | .0771106 .0076674 10.06 0.000 .0620823 .092139
[4] GE 2000 | .0896249 .0080669 11.11 0.000 .0738136 .1054363
[5] Selbs.. | -.0717673 .0173003 -4.15 0.000 -.1056764 -.0378582
|
syear |
2004 | -.0172732 .004787 -3.61 0.000 -.0266559 -.0078905
2005 | -.0460263 .006094 -7.55 0.000 -.0579707 -.0340819
2006 | -.0833928 .0076235 -10.94 0.000 -.0983352 -.0684505
2007 | -.1048617 .009119 -11.50 0.000 -.1227353 -.0869881
2008 | -.1178463 .0107451 -10.97 0.000 -.1389071 -.0967855
2009 | -.109741 .0126496 -8.68 0.000 -.1345347 -.0849473
2010 | -.1125573 .0142707 -7.89 0.000 -.1405284 -.0845862
2011 | -.1263725 .0160631 -7.87 0.000 -.1578568 -.0948882
2012 | -.1278882 .0176834 -7.23 0.000 -.1625482 -.0932282
2013 | -.1120904 .0191143 -5.86 0.000 -.149555 -.0746258
2014 | -.0969339 .0205776 -4.71 0.000 -.1372666 -.0566011
2015 | -.0739467 .0222961 -3.32 0.001 -.1176479 -.0302455
|
_cons | 1.564629 .0753316 20.77 0.000 1.416976 1.712281
-------------+----------------------------------------------------------------
pid | absorbed (33894 categories)
.
. * two-way-fixed-effect
. xtreg lnwage pgbilzeit c.erf##c.erf pgexpue ost frau i.pgallbet i.syear if a
> sample==1 & syear>2002 & pgbilztch==0, fe
note: frau omitted because of collinearity
Fixed-effects (within) regression Number of obs = 137,206
Group variable: pid Number of groups = 29,468
R-sq: Obs per group:
within = 0.0424 min = 1
between = 0.3784 avg = 4.7
overall = 0.3328 max = 13
F(21,107717) = 227.32
corr(u_i, Xb) = 0.0057 Prob > F = 0.0000
------------------------------------------------------------------------------
lnwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
pgbilzeit | .0842833 .0031764 26.53 0.000 .0780576 .090509
erf | .0450217 .0015773 28.54 0.000 .0419301 .0481132
|
c.erf#c.erf | -.0006308 .0000188 -33.54 0.000 -.0006676 -.0005939
|
pgexpue | -.0346334 .003403 -10.18 0.000 -.0413032 -.0279637
ost | -.1282438 .0146121 -8.78 0.000 -.1568832 -.0996043
frau | 0 (omitted)
|
pgallbet |
[2] GE 20.. | .05027 .0040706 12.35 0.000 .0422917 .0582483
[3] GE 20.. | .0780984 .0047546 16.43 0.000 .0687794 .0874174
[4] GE 2000 | .0889808 .0049328 18.04 0.000 .0793126 .0986491
[5] Selbs.. | -.0771079 .0073701 -10.46 0.000 -.0915532 -.0626627
|
syear |
2004 | -.0173967 .0042274 -4.12 0.000 -.0256823 -.009111
2005 | -.048454 .0047254 -10.25 0.000 -.0577156 -.0391923
2006 | -.0860305 .0053577 -16.06 0.000 -.0965315 -.0755294
2007 | -.1113415 .0060158 -18.51 0.000 -.1231324 -.0995505
2008 | -.1251727 .0068083 -18.39 0.000 -.1385167 -.1118286
2009 | -.1195701 .0076648 -15.60 0.000 -.1345929 -.1045473
2010 | -.1225387 .008599 -14.25 0.000 -.1393925 -.1056849
2011 | -.1368962 .0093792 -14.60 0.000 -.1552793 -.1185132
2012 | -.1426543 .0102839 -13.87 0.000 -.1628107 -.1224979
2013 | -.1334428 .0110012 -12.13 0.000 -.155005 -.1118806
2014 | -.1116799 .0118414 -9.43 0.000 -.134889 -.0884709
2015 | -.0930661 .0128066 -7.27 0.000 -.1181668 -.0679654
|
_cons | 1.166358 .0445909 26.16 0.000 1.078961 1.253756
-------------+----------------------------------------------------------------
sigma_u | .47311258
sigma_e | .28070512
rho | .7396321 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(29467, 107717) = 9.19 Prob > F = 0.0000
.
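To compare the pooled OLS and FE results in economic terms, the coefficients can be transformed. With a quadratic experience term, the wage profile peaks at erf* = -b_erf / (2 * b_erf2). A log-wage coefficient b corresponds to a percentage effect of 100 * (exp(b) - 1). The sketch below plugs in the unweighted POLS and the `xtreg, fe` coefficients copied from the Stata output above, so it involves no new estimation. The column names in the data frame are chosen here for illustration.

```r
# Coefficients copied from the Stata output above (unweighted POLS and xtreg, fe)
b <- data.frame(
  model     = c("POLS", "FE"),
  pgbilzeit = c(0.0802162, 0.0529867),
  erf       = c(0.0385538, 0.0452470),
  erf2      = c(-0.0006474, -0.0006661),
  ost       = c(-0.2861231, -0.1209915)
)

# Years of experience at which the quadratic wage profile peaks
b$erf_peak <- -b$erf / (2 * b$erf2)

# Percentage wage change per additional year of schooling
b$school_pct <- 100 * (exp(b$pgbilzeit) - 1)

# Percentage wage gap for East Germany (ost dummy)
b$ost_pct <- 100 * (exp(b$ost) - 1)

print(b[, c("model", "erf_peak", "school_pct", "ost_pct")], digits = 3)
```

With these inputs, the experience profile peaks at about 29.8 years in POLS and about 34 years in FE. The return per year of schooling falls from about 8.4 % to about 5.4 %. The East German wage gap shrinks from about 24.9 % to about 11.4 %.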
Note 1: Difficulty with FE models in R. Computing fixed effects models with many individuals has proven very time-consuming in R and needs a lot of computing power. Most packages for fixed effects models are designed to handle panel data on a few countries, but they are not well suited to panel data with several thousand individuals like the SOEP. If you know of packages suited to population panel data, please let me know on GitHub or on Twitter @_asilisa_ :).
Note 2: So far there is also the problem that the p-values do not seem to be calculated correctly for FE models with the plm package.
Pooled Model
# POLS- and FE-model ------------------------------------------------------
pols <- plm(lnwage ~ pgbilzeit + erf + I(erf^2) + pgexpue + ost + frau + factor(pgallbet),
data = p.asample,
weights = phrf,
model = "pooling")
# other way of writing
# pols <- lm(lnwage ~ pgbilzeit + erf + I(erf^2) + pgexpue + ost + frau + factor(pgallbet),
#            weights = phrf,
#            data = asample)
tidy(pols)
Fixed effects model
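# note: pgallbet enters linearly here (not as factor() as in pols or i.pgallbet in STATA), and no year dummies are included
fe <- plm(lnwage ~ pgbilzeit + erf + I(erf^2) + pgexpue + ost + frau + pgallbet,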
data = p.asample,
weights = phrf,
model = "within")
fe
##
## Model Formula: lnwage ~ pgbilzeit + erf + I(erf^2) + pgexpue + ost + frau +
## pgallbet
##
## Coefficients:
## pgbilzeit erf I(erf^2) pgexpue ost pgallbet
## 0.068041 0.073147 -0.000722 -0.010591 -0.139414 0.008868
# t.fe <- tidy(fe) # takes too long
# display results
# smy_fe <- summary(fe) # takes too long
# summary(fe, robust = T) # like the robust function in STATA
# just show general coefficients and no dummies
# smy_fe$coef[1:5,] # takes too long
Fixed effects like areg in STATA with felm() from the lfe package. You can find more information on the different models in R and STATA at this awesome website.
# fixed effects like areg in STATA
fe_areg <- felm(lnwage ~ pgbilzeit + erf + I(erf^2) + ost
| pid | 0 | pid, data = asample, weights = asample$phrf)
fe_areg
## pgbilzeit erf I(erf^2) ost
## 0.066675 0.072787 -0.000713 -0.140869
# areg y x1 [w=x3], a(id1) cl(id1)
# felm(y ~ x1 | id1 | 0 | id1, data = df, weights = df$x3)
Output as a nice table (currently takes too long to render):
# stargazer(fe, pols, title="Results", align=TRUE) # takes too long
# Pooled Model
# LM test for random effects versus OLS
plmtest(pols)
##
## Lagrange Multiplier Test - (Honda) for unbalanced panels
##
## data: lnwage ~ pgbilzeit + erf + I(erf^2) + ost + frau
## normal = 500, p-value <2e-16
## alternative hypothesis: significant effects
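`plmtest()` defaults to the Honda variant of the Lagrange multiplier test for individual effects. For a balanced panel with n persons, T periods and pooled OLS residuals e_it, the statistic is sqrt(nT / (2(T - 1))) * (sum_i (sum_t e_it)^2 / sum_i sum_t e_it^2 - 1). Under the null of no individual effects it is approximately standard normal, so a value of 500 is overwhelming evidence of person effects. The sketch below implements only this balanced-panel formula in base R, on simulated data with hypothetical values. plm applies an adjusted version for unbalanced panels such as this SOEP sample, and that version is not reproduced here. `honda_lm()` is a helper defined for illustration and is not a plm function.

```r
# Honda LM statistic for individual effects, balanced-panel formula
honda_lm <- function(resid, id) {
  n_id  <- length(unique(id))
  n_t   <- length(resid) / n_id              # periods per person (balanced)
  s_i   <- tapply(resid, id, sum)            # sum of residuals per person
  ratio <- sum(s_i^2) / sum(resid^2)
  sqrt(n_id * n_t / (2 * (n_t - 1))) * (ratio - 1)
}

set.seed(7)
n_id <- 300
n_t  <- 4
id   <- rep(seq_len(n_id), each = n_t)
x    <- rnorm(n_id * n_t)

# With person effects: pooled residuals are correlated within persons
y_fx <- 1 + 0.5 * x + rep(rnorm(n_id), each = n_t) + rnorm(n_id * n_t)
# Without person effects
y_no <- 1 + 0.5 * x + rnorm(n_id * n_t)

lm_fx <- honda_lm(resid(lm(y_fx ~ x)), id)
lm_no <- honda_lm(resid(lm(y_no ~ x)), id)
print(round(c(with_effects = lm_fx, without_effects = lm_no), 2))
```

With person effects the statistic is large and positive. Without them it behaves like a draw from a standard normal.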
# FE Model
# coeftest(fe, vcov.=vcovHC(fe,type="HC1")) # takes too long
More information on fixed effects / LSDV regression: https://stats.stackexchange.com/questions/41916/within-model-with-plm-package