----------------------------------------------------------------------------------------------------------
       log:  April21.txt
  log type:  text
 opened on:  21 Apr 2005, 15:18:30

. use "D:\MROZ.DTA", clear // This is data on wages of women

. desc

Contains data from D:\MROZ.DTA
  obs:           753
 vars:            22                          2 Sep 1996 16:04
 size:        39,909 (96.2% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
inlf            byte   %9.0g                  =1 if in lab frce, 1975
hours           int    %9.0g                  hours worked, 1975
kidslt6         byte   %9.0g                  # kids < 6 years
kidsge6         byte   %9.0g                  # kids 6-18
age             byte   %9.0g                  woman's age in yrs
educ            byte   %9.0g                  years of schooling
wage            float  %9.0g                  est. wage from earn, hrs
repwage         float  %9.0g                  rep. wage at interview in 1976
hushrs          int    %9.0g                  hours worked by husband, 1975
husage          byte   %9.0g                  husband's age
huseduc         byte   %9.0g                  husband's years of schooling
huswage         float  %9.0g                  husband's hourly wage, 1975
faminc          float  %9.0g                  family income, 1975
mtr             float  %9.0g                  fed. marg. tax rte facing woman
motheduc        byte   %9.0g                  mother's years of schooling
fatheduc        byte   %9.0g                  father's years of schooling
unem            float  %9.0g                  unem. rate in county of resid.
city            byte   %9.0g                  =1 if live in SMSA
exper           byte   %9.0g                  actual labor mkt exper
nwifeinc        float  %9.0g                  (faminc - wage*hours)/1000
lwage           float  %9.0g                  log(wage)
expersq         int    %9.0g                  exper^2
-------------------------------------------------------------------------------
Sorted by:

** Drop observations of women not in the labor force (inlf == 0):

. drop if inlf == 0
(325 observations deleted)

(1) OLS: A potentially biased structural equation:
. regress lwage educ exper expersq

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  3,   424) =   26.29
       Model |  35.0223023     3  11.6741008           Prob > F      =  0.0000
    Residual |  188.305149   424  .444115917           R-squared     =  0.1568
-------------+------------------------------           Adj R-squared =  0.1509
       Total |  223.327451   427  .523015108           Root MSE      =  .66642

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1074896   .0141465     7.60   0.000     .0796837    .1352956
       exper |   .0415665   .0131752     3.15   0.002     .0156697    .0674633
     expersq |  -.0008112   .0003932    -2.06   0.040    -.0015841   -.0000382
       _cons |  -.5220407   .1986321    -2.63   0.009    -.9124668   -.1316145
------------------------------------------------------------------------------

** ... But we suspect that educ is endogenous in the wage equation (e.g. unobserved
** ability may raise both schooling and wages)...
** What could be good instruments? Parents' and husband's education?

. regress lwage educ exper expersq motheduc fatheduc huseduc

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  6,   421) =   13.81
       Model |  36.7253298     6   6.1208883           Prob > F      =  0.0000
    Residual |  186.602121   421  .443235443           R-squared     =  0.1644
-------------+------------------------------           Adj R-squared =  0.1525
       Total |  223.327451   427  .523015108           Root MSE      =  .66576

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1275808   .0186496     6.84   0.000     .0909228    .1642388
       exper |   .0413839   .0131632     3.14   0.002       .01551    .0672577
     expersq |  -.0008434   .0003933    -2.14   0.033    -.0016164   -.0000703
    motheduc |  -.0157719   .0119842    -1.32   0.189    -.0393282    .0077843
    fatheduc |  -.0043324   .0114794    -0.38   0.706    -.0268966    .0182318
     huseduc |  -.0109268   .0133371    -0.82   0.413    -.0371424    .0152887
       _cons |  -.4395985   .2041974    -2.15   0.032    -.8409718   -.0382251
------------------------------------------------------------------------------

** Note that parents' and husband's education do not significantly explain lwage
** once educ is controlled for (p = 0.189, 0.706 and 0.413)...
** But do they explain educ?

. regress educ exper expersq motheduc fatheduc huseduc

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  5,   422) =   63.30
       Model |  955.830608     5  191.166122           Prob > F      =  0.0000
    Residual |  1274.36565   422  3.01982382           R-squared     =  0.4286
-------------+------------------------------           Adj R-squared =  0.4218
       Total |  2230.19626   427  5.22294206           Root MSE      =  1.7378

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0374977   .0343102     1.09   0.275    -.0299424    .1049379
     expersq |  -.0006002   .0010261    -0.58   0.559    -.0026171    .0014167
    motheduc |   .1141532   .0307835     3.71   0.000     .0536452    .1746613
    fatheduc |   .1060801   .0295153     3.59   0.000     .0480648    .1640955
     huseduc |   .3752548   .0296347    12.66   0.000     .3170049    .4335048
       _cons |   5.538311   .4597824    12.05   0.000     4.634562     6.44206
------------------------------------------------------------------------------

** Yes, they do (all three are significant at the 1% level). So they look like
** good instruments: correlated with educ, but with no detectable direct effect
** on lwage. (This last property cannot be fully tested; the over-identification
** test at the end of the log gives a partial check.)
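Why those two properties matter can be illustrated with a small simulation. The Python sketch below is a hypothetical example and not part of the Stata session: it does not use MROZ.DTA, and its data-generating process (one instrument z, unobserved "ability" that raises both educ and lwage, a true return to schooling of 0.1) is an assumption made up for illustration. With a single instrument and no other regressors, 2SLS reduces to the ratio cov(z, lwage) / cov(z, educ).

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)


def simulate(n=20000, beta=0.1, seed=0):
    """Hypothetical data: ability raises educ and lwage; z shifts only educ."""
    rng = random.Random(seed)
    z, educ, lwage = [], [], []
    for _ in range(n):
        z_i = rng.gauss(0, 1)        # instrument, e.g. parents' schooling
        ability = rng.gauss(0, 1)    # unobserved: ends up in the lwage error
        educ_i = 12 + z_i + ability + rng.gauss(0, 1)                # relevance: z moves educ
        lwage_i = beta * educ_i + 0.5 * ability + rng.gauss(0, 0.5)  # no direct z term
        z.append(z_i)
        educ.append(educ_i)
        lwage.append(lwage_i)
    return z, educ, lwage


z, educ, lwage = simulate()
b_ols = cov(educ, lwage) / cov(educ, educ)   # OLS slope of lwage on educ
b_iv = cov(z, lwage) / cov(z, educ)          # IV (Wald) estimator with one instrument
print(f"true beta = 0.100, OLS = {b_ols:.3f}, IV = {b_iv:.3f}")
```

Under these assumptions the OLS slope converges to 0.1 + 0.5/3 (about 0.267), because educ is correlated with the ability term left in the error, while the IV ratio stays close to 0.1. If z also entered the lwage equation directly, the IV ratio would be biased too, which is why the exclusion restriction matters as much as relevance.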
** 2. Assuming mother/father/husband education are good instruments for the
** education of each woman in the sample, we can do a Hausman test of endogeneity:

(2) Hausman test

** 2.1 Estimate a reduced-form/1st-stage regression:

. regress educ exper expersq motheduc fatheduc huseduc

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  5,   422) =   63.30
       Model |  955.830608     5  191.166122           Prob > F      =  0.0000
    Residual |  1274.36565   422  3.01982382           R-squared     =  0.4286
-------------+------------------------------           Adj R-squared =  0.4218
       Total |  2230.19626   427  5.22294206           Root MSE      =  1.7378

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0374977   .0343102     1.09   0.275    -.0299424    .1049379
     expersq |  -.0006002   .0010261    -0.58   0.559    -.0026171    .0014167
    motheduc |   .1141532   .0307835     3.71   0.000     .0536452    .1746613
    fatheduc |   .1060801   .0295153     3.59   0.000     .0480648    .1640955
     huseduc |   .3752548   .0296347    12.66   0.000     .3170049    .4335048
       _cons |   5.538311   .4597824    12.05   0.000     4.634562     6.44206
------------------------------------------------------------------------------

** 2.2 Save the residuals of the reduced-form equation:

. predict edu_res, res

** 2.3 Add the 1st-stage residuals (which capture the part of educ that may be
** correlated with the error term) to the structural equation:

. regress lwage educ exper expersq edu_res

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  4,   423) =   20.48
       Model |   36.230504     4  9.05762599           Prob > F      =  0.0000
    Residual |  187.096947   423  .442309568           R-squared     =  0.1622
-------------+------------------------------           Adj R-squared =  0.1543
       Total |  223.327451   427  .523015108           Root MSE      =  .66506

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0803918   .0216362     3.72   0.000     .0378639    .1229197
       exper |   .0430973    .013181     3.27   0.001      .017189    .0690057
     expersq |  -.0008628   .0003937    -2.19   0.029    -.0016366    -.000089
     edu_res |    .047189   .0285519     1.65   0.099    -.0089322    .1033102
       _cons |  -.1868574   .2835905    -0.66   0.510    -.7442794    .3705647
------------------------------------------------------------------------------

. test edu_res

 ( 1)  edu_res = 0

       F(  1,   423) =    2.73
            Prob > F =    0.0991

** The residuals are significant at 10% (p = 0.099) --> we cannot rule out endogeneity.
** See also the user-written "ivendog" command (downloadable).

** 3. Using IV regression to control for endogeneity:
** (mother/father/husband education are instruments for educ)

(3) 2SLS

. ivreg lwage (educ = motheduc fatheduc huseduc) exper expersq

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  3,   424) =   11.52
       Model |  33.3927427     3  11.1309142           Prob > F      =  0.0000
    Residual |  189.934709   424  .447959218           R-squared     =  0.1495
-------------+------------------------------           Adj R-squared =  0.1435
       Total |  223.327451   427  .523015108           Root MSE      =   .6693

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0803918    .021774     3.69   0.000     .0375934    .1231901
       exper |   .0430973   .0132649     3.25   0.001     .0170242    .0691704
     expersq |  -.0008628   .0003962    -2.18   0.030    -.0016415   -.0000841
       _cons |  -.1868574   .2853959    -0.65   0.513    -.7478243    .3741096
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq motheduc fatheduc huseduc
------------------------------------------------------------------------------

** The previous command only shows the 2nd-stage regression; to look at both
** the 1st and 2nd stages, try:
. ivreg lwage (educ = motheduc fatheduc huseduc) exper expersq, first

First-stage regressions
-----------------------

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  5,   422) =   63.30
       Model |  955.830608     5  191.166122           Prob > F      =  0.0000
    Residual |  1274.36565   422  3.01982382           R-squared     =  0.4286
-------------+------------------------------           Adj R-squared =  0.4218
       Total |  2230.19626   427  5.22294206           Root MSE      =  1.7378

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0374977   .0343102     1.09   0.275    -.0299424    .1049379
     expersq |  -.0006002   .0010261    -0.58   0.559    -.0026171    .0014167
    motheduc |   .1141532   .0307835     3.71   0.000     .0536452    .1746613
    fatheduc |   .1060801   .0295153     3.59   0.000     .0480648    .1640955
     huseduc |   .3752548   .0296347    12.66   0.000     .3170049    .4335048
       _cons |   5.538311   .4597824    12.05   0.000     4.634562     6.44206
------------------------------------------------------------------------------

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  3,   424) =   11.52
       Model |  33.3927368     3  11.1309123           Prob > F      =  0.0000
    Residual |  189.934704   424  .447959208           R-squared     =  0.1495
-------------+------------------------------           Adj R-squared =  0.1435
       Total |  223.327441   427  .523015084           Root MSE      =   .6693

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0803918    .021774     3.69   0.000     .0375934    .1231901
       exper |   .0430973   .0132649     3.25   0.001     .0170242    .0691704
     expersq |  -.0008628   .0003962    -2.18   0.030    -.0016415   -.0000841
       _cons |  -.1868572   .2853959    -0.65   0.513    -.7478242    .3741097
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq motheduc fatheduc huseduc
------------------------------------------------------------------------------

** 4. Doing 2SLS by yourself...

(4) My own 2 stages

** 4.1. Run 1st-stage/reduced equation for EDUC:

. regress educ exper expersq motheduc fatheduc huseduc

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  5,   422) =   63.30
       Model |  955.830608     5  191.166122           Prob > F      =  0.0000
    Residual |  1274.36565   422  3.01982382           R-squared     =  0.4286
-------------+------------------------------           Adj R-squared =  0.4218
       Total |  2230.19626   427  5.22294206           Root MSE      =  1.7378

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0374977   .0343102     1.09   0.275    -.0299424    .1049379
     expersq |  -.0006002   .0010261    -0.58   0.559    -.0026171    .0014167
    motheduc |   .1141532   .0307835     3.71   0.000     .0536452    .1746613
    fatheduc |   .1060801   .0295153     3.59   0.000     .0480648    .1640955
     huseduc |   .3752548   .0296347    12.66   0.000     .3170049    .4335048
       _cons |   5.538311   .4597824    12.05   0.000     4.634562     6.44206
------------------------------------------------------------------------------

** 4.2. Capture the PREDICTED values...

. predict edu_pre, xb

** 4.3 Estimate 2nd-stage/structural equation using edu_pre instead of educ:
. regress lwage edu_pre exper expersq

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  3,   424) =   10.53
       Model |  15.4878336     3  5.16261119           Prob > F      =  0.0000
    Residual |  207.839607   424  .490187753           R-squared     =  0.0694
-------------+------------------------------           Adj R-squared =  0.0628
       Total |  223.327441   427  .523015084           Root MSE      =  .70013

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     edu_pre |   .0803918   .0227772     3.53   0.000     .0356215     .125162
       exper |   .0430973    .013876     3.11   0.002      .015823    .0703717
     expersq |  -.0008628   .0004144    -2.08   0.038    -.0016774   -.0000482
       _cons |  -.1868572   .2985449    -0.63   0.532    -.7736695    .3999552
------------------------------------------------------------------------------

** The coefficient on edu_pre (.0804) is lower than the OLS estimate (.1075) but
** still significant. This is consistent with OLS being biased upward, and with
** IV removing that endogeneity bias (provided the instruments are valid).
** Note that the coefficients are the same as those from 2SLS, but the Std. Err.
** values are different. MSS and RSS are also different, hence a different R2.
** This is because Stata doesn't know we are feeding 1st-stage predictions into
** the 2nd regression: it computes the residuals with edu_pre instead of the
** actual educ, so the estimated error variance (Root MSE .70013 instead of
** .6693) and therefore the Std. Err. are wrong (here, too large).
** It is advised to use the ivreg command, which computes the correct Std. Err.
** Note also the similarity between this regression and the one in step 2.3:
** both give exactly the 2SLS point estimates (.0803918, .0430973, -.0008628).
** One controls for endogeneity by replacing educ with edu_pre, the other by
** keeping educ and adding the 1st-stage residual edu_res as a regressor.

** Additional issues:
** Testing over-identifying restrictions (are the instruments uncorrelated with
** the error term of the wage equation?). With 3 instruments for 1 endogenous
** regressor there are 3 - 1 = 2 over-identifying restrictions.
** Download the user-written "overid" command from the web (ssc install overid).
. ivreg lwage (educ = motheduc fatheduc huseduc) exper expersq

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  3,   424) =   11.52
       Model |  33.3927427     3  11.1309142           Prob > F      =  0.0000
    Residual |  189.934709   424  .447959218           R-squared     =  0.1495
-------------+------------------------------           Adj R-squared =  0.1435
       Total |  223.327451   427  .523015108           Root MSE      =   .6693

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0803918    .021774     3.69   0.000     .0375934    .1231901
       exper |   .0430973   .0132649     3.25   0.001     .0170242    .0691704
     expersq |  -.0008628   .0003962    -2.18   0.030    -.0016415   -.0000841
       _cons |  -.1868574   .2853959    -0.65   0.513    -.7478243    .3741096
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq motheduc fatheduc huseduc
------------------------------------------------------------------------------

. overid

Tests of overidentifying restrictions:
Sargan N*R-sq test             1.115  Chi-sq(2)    P-value = 0.5726
Basmann test                   1.102  Chi-sq(2)    P-value = 0.5763

. overid, all

Tests of overidentifying restrictions:
Sargan N*R-sq test             1.115  Chi-sq(2)    P-value = 0.5726
Sargan (N-L)*R-sq test         1.105  Chi-sq(2)    P-value = 0.5756
Basmann test                   1.102  Chi-sq(2)    P-value = 0.5763
Sargan pseudo-F test           0.552  F(2,424)     P-value = 0.5760
Basmann pseudo-F test          0.551  F(2,422)     P-value = 0.5767

** All p-values are about 0.57: we cannot reject the null hypothesis that the
** over-identifying restrictions hold, so the data do not contradict the
** validity of the instruments.

** Endogeneity test:
** Download the user-written "ivendog" command (ssc install ivendog)

. ivendog

Tests of endogeneity of: educ
H0: Regressor is exogenous
    Wu-Hausman F test:                2.73157   F(1,423)    P-value = 0.09912
    Durbin-Wu-Hausman chi-sq test:    2.74613   Chi-sq(1)   P-value = 0.09749

** Notice that the Wu-Hausman F test is the same test as the one on edu_res in
** step 2.3 above (F = 2.73, p = 0.0991).
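Several numbers in this log can be reproduced from other printed numbers, which shows what the tests actually compute. The Python sketch below copies values from the output above and assumes homoskedastic errors, as the default ivreg Std. Err. do. The Durbin formula used (N times the relative drop in the residual sum of squares) is one regression-based form, chosen here because it reproduces the printed value, not taken from the ivendog source. For the Std. Err. check: 2SLS and the manual 2nd stage share the same cross-product matrix of regressors (with edu_pre in place of educ) and differ only in the estimated error variance, so rescaling the manual Std. Err. by sqrt(RSS_2SLS / RSS_manual) should recover the ivreg values. The chi-square p-values use the closed forms available for 1 and 2 degrees of freedom, so no statistics library is needed.

```python
import math

# Values copied from the Stata output above; N = 428 in every regression.
N = 428

# --- Endogeneity tests (step 2.3 and ivendog) ---
rss_ols = 188.305149   # regress lwage educ exper expersq
rss_aug = 187.096947   # regress lwage educ exper expersq edu_res
df_aug = 423           # residual df of the augmented regression

# F test of edu_res = 0: one restriction, so F = drop in RSS / s^2 of the augmented model.
wu_hausman_f = (rss_ols - rss_aug) / (rss_aug / df_aug)
# Durbin form (assumption, see lead-in): N times the relative drop in RSS.
durbin_chi2 = N * (rss_ols - rss_aug) / rss_ols


def chi2_pvalue(x, df):
    """Upper-tail chi-square p-value; closed forms exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


durbin_p = chi2_pvalue(durbin_chi2, 1)

# --- Over-identification (overid) ---
n_excluded_instruments = 3   # motheduc fatheduc huseduc
n_endogenous = 1             # educ
overid_df = n_excluded_instruments - n_endogenous
sargan_p = chi2_pvalue(1.115, overid_df)    # Sargan N*R-sq statistic
basmann_p = chi2_pvalue(1.102, overid_df)   # Basmann statistic

# --- Std. Err. of the manual 2nd stage (step 4.3) vs ivreg ---
rss_2sls = 189.934709     # ivreg: residuals computed with the actual educ
rss_manual = 207.839607   # regress lwage edu_pre ...: residuals computed with edu_pre
scale = math.sqrt(rss_2sls / rss_manual)   # both regressions have 424 residual df
# "educ" here is edu_pre in the manual regression, educ in ivreg.
manual_se = {"educ": .0227772, "exper": .013876,
             "expersq": .0004144, "_cons": .2985449}
corrected_se = {name: se * scale for name, se in manual_se.items()}

print(f"Wu-Hausman F = {wu_hausman_f:.5f}, Durbin chi2 = {durbin_chi2:.5f} (p = {durbin_p:.5f})")
print(f"overid df = {overid_df}, Sargan p = {sargan_p:.4f}, Basmann p = {basmann_p:.4f}")
for name, se in corrected_se.items():
    print(f"{name:8s} manual {manual_se[name]:.7f} -> corrected {se:.7f}")
```

The printed statistics should agree with the log up to rounding in the last digit, since the inputs are the rounded values Stata displays; for example, the corrected Std. Err. for educ comes out at about .021774, the ivreg value.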