-------------------------------------------------------------------------------------------
       log:  D:\Mis documentos\heckman_log.txt
  log type:  text
 opened on:  19 May 2005, 23:47:36

.
. *** Heckman Selection Models
. ************************************
.
. *Downloading dataset from the web:
. use http://www.stata-press.com/data/r8/womenwk, clear

.
. * This is a dataset of women's wages,
. * where women who do not work have wage = . (missing)
.
. * Generating a dummy for those who work, i.e., have an observed (nonmissing) wage:
. generate d = 1

. replace d = 0 if wage == .
(657 real changes made)

.
. ** Wages depend on (X vars) education and age...
. ** But there is a prior decision to get a job
. ** So the labor force participation decision determines which wages are observed
.
. ** We need to estimate a "selection equation":
. ** Work-decision (a dummy) depends on (Z vars): being married, # of children
. ** plus education and age
. ** Note that Z contains X plus variables excluded from the wage equation (married,
. ** children); without such exclusions the model is identified only through the
. ** nonlinearity of the inverse Mills ratio
.
. ** 1. Using Heckman (1979) "two-step consistent" procedure:
. heckman wage educ age, select (married children educ age) twostep

Heckman selection model -- two-step estimates   Number of obs      =      2000
(regression model with sample selection)        Censored obs       =       657
                                                Uncensored obs     =      1343

                                                Wald chi2(4)       =    551.37
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage         |
   education |   .9825259   .0538821    18.23   0.000     .8769189    1.088133
         age |   .2118695   .0220511     9.61   0.000     .1686502    .2550888
       _cons |   .7340391   1.248331     0.59   0.557    -1.712645    3.180723
-------------+----------------------------------------------------------------
select       |
     married |   .4308575    .074208     5.81   0.000     .2854125    .5763025
    children |   .4473249   .0287417    15.56   0.000     .3909922    .5036576
   education |   .0583645   .0109742     5.32   0.000     .0368555    .0798735
         age |   .0347211   .0042293     8.21   0.000     .0264318    .0430105
       _cons |  -2.467365   .1925635   -12.81   0.000    -2.844782   -2.089948
-------------+----------------------------------------------------------------
mills        |
      lambda |   4.001615   .6065388     6.60   0.000     2.812821     5.19041
-------------+----------------------------------------------------------------
         rho |    0.67284
       sigma |  5.9473529
      lambda |  4.0016155   .6065388
------------------------------------------------------------------------------

.
. * Note that Stata automatically treats observations with missing wage as not selected
. * and reports the estimates of both the wage (outcome) and the selection equations
.
. ** 2. Using a maximum-likelihood procedure (the estimates are close to the two-step ones;
. ** ML is more efficient under joint normality but more sensitive to violations of it):
. heckman wage educ age, select (married children educ age) nolog

Heckman selection model                         Number of obs      =      2000
(regression model with sample selection)        Censored obs       =       657
                                                Uncensored obs     =      1343

                                                Wald chi2(2)       =    508.44
Log likelihood = -5178.304                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage         |
   education |   .9899537   .0532565    18.59   0.000     .8855729    1.094334
         age |   .2131294   .0206031    10.34   0.000     .1727481    .2535108
       _cons |   .4857752   1.077037     0.45   0.652    -1.625179     2.59673
-------------+----------------------------------------------------------------
select       |
     married |   .4451721   .0673954     6.61   0.000     .3130794    .5772647
    children |   .4387068   .0277828    15.79   0.000     .3842534    .4931601
   education |   .0557318   .0107349     5.19   0.000     .0346917    .0767718
         age |   .0365098   .0041533     8.79   0.000     .0283694    .0446502
       _cons |  -2.491015   .1893402   -13.16   0.000    -2.862115   -2.119915
-------------+----------------------------------------------------------------
     /athrho |   .8742086   .1014225     8.62   0.000     .6754241    1.072993
    /lnsigma |   1.792559    .027598    64.95   0.000     1.738468     1.84665
-------------+----------------------------------------------------------------
         rho |   .7035061   .0512264                      .5885365    .7905862
       sigma |   6.004797   .1657202                       5.68862    6.338548
      lambda |   4.224412   .3992265                      3.441942    5.006881
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =    61.20   Prob > chi2 = 0.0000
------------------------------------------------------------------------------

.
. ** 3. To avoid ambiguity, you can also name the selection variable explicitly:
. heckman wage educ age, select (d = married children educ age) twostep nolog

Heckman selection model -- two-step estimates   Number of obs      =      2000
(regression model with sample selection)        Censored obs       =       657
                                                Uncensored obs     =      1343

                                                Wald chi2(4)       =    551.37
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage         |
   education |   .9825259   .0538821    18.23   0.000     .8769189    1.088133
         age |   .2118695   .0220511     9.61   0.000     .1686502    .2550888
       _cons |   .7340391   1.248331     0.59   0.557    -1.712645    3.180723
-------------+----------------------------------------------------------------
d            |
     married |   .4308575    .074208     5.81   0.000     .2854125    .5763025
    children |   .4473249   .0287417    15.56   0.000     .3909922    .5036576
   education |   .0583645   .0109742     5.32   0.000     .0368555    .0798735
         age |   .0347211   .0042293     8.21   0.000     .0264318    .0430105
       _cons |  -2.467365   .1925635   -12.81   0.000    -2.844782   -2.089948
-------------+----------------------------------------------------------------
mills        |
      lambda |   4.001615   .6065388     6.60   0.000     2.812821     5.19041
-------------+----------------------------------------------------------------
         rho |    0.67284
       sigma |  5.9473529
      lambda |  4.0016155   .6065388
------------------------------------------------------------------------------

.
. * Same result as in model 1
.
. * Recall that the selection equation is just a probit (because Heckman assumes
. * normally distributed errors in the equation for "d")
. probit d married children educ age, nolog

Probit estimates                                  Number of obs   =       2000
                                                  LR chi2(4)      =     478.32
                                                  Prob > chi2     =     0.0000
Log likelihood = -1027.0616                       Pseudo R2       =     0.1889

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   .4308575    .074208     5.81   0.000     .2854125    .5763025
    children |   .4473249   .0287417    15.56   0.000     .3909922    .5036576
   education |   .0583645   .0109742     5.32   0.000     .0368555    .0798735
         age |   .0347211   .0042293     8.21   0.000     .0264318    .0430105
       _cons |  -2.467365   .1925635   -12.81   0.000    -2.844782   -2.089948
------------------------------------------------------------------------------

.
. * It's the same result as the selection equation ("d") in model 3
.
. ** Exploring Model 3
. heckman wage educ age, select (d = married children educ age) twostep nolog

Heckman selection model -- two-step estimates   Number of obs      =      2000
(regression model with sample selection)        Censored obs       =       657
                                                Uncensored obs     =      1343

                                                Wald chi2(4)       =    551.37
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage         |
   education |   .9825259   .0538821    18.23   0.000     .8769189    1.088133
         age |   .2118695   .0220511     9.61   0.000     .1686502    .2550888
       _cons |   .7340391   1.248331     0.59   0.557    -1.712645    3.180723
-------------+----------------------------------------------------------------
d            |
     married |   .4308575    .074208     5.81   0.000     .2854125    .5763025
    children |   .4473249   .0287417    15.56   0.000     .3909922    .5036576
   education |   .0583645   .0109742     5.32   0.000     .0368555    .0798735
         age |   .0347211   .0042293     8.21   0.000     .0264318    .0430105
       _cons |  -2.467365   .1925635   -12.81   0.000    -2.844782   -2.089948
-------------+----------------------------------------------------------------
mills        |
      lambda |   4.001615   .6065388     6.60   0.000     2.812821     5.19041
-------------+----------------------------------------------------------------
         rho |    0.67284
       sigma |  5.9473529
      lambda |  4.0016155   .6065388
------------------------------------------------------------------------------

.
. predict cndwage, ycond

. * ycond calculates the expected value of the dependent variable conditional on the
. * dependent variable being observed/selected; E(y | y was observed).
.
. predict expwage, yexpected

. * yexpected calculates the expected value of the dependent variable (y*), where that
. * value is taken to be 0 when it is expected to be unobserved;
. * y* = P(y observed) * E(y | y was observed).
.
. * Create an artifact variable (actually, a left-censored variable)
. gen wage0 = wage
(657 missing values generated)

. replace wage0 = 0 if wage >= .
(657 real changes made)
. * wage0 contains positive wages, or zeros where wage was missing
.
. summarize wage cndwage if wage < .

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        wage |      1343    23.69217     6.305374   5.88497   45.80979
     cndwage |      1343    23.69217     3.332615  16.22861   33.78897

. * The mean predicted wage (conditional on being observed) is the same
. * as the mean observed wage!
.
. summarize wage0 expwage

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       wage0 |      2000    15.90929     12.27081         0   45.80979
     expwage |      2000    15.90306     5.991076  2.520658   32.44454

. * The mean predicted wage (for the full sample) is nearly the same
. * as the mean of the wage0 artifact (15.903 vs. 15.909)!
.
. * ...and now you know it...
.
. log close
       log:  D:\Mis documentos\heckman_log.txt
  log type:  text
 closed on:  19 May 2005, 23:47:37
-------------------------------------------------------------------------------------------
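
The relationships noted in the log comments can be checked by hand. The lines below are a minimal sketch, not part of the recorded session: they redo the two-step procedure manually (probit, inverse Mills ratio, then OLS on the working women) and recompute the derived quantities from the numbers printed in the log. The variable names xb_sel and imr are hypothetical, and the functions normalden() and normal() assume Stata 9 or later (the release that produced this log called them normden() and norm()).

```stata
* Hypothetical follow-up session; xb_sel and imr are illustrative names
* that do not appear in the original log.
use http://www.stata-press.com/data/r8/womenwk, clear
generate d = (wage < .)          // 1 if wage is observed, 0 if missing

* Step 1: probit for the work decision, then the inverse Mills ratio
probit d married children educ age, nolog
predict double xb_sel, xb
generate double imr = normalden(xb_sel) / normal(xb_sel)

* Step 2: OLS on the working women, adding the inverse Mills ratio.
* The coefficient on imr plays the role of lambda. These OLS standard
* errors are NOT corrected for the estimated regressor, unlike heckman's.
regress wage educ age imr if d == 1

* Derived quantities, recomputed from the numbers printed in the log
display "two-step rho  = lambda/sigma : " 4.0016155 / 5.9473529
display "ML rho        = tanh(athrho) : " tanh(.8742086)
display "ML sigma      = exp(lnsigma) : " exp(1.792559)
display "ML lambda     = rho*sigma    : " .7035061 * 6.004797
display "mean(wage0)   = share*mean   : " (1343/2000) * 23.69217
```

If the sketch behaves as expected, the coefficients from the regress step should line up with the wage equation and lambda in models 1 and 3, and the display lines should agree with the rho, sigma, lambda and mean(wage0) values in the log up to rounding.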