*************************************
* Quantitative Methods II
* Lab session, 15-Nov-2005
*************************************

/* In this session we worked through some examples of MAXIMUM LIKELIHOOD ESTIMATION
   using commands such as:
   LOGIT / PROBIT / DPROBIT
   MLOGIT / OPROBIT
*/

. use "C:\Documents and Settings\computob1\Escritorio\MROZ.DTA", clear

** This dataset is used to explain women's participation in the labor market

. desc

Contains data from C:\Documents and Settings\computob1\Escritorio\MROZ.DTA
  obs:           753
 vars:            22                          2 Mar 1999 11:30
 size:        39,909 (98.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
inlf            byte   %9.0g                  =1 if in lab frce, 1975
hours           int    %9.0g                  hours worked, 1975
kidslt6         byte   %9.0g                  # kids < 6 years
kidsge6         byte   %9.0g                  # kids 6-18
age             byte   %9.0g                  woman's age in yrs
educ            byte   %9.0g                  years of schooling
wage            float  %9.0g                  est. wage from earn, hrs
repwage         float  %9.0g                  rep. wage at interview in 1976
hushrs          int    %9.0g                  hours worked by husband, 1975
husage          byte   %9.0g                  husband's age
huseduc         byte   %9.0g                  husband's years of schooling
huswage         float  %9.0g                  husband's hourly wage, 1975
faminc          float  %9.0g                  family income, 1975
mtr             float  %9.0g                  fed. marg. tax rte facing woman
motheduc        byte   %9.0g                  mother's years of schooling
fatheduc        byte   %9.0g                  father's years of schooling
unem            float  %9.0g                  unem. rate in county of resid.
city            byte   %9.0g                  =1 if live in SMSA
exper           byte   %9.0g                  actual labor mkt exper
nwifeinc        float  %9.0g                  (faminc - wage*hours)/1000
lwage           float  %9.0g                  log(wage)
expersq         int    %9.0g                  exper^2
-------------------------------------------------------------------------------
Sorted by:

. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        inlf |       753    .5683931    .4956295          0          1
       hours |       753    740.5764    871.3142          0       4950
     kidslt6 |       753    .2377158     .523959          0          3
     kidsge6 |       753    1.353254    1.319874          0          8
         age |       753    42.53785    8.072574         30         60
-------------+--------------------------------------------------------
        educ |       753    12.28685    2.280246          5         17
        wage |       428    4.177682    3.310282      .1282         25
     repwage |       753    1.849734    2.419887          0       9.98
      hushrs |       753    2267.271    595.5666        175       5010
      husage |       753    45.12085    8.058793         30         60
-------------+--------------------------------------------------------
     huseduc |       753    12.49137    3.020804          3         17
     huswage |       753    7.482179    4.230559      .4121     40.509
      faminc |       753    23080.59     12190.2       1500      96000
         mtr |       753    .6788632    .0834955      .4415      .9415
    motheduc |       753    9.250996    3.367468          0         17
-------------+--------------------------------------------------------
    fatheduc |       753    8.808765     3.57229          0         17
        unem |       753    8.623506    3.114934          3         14
        city |       753    .6427623    .4795042          0          1
       exper |       753    10.63081     8.06913          0         45
    nwifeinc |       753    20.12896     11.6348  -.0290575         96
-------------+--------------------------------------------------------
       lwage |       428    1.190173    .7231978  -2.054164   3.218876
     expersq |       753    178.0385    249.6308          0       2025

** Correlation matrix

.
. corr kids* educ hus* *educ
(obs=753)

             |  kidslt6  kidsge6     educ   hushrs   husage  huseduc  huswage
-------------+---------------------------------------------------------------
     kidslt6 |   1.0000
     kidsge6 |   0.0842   1.0000
        educ |   0.1087  -0.0589   1.0000
      hushrs |   0.0243   0.0994   0.0789   1.0000
      husage |  -0.4430  -0.3502  -0.1335  -0.0954   1.0000
     huseduc |   0.1336   0.0094   0.6120   0.1078  -0.1953   1.0000
     huswage |   0.0324  -0.0297   0.2849  -0.2360   0.0197   0.3947   1.0000
    motheduc |   0.1078   0.0324   0.4353   0.0534  -0.2275   0.3245   0.1267
    fatheduc |   0.0961  -0.0268   0.4425   0.0503  -0.1350   0.3667   0.1932

             |     educ  huseduc motheduc fatheduc
-------------+------------------------------------
        educ |   1.0000
     huseduc |   0.6120   1.0000
    motheduc |   0.4353   0.3245   1.0000
    fatheduc |   0.4425   0.3667   0.5731   1.0000

** "Linear probability model" for a binary variable
**************************************************************
** An OLS model (misspecified for a binary outcome) for INLF (in labor force):

. reg inlf kids* hus* *educ

      Source |       SS       df       MS              Number of obs =     753
-------------+------------------------------           F(  9,   743) =   14.59
       Model |  27.7399587     9  3.08221763           Prob > F      =  0.0000
    Residual |  156.987797   743  .211289094           R-squared     =  0.1502
-------------+------------------------------           Adj R-squared =  0.1399
       Total |  184.727756   752  .245648611           Root MSE      =  .45966

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     kidslt6 |  -.3000199   .0358927    -8.36   0.000    -.3704831   -.2295567
     kidsge6 |  -.0084823   .0137492    -0.62   0.537    -.0354741    .0185096
        educ |   .0600492   .0099935     6.01   0.000     .0404302    .0796681
      hushrs |  -.0001039   .0000299    -3.48   0.001    -.0001626   -.0000452
      husage |  -.0124777   .0025607    -4.87   0.000    -.0175047   -.0074507
     huseduc |  -.0077128   .0075752    -1.02   0.309    -.0225842    .0071586
     huswage |  -.0168186   .0045686    -3.68   0.000    -.0257875   -.0078498
    motheduc |   .0011592   .0063672     0.18   0.856    -.0113407     .013659
    fatheduc |  -.0021217   .0059865    -0.35   0.723    -.0138741    .0096307
       _cons |   .9421025   .1805773     5.22   0.000        .5876    1.296605
------------------------------------------------------------------------------

*****************************
** A PROBIT MODEL for INLF

. probit inlf kids* hus* *educ, nolog   // The nolog option suppresses the iteration log

Probit estimates                                  Number of obs   =        753
                                                  LR chi2(9)      =     123.93
                                                  Prob > chi2     =     0.0000
Log likelihood = -452.90987                       Pseudo R2       =     0.1203

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     kidslt6 |  -.9043187   .1152472    -7.85   0.000    -1.130199   -.6784383
     kidsge6 |  -.0272939   .0395706    -0.69   0.490     -.104851    .0502631
        educ |   .1752229   .0298124     5.88   0.000     .1167916    .2336542
      hushrs |  -.0003071   .0000869    -3.54   0.000    -.0004773   -.0001368
      husage |  -.0371826   .0076722    -4.85   0.000    -.0522197   -.0221454
     huseduc |  -.0226122   .0218456    -1.04   0.301    -.0654287    .0202044
     huswage |  -.0516179   .0140862    -3.66   0.000    -.0792264   -.0240094
    motheduc |   .0027484   .0183272     0.15   0.881    -.0331724    .0386691
    fatheduc |  -.0045866   .0173661    -0.26   0.792    -.0386236    .0294503
       _cons |   1.344228   .5340123     2.52   0.012     .2975836    2.390873
------------------------------------------------------------------------------

** Obtaining the predicted probabilities:

. predict yprobit
(option p assumed; Pr(inlf))

.
. hist yprobit
(bin=27, start=.00139169, width=.0356881)

*************************
** LOGIT MODEL for INLF

. logit inlf kids* hus* *educ, nolog

Logit estimates                                   Number of obs   =        753
                                                  LR chi2(9)      =     124.09
                                                  Prob > chi2     =     0.0000
Log likelihood = -452.82882                       Pseudo R2       =     0.1205

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     kidslt6 |  -1.496063   .1974564    -7.58   0.000     -1.88307   -1.109055
     kidsge6 |  -.0462703   .0660042    -0.70   0.483    -.1756362    .0830956
        educ |   .2890505   .0504301     5.73   0.000     .1902093    .3878918
      hushrs |   -.000515   .0001467    -3.51   0.000    -.0008025   -.0002274
      husage |  -.0620702    .012936    -4.80   0.000    -.0874244    -.036716
     huseduc |  -.0357898   .0362697    -0.99   0.324     -.106877    .0352975
     huswage |  -.0851693   .0234725    -3.63   0.000    -.1311745   -.0391641
    motheduc |   .0062894   .0303268     0.21   0.836    -.0531499    .0657288
    fatheduc |  -.0089954   .0288539    -0.31   0.755     -.065548    .0475573
       _cons |   2.246155   .8927149     2.52   0.012     .4964656    3.995844
------------------------------------------------------------------------------

** IMPORTANT NOTE
** The marginal effect implied by LOGIT/PROBIT coefficients is not constant along
** the cumulative probability distribution, which is S-shaped.
** The marginal effect of any variable therefore depends on whether we are
** in the lower, middle, or upper part of the probability curve.
** We can, however, evaluate the marginal effect of the independent variables at, say,
** the sample means:

. dprobit inlf kids* hus* *educ, nolog

Probit estimates                                  Number of obs   =        753
                                                  LR chi2(9)      =     123.93
                                                  Prob > chi2     =     0.0000
Log likelihood = -452.90987                       Pseudo R2       =     0.1203

------------------------------------------------------------------------------
    inlf |      dF/dx   Std. Err.      z    P>|z|     x-bar  [    95% C.I.   ]
---------+--------------------------------------------------------------------
 kidslt6 |  -.3544754   .0452918    -7.85   0.000   .237716  -.443246 -.265705
 kidsge6 |  -.0106987   .0155093    -0.69   0.490   1.35325  -.041096  .019699
    educ |    .068684   .0116761     5.88   0.000   12.2869   .045799  .091569
  hushrs |  -.0001204   .0000341    -3.54   0.000   2267.27  -.000187 -.000054
  husage |  -.0145749   .0030039    -4.85   0.000   45.1208  -.020462 -.008687
 huseduc |  -.0088635   .0085614    -1.04   0.301   12.4914  -.025644  .007916
 huswage |  -.0202332   .0055256    -3.66   0.000   7.48218  -.031063 -.009403
motheduc |   .0010773   .0071839     0.15   0.881     9.251  -.013003  .015157
fatheduc |  -.0017979   .0068072    -0.26   0.792   8.80876   -.01514  .011544
---------+--------------------------------------------------------------------
  obs. P |   .5683931
 pred. P |   .5744202  (at x-bar)
------------------------------------------------------------------------------
    z and P>|z| are the test of the underlying coefficient being 0

** DPROBIT fits a probit model and reports the marginal effect of each variable
** on prob(Y=1), evaluated at the mean of each variable,
** but you can set the values of the scenario you want to simulate.
** Another option is the MFX command...

. ******************************************************
. ** USING LOGIT, MLOGIT and OPROBIT with ELECTION SURVEYS
. ******************************************************
.
. use "D:\Mis documentos\Docencia\Clases\AnEmpirico\NES92_clean.dta", clear
.
. ** This is a post-election survey of the 1992 US presidential election,
. ** contested by Bush senior (George H. W. Bush), Clinton and Perot
.
. ** The dependent variables of interest are:
. **   Who the respondent voted for (Bush, Clinton, Perot)
. **   The level of approval of Bush
.
. ** Descriptive statistics
.
. desc

Contains data from D:\Mis documentos\Docencia\Clases\AnEmpirico\NES92_clean.dta
  obs:           750
 vars:            22                          24 Nov 2004 00:59
 size:        69,000 (93.4% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
vote            float  %9.0g                  Vote 92: Bush, Clinton, Perot
bushapp         float  %9.0g                  Bush Approval, 1992
bplace          float  %9.0g                  Bush Lib/Con
cplace          float  %9.0g                  Clinton Lib/Con
pplace          float  %9.0g                  Perot Lib/Con
distbush        float  %9.0g                  R-Bush Lib/Con Dist
distperot       float  %9.0g                  R-Perot Lib/Con Dist
oppmilitary     float  %9.0g                  Opposition to Use of Military F
warok           float  %9.0g                  Gulf War Worth Cost
education       float  %9.0g                  Years of School
govemployee     float  %9.0g                  Government Employee
union           float  %9.0g                  Union Household
nonwhite        float  %9.0g                  Nonwhite
place           float  %9.0g                  R-Lib/Con
distclinton     float  %9.0g                  R-Clinton Lib/Con Dist
badecon         float  %9.0g                  Economy WORSE?
partyID         float  %9.0g                  PartyID
income          float  %9.0g                  FamilyIncome, $1000
r1              float  %9.0g                  Pr(v2==0)
r2              float  %9.0g                  Pr(v2==1)
r3              float  %9.0g                  Pr(v2==2)
r4              float  %9.0g                  Pr(v2==3)
-------------------------------------------------------------------------------
Sorted by:

. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        vote |       601    .8169717    .7186409          0          2
     bushapp |       727    1.250344    1.078488          0          3
      bplace |       678    5.150442    1.404075          1          7
      cplace |       666    3.096096    1.340996          1          7
      pplace |       587    4.381601    1.753846          1          7
-------------+--------------------------------------------------------
    distbush |       575    2.106087    1.560917          0          6
   distperot |       508    2.062992    1.414203          0          5
 oppmilitary |       742    2.963612    .8279998          1          5
       warok |       713    .5834502     .493333          0          1
   education |       745     13.5651    2.574976          2         17
-------------+--------------------------------------------------------
 govemployee |       750        .136    .3430173          0          1
       union |       750    .1653333     .371729          0          1
    nonwhite |       750    .1346667     .341595          0          1
       place |       599    4.245409    1.730566          1          7
 distclinton |       560    2.067857    1.528162          0          6
-------------+--------------------------------------------------------
     badecon |       740    4.014865    .9061782          1          5
     partyID |       742   -.1091644    1.999378         -3          3
      income |       695    41.92734    30.33958        1.5        140
          r1 |       745    .2722288    .0458215   .1040918   .3370141
          r2 |       745    .3117232    .0111072    .228696   .3205123
-------------+--------------------------------------------------------
          r3 |       745    .3454362    .0352465   .2958629   .4669528
          r4 |       745    .0706118    .0211353   .0466106   .2002594

.
. ** Correlation matrix
.
. correlate bushapp *place dist* badecon warok
(obs=462)

             |  bushapp   bplace   cplace   pplace    place distbush distpe~t distcl~n
-------------+------------------------------------------------------------------------
     bushapp |   1.0000
      bplace |  -0.0741   1.0000
      cplace |  -0.2042  -0.2634   1.0000
      pplace |  -0.0880   0.1096  -0.0251   1.0000
       place |   0.4437  -0.2006  -0.1036  -0.0342   1.0000
    distbush |  -0.5019   0.1227   0.1036   0.0379  -0.6696   1.0000
   distperot |  -0.0453   0.0025  -0.0397   0.0016  -0.2360   0.3104   1.0000
 distclinton |   0.4537  -0.0729  -0.3458  -0.0773   0.6029  -0.4676  -0.0986   1.0000
     badecon |  -0.4023   0.0622   0.1799   0.0426  -0.2571   0.2573  -0.0243  -0.2592
       warok |   0.3921   0.0246  -0.2666  -0.1293   0.3073  -0.3513  -0.1158   0.2910

             |  badecon    warok
-------------+------------------
     badecon |   1.0000
       warok |  -0.2654   1.0000

. tab vote

   Vote 92: |
      Bush, |
   Clinton, |
      Perot |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        220       36.61       36.61
          1 |        271       45.09       81.70
          2 |        110       18.30      100.00
------------+-----------------------------------
      Total |        601      100.00

.
. ** Regressions to predict the vote for each candidate separately
.
. ** The variable VOTE is categorical: 0 = voted for Bush, 1 = Clinton, 2 = Perot
. ** ...so we need to generate a dummy for the Clinton vote and one for the Bush vote,
. ** respectively
.
. gen votclinton = vote==1   // generates a dummy for a Clinton vote

. replace votclinton=. if vote==.   // sets to missing those who did not answer the original question
(149 real changes made, 149 to missing)

.
. gen votbush = vote==0

. replace votbush = . if vote==.
(149 real changes made, 149 to missing)

.
. ** Now we can run logits (or probits) for the binary variable "voted for Clinton"
.
. logit votclinton place dist*, nolog

Logit estimates                                   Number of obs   =        420
                                                  LR chi2(4)      =     214.45
                                                  Prob > chi2     =     0.0000
Log likelihood = -177.34462                       Pseudo R2       =     0.3768

------------------------------------------------------------------------------
  votclinton |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       place |  -.1440408   .1046964    -1.38   0.169     -.349242    .0611604
    distbush |   .5701659   .1184886     4.81   0.000     .3379325    .8023992
   distperot |   .0876924   .1012567     0.87   0.386    -.1107671     .286152
 distclinton |  -.7129207   .1199763    -5.94   0.000      -.94807   -.4777714
       _cons |   .1768722   .6231743     0.28   0.777    -1.044527    1.398272
------------------------------------------------------------------------------

.
. ** ...and for "voted for Bush"
. logit votbush place dist*, nolog

Logit estimates                                   Number of obs   =        420
                                                  LR chi2(4)      =     216.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -171.17222                       Pseudo R2       =     0.3878

------------------------------------------------------------------------------
     votbush |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       place |  -.0446103   .1252357    -0.36   0.722    -.2900677    .2008471
    distbush |  -1.045509   .1487069    -7.03   0.000    -1.336969   -.7540491
   distperot |   .3907418   .1143603     3.42   0.001     .1665997    .6148839
 distclinton |   .6604828   .1221056     5.41   0.000     .4211603    .8998053
       _cons |  -.7793094     .68421    -1.14   0.255    -2.120336    .5617174
------------------------------------------------------------------------------

.
. /*
> MULTINOMIAL LOGIT
> We can also run regressions that explain which factors affect the probability
> of voting for one candidate compared with the probability of voting for another.
> Note that the vote categories are NOT ordinal: they do not represent intensity or magnitude,
> only mutually exclusive, unranked options.
>
> Example: "If you disapprove of the war, are you more or less likely to vote for Bush or for Perot,
> compared with the group that votes for Clinton?"
>
> This is done with a MULTINOMIAL LOGIT model (mlogit), which simultaneously estimates one
> set of parameters for each value of the categorical variable VOTE other than the one
> chosen as the base or "comparison group"
> */
.
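. /*
> Added note (a sketch, not part of the original session): assuming vote==0 is taken
> as the base category, the standard multinomial logit probabilities can be written as
> follows, where xb_1 and xb_2 are notation introduced here for the linear indexes of
> the Clinton and Perot equations:
>
>     Pr(vote = 0 | x) = 1 / (1 + exp(xb_1) + exp(xb_2))
>     Pr(vote = j | x) = exp(xb_j) / (1 + exp(xb_1) + exp(xb_2)),    j = 1, 2
>
> so that log[ Pr(vote = j | x) / Pr(vote = 0 | x) ] = xb_j, and each coefficient is
> the change in that log-odds, relative to the base category, per unit change in x.
> */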
. *** Multinomial logit model using the Bush vote as the base category:
.
. mlogit vote *place distclinton badecon partyID educ income, basecategory(0) nolog

Multinomial logistic regression                   Number of obs   =        401
                                                  LR chi2(18)     =     328.36
                                                  Prob > chi2     =     0.0000
Log likelihood = -258.25423                       Pseudo R2       =     0.3887

------------------------------------------------------------------------------
        vote |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
      bplace |   .0607785   .1677342     0.36   0.717    -.2679745    .3895316
      cplace |   .3719276   .1771238     2.10   0.036     .0247714    .7190838
      pplace |   .1405128   .1116947     1.26   0.208    -.0784047    .3594304
       place |  -.2572819   .1419219    -1.81   0.070    -.5354437      .02088
 distclinton |  -.7401637    .180056    -4.11   0.000    -1.093067   -.3872605
     badecon |   .3580413   .2175745     1.65   0.100     -.068397    .7844796
     partyID |  -1.011497   .1248446    -8.10   0.000    -1.256187   -.7668057
   education |  -.1107004   .0960011    -1.15   0.249    -.2988592    .0774583
      income |    .006356   .0061135     1.04   0.298    -.0056263    .0183384
       _cons |   .4299114   2.216729     0.19   0.846    -3.914799    4.774621
-------------+----------------------------------------------------------------
2            |
      bplace |  -.0535627   .1384878    -0.39   0.699    -.3249938    .2178684
      cplace |   .1839556   .1535936     1.20   0.231    -.1170823    .4849935
      pplace |   .0429136   .0950144     0.45   0.652    -.1433111    .2291383
       place |  -.0933798   .1300544    -0.72   0.473    -.3482817    .1615221
 distclinton |  -.2345565    .152455    -1.54   0.124    -.5333628    .0642498
     badecon |   .1618158   .1780369     0.91   0.363    -.1871302    .5107618
     partyID |  -.3850738   .1036652    -3.71   0.000    -.5882537   -.1818938
   education |  -.1712072   .0818954    -2.09   0.037    -.3317192   -.0106951
      income |   .0050459   .0051018     0.99   0.323    -.0049534    .0150453
       _cons |   2.074928    1.86533     1.11   0.266    -1.581051    5.730906
------------------------------------------------------------------------------
(Outcome vote==0 is the comparison group)

.
. ** The basecategory() option tells Stata which category is the
. ** "comparison group" against which the coefficients are interpreted
.
. ** Note that we now have two sets of coefficients, one explaining prob(vote Clinton)
. ** and the other prob(vote Perot), both relative to prob(vote Bush)
.
. ** We can now run hypothesis tests across both sets of coefficients:
.
. test [1]party= [2]party   // Tests whether partyID has the same effect in both equations

 ( 1)  [1]partyID - [2]partyID = 0

           chi2(  1) =   32.67
         Prob > chi2 =    0.0000

. test [1]distclinton = [2]distclinton

 ( 1)  [1]distclinton - [2]distclinton = 0

           chi2(  1) =   10.24
         Prob > chi2 =    0.0014

. ** In both cases we reject the null hypothesis that the coefficients are equal
. ** across the two equations
.
. ** The mlogit coefficients allow interesting comparisons, such as showing how the
. ** probability of supporting one candidate rather than another changes as another variable changes.
.
.
. /*
> ORDERED PROBIT - "PRESIDENTIAL APPROVAL" OF BUSH SENIOR
> Now suppose we want to measure the determinants of approval of Bush senior.
> This is an ORDINAL variable that runs from very low (0) to high approval (3).
> To model it we will use the oprobit command (ordered probit),
> although you could use ologit (ordered logit) and obtain similar results
> */
.
.
. summ bushapp, detail

                     Bush Approval, 1992
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                 727
25%            0              0       Sum of Wgt.         727

50%            1                      Mean           1.250344
                        Largest       Std. Dev.      1.078488
75%            2              3
90%            3              3       Variance       1.163137
95%            3              3       Skewness       .2092042
99%            3              3       Kurtosis       1.719896

.
. hist bushapp
(bin=26, start=0, width=.11538462)

.
. oprobit bushapp place distclinton badecon party educ income nonwhite, robust nolog

Ordered probit estimates                          Number of obs   =        517
                                                  Wald chi2(7)    =     238.28
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -554.77769                 Pseudo R2       =     0.2009

------------------------------------------------------------------------------
             |               Robust
     bushapp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       place |   .0971676   .0382998     2.54   0.011     .0221015    .1722338
 distclinton |   .1048759   .0455467     2.30   0.021     .0156061    .1941457
     badecon |  -.3540678   .0610805    -5.80   0.000    -.4737834   -.2343521
     partyID |   .2643179   .0343609     7.69   0.000     .1969718    .3316641
   education |  -.0549433   .0256694    -2.14   0.032    -.1052544   -.0046322
      income |   .0020283    .001666     1.22   0.223     -.001237    .0052935
    nonwhite |  -.2210481   .1712891    -1.29   0.197    -.5567687    .1146724
-------------+----------------------------------------------------------------
       _cut1 |  -2.123428   .4939492          (Ancillary parameters)
       _cut2 |   -1.29324   .4871919
       _cut3 |  -.0153406   .4897463
------------------------------------------------------------------------------

.
. ** With these results we can run a number of interesting simulations.
.
. ** For example, suppose you want to know how the probability of approving of Bush
. ** at each level ("0=hate, 1=somewhat bad, 2=somewhat good, 3=love Bush senior")
. ** changes as the respondents' schooling increases, holding all
. ** other factors fixed at their means.
.
. scatter r1 r2 r3 r4 educ   // plots the predicted approval probabilities against education
.
. ** We call this "predicted probabilities analysis", and the simplest way to do it
. ** is with the CLARIFY routines, which we will see next
.
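* Added note (a sketch, not part of the original session): without CLARIFY, one simple
* alternative is to ask -predict- for one probability per outcome right after the
* oprobit above. The names p0-p3 are hypothetical, and these probabilities are
* evaluated at each respondent's own covariate values, NOT with the other covariates
* held at their means, so they only approximate the analysis described above.
predict p0 p1 p2 p3 if e(sample)     // Pr(bushapp==0), ..., Pr(bushapp==3)
scatter p0 p1 p2 p3 education        // predicted approval probabilities against schooling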