-------------------------------------------------------------------------------
       log:  C:\Stata8\3nov.smcl
  log type:  smcl
 opened on:   3 Nov 2004, 11:09:28

. use "C:\Documents and Settings\computob1\Escritorio\Gpa2.dta", clear

. desc

Contains data from C:\Documents and Settings\computob1\Escritorio\Gpa2.dta
  obs:         4,137                          
 vars:            12                          21 Feb 2000 22:13
 size:       157,206 (85.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sat             int    %10.0g                 combined SAT score
tothrs          int    %10.0g                 total hours through fall semest
colgpa          float  %9.0g                  GPA after fall semester
athlete         byte   %8.0g                  =1 if athlete
verbmath        float  %9.0g                  verbal/math SAT score
hsize           double %10.0g                 size grad. class, 100s
hsrank          int    %10.0g                 rank in grad. class
hsperc          float  %9.0g                  100*(hsrank/hsize)
female          byte   %9.0g                  =1 if female
white           byte   %9.0g                  =1 if white
black           byte   %9.0g                  =1 if black
hsizesq         float  %9.0g                  hsize^2
-------------------------------------------------------------------------------
Sorted by:  

. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sat |      4137    1030.331    139.4014        470       1540
      tothrs |      4137    52.83225    35.32959          6        137
      colgpa |      4137    2.652686    .6586347          0          4
     athlete |      4137    .0468939    .2114371          0          1
    verbmath |      4137    .8805369    .1491229     .25974    1.66667
-------------+--------------------------------------------------------
       hsize |      4137    2.799727    1.736579        .03        9.4
      hsrank |      4137    52.83007    64.68358          1        634
      hsperc |      4137    19.23707    16.56873   .1666667         92
      female |      4137    .4496012    .4975136          0          1
       white |      4137    .9255499    .2625337          0          1
-------------+--------------------------------------------------------
       black |      4137    .0553541    .2286978          0          1
     hsizesq |      4137    10.85345    12.62305      .0009      88.36

* This is a dataset on the grade point average (GPA) of college students
* at the end of their first semester

** Robust regressions for COLGPA:

. reg colgpa sat verbmath tothrs hsperc hsize, robust

Regression with robust standard errors                 Number of obs =    4137
                                                       F(  5,  4131) =  348.46
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2872
                                                       Root MSE      =  .55642

------------------------------------------------------------------------------
             |               Robust
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0015065   .0000641    23.51   0.000     .0013809    .0016321
    verbmath |   .0100229   .0559992     0.18   0.858    -.0997658    .1198116
      tothrs |   .0018774   .0002476     7.58   0.000      .001392    .0023628
      hsperc |   -.013143   .0005381   -24.42   0.000     -.014198   -.0120879
       hsize |  -.0212912   .0050618    -4.21   0.000    -.0312152   -.0113673
       _cons |   1.304921   .0899305    14.51   0.000     1.128609    1.481233
------------------------------------------------------------------------------

** Everything is significant except verbmath (possibly because it is highly collinear
** with the SAT score).
** Note that high school size (HSIZE) has a negative coefficient...
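
* A quick way to look at the collinearity conjecture (sketch, not run in this
* session, so no output is shown) is the pairwise correlation between the two
* regressors:
    correlate sat verbmath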

** To test for a nonlinear relationship between high school size and COLGPA,
** we add high school size squared (HSIZESQ):

. reg colgpa sat verbmath tothrs hsperc hsize hsizesq, robust

Regression with robust standard errors                 Number of obs =    4137
                                                       F(  6,  4130) =  291.20
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2880
                                                       Root MSE      =  .55616

------------------------------------------------------------------------------
             |               Robust
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |    .001506    .000064    23.52   0.000     .0013805    .0016316
    verbmath |   .0115704   .0559984     0.21   0.836    -.0982166    .1213575
      tothrs |   .0018617   .0002478     7.51   0.000     .0013759    .0023476
      hsperc |  -.0134109   .0005508   -24.35   0.000    -.0144907   -.0123311
       hsize |   -.055768   .0169133    -3.30   0.001    -.0889272   -.0226088
     hsizesq |   .0049779   .0023381     2.13   0.033      .000394    .0095618
       _cons |   1.352526   .0924403    14.63   0.000     1.171294    1.533759
------------------------------------------------------------------------------

. * colgpa = alpha + b1*hsize + b2*hsize^2 + etc...

* The turning point in high school size is where the derivative equals zero:
. ** hsize* = -beta1 / (2*beta2)

. display -.055 / (2*.0049)
-5.6122449

* To operate on the parameters of the last regression held in memory:
. display _b[hsize] / (2*_b[hsizesq])
-5.6015491

** Careful with the sign: both lines above evaluate beta1/(2*beta2) and leave out the
** leading minus. The turning point is hsize* = -beta1/(2*beta2), about 5.6, that is,
** a graduating class of roughly 560 (hsize is in hundreds), which lies inside the
** sample range of 0.03 to 9.4. Because beta2 > 0 it is a minimum: COLGPA falls with
** class size up to about 560 and rises beyond that.
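
* Sketch, not run in this session (so no output is shown): with
* colgpa = a + b1*hsize + b2*hsize^2, the derivative b1 + 2*b2*hsize is zero at
* hsize* = -b1/(2*b2). Typed right after the quadratic regression, nlcom
* evaluates that expression and attaches a delta-method standard error and
* confidence interval:
    nlcom -_b[hsize] / (2*_b[hsizesq])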


** Pruebas de hipotesis:

* If the SAT score rises by 100 points, is a GPA increase of .25 points
* consistent with the data?

. * Null hypothesis: 100*BetaSAT = 0.25
. * Or equivalently: BetaSAT = .0025

* A t-statistic computed "by hand":
. display (_b[sat] - .0025) / _se[sat]
-15.520369

* The one-sided (upper-tail) p-value for a t-stat of 15.52 in absolute value is:
. display  ttail(4130, 15.52)    // 4130 = degrees of freedom = n-k-1
3.842e-53
* ...basically zero
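
* ttail() returns the upper-tail probability only. A sketch (not run here, so no
* output is shown) of the two-sided p-value for the same null, taking the
* degrees of freedom from e(df_r), which regress stores, instead of typing 4130
* by hand. It assumes the quadratic regression is still the last estimation:
    display 2*ttail(e(df_r), abs((_b[sat] - .0025) / _se[sat]))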

. help ttail

** Let's see how the p-value rises for a t-stat below 2:
. display  ttail(4130, 1.52)
.06429376

** And what happens to the p-value if we lower the degrees of freedom to 413?
. display  ttail(413, 1.52)
.06463807
* ...it rises very little, because with 413 degrees of freedom the t distribution
* is already almost identical to the normal; the difference matters mostly in
* small samples of a few dozen observations...

* And if the sample has only 41 obs?
. display  ttail(41, 1.52)
.06809299
* ... the p-value keeps rising...
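
* Sketch, not run in this session (so no output is shown): the same comparison
* in a loop, with the standard normal tail as the limiting case. normal() is the
* current name of the normal CDF; in Stata 8 the function is called norm().
    foreach df of numlist 41 413 4130 {
        display "df = `df'   upper-tail p = " ttail(`df', 1.52)
    }
    display "normal   upper-tail p = " (1 - normal(1.52))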

** Fortunately, Stata runs hypothesis tests in a very "intuitive" way
* Null hypothesis: 100*BetaSAT = 0.25 --> BetaSAT = .0025 ?

. test sat = .0025

 ( 1)  sat = .0025

       F(  1,  4130) =  240.88
            Prob > F =    0.0000
* The null hypothesis is rejected... betaSAT is not equal to .0025
* Note that test ran an F test instead of a t test. With a single restriction the
* two are equivalent: F = t^2, and (-15.52)^2 = 240.88.
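
* Sketch, not run in this session (so no output is shown): test leaves its
* results in r(), so typed right after the test command above, the stored F can
* be compared with the square of the hand-computed t-statistic:
    display "F = " r(F) "   p = " r(p)
    display ((_b[sat] - .0025) / _se[sat])^2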

* Other tests:
. test sat = .002

 ( 1)  sat = .002

       F(  1,  4130) =   59.49
            Prob > F =    0.0000
* and it is not equal to 0.002 either...

. test sat = .0014

 ( 1)  sat = .0014

       F(  1,  4130) =    2.74
            Prob > F =    0.0979
** The p-value for the null betaSAT = 0.0014 is 0.0979:
** At the 5% level, the null hypothesis cannot be rejected
** At the 10% level, it can be rejected...
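
* The three tests line up with the 95% confidence interval that the regression
* table reports for sat, [.0013805, .0016316]: .0014 lies inside it and is not
* rejected at 5%, while .002 and .0025 lie outside. Sketch (not run in this
* session, so no output is shown) of the interval bounds built from the stored
* results of the last regression:
    display _b[sat] - invttail(e(df_r), .025) * _se[sat]
    display _b[sat] + invttail(e(df_r), .025) * _se[sat]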


. reg colgpa sat verbmath tothrs hsperc hsize hsizesq, robust

Regression with robust standard errors                 Number of obs =    4137
                                                       F(  6,  4130) =  291.20
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2880
                                                       Root MSE      =  .55616

------------------------------------------------------------------------------
             |               Robust
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |    .001506    .000064    23.52   0.000     .0013805    .0016316
    verbmath |   .0115704   .0559984     0.21   0.836    -.0982166    .1213575
      tothrs |   .0018617   .0002478     7.51   0.000     .0013759    .0023476
      hsperc |  -.0134109   .0005508   -24.35   0.000    -.0144907   -.0123311
       hsize |   -.055768   .0169133    -3.30   0.001    -.0889272   -.0226088
     hsizesq |   .0049779   .0023381     2.13   0.033      .000394    .0095618
       _cons |   1.352526   .0924403    14.63   0.000     1.171294    1.533759
------------------------------------------------------------------------------

** What happens to the regression if we keep only the first observations of the
** sample? (_n<2000 keeps 1,999 of them; _n<=2000 would keep exactly 2000)

. reg colgpa sat verbmath tothrs hsperc hsize hsizesq if _n<2000, robust

Regression with robust standard errors                 Number of obs =    1999
                                                       F(  6,  1992) =  157.69
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3018
                                                       Root MSE      =  .53186

------------------------------------------------------------------------------
             |               Robust
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0014251   .0000889    16.03   0.000     .0012507    .0015995
    verbmath |   .0221356   .0761102     0.29   0.771    -.1271284    .1713996
      tothrs |   .0015865   .0003439     4.61   0.000     .0009121    .0022609
      hsperc |  -.0135714   .0007127   -19.04   0.000    -.0149691   -.0121736
       hsize |  -.0717045   .0236566    -3.03   0.002    -.1180988   -.0253102
     hsizesq |   .0097436   .0034614     2.81   0.005     .0029553     .016532
       _cons |   1.499631    .131667    11.39   0.000     1.241411     1.75785
------------------------------------------------------------------------------

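* The coefficients change a little and every standard error rises, as expected
* with a smaller sample. Most t-statistics fall in absolute value and their
* p-values rise; verbmath and hsizesq are exceptions because their coefficients grew.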
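
* _n<2000 selects observations by their position in the current sort order. A
* sketch (not run in this session, so no output is shown) of two alternatives:
* exactly the first 2000 observations, and a random half of the sample. The
* variable name u and the seed are arbitrary choices made here; uniform() is the
* Stata 8 random-number function (runiform() in later versions).
    reg colgpa sat verbmath tothrs hsperc hsize hsizesq in 1/2000, robust
    set seed 12345
    generate u = uniform()
    reg colgpa sat verbmath tothrs hsperc hsize hsizesq if u < .5, robust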

** What happens if we include only women in the regression?

. reg colgpa sat verbmath tothrs hsperc hsize hsizesq if female==1, robust

Regression with robust standard errors                 Number of obs =    1860
                                                       F(  6,  1853) =  129.90
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2797
                                                       Root MSE      =  .52754

------------------------------------------------------------------------------
             |               Robust
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0016502   .0000988    16.70   0.000     .0014564     .001844
    verbmath |  -.0749714   .0743683    -1.01   0.314    -.2208257     .070883
      tothrs |   .0018611    .000345     5.39   0.000     .0011845    .0025378
      hsperc |  -.0130226    .000891   -14.62   0.000      -.01477   -.0112752
       hsize |  -.0710805   .0238653    -2.98   0.003    -.1178863   -.0242748
     hsizesq |   .0078156    .003291     2.37   0.018     .0013611    .0142701
       _cons |   1.365823   .1342746    10.17   0.000     1.102478    1.629169
------------------------------------------------------------------------------
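
* Sketch, not run in this session (so no output is shown): to put women and men
* side by side, the same regression can be run once per value of female with the
* by prefix. Note that bysort re-sorts the data, which changes _n:
    bysort female: reg colgpa sat verbmath tothrs hsperc hsize hsizesq, robust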

** What happens if we include only white women?

. reg colgpa sat verbmath tothrs hsperc hsize hsizesq if female==1 & white==1, 
> robust

Regression with robust standard errors                 Number of obs =    1717
                                                       F(  6,  1710) =  118.66
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2851
                                                       Root MSE      =  .51796

------------------------------------------------------------------------------
             |               Robust
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0014918   .0001042    14.31   0.000     .0012874    .0016963
    verbmath |  -.0567374   .0768683    -0.74   0.461    -.2075033    .0940285
      tothrs |   .0017766   .0003568     4.98   0.000     .0010768    .0024765
      hsperc |  -.0144234   .0009484   -15.21   0.000    -.0162835   -.0125634
       hsize |  -.0786244   .0238945    -3.29   0.001    -.1254899    -.031759
     hsizesq |   .0079782   .0032937     2.42   0.016     .0015182    .0144382
       _cons |   1.581291   .1413039    11.19   0.000     1.304144    1.858438
------------------------------------------------------------------------------

** How do men and women do on the SAT?

. summ sat   //everyone

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sat |      4137    1030.331    139.4014        470       1540

. summ sat if female ==1  //women only

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sat |      1860    1006.624    128.3671        490       1490

. summ sat if female ==1 & white==1  //white women

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sat |      1717    1014.164    125.7484        670       1490
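
* Sketch, not run in this session (so no output is shown): the group means can
* also be tabulated in one step, and the female/male difference in mean SAT
* tested formally:
    tabulate female, summarize(sat)
    ttest sat, by(female)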


. ** Regression with log(high school size)
. generate lhsize = log(hsize)

. reg colgpa sat verbmath tothrs hsperc lhsize, robust

Regression with robust standard errors                 Number of obs =    4137
                                                       F(  5,  4131) =  351.23
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2895
                                                       Root MSE      =  .55551

------------------------------------------------------------------------------
             |               Robust
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0014977   .0000639    23.44   0.000     .0013724     .001623
    verbmath |   .0076776   .0558601     0.14   0.891    -.1018383    .1171936
      tothrs |   .0018579   .0002474     7.51   0.000     .0013728    .0023429
      hsperc |  -.0136141   .0005435   -25.05   0.000    -.0146796   -.0125487
      lhsize |  -.0531252   .0094037    -5.65   0.000    -.0715614   -.0346889
       _cons |   1.305874    .089433    14.60   0.000     1.130537    1.481211
------------------------------------------------------------------------------

** COLGPA decreases as log(hsize) increases...
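
* Reading a level-log coefficient: a 1% increase in hsize changes COLGPA by about
* _b[lhsize]/100 points, and doubling hsize changes it by _b[lhsize]*ln(2).
* Sketch (not run in this session, so no output is shown), to be typed right
* after the regression above:
    display _b[lhsize] / 100
    display _b[lhsize] * ln(2)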

** Looking for a more complete model of COLGPA
. reg colgpa sat verbmath tothrs hsperc lhsize  female white black athlete, rob
> ust

Regression with robust standard errors                 Number of obs =    4137
                                                       F(  9,  4127) =  211.18
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3135
                                                       Root MSE      =  .54632

------------------------------------------------------------------------------
             |               Robust
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0015173   .0000675    22.47   0.000      .001385    .0016497
    verbmath |  -.0629688   .0562322    -1.12   0.263    -.1732142    .0472766
      tothrs |   .0017564   .0002429     7.23   0.000     .0012803    .0022326
      hsperc |  -.0136979   .0005595   -24.48   0.000    -.0147948   -.0126009
      lhsize |  -.0563712   .0093364    -6.04   0.000    -.0746755   -.0380669
      female |   .1491066   .0179082     8.33   0.000     .1139968    .1842163
       white |  -.0321939    .063265    -0.51   0.611    -.1562274    .0918395
       black |   -.340243   .0734697    -4.63   0.000    -.4842832   -.1962029
     athlete |   .2063544   .0384971     5.36   0.000     .1308794    .2818294
       _cons |   1.329163   .1109884    11.98   0.000     1.111566     1.54676
------------------------------------------------------------------------------

** Holding the other regressors fixed, women and athletes have higher GPAs and
** black students have lower GPAs
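
* white is not individually significant while black is. A sketch (not run in
* this session, so no output is shown) of the joint test that both race
* coefficients are zero, typed right after the regression above:
    test white black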

** Testing whether white female athletes, as a group, do particularly well
** in school:
. 

** We generate an "interaction dummy" to identify that specific group:
. generate mm = female * white * athlete

. summ mm female white athlete

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          mm |      4137    .0070099    .0834413          0          1
      female |      4137    .4496012    .4975136          0          1
       white |      4137    .9255499    .2625337          0          1
     athlete |      4137    .0468939    .2114371          0          1

** Less than 1% of the sample meets all three conditions (perhaps because there
** are few athletes to begin with)
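
* The mean of mm times the sample size, .0070099 * 4137, is about 29
* observations. Sketch (not run in this session, so no output is shown) of how
* to count that cell directly and see how athletes split by sex among white
* students:
    count if mm == 1
    tabulate female athlete if white == 1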

** Testing whether the interaction dummy is significant:
. reg colgpa sat verbmath tothrs hsperc lhsize  female white black athlete mm, 
> robust

Regression with robust standard errors                 Number of obs =    4137
                                                       F( 10,  4126) =  190.35
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3135
                                                       Root MSE      =  .54638

------------------------------------------------------------------------------
             |               Robust
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0015181   .0000677    22.44   0.000     .0013854    .0016507
    verbmath |  -.0628068   .0562133    -1.12   0.264    -.1730152    .0474015
      tothrs |   .0017566   .0002429     7.23   0.000     .0012804    .0022329
      hsperc |  -.0137046   .0005598   -24.48   0.000     -.014802   -.0126071
      lhsize |  -.0565267   .0093679    -6.03   0.000    -.0748928   -.0381606
      female |   .1497823   .0181086     8.27   0.000     .1142796     .185285
       white |  -.0316517   .0633466    -0.50   0.617    -.1558453    .0925418
       black |  -.3406885   .0734676    -4.64   0.000    -.4847247   -.1966523
     athlete |   .2114412   .0416056     5.08   0.000     .1298718    .2930105
          mm |  -.0302998   .1076488    -0.28   0.778    -.2413494    .1807498
       _cons |   1.327696   .1112664    11.93   0.000     1.109554    1.545838
------------------------------------------------------------------------------

** The mm dummy is not significant, so we find no evidence that the group "white
** female athletes" has a GPA different from what the separate female, white and
** athlete effects already imply.
** Note that this does not prevent athletes and women, taken separately, from having
** higher GPAs...
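
* One caveat: mm is a three-way interaction, but the model above leaves out the
* two-way interactions, so mm may also pick up part of their effects. A sketch
* (not run in this session, so no output is shown) of the fuller specification;
* fw, fa and wa are variable names made up here for illustration:
    generate fw = female * white
    generate fa = female * athlete
    generate wa = white * athlete
    reg colgpa sat verbmath tothrs hsperc lhsize female white black athlete fw fa wa mm, robust
    test fw fa wa mm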

** Stata's XI command generates interactions automatically (but beware, the notation
** is sometimes rather cryptic)

. help xi

** A simple model with interaction variables:
. xi: reg colgpa sat i.female*i.white

i.female          _Ifemale_0-1        (naturally coded; _Ifemale_0 omitted)
i.white           _Iwhite_0-1         (naturally coded; _Iwhite_0 omitted)
i.fem~e*i.white   _IfemXwhi_#_#       (coded as above)

** This means xi took nonwhite men as the base (control) group

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  4,  4132) =  258.52
       Model |  359.140516     4  89.7851289           Prob > F      =  0.0000
    Residual |  1435.05516  4132  .347302797           R-squared     =  0.2002
-------------+------------------------------           Adj R-squared =  0.1994
       Total |  1794.19567  4136  .433799728           Root MSE      =  .58932

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0020191   .0000682    29.62   0.000     .0018855    .0021527
  _Ifemale_1 |    .035381   .0673407     0.53   0.599     -.096643     .167405
   _Iwhite_1 |   .0094451   .0484129     0.20   0.845    -.0854702    .1043605
_IfemXwhi_~1 |   .2098029    .070029     3.00   0.003     .0725084    .3470974
       _cons |   .4606125   .0783756     5.88   0.000     .3069542    .6142708
------------------------------------------------------------------------------

* In this case the "female and white" interaction DOES have a significant effect on
* GPA, although being female or being white were not significant on their own.
* It is often useful to compare the "main effect" with the "interaction effect" of a
* group of categorical variables.
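
* Sketch, not run in this session (so no output is shown): the GPA gap between
* white women and white men implied by this model is the female main effect plus
* the interaction. The full name of the interaction variable is assumed to be
* _IfemXwhi_1_1, following the _IfemXwhi_#_# pattern that xi printed above
* (describe _I* lists the actual names).
    lincom _Ifemale_1 + _IfemXwhi_1_1
* In Stata 11 and later, the same model can be written with factor variables:
    regress colgpa sat i.female##i.white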


. log close
       log:  C:\Stata8\3nov.smcl
  log type:  smcl
 closed on:   3 Nov 2004, 11:57:23
-------------------------------------------------------------------------------