A linear regression model about MMF admission

Eric.Z · 3/22/12

For the moment, I have a linear regression model with GPA as the only independent variable. (I will try to add more variables when I have more data available.) The sample is those who applied to the Boston MMF, were either accepted or rejected, and reported a GPA between 0 and 4. (Waitlist is treated as rejection.) I am trying to work out logit and probit models in Excel, which I believe would be more accurate for this kind of model, where the dependent variable is a dummy variable. I will post the results as soon as possible.

Admission = -0.4822 + .2838 * GPA (Admission is 1 if accepted and zero if rejected.)
P-value of the slope is 0.04502
R-square = 0.0303

The p-value does not surprise me at all. I believe that with a larger sample and more independent variables in the model, the R-square would be much higher. From personal experience, the admission result should be predictable if GPA, work experience, major, and a few other pieces of information are available.

This is the initial work. I will post more results as soon as I get more data. Please HELP if you have any ideas about the following questions. It will be appreciated.

1. I managed to import the data from the tracker into Excel. However, Excel fails to recognize the tracker as a table; everything on the webpage is imported into one column. I now have thousands of entries in that single column. Does anyone know a way to import the tracker data as a table in Excel so that I can include more data in the regression analysis?

2. Any idea how to convert a GPA from a non-US system to the 4.0 scale?

3. I am thinking of adding a dummy variable for being an international applicant. Would you expect that to be a statistically significant variable?

aaronhotchner · 3/22/12

1. I'd love access to this data once you figure it out.
2. A linear regression makes absolutely no sense for a success/failure model. Your R-square is a good indicator of this.
3. I can *try* writing a program in Stata that parses the information to build the table from the column.
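
For what it's worth, the column-to-table step could be sketched in Python rather than Stata. This is only a minimal sketch under a big assumption: that every tracker entry occupies the same number of consecutive cells in the imported column. The function name `column_to_rows`, the field names, and the sample cells are all made up for illustration; the real tracker layout may well differ.

```python
import csv
import io

def column_to_rows(cells, fields_per_entry):
    """Group a flat list of cells into rows of fields_per_entry cells each."""
    cells = [c.strip() for c in cells if c.strip()]  # drop blank cells
    if len(cells) % fields_per_entry != 0:
        raise ValueError("cell count is not a multiple of fields_per_entry")
    return [cells[i:i + fields_per_entry]
            for i in range(0, len(cells), fields_per_entry)]

# Made-up sample: program, result, GPA repeated down one column.
column = ["Boston MMF", "Accepted", "3.7",
          "Boston MMF", "Rejected", "3.2"]
rows = column_to_rows(column, fields_per_entry=3)

# Write the rows as CSV text that Excel (or Stata/R) can open as a table.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["program", "result", "gpa"])  # hypothetical header
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

If entries have a varying number of cells, this grouping will not work and the parser would need some marker that identifies the start of each entry.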

DMX · 3/22/12

1. i've been thinking about these kinds of models for a while (not just for MFE but for college/law school etc). there is definitely existing literature on this though.
2. linear regression doesn't make sense. use a logistic regression for these types of models.
3. ditch Excel. use R (free, and it can do this in <5 seconds).
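
To make the logistic regression suggestion concrete, here is a rough sketch in plain Python (not R) of fitting a one-variable logit by Newton-Raphson. The GPA and admission values are invented for illustration and are not the tracker data; `fit_logit` is a hypothetical helper, not a library function. In practice a statistics package would do this for you and also report standard errors.

```python
import math

def fit_logit(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score (gradient) and information matrix of the log likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented sample, deliberately not perfectly separated.
gpa = [2.8, 3.0, 3.1, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9]
admitted = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

b0, b1 = fit_logit(gpa, admitted)
print("intercept:", b0, "slope:", b1)
```

Unlike the linear model, the fitted probabilities here always stay strictly between 0 and 1.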

ValueSeeker · 3/22/12

aaronhotchner said:
A linear regression makes absolutely no sense for a success/failure model. Your R-square is a good indicator of this.

Agree, I am really surprised by how many people on this site blindly apply econometric models. In this case, that means using a continuous independent variable with a dependent variable that is a 0/1 indicator function.

Eric.Z · 3/22/12

aaronhotchner said:
1. I'd love access to this data once you figure it out.
2. A linear regression makes absolutely no sense for a success/failure model. Your R-square is a good indicator of this.
3. I can *try* writing a program in Stata that parses the information to build the table from the column.

Yeah, definitely let me know if you get the data. I would be more than happy to discuss what can be found from the data.

Eric.Z · 3/22/12

ValueSeeker said:
Agree, I am really surprised by how many people on this site blindly apply econometric models. In this case, that means using a continuous independent variable with a dependent variable that is a 0/1 indicator function.

I would certainly agree that probit and logit would fit better here. However, I believe a linear model is also informative as long as its limitations are recognized.
For this problem, if you are concerned about predicted values being negative or larger than one, it helps to treat negative predictions as 0 and predictions larger than one as 1. The OLS estimators should be unbiased, if I am not mistaken. However, heteroscedasticity is definitely a problem we should be aware of, since the variance of a 0/1 outcome, p(1 - p), changes with the predicted probability p.
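
As a quick illustration of the clipping idea, here is a minimal Python sketch that uses the fitted line from the first post (Admission = -0.4822 + 0.2838 * GPA). The function name `predict_admission` is just for illustration.

```python
def predict_admission(gpa, intercept=-0.4822, slope=0.2838):
    """Linear probability prediction, clipped to the interval [0, 1]."""
    raw = intercept + slope * gpa
    return min(1.0, max(0.0, raw))

# A low GPA gives a negative raw prediction, which is clipped to 0.
for gpa in (1.0, 3.0, 4.0):
    print(gpa, round(predict_admission(gpa), 4))
```

With these coefficients and a GPA capped at 4, the raw prediction never exceeds 1, so only the lower clip matters here.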

ganigorkem · 3/22/12

May I ask, what is your sample size?

Eric.Z · 3/22/12

ganigorkem said:
May I ask, what is your sample size?

It is from the tracker for BU's MMF program. I think it is between 20 and 40. I doubt that it can be more than 40.

IlyaKEightSix · 3/22/12

You can take multiple years.

Andy Nguyen · 3/22/12

You can try to use some tool to read the tracker RSS feed into a CSV file. Python can do this. When I can get around to it, I will post the tracker data dump for you with the username field stripped out.
In any case, since we don't have GPA and GRE data for all entries, I think your sample size for Boston is too small.
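
A rough sketch of the RSS-to-CSV idea using only the Python standard library. The feed snippet below is made up for illustration; the real tracker feed will have its own fields and URL, which are not shown here, and `rss_to_csv` is a hypothetical helper name.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical feed snippet; the real tracker feed's fields may differ.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Tracker</title>
    <item>
      <title>Boston MMF - Accepted</title>
      <pubDate>Thu, 22 Mar 2012 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Boston MMF - Rejected</title>
      <pubDate>Thu, 22 Mar 2012 11:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

def rss_to_csv(rss_text):
    """Turn each <item> in an RSS document into one CSV row."""
    root = ET.fromstring(rss_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["title", "pubDate"])
    for item in root.iter("item"):
        # Only the listed fields are copied, so anything else is left out.
        writer.writerow([item.findtext("title", default=""),
                         item.findtext("pubDate", default="")])
    return buffer.getvalue()

csv_text = rss_to_csv(SAMPLE_RSS)
print(csv_text)
```

Because only named fields are written, a username field would be dropped simply by not listing it.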

Eric.Z · 3/22/12

Andy Nguyen said:
You can try to use some tool to read the tracker RSS feed into a CSV file. Python can do this. When I can get around to it, I will post the tracker data dump for you with the username field stripped out.
In any case, since we don't have GPA and GRE data for all entries, I think your sample size for Boston is too small.

Yes, you are right. The sample size is certainly not large, but I think it should be okay. For schools like CMU, NYU, or Columbia, there will definitely be more samples. I am not aware of any other way to do it for BU short of increasing the sample size.

IlyaKEightSix said:
You can take multiple years.

It's a good idea to take multiple years. How can I get the data from previous years?

aaronhotchner · 3/22/12

. logit result greq grev if greq>200

Iteration 0:   log likelihood = -13.460233
Iteration 1:   log likelihood =  -11.59538
Iteration 2:   log likelihood = -11.584696
Iteration 3:   log likelihood = -11.584694
Iteration 4:   log likelihood = -11.584694

Logistic regression                               Number of obs   =         20
                                                  LR chi2(2)      =       3.75
                                                  Prob > chi2     =     0.1533
Log likelihood = -11.584694                       Pseudo R2       =     0.1393

------------------------------------------------------------------------------
      result |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        greq |    .030872   .0384086     0.80   0.422    -.0444075    .1061515
        grev |  -.0105618   .0060417    -1.75   0.080    -.0224034    .0012798
       _cons |  -16.86222   29.26458    -0.58   0.564    -74.21974    40.49529
------------------------------------------------------------------------------

I didn't bother with GPA because it's so nonstandard, and I only included old GRE scores.
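
For anyone who wants to turn those coefficients into a predicted probability, here is a small Python sketch. The coefficients are copied from the Stata output in this post; the function name `admit_probability` and the example scores are made up. With 20 observations and neither GRE coefficient significant at the 5% level, the numbers should be read as illustration only.

```python
import math

# Coefficients copied from the logit output (old GRE scale, greq > 200).
B_CONS, B_GREQ, B_GREV = -16.86222, 0.030872, -0.0105618

def admit_probability(greq, grev):
    """Predicted P(result = 1) from the fitted logit model."""
    z = B_CONS + B_GREQ * greq + B_GREV * grev
    return 1.0 / (1.0 + math.exp(-z))

# Example scores are hypothetical applicants, not rows from the data.
for greq, grev in ((800, 500), (760, 600)):
    print(greq, grev, round(admit_probability(greq, grev), 3))
```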

I found this humorous, though:
. logit result greq grev greawa if greq>100&greq<200
note: greawa != 4 predicts success perfectly
greawa dropped and 6 obs not used
outcome = greq > 163 predicts data perfectly
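
The "predicts data perfectly" message means the outcomes are perfectly separated by a cutoff, in which case the logit maximum likelihood estimate does not exist (the coefficient runs off to infinity). A minimal Python sketch of checking for such a cutoff on one variable follows. The scores and outcomes are invented, not the actual data, and `separating_threshold` is a hypothetical helper that only checks the "higher score means success" direction.

```python
def separating_threshold(x, y):
    """Return a cutoff c such that y == 1 exactly when x > c, or None."""
    zeros = [xi for xi, yi in zip(x, y) if yi == 0]
    ones = [xi for xi, yi in zip(x, y) if yi == 1]
    if not zeros or not ones:
        return None  # only one outcome present, nothing to separate
    if max(zeros) < min(ones):
        return max(zeros)  # every failure is at or below this score
    return None

# Invented new-scale GRE quant scores and outcomes.
greq = [160, 162, 163, 164, 166, 168]
result = [0, 0, 0, 1, 1, 1]
print(separating_threshold(greq, result))
```

With so few observations, a perfect cutoff like this is more likely a small-sample accident than a real admission rule.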

Eric.Z · 3/22/12

Interesting! Would you mind sending me the data you used?

aaronhotchner · 3/22/12

https://docs.google.com/spreadsheet/ccc?key=0ApkiYkMJSbTxdGhELTZ3Vm9rTVp6OE1oMkc2NzdyNVE#gid=0

Eric.Z · 3/22/12

aaronhotchner said:
https://docs.google.com/spreadsheet/ccc?key=0ApkiYkMJSbTxdGhELTZ3Vm9rTVp6OE1oMkc2NzdyNVE#gid=0

Would you mind showing me how to get the data from the tracker into Excel?

aaronhotchner · 3/22/12

I typed all that manually because it was faster than finding a way to extract it.