Doing Linear Regression?

I need to use linear regression to model the relationship between a dependent variable and about 5-6 independent variables. Are there any free libraries out there that might help me out? I am looking for a C/C++ or a .NET library.

Also, I was trying to look at QuantLib for this answer... but I wasn't too sure. Can anyone point me to some documentation that tells me whether QuantLib can provide this type of analysis?

Thanks in advance.

-- Bobby Chopra
 
How about Statistics Library?

Hello

I know a simple and free C/C++ statistics library.

You can download it at Statistics Library by Sgt. Pepper

FYI, the contents of the header file are as follows:

/* statistics interface -------------------------------------- */
/* return sum of values in datalist */
double sum(double *datalist, int listsize);

/* return pointer to an array containing
the min and max values in datalist */
double* min_max(double *datalist, int listsize);

/* return range of values in datalist */
double range(double *datalist, int listsize);

/* return arithmetic mean of values in datalist */
double a_mean(double *datalist, int listsize);

/* return geometric mean of values in datalist */
double g_mean(double *datalist, int listsize);

/* return the harmonic mean of values in datalist */
double h_mean(double *datalist, int listsize);

/* harmonic mean using log transformations */
double logt_hmean(double *datalist, int listsize);

/* Return Tukey's trimean of values in datalist */
double tukeys_trimean(double *datalist, int listsize);

/* Return a trimmed mean */
double trimmed_mean(double *datalist, int listsize, double P);

/* Return the arithmetic-geometric mean of 2 numbers */
double agm(double a, double b);

/* return midrange of values in datalist */
double midrange(double *datalist, int listsize);

/* return mode of values in datalist */
double mode(double *datalist, int listsize);

/* return median of values in datalist */
double median(double *datalist, int listsize);

/* return kth percentile of values in datalist */
double percentile(double *datalist, int listsize, double ptile);

/* return the quartiles of the values in datalist */
double* quartiles(double *datalist, int listsize);

/* return the interquartile range */
double interquartile_range(double *datalist, int listsize);

/* return the sample variance
of values in datalist */
double svar(double *datalist, int listsize);

/* return the population variance
of values in datalist */
double pvar(double *datalist, int listsize);

/* return sample standard deviation
of values in datalist */
double s_stdev(double *datalist, int listsize);

/* return the population standard deviation
of values in datalist */
double p_stdev(double *datalist, int listsize);

/* return the root mean square of values in datalist */
double rms(double *datalist, int listsize);

/* return the coefficient of variability
of the values in datalist */
double coeff_var(double *datalist, int listsize);

/* return the mean deviation of
values in datalist */
double mean_dev(double *datalist, int listsize);

/* return Kth central moment */
double central_moment(double *datalist, int listsize, double k);

/* return the standard error of the mean */
double std_err_mean(double *datalist, int listsize);

/* returns chi square of values in datalist */
double chi_square(double *datalist, int listsize);

/* return the skewness of the values in datalist */
double skewness1(double *datalist, int listsize);

/* return the skewness of the values in datalist */
double skewness2(double *datalist, int listsize);

/* return the kurtosis of the values in datalist */
double kurtosis1(double *datalist, int listsize);

/* return the kurtosis of the values in datalist */
double kurtosis2(double *datalist, int listsize);

/* return the Pearson product moment coefficient
of correlation of a set of x,y data */
double corr_coeff(double *xlist, double *ylist, int xn, int yn);

/* return the slope of the line of best fit
of a set of x,y data */
double slope_bf_line(double *xlist, double *ylist, int xn, int yn);

/* return the y-intercept of the line of
best fit of a set of x,y data */
double y_intercept(double *xlist, double *ylist, int xn, int yn);

/* return the standard deviation of the
points around the line of best fit
of a set of x,y data */
double stdev_points(double *xlist, double *ylist, int xn, int yn);

/* return the standard error of the
regression coefficient (slope) */
double stderr_reg_coeff(double *xlist, double *ylist, int xn, int yn);

/* Return the covariance of 2 data sets. */
double covariance(double *xlist, double *ylist, int xn, int yn);

/* return the Fisher transformation of x */
double fisher(double x);

/* return binomial probability */
double binom_prob(double p, double n, double x);

/* return the standardized random variable */
double stdx(double mean, double sdev, double x);

/* return the standardized random
variable for a large sample: n >= 30 */
double stdx_lg(double xbar,
double mu,
double sdev,
double n);
/* return the normal probability density of x */
double norm_pdf(double mean, double stdev, double x);

/* return the normal cumulative probability of x */
double norm_cdf(double mean, double stdev, double x);

/* return an approximation of the area under the
standard normal curve between X1 and X2. */
double std_ncurve_area(double X1, double X2);

/* Return the kth percentile of the std normal distribution */
double std_norm_ptile(double k);

/* Return the Anderson-Darling statistic */
double anderson_darling_norm(double *datalist, int listsize);

/* return a boolean value indicating whether the normal
distribution can be used to approximate the binomial */
int norm_approx_ok(double p, double n);

/* return a pointer to an array containing mean and
variance for a normal approximation to the
binomial distribution */
double* norm_approx_mv(double p, double n);

/* Return a z-score for a hypothesis test. */
double rndz(double EP, double op, double n);

/* Return log normal probability density at x.
mu=a_mean(ln(X)); sigma=s_stdev(ln(X)) */
double log_norm_pdf(double x, double mu, double sigma);

/* Return gamma(x) */
double gamma(double x);

/* Return the natural log of gamma(x) */
double log_gamma(double x);

/* Return beta(a, b) */
double beta(double a, double b);

/* Return standard beta dist p.d.f. */
double std_betapdf(double x, double a, double b);

/* Return beta dist p.d.f where A < x < B. */
double betapdf(double x, double a, double b, double A, double B);

/* Return area under beta curve in the interval [X1, X2] */
double beta_curve_area(double a, double b, double A,
double B, double X1, double X2);

/* Return probability density of the t-distribution */
double tdist(double t, double v);

/* Return t-distribution cumulative probability */
double tdist_cum(double X1, double X2, double v);

/* Return Student's t for 1 sample*/
double t_test1(double *datalist, int listsize, double mu);

/* Return Student's t for 2 samples */
double t_test2(double *X1, double *X2, double n1, double n2);

/* return t-test degrees of freedom */
int df(double n1, double n2);

/* return a pointer to an array containing the limits
of the 95% confidence interval around a sample mean */
double* confidence_95(double *datalist, int listsize);

/* return a pointer to an array containing the limits
of the 99% confidence interval around a sample mean */
double* confidence_99(double *datalist, int listsize);

/* misc utility -------------------------------------- */
/* qsort comparison function */
int compare(const void *a, const void *b);
/* return rth root of n */
double root(double n, double r);
 
Here is a simple way to implement linear regression using only the standard library. Credit for this code goes to Dann Corbit.

Code:
#include <cstdio>
#include <iostream>
using namespace std;

int linfit(const double *const x, const double *const y, const size_t
n, double *m, double *b)
{
    int             error = 0;
    double          sumx = 0,
                    sumy = 0,
                    sumx2 = 0,
                    sumxy = 0;
    double          dn = (double) n;
    size_t          i;
    if (n <= 1) {
        *m = 0;
        *b = 0;
        error = 1;
    } else {
        double          divisor;
        error = 0;
        for (i = 0; i < n; i++) {
            sumx += x[i];
            sumy += y[i];
            sumx2 += (x[i] * x[i]);
            sumxy += (x[i] * y[i]);
        }
        divisor = (sumx2 - ((sumx * sumx) / dn));
        if (divisor != 0) {
            *m = (sumxy - ((sumx * sumy) / dn)) / divisor;
            *b = (sumy - ((*m) * sumx)) / dn;
        } else {
            *m = 0;
            *b = 0;
            error = 2;
        }
    }
    return error;   /* 0 = ok, 1 = too few points, 2 = all x values identical */
}
 
int main()
{
    /* examples */
    /* exact line: */
    double          x0[] = {-1, 0, 1, 2, 3};
    double          y0[] = {-1, 0, 1, 2, 3};

    /* noisy line: */
    double          x1[] = {-1.1, 0.01, .9999, 1.99998, 3.01};
    double          y1[] = {-1.02, -.00001, 1.002, 1.99872, 2.999973};
    size_t          length = sizeof x0 / sizeof x0[0];
    double          m;
    double          b;
    linfit(x0, y0, length, &m, &b);
    printf("Slope is %f, intercept is %f\n", m, b);
    linfit(x1, y1, length, &m, &b);
    printf("Slope is %f, intercept is %f\n", m, b);
    cin.get();
    return 0;
}
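
For reference, the sums accumulated in linfit correspond to the standard closed-form least squares estimates for a single predictor. Here n is the number of points, m the slope and b the intercept, matching the variable names in the code:

```latex
m = \frac{\sum x_i y_i - \frac{1}{n}\sum x_i \sum y_i}
         {\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2},
\qquad
b = \frac{\sum y_i - m \sum x_i}{n}
```

The divisor is zero when all x values are identical, which is the case the code flags with error = 2.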
 
Dude, did you try Google?

Anyway, I think TA-Lib implements linear regression functions. Check Technical Analysis Library and Software
 
Dude, did you try Google?

Google??? What is that? Just kidding ...smile!

But yes, I think I followed proper procedure for approaching the problem. I first asked the local experts, then ventured to search engines and then posted a question on a forum geared to help individuals with such questions.

So, I asked my Quant Director at work if he knew of anything [Mathematica was one of the answers, but I wanted to write C++ or .NET code]. I guess I could learn to use the program, parse its output, and put a simple web service in front of it as a facade to Mathematica. I want to keep this as a last option due to licensing issues, the learning curve, etc. Also, I will look at 301 Moved Permanently. Again, this is a last resort.

Then, I searched on Google with queries like
- statistics C++
- linear regression C++
- linear regression
- financial mathematics library
- free statistics library
- dll multivariable regression
- numerical recipes linear regression

Also, scoping stuff down to
linear regression site: http://sourceforge.net
linear regression site: http://www.numerical-recipes.com/forum
regression site: 301 Moved Permanently
multi-variate linear regression site: quantlib.org

I did come across some libraries that were indeed free, but they provided simple regressions for 1 dependent and only 1 independent variable. I needed to accommodate up to 6 independent variables, but I couldn't find anything. Also, I tried searching Google Scholar, Google Books and other stuff, but thought that someone on the forum must have encountered the situation and could recommend something.

I will definitely take a look at TA-lib and the statistics library recommended. Thanks in advance.

Sincerely,
Bobby
 
XLSTAT statistical analysis

XLSTAT is a good Excel add-in that can do this.
It offers excellent data analysis software products for Excel.
It covers Kleinbaum, Kupper, Muller and Nizam, Applied Regression Analysis and Other Multivariable Methods.
It provides industry-vertical-specific customizations as well.
Solutions available:

  • XLSTAT-Sensory: For practitioners of sensory data analysis (food, cosmetics, automotive,...).
  • XLSTAT-Medical: For health professionals.
  • XLSTAT-6S: For people involved in six-sigma processes.
  • XLSTAT-Predict: For those wishing to make accurate forecasts.
 
I need to use linear regression to model the relationship between a dependent variable and about 5-6 independent variables. Are there any free libraries out there that might help me out? I am looking for a C/C++ or a .NET library.

The actual fitting of a least squares linear regression is pretty straightforward, and should be easy to program in the environments you mention. See, for instance:

http://serc.carleton.edu/introgeo/teachingwdata/StatRegression.html

Regression diagnostics and such, if needed, are another matter, but least squares itself isn't too bad.


-Will Dwinnell
Data Mining in MATLAB
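
To make the "easy to program" point concrete for the 5-6 predictor case asked about in this thread, here is a minimal sketch that fits an intercept plus any number of independent variables by building the normal equations and solving them with Gaussian elimination. The function name multifit and its signature are hypothetical, not taken from any library mentioned above. Solving the normal equations is an assumption made for brevity; it can lose accuracy when predictors are nearly collinear, in which case a QR-based solver from a linear algebra library is usually the safer choice.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of any library named in this thread).
// Fits y ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares.
// rows[i] holds the k predictor values for observation i.
// On success returns true and writes k+1 coefficients to beta, intercept first.
bool multifit(const std::vector<std::vector<double>>& rows,
              const std::vector<double>& y,
              std::vector<double>& beta)
{
    const std::size_t n = rows.size();
    if (n == 0 || y.size() != n) return false;
    const std::size_t p = rows[0].size() + 1;   // +1 for the intercept column
    if (n < p) return false;                    // fewer observations than coefficients

    // Accumulate the augmented normal-equation system [X'X | X'y].
    std::vector<std::vector<double>> a(p, std::vector<double>(p + 1, 0.0));
    std::vector<double> xi(p);
    for (std::size_t i = 0; i < n; ++i) {
        if (rows[i].size() + 1 != p) return false;  // ragged input
        xi[0] = 1.0;
        for (std::size_t j = 1; j < p; ++j) xi[j] = rows[i][j - 1];
        for (std::size_t r = 0; r < p; ++r) {
            for (std::size_t c = 0; c < p; ++c) a[r][c] += xi[r] * xi[c];
            a[r][p] += xi[r] * y[i];
        }
    }

    // Gaussian elimination with partial pivoting.
    for (std::size_t col = 0; col < p; ++col) {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < p; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        // A (near-)zero pivot means the predictors are linearly dependent.
        if (std::fabs(a[pivot][col]) < 1e-12) return false;
        std::swap(a[col], a[pivot]);
        for (std::size_t r = col + 1; r < p; ++r) {
            const double f = a[r][col] / a[col][col];
            for (std::size_t c = col; c <= p; ++c) a[r][c] -= f * a[col][c];
        }
    }

    // Back substitution on the upper-triangular system.
    beta.assign(p, 0.0);
    for (std::size_t r = p; r-- > 0;) {
        double s = a[r][p];
        for (std::size_t c = r + 1; c < p; ++c) s -= a[r][c] * beta[c];
        beta[r] = s / a[r][r];
    }
    return true;
}
```

Called with one row of predictor values per observation, multifit returns false for mismatched sizes, too few observations, or linearly dependent predictors, so the caller should check the result before reading beta. The fixed 1e-12 pivot threshold is an arbitrary choice for this sketch and may need scaling for data with very large or very small magnitudes.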
 