There is an R package for ridge regression here:

http://astrostatistics.psu.edu/su07/R/html/MASS/html/lm.ridge.html

The technique comes down to performing a ridge regression based on the leaderboard scores. The regularization parameter was chosen as 0.0015 * 70492.

Page 12 V1 of Milestone 1 paper by Willem

Fine - I sort of get that there is an alpha parameter (which I am guessing is the lambda parameter in the R package), and the R package allows a vector or a scalar for the lambda value. But if you somehow put the leaderboard scores in the lambda spot (which I guess I could do) - where the hell do you put the lambda/alpha value?

I have read through some of the stuff on ridge regression, but I am guessing ridge regression was invented before data mining competitions - and this isn't pure ridge regression. There were no leaderboard scores in Tikhonov's day. Part of the problem is all the math looks like Greek to me (I guess some of it is Greek) - but I am assuming the alpha/lambda parameter deals with how ridgy it is, and functions similarly to the L1/L2 parameter in glmnet. Problem is I don't understand that either :) I am guessing it is a penalty that punishes worse predictors/features.
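For what it's worth, here is my mental model of plain ridge regression as a numpy sketch - the toy data and the function name are mine, not from any of the papers. Lambda is exactly the "how ridgy" knob: it adds a penalty on the squared size of the weights, so near-duplicate or noisy predictors get shrunk instead of blowing up:

```python
import numpy as np

def ridge_weights(X, y, lam):
    """Closed-form ridge solution: w = (X'X + lam*I)^-1 X'y.
    lam = 0 reduces to ordinary least squares."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy blend: 3 "base model" prediction columns, one a near-duplicate
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=200)   # almost collinear with column 0
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

w_ols = ridge_weights(X, y, 0.0)    # huge offsetting weights on the duplicates
w_ridge = ridge_weights(X, y, 1.0)  # penalty shrinks them to something sane
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

With lambda = 0 this is ordinary least squares; in glmnet terms this is alpha = 0 (pure L2 penalty) with lambda controlling the strength - so yes, it punishes large weights rather than specific features directly.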

Our candidate population contained 79 base models with each sub blend containing a randomly selected n base models. The process was repeated 1,000 times with a ridge parameter of 0.0001. We built models with various values of n, with generally increasing leaderboard performance as n increased, but also with an increasing probability that the model has overfit to the leaderboard. The final choice of n (20) was a tactical choice that resulted in a final model slightly better on the leaderboard than the third placed team.

from page 3 of team mm's Milestone 2 report
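Here is my attempt at reading mm's procedure literally - 79 base models, random sub-blends of n of them, repeated 1,000 times with a ridge parameter of 0.0001, results averaged. The function names, the toy data, and the exact averaging step are my guesses, not anything stated in the report:

```python
import numpy as np

def ridge_weights(X, y, lam):
    """Closed-form ridge: w = (X'X + lam*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def bagged_ridge_blend(P, y, n_sub=20, repeats=1000, lam=0.0001, seed=0):
    """Ridge-blend `repeats` random subsets of `n_sub` base models.
    Accumulating each subset's weights and dividing by `repeats` is
    the same thing as averaging the `repeats` sub-blend predictions."""
    rng = np.random.default_rng(seed)
    total = np.zeros(P.shape[1])
    for _ in range(repeats):
        idx = rng.choice(P.shape[1], size=n_sub, replace=False)
        total[idx] += ridge_weights(P[:, idx], y, lam)
    return total / repeats

# Toy stand-in for 79 base models' predictions of a common target
rng = np.random.default_rng(1)
P = rng.normal(size=(500, 79)) * 0.5 + rng.normal(size=(500, 1))
y = P @ rng.dirichlet(np.ones(79)) + 0.1 * rng.normal(size=500)
w = bagged_ridge_blend(P, y)
```

If this reading is right, the "iterations" are just the 1,000 random subsets - it is bagging over which models enter each sub-blend, not iterative refinement of a single model.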

This talks about iterations - which none of the academic stuff on ridge regression mentions. In the paragraph before, team mm suggests this is their improvement over straight ridge regression. They do use a regularization parameter - and the leaderboard scores. All of the academic stuff I have read comes down to the whole matrix stuff that Damian mentions.
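On where the leaderboard scores actually go: my best guess (pieced together on my own, so treat every detail here as an assumption) is that they feed the right-hand side of the ridge normal equations, not the lambda slot. For a prediction vector p and hidden targets y over n points, n * RMSE^2 = p.p - 2 p.y + y.y, so a model's leaderboard RMSE lets you back out p.y - its dot product with targets you never see. Do that for every base model and you have everything the matrix formula needs. A numpy sketch with simulated data (in real life y.y would itself have to be estimated, e.g. from the RMSE of a constant submission):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5000, 4
# Pretend these are base-model predictions and the hidden leaderboard targets
P = rng.normal(size=(n, m)) * 0.3 + rng.normal(size=(n, 1))
y = P @ np.array([0.5, 0.2, 0.2, 0.1]) + 0.1 * rng.normal(size=n)

# All the leaderboard actually reports: one RMSE per submitted model
rmse = np.sqrt(np.mean((P - y[:, None]) ** 2, axis=0))
yty = y @ y   # would be estimated in practice, not known exactly

# n*rmse_j^2 = p_j.p_j - 2 p_j.y + y.y  =>  p_j.y = (p_j.p_j + y.y - n*rmse_j^2) / 2
Pty = (np.sum(P ** 2, axis=0) + yty - n * rmse ** 2) / 2.0

# Ridge blend; scaling lambda by n mimics the shape of "0.0015 * 70492"
# in the quote above, on my guess that 70492 is a dataset size
lam = 0.0001 * n
w = np.linalg.solve(P.T @ P + lam * np.eye(m), Pty)
```

That, as I read it, is the "matrix stuff": w = (P'P + lambda*I)^-1 P'y, with the leaderboard supplying the P'y part.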

I am curious as to how much improvement there is to be had in going from straight linear regression to ridge regression.

Something like the very last graphic on this poster:

http://www.commendo.at/UserFiles/commendo/File/kdd2010-poster.pdf
I know that graphic exists in non-poster form - I just can't find it right now.

Anyway - if anyone knows, on the Netflix competition:

If straight linear regression gets you 0.87525, I would be curious as to:

1) What did straight ridge regression, with just the alpha/lambda parameter, get someone on the Netflix leaderboard?

2) What does using both the alpha/lambda parameter and the leaderboard scores get someone - but without any bagging or iterative process?

3) What does the best possible use of ridge regression get you - using alpha/lambda, leaderboard scores, and bagging or iterative training (but none of the BGBT, NN, or other stuff along those lines)?

with —