Neural Network Software Packages

I apologize in advance if this is a stupid question.

It seems that a neural network would be ideally suited to this type of problem, though it might not provide the accuracy needed to win. I have read all the posts in the forum, many centered around sophisticated analytical tools like R ... but I've seen nothing on the use of neural network packages.

What am I missing? Is anyone using NN software? Why not?

You have to know what you are doing to use a NN.  They are prone to overfitting.  This isn't a problem if you know how to take care of it, but you might want to try something like randomForests - you actually have to try to mess them up :)

You can use neural networks in R. There are some Windows-based programs as well. They are certainly fun to play around with, and I am sure some here will be using them.
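A minimal NN fit in R might look something like this (just a sketch - the nnet package is one option, and the data frame and column names here are placeholders):

library(nnet)  # single-hidden-layer neural networks, ships with R

# "train" and "DaysInHospital" are placeholder names for your data frame and target column
nn.model <- nnet(DaysInHospital ~ ., data = train, size = 5, decay = 0.01, linout = TRUE)  # linout = TRUE for regression

nn.pred <- predict(nn.model, newdata = test)  # predictions for new data

The decay argument is the weight penalty - that is the main knob for keeping an NN from overfitting.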

You might want to give R a shot.

You can easily download and install it, and in the other thread I have a script that will do everything for you (including loading in the data and making the submission file) - you can relatively easily use an NN by just switching the model you use.

http://www.heritagehealthprize.com/c/hhp/forums/t/607/r-questions/4122

is the thread - you can even use it for a practice submission. You have 40 minutes before the counter resets for the day :) and could do it by then...

http://cran.r-project.org/

There is no compiling, tarballing, path-setting or any of that stuff - put the files in My Documents (if you have Windows) and the code I posted will do everything, including saving the file you need to submit.
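The shape of the whole thing is roughly this (just a sketch - the file and column names below are placeholders; the script in the linked thread has the real details):

library(randomForest)

train <- read.csv("train.csv")   # placeholder file names
test  <- read.csv("test.csv")

rf.model <- randomForest(DaysInHospital ~ ., data = train, ntree = 500)

submission <- data.frame(MemberID = test$MemberID,   # MemberID is a placeholder ID column
                         DaysInHospital = predict(rf.model, newdata = test))
write.csv(submission, "submission.csv", row.names = FALSE)

Switching to an NN really is just swapping the randomForest() line for something like the nnet() call above.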

I bet your score would even beat 1/3 of the leaderboard (I haven't tried it - but I bet it would)

R in a Nutshell has a good couple chapters on Machine Learning as well.

Chris,

I think it's awesome you've gone through all of that effort in making the RF model available.  I'm curious ... at the very top of the Random Forests website they make the following statement:

"Random Forests(tm) is a trademark of Leo Breiman and Adele Cutler and is licensed exclusively to Salford Systems for the commercial release of the software. Our trademarks also include RF(tm), RandomForests(tm), RandomForest(tm) and Random Forest(tm)." 

This seems to imply some animosity towards those wanting to apply this mathematical method for profit without the consent of the "owners" of the idea (with all the fuzziness inherent in I.P. law of the trade secret variety).

I suspect that RF is at least partially, if not fully, behind most of the highest scores on the board. I was not familiar with the technique until reading these forums, and I do not use R, but I was thinking about implementing an RF-like algorithm in Mathematica (which has all sorts of tree-graph visualization and manipulation tools built in) - but I'm curious: if a winner of this competition uses an RF model without the consent of Breiman, Cutler and/or Salford, will they be open to legal action?

Thanks again for your hard work, your posts are generally very enlightening.  

M

Chris -- I will download R and use your script to give random forest a try. [I entered the HHP contest to learn this stuff and it's like drinking from a firehose.] As for the neural network, using a Windows-based commercial program: how would I know if the NN is "overfitting"? And how would I "take care of it" if it is? Thanks much.
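Would something along these lines be the right way to check - hold some rows back, fit on the rest, and compare the two errors? (Just my guess at the idea, in R, with placeholder names:)

library(nnet)

idx <- sample(nrow(train), floor(0.8 * nrow(train)))                  # hold back 20% of the rows
fit <- nnet(DaysInHospital ~ ., data = train[idx, ], size = 5, linout = TRUE)

rmse <- function(a, b) sqrt(mean((a - b)^2))
rmse(predict(fit, train[idx, ]),  train$DaysInHospital[idx])          # error on the rows it was fit on
rmse(predict(fit, train[-idx, ]), train$DaysInHospital[-idx])         # error on the held-out rows - much worse would mean overfitting?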

Solo Dolo,

Yes, I was wondering the same thing last week when I saw the claim to trademark and exclusive licensing!

But, if I understand the statement correctly, the 'exclusivity' refers to the trademark rather than the algorithm. I think there could well be a conflict with the name. However, if that is the case, I'm curious why the trademark owners haven't complained, given R's high profile and the fact that it competes with the original implementation - especially with reference to the commercial nature of Revolution Analytics' version of R.

It may also refer to the Fortran implementation of the algorithm, which is available on their site? Actually, that's GPL'd!

Interested in anybody's thoughts on this.

Anthony

Chris Raimondi wrote:
but you might want to try something like randomForests - you actually have to try to mess them up :)

I had never used randomForest before and only started using R for this contest, and my attempt with randomForest in R was not encouraging.

My current .4698 score was done using linear regression on 8 variables. I decided to give randomForest in R a try. I downloaded the caret package so I'm not limited to RF. On my first attempt, it sat there for a long time and I noticed it was allocating more memory than my PC has (4GB) and the disk was thrashing like crazy. After going through the documentation (R seriously needs improvement here), I changed the number of trees from 500 to 100. Now memory was not an issue; after a long wait it finished training, but it severely overfitted. I trained it to predict year 3 using data from year 2, and the fitted RMSE was .405, but the submission score using year 3 data was .473.
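Roughly the call I used, with the real names replaced:

library(randomForest)
# ntree = 100 instead of the default 500, which blew past my 4GB of RAM
rf.fit <- randomForest(target.y3 ~ ., data = year2.data, ntree = 100)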

fitted RMSE was .405

It did not overfit - I bet you used:

predict

to get that .405 score - you need to use the OOB prediction (it doesn't matter/is NA for the test data) - it only applies to the training data.

use:

rf.model$predicted

as the predictions when you compute the training error. It won't usually overfit the data - but it isn't magic - if your variables aren't similar to what is in the test set, you can have issues (this would apply to any method).

x <- predict(rf.model, your.data)  # use this to get predictions for the test data - do NOT use it for the training data

y <- rf.model$predicted # will give you the OOB predictions for the training data (not possible to get for other data)
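So to put numbers on it (rf.model and train.data are placeholders for your fitted forest and your training frame, DaysInHospital for the target column):

rmse <- function(a, b) sqrt(mean((a - b)^2))

rmse(predict(rf.model, train.data), train.data$DaysInHospital)  # misleadingly good in-sample fit - don't report this
rmse(rf.model$predicted, train.data$DaysInHospital)             # OOB estimate - compare THIS to your submission score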

Hope that makes sense

As far as the trademark issue goes - I agree that a trademark has to do with the name, not the software itself. So I don't say I am using "Random Forest" - I am using an "ensemble of decision trees" - that happen to be generated - you know - randomly - like in a forest. I am not saying no one can complain for other reasons - but the name is the only thing that would be an issue for a trademark.

By the way - your scores WILL be "better" on the training set - especially year 2. This has nothing to do with overfitting. It has to do with differences between the years and such. Expect to do about 0.015 to 0.020 better on year 2 vs. year 3.
