Is MS Excel better for this task than, say, MySQL or PHP?
|
Thanks 2 Joined 5 Apr '11 Email user |
|
|
Thanks 178 Joined 21 Aug '10 Email user |
I think several tools will be helpful for this competition, including Excel. Many Kaggle competitions are won by a clever insight from the data rather than requiring complicated algorithms or powerful machines. Kaggle's own Jeremy Howard gave a great talk on getting ready for competitions like the Heritage Health Prize and he shows how he uses Excel. It's definitely worth a look. |
|
Posts 292 Thanks 64 Joined 2 Mar '11 Email user |
TMiranda wrote: I think some advanced correlations could be better measured with MiniTab
The R packages 'plyr' and 'reshape2' are pretty great for generating features, especially since 'plyr' can be easily parallelized. It probably isn't as fast as using SQL, but I'm a much better R programmer than SQL programmer, so the tradeoff in speed is worth it.
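For readers who do not use R, the split-apply-combine idea behind 'plyr' can be sketched in plain Python. This is only an illustration: the column names (`MemberID`, `PayDelay`), the sample rows, and the function name are hypothetical, not taken from the competition data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical claims-like rows; the column names are assumptions.
SAMPLE = """MemberID,PayDelay
1,10
1,30
2,5
"""


def member_features(csv_file):
    """Aggregate to one record per member: claim count and mean PayDelay."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        member = row["MemberID"]
        counts[member] += 1
        totals[member] += float(row["PayDelay"])
    return {
        m: {"n_claims": counts[m], "mean_pay_delay": totals[m] / counts[m]}
        for m in counts
    }


features = member_features(io.StringIO(SAMPLE))
```

Because the file is read one row at a time, only the per-member totals stay in memory, not the whole claims table.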
Thanked by
Anthony Goldbloom (Kaggle)
|
|
Thanks 1 Joined 4 Apr '11 Email user |
Hi Alexander
Based on this post and your other post (teaming up to implement your algorithm), and considering that the competition is a long-term one, you may want to start learning and using F#, a free computational language by Microsoft which provides many great features. Of course, you need to master several other technologies and concepts too, e.g. DBMSs (MySQL or SQL Server, DB4a, ...), LINQ, visualization and mathematics libraries, etc.
|
|
Thanks 2 Joined 5 Apr '11 Email user |
I have been learning MySQL, but phpMyAdmin won't import the whole database CSV file. My web host said the maximum file size for CSV import is 8 MB, and I need 50-60 MB. This makes life very difficult. Chopping up CSV files lots of times and putting them together again just makes an already difficult problem worse! I use a home laptop. Do I need a more powerful computer to use SQL? My laptop would not allow Visual Studio files onto it and failed to install R. |
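If the import limit cannot be raised, the chopping-up step can at least be automated. The following is a minimal Python sketch, assuming the first line of the file is the header and no field contains a line break. The function names and the output file naming are made up for illustration, and the 8 MB default simply mirrors the limit mentioned above (a host's effective limit may be slightly lower).

```python
def split_csv_lines(lines, max_bytes):
    """Yield lists of lines, each starting with the header.

    Chunks stay within max_bytes where possible; a single row larger
    than the limit still gets a chunk of its own.
    """
    lines = iter(lines)
    header = next(lines, None)
    if header is None:  # empty input: nothing to split
        return
    header_size = len(header.encode("utf-8"))
    chunk, size = [header], header_size
    for line in lines:
        line_size = len(line.encode("utf-8"))
        # Start a new chunk when this row would push the current one over the limit.
        if size + line_size > max_bytes and len(chunk) > 1:
            yield chunk
            chunk, size = [header], header_size
        chunk.append(line)
        size += line_size
    if len(chunk) > 1:
        yield chunk


def write_parts(src_path, prefix, max_bytes=8 * 1024 * 1024):
    """Write prefix_001.csv, prefix_002.csv, ... and return the paths."""
    paths = []
    with open(src_path, encoding="utf-8", newline="") as src:
        for i, chunk in enumerate(split_csv_lines(src, max_bytes), start=1):
            path = f"{prefix}_{i:03d}.csv"
            with open(path, "w", encoding="utf-8", newline="") as out:
                out.writelines(chunk)
            paths.append(path)
    return paths
```

Each part repeats the header, so the parts can be imported one after another into the same table without being stitched back together first.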
|
Thanks 165 Joined 13 Oct '10 Email user |
I would say start with R and forget the Heritage prize for a while. This is an advanced contest with some complicated data. You will learn much more and get less frustrated if you take on some simpler problems first. Get your feet wet on smaller problems and enjoy the learning. There are going to be teams of the brightest computer scientists and data miners from the best institutions in the world competing for this. Don't worry about the prizes. |
|
Thanks 2 Joined 5 Apr '11 Email user |
I have managed to get PHP and MySQL working well now. I want to use this competition to improve my programming skills on a difficult task. I have been learning programming for 6 months now and I am improving. This is definitely a difficult task, or else the problem would have been solved years ago; think of all the money governments put into health research. I am a novice at programming but not at stats or biology, so that is why I think it is worth having a go. |
|
Thanks 4 Joined 5 Aug '10 Email user |
Timmay wrote: Is anyone having memory problems getting anything done with the claims data in R, particularly in Revolution?
You need to use vector operators instead of loop constructs. If memory is still a problem, you should move to Linux and a 64-bit version of R, or try using biglm. I have tried 32- and 64-bit XP with R, but available memory is still strongly limited. |
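The bounded-memory idea behind a package like biglm can be illustrated with a much simpler stand-in: a one-predictor least-squares fit that keeps only running sums, so the data can be fed in chunks of any size. This is a Python sketch of the general idea, not biglm's actual algorithm; the class name and the sample numbers are made up, and plain running sums can lose precision on very large values.

```python
class StreamingLinearFit:
    """Simple linear regression y = intercept + slope * x from running sums."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        """Fold one chunk of (x, y) pairs into the running sums."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        """Return (intercept, slope); needs at least two distinct x values."""
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


fit = StreamingLinearFit()
fit.update([0, 1], [1, 3])  # first chunk
fit.update([2, 3], [5, 7])  # second chunk
intercept, slope = fit.coefficients()
```

Memory use depends only on the five running sums, not on how many rows have been processed.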
|
Thanks 7 Joined 10 Feb '11 Email user |
Timmay wrote: Is anyone having memory problems getting anything done with the claims data in R, particularly in Revolution?
What operations are you trying to do?
I've been testing out Revolution for the past 2 weeks. It's not bad, but it still needs a lot of work to be a good IDE. Some things in it drive me nuts. I would have preferred it if they had built it on top of Eclipse rather than Visual Studio. I have Win 7 64-bit with 4 GB of RAM, and everything I do on the claims data seems to be fine. Doing a random forest, I can only use about 200 trees; if I try 300 or more, I run out of memory. |
|