Thanks, Anthony.
I take it that the interpretation of the Judges so far ... and I will pointedly note that under the Contest Rules, the Judges have Final Authority over interpretation of ALL the Contest Rules ... is that there is NOT a requirement that the most detailed "how we got our results, from A to Z" scripts, which were presumably submitted to Kaggle and the Judges for verification purposes, also be handed over to all the other contestants. (If that were the opinion of the Judges, the Surrender of the Scripts would have already happened.)
Some contestants seem to believe that they will be handed a copy of runnable code that will allow them to get to the same level of performance as our front-runners, at the push of a button, with no effort.
That would be a VERY BAD IDEA, all around:
(1) It's bad for the Leaders, who have invested considerable time in creating their intellectual property, including the mgmt infrastructure of knowing how to do these kinds of projects, esp. for dispersed teams;
(2) It's bad for the other top-ranked competitors too, who have also invested much effort to get where they are, and get no residual value for that effort if every JohnnyComeLately is handed keys to zoom up to their level with no effort;
(3) It's even bad for the middle-of-the-pack contestants. Sorry, running somebody else's script to replicate their results doesn't do anything for you (or me), except get you to where they were a month ago, with no idea how to take the next step.
To really LEARN will require intellectual effort, and studying and pondering the slightly generic descriptions of how these folks got where they did should get your mental juices going and maximize the learning. You may even inadvertently implement some of their ideas differently than they did, and perhaps that'll even be better. There is no totally free lunch, and what we have been given already is very useful;
(4) It's bad for the profession & the supply of data miners, which grows in depth and breadth because of the intense efforts needed in competitions of this caliber;
(5) It's bad for the Client, because having 1000 identical replicants as of Date-X is a genetic bottleneck of sorts that limits the odds that a really good but different solution will emerge; and
(6) It's bad for Kaggle for all of the reasons listed above.
My $0.02. I've wanted to put this to words for a while, sorry I didn't do it sooner.