Clarifying Rule #13 for Milestone 1 - Open Questions & Issues

andywocky's image Posts 18
Thanks 8
Joined 17 Jun '11 Email user

@Anthony

There have been several concerns raised in the forum about the impact and interpretation of Rule 13 on the contest, which states that conditional milestone winners must disclose their "Prediction Algorithm and documentation" to the website for competitor review and commentary. In particular, there are unanswered questions with regard to inconsistencies and/or potentially unfair advantages arising from this rule. Can you comment on the following specific items so the community has firm, consistent and realistic expectations as we approach the Milestone 1 date?

  • Is it inconsistent, as Sali Mali pointed out in another thread, to require documentation of the winning algorithms be publicly disclosed to all competitors given Rule 20, Entrant Representations?  It seems that this disclosure will encourage other competitors to use aspects of the winning Prediction Algorithm which cause violation, directly or otherwise, of (i) - (iii) and possibly (iv) of that Rule.
  • Can you clarify that code, libraries and software specifications are *not* required to be publicly disclosed to competitors?  These materials and intellectual property appear to be referenced separately from "Prediction Algorithm and documentation."
  • Will Kaggle or Heritage have a moderation or appeals process for handling competitor complaints?  From the winning entrant's point-of-view, they would not want to be forced through the review process to allow back-door answers to code and libraries which accelerate a competitor's integration of the winning solution.
  • Can you comment on the spirit and fairness of the public disclosure of the Prediction Algorithm documentation and its impact on competitiveness?  In particular, if the documentation truly does meet the requirement of enabling a skilled computer science practitioner to reproduce the winning result, then this places the winning team at an unfair disadvantage: all competitors will have access to their algorithms and research, in addition to the winning algorithm.
  • Can you provide more detailed clarification on the level of documentation required by conditional milestone winners?  The guideline provided by the rules would cover a range of details and description spanning from "lecture notes" to "detailed tutorial" to "whitepaper" to "conference paper", etc.
  • Can you comment on the reproducibility requirement?  For example, it is possible to construct algorithms with stochastic elements that may not be precisely reproducible, even using the same random seed-- is it sufficient for these algorithms to reproduce the submission approximately?  What if they don't reproduce exactly, or reproduce at a prediction accuracy that is worse than the submission score, possibly worse than other competitor submissions?  
Thanks,
Andy
 
Signipinnis's image Posts 94
Thanks 25
Joined 8 Apr '11 Email user

reply deleted by author - I'm not Anthony.

 
Bobby's image Posts 5
Joined 14 Dec '10 Email user

I'm very interested in the answers to these questions as well. The answers will be a make it or break it for a lot of contestants.

 
andywocky's image Posts 18
Thanks 8
Joined 17 Jun '11 Email user

@Signipinnis

I don't mean this thread to be exclusionary -- sorry if it came across this way.  I addressed Anthony because I specifically want to get Kaggle's official comments on these items in addition to any other replies.  Please feel free to share your reply.

 
Christopher Hefele's image Posts 83
Thanks 50
Joined 1 Jul '10 Email user

I believe some of these points were addressed in an early post by Jeremy Howard (at Kaggle):

http://www.heritagehealthprize.com/c/hhp/forums/t/353/initial-questions-about-the-rules-dataset/2133#post2133 

“Only the paper describing the algorithm will be posted publicly. The paper must fully describe the algorithm. If other competitors find that it's missing key information, or doesn't behave as advertised, then they can appeal. The idea of course is that progress prize winners will fully share the results they've used to that point, so that all competitors can benefit for the remainder of the comp, and so that the overall outcome for health care is improved.”

Also, I think you won’t be forced to share your results, even if you’re in the #1 position – but then again, you won’t be able to claim the $30,000 or $20,000 either, unfortunately. Those are the rules, and it certainly does create a dilemma for top competitors. Whether or not this structure is "fair" I think might be a question for philosophers. As a practical matter, it will spur innovation as people build off of others' ideas, trying to stay competitive. Also, note that there have been some great disclosures already in the Forums (some with code!) posted by top competitors (Chris R in particular) which have already helped others.

Next, I should point out that the Netflix Prize had the same type of milestone prize structure & disclosure requirement.  One team -- Team BellKor -- won milestone / 'progress' prizes, disclosed their methods along the way, and was still able to be part of the team that won the $1MM Grand Prize.  Yes, other people built on the techniques they disclosed (but then again, BellKor's approach built on techniques that other teams had disclosed...).  My point is that in at least that case, it was possible for the leaders to disclose their methods & still remain competitive. 

About the level of detail required.  My opinion is that I would hope that the detail would strive to match the standards set by the Netflix Prize's "Progress Prize"  papers.  See the solution papers referenced in these posts: 

[ EDIT, to address ChrisR's point below ]   There's a lot of 'fancy' math in these papers, but I don't want to imply that that's necessary. In fact, too many equations can hinder understanding, and clear text or pseudocode might be better at times.  My point is that these documents do not try to gloss over any details or hide critical parameters in footnotes, etc. [ /EDIT ]

Finally, just to be clear, much of the above is my own opinion (as a humble Kaggle competitor),  not to be confused with any 'official' response to your questions. 

 
Bobby's image Posts 5
Joined 14 Dec '10 Email user

"The idea of course is that progress prize winners will fully share the results they've used to that point, so that all competitors can benefit for the remainder of the comp, and so that the overall outcome for health care is improved.”

Unacceptable. This is a contest not a group collaboration.

"I think you won’t be forced to share your results, even if you’re in the #1 position – but then again, you won’t be able to claim the $30,000 or $20,000 either, unfortunately. "

That is very unfortunate, and hopefully not true (I still hope a moderator will step in and inform us of the level of detail required). My only motive for this competition is the money, not to help others win money.

"Whether or not this structure is "fair" I think might be a question for philosophers --- but as a practical matter, it will spur innovation as people build off of others ideas."

It's not fair to anyone, and copy-cats stand to benefit. The idea I'm implementing has taken me my ENTIRE LIFE of research to get to. There's not a chance in hell I would willingly give it away for others to get a free shortcut/cheat. I'm standing by to hear the official response before I even make my submission to the leader board.

 
Signipinnis's image Posts 94
Thanks 25
Joined 8 Apr '11 Email user

Bobby wrote:

Unacceptable. This is a contest not a group collaboration.

Not really, this is a hybrid model of a crowd-sourced search for a problem solution. There are two preliminary phases, incentivized by cash awards specifically for collaboration. Then there's the gold rush for the best ultimate solution, arising from the previous shared benchmark/algorithm/methodology.

Bobby wrote:

The idea I'm implementing has taken me my ENTIRE LIFE of research to get to. There's not a chance in hell I would willingly give it away for others to get a free shortcut/cheat. I'm standing by to hear the official response before I even make my submission to the leader board.

So wait until the 3rd phase starts.

The way I see it, there are (likely) a number of people here with a proprietary approach/tool that they think will absolutely, unquestionably, blow the doors off everyone else. And needless to say, if one has that kind of a competitive edge, (esp. if based on one's own intellectual property developed from years of work), one would be extremely reluctant to give it up for a few pieces of silver.

But here's the thing: many may THINK they exclusively have an unbeatable super-algorithm, but by definition, when all is said and done, only one of them can be the Bob Beamon of this contest.

And there are a LOT of excellent data miners, using the best extant good tools and a lot of time & ingenuity, working the solution space. Think of it as a genius ensemble, with a huge amount of available computational time. Odds are very good that a hard-working data miner or health care analyst using existing tools will ultimately barely edge out another hard-working data miner.

But the easy cure for anyone with "I have proprietary secrets that are worth more than $x0,000" sentiments is to simply sandbag or wait on the sidelines until Phase 3 starts. Then take the Big Prize if you are able.

 
Chris Raimondi's image Rank 38th
Posts 194
Thanks 90
Joined 9 Jul '10 Email user

About the level of detail required.  My opinion is that I would hope that the detail would match the standards set by the Netflix Prize's "Progress Prize"  papers.  See the solution papers referenced in these posts:

I love the Netflix papers.  If I am lucky or skilled enough to win a prize - I would do my best to describe it in as much detail as possible - with a couple things in mind:

1) Math - don't know it - can't read it - can't write it .  If people want to see equations - you might be out of luck.

2) Statistics - don't know it - can't read it - can't write it. I have no clue half the time what people are talking about - I have my own methods for figuring stuff out.  I am only somewhat exaggerating here.

3) R - my code sucks, but I would give details on all packages used and any non default settings used as well.  I like the way the netflix papers are written - and it should be a lot shorter without all those equations.  And of course all features used.

4) I don't know why people are so overprotective.  Most of the stuff I have is 75% similar to the features Dan posted (I thought he was on the way out :) ).  Dan is beating me now, but most of that stuff was stuff intelligent people could think of.  If you really have a kick ass algo - take your second best kick ass algo and use that instead.  There are tons of people here - including people in this very thread who finished #2 (tied for first score wise) in the Netflix competition.  Do you really think they aren't going to think of something along the same lines?  If you can't think of a second kick ass algo - well then your first one probably sucks too - you just don't know it yet :)

5) IMHO - It IS a collaboration - I plan on going to the Strata conference either way - hopefully can meet a few more of you.  Have met two of you already.  I am more than willing to talk shop with other people and trade ideas (of course not any SECRET stuff).  I have learned a lot from this forum - from R packages I have never heard of - to better ways of doing things.  The things I have learned at other conferences (these aren't ML conferences, but SEO related) weren't necessarily related to what I was looking for, but something someone would say would spark an idea about something else.

6) In the future I think other competitions might want to consider going with a "Second Price Auction" type model.  In Google - when you bid on advertising - you pay the price of the person UNDER you.  This encourages people to bid their true value (and according to economists works out best in a GT/NE type way).  Using the same method, a person winning a progress prize could be required to produce an algo that is equal to or better than that of the person below them.  Obviously doesn't work in this case, but for other ones in the future - maybe other people would like the idea.  This would allow people to feel more comfortable in including their best stuff...

.... Kind of droned on there - one last thing:

Documentation must be written in English and must be written so that individuals trained in computer science can replicate the winning results.

from the rules. 

I hope people aren't going to try and beat a dead horse with the two teams that win.  In my mind if other people can confirm they are able to reproduce the results - then that settles it for me.  Hopefully I will be able to duplicate it as well, but IMHO it is not their job to get everyone's code working.

 
DanB's image Rank 2nd
Posts 58
Thanks 46
Joined 6 Apr '11 Email user

It seems a lot of people are concerned they might win a progress prize they don't want.

My understanding is that you can choose which submission is considered for the prize. If you don't want the progress prize (and everything that comes with it), make a submission where every prediction is 1. For anyone concerned, NOT winning the prize should be very easy.

Personally, I'd be surprised if those at risk of winning a progress prize did this.

Thanked by Zach
 
Signipinnis's image Posts 94
Thanks 25
Joined 8 Apr '11 Email user

ExitingSlowlySoMaybeHe'sNotAndThatsAOkayByMe DanB wrote:

Personally, I'd be surprised if those at risk of winning a progress prize did this.

Nice phrase. Personally, being at risk for winning a prize is something I'm looking forward to.

 
Signipinnis's image Posts 94
Thanks 25
Joined 8 Apr '11 Email user

Speaking of (gasp) "collaboration":

I hope it has not escaped anyone's attention that +/- 18 days ago, DanB announced "I don't have time for this anymore, here's what I've done so far, hope it helps somebody" ... and dumped various parts of his algorithm in forum posts for all to see.

Various questions and answers then followed.

Now DanB is in 5th Place on the Leaderboard. I could be wrong, but I don't think he was Top 10 before.

Collaboration works !  Sometimes in unexpected ways !!!

Thanks DanB.

Hope you're able to stay in after all.

 

 

 
Zach's image Rank 31st
Posts 292
Thanks 64
Joined 2 Mar '11 Email user

Signipinnis wrote:

Thanks DanB.

Hope you're able to stay in after all.

 

Me too!  And keep sharing ideas =)

 
Anthony Goldbloom (Kaggle)'s image
Anthony Goldbloom (Kaggle)
Competition Admin
Kaggle Admin
Posts 382
Thanks 72
Joined 20 Jan '10 Email user
From Kaggle

Hi all,

Not ignoring this thread. Just seeking clarification from HPN on one issue.

Anthony

 
Anthony Goldbloom (Kaggle)'s image
Anthony Goldbloom (Kaggle)
Competition Admin
Kaggle Admin
Posts 382
Thanks 72
Joined 20 Jan '10 Email user
From Kaggle

Sorry for the delay on this, was just clarifying some issues with HPN.

  • Is it inconsistent, as Sali Mali pointed out in another thread, to require documentation of the winning algorithms be publicly disclosed to all competitors given Rule 20, Entrant Representations?  It seems that this disclosure will encourage other competitors to use aspects of the winning Prediction Algorithm which cause violation, directly or otherwise, of (i) - (iii) and possibly (iv) of that Rule.

Rule 20 does not apply to the extent that it prevents (a) competitors other than a milestone prize-winner from using code published by a milestone prize-winner in accordance with competition rules; and (b) a milestone prize-winner from competing subsequently in the competition using code for which it was awarded the milestone prize.

  • Can you clarify that code, libraries and software specifications are *not* required to be publicly disclosed to competitors?  These materials and intellectual property appear to be referenced separately from "Prediction Algorithm and documentation."

Chris correctly points to Jeremy's response in an earlier forum post:

“Only the paper describing the algorithm will be posted publicly. The paper must fully describe the algorithm. If other competitors find that it's missing key information, or doesn't behave as advertised, then they can appeal. The idea of course is that progress prize winners will fully share the results they've used to that point, so that all competitors can benefit for the remainder of the comp, and so that the overall outcome for health care is improved.”

  • Will Kaggle or Heritage have a moderation or appeals process for handling competitor complaints?  From the winning entrant's point-of-view, they would not want to be forced through the review process to allow back-door answers to code and libraries which accelerate a competitor's integration of the winning solution.
Kaggle and the HHP judging panel will moderate the appeals process.
  • Can you comment on the spirit and fairness of the public disclosure of the Prediction Algorithm documentation and its impact on competitiveness?  In particular, if the documentation truly does meet the requirement of enabling a skilled computer science practitioner to reproduce the winning result, then this places the winning team at an unfair disadvantage: all competitors will have access to their algorithms and research, in addition to the winning algorithm.
This rule is in place to promote collaboration. Those who would prefer not to share can opt out of the prize.
  • Can you provide more detailed clarification on the level of documentation required by conditional milestone winners?  The guideline provided by the rules would cover a range of details and description spanning from "lecture notes" to "detailed tutorial" to "whitepaper" to "conference paper", etc.
Hopefully this was adequately dealt with in Jeremy's response (requoted above). Let me know if further clarification is needed.
  • Can you comment on the reproducibility requirement?  For example, it is possible to construct algorithms with stochastic elements that may not be precisely reproducible, even using the same random seed-- is it sufficient for these algorithms to reproduce the submission approximately?  What if they don't reproduce exactly, or reproduce at a prediction accuracy that is worse than the submission score, possibly worse than other competitor submissions?  

Exact reproducibility is required.

 
Sali Mali's image Rank 4th
Posts 292
Thanks 113
Joined 22 Jun '10 Email user

Anthony Goldbloom wrote:

Exact reproducibility is required.

Is this reproducibility of a submission that gives the same leaderboard score, or does the actual submission file have to be identical?

If it is the latter, then I guess this will be impossible for most people - and I for one am out.

An algorithm that relies on a particular setting of a random number seed to work is no good to anyone. The algorithm should result in the same overall predictive accuracy, but this is different from the exact same predictions.

 

   

Thanked by Chris Raimondi
 
Signipinnis's image Posts 94
Thanks 25
Joined 8 Apr '11 Email user

Anthony Goldbloom wrote:

Exact reproducibility is required.

I'm not using anything (yet) that would not be precisely reproducible due to use of different random seeds during a re-run of my process, but you have to make reasonableness allowances on this ... some people may be using tools that don't allow them to lock down their starting seeds.

How about defining "exact reproducibility" as

  * generating same score as their leaderboard score, +/- 0.0002,

and

  * that's still better than the next closest competitor ?

Remember that while this could be a show-stopping issue for some competitors that causes some to drop out *NOW*, it is also not in the Heritage Provider Network's best interest to disqualify the top dog for inconsequential randomness of results, if that's still the best solution out there.

If your re-run of the best algorithm was, due to random factors, not better than the 2nd ranking entrant, you could declare a tie .... essentially that's saying that the confidence intervals on the top submissions overlap such that you could not statistically conclude that one was clearly better than the other. I think most quants would agree that would be an accurate assessment, and fair adjudication, in such a circumstance.
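The two conditions proposed above can be written down as a simple check. This is only a sketch in Python: the function name, its parameters and the sample scores are hypothetical, and it assumes that a lower score is better.

```python
def reproduces_within_tolerance(leaderboard_score, rerun_score,
                                runner_up_score, tolerance=0.0002):
    """Check the two proposed conditions (lower score assumed to be better)."""
    # Condition 1: the re-run lands within +/- tolerance of the submitted score.
    close_enough = abs(rerun_score - leaderboard_score) <= tolerance
    # Condition 2: the re-run still beats the next closest competitor.
    still_ahead = rerun_score < runner_up_score
    return close_enough and still_ahead


# Hypothetical scores, for illustration only.
print(reproduces_within_tolerance(0.4600, 0.4601, 0.4610))  # True
print(reproduces_within_tolerance(0.4600, 0.4605, 0.4610))  # False
```

A re-run that fails only the second condition is the case where a tie could be declared instead of a disqualification.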

HTH

 

Thanked by Chris Raimondi
 
B Yang's image Rank 2nd
Posts 197
Thanks 46
Joined 12 Nov '10 Email user

I really hope you reconsider the exact reproducibility requirement. In theory it's possible, in practice it'll probably mean the winner has to send you the computer(s) he used. If he used cloud computing, forget about it.

It's not just random seeds. Floating point calculations have inherent loss of precision. Slight differences in compiler, CPU hardware, development environment, 3rd-party libraries, or runtime environments may make things not exactly reproducible.
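A small Python illustration of the floating-point point, offered as a sketch and not as a claim about any particular competitor's code: adding the same three numbers in a different order changes the last bits of the result. The helper `sum_in_order` is made up for this example.

```python
import math


def sum_in_order(values):
    """Add the values one at a time, strictly in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]
forward = sum_in_order(values)
backward = sum_in_order(reversed(values))

print(forward == backward)              # False: the last bits differ
print(math.isclose(forward, backward))  # True: equal within a tolerance
```

Each ordering gives the same answer every time it is run. The difference only appears when the order of operations changes, which a different compiler, library version or degree of parallelism may cause.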


Thanked by Chris Raimondi
 
Jeremy Howard (Kaggle)'s image Posts 166
Thanks 58
Joined 13 Oct '10 Email user
From Kaggle

Guys, I'm pretty confused about the concern over reproducibility, and some of the points raised just don't make sense to me. For example, the loss of precision due to floating point arithmetic is deterministic. As another example, seeding the random number generator at the start of your process is just one line of code - it doesn't mean that the algorithm then doesn't work for different random numbers, it only means that the specific results can be replicated. Different compilers should not give different random numbers, since a compiler does not define a random number generator (random numbers from a given seed use a given defined function based on the library you call). Ditto with the CPU hardware and development environment. Yes, different libraries can give different results, if they have the same API but use different algorithms - so let's just use the same libraries as you when testing...

Remember, a random number generator is simply a deterministic function that is applied to a seed. The randomisation that is used based on external factors (e.g. MAC address, ticks, process id, etc) is used to pick a seed if a specific one is not given, and can be easily overridden by seeding the RNG.
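As a minimal sketch of this point, using Python's standard `random` module (the helper name `draw` and the seed values are arbitrary choices for the example): the same seed always yields the same sequence, and a different seed yields a different one.

```python
import random


def draw(seed, n=5):
    """Return n pseudo-random numbers from a generator with an explicit seed."""
    rng = random.Random(seed)  # independent generator, seeded explicitly
    return [rng.random() for _ in range(n)]


print(draw(42) == draw(42))  # True: same seed, same sequence
print(draw(42) == draw(43))  # False: different seed, different sequence
```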

Has anyone actually in practice during this comp re-run their algorithm using a given random seed, and got different results?

Setting a specific difference threshold is not at all easy in practice, and I haven't yet heard any practical reason as to why reproducible results should cause problems. For $3m, we're happy to spend some time ensuring we have the same versions of software and libraries as you to meet this reproducibility requirement!

Sorry if I'm missing something obvious - feel free to follow up with specific examples, questions, other issues, etc. We're all very keen to make sure everyone is comfortable with this process. :)

 
Sali Mali's image Rank 4th
Posts 292
Thanks 113
Joined 22 Jun '10 Email user

Hey Jeremy,

The concern is more a practical matter. As a modeller I NEVER set a random seed, as it is my belief that you are just fooling yourself if your algorithm's results depend on this. As a result, if I rerun my models, I would never expect the exact same predictions to the nth decimal place.

Now this rule has come to light, I can make sure everything is reproducible from now on, but it would be impossible to go back and re-build 'exactly' over 100 models that make up our solution.

What if an alternative rule was to produce a solution that was more accurate than the nominated milestone prize winner. Would this be acceptable?  This would at least give some breathing space for those like me who are not as diligent.

 

 

 
Chris Raimondi's image Rank 38th
Posts 194
Thanks 90
Joined 9 Jul '10 Email user

Has anyone actually in practice during this comp re-run their algorithm using a given random seed, and got different results?

Haven't tried, but in the HIV competition - I reran the code from scratch on my other computer (as I had included code for installing needed packages - and wanted to make sure it ran fine) - and I absolutely got a different result - close - but different.  It may have had a different version of R - I don't remember - because I set it up to install everything from scratch.

When running the code on my regular computer - I always got the same result. (didn't try running it more than once on the other computer).
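Since a version difference is the suspected cause here, one precaution is to save a record of the environment next to each submission. The following is a hedged sketch in Python (the thread's example used R, where `sessionInfo()` serves a similar purpose); the function name `environment_snapshot` and the package list are hypothetical.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages=()):
    """Collect interpreter, OS and package versions for later comparison."""
    snapshot = {
        "python": sys.version,
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": {},
    }
    for name in packages:
        try:
            snapshot["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            snapshot["packages"][name] = None
    return snapshot


# "pip" is only a placeholder; list whatever the model actually depends on.
print(json.dumps(environment_snapshot(["pip"]), indent=2))
```

Comparing two such snapshots would show at a glance whether the two computers were running the same versions.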

 
Sarkis's image Posts 41
Thanks 5
Joined 5 Apr '11 Email user

Sali Mali wrote:

What if an alternative rule was to produce a solution that was more accurate than the nominated milestone prize winner. Would this be acceptable?  This would at least give some breathing space for those like me who are not as diligent.

I'm not sure if I understand this correctly. If you can produce a solution that is more accurate than the nominated milestone prize, why not do it now? You have 10+ days to do that. Also, how long would it take to produce a solution that is more accurate than the nominated milestone prize?

My G+ invitation still stands. Cheers! https://plus.google.com/106416594206743737486/posts. This song is stuck in my head and couldn't embed this here - http://www.youtube.com/watch?v=X5XKWiGcw3k

Update: Unless you have been living under a rock this one will get you up to speed: https://plus.google.com/101725032603312323854/posts/2tQD3uotqJJ

Help, I need somebody, / Help, not just anybody, / Help, you know I need someone, / Help!

https://plus.google.com/106416594206743737486/posts/ZfZi8ASBCZ9

 
Chris Raimondi's image Rank 38th
Posts 194
Thanks 90
Joined 9 Jul '10 Email user

Also - here are two more REAL concerns about this:

1) I plan on spending a decent amount of time making sure my code is reproducible.  I have no problem with this rule - as long as you 100% will be enforcing it.  If someone who doesn't make this attempt wins - and isn't disqualified - that will be unfair to those of us trying to make sure our code is reproducible (as we could have spent more time on our algos).

2) I don't know enough about computers to understand (I get the gist) the points you made about being deterministic and such - and I notice the rules allow:

Sponsor may require conditional winners to submit computer hardware or a virtual machine instance that runs the Prediction Algorithm’s code

If someone is concerned about this - and you can't replicate the results - may WE be allowed to submit a computer if we think it necessary?

Sorry if I seem paranoid - I just would hate for someone to lose over something random (pun intended).

 

Thanked by Sarkis
 
Christopher Hefele's image Posts 83
Thanks 50
Joined 1 Jul '10 Email user

Jeremy Howard (Kaggle) wrote:
Guys, I'm pretty confused about the concern over reproducibility.... seeding the random number generator at the start of your process is just one line of code....a random number generator is simply a deterministic function that is applied to a seed....Sorry if I'm missing something obvious - feel free to follow up with specific examples, questions, other issues, etc.

Well, Bo Yang made a good point elsewhere in the forums; he noted that if you're writing your own multi-threaded code & the threads share a random number generator, your results can be non-deterministic.

Agreed, if you set the seed up front, the random number generator will spit out the same random numbers in the same sequence. But multiple threads may call the random number generator in a different order depending on how the OS schedules those threads to run (depending on, for example, what else is running on your system).  So in one run, thread #1 in your program may grab the first random number & thread #2 grabs the second one, but when you rerun your program, thread #2 may grab the first random number, and thread #1 the second. Given this, your results could change, depending on what you've coded thread #1 to do with its random numbers vs thread #2.

I've attached a small Python program that demonstrates this.  It spawns a thread that sets the RNG seed & then immediately grabs a random number and prints it. In parallel, the other (main) thread just grabs random numbers.  The result is that the random number that the program prints out differs between runs, even though the same seed is set each run. 

For people using R or single-threaded code, this isn't an issue. But for anyone writing multi-threaded code so they can use all their cores, it seems that exact reproducibility might not be guaranteed by simply setting the seed (you'd probably have to set the seed(s) AND structure your code a little differently to avoid this issue).
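The attached program is not reproduced in this thread. As a separate, minimal sketch of the same issue and of one way to restructure the code, the Python below contrasts threads that share one generator with threads that each own a seeded generator. All function names and the seed scheme (`base_seed + i`) are assumptions made for illustration.

```python
import random
import threading


def _shared_worker(results, index, n):
    # All threads draw from the module-level generator, so which thread
    # receives which number depends on how the OS schedules them.
    results[index] = [random.random() for _ in range(n)]


def _private_worker(results, index, n, seed):
    # Each thread owns its generator, so its stream depends only on its seed.
    rng = random.Random(seed)
    results[index] = [rng.random() for _ in range(n)]


def _run_threads(threads):
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_shared(seed, n_threads=4, n=1000):
    random.seed(seed)
    results = [None] * n_threads
    _run_threads([
        threading.Thread(target=_shared_worker, args=(results, i, n))
        for i in range(n_threads)
    ])
    return results


def run_private(base_seed, n_threads=4, n=1000):
    results = [None] * n_threads
    _run_threads([
        threading.Thread(target=_private_worker,
                         args=(results, i, n, base_seed + i))
        for i in range(n_threads)
    ])
    return results


# Per-thread generators: identical output on every run.
print(run_private(123) == run_private(123))  # True
# Shared generator: may or may not match, depending on thread scheduling.
print(run_shared(123) == run_shared(123))
```

The shared version can happen to match on a given machine, which is part of why this kind of non-determinism is easy to miss.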

1 Attachment —
 
Signipinnis's image Posts 94
Thanks 25
Joined 8 Apr '11 Email user

Jeremy Howard (Kaggle) wrote:

Remember, a random number generator is simply a deterministic function that is applied to a seed. The randomisation that is used based on external factors (e.g. MAC address, ticks, process id, etc) is used to pick a seed if a specific one is not given, and can be easily overridden by seeding the RNG.

Well, I use a lot of SAS, which has a large number of functions that generate pseudo-random numbers. For some of them, the built-in default choice is to use the system clock to generate the initial seed. So from that I am going to presume that using the system clock to set the initial seed value is a well-accepted practice for pseudo-randomization. To the best of my knowledge, if one is using built-in random numbers functions and the system clock in SAS as the initial seed, there is no mechanism possible to determine what the actual initial seed value was. And I am willing to take on faith that if one cannot determine the precise nano-second value used, the odds of a verification re-run of the code to ever re-enter the deterministic stream of numbers at exactly the same point as the submission run are infinitely small.

Now in SAS code I write, I have the ability to control the seed, and can make sure I never call a seed based on the system clock. But if I use some compiled code constructed by someone else (and yes, people do sometimes distribute pre-compiled SAS macros, as a way of protecting their intellectual property), I could be using a random number function where I have not been given the ability to set the initial seed.

Likewise, if anyone is using applications or code built by others, it is very conceivable that the logic to partition the data in test vs validation datasets, pick variables in a rain forest ensemble, etc, may be controlled by unseen and untouchable random number generators that are tied to the system clock.
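Where the code is under one's own control, a clock-based seed can still be replayed if the seed is recorded at the time it is chosen. The Python sketch below illustrates that idea only; it does nothing for pre-compiled code that hides its generator, and the helper name `make_rng` is hypothetical.

```python
import random
import time


def make_rng(seed=None):
    """Return (rng, seed); if no seed is given, take one from the clock."""
    if seed is None:
        seed = time.time_ns()  # clock-based, but captured so it can be logged
    return random.Random(seed), seed


# First run: the seed comes from the clock and is kept with the results.
rng, used_seed = make_rng()
first_run = [rng.random() for _ in range(3)]

# Verification run: the recorded seed is passed back in.
replay_rng, _ = make_rng(used_seed)
replay = [replay_rng.random() for _ in range(3)]

print(first_run == replay)  # True
```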

I have the impression that a number of people are using the caret package for R ... shout out Thanks to Max Kuhn right here .... SALUTE !!! .... is it possible to explicitly set the seed for every modeling package for which caret is a wrapper? Are there any where there's an inaccessible, clock-based seed under the covers? Fact is, I don't know, and I absolutely do not want to HAVE to know about such details, and use that information to decide which modeling packages are safe to use for this contest, and which aren't.

The fact that I can't cite an example of a specific piece of s/w with hard-coded and inaccessible clock-based seed doesn't mean such s/w doesn't exist; I've never had to worry about that level of reproducibility before, so I haven't been accumulating a list of examples. But in an earlier stage of my career, I was a data analyst for a statistical consulting company, where most of the work we did was litigation support, intended to be used as evidence in trials. Our working assumption was always the other side could pay other people equally as proficient more money to go through our data and analysis line by line in an effort to discredit it ... which some did attempt .... and even in that adversarial environment where convictions meant jail or multi-million dollar fines, the issue of "exact reproducibility to 0.0000000001" never was an issue.

But here and now it is becoming an issue, because of the interpretation you all are putting on "exactly reproducible."

Has anyone actually in practice during this comp re-run their algorithm using a given random seed, and got different results?

Until this came up, most people probably weren't thinking they'd be wise to test their code on a different hardware platform. And most people probably eyeball their self-generated "Kaggle score" at 4 decimals of significance, not full double-digit precision. So here again, the absence of citable evidence is not compelling.

The practical difficulties of setting a specific difference threshold is not at all easy,

Really? How about "reproducible to within +/- 0.001% of the contestant's leaderboard score?" How hard is that? It's an arbitrary mgmt decision about how close is close enough to draw a reasonable conclusion that the code and algorithm you were given did in fact generate the predicted scores that were submitted. What you are trying to rule out is the possibility that the submitted scores were generated from illicit access to the uncensored, non-public HPN data, rather than a legitimate, predictive algorithm, constructed using the datasets made available to all contestants.
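As a rough illustration of how simple such a check could be, here is a Python sketch of the "+/- 0.001% of the leaderboard score" test proposed above. The function name and the example scores are made up for illustration; the 0.001% figure is the one suggested in this post, not an official rule.

```python
def within_tolerance(reproduced, submitted, rel_tol_percent=0.001):
    """Return True if the reproduced score is within +/- rel_tol_percent
    percent of the submitted leaderboard score."""
    allowed = abs(submitted) * rel_tol_percent / 100.0
    return abs(reproduced - submitted) <= allowed

# Hypothetical scores, for illustration only.
submitted_score = 0.461197
close_rerun = within_tolerance(0.461199, submitted_score)   # differs by about 0.000002
far_rerun = within_tolerance(0.461300, submitted_score)     # differs by about 0.000103
```

With these made-up numbers the allowed band is roughly +/- 0.0000046, so the first re-run passes and the second does not.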

and I haven't yet heard any practical reason as to why reproducible results should cause problems. (... snip snip ...) We're all very keen to make sure everyone is comfortable with this process. :)

(a) YOU haven't given any plausible reasons why a tolerance limit of +/- .0000x% is NOT acceptable,

and

(b) You have contestants, such as Sali Mali, who at one point was #1 on the leaderboard, telling you ("if it is the latter, then I guess this will be impossible for most people - an I for one am out"), and B Yang, currently ranked #6 ("I really hope you reconsider the exact reproducibility requirement. In theory it's possible, in practice it'll probably mean the winner has to send you the computer(s) he used. If he used cloud computing, forget about it.") that this is a significant issue to them. Feedback from your free labor pool is telling you that some are NOT comfortable with exact reproducibility to the last 0.0000000000001.

What you do with that information is entirely your mgmt decision, as it rightfully should be.

And how current and potential contestants react in turn to whatever decision you make is entirely their decision.

(I am not trying to be a hard- er "nose" on this ... but I AM trying to forcefully and clearly state why a reasonable tolerance limit is fair, practical, easy to administer, and accomplishes the essential requirements of the "reproducibility requirement" from the viewpoint of the client, your company, and the contestants.)

Post-edit comment: this was written at the same time Christopher Hefele was writing his post on the inconsistent (non-reproducible) effects when multiple threads are hitting the same stream from a number generator. An issue I had read about, but forgotten. Had I seen his post first, this one would have been much shorter !
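For anyone wondering what a workaround for the shared-stream problem might look like, here is a minimal Python sketch in which each task gets its own seeded generator instead of all threads drawing from one stream. The names `run_task` and `run_all`, the base seed and the way task seeds are derived are all assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_task(task_id, base_seed=2011):
    """Hypothetical unit of work: draws only from its own generator,
    seeded from the task id, never from a stream shared across threads."""
    rng = random.Random(base_seed * 1000003 + task_id)
    return sum(rng.random() for _ in range(1000))

def run_all(n_tasks, n_workers):
    """Run the tasks on a thread pool; map() returns results in task order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_task, range(n_tasks)))

one_worker = run_all(8, 1)
four_workers = run_all(8, 4)

# Results do not depend on how many threads were used.
thread_count_irrelevant = one_worker == four_workers
```

This only works when you control the code that draws the random numbers, which is the whole point of the complaint above about third-party software.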

 
Sali Mali's image Rank 4th
Posts 292
Thanks 113
Joined 22 Jun '10 Email user

Sarkis wrote:

Sali Mali wrote:

What if an alternative rule was to produce a solution that was more accurate than the nominated milestone prize winner. Would this be acceptable?  This would at least give some breathing space for those like me who are not as diligent.

I'm not sure if I understand this correctly. If you can produce a solution that is more accurate than the nominated milestone prize, why not do it now? 

What I mean is that to win the milestone you have to EXACTLY replicate the submitted file, to the 99th decimal place of every prediction. This will be pretty tough for most contestants and a complete nightmare for the organisers to police. I would have thought it would have been enough to submit a solution that generates a prediction file that at least gives the same accuracy, as this proves your method works, even though it might not be EXACTLY the same. And if the solution you submit is more accurate, then this also proves your method works - which is the whole point of the validation exercise.

I would hate for anyone to be disqualified if they submitted a solution that had the same accuracy, had a correlation of 0.99 but was not EXACTLY the same.

 

 

 
B Yang's image Rank 2nd
Posts 197
Thanks 46
Joined 12 Nov '10 Email user

Jeremy Howard (Kaggle) wrote:
Sorry if I'm missing something obvious - feel free to follow up with specific examples, questions, other issues, etc. We're all very keen to make sure everyone is comfortable with this process. :)

The main point is, this is a pointless requirement that requires a non-trivial amount of effort to meet, and we hate stuff like this.

You're right about random numbers and I won't use a hardware random number generator, but there're other issues.

For example, if you change a common function to improve performance, fix unrelated bugs, or add new features, then you have to rerun all models that use this function.

If you have a program that generates multiple models, then every time you change code for one model, you'll have to rerun the others too.

Multithreaded programs may need extra coding to make sure the results are the same regardless of the number of processors.

For software like R, you have to use the same version for all packages used. How can you guarantee all previous versions can be found ? Again, it'll end up requiring submitting the computer(s) used. What if the hardware got damaged during shipping ?

Jeremy Howard (Kaggle) wrote:
the loss of precision due to floating point arithmetic is deterministic.

I'm willing to accept it is deterministic, given the same sequence of instructions on the same processor. On different processors, well, maybe the results will still be the same after the first 10 million calculations, but I won't be surprised if they're not.

Also see this Intel document titled "Consistency of Floating Point Results or Why doesn’t my application always give the same answer?":

http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/

That's just one compiler and presumably doesn't cover AMD processors. If you read it carefully, even Intel doesn't guarantee exact reproducibility with its own compilers.
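A small Python sketch of the underlying effect: floating point addition is not associative, so the same three numbers added in a different order (as a different compiler, optimisation level or thread schedule might do) can differ in the last bit. This is a generic illustration, not an example taken from the Intel document.

```python
values = [0.1, 0.2, 0.3]

# Same numbers, two different orders of evaluation.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

orders_differ = left_to_right != right_to_left

# The two sums still agree once rounded to six decimal places.
agree_to_six_places = round(left_to_right, 6) == round(right_to_left, 6)

print(repr(left_to_right), repr(right_to_left))
```

Here the difference is in the 16th decimal place. Over millions of operations such differences can accumulate, which is why bit-for-bit equality across machines is much harder to promise than agreement to a few decimal places.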

 
Jeremy Howard (Kaggle)'s image Posts 166
Thanks 58
Joined 13 Oct '10 Email user
From Kaggle

Many thanks for all the thoughtful and interesting replies. (And Signipinnis: you don't need to convince me to listen to Sali Mali and B Yang - I have been beaten by both of them in past competitions so am well aware of their skills!)

Really, a key concern of mine is that I absolutely don't want to change the rules mid-competition. This has always been something that I've felt very strongly about in all competitions. As Chris Raimondi points out, all competitors would already need to be spending plenty of time on ensuring reproducibility, and we don't want to disadvantage those who have been working hard on this already by removing the requirement mid-comp. There's only about 10 days until the first milestone prize date arrives!

Rule 12 specifies that the judges will be looking at results of the algorithm to the sixth decimal place. It also specifies that your algorithm will be judged on its predictive accuracy. So if your submitted algorithm's predictive accuracy is exactly the same as your submitted entry's score to the sixth decimal place, then that meets the rules as they are written.

I suggest that all competitors who think they might be in the running for a progress prize ensure over the coming days that they can recreate their best model's performance to 6 decimal places - or if necessary, to re-run it using a fixed random seed and submit that instead, being careful to not select models for consideration that have results you can not reproduce.
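A competitor's self-check could be as small as the Python sketch below. It reads "to the sixth decimal place" as "the two scores are equal after rounding to six decimals", which is only one possible reading and is an assumption here; the function name and the example scores are also made up.

```python
def matches_to_six_places(reproduced, submitted):
    """Assumed interpretation: scores match if they are equal after
    rounding both to six decimal places."""
    return round(reproduced, 6) == round(submitted, 6)

# Hypothetical scores, for illustration only.
rerun_ok = matches_to_six_places(0.4611972, 0.4611968)    # both round to 0.461197
rerun_bad = matches_to_six_places(0.461199, 0.461197)     # differ in the sixth place
```

Note that a rounding-based check can separate two scores that sit on either side of a rounding boundary even when they are extremely close, so an absolute-difference threshold would be an equally defensible choice.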

Thanked by Chris Raimondi
 
Jeremy Howard (Kaggle)'s image Posts 166
Thanks 58
Joined 13 Oct '10 Email user
From Kaggle

Sali Mali wrote:

What I mean is that to win the milestone you have to EXACTLY replicate the submitted file, to the 99th decimal place of every prediction. This will be pretty tough for most contestants and a complete nightmare for the organisers to police.

Whilst I don't see why in practice this should be hard to either achieve or police, note that we don't require this. I can see looking back over this thread that we haven't previously made this clear, for which I apologise. The rules require simply that the accuracy can be replicated to 6 decimal places.

However, I suggest that competitors do check their reproducibility now, because you may be surprised at how hard it is to go back and reproduce a result if you didn't keep this requirement in mind throughout the process. Yes, I am speaking from experience as someone who has been burnt by this before!

Thanked by Sali Mali
 
Sali Mali's image Rank 4th
Posts 292
Thanks 113
Joined 22 Jun '10 Email user

Thanks Jeremy, this clarification now makes more sense.

One further question though: if the algorithm submitted to the judges, by an act of randomness, gave a more accurate score than was expected, would this count as having passed the test or would it be disqualified?

 
Jeremy Howard (Kaggle)'s image Posts 166
Thanks 58
Joined 13 Oct '10 Email user
From Kaggle

Sali Mali wrote:

One further question though: if the algorithm submitted to the judges, by an act of randomness, gave a more accurate score than was expected, would this count as having passed the test or would it be disqualified?

That would pass the test.

Thanked by Chris Raimondi , and Sali Mali
 
Sali Mali's image Rank 4th
Posts 292
Thanks 113
Joined 22 Jun '10 Email user

Sorry to harp on again, and if this question has already been answered then more apologies - but I think it is important to make sure everyone is on the same page about exactly what is required.

Say my ''predictive algorithm'' was logistic regression. Is it OK to just say we used logistic regression and these are the coefficients that the particular variant of logistic regression came up with, or do we need to provide the training data and the exact logistic regression solver used (fixed Hessian Newton, quasi-Newton, BFGS, conjugate gradient etc., all of which will give different answers), so that you can then replicate our modelling and come up with the exact same coefficients?

Or in other words, is it enough to just supply an ''equation'' that transforms the raw data into the prediction (which I guess is all HHP are interested in), or do we need to be able to replicate the algorithm that generated that equation?

For example, all my Tiberius models are built and then spit out SQL code that lets me score up the test set (you actually see a peek of this SQL in the Catalyst program!). This makes my submitted models 100% reproducible in the sense that it will transform the data into the prediction. Is submitting this SQL ''equation'' acceptable as the ''algorithm''?
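To illustrate what is meant by an ''equation'', here is a minimal Python sketch of scoring with a fixed set of logistic regression coefficients. The coefficient values and variable names are invented for illustration and have nothing to do with any actual model; the point is only that scoring involves no solver and no randomness, whereas re-fitting the coefficients does depend on the solver.

```python
import math

# Hypothetical fitted coefficients; invented values, not from a real model.
COEFFICIENTS = {"intercept": -1.5, "age": 0.02, "prior_claims": 0.3}

def score(row):
    """Apply the fixed logistic 'equation' to one row of raw data."""
    z = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in row.items()
    )
    return 1.0 / (1.0 + math.exp(-z))

member = {"age": 50, "prior_claims": 2}
first_run = score(member)
second_run = score(member)

# Scoring the same row twice gives the same prediction.
scoring_is_repeatable = first_run == second_run
```

Supplying only this much reproduces the predictions but says nothing about how the coefficients were obtained, which is the distinction the question is asking about.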

 

 
Sarkis's image Posts 41
Thanks 5
Joined 5 Apr '11 Email user

Sali Mali wrote:

...

For example, all my Tiberius models are built and then spit out SQL code that lets me score up the test set (you actually see a peek of this SQL in the Catalyst program!). This makes my submitted models 100% reproducible in the sense that it will transform the data into the prediction. Is submitting this SQL ''equation'' acceptable as the ''algorithm''?

To be eligible for a prize, the conditional winner must deliver the Prediction Algorithm’s code. In other words, it is not enough to just supply an ''equation'' that transforms the raw data into the prediction. Rule 12 provides more details about this - http://www.heritagehealthprize.com/c/hhp/Details/Rules

Once an Entry is selected as eligible for a prize, the conditional winner must deliver the Prediction Algorithm’s code and documentation to Sponsor for verification within 21 days. Documentation must be written in English and must be written so that individuals trained in computer science can replicate the winning results. Source code must contain a description of resources required to build and run the method. Conditional winners must be available to provide assistance to the judges verifying their Entries. Sponsor may require conditional winners to submit computer hardware or a virtual machine instance that runs the Prediction Algorithm’s code. If the judges cannot verify an Entry using the Prediction Algorithm after two attempts, Sponsor reserves the right to disqualify the Entry. Sponsor also reserves the right to test winning Entries on additional data sets. If a Prediction Algorithm fails to produce similar accuracy on any such additional data set, Sponsor may in its sole discretion disqualify the Entry.

See also: http://www.heritagehealthprize.com/c/hhp/forums/t/349/external-data

 
Bobby's image Posts 5
Joined 14 Dec '10 Email user

Not to drive this thread into the ground with specifics, as I may be in a minority here (so feel free to ignore this concern), but what if reproducing the result of the methodology is outright impractical? I have software that creates (complex) models, which evolve their own unique (and dynamic) rules for evaluating and predicting data. The internal working data is nonsense to anyone who looks at it, but it is indeed a self-contained system that can produce readable results. Forget random numbers: re-creating this software and having it recreate the exact model, with the exact optimized rules for evaluating and predicting, is improbable.

 
Sali Mali's image Rank 4th
Posts 292
Thanks 113
Joined 22 Jun '10 Email user

Sarkis wrote:

Sali Mali wrote:

...

For example, all my Tiberius models are built and then spit out SQL code that lets me score up the test set (you actually see a peek of this SQL in the Catalyst program!). This makes my submitted models 100% reproducible in the sense that it will transform the data into the prediction. Is submitting this SQL ''equation'' acceptable as the ''algorithm''?

To be eligible for a prize, the conditional winner must deliver the Prediction Algorithm’s code. In other words, it is not enough to just supply an ''equation'' that transforms the raw data into the prediction. Rule 12 provides more details about this - http://www.heritagehealthprize.com/c/hhp/Details/Rules

Once an Entry is selected as eligible for a prize, the conditional winner must deliver the Prediction Algorithm’s code and documentation to Sponsor for verification within 21 days.

 

Is it possible to get an official definition of what is meant by Prediction Algorithm's code? To me this means the code that converts the raw data into the prediction.

If we are required to take it to the next level, then where do you stop with the definition of code?

If we use R, is it enough to supply an R model object that can score the data, or do we need to supply the R code that generated the object?

 

 

 
