Clarifying Rule #13 for Milestone 1 - Open Questions & Issues

Sali Mali (Rank 4th, Posts 292, Thanks 113, Joined 22 Jun '10)

Sorry to harp on again, and apologies if this question has already been answered, but I think it is important to make sure everyone has the same understanding of exactly what is required.

Say my ''predictive algorithm'' was logistic regression. Is it OK to just say we used logistic regression and these are the coefficients that the particular variant of logistic regression came up with? Or do we need to provide the training data and the exact logistic regression solver used (fixed Hessian Newton, quasi-Newton, BFGS, conjugate gradient etc., all of which will give different answers), so that you can then replicate our modelling and arrive at the exact same coefficients?

Or in other words, is it enough to just supply an ''equation'' that transforms the raw data into the prediction (which I guess is all HHP are interested in), or do we need to be able to replicate the algorithm that generated that equation?
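To make the distinction concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the toy data, the function names `score` and `fit`, and the choice of plain batch gradient descent as the solver are all hypothetical and not taken from any actual entry. `score` is the ''equation'' (fixed coefficients in, prediction out), while `fit` is the ''algorithm'' that produces those coefficients from training data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(coeffs, features):
    """The 'equation': apply fixed coefficients to one raw record.

    coeffs[0] is the intercept; coeffs[1:] line up with features.
    """
    z = coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], features))
    return sigmoid(z)


def fit(X, y, lr=0.1, epochs=2000):
    """The 'algorithm': derive coefficients from training data.

    Plain batch gradient descent is an assumed solver for illustration.
    A different solver, learning rate, or iteration count would generally
    yield slightly different coefficients.
    """
    n_features = len(X[0])
    coeffs = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for features, label in zip(X, y):
            err = score(coeffs, features) - label
            grad[0] += err
            for j, x in enumerate(features):
                grad[j + 1] += err * x
        coeffs = [c - lr * g / len(X) for c, g in zip(coeffs, grad)]
    return coeffs


# Toy training data (hypothetical): one feature, binary outcome.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

coeffs = fit(X, y)                               # needs data + solver settings
predictions = [score(coeffs, row) for row in X]  # needs only the coefficients
```

Supplying only `coeffs` and `score` lets someone reproduce the predictions; reproducing `coeffs` themselves additionally requires the training data and every solver setting passed to `fit`.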

For example, all my Tiberius models are built and then spit out SQL code that lets me score up the test set (you actually see a peek of this SQL in the Catalyst program!). This makes my submitted models 100% reproducible in the sense that the SQL will transform the data into the prediction. Is submitting this SQL ''equation'' acceptable as the ''algorithm''?
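As a rough illustration of what such a scoring ''equation'' could look like, the Python sketch below turns a set of logistic regression coefficients into a SQL scoring statement. This is not Tiberius's actual output: the function name `to_sql`, the table name `claims`, and the columns `member_id`, `age` and `prior_claims` are all hypothetical.

```python
def to_sql(intercept, weights, table="claims"):
    """Emit a SQL statement that scores every row with fixed coefficients.

    `weights` maps a (hypothetical) column name to its coefficient.
    The generated SQL contains no trace of how the coefficients were fitted.
    """
    terms = " + ".join(f"({w!r} * {col})" for col, w in weights.items())
    linear = f"{intercept!r} + {terms}" if terms else repr(intercept)
    return (
        f"SELECT member_id, 1.0 / (1.0 + EXP(-({linear}))) AS prediction "
        f"FROM {table};"
    )


# Hypothetical coefficients, for illustration only.
sql = to_sql(-1.5, {"age": 0.03, "prior_claims": 0.4})
print(sql)
```

The statement reproduces the predictions exactly, but nothing in it records the training data or the solver that produced the numbers, which is the gap the question is about.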

 

 
Sarkis (Posts 41, Thanks 5, Joined 5 Apr '11)

Sali Mali wrote:

...

For example, all my Tiberius models are built and then spit out SQL code that lets me score up the test set (you actually see a peek of this SQL in the Catalyst program!). This makes my submitted models 100% reproducible in the sense that the SQL will transform the data into the prediction. Is submitting this SQL ''equation'' acceptable as the ''algorithm''?

To be eligible for a prize, the conditional winner must deliver the Prediction Algorithm’s code. In other words, it is not enough to just supply an ''equation'' that transforms the raw data into the prediction. Rule 12 provides more details about this - http://www.heritagehealthprize.com/c/hhp/Details/Rules

Once an Entry is selected as eligible for a prize, the conditional winner must deliver the Prediction Algorithm’s code and documentation to Sponsor for verification within 21 days. Documentation must be written in English and must be written so that individuals trained in computer science can replicate the winning results. Source code must contain a description of resources required to build and run the method. Conditional winners must be available to provide assistance to the judges verifying their Entries. Sponsor may require conditional winners to submit computer hardware or a virtual machine instance that runs the Prediction Algorithm’s code. If the judges cannot verify an Entry using the Prediction Algorithm after two attempts, Sponsor reserves the right to disqualify the Entry. Sponsor also reserves the right to test winning Entries on additional data sets. If a Prediction Algorithm fails to produce similar accuracy on any such additional data set, Sponsor may in its sole discretion disqualify the Entry.

See also: http://www.heritagehealthprize.com/c/hhp/forums/t/349/external-data

 
Bobby (Posts 5, Joined 14 Dec '10)

Not to drive this thread into the ground with specifics, as I may be in a minority here (so feel free to ignore this concern), but what if reproducing the result of the methodology is outright impractical? I have software that creates (complex) models, which evolve their own unique (and dynamic) rules for evaluating and predicting data. The internal working data would look like nonsense to anyone who examined it, but it is indeed a self-contained system that can produce readable results. Forget random numbers: re-creating this software and having it rebuild the exact model, with the exact optimized rules for evaluating and predicting, is improbable.

 
Sali Mali (Rank 4th, Posts 292, Thanks 113, Joined 22 Jun '10)

Sarkis wrote:

Sali Mali wrote:

...

For example, all my Tiberius models are built and then spit out SQL code that lets me score up the test set (you actually see a peek of this SQL in the Catalyst program!). This makes my submitted models 100% reproducible in the sense that the SQL will transform the data into the prediction. Is submitting this SQL ''equation'' acceptable as the ''algorithm''?

To be eligible for a prize, the conditional winner must deliver the Prediction Algorithm’s code. In other words, it is not enough to just supply an ''equation'' that transforms the raw data into the prediction. Rule 12 provides more details about this - http://www.heritagehealthprize.com/c/hhp/Details/Rules

Once an Entry is selected as eligible for a prize, the conditional winner must deliver the Prediction Algorithm’s code and documentation to Sponsor for verification within 21 days.

 

Is it possible to get an official definition of what is meant by the Prediction Algorithm's code? To me this means the code that converts the raw data into the prediction.

If we are required to take it to the next level, then where does the definition of code stop?

If we use R, is it enough to supply an R model object that can score the data, or do we need to supply the R code that generated the object?

 

 

 