"Once an Entry is selected as eligible for a prize, the conditional winner must deliver the Prediction Algorithm’s code and documentation to Sponsor for verification within 21 days. Documentation must be written in English and must be written so that individuals trained in computer science can replicate the winning results. Source code must contain a description of resources required to build and run the method. Conditional winners must be available to provide assistance to the judges verifying their Entries"

Consider an entry that uses a search heuristic (e.g. a genetic algorithm, hill climbing, etc.) to do pre-processing. For example, one might use a search heuristic to generate 50 pairs of feature subsets and classifier choices based on some fitness measure. Then those 50 constituent classifiers are trained on their assigned subsets of features and used to generate predictions. Finally, the 50 sets of predictions are combined using an ensemble method such as stacked regression.
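To make that concrete, here is a minimal sketch of such a pipeline in Python (scikit-learn on synthetic data). The search budget, subset size, fitness measure, and classifier candidates are all placeholder assumptions, not anyone's actual method:

```python
# Hypothetical sketch of the pipeline described above -- not a real entry.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score, train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def heuristic_search(X, y, candidates, budget=20):
    """Stand-in for a genetic/hill-climbing search: score random
    (feature subset, classifier) pairs and keep the fittest one."""
    best = None
    for _ in range(budget):
        subset = rng.choice(X.shape[1], size=10, replace=False)
        clf = candidates[rng.integers(len(candidates))]()
        fitness = cross_val_score(clf, X[:, subset], y, cv=3).mean()
        if best is None or fitness > best[0]:
            best = (fitness, subset, clf)
    return best[1], best[2]

# 1. Pre-processing: 50 searches, each yielding a (feature subset, classifier) pair.
pairs = [heuristic_search(X_tr, y_tr, [RandomForestClassifier, LogisticRegression])
         for _ in range(50)]

# 2. Each constituent classifier produces out-of-fold predictions on the
#    training data (to fit the stacker) and predictions on the test data.
level1_tr = np.column_stack([
    cross_val_predict(clf, X_tr[:, s], y_tr, cv=3, method="predict_proba")[:, 1]
    for s, clf in pairs])
level1_te = np.column_stack([
    clf.fit(X_tr[:, s], y_tr).predict_proba(X_te[:, s])[:, 1]
    for s, clf in pairs])

# 3. Stacked regression: a simple meta-model over the 50 prediction columns.
stacker = LogisticRegression().fit(level1_tr, y_tr)
final_predictions = stacker.predict_proba(level1_te)[:, 1]
```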

The ambiguity is that there doesn't seem to be any restriction on the computing resources required to recreate a winning entry.

For example, suppose:

1) each heuristic search required 24 hours processing time (on a high-end workstation).

2) training each constituent classifier required 2 hours processing time.

3) the final ensemble required 4 hours processing time.

If the Sponsor required full recreation of the winning entry from the contest data, then 50 × 24 + 50 × 2 + 4 = 1304 computing hours are required, or about 54 days. Is it reasonable to assume that the Sponsor will wait almost 2 months for verification? The rules are unclear here!

If the Sponsor required only recreation from the selected feature subsets and classifier choices (using the output of the preprocessing steps as a starting point), then verification is reduced to 50 × 2 + 4 = 104 hours, or about 4 1/3 days. Or perhaps it is sufficient to also validate only a small sample of the preprocessing steps, which could be completed in a few days. But again, the rules are unclear here.

This also ignores the impact of parallel computing: perhaps the Sponsor has ample computing resources and can fully recreate the winning entry in 48 hours using a farm of 100 workstations. But is it the contestant's or the Sponsor's responsibility to identify and implement the parallel computations?
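If it were the contestant's responsibility, the 50 pre-processing searches are at least independent of one another, so they would parallelize trivially. A sketch using Python's standard library, where run_search is a hypothetical stand-in for one heuristic search:

```python
# Sketch: 50 independent searches map cleanly onto a process pool, so
# wall-clock time falls toward the 24-hour critical path of one search.
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def run_search(seed):
    """Hypothetical stand-in for one heuristic search, seeded per worker.
    (Per-worker seeding is exactly what alters the random streams relative
    to a single-core run -- see the reply below.)"""
    rng = np.random.default_rng(seed)
    return rng.random()  # placeholder for a (feature subset, classifier) pair

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(run_search, range(50)))
```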

Basically, I don't want to run a program on my workstation for 2 months to generate an entry and be disqualified on a technicality.

Thanks!

Jim

Let me simplify the question.

Is there a restriction on the computing time required to recreate a winning entry?

It's feasible, for example, that the computer program that created a winning entry could require months to complete when run on a high-end workstation.

Thanks

Jim

There's no ambiguity at all: there is no rule in this contest limiting the run-time of a solution that is being evaluated as a potential winner.

While the sponsor or judges could implement a parallel approach for evaluation, it has been shown rather convincingly that a parallel/multi-core approach will distribute numbers from a "random number" function differently than if a single core is involved. Thus, if any algorithms that use any kind of random function for sampling or initialization are involved, multi-core results will not be precisely the same as the algorithm run on a single core. Could that be a 4th-decimal-place difference, or a 12th-decimal-place difference? We don't know.

If the judges tested a multi-core implementation of a team's single-core solution and did NOT "exactly" reproduce the original reported results, could they unequivocally state that the discrepancy was not caused by the inherent noise introduced by their multi-core re-do of the original? Possibly not. For that reason, it seems likely that the judges will run the submitted algorithm exactly "as is," although I assume they would look for the fastest available comparable hardware platform.
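A quick toy illustration of the effect (my own sketch, nothing from the rules or judges): a single seeded stream and per-worker streams derived from the same base seed produce different draws, so any randomized sampling or initialization can diverge between the two setups.

```python
# Toy demo: one seeded stream vs. the same work split across two per-worker
# streams. The draws differ, so a randomized algorithm can diverge too.
import numpy as np

seed = 42
single = np.random.default_rng(seed).random(4)        # single core: one stream
multi = np.concatenate([
    np.random.default_rng([seed, w]).random(2)        # two "cores": two streams
    for w in range(2)])

print(single)
print(multi)   # different values than `single`, despite the same base seed
```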
