Hi,
a quick question about the test data set.
The supplied training data consists of 113,000 records (counted per member). The entries you submit to the public leaderboard consist of 70,942 records (a subset of the members). When uploading, it says that the evaluation is performed on 30% of the available test data, and that the final evaluation will be performed on the other 70%. If those 70,942 entries are the 30%, then the full test data must contain around 236,000 members (70,942 / 0.3), if I am calculating correctly.
I don't understand these numbers. If we only have data for 113,000 members, and 70,942 of these will not be used for the final evaluation, that leaves us with 42,058 members. How can we then submit a prediction for 236,000 members? Will there be another data set for the final evaluation? And if so, why do we only use 70,942 members for the leaderboard, and not the other 42,058?
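To make my arithmetic explicit, here is a small sketch of the calculation. It assumes that the 70,942 submitted entries are exactly the 30% used for the public leaderboard, which may well be where my reasoning goes wrong. The variable names are just for illustration.

```python
# Figures quoted in this post; names are illustrative only.
total_members = 113_000        # members in the supplied data
leaderboard_members = 70_942   # members in a leaderboard submission
public_percent = 30            # share of test data scored publicly

# Members left over if the leaderboard members are excluded.
remaining_members = total_members - leaderboard_members

# ASSUMPTION: the 70,942 entries are the 30% public share, so the
# full test set would be 70,942 / 0.30.
implied_test_size = leaderboard_members * 100 / public_percent

print(remaining_members)
print(round(implied_test_size))
```

Under that assumption the implied test set is larger than the whole supplied data set, which is the contradiction I am asking about.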
Thank you, Julian