AMULET Analytics
Posts 4
Joined 14 Jul '12

Question about predicted values:

In the DaysInHospital_Y2 and _Y3 training data files, the DaysInHospital field values (the predicted field) are all discrete values like 0, 1, 2, 3, ... but in the SampleEntry file, which is supposed to be the model of what we need to submit, the DaysInHospital field has real-valued numbers, e.g. 0.203322991. So how does the predictor algorithm yield real-valued numbers if our training data is all discrete?

 
Chris Raimondi
Rank 38th
Posts 194
Thanks 90
Joined 9 Jul '10
Different algos will make different predictions. You could use an algo that only makes discrete predictions, but the scoring method is RMSLE, so you will often be better off making a prediction between 0 and 1 (for example) rather than picking one or the other. The ACTUAL days in hospital are still discrete values; it is only the predictions that are not (they can be if you want). As you are almost never 100% sure of a prediction, the scoring method basically forces you to pick a non-discrete number to minimize the error. Most regression algos don't give discrete values anyway, so unless you are using a classification algo you will get non-discrete values.
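To make the RMSLE point concrete, here is a minimal sketch in Python. The ten outcomes are made up for illustration, and rmsle is a hand-written helper using the usual log(1 + x) form of the metric, not code from the competition.

```python
import math

def rmsle(predicted, actual):
    """Root mean squared logarithmic error over paired lists."""
    total = sum((math.log1p(p) - math.log1p(a)) ** 2
                for p, a in zip(predicted, actual))
    return math.sqrt(total / len(actual))

# Hypothetical outcomes: 8 of 10 members spend 0 days in hospital, 2 spend 1 day.
actual = [0] * 8 + [1] * 2

# Two discrete strategies: predict 0 for everyone, or 1 for everyone.
all_zero = rmsle([0.0] * 10, actual)
all_one = rmsle([1.0] * 10, actual)

# The single constant that minimises RMSLE is exp(mean(log(1 + a))) - 1,
# which lands strictly between 0 and 1 for this data.
best_const = math.expm1(sum(math.log1p(a) for a in actual) / len(actual))
fractional = rmsle([best_const] * 10, actual)

print(all_zero, all_one, best_const, fractional)
```

On this toy data the fractional constant scores lower (better) than either whole number, which is the reason a submission file full of values like 0.203322991 is normal.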
 
AMULET Analytics
Posts 4
Joined 14 Jul '12

So we can use the training set for Y1 (which includes the training examples plus the predicted variable DaysInHospital) to predict the Y2 value for DaysInHospital based on the Y2 training examples. We can check the performance of our algorithm this way.

We can repeat this process using the Y2 training examples and the predicted variable DaysInHospital to predict the Y3 value for DaysInHospital based on the Y3 training examples.

The question is what examples (feature variables) do we use to take the next step? We'd need Y4 examples to run through the machine learning algorithm to predict the Y4 DaysInHospital, yes?

What am I missing?

 
ChipMonkey
Rank 84th
Posts 60
Thanks 14
Joined 20 Mar '11

Following your model:
Claims in Y1 predict DaysInHospital for Y2 (train/test data)
Claims in Y2 predict DaysInHospital for Y3 (train/test data)
Claims in Y3 predict DaysInHospital for Y4 (this becomes your submission)

So, you need Y3 (not Y4) Claims data (which is provided) to predict Y4 DaysInHospital.
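The year-shifted scheme above could be sketched roughly as follows. Everything here is an assumption for illustration: the member IDs and counts are toy data, a single "number of claims" feature stands in for whatever you derive from the Claims file, and the one-variable least-squares line is just a placeholder for a real model.

```python
# Hypothetical toy data: claims[year][member_id] = number of claims that year.
claims = {
    "Y1": {1: 0, 2: 5, 3: 2, 4: 1},
    "Y2": {1: 1, 2: 6, 3: 0, 4: 2},
    "Y3": {1: 0, 2: 4, 3: 3, 4: 1},
}
# days[year][member_id] = DaysInHospital that year. Y4 is what gets submitted.
days = {
    "Y2": {1: 0, 2: 3, 3: 0, 4: 1},
    "Y3": {1: 0, 2: 2, 3: 1, 4: 0},
}

def pairs(feature_year, target_year):
    """Pair each member's claims in one year with days in hospital the next."""
    return [(claims[feature_year][m], days[target_year][m])
            for m in days[target_year] if m in claims[feature_year]]

def fit_line(data):
    """Ordinary least squares for y = a + b * x on (x, y) pairs."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in data)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = cov / var_x
    return mean_y - slope * mean_x, slope

def predict(model, x):
    intercept, slope = model
    return max(0.0, intercept + slope * x)  # days cannot be negative

# 1. Fit on Y1 claims -> Y2 days, then check on Y2 claims -> Y3 days.
model = fit_line(pairs("Y1", "Y2"))
validation = [(predict(model, x), y) for x, y in pairs("Y2", "Y3")]

# 2. Refit on both labelled year pairs, then score Y3 claims to get Y4.
model = fit_line(pairs("Y1", "Y2") + pairs("Y2", "Y3"))
submission = {m: predict(model, x) for m, x in claims["Y3"].items()}
```

The validation list holds (prediction, actual) pairs for Y3, so you can score it before committing to a Y4 submission.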

ObHint: You can also use a combination of Y1+Y2 data to train predictions about Y3 and then run those predictions against Y2+Y3 data to predict Y4. The best published solutions so far have included multiple year approaches like this.
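A minimal sketch of how the two-year windows line up, again with made-up claim counts and a hypothetical helper name (two_year_features). It only builds the feature rows; the model that consumes them is up to you.

```python
# Hypothetical toy data: claims[year][member_id] = number of claims that year.
claims = {
    "Y1": {1: 0, 2: 5, 3: 2},
    "Y2": {1: 1, 2: 6, 3: 0},
    "Y3": {1: 0, 2: 4, 3: 3},
}
# DaysInHospital in Y3 is the label for the Y1+Y2 window.
days_y3 = {1: 0, 2: 2, 3: 1}

def two_year_features(prev_year, last_year):
    """Build (claims two years back, claims last year) for members in both years."""
    shared = claims[prev_year].keys() & claims[last_year].keys()
    return {m: (claims[prev_year][m], claims[last_year][m]) for m in sorted(shared)}

# Training rows: Y1+Y2 features, labelled with Y3 DaysInHospital.
train_features = two_year_features("Y1", "Y2")
train_rows = [(train_features[m], days_y3[m]) for m in train_features if m in days_y3]

# Submission rows: the same feature layout shifted one year, used to predict Y4.
submit_features = two_year_features("Y2", "Y3")
```

One consequence worth noting: with two-year windows only Y3 has labels in this layout, so any validation has to come from held-out members rather than a held-out year.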

 
