# Predicted values

Posts 4 | Joined 14 Jul '12

Question about predicted values: in the DaysInHospital_Y2 and _Y3 training data files, the DaysInHospital values (the field to be predicted) are all discrete, like 0, 1, 2, 3, ... But in the SampleEntry file, which is supposed to be the model of what we need to submit, the DaysInHospital field holds real-valued numbers, e.g. 0.203322991. So how does the prediction algorithm yield real-valued numbers if our training data is all discrete?

#1 / Posted 10 months ago
Rank 38th | Posts 194 | Thanks 90 | Joined 9 Jul '10

Different algos will make different predictions. You could use an algo that only makes discrete predictions, but the scoring method is RMSLE, so you will often be better off making a prediction between 0 and 1 (for example) rather than picking one or the other.

The ACTUAL days in hospital are still discrete values. It is only the predictions that are not (though they can be if you want). Because you are almost never 100% sure of a prediction, the scoring method basically forces you to pick a non-discrete number to minimize the error.

Most regression algos don't give discrete values anyway, so unless you are using a classification algo you will get non-discrete values.

#2 / Posted 10 months ago
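To make the RMSLE point in reply #2 concrete, here is a minimal sketch. It assumes the usual definition of RMSLE (root mean squared difference of `log(1 + value)`, natural log); the four-member `actual` list is made-up toy data, and `rmsle` and `best_constant` are illustrative helper names, not part of any competition code.

```python
import math


def rmsle(predicted, actual):
    """Root mean squared logarithmic error (assumed standard definition)."""
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared) / len(squared))


def best_constant(actual):
    """Single prediction that minimizes RMSLE when used for every member.

    Squared error in log space is minimized by the mean of log(1 + a),
    so we average in log space and transform back.
    """
    mean_log = sum(math.log1p(a) for a in actual) / len(actual)
    return math.expm1(mean_log)


# Toy data: three members with 0 days in hospital, one member with 1 day.
actual = [0, 0, 0, 1]

score_zero = rmsle([0] * len(actual), actual)   # always predict 0
score_one = rmsle([1] * len(actual), actual)    # always predict 1
p = best_constant(actual)                       # a fractional value
score_frac = rmsle([p] * len(actual), actual)

print(f"predict 0: {score_zero:.4f}")
print(f"predict 1: {score_one:.4f}")
print(f"predict {p:.4f}: {score_frac:.4f}")
```

On this toy data the fractional prediction scores better than either discrete choice, which is the effect described above: when you cannot tell the members apart, hedging between 0 and 1 costs less than committing to one of them.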
Posts 4 | Joined 14 Jul '12

So we can use the training set for Y1 (which includes the training examples plus the predicted variable DaysInHospital) to predict the Y2 value of DaysInHospital from the Y2 training examples, and check the performance of our algorithm this way. We can repeat the process with the Y2 training examples and the predicted variable DaysInHospital to predict the Y3 value of DaysInHospital from the Y3 training examples.

The question is: which examples (feature variables) do we use to take the next step? We'd need Y4 examples to run through the machine learning algorithm to predict the Y4 DaysInHospital, yes? What am I missing?

#3 / Posted 9 months ago
Rank 84th | Posts 60 | Thanks 14 | Joined 20 Mar '11

Following your model:

- Claims in Y1 predict DaysInHospital for Y2 (train/test data)
- Claims in Y2 predict DaysInHospital for Y3 (train/test data)
- Claims in Y3 predict DaysInHospital for Y4 (this becomes your submission)

So, you need Y3 (not Y4) Claims data (which is provided) to predict Y4 DaysInHospital.

ObHint: You can also use a combination of Y1+Y2 data to train predictions about Y3 and then run those predictions against Y2+Y3 data to predict Y4. The best published solutions so far have included multiple year approaches like this.

#4 / Posted 9 months ago
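The year shift described in reply #4 can be sketched as follows. Everything here is hypothetical: the `claims` and `days` dictionaries are toy stand-ins for the real files, the only feature is a claim count, and the "model" is just a per-bucket average in log space. It pools the Y1-to-Y2 and Y2-to-Y3 pairs for training and then scores Y3 claims to get Y4 predictions; it does not implement the multi-year feature idea from the hint.

```python
import math
from collections import defaultdict

# Hypothetical toy data: claims[year][member_id] = number of claims that year.
claims = {
    "Y1": {1: 0, 2: 5, 3: 2},
    "Y2": {1: 1, 2: 4, 3: 0},
    "Y3": {1: 0, 2: 5, 3: 2},
}
# Hypothetical targets: days[year][member_id] = DaysInHospital in that year.
days = {
    "Y2": {1: 0, 2: 3, 3: 0},
    "Y3": {1: 0, 2: 2, 3: 0},
}


def shifted_pairs(feature_year, target_year):
    """Pair each member's claims in one year with their days in the next."""
    return [
        (claims[feature_year][m], days[target_year][m])
        for m in sorted(days[target_year])
    ]


def bucket(claim_count):
    """Crude illustrative feature: few claims vs. many claims."""
    return "high" if claim_count >= 3 else "low"


def fit(pairs):
    """Average log(1 + days) per bucket, which suits an RMSLE-style score."""
    logs = defaultdict(list)
    for claim_count, target in pairs:
        logs[bucket(claim_count)].append(math.log1p(target))
    return {b: sum(v) / len(v) for b, v in logs.items()}


def predict(model, claim_count):
    """Transform the bucket's log-space average back to days."""
    return math.expm1(model[bucket(claim_count)])


# Train on Y1 -> Y2 and Y2 -> Y3, where the outcome is known.
model = fit(shifted_pairs("Y1", "Y2") + shifted_pairs("Y2", "Y3"))

# Apply to Y3 claims: these predictions are for Y4 and form the submission.
submission = {m: predict(model, c) for m, c in claims["Y3"].items()}
print(submission)
```

The key point is that no Y4 features are ever needed: the model learns the relationship "claims in year N, days in year N+1" from the years where both sides are known, then applies it to the latest year of claims.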