# Missing values in DayInHospital_Y2

Posts 10 Joined 10 Jan '11 I'm wondering why there would be missing values for some patients in the DayInHospital_Y2 table? Wouldn't a missing value imply zero days in hospital? #1 / Posted 2 years ago
Posts 2 Joined 4 Apr '11 Perhaps death of the patient, or they dropped out of the study or left the network. This is annoying, however. I wish they had padded the data with NaN or NA. #2 / Posted 2 years ago
Rank 31st Posts 68 Thanks 25 Joined 21 Oct '10 Might be death, based on the profiles. I am, however, confused by the CharlsonIndex. Is it correct that the values are dates? #3 / Posted 2 years ago
Posts 4 Thanks 3 Joined 20 Mar '11 I'm betting those turn out to be people new to the network in Y2. But I guess we'll have to wait for the Y2 claims data to find out. #4 / Posted 2 years ago
Jeremy Howard (Kaggle) Kaggle Admin Posts 166 Thanks 58 Joined 13 Oct '10 Charlson Index info: http://en.wikipedia.org/wiki/Comorbidity. The data has ranges for each record, rather than exact values. #5 / Posted 2 years ago
Posts 1 Joined 4 Apr '11 If you see dates for the Charlson Index, that's probably Excel "helpfully" trying to interpret a "1-2" as "2-Jan". #6 / Posted 2 years ago
Posts 4 Joined 5 Apr '11 Well, one question regarding missing data in Y4 is: if a person dies during a certain year (or leaves the network, etc.), how will we be scored on our prediction for that person? #7 / Posted 2 years ago
Posts 5 Joined 5 Apr '11 So how are these nulls treated in terms of scoring? Should we assign them 0, or just disregard them? #8 / Posted 2 years ago
Rank 31st Posts 68 Thanks 25 Joined 21 Oct '10 cmgast wrote: If you see dates for the Charlson Index, that's probably Excel "helpfully" trying to interpret a "1-2" as "2-Jan". Yes, thanks heaps! #9 / Posted 2 years ago
Posts 1 Thanks 1 Joined 5 Apr '11 @cmgast, that has to be it. I looked at the raw CSV and I see "1-2". Same with "10-19" in the "Members_Y1" table, which gets interpreted by Excel as "19-Oct". Thanked by James Cunningham #10 / Posted 2 years ago
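One way to sidestep the spreadsheet date conversion discussed above is to read the raw CSV programmatically, where a value such as "1-2" stays a plain string. The following is a minimal sketch using only Python's standard library. The sample rows, the `MemberID` column name, and the helper names `parse_range` and `read_ranges` are assumptions for illustration; only the `CharlsonIndex` column and the "1-2" and "10-19" values come from the thread.

```python
import csv
import io

# Hypothetical sample standing in for the raw CSV. Only the CharlsonIndex
# column name and the "1-2" value are taken from the thread.
SAMPLE_CSV = "MemberID,CharlsonIndex\n101,1-2\n102,0\n103,3-4\n"


def parse_range(text):
    """Turn a range string such as "1-2" into a (low, high) tuple of ints.

    A bare number such as "0" is treated as a one-point range (an assumption).
    """
    low, sep, high = text.strip().partition("-")
    if not sep:
        return int(low), int(low)
    return int(low), int(high)


def read_ranges(handle, column="CharlsonIndex"):
    # csv.DictReader yields every field as a plain string, so "1-2" is
    # never reinterpreted as a date the way a spreadsheet might do.
    return [parse_range(row[column]) for row in csv.DictReader(handle)]


ranges = read_ranges(io.StringIO(SAMPLE_CSV))
```

With a real file, `read_ranges(open(path, newline=""))` would work the same way, assuming the file has a header row containing the chosen column.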
Anthony Goldbloom (Kaggle) Competition Admin Kaggle Admin Posts 382 Thanks 72 Joined 20 Jan '10 Apologies, this was an error. Thanks for drawing our attention to it. The missing values are for those people who have been in hospital for more than two weeks. They should be replaced with a 15. You can either do this yourself or download the updated dataset. For information, members who have been in hospital for more than two weeks have been grouped for privacy reasons (they are rare, so may otherwise be identifiable). The implication of this grouping is that if you expect somebody to be in hospital for more than two weeks, you should predict 15 days. This grouping should not have a big impact because: a. members who are in hospital for more than two weeks are rare (about one per cent of members); b. the evaluation metric favors algorithms that accurately predict fewer days in hospital (on the assumption that these are more preventable). Thanked by Eu Jin Lok, ogenex, and Ken L. #11 / Posted 2 years ago
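For anyone patching the original file by hand rather than downloading the updated dataset, the replacement described above might be sketched as follows. This assumes the missing values appear as empty fields and that the table has a member identifier column, here called `memberid`; both are assumptions, as are the sample rows and the helper names `load_days` and `clip_prediction`. Only the `DayInHospital_Y2` name and the value 15 come from the thread.

```python
import csv
import io

CAP_DAYS = 15  # stands for "more than two weeks", per the admin's post

# Hypothetical rows. The memberid column name and the empty-field encoding
# of a missing value are assumptions, not confirmed by the thread.
SAMPLE_CSV = "memberid,DayInHospital_Y2\n1,0\n2,\n3,4\n"


def load_days(handle, column="DayInHospital_Y2"):
    """Map member id to days in hospital, filling blanks with CAP_DAYS."""
    days = {}
    for row in csv.DictReader(handle):
        raw = row[column].strip()
        days[row["memberid"]] = int(raw) if raw else CAP_DAYS
    return days


def clip_prediction(value):
    """Keep a predicted day count inside the range the targets can take."""
    return min(max(float(value), 0.0), float(CAP_DAYS))


days = load_days(io.StringIO(SAMPLE_CSV))
```

Since no target can exceed 15 after the grouping, clipping predictions to the range 0 to 15 is a natural companion step, though whether it helps a given model's score is not something the thread establishes.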
Posts 10 Joined 10 Jan '11 Excellent, thanks for the clarification. Glad I continued to monitor this thread, otherwise I may have missed it. Interesting implication for the range of predicted values; I guess it is nice to have a cap on hospital duration. #12 / Posted 2 years ago