Missing values in DayInHospital_Y2

ogenex's image Posts 10
Joined 10 Jan '11 Email user

I'm wondering why there would be missing values for some patients in the DayInHospital_Y2 table? Wouldn't a missing value imply zero days in hospital?

 
noiin's image Posts 2
Joined 4 Apr '11 Email user
Perhaps the patient died, dropped out of the study, or left the network. This is annoying, however. I wish they had padded the data with NaN or NA.
 
Eu Jin Lok's image Rank 31st
Posts 68
Thanks 25
Joined 21 Oct '10 Email user
Might be death, based on the profiles. I am, however, confused by the CharlsonIndex. Is it correct that they are dates?
 
Todd Trimble's image Posts 4
Thanks 3
Joined 20 Mar '11 Email user

I'm betting those turn out to be people new to the network in Y2. But I guess we'll have to wait for the Y2 claims data to find out.

 
Jeremy Howard (Kaggle)'s image Posts 166
Thanks 58
Joined 13 Oct '10 Email user
From Kaggle
Charlson Index info: http://en.wikipedia.org/wiki/Comorbidity . The data has ranges for each record, rather than exact values.
 
ChrisG's image Posts 1
Joined 4 Apr '11 Email user
If you see dates for the Charlson Index, that's probably Excel "helpfully" trying to interpret a "1-2" as "2-Jan".
 
factfiber's image Posts 4
Joined 5 Apr '11 Email user
Well, one question regarding missing data in Y4 is: if a person dies during a certain year (or leaves the network, etc.), how will we be scored on our prediction for that person?
 
Chaseshaw's image Posts 5
Joined 5 Apr '11 Email user
So how are these nulls treated in terms of scoring? Should we assign them 0, or just disregard them?
 
Eu Jin Lok's image Rank 31st
Posts 68
Thanks 25
Joined 21 Oct '10 Email user

cmgast wrote:
If you see dates for the Charlson Index, that's probably Excel "helpfully" trying to interpret a "1-2" as "2-Jan".
Yes, thanks heaps!

 
CA17-South's image Posts 1
Thanks 1
Joined 5 Apr '11 Email user
@cmgast, that has to be it. I looked at the raw CSV and I see "1-2". Same with "10-19" in the "Members_Y1" table, which Excel interprets as "19-Oct".
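For anyone who wants to sidestep the Excel conversion entirely, here is a minimal sketch of loading the file with pandas and forcing the range columns to stay as text. The column names (MemberID, CharlsonIndex, AgeRange) and the sample rows are assumptions for illustration, not the real file layout.

```python
import io
import pandas as pd

# Hypothetical stand-in for a few rows of the members table;
# the column names and values are assumptions, not the real schema.
sample_csv = io.StringIO(
    "MemberID,CharlsonIndex,AgeRange\n"
    "1001,1-2,10-19\n"
    "1002,0,40-49\n"
)

# Read the range columns explicitly as strings so "1-2" and "10-19"
# stay as ranges and are never reinterpreted as dates or numbers.
members = pd.read_csv(
    sample_csv,
    dtype={"CharlsonIndex": str, "AgeRange": str},
)

print(members)
```

With a real file, you would pass its path instead of sample_csv and list whichever columns hold ranges.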
Thanked by James Cunningham
 
Anthony Goldbloom (Kaggle)'s image
Anthony Goldbloom (Kaggle)
Competition Admin
Kaggle Admin
Posts 382
Thanks 72
Joined 20 Jan '10 Email user
From Kaggle
Apologies, this was an error. Thanks for drawing our attention to it. The missing values are for those people who have been in hospital for more than two weeks. They should be replaced with a 15. You can either do this yourself or download the updated dataset.

For information, members who have been in hospital for more than two weeks have been grouped for privacy reasons (they are rare, so may otherwise be identifiable). The implication of this grouping is that if you expect somebody to be in hospital for more than two weeks, you should predict 15 days.

This grouping should not have a big impact because:
a. members who are in hospital for more than two weeks are rare (about one per cent of members);
b. the evaluation metric favors algorithms that accurately predict fewer days in hospital (on the assumption that these are more preventable).
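If you prefer to patch the original file yourself, a minimal sketch with pandas might look like the following. The column names (memberid, DaysInHospital_Y2) and the sample rows are assumptions for illustration, and cap_prediction is a hypothetical helper, not part of any official tooling.

```python
import io
import pandas as pd

# Hypothetical stand-in for the DayInHospital_Y2 table;
# the column names and values are assumptions, not the real data.
sample_csv = io.StringIO(
    "memberid,DaysInHospital_Y2\n"
    "1001,0\n"
    "1002,\n"
    "1003,3\n"
)

days = pd.read_csv(sample_csv)

# Missing values mark stays of more than two weeks, so replace them with 15.
days["DaysInHospital_Y2"] = days["DaysInHospital_Y2"].fillna(15).astype(int)


def cap_prediction(predicted_days: float) -> float:
    """Hypothetical helper: limit a predicted stay to the 15-day group."""
    return min(predicted_days, 15.0)


print(days)
print(cap_prediction(22.5))
```

The same cap applies on the prediction side: since long stays are all grouped at 15, there is no reason to submit a larger value.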
Thanked by Eu Jin Lok , ogenex , and Ken L.
 
ogenex's image Posts 10
Joined 10 Jan '11 Email user
Excellent, thanks for the clarification. Glad I continued to monitor this thread; otherwise I may have missed it. Interesting implication for the range of predicted values. I guess it is nice to have a cap on hospital duration.
 
