I realize the provided data is ridiculously poor, and that's probably why the competition has to run so long: every team has to come up with its own method for handling the problem. Why wasn't the data cleaned up? Cleaning it seems to be a harder problem than the one the competition is supposed to solve.

In year 2, Member ID 41073844 appears to have a LengthOfStay value of between 4 and 8 weeks per month (never longer), and sometimes an additional 2-4 weeks. It's ridiculous.

Yes, I can handle the data in whatever manner I decide best represents what it should have been, and I've read the posts about the quality of the data. But the real question is: how does Heritage handle the data for year 4? Do we assume Heritage has correct numbers, and not junk? If so, why give us junk for the prior years and distract from their goal? In calculating results, would Heritage have used the total of 1009 days for this one member in year 2 and said my estimate is a little short? Gee, I thought a member could only be in the hospital for 365 days in a year.

Since this patient's Place of Service is "Other" and they have 4-8 weeks for almost every DSFS, you can assume that they are in a nursing facility or some other long-term care facility, where the LOS is not the same thing as an inpatient length of stay. These types of care facilities often bill the insurance company monthly, so you would see multiple stays of length 1 month (4 weeks) if the patient spends more than one month there.
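To make that concrete, here is a rough Python sketch of how one might total the bucketed stays per member for a single year and cap the result at 365 days. The label strings, the bucket-to-days mapping (upper bound of each bucket), the function name, and the sample rows are all my own assumptions for illustration, not the official data dictionary or anything Heritage has said about scoring.

```python
from collections import defaultdict

# Assumed mapping from bucketed LengthOfStay labels to an upper bound in days.
# The exact label strings in the real claims file may differ.
LOS_UPPER_DAYS = {"1 day": 1, "2-4 weeks": 28, "4-8 weeks": 56}
DAYS_IN_YEAR = 365


def summarize_los(claims):
    """Total LOS per member for one year of claims.

    claims: iterable of (member_id, place_of_service, los_label) tuples.
    Returns {member_id: {"raw_days", "capped_days", "over_year",
                         "monthly_other_claims"}}.
    """
    raw = defaultdict(int)
    monthly_other = defaultdict(int)
    for member_id, place, los_label in claims:
        # Unknown or blank labels contribute zero days.
        raw[member_id] += LOS_UPPER_DAYS.get(los_label, 0)
        # Count claims that look like monthly long-term-care billing.
        if place == "Other" and los_label == "4-8 weeks":
            monthly_other[member_id] += 1
    return {
        m: {
            "raw_days": days,
            "capped_days": min(days, DAYS_IN_YEAR),
            "over_year": days > DAYS_IN_YEAR,
            "monthly_other_claims": monthly_other[m],
        }
        for m, days in raw.items()
    }


# Hypothetical rows: M1 is billed monthly by a long-term care facility.
sample = [("M1", "Other", "4-8 weeks")] * 12 + [
    ("M1", "Other", "2-4 weeks"),
    ("M2", "Inpatient Hospital", "1 day"),
]
summary = summarize_los(sample)
print(summary["M1"])
print(summary["M2"])
```

Using the upper bound of each bucket is deliberately pessimistic; a midpoint would work just as well. Either way, a member whose raw total is over a year and who has many "Other" 4-8 week claims is probably a monthly-billing case rather than a true inpatient stay of that length.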

I am finally getting around to exploring this data after such a long time, so I am not an expert on "this" data set, but in my experience this is typical behavior.


