I realize the provided data is ridiculously poor, and that's probably why the competition has to run so long, since every team has to come up with its own methods for handling this problem. Why wasn't the data cleaned up? Cleaning it seems to be a more difficult problem than the prediction task we are actually being asked to solve.
In year 2, Member ID 41073844 appears to have a LengthOfStay value of between 4 and 8 weeks for every month (never longer), and sometimes an additional 2-4 weeks on top of that. It's ridiculous.
Yes, I can handle the data in whatever manner I determine best represents what it should have been. I've read the posts related to the quality of the data. But the real question is: how does Heritage handle the data for year 4? Do we assume Heritage has correct numbers, and not junk? If so, why give us junk for the prior years and distract from their goal? In calculating results, would Heritage have used the total of 1009 days for this one member in year 2, and said my estimate is a little short? Gee, I thought a member could only be in the hospital for 365 days in a year.
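For anyone who wants to check their own copy for this kind of thing, here is a rough sketch of one way to do it: add up the stay lengths per member per year, flag any total over 365 days, and optionally cap it. Everything in it is my own assumption, not how Heritage does it: the `LOS_UPPER_DAYS` mapping (bucket label to upper bound in days), the label spellings, the `(member_id, year, label)` layout, the function names, and the toy records are all made up for illustration.

```python
from collections import defaultdict

DAYS_IN_YEAR = 365

# Hypothetical mapping from a bucketed LengthOfStay label to its upper
# bound in days. The real label spellings in the data may differ.
LOS_UPPER_DAYS = {
    "1 day": 1,
    "2-4 weeks": 28,
    "4-8 weeks": 56,
}


def total_days(claims):
    """Sum the upper-bound stay length per (member_id, year)."""
    totals = defaultdict(int)
    for member_id, year, label in claims:
        totals[(member_id, year)] += LOS_UPPER_DAYS[label]
    return dict(totals)


def impossible_totals(totals, limit=DAYS_IN_YEAR):
    """Keep only the member-years whose total exceeds the limit."""
    return {key: days for key, days in totals.items() if days > limit}


def capped(totals, limit=DAYS_IN_YEAR):
    """Clip every total to the limit (one possible cleanup, not the only one)."""
    return {key: min(days, limit) for key, days in totals.items()}


# Toy records, invented for this example (not taken from the real data):
# member 1 has a "4-8 weeks" claim in each of 12 months, member 2 has one day.
claims = [(1, "Y2", "4-8 weeks")] * 12 + [(2, "Y2", "1 day")]
totals = total_days(claims)
print(impossible_totals(totals))
print(capped(totals))
```

Capping at 365 is just the simplest choice. Using the bucket midpoint or lower bound instead of the upper bound would give different totals, which is exactly why it matters what Heritage does with its own numbers.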