Can you provide days in Hospital in Y1?

NeoStrata's image Posts 2
Joined 11 Nov '11 Email user

Dear Anthony Goldbloom,

First of all, thanks for a great initiative. Kaggle is awesome!


We believe that Y1DaysInHospital is critically important, because it then becomes possible to:

A:

Train on forecasting Y2 with: claimsY1 and Y1DaysInHospitalY1, and
train on forecasting Y3 with: claimsY2 and Y1DaysInHospitalY2.
To forecast Y4 with: claimsY3 and Y1DaysInHospitalY3.

And

B:

Train on forecasting Y3 with:

claimsY1 and Y1DaysInHospitalY1, and
claimsY2 and Y1DaysInHospitalY2.

To forecast Y4 with:

claimsY2 and Y1DaysInHospitalY2, and
claimsY3 and Y1DaysInHospitalY3.
 

So without Y1DaysInHospital, the whole of option B falls away, and option A has only half of the complete data to train on.

We feel that without this information it will be hard, if not impossible, to construct a truly good model that has real use for the sponsors, and it might make reaching the .4 mark unachievable.
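
To make the counting behind options A and B concrete, here is a minimal Python sketch. It assumes claims exist for Y1 to Y3 and that days-in-hospital (DIH) labels exist only for Y2 and Y3, as described in this thread. The function name `usable_training_targets` and the year sets are hypothetical, chosen only for illustration; this is not the sponsors' or Kaggle's method.

```python
def usable_training_targets(claim_years, dih_years, n_lags):
    """Return target years that can serve as training labels.

    A target year is usable when its own DIH label exists and every one of
    the n_lags preceding years has both claims and DIH available as features.
    (Hypothetical helper, written only to illustrate options A and B.)
    """
    targets = []
    for target in sorted(dih_years):
        lagged_years = range(target - n_lags, target)
        if all(y in claim_years and y in dih_years for y in lagged_years):
            targets.append(target)
    return targets


claims = {1, 2, 3}        # claims are available for Y1..Y3
released = {2, 3}         # DIH as released: Y2 and Y3 only (assumption from the thread)
requested = {1, 2, 3}     # DIH if Y1 were also provided

# Option A uses one lagged year, option B uses two lagged years.
print("A, released: ", usable_training_targets(claims, released, 1))
print("A, requested:", usable_training_targets(claims, requested, 1))
print("B, released: ", usable_training_targets(claims, released, 2))
print("B, requested:", usable_training_targets(claims, requested, 2))
```

Under these assumptions, option A keeps one of its two training targets without Y1 DIH, while option B has none left.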

 
DavidChudzicki's image
DavidChudzicki
Kaggle Admin
Posts 424
Thanks 106
Joined 21 Nov '10 Email user
From Kaggle

Glad you're enjoying the contest. But unfortunately, there's no way to change anything about the data at this point.

 
NeoStrata's image Posts 2
Joined 11 Nov '11 Email user

Even more critical is the fact that this effectively shuts the competition off from time series modeling, because the data is not time series friendly. Sadly, days in hospital for Y1 is the only data that would be needed to use the data as a full time series. It is hard to understand why the sponsors would want this competition to be flawed by excluding the vast tool sets of time series modeling. With this possibility deliberately closed off, how can one expect state-of-the-art results, which one would think was the primary incentive?

Furthermore, the heedless reply from DavidChudzicki contradicts the reply from Anthony Goldbloom, who explained that it is indeed possible, albeit a hard decision, as the limits of anonymization are already stretched (hence the waterbed example). But cutting off time series modeling tools is an extremely critical decision and should perhaps be re-evaluated. Since full data for days in hospital in Y2 and Y3 already exists and is secured by the other ample anonymization strategies, one is led to think that adding Y1 can't possibly do more harm than Y2 and Y3 already do.

 
AIC's image
AIC
Posts 1
Joined 26 Dec '11 Email user

@NeoStrata Not having Y1 DIH data is detrimental, but you can make some intelligent assessments and weigh them. Sure, it's not perfect, but all contestants are working with this handicap. I entirely side with David in dismissing any changes to the data out of hand. The last time Anthony mentioned some flexibility was 16 months prior to your post. The contest is nearing its end, hence any changes now would simply lead to too many problems.

 