So, I figure we should have a thread for sharing thoughts/ideas about how we're getting good prediction results.
Of course, no one wants to give away the secret edge that's going to win them the prizes :-) But there are clearly also going to be a range of 'standard' ideas that everyone will end up figuring out and using. If we pool them here on the forum, we can all benefit and get on with working on cleverer/sneakier approaches.
To put my money where my mouth is, here are some things I've learned so far:
Generating informative sets of features seems pretty important, straight off the bat. I've found the following features to be informative.
Sex, Age, nDaysInHospital (previous year)
And from the claims data for the previous year:
total nClaims, nCharlsonIndex of each category, Counts of primary conditions, Counts of procedures, Counts of placeSvc, Counts of speciality
There may also be a benefit from using the same features from two years before the target values, but the effect seems pretty small.
(I feel like there's more one could do with the Claims data, but there are issues with the large number of features.)
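For anyone who wants a starting point, here is a rough sketch of how per-member count features like the ones listed above could be built from one year of claims. The record layout, the member IDs, and the category labels ("condA", "Office", and so on) are all made up for illustration, and claim_count_features is a hypothetical helper, not anything from the competition data or tools.

```python
from collections import Counter, defaultdict

# Hypothetical claim records: (member_id, year, primary_condition, place_of_service).
# All values are invented for illustration.
claims = [
    ("m1", "Y1", "condA", "Office"),
    ("m1", "Y1", "condA", "Urgent Care"),
    ("m1", "Y1", "condB", "Office"),
    ("m2", "Y1", "condC", "Inpatient Hospital"),
    ("m2", "Y2", "condC", "Office"),
]


def claim_count_features(claims, year):
    """Build per-member count features from a single year of claims."""
    features = defaultdict(Counter)
    for member_id, claim_year, condition, place in claims:
        if claim_year != year:
            continue  # only use claims from the requested year
        row = features[member_id]
        row["nClaims"] += 1              # total number of claims
        row["cond_" + condition] += 1    # count per primary condition
        row["place_" + place] += 1       # count per place of service
    return {member: dict(counts) for member, counts in features.items()}


y1_features = claim_count_features(claims, "Y1")
print(y1_features["m1"])
```

Each distinct category becomes its own column, which is exactly where the large number of features comes from. Categories a member never had are simply missing here and would need to be filled with zero before fitting a model.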
Method-wise, I've started with simple linear regression (with stepwise feature selection). I'm pretty sure this is too restrictive to be useful, but it's very handy for data exploration. I'll be trying out some more interesting models in the near future.
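To make the stepwise idea concrete, here is a minimal sketch of forward stepwise selection wrapped around ordinary least squares, in plain Python. The stopping rule (stop when the drop in residual sum of squares falls below min_gain) is an assumption chosen for simplicity; the post does not say which criterion was used, and p-value or AIC based rules are common alternatives. The toy data and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_rss(rows, y, cols):
    """Fit least squares (with intercept) on the given columns; return the RSS."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations
    return sum((t - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, t in zip(design, y))


def forward_stepwise(rows, y, min_gain=1e-6):
    """Greedily add the feature that reduces RSS most, until the gain is tiny."""
    selected = []
    remaining = list(range(len(rows[0])))
    best_rss = fit_rss(rows, y, selected)  # intercept-only model
    while remaining:
        rss, col = min((fit_rss(rows, y, selected + [c]), c) for c in remaining)
        if best_rss - rss <= min_gain:
            break
        selected.append(col)
        remaining.remove(col)
        best_rss = rss
    return selected, best_rss


# Toy data: y = 1 + 2*x0 + 3*x2, and column 1 is irrelevant.
rows = [[1, 5, 0], [2, 3, 1], [3, 8, 0], [4, 1, 1], [5, 9, 0], [6, 2, 1]]
y = [3, 8, 7, 12, 11, 16]
selected, best_rss = forward_stepwise(rows, y)
print(selected, best_rss)
```

Note that this toy solver will fail with a division by zero on perfectly collinear features, and that judging gain on the training data alone tends to overfit; scoring each candidate on held-out data would be the more careful choice.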
I hope this is useful to people. If you would like to reciprocate, that would be awesome :-) And if this thread gets going, I'm happy to keep contributing my thoughts to it, as I think we'll all benefit from it.
*braces for deluge of useful responses*