Can I boost your score on my way out?

Uri Blass
Posts 253
Thanks 4
Joined 5 Aug '10

Thanks.
There is one question you did not answer: how do you estimate the length of stay for claims with SupLos=1, which indicates that the length of stay is not known?

 
DanB
Rank 2nd
Posts 58
Thanks 46
Joined 6 Apr '11

Uri,

I don't do anything special with length of stay for those claims. I include a variable for the number of supLos claims, and I hope that offsets the measurement error problem for observations with supLos claims.

If computing power were unlimited, you could break down the SupLos claims for each person/year observation by the type of claim they are associated with. The regression algebra would then work out so that the coefficient on each "days" variable is the effect for non-supLos claims, and (days * coefficient on days) + (# supLos claims * coefficient on supLos claims) is the estimate for that category once its supLos claims are included.
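A small simulation can illustrate that regression algebra for a single claim category. It is a sketch under made-up assumptions, not DanB's code: the variable names (known_days, n_suplos), the 4-day average hidden stay, and the 0.2 slope are all hypothetical. Because each suppressed claim hides a stay of about 4 days on average, OLS should recover roughly 0.2 for the days coefficient and roughly 0.2 * 4 = 0.8 for the supLos count coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data: days from claims whose length of stay is known,
# and a count of claims whose length of stay is suppressed (SupLos=1).
known_days = rng.poisson(3.0, size=n).astype(float)
n_suplos = rng.poisson(0.5, size=n).astype(float)

# Assumed data-generating process: each suppressed claim hides a stay of
# Poisson(4) days, and the outcome responds to total days with slope 0.2.
hidden_days = rng.poisson(4.0 * n_suplos)  # sum of n_suplos Poisson(4) stays
y = 0.2 * (known_days + hidden_days) + rng.normal(0.0, 0.5, size=n)

# OLS with an intercept, the known-days variable, and the supLos count.
X = np.column_stack([np.ones(n), known_days, n_suplos])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, coef_days, coef_suplos = coef

# Estimated contribution of this category for each observation:
# (days * coefficient on days) + (# supLos claims * coefficient on supLos claims)
category_effect = known_days * coef_days + n_suplos * coef_suplos
print(round(coef_days, 2), round(coef_suplos, 2))
```

The supLos coefficient absorbs the average effect of the stays you cannot see, which is why the count variable partly offsets the measurement error.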

Given that there are something like 11,000 supLos claims, I haven't put in the time to drill down too far into this. I think including a count of supLos claims for each person/year combo is a good first step.
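Counting supLos claims per person/year takes one pass over the claims. The sketch below assumes each claim is a (MemberID, Year, SupLOS) tuple; the tuple layout, the function name, and the toy rows are illustrative, not taken from the competition files.

```python
from collections import Counter

# Hypothetical claim rows: (MemberID, Year, SupLOS flag).
claims = [
    (1, "Y1", 1),
    (1, "Y1", 0),
    (1, "Y2", 1),
    (2, "Y1", 0),
    (2, "Y1", 0),
]

def count_suplos(rows):
    """Number of SupLOS=1 claims for every (member, year) pair seen in rows."""
    counts = Counter()
    for member, year, suplos in rows:
        # Adding 0 for ordinary claims still records the pair with a zero count.
        counts[(member, year)] += suplos
    return dict(counts)

n_suplos = count_suplos(claims)
```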

 
Uri Blass
Posts 253
Thanks 4
Joined 5 Aug '10

The following rule seems to produce no variables, or almost none:

Counts for Inpatient claims in each primary condition group with at least 50,000 claims and 5,000 in-patient claims.

If I look at only one year, I find no primary condition group with 5,000 inpatient claims, and if I count all the years together, I find only 2 primary condition groups: ARTHSPIN and GIBLEED.

I wonder whether I have a bug or whether others get the same result.

 
DanB
Rank 2nd
Posts 58
Thanks 46
Joined 6 Apr '11

I apologize, Uri. Looking at my code, I created an inpatient * pcg variable for every primary condition group.
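One way to build inpatient * pcg variables for every primary condition group is to count, for each person/year, the inpatient claims in each pcg, giving every pcg an entry even when its count is zero. This is a hypothetical sketch, not DanB's code: the tuple layout and the "Inpatient Hospital" label used to identify inpatient claims are assumptions.

```python
from collections import defaultdict

INPATIENT = "Inpatient Hospital"  # assumed label marking inpatient claims

def inpatient_pcg_features(claims, all_pcgs):
    """Count inpatient claims per primary condition group for every
    (member, year) seen in `claims`. Each claim is a tuple
    (member, year, pcg, place_of_service); every pcg in all_pcgs gets
    an entry, so there is one inpatient*pcg variable per pcg."""
    features = defaultdict(lambda: dict.fromkeys(all_pcgs, 0))
    for member, year, pcg, place in claims:
        row = features[(member, year)]  # all-zero row on first sight
        if place == INPATIENT:
            row[pcg] += 1  # pcg must appear in all_pcgs
    return dict(features)

# Toy claims, illustrative only.
example_claims = [
    (1, "Y1", "GIBLEED", "Inpatient Hospital"),
    (1, "Y1", "GIBLEED", "Office"),
    (1, "Y1", "ARTHSPIN", "Inpatient Hospital"),
    (2, "Y1", "ARTHSPIN", "Office"),
]
features = inpatient_pcg_features(example_claims, ["ARTHSPIN", "GIBLEED"])
```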

The 50,000/5,000 rule was for interaction variables between primary condition group and procedure group. I created a pcg * procedure interaction for every pcg/procedure combination with at least 5,000 claims in the pcg/procedure intersection and at least 50,000 claims in that pcg that are not that procedure. Some of those interaction terms didn't turn out to be very predictive, so I cut some of them on an ad hoc basis. This left the pcg-procedure combos listed at the bottom of my original list. But there were pcg * inpatient combos for every pcg.
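The selection rule can be sketched as follows, reading "50,000 for that pcg that are not that procedure" as at least 50,000 claims in the pcg with a different procedure group. The function name, the (pcg, procedure) tuple layout, and the toy data are hypothetical; the thresholds default to the 5,000 and 50,000 from the post, and the example lowers them so the toy data exercises both conditions.

```python
from collections import Counter

def select_pcg_procedure_pairs(claims, min_joint=5_000, min_rest=50_000):
    """Return sorted (pcg, procedure) pairs with at least min_joint claims
    in the intersection and at least min_rest claims in that pcg with a
    different procedure. `claims` is an iterable of (pcg, procedure)."""
    joint = Counter()
    per_pcg = Counter()
    for pcg, proc in claims:
        joint[(pcg, proc)] += 1
        per_pcg[pcg] += 1
    return sorted(
        (pcg, proc)
        for (pcg, proc), n in joint.items()
        if n >= min_joint and per_pcg[pcg] - n >= min_rest
    )

# Toy data: GIBLEED/EM has 6 claims but only 3 GIBLEED claims elsewhere,
# so it fails min_rest=4; GIBLEED/SCS has 3 joint and 6 other claims.
example_claims = (
    [("GIBLEED", "EM")] * 6
    + [("GIBLEED", "SCS")] * 3
    + [("ARTHSPIN", "EM")] * 2
    + [("ARTHSPIN", "RAD")] * 1
)
pairs = select_pcg_procedure_pairs(example_claims, min_joint=3, min_rest=4)
```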

Thanked by John
 
LeoB1
Posts 2
Thanks 1
Joined 10 Oct '11

DanB wrote:

I used the fitted values from that regression as an index of predicted health usage. I ran a very simple non-parametric estimator to map the index to predictions that minimize rmsle.

Dan, can you please specify which estimator you used and how you used it?

Thanks!

 
DanB
Rank 2nd
Posts 58
Thanks 46
Joined 6 Apr '11

The non-parametric (2nd step) estimator was something I wrote myself.  I don't think it has a name, but it's vaguely similar to an m-estimator.  In the end, it was a bunch of extra code, and I don't think it had any significant advantage over LOESS or an m-estimator.  

That estimator just creates a function f such that f(index) minimizes rmsle.  If I were starting over, I'd run OLS in the first stage with a ton of variables (like I did), and then use LOESS or an m-estimator in the 2nd stage.  You should use log(1+daysinhospital) rather than daysinhospital as the realized values you are trying to match in the 2nd stage.
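The reason for matching log(1+daysinhospital) is that rmsle is plain RMSE on the log(1 + x) scale, so the constant prediction that minimizes rmsle is expm1 of the mean of log(1 + days), not the plain mean of days. A minimal check with made-up outcomes for one hypothetical index bucket:

```python
import math

def rmsle(pred, actual):
    """Root mean squared logarithmic error, using log(1 + x) on both sides."""
    n = len(actual)
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(pred, actual)) / n
    )

# Hypothetical DaysInHospital outcomes sharing similar index values.
days = [0, 0, 0, 1, 2, 15]

# Best constant under rmsle: average on the log scale, then map back.
log_mean = sum(math.log1p(d) for d in days) / len(days)
best_const = math.expm1(log_mean)
plain_mean = sum(days) / len(days)  # what fitting raw days would target

err_best = rmsle([best_const] * len(days), days)
err_plain = rmsle([plain_mean] * len(days), days)
print(err_best, err_plain)
```

A smoother fitted to log(1 + days) applies the same idea locally around each index value, and its output is mapped back with expm1.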

If you use my general strategy, finding good explanatory variables in the first stage will make a much bigger difference than choosing the "optimal" non-parametric estimator for the 2nd stage.    

Whether you use the estimator I wrote, LOESS or an m-estimator won't affect your score materially.  So why not start with LOESS, which is widely available and relatively easy to understand?

To recap, I suggest

1) Throw a bunch of powerful explanatory variables into OLS in the first stage.

2) Make predictions, which I will call "index."

3) Use LOESS to create a function f that gives a good fit to log(1+daysinhospital)=f(index)

4) Apply your OLS parameters and LOESS function f to the target data

5) Submit

6) Find better explanatory variables, and repeat.
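The recipe above can be sketched end to end under stated assumptions: the data are simulated, the first stage regresses log(1 + days) on the features (the post does not say which dependent variable Dan used), and the second stage is a hand-written basic LOESS (local linear fit with tricube weights, no robustness iterations) rather than any particular library's implementation. The function names and the 0 to 15 day clipping range are assumptions.

```python
import numpy as np

def fit_ols(X, y):
    """Stage 1: OLS with an intercept column; returns the coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_index(coef, X):
    """Apply the stage-1 coefficients to get the "index"."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def loess(x_train, y_train, x_query, frac=0.3):
    """Stage 2: basic LOESS (local linear, tricube weights, no robustness
    iterations). Each query point is fit on its nearest frac * n neighbors."""
    k = min(len(x_train), max(3, int(np.ceil(frac * len(x_train)))))
    fitted = np.empty(len(x_query))
    for i, x0 in enumerate(x_query):
        dist = np.abs(x_train - x0)
        idx = np.argsort(dist)[:k]
        h = dist[idx].max() or 1.0  # bandwidth; guard against zero
        w = (1 - (dist[idx] / h) ** 3) ** 3
        sw = np.sqrt(w)
        A = np.column_stack([np.ones(k), x_train[idx] - x0])
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y_train[idx] * sw, rcond=None)
        fitted[i] = beta[0]  # local intercept = fitted value at x0
    return fitted

rng = np.random.default_rng(1)
n_train, n_target = 2000, 500
true_w = np.array([0.6, -0.3, 0.2])  # hypothetical feature effects

# 1) Simulated explanatory variables and DaysInHospital-like outcomes.
X_train = rng.normal(size=(n_train, 3))
X_target = rng.normal(size=(n_target, 3))
days_train = np.minimum(rng.poisson(np.exp(X_train @ true_w - 1.0)), 15)
log_days = np.log1p(days_train)
coef = fit_ols(X_train, log_days)

# 2) The stage-1 fitted values are the index.
index_train = predict_index(coef, X_train)

# 3) and 4) f is fit on the training index and evaluated at the target index.
index_target = predict_index(coef, X_target)
log_pred = loess(index_train, log_days, index_target)
pred_days = np.clip(np.expm1(log_pred), 0, 15)
```

In practice a library smoother, such as the lowess function in statsmodels, could replace the hand-written loop; as the post says, that choice matters much less than the first-stage variables.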

Thanked by LeoB1
 