Thanks
One question you did not answer:
How do you estimate length of stay for claims where SupLos=1, which indicates that the length of stay is unknown?
Thanks 4 Joined 5 Aug '10
Posts 58 Thanks 46 Joined 6 Apr '11
Uri, I don't do anything special with length of stay for those claims. I include a variable for the number of SupLos claims, and I hope that offsets the measurement-error problem for observations with SupLos claims. If computing power were unlimited, you could break down SupLos for each person/year observation by the type of claim it is associated with. The regression algebra would then work out so that the coefficient on the "days" variables would be the effect for non-SupLos claims, and (days * coefficient on days) + (# SupLos claims * coefficient on SupLos claims) would be the estimate for that category for SupLos claims. Given that there are something like 11,000 SupLos claims, I haven't put in the time to drill down much further into this. I think including a count of SupLos claims for each person/year combination is a good first step.
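A minimal sketch of the count-variable idea, assuming claim records with `MemberID`, `Year`, `LengthOfStay` and `SupLOS` fields, and assuming `LengthOfStay` has already been parsed into a number of days (None when missing). The field names, the helper names `stay_features` and `stay_contribution`, and the coefficients in the usage check are illustrative assumptions, not the poster's code. Suppressed stays are counted rather than imputed, and the regression is left to price them:

```python
from collections import defaultdict


def stay_features(claims):
    """Aggregate claims into one row per (MemberID, Year).

    Assumed input: dicts with keys MemberID, Year, LengthOfStay (days as a
    number, None when missing) and SupLOS (1 = length of stay suppressed).
    """
    feats = defaultdict(lambda: {"days": 0.0, "suplos_count": 0})
    for c in claims:
        row = feats[(c["MemberID"], c["Year"])]
        if c["SupLOS"] == 1:
            # Length of stay unknown: don't impute it, just count the claim
            # so the regression can estimate an average effect per such claim.
            row["suplos_count"] += 1
        elif c["LengthOfStay"] is not None:
            row["days"] += c["LengthOfStay"]
    return dict(feats)


def stay_contribution(row, coef_days, coef_suplos):
    """(days * coefficient on days) + (# SupLos claims * coefficient on SupLos)."""
    return row["days"] * coef_days + row["suplos_count"] * coef_suplos


claims = [
    {"MemberID": 1, "Year": "Y1", "LengthOfStay": 3, "SupLOS": 0},
    {"MemberID": 1, "Year": "Y1", "LengthOfStay": None, "SupLOS": 1},
    {"MemberID": 2, "Year": "Y1", "LengthOfStay": 1, "SupLOS": 0},
]
features = stay_features(claims)
print(features)
# {(1, 'Y1'): {'days': 3.0, 'suplos_count': 1}, (2, 'Y1'): {'days': 1.0, 'suplos_count': 0}}
```

Once `days` and `suplos_count` are both columns in the regression, the fitted coefficient on `suplos_count` plays the role of the average length-of-stay effect of a suppressed claim.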
Thanks 4 Joined 5 Aug '10
The following rule seems to produce no variables, or almost none: "Counts for inpatient claims in each primary condition group with at least 50,000 claims and 5,000 inpatient claims." If I look at only one year, I find no primary condition group with 5,000 inpatient claims, and if I count all the years, I find only 2 primary condition groups. I wonder whether I have a bug or whether others get the same result.
Posts 58 Thanks 46 Joined 6 Apr '11
I apologize, Uri. Looking at my code, I created inpatient * pcg for every primary condition group. The 50,000/5,000 rule was for interaction variables between primary condition group and procedure group. I created a pcg * procedure interaction for every pcg/procedure combination with at least 5,000 claims in the pcg/procedure intersection and at least 50,000 claims in that pcg that are not that procedure. Some of those interaction terms didn't turn out to be very predictive, so I cut some of them on an ad hoc basis. This left the pcg-procedure combinations at the bottom of my original list. But there were pcg * inpatient interactions for every pcg.
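A sketch of the 5,000/50,000 selection rule as described above, assuming each claim is reduced to a `(pcg, procedure)` pair of strings. The function name `select_interactions` and the code values in the demo are illustrative assumptions; the thresholds are parameters so the rule can be checked on small data:

```python
from collections import Counter


def select_interactions(claims, min_pair=5_000, min_rest=50_000):
    """Return (pcg, procedure) pairs that qualify for an interaction term.

    A pair qualifies if at least `min_pair` claims fall in the pcg/procedure
    intersection AND at least `min_rest` claims in that pcg use some other
    procedure. `claims` is an iterable of (pcg, procedure) string tuples.
    """
    pcg_totals = Counter()
    pair_counts = Counter()
    for pcg, proc in claims:
        pcg_totals[pcg] += 1
        pair_counts[(pcg, proc)] += 1
    return sorted(
        (pcg, proc)
        for (pcg, proc), n in pair_counts.items()
        # n claims inside the intersection; the rest of the pcg is total - n
        if n >= min_pair and pcg_totals[pcg] - n >= min_rest
    )


# Tiny illustrative data with scaled-down thresholds.
demo = [("ARTHSPIN", "EM")] * 6 + [("ARTHSPIN", "SDS")] * 3 + [("GIBLEED", "EM")] * 2
print(select_interactions(demo, min_pair=3, min_rest=3))
# [('ARTHSPIN', 'EM'), ('ARTHSPIN', 'SDS')]
```

Note that this threshold only applies to the pcg * procedure terms; per the post, the pcg * inpatient terms were created for every pcg without any count filter.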
Thanked by
John
Thanks 1 Joined 10 Oct '11
Posts 58 Thanks 46 Joined 6 Apr '11
The non-parametric (second-stage) estimator was something I wrote myself. I don't think it has a name, but it's vaguely similar to an M-estimator. In the end, it was a lot of extra code, and I don't think it had any significant advantage over LOESS or an M-estimator. My estimator simply constructs a function f such that f(index) minimizes RMSLE. If I were starting over, I'd run OLS in the first stage with a ton of variables (as I did) and then use LOESS or an M-estimator in the second stage. You should use log(1+daysinhospital) rather than daysinhospital as the realized values you are trying to match in the second stage.

If you use my general strategy, finding good explanatory variables in the first stage will make a much bigger difference than choosing the "optimal" non-parametric estimator for the second stage. Whether you use the estimator I wrote, LOESS, or an M-estimator won't affect your score materially. So why not start with LOESS, which is widely available and relatively easy to understand?

To recap, I suggest:
1) Throw a bunch of powerful explanatory variables into OLS in the first stage.
2) Make predictions, which I will call the "index."
3) Use LOESS to create a function f that gives a good fit to log(1+daysinhospital) = f(index).
4) Apply your OLS parameters and LOESS function f to the target data.
5) Submit.
6) Find better explanatory variables, and repeat.
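The recap above can be sketched end to end in plain Python. This is a hedged illustration, not the poster's code: the OLS target in stage 1 is assumed to be log(1+days) (the post does not say what the first stage regressed on), and `loess` here is a minimal hand-rolled local-linear smoother with tricube weights and no robustness iterations, standing in for a library routine such as R's `loess` or statsmodels' `lowess`. All function names are hypothetical:

```python
import math


def ols_fit(X, y):
    """Stage 1: OLS via the normal equations; an intercept column is added."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def ols_predict(beta, X):
    """Linear predictions, i.e. the one-dimensional "index"."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]


def loess(xs, ys, x0, frac=0.5):
    """Stage 2: locally weighted linear fit evaluated at x0 (tricube weights)."""
    q = max(2, math.ceil(frac * len(xs)))
    h = sorted(abs(x - x0) for x in xs)[q - 1] or 1e-12  # bandwidth
    w = [(1 - min(abs(x - x0) / h, 1) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def two_stage(X_train, days_train, X_target, frac=0.5):
    # Match log(1 + days), the scale RMSLE is measured on.
    log_days = [math.log1p(d) for d in days_train]
    # Steps 1-2: OLS on the explanatory variables gives the index.
    beta = ols_fit(X_train, log_days)
    idx_train = ols_predict(beta, X_train)
    # Steps 3-4: LOESS maps index -> log(1 + days), applied to the target rows.
    idx_target = ols_predict(beta, X_target)
    preds = [loess(idx_train, log_days, i, frac) for i in idx_target]
    # Back-transform to days, clipped at 0.
    return [max(0.0, math.expm1(p)) for p in preds]


def rmsle(pred, actual):
    return math.sqrt(sum((math.log1p(p) - math.log1p(a)) ** 2
                         for p, a in zip(pred, actual)) / len(actual))
```

Usage is `two_stage(X_train, days_train, X_target)`, where each row of `X` is a list of explanatory variables. The pure-Python solver keeps the sketch self-contained; with a realistic number of variables you would want a numerical library for the OLS step.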
Thanked by
LeoB1