# Can I boost your score on my way out?

 Posts 253 Thanks 4 Joined 5 Aug '10 Thanks. One question that you did not answer: how do you estimate length of stay for claims with SupLos=1, which indicates that the length of stay is unknown? #16 / Posted 21 months ago
 Rank 2nd Posts 58 Thanks 46 Joined 6 Apr '11 Uri, I don't do anything special with length of stay for those claims. I include a variable for the number of supLos claims, and I hope that offsets the measurement-error problem for observations with supLos claims. If computing power were unlimited, you could break down SupLos for each person/year observation by the type of claim it is associated with. The regression algebra would work out so that the coefficient on the "days" variables would be the effect for non-supLos claims, and (days * coefficient on days) + (# supLos claims * coefficient on supLos claims) would be the estimate of that category for supLos claims. Given that there are something like 11,000 supLos claims, I haven't put in the time to drill down too far into this. I think including a count of supLos claims for each person/year combo is a good first step. #17 / Posted 21 months ago
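The bookkeeping described in the post above can be sketched in a few lines. This is only an illustration, not the poster's code: the claim fields (`member`, `year`, `days`, `sup_los`) and the coefficient values are hypothetical, and it assumes a suppressed length of stay is simply recorded as 0 days.

```python
from collections import defaultdict

def build_features(claims):
    """Aggregate claims into one row per (member, year).

    Each claim is a dict with hypothetical keys:
    member, year, days (0 when suppressed), sup_los (0 or 1).
    """
    feats = defaultdict(lambda: {"days": 0, "n_suplos": 0})
    for c in claims:
        row = feats[(c["member"], c["year"])]
        row["days"] += c["days"]          # recorded days, nothing special for supLos
        row["n_suplos"] += c["sup_los"]   # count of claims with suppressed length of stay
    return dict(feats)

def los_contribution(row, coef_days, coef_suplos):
    """(days * coefficient on days) + (# supLos claims * coefficient on supLos claims)."""
    return row["days"] * coef_days + row["n_suplos"] * coef_suplos

# Made-up example: one member with a 3-day claim and one suppressed claim.
claims = [
    {"member": 1, "year": "Y1", "days": 3, "sup_los": 0},
    {"member": 1, "year": "Y1", "days": 0, "sup_los": 1},
]
features = build_features(claims)
contribution = los_contribution(features[(1, "Y1")], coef_days=0.5, coef_suplos=2.0)
```

In a real model the two coefficients would come out of the first-stage regression; they are fixed by hand here only to show the arithmetic.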
 Posts 253 Thanks 4 Joined 5 Aug '10 The following seems to give no variables, or almost none: "Counts for Inpatient claims in each primary condition group with at least 50,000 claims and 5,000 in-patient claims." If I look at only one year, I find no primary condition group with 5,000 inpatient claims, and if I count all the years I find only two primary condition groups, namely ARTHSPIN and GIBLEED. I wonder whether I have a bug or whether others get the same result. #18 / Posted 21 months ago
 Rank 2nd Posts 58 Thanks 46 Joined 6 Apr '11 I apologize, Uri. Looking at my code, I created inpatient * pcg for every primary condition group. The 50,000/5,000 rule was for interaction variables between primary condition group and procedure group. I created pcg*condition for every pcg/procedure combination with at least 5,000 claims in the pcg/procedure intersection and at least 50,000 claims for that pcg that are not that procedure. Some of those interaction terms didn't turn out to be very predictive, so I cut some of them on an ad hoc basis. This left the pcg/procedure combos at the bottom of my original list. But there were pcg*inpatient combos for everything. Thanked by John #19 / Posted 21 months ago
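A minimal sketch of the 50,000/5,000 selection rule as described above, assuming each claim has been reduced to a hypothetical `(pcg, procedure)` pair. The function name and the toy thresholds in the example are illustrative only, and the ad hoc pruning of weak interaction terms is not reproduced.

```python
from collections import Counter

def select_interactions(claims, min_pair=5000, min_rest=50000):
    """Return pcg/procedure pairs that qualify for an interaction variable.

    A pair qualifies when it has at least `min_pair` claims and the same pcg
    has at least `min_rest` claims with a different procedure.
    """
    pair_counts = Counter(claims)
    pcg_counts = Counter(pcg for pcg, _ in claims)
    selected = []
    for (pcg, proc), n in pair_counts.items():
        rest = pcg_counts[pcg] - n  # claims in this pcg that are not this procedure
        if n >= min_pair and rest >= min_rest:
            selected.append((pcg, proc))
    return sorted(selected)

# Toy data with scaled-down thresholds so the rule is easy to check by hand.
toy = [("A", "x")] * 3 + [("A", "y")] * 5 + [("B", "x")] * 4
chosen = select_interactions(toy, min_pair=3, min_rest=4)
```

Here only `("A", "x")` passes: it has 3 claims and its pcg has 5 other claims, whereas `("A", "y")` leaves only 3 other claims and `("B", "x")` leaves none.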
 Posts 2 Thanks 1 Joined 10 Oct '11 DanB wrote: "I used the fitted values from that regression as an index of predicted health usage. I ran a very simple non-parametric estimator to map the index to predictions that minimizes rmsle." Dan, can you please specify which estimator you used and how you used it? Thanks! #20 / Posted 18 months ago
 Rank 2nd Posts 58 Thanks 46 Joined 6 Apr '11 The non-parametric (2nd step) estimator was something I wrote myself. I don't think it has a name, but it's vaguely similar to an m-estimator. In the end, it was a bunch of extra code, and I don't think it had any significant advantage over LOESS or an m-estimator. That estimator just creates a function f such that f(index) minimizes rmsle. If I were starting over, I'd run OLS in the first stage with a ton of variables (like I did), and then use LOESS or an m-estimator in the 2nd stage. You should use log(1+daysinhospital) rather than daysinhospital as the realized values you are trying to match in the 2nd stage. If you use my general strategy, finding good explanatory variables in the first stage will make a much bigger difference than choosing the "optimal" non-parametric estimator for the 2nd stage. Whether you use the estimator I wrote, LOESS or an m-estimator won't affect your score materially. So why not start with LOESS, which is widely available and relatively easy to understand? To recap, I suggest:
 1) Throw a bunch of powerful explanatory variables into OLS in the first stage.
 2) Make predictions, which I will call "index."
 3) Use LOESS to create a function f that gives a good fit to log(1+daysinhospital)=f(index).
 4) Apply your OLS parameters and LOESS function f to the target data.
 5) Submit.
 6) Find better explanatory variables, and repeat.
 Thanked by LeoB1 #21 / Posted 17 months ago
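The second stage of the recap can be sketched as follows. This is a hand-rolled, simplified LOESS (local linear fit with tricube weights) written for illustration, not the poster's estimator and not any particular library's implementation. It assumes the first-stage OLS fitted values are already available as `train_index`; the function names, the `frac` parameter, and the made-up data are all assumptions.

```python
import math

def rmsle(pred, actual):
    """Root mean squared logarithmic error between two equal-length lists."""
    sq = [(math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(pred, actual)]
    return math.sqrt(sum(sq) / len(sq))

def loess_predict(x0, xs, ys, frac=0.5):
    """Local linear fit at x0, tricube-weighted over the nearest `frac` of points."""
    k = min(len(xs), max(2, math.ceil(frac * len(xs))))
    h = sorted(abs(x - x0) for x in xs)[k - 1]  # bandwidth: k-th nearest distance
    sw = swu = swy = swuu = swuy = 0.0
    for x, y in zip(xs, ys):
        u = x - x0  # centre on x0 so the fitted intercept is the prediction
        d = abs(u) / h if h > 0 else (0.0 if u == 0 else 1.0)
        if d >= 1.0:
            continue
        w = (1.0 - d ** 3) ** 3
        sw += w; swu += w * u; swy += w * y
        swuu += w * u * u; swuy += w * u * y
    if sw == 0.0:  # every neighbour sits exactly on the bandwidth edge
        return sum(ys) / len(ys)
    denom = sw * swuu - swu * swu
    if abs(denom) < 1e-12:  # no spread in x: fall back to the weighted mean
        return swy / sw
    slope = (sw * swuy - swu * swy) / denom
    return (swy - slope * swu) / sw

def predict_days(index_value, train_index, train_days, frac=0.5):
    """Fit log(1 + days) = f(index) locally, then map back to days."""
    log_days = [math.log1p(d) for d in train_days]
    fitted = loess_predict(index_value, train_index, log_days, frac)
    return max(0.0, math.expm1(fitted))

# Made-up training data in which log(1 + days) is exactly 0.1 * index.
train_index = [float(i) for i in range(20)]
train_days = [math.expm1(0.1 * i) for i in train_index]
prediction = predict_days(5.5, train_index, train_days)
```

Fitting against log(1+daysinhospital) means the local least-squares fit is minimizing squared error on the same scale that rmsle measures. In practice a library LOESS would replace `loess_predict`, and `train_index` would be the output of the first-stage regression.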