# Calculating the probability that a patient associated with an entity will visit the hospital

**#1** | Rank 31st | Posts 292 | Thanks 64 | Joined 2 Mar '11 | Posted 16 months ago

In their Milestone 1 paper, the team "Market Makers" say the following on page 13:

> For each Primary Care Physician (PCP), Vendor and Provider, a value was calculated that was the probability that a patient associated with the entity would visit hospital. Each patient was then allocated the highest probability of all the PCPs (Vendors or Providers) that they were associated with, generating 3 fields in total.

I'm trying to replicate their methodology. Is this probability calculated only from the claims data, or are the actual days in hospital for the following year merged in as well?

Here's my first shot at it. I've already read in the claims data, converted `LengthOfStay` to numeric, and replaced missing values with zero:

    library(plyr)
    # Flag each claim that has a nonzero length of stay
    Claims$visit <- as.numeric(Claims$LengthOfStay > 0)
    # Share of flagged claims for each provider in each year
    providerProbs <- ddply(Claims, c('ProviderID', 'Year'),
                           function(x) c('prob' = mean(x$visit)),
                           .progress = 'text')

The result is the proportion of claims for each provider (per year) that had a nonzero length of stay:

    > head(providerProbs[providerProbs$prob > 0, ], 10)
        ProviderID Year      prob
    29       12890   Y1 1.0000000
    56       23379   Y1 0.5294118
    57       23379   Y2 0.7222222
    58       23379   Y3 0.4166667
    97       40154   Y1 0.4285714
    98       40154   Y2 0.5714286
    99       40154   Y3 0.5000000
    466     173881   Y1 1.0000000
    467     173881   Y2 0.9375000
    468     173881   Y3 0.5600000

Am I on the right track? Or should I be using the `DaysInHospital` table rather than `LengthOfStay` in the claims table?
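The allocation step in the Milestone 1 quote (each patient takes the highest probability among the entities they are associated with) could be sketched in base R as below. This is an illustration, not the team's code: the toy data are made up, and the column names `MemberID` and `PCP` and the per-PCP table `pcp_prob` are assumptions.

```r
# Toy data (hypothetical): one row per claim, linking patients to PCPs
claims <- data.frame(
  MemberID = c(1, 1, 1, 2, 2, 3),
  PCP      = c("a", "a", "b", "b", "c", "c")
)

# Hypothetical per-PCP probabilities, computed by whatever method you settle on
pcp_prob <- data.frame(PCP = c("a", "b", "c"), prob = c(0.10, 0.40, 0.25))

# Keep each patient-PCP link once, then attach that PCP's probability
links <- unique(claims[, c("MemberID", "PCP")])
links <- merge(links, pcp_prob, by = "PCP")

# Each patient gets the highest probability among the PCPs they are linked to
member_pcp <- aggregate(prob ~ MemberID, data = links, FUN = max)
names(member_pcp)[2] <- "maxPCPProb"
print(member_pcp)
```

Repeating the same merge-then-`max` step for vendors and providers would give the three fields the paper mentions.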
**#2** | Rank 4th | Posts 292 | Thanks 113 | Joined 22 Jun '10 | Posted 16 months ago

Hi Zach,

I've just been reading what I wrote, and it is not 100% clear, is it! Where I say "probability that a patient associated with the entity would visit hospital", it should really read "probability that patients associated with the entity would have at least one day in hospital in the following year".

So basically: take all the patients who have visited a particular entity and get their `DaysInHospital` for the following year, capping anything above 1 at 1, so each patient counts as 0 (no days) or 1 (at least one day). The average of these capped values is the probability for that entity.

It has nothing to do with the `LengthOfStay` field in the claims data.
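The recipe in this reply might look like the following base R sketch, assuming Y1 claims are paired with a table of days in hospital during Y2. The toy data and the names `dih_y2`, `anyDay`, and `MemberID` are hypothetical; only `ProviderID`, `Year`, and `DaysInHospital` come from the thread.

```r
# Toy Y1 claims (hypothetical values); patient 2 has two claims with provider 10
claims <- data.frame(
  MemberID   = c(1, 1, 2, 2, 3, 3, 4),
  ProviderID = c(10, 20, 10, 10, 20, 30, 30),
  Year       = "Y1"
)

# Toy outcome table (hypothetical): each patient's days in hospital during Y2
dih_y2 <- data.frame(MemberID = c(1, 2, 3, 4), DaysInHospital = c(0, 3, 1, 2))

# Cap at 1: each patient becomes 0 (no days) or 1 (at least one day)
dih_y2$anyDay <- pmin(dih_y2$DaysInHospital, 1)

# Count each patient once per provider, however many claims they have there
pairs <- unique(claims[claims$Year == "Y1", c("ProviderID", "MemberID")])
pairs <- merge(pairs, dih_y2[, c("MemberID", "anyDay")], by = "MemberID")

# Mean of the 0/1 flags = share of the provider's patients with a stay next year
provider_prob <- aggregate(anyDay ~ ProviderID, data = pairs, FUN = mean)
names(provider_prob)[2] <- "prob"
print(provider_prob)
```

Counting each patient once per provider follows the "all the patients who have visited" wording; averaging the flags over claims instead would give patients with many claims more weight.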
**#3** | Rank 31st | Posts 292 | Thanks 64 | Joined 2 Mar '11 | Posted 16 months ago

Great, thank you. What function did you use to compute the probabilities (if it wasn't the mean)?