In their Milestone 1 paper, team "Market Makers" say the following on page 13:
For each Primary Care Physician (PCP), Vendor and Provider, a value was calculated that was the
probability that a patient associated with the entity would visit hospital. Each patient was then
allocated the highest probability of all the PCPs (Vendors or Providers) that they were associated
with, generating 3 fields in total.
I'm trying to replicate their methodology. Is this probability calculated only from the claims data, or are the actual days in hospital for the following year merged in as well? Here's my first shot at this. I've already read in the claims data, converted LengthOfStay to numeric, and replaced missing values with zero:
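library(plyr)
# Flag claims that involved a hospital stay (LengthOfStay > 0)
Claims$visit <- as.numeric(Claims$LengthOfStay > 0)
# Per provider and year: share of claims with a hospital stay
providerProbs <- ddply(Claims, c('ProviderID','Year'), function(x) c('prob'=mean(x$visit)), .progress='text')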
The result is the fraction of claims for a given provider, in a given year, that have a non-zero length of stay:
> head(providerProbs[providerProbs$prob>0,],10)
    ProviderID Year      prob
29       12890   Y1 1.0000000
56       23379   Y1 0.5294118
57       23379   Y2 0.7222222
58       23379   Y3 0.4166667
97       40154   Y1 0.4285714
98       40154   Y2 0.5714286
99       40154   Y3 0.5000000
466     173881   Y1 1.0000000
467     173881   Y2 0.9375000
468     173881   Y3 0.5600000
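
For the second step in the quote (allocating each patient the highest value among the providers they are associated with), this is roughly what I have in mind. It is only a sketch on toy data, not necessarily what the paper did: MemberID is my assumption for the patient key, the toy values are made up, and I use base R's aggregate and merge here rather than ddply so it runs without extra packages.

```r
# Toy stand-in for the claims table; MemberID is an assumed column name
# and the values are invented for illustration.
Claims <- data.frame(
  MemberID   = c(1, 1, 2, 2, 3),
  ProviderID = c("A", "B", "B", "C", "C"),
  Year       = "Y1",
  visit      = c(1, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Step 1: per provider and year, the share of claims with a hospital stay.
providerProbs <- aggregate(visit ~ ProviderID + Year, data = Claims, FUN = mean)
names(providerProbs)[names(providerProbs) == "visit"] <- "prob"

# Step 2: attach each provider's value to every claim, then keep the
# highest value per member and year.
withProbs <- merge(Claims, providerProbs, by = c("ProviderID", "Year"))
memberMaxProb <- aggregate(prob ~ MemberID + Year, data = withProbs, FUN = max)
names(memberMaxProb)[names(memberMaxProb) == "prob"] <- "maxProviderProb"

print(memberMaxProb)
```

The same two steps would be repeated for PCP and Vendor to get the 3 fields mentioned in the paper.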

Am I on the right track? Or should I be using the "DaysInHospital" table, rather than "LengthOfStay" in the claims table?
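
If it is the latter, I assume the calculation would look something like the sketch below. This is a guess on my part, not something the paper states: the toy tables are made up, MemberID as the join key is an assumption, and I count each member once per provider so that members with many claims do not dominate.

```r
# Toy tables; the MemberID key and the invented values are assumptions.
Claims <- data.frame(
  MemberID   = c(1, 1, 1, 2, 3, 3),
  ProviderID = c("A", "A", "B", "B", "B", "C"),
  stringsAsFactors = FALSE
)
DaysInHospital <- data.frame(
  MemberID       = c(1, 2, 3),
  DaysInHospital = c(2, 0, 0)  # days in hospital in the following year
)

# One row per member-provider pair, so repeat claims are not double counted.
pairs <- unique(Claims[, c("MemberID", "ProviderID")])

# Flag members who were hospitalised in the following year.
pairs <- merge(pairs, DaysInHospital, by = "MemberID")
pairs$hospitalised <- as.numeric(pairs$DaysInHospital > 0)

# Share of each provider's members who were hospitalised.
providerProbsDIH <- aggregate(hospitalised ~ ProviderID, data = pairs, FUN = mean)

print(providerProbsDIH)
```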