Missing values of ordered or numeric variables

Zach (Rank 31st, Posts 292, Thanks 64, Joined 2 Mar '11)

How are people dealing with missing values of numeric or ordered variables, such as DSFS, or Length of Stay?  For now, I am recoding those missing values to zero, but I was wondering if there was a better solution.

 
Allan Engelhardt (Posts 77, Thanks 29, Joined 28 May '10)
I set LengthOfStay to zero and am looking to drop observations with missing DSFS, but zero is a good value if you want to keep them.
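
The two options mentioned above (recode to zero, or drop the row) could be sketched roughly as follows. This is only a minimal illustration: the records, the numeric coding of the columns, and the helper names `zero_fill` and `drop_missing` are hypothetical, and the extra `_missing` flag column is an assumption added here so a model can still tell a recoded zero from a real one.

```python
# Hypothetical claim records; None marks a missing value.
claims = [
    {"member": "A", "LengthOfStay": 3, "DSFS": 1},
    {"member": "B", "LengthOfStay": None, "DSFS": 2},
    {"member": "C", "LengthOfStay": 1, "DSFS": None},
]


def zero_fill(rows, column):
    """Return copies of rows with missing `column` set to 0, plus a 0/1 flag column."""
    out = []
    for row in rows:
        new = dict(row)  # copy so the input rows are left untouched
        new[column + "_missing"] = int(row[column] is None)
        if row[column] is None:
            new[column] = 0
        out.append(new)
    return out


def drop_missing(rows, column):
    """Return copies of only those rows where `column` is present."""
    return [dict(row) for row in rows if row[column] is not None]


# Recode missing LengthOfStay to zero, then drop rows with missing DSFS.
filled = zero_fill(claims, "LengthOfStay")
kept = drop_missing(filled, "DSFS")
```

Whether the flag column helps is an empirical question; it is cheap to try both with and without it.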
 
arbuckle (HHP Advisor, Posts 38, Thanks 21, Joined 5 May '11)
See imputation and alternatives: http://en.wikipedia.org/wiki/Imputation_(statistics)
 
José A. Guerrero (Rank 19th, Posts 144, Thanks 21, Joined 27 Jan '11)

So can the data preparation phase include a script that assigns values to missing entries where the value can be deduced or is highly probable, and that corrects errors apparent from direct inspection of the data?

 
Signipinnis (Posts 94, Thanks 25, Joined 8 Apr '11)

Blind Ape wrote:

So can the data preparation phase include a script that assigns values to missing entries where the value can be deduced or is highly probable, and that corrects errors apparent from direct inspection of the data?

For claims-based utilization such as this, it will be GENERALLY true that utilization is zero, unless there are one or more claims showing that utilization is more than zero.

By analogy, what was your average cost of lunch at McDonald's yesterday, if you didn't eat at McDonald's and thus don't have a receipt?
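
As a rough sketch of that "no claim means zero utilization" rule: count claims per member, and give every member without a claim a count of zero. The member roster and claim rows below are made up for illustration and are not the competition's actual tables.

```python
from collections import Counter

# Hypothetical member roster and (member, claim type) rows.
members = ["A", "B", "C"]
claims = [("A", "Rx"), ("A", "Rx"), ("C", "Rx")]

# Count claims per member; a Counter returns 0 for members it has never seen.
counts = Counter(member for member, _claim_type in claims)

# Every member gets a value: zero unless at least one claim says otherwise.
utilization = {member: counts[member] for member in members}
```

Here member B has no claims and so ends up with a utilization of 0 rather than a missing value.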

There are some exceptions, however. For example, pharmacy utilization may be understated if some of a person's prescriptions are available cheaper for cash at a retail outlet than the person's normal co-pay would be. In my area, several chains have a list of prescriptions that are available for $4, which I'm sure leaves an information hole in the records of insurers/payers. Such gaps do not necessarily cause predictive biases, however, particularly if you're looking at a commercially insured population. It would probably be more of a confusion factor if we had Medicaid+SelfPay+Commercial+Medicare all mixed together, but happily for us, we don't.

With plans that have an annual deductible amount, i.e. some amount ($500 or $1000 perhaps) that the covered person must pay first, before the insurance starts paying the rest of the bills during the year, some people have a tendency to sit on the claims they paid themselves, until they get close to satisfying the annual deductible. If they never get close to that limit, there's no economic benefit to them of filing the paperwork, and some never do. That's just more random noise in the system. (But it means we have less data on the people we would be projecting to have low utilization anyway.)

But as a sharp wit said in one of the previous posts somewhere on this forum, you can assume/impute anything you like, if it makes your predictions better.

HTH

 

 
