# Missing values of ordered or numeric variables

 How are people dealing with missing values of numeric or ordered variables, such as DSFS, or Length of Stay?  For now, I am recoding those missing values to zero, but I was wondering if there was a better solution.
 I set missing LengthOfStay to zero and am planning to drop observations with missing DSFS, but zero is a reasonable value if you want to keep them.
 See imputation and its alternatives: http://en.wikipedia.org/wiki/Imputation_(statistics)
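 To make the options discussed above concrete, here is a minimal sketch comparing the zero-recoding approach with one common alternative, median imputation plus a "was missing" indicator. It assumes LengthOfStay has already been converted to a numeric code with `None` marking missing entries; the sample values and the function names are made up for illustration and are not from the competition data.

```python
from statistics import median

def impute_constant(values, fill=0):
    """Replace missing entries (None) with a fixed value, e.g. zero."""
    return [fill if v is None else v for v in values]

def impute_median_with_flag(values):
    """Replace missing entries with the median of the observed values.

    Also returns a 0/1 flag per row so a model can still see which
    values were originally missing.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return imputed, flags

# Hypothetical numeric LengthOfStay values; None marks a missing entry.
length_of_stay = [2, None, 5, 1, None, 3]

zero_filled = impute_constant(length_of_stay)
median_filled, was_missing = impute_median_with_flag(length_of_stay)
```

 Keeping the indicator column lets the model treat "missing" as its own signal, which may or may not help; checking against a validation set is the only way to tell.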
 So the data preparation phase can include a script that assigns values to the missing entries when they can be deduced or are highly probable, and that corrects errors evident from direct inspection of the data?
 Blind Ape wrote: So the data preparation phase can include a script that assigns values to the missing entries when they can be deduced or are highly probable, and that corrects errors evident from direct inspection of the data?

For claims-based utilization such as this, it will be GENERALLY true that utilization is zero unless one or more claims show that it is more than zero. By analogy, what was your average cost of lunch at McDonald's yesterday, if you didn't eat at McDonald's and thus don't have a receipt?
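As a rough illustration of that rule, the sketch below totals utilization per member and assigns zero to anyone with no claims on file. The member IDs, the `(member_id, days)` claim layout, and the function name are hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

def total_days_per_member(members, claims):
    """Sum claimed days per member; members with no claims get zero."""
    totals = defaultdict(int)
    for member_id, days in claims:
        totals[member_id] += days
    # .get() avoids inserting keys; absent members default to zero.
    return {m: totals.get(m, 0) for m in members}

# Hypothetical data: member B has no claims at all.
members = ["A", "B", "C"]
claims = [("A", 3), ("A", 1), ("C", 2)]

utilization = total_days_per_member(members, claims)
```

Here the zero for member B is a deduced value rather than a guess: no claim means no recorded utilization.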

There are some exceptions, however. For example, pharmacy utilization may be understated if some of a person's prescriptions are available cheaper for cash at a retail outlet than the person's normal co-pay would be. In my area, several chains have a list of prescriptions that are available for \$4, which I'm sure leaves an information gap in the records of insurers/payers.

Such gaps do not necessarily cause predictive biases, however, particularly if you're looking at a commercially insured population. It would probably be more of a confusion factor if we had Medicaid+SelfPay+Commercial+Medicare all mixed together, but happily for us, we don't.

With plans that have an annual deductible, i.e. some amount (\$500 or \$1000 perhaps) that the covered person must pay before the insurance starts paying the rest of the bills during the year, some people tend to sit on the claims they paid themselves until they get close to satisfying the deductible. If they never get close to that limit, there's no economic benefit to them in filing the paperwork, and some never do. That's just more random noise in the system. (But it means we have less data on the people we would be projecting to have low utilization anyway.)

But as a sharp wit said in one of the previous posts somewhere on this forum, you can assume/impute anything you like, if it makes your predictions better.