About 38 thousand records in the Claims dataset have one or more exact duplicates across all columns. Is this a data error? One possibility is that, because we don't have the exact date, the same procedure was performed on multiple days within the same DSFS value. With the limited number of columns in the Claims data, it would be impossible to tell one claim from another if they are less than a month apart and everything else is equal.
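For anyone who wants to reproduce this kind of count, here is a minimal sketch in Python using only the standard library. The sample rows and column names are made up for illustration and are not the real Claims schema, and `duplicate_summary` is a hypothetical helper. It reports both the number of rows that belong to a duplicated group and the number of redundant copies, since those two figures are easy to confuse.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample; column names and values are assumptions,
# not the actual Claims data.
SAMPLE = """MemberID,ProviderID,Year,DSFS,ProcedureGroup
1,10,Y1,0- 1 month,EM
1,10,Y1,0- 1 month,EM
1,10,Y1,1- 2 months,EM
2,20,Y1,0- 1 month,RAD
2,20,Y1,0- 1 month,RAD
2,20,Y1,0- 1 month,RAD
3,30,Y2,0- 1 month,EM
"""


def duplicate_summary(rows):
    """Summarize rows that match at least one other row in every column."""
    # Each row becomes a tuple so it can be used as a Counter key.
    counts = Counter(tuple(row) for row in rows)
    # Keep only the distinct rows that occur more than once.
    dup_groups = {key: n for key, n in counts.items() if n > 1}
    # Every row that has at least one exact twin.
    rows_in_dup_groups = sum(dup_groups.values())
    # Rows that would disappear if each group were reduced to one copy.
    extra_copies = sum(n - 1 for n in dup_groups.values())
    return dup_groups, rows_in_dup_groups, extra_copies


reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # skip the header so it is not counted as a row
dup_groups, rows_in_dup_groups, extra_copies = duplicate_summary(reader)

print("rows with at least one exact duplicate:", rows_in_dup_groups)
print("redundant copies:", extra_copies)
for key, n in dup_groups.items():
    print(n, dict(zip(header, key)))
```

To run it against a real file, replace `io.StringIO(SAMPLE)` with an open file handle for the claims CSV. Looking at which columns the duplicated groups share (for example, whether they cluster in particular procedure groups) may hint at whether they are repeated visits or a data error, though it cannot settle the question on its own.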
Any thoughts? Is this an error in the data, or are these legitimate duplicates?