Here's a list of issues I've discovered so far with the new dataset:
1) In Claims.csv, some of the ID numbers have extra leading zeros compared with release 2, but none of the values have changed.
2) DaysInHospital_Y2.csv has one extra row for MemberID 24027423.
3) The new DrugCount.csv file contains some entries that don't correspond to anything in Claims.csv, for example:
210,Y3,7- 8 months,1
210,Y3,8- 9 months,1
210,Y1,4- 5 months,1
210,Y3,5- 6 months,2
As I understand it, there should be claims for MemberID 210 in Claims.csv for those particular Year and DSFS combinations, but they are missing. The other rows of DrugCount.csv have a corresponding claim in Claims.csv.
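The three checks above can be scripted with the standard library. This is a minimal sketch, assuming each file has a header row and that the relevant columns are named `MemberID`, `Year` and `DSFS` (plus `ClaimsTruncated` in the DaysInHospital files); those column names and all helper names are assumptions for illustration. Parsing IDs as integers means leading zeros like those in item 1 don't break the join.

```python
import csv
import io
from collections import Counter


def read_rows(path):
    """Load a CSV file with a header row into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def rows_from_text(text):
    """Same as read_rows, but for CSV text held in a string (handy for testing)."""
    return list(csv.DictReader(io.StringIO(text)))


def member_id(row):
    """Parse MemberID as an int so '00210' and '210' compare equal (item 1)."""
    return int(row["MemberID"])


def duplicate_member_ids(dih_rows):
    """MemberIDs appearing on more than one row of a DaysInHospital file (item 2)."""
    counts = Counter(member_id(r) for r in dih_rows)
    return sorted(m for m, n in counts.items() if n > 1)


def unmatched_drug_rows(drug_rows, claim_rows):
    """DrugCount rows whose (MemberID, Year, DSFS) has no claim (item 3)."""
    # Assumed column names; DSFS strings are compared exactly as written.
    claim_keys = {(member_id(r), r["Year"], r["DSFS"]) for r in claim_rows}
    return [
        r for r in drug_rows
        if (member_id(r), r["Year"], r["DSFS"]) not in claim_keys
    ]
```

With the real files, `unmatched_drug_rows(read_rows("DrugCount.csv"), read_rows("Claims.csv"))` should return the MemberID 210 rows quoted above, among any others.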
(Addendum) That last sentence isn't actually true. Looking at the data again, 3) applies to a lot more MemberIDs, and every one of them has ClaimsTruncated=1 in the DaysInHospital csv files.
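The addendum's observation can be checked mechanically. A minimal sketch, assuming the DaysInHospital rows are dicts with `MemberID` and `ClaimsTruncated` keys (assumed column names) and that you already have the set of MemberIDs with unmatched DrugCount rows; `not_truncated` is a hypothetical helper. Like the addendum, it checks at the member level across all DaysInHospital files rather than matching each claim year to its file.

```python
def not_truncated(member_ids, dih_rows):
    """Return the IDs in member_ids that never have ClaimsTruncated=1.

    member_ids: ints, e.g. MemberIDs that have unmatched DrugCount rows.
    dih_rows: dict rows pooled from the DaysInHospital_Y*.csv files.
    """
    truncated = {
        int(r["MemberID"])
        for r in dih_rows
        if r["ClaimsTruncated"].strip() == "1"
    }
    # Anything left over would contradict the addendum's observation.
    return sorted(set(member_ids) - truncated)
```

An empty result is consistent with the addendum: every member with unmatched DrugCount rows is flagged as truncated somewhere.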
I suppose this means the claims were truncated as part of the anonymization, but the drug count data associated with them wasn't. This raises a question: is the drug count data complete for all members, or was DrugCount.csv anonymized as well? Either way, the DrugCount rows with no matching claim can be used to get a better estimate of the true number of claims for those members :-)
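One way to turn those unmatched rows into an estimate, sketched under an explicit assumption: each distinct (MemberID, Year, DSFS) combination that appears in DrugCount.csv but not in Claims.csv stands for at least one removed claim. Adding one per such combination to the observed claim count then gives a lower bound on the true count, not an exact figure. Column names and the helper name `estimated_claim_counts` are assumptions.

```python
from collections import Counter


def estimated_claim_counts(claim_rows, drug_rows):
    """Per-member lower bound on claim counts (assumption stated above).

    Observed claims are counted as-is; each (MemberID, Year, DSFS) seen in
    DrugCount but absent from Claims adds one presumed-missing claim.
    """
    def key(r):
        return (int(r["MemberID"]), r["Year"], r["DSFS"])

    observed = Counter(int(r["MemberID"]) for r in claim_rows)
    claim_keys = {key(r) for r in claim_rows}
    # A set, so a repeated DrugCount row for the same slot is only counted once.
    missing_keys = {key(r) for r in drug_rows} - claim_keys
    missing = Counter(member for member, _, _ in missing_keys)
    return observed + missing
```

Members whose claims were fully removed still get a nonzero count from their DrugCount rows alone.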