# Dataset release 3 issues

1 vote

Here's a list of issues I've discovered so far with the new dataset:

1) In the Claims.csv file, some of the ID numbers have extra zeros in front compared with release 2, but there are no changes to any of the values.

2) The DaysInHospital_Y2.csv file has one extra row for MemberID 24027423.

3) The new DrugCount.csv file contains some entries which don't correspond with Claims.csv:

    210,Y3,7- 8 months,1
    210,Y3,8- 9 months,1
    210,Y1,4- 5 months,1
    210,Y3,5- 6 months,2

As I understand it, there should be claims for MemberID 210 appearing in Claims.csv for those particular Year and DSFS combinations, but they are missing. The other rows of DrugCount.csv have a corresponding claim in Claims.csv.

(Addendum) That last sentence isn't actually true. I've just looked at the data again, and there are a lot more MemberIDs where 3) applies. It turns out that every one of those MemberIDs has ClaimsTruncated=1 in the DaysInHospital csv files. I suppose this means that the claims were anonymized, but the drug count data associated with them wasn't.

This raises a few questions: Is the drug count data complete for all members, or did DrugCount.csv get anonymized as well? Either way, the missing claims can be used to obtain a better estimate of the true number of claims for those members :-)

#1 | Posted 5 years ago | Edited 5 years ago | Posts 23 | Votes 10 | Joined 2 Dec '10
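Points 1) and 2) above can be checked mechanically. The following is only a minimal sketch: the helper names `normalize_id` and `duplicated_members` are hypothetical, the sample IDs are made up rather than taken from the real files, and it assumes that "extra row" means a MemberID appearing more than once.

```python
from collections import Counter


def normalize_id(raw_id: str) -> str:
    """Strip surrounding whitespace and leading zeros so IDs from two releases compare equal."""
    stripped = raw_id.strip().lstrip("0")
    return stripped or "0"  # an all-zero ID collapses to "0" rather than ""


def duplicated_members(member_ids):
    """Return {MemberID: occurrences} for IDs that appear more than once."""
    counts = Counter(normalize_id(m) for m in member_ids)
    return {member: n for member, n in counts.items() if n > 1}


# Made-up sample values (assumption: not real data from the release).
release2_ids = ["1234", "5678"]
release3_ids = ["001234", "05678"]
same_ids = [normalize_id(a) == normalize_id(b)
            for a, b in zip(release2_ids, release3_ids)]

y2_member_ids = ["111", "222", "222", "333"]
extra_rows = duplicated_members(y2_member_ids)
```

In practice the ID lists would be read from the CSV files, for example with `csv.DictReader`, keeping the IDs as strings so that leading zeros are not silently dropped before the comparison.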
0 votes

Ford Prefect wrote: "2) The DaysInHospital_Y2.csv has one extra row for MemberID 24027423"

Thanks for all the observations: keep them coming!

#2 | Posted 5 years ago | Posts 77 | Votes 29 | Joined 28 May '10
0 votes

I noted that in the LabCount.csv file, LabCount has been capped at 10+, and in the DrugCount.csv file, DrugCount has been capped at 7+. There is no information as to what laboratory tests were carried out, nor as to what drugs were supplied.

Question to the HPN administrator: are the counts of these services the only data to be provided, for security reasons?

Thanking you,
Jim

#3 | Posted 5 years ago | Posts 8 | Joined 8 Apr '11
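Because the top values are capped ("10+" and "7+"), the count columns cannot be read directly as integers. A minimal sketch of one way to handle this, assuming the capped values appear as text with a trailing `+`; the helper name `parse_capped_count` is hypothetical:

```python
def parse_capped_count(raw: str):
    """Split a count such as "7+" into (value, is_capped)."""
    text = raw.strip()
    is_capped = text.endswith("+")
    value = int(text.rstrip("+"))
    return value, is_capped


# "7+" means "7 or more", so the flag records that the true value is unknown.
drug_value, drug_capped = parse_capped_count("7+")
lab_value, lab_capped = parse_capped_count("3")
```

Keeping the flag separate from the value lets a model treat "7 or more" differently from an exact 7 if that turns out to matter.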
0 votes

Quote from the data page: "c. Labs Table, which will contain certain details of lab tests provided to members. d. RX Table, which will contain certain details of prescriptions filled by members."

So there are no details, just sums? Also, lots of MemberID and Year combinations are missing from the two datasets. Does that mean that no lab tests or prescriptions were given that year for that MemberID?

#4 | Posted 5 years ago | Posts 13 | Votes 8 | Joined 8 Apr '11
1 vote

@Kwaak The drug counts are all at least 1, so it makes sense that there are missing combinations: those claims would have 0 drugs prescribed. The same goes for the lab counts; only nonzero combinations are provided. If the lab counts and drug counts are complete, then it's OK to put zero counts in all the other claims, but if there was anonymization, then that may not be appropriate.

#5 | Posted 5 years ago | Posts 23 | Votes 10 | Joined 2 Dec '10
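The zero-filling idea from the post above can be sketched as follows. This is an illustration under stated assumptions, not a recommended treatment: the function name `fill_missing_counts` is hypothetical, the sample keys and counts are made up, and using `None` for members with truncated claims is just one possible way to mark "unknown" rather than "zero".

```python
def fill_missing_counts(claim_keys, counts, truncated_members):
    """Attach a count to every (MemberID, Year, DSFS) claim key.

    Keys with no recorded count become 0, except for members flagged as
    truncated, where the true count is unknown and None is used instead.
    """
    filled = {}
    for key in claim_keys:
        member_id = key[0]
        if key in counts:
            filled[key] = counts[key]
        elif member_id in truncated_members:
            filled[key] = None  # unknown: some of this member's data was suppressed
        else:
            filled[key] = 0
    return filled


# Made-up sample data (assumption: not real rows from the release).
claim_keys = [
    ("1", "Y1", "0- 1 month"),
    ("1", "Y1", "1- 2 months"),
    ("2", "Y1", "0- 1 month"),
]
drug_counts = {("1", "Y1", "0- 1 month"): 2}
truncated_members = {"2"}

filled = fill_missing_counts(claim_keys, drug_counts, truncated_members)
```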
1 vote

I noticed that the Data page still says "HHP_release2.zip contains the latest files, so you can ignore HHP_release1.zip." Shouldn't that be updated to say that HHP_release3.zip makes HHP_release2.zip obsolete?

#6 | Posted 5 years ago | Competition 10th | Overall 345th | Posts 109 | Votes 77 | Joined 5 Aug '10
0 votes

Thanks Dave. The data description has been fixed.

#7 | Posted 5 years ago | Anthony Goldbloom | Competition Admin | Kaggle Admin | Posts 439 | Votes 135 | Joined 20 Jan '10
0 votes

So according to Ford Prefect, we should match the drug and lab data to claims by MemberID, Year and DSFS?

#8 | Posted 5 years ago | Posts 84 | Votes 4 | Joined 26 May '10
0 votes

We have drugs where there are no claims, e.g.

    > drug[MemberID == "10002388" & Year == "Y1" & DSFS == "1- 2 months",]
         MemberID Year        DSFS DrugCount
    [1,] 10002388   Y1 1- 2 months         2
    > claims[MemberID == "10002388" & Year == "Y1" & DSFS == "1- 2 months",]
    NULL data table

Are they free drugs, drugs paid for on a previous (later?) claim, or is dispensing of drugs not considered a treatment for the purposes of the claims table?

#9 | Posted 5 years ago | Posts 77 | Votes 29 | Joined 28 May '10
2 votes

The date a person fills a prescription for drugs won't necessarily correspond to the date in the claims. The drugs could be a refill, for a recurring condition, or an "if it doesn't get better in a week, fill this prescription" case.

#10 | Posted 5 years ago | Posts 38 | Votes 21 | Joined 5 May '11
0 votes

@Dirk Of course you're free to fit the data together any way you wish :) However, if you do try to match drugs and labs to claims, then every one of the lab counts can be attached to an existing claim, but some claims won't have lab counts. Maybe the lab counts have no relation to the claim other than that they occurred at the same time, but maybe the claim information complements the lab counts. I don't know :)

@Allan With drug counts, there are many cases where the drug count doesn't correspond to a claim in the dataset. There are 818241 drug counts, and assuming my code is correct, I've identified 311958 (38%) instances which cannot be attached to an existing claim, whereas the remaining ones can. But all 311958 of those instances have a MemberID where ClaimsTruncated=1 in the DaysInHospital file; check your example.

My current theory is that these 311958 instances indicate phantom claims, i.e. anonymized claims we can't see but which generated a drug prescription. If we count the real claims and the phantom claims together, that's about 3 million claims, i.e. 10% more than the real claims alone.

However, that may be too simple. The comment by @arbuckle suggests trying to match phantom claims to existing claims that are possibly earlier. That won't work in your example, but might work in a number of other cases.

#11 | Posted 5 years ago | Posts 23 | Votes 10 | Joined 2 Dec '10
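The matching described in the post above amounts to an anti-join of DrugCount rows against Claims on (MemberID, Year, DSFS). The following is a minimal sketch of that idea, not the poster's actual code: the function name `unmatched_drug_rows` is hypothetical, the MemberID 210 rows are the ones quoted earlier in the thread, and the MemberID 999 rows and the truncation flags are made up for illustration.

```python
def unmatched_drug_rows(drug_rows, claim_rows):
    """Return drug rows whose (MemberID, Year, DSFS) key has no matching claim."""
    claim_keys = {(c["MemberID"], c["Year"], c["DSFS"]) for c in claim_rows}
    return [d for d in drug_rows
            if (d["MemberID"], d["Year"], d["DSFS"]) not in claim_keys]


drug_rows = [
    {"MemberID": "210", "Year": "Y3", "DSFS": "7- 8 months", "DrugCount": "1"},
    {"MemberID": "210", "Year": "Y3", "DSFS": "8- 9 months", "DrugCount": "1"},
    {"MemberID": "210", "Year": "Y1", "DSFS": "4- 5 months", "DrugCount": "1"},
    {"MemberID": "210", "Year": "Y3", "DSFS": "5- 6 months", "DrugCount": "2"},
    {"MemberID": "999", "Year": "Y1", "DSFS": "0- 1 month", "DrugCount": "3"},  # made up
]
claim_rows = [
    {"MemberID": "999", "Year": "Y1", "DSFS": "0- 1 month"},  # made up
]
claims_truncated = {"210": 1, "999": 0}  # illustrative flags, keyed by MemberID

phantom = unmatched_drug_rows(drug_rows, claim_rows)

# Check the observation that every unmatched row belongs to a truncated member.
all_truncated = all(claims_truncated[row["MemberID"]] == 1 for row in phantom)

# Share of unmatched drug counts, using the totals quoted in the post.
share_pct = round(100 * 311958 / 818241)
```

Building the set of claim keys once keeps the lookup cheap even for millions of claim rows; on the real files the rows would come from `csv.DictReader` rather than inline lists.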
0 votes

Dear Anthony,

I couldn't find the RX table and Labs table in the third dataset. Were they removed recently?

Thanks,
Z

#12 | Posted 3 years ago | Posts 1 | Joined 12 Jun '12