
HHP de-identification methods

kelemam
HHP Advisor
Posts 17
Thanks 16
Joined 5 Apr '11

We are hosting a webinar on 6th March (noon EST) on how the HHP data set was de-identified. It will explain in detail the steps used to de-identify the data, the risk thresholds applied, the rationale for the transformations, and the methods used. You can register here: http://ehil.ca/yopcCq

An article providing more information will appear shortly in the Journal of Medical Internet Research, and I will post it online as soon as it is available.

kelemam
HHP Advisor

And the full paper in JMIR describing the approach is now available here: http://ehil.ca/z3CcZ6

The approach will be presented during the webinar mentioned above. Note that the online appendix contains many of the technical details.

Signipinnis
Posts 84
Thanks 23
Joined 8 Apr '11

Could be just me, but I couldn't get anything intelligible out of the link to the appendix.

Re "high-risk patients"... that's "high-risk" based on legal/business/PR considerations, correct? Because otherwise it would seem (to amateur me) that the vulnerability of those classes of patients to attack would still depend on their calculated equivalence classes, but seemingly those were not calculated or considered. Just curious. (If I'm understanding correctly, I'd have done the same.)

I especially look forward to the follow-up article in a few years comparing the performance of the final winning algorithms on the original data versus the de-identified data.

Thanks for sharing.

kelemam
HHP Advisor

We've asked the journal to fix this by re-posting the appendix file, and it should be sorted out soon. In the meantime, I have attached it to this post.

By "high risk" I assume you mean the patients who were removed at the outset. The main drivers were legal, as there are additional restrictions on the disclosure of health information for specific patient groups, such as those diagnosed with HIV. Also, some diagnoses and procedures, such as abortions, were considered very sensitive. We contacted a number of data custodians to get their lists of sensitive procedures and diagnoses and ensured that we were consistent with them. Finally, from a re-identification risk perspective, some diagnoses are rare and visible, so the individuals who have them stand out in the community. That last group also had small equivalence classes in the data and would likely have been removed either way, but they were specifically targeted at the outset.
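To make the two mechanisms concrete, here is a minimal Python sketch under stated assumptions: the field names, diagnosis labels, and threshold k are all hypothetical, and the suppression rule is a generic k-anonymity-style check, not the exact procedure used for the HHP data set (the JMIR paper and its appendix describe that). Step 1 removes sensitive patients unconditionally; step 2 computes equivalence class sizes over the quasi-identifiers and drops records that fall in classes smaller than k.

```python
from collections import Counter

# Hypothetical labels for diagnoses removed at the outset (illustrative only;
# not the actual list compiled from the data custodians).
SENSITIVE_DIAGNOSES = {"HIV", "ABORTION"}

# Hypothetical quasi-identifiers an adversary might know about a patient.
QUASI_IDENTIFIERS = ("age_group", "sex", "diagnosis")


def qi_key(record, quasi_identifiers):
    """Return the combination of quasi-identifier values for one record."""
    return tuple(record[q] for q in quasi_identifiers)


def equivalence_class_sizes(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Count how many records share each quasi-identifier combination."""
    return Counter(qi_key(r, quasi_identifiers) for r in records)


def deidentify(records, k=5, quasi_identifiers=QUASI_IDENTIFIERS):
    """Drop sensitive patients, then suppress records in classes smaller than k.

    If every remaining equivalence class has at least k records, then under the
    usual assumption (an adversary matching on the quasi-identifiers) the chance
    of pinning a known patient to a single record is at most 1/k.
    """
    # Step 1: remove patients with sensitive diagnoses, regardless of class size.
    kept = [r for r in records if r["diagnosis"] not in SENSITIVE_DIAGNOSES]
    # Step 2: suppress every record whose equivalence class is too small.
    sizes = equivalence_class_sizes(kept, quasi_identifiers)
    return [r for r in kept if sizes[qi_key(r, quasi_identifiers)] >= k]


# Usage with made-up records.
patients = (
    [{"age_group": "40-49", "sex": "F", "diagnosis": "DIABETES"} for _ in range(6)]
    + [{"age_group": "80+", "sex": "M", "diagnosis": "RARE_SYNDROME"}]
    + [{"age_group": "30-39", "sex": "M", "diagnosis": "HIV"} for _ in range(8)]
)
released = deidentify(patients, k=5)
print(len(released))  # 6: the rare record and the 8 HIV records are removed
```

In this toy example the HIV group (8 records) would have passed the size check on its own, which illustrates the point made above: those patients were removed for legal and sensitivity reasons rather than because of their equivalence class size. The single record with a rare diagnosis, by contrast, would have been caught by the size check anyway.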

1 attachment
Thanked by Sarkis and Signipinnis

kelemam
HHP Advisor

The recording of the webinar on the de-identification of the HHP data set and the slide deck are available here:

http://www.youtube.com/watch?v=I0RkG3TybmQ&context=C4a75be6ADvjVQa1PpcFPardSV81YAWb8G4_lJh1r__IXsAH5b8xk=

https://www.ehealthinformation.ca/survey/webinarmar062012.aspx