I have the same question as Mercicle and have read through the whole forum. I think it is a little disorganized as a way of tracking which external data people are using and what has been approved. I'm sure I can pick through it and pull out what I think fits the bill. I do have a couple of questions:

1) If a Kaggle admin hasn't explicitly said not to use a posted source, is it fair game? This assumes you have all actually checked the sources by now. Don't get me wrong, I will check them myself, but I wanted to see whether this assumption is correct.

2) I see there was a reply to theafh about the Census Bureau data that was never fully confirmed, stating that it most likely cannot be used. This is a little confusing, because the rules just say "You may not, however, link the Data Sets to records in other external databases such that new demographic, socioeconomic or clinical information about the members in the Data Sets is gained." But Census Bureau data is anonymous and should not give insight into the demographic, socioeconomic or clinical information of any individual member. I would think this rule is meant to protect the privacy of the individuals in the data, but maybe you mean it to cover the members as a group?

(1) According to Rule 7, you don't need special permission for external data, as long as you satisfy the requirements. In some cases, we've clarified that certain external data isn't allowed. If there are particular ones you're still wondering about, feel free to ask.

(2) It's a good point, but I guess the sponsor just wanted to be totally safe.

Becky, was all of the information you listed approved? It says it was posted 6 months ago (not the exact date), which is right around the deadline, so I'm not sure whether it's usable or not. I am also new to the competition, so I'm still figuring out how things work. Like others, I think it would be nice if someone could summarize all of the approved sources that made it in before the deadline. Someone could go through the threads, compile a list, and then double-check with others and/or the admins to verify that everything is approved and nothing is missing. I might give that a shot later.


We are planning to leverage the following data and information which is free to the public:

None of my submissions to date have used external data. If another competitor requested approval for external data (or stated before the deadline that they were using it) and provided the source, am I free to use that data as well, at my discretion?

From some of the links here, it seems that people are trying to link publicly available provider-specific and hospital-specific information with the HPN data. I have two questions:

1) Is this allowed under the rules? I know the rules explicitly ban trying to match up patient data.

2) Can anyone share how they are matching up this data, since the provider ids are masked?
