# External Data

 DavidChudzicki Kaggle Admin Posts 424 Thanks 106 Joined 21 Nov '10 JeremyA, I'm sorry -- I think we have to say not to use it. -David #31 / Posted 13 months ago
 Posts 1 Joined 11 Oct '11 Hi Kaggle Admins, census.gov was already mentioned in this thread. I'm thinking about using other external data from that source with a socioeconomic dimension, like the data linked from this page: http://www.census.gov/hhes/www/income/income.html Would it be OK to integrate that data into my models? -theafh #32 / Posted 13 months ago
 DavidChudzicki Kaggle Admin Posts 424 Thanks 106 Joined 21 Nov '10 Hi Theafh, I'll have to look into it and get back to you within a week, but I fear the answer will be the same as for JeremyA's question. Thanks, David #33 / Posted 13 months ago
 Posts 1 Joined 22 Feb '12 Hi, We are planning to leverage the following data and information which is free to the public: Thanks! #34 / Posted 13 months ago
 Posts 1 Joined 8 Jun '12 Hi Admins, I just wanted to know what I need to do to get approval to use external data sets after the April 4th deadline. In addition, if I have compiled data via automated data mining from published and freely available journal articles, must I provide links to each article, or can I just provide the compiled dataset? It may be easier to provide the compiled dataset, as the number of articles used would be huge. Cheers! #35 / Posted 11 months ago
 DavidChudzicki Kaggle Admin Posts 424 Thanks 106 Joined 21 Nov '10 As a general rule, external data won't be approved after the deadline. #36 / Posted 11 months ago
 Posts 1 Joined 13 Jun '12 Hi, I'm just starting with the contest. Did any external data sources get approved? It didn't look like it from this forum, but I wanted to be sure. Thanks, David #37 / Posted 11 months ago
 DavidChudzicki Kaggle Admin Posts 424 Thanks 106 Joined 21 Nov '10 David Gainer wrote: "I'm just starting with the contest. Did any external data sources get approved? It didn't look like it from this forum, but I wanted to be sure." Prior to April 4, 2012, external data didn't need approval (as long as all of the conditions in the rules were satisfied). That's why you see people posting it here without approval. After that date, external data requires approval, which is unlikely to be granted. #38 / Posted 11 months ago
 Posts 1 Joined 27 Jul '12 David, can you provide a final list of the specific external data sources that we can use? Some people have listed only websites, which seems vague. If any data from any URL posted before the deadline can be used, could you just confirm that? Thanks, John #39 / Posted 9 months ago
 DavidChudzicki Kaggle Admin Posts 424 Thanks 106 Joined 21 Nov '10 If there are particular cases that aren't clear from questions & responses on the forum thread, can you ask about those specifically? #40 / Posted 9 months ago
 Posts 1 Joined 25 Aug '12 Hi David, I have the same question as Mercicle and have read through the whole forum. I think it is a little disorganized as a means of declaring which external data people are using and what has been approved. I am sure I can pick through it and pull out what I think fits the bill. I do have a couple of questions: 1) If I do not see that a Kaggle admin has explicitly said not to use a posted source, is it fair game? This assumes you have actually checked the sources by now. Don't get me wrong, I will check them myself, but I wanted to see whether this assumption is correct. 2) I see there was a reply to theafh about the Census Bureau data that was never fully resolved, and it was stated that the data most likely cannot be used. This is a little confusing, because the rules just say "You may not, however, link the Data Sets to records in other external databases such that new demographic, socioeconomic or clinical information about the members in the Data Sets is gained." But Census Bureau data is anonymous and should not give insight into demographic, socioeconomic or clinical information about an individual member. I would think this rule is meant to protect the privacy of individuals in the data, but maybe you mean it to cover people as a whole? #41 / Posted 8 months ago
 DavidChudzicki Kaggle Admin Posts 424 Thanks 106 Joined 21 Nov '10 (1) According to Rule 7, you don't need special permission for external data, as long as you satisfy the requirements. In some cases, we've clarified that certain external data isn't allowed. If there are particular ones you're still wondering about, feel free to ask. (2) It's a good point, but I guess the sponsor just wanted to be totally safe. #42 / Posted 8 months ago
 Rank 57th Posts 12 Joined 18 Sep '12 Becky, was all of the information you listed approved? It says it was posted 6 months ago (not the exact date), which is right around the deadline, so I'm not sure whether it's usable or not. I am also new to the competition, so I'm still figuring out how things work. Like others, I think it would be nice if someone could summarize all of the approved information that made it in before the deadline. I guess someone could go through and compile it, then double-check with others and/or the admins to verify that everything is approved and nothing is missing. I might give that a shot later. #43 / Posted 7 months ago
 Posts 12 Thanks 1 Joined 21 Aug '11 None of my submissions to date have used external data. If another competitor requested approval for external data (or stated prior to the deadline that they used it) and provided the source of the data, am I free to also use that data at my discretion? #44 / Posted 5 months ago
 Posts 4 Joined 4 Feb '12 From some of the links here, it seems that people are trying to link publicly available provider-specific and hospital-specific information with the HPN data. I have two questions: 1) Is this legal according to the rules? I know the rules explicitly ban trying to match up patient data. 2) Can anyone share how they are matching up this data, since the provider IDs are masked? Thanks #45 / Posted 3 months ago