DavidChudzicki
Kaggle Admin
Posts 424
Thanks 106
Joined 21 Nov '10
From Kaggle

JeremyA, I'm sorry -- I think we have to say not to use it.

-David

 
theafh
Posts 1
Joined 11 Oct '11

Hi Kaggle Admins,

census.gov was already mentioned in this thread. I'm thinking about using other external data from that source with a socioeconomic dimension, like the data linked from this page: http://www.census.gov/hhes/www/income/income.html

Would it be OK to integrate that data into my models?

-theafh

 
DavidChudzicki
Kaggle Admin

Hi Theafh,

I'll have to look into it and get back to you within a week, but I fear the answer will be the same as for JeremyA's question.

Thanks,

David

 
Becky
Posts 1
Joined 22 Feb '12

Hi,

We are planning to leverage the following data and information which is free to the public:

http://www.dartmouthatlas.org

http://www.cdc.gov/nchs/data/nvsr/nvsr59/nvsr59_09.pdf

ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHDS/NHDS_2009_Documentation.pdf

http://www.cdc.gov/nchs/data/nvsr/nvsr60/nvsr60_04.pdf

http://www.statehealthfacts.org

http://www.cdc.gov/nchs/fastats/hospital.htm

www.ahrq.gov

Thanks!

 
Varun Mazumdar
Posts 1
Joined 8 Jun '12

Hi Admins,

I just wanted to know what I need to do to get approval for the use of external data sets after the April 4th deadline. In addition, if I have compiled data via automated data mining from published and freely available journal articles, must I provide links to each article, or can I just provide the compiled dataset? It may be easier to provide the compiled dataset, as the number of articles used would be huge.

Cheers!

 
DavidChudzicki
Kaggle Admin

As a general rule, external data won't be approved after the deadline.

 
David Gainer
Posts 1
Joined 13 Jun '12

Hi,

I'm just starting with the contest. Did any external data sources get approved? It didn't look like it from this forum, but I wanted to be sure.

Thanks,

David

 
DavidChudzicki
Kaggle Admin

David Gainer wrote:

I'm just starting with the contest. Did any external data sources get approved? It didn't look like it from this forum, but I wanted to be sure.

Prior to April 4, 2012, external data didn't need approval (as long as all of the conditions in the rules were satisfied). That's why you see people posting sources here without approval.

After that date, external data requires approval, which is unlikely to happen.

 
Mercicle
Posts 1
Joined 27 Jul '12

David,

Can you provide a final list of specific external data sources that we can use? Some people have listed only websites, which seems vague. If any data from any URL posted before the deadline can be used, then you can just verify this.

Thanks,

John

 
DavidChudzicki
Kaggle Admin

If there are particular cases that aren't clear from questions & responses on the forum thread, can you ask about those specifically?

 
baller
Posts 1
Joined 25 Aug '12

Hi David,

I have the same question as Mercicle and have read through the whole forum. I think it is a little disorganized as a means of declaring which external data people are using and what has been approved. I am sure I can pick through it and pull out what I think fits the bill. I do have a couple of questions:

1) If I do not see that a Kaggle admin has explicitly said not to use a posted source, is it fair game? This assumes you all have actually checked the sources out at this point. Don't get me wrong, I will check them myself, but I wanted to see if this assumption was correct.

2) I see there was a reply posted to theafh about the Census Bureau data that was never fully confirmed, and it was stated that it most likely cannot be used. This is a little confusing because the rules just say "You may not, however, link the Data Sets to records in other external databases such that new demographic, socioeconomic or clinical information about the members in the Data Sets is gained." But Census Bureau data is anonymous and should not give insight into demographic, socioeconomic or clinical information about an individual member. I would think this rule is meant to protect the privacy of individuals in the data, but maybe you mean it to cover people as a whole?

 

 
DavidChudzicki
Kaggle Admin

(1) According to Rule 7, you don't need special permission for external data, as long as you satisfy the requirements. In some cases, we've clarified that certain external data isn't allowed. If there are particular ones you're still wondering about, feel free to ask.

(2) It's a good point, but I guess the sponsor just wanted to be totally safe.

 
K-czar
Rank 57th
Posts 12
Joined 18 Sep '12

Becky, was all of this information you listed approved? It says it was posted 6 months ago (not the exact date), which is right around the deadline, so I'm not sure whether it's usable or not. I am also new to the competition, so I'm still figuring out how things work. Like others, I think it would be nice if someone could summarize all of the approved info that made it in before the deadline. I guess someone could try to go through and compile it, then double-check with others and/or the admins to verify that everything is approved and nothing is missing. I might give that a shot later.

Becky wrote:

Hi,

We are planning to leverage the following data and information which is free to the public:

http://www.dartmouthatlas.org

http://www.cdc.gov/nchs/data/nvsr/nvsr59/nvsr59_09.pdf

ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHDS/NHDS_2009_Documentation.pdf

http://www.cdc.gov/nchs/data/nvsr/nvsr60/nvsr60_04.pdf

http://www.statehealthfacts.org

http://www.cdc.gov/nchs/fastats/hospital.htm

www.ahrq.gov

Thanks!

 
ADP
Posts 12
Thanks 1
Joined 21 Aug '11

None of my submissions to date have used external data. If another competitor has requested (or stated prior to the deadline) that they have used external data, and provided the source of the data, am I free to also use that data at my discretion?

 
Sajid Z
Posts 4
Joined 4 Feb '12

From some of the links here, it seems that people are trying to link up publicly available provider-specific and hospital-specific information with the HPN data. I have two questions:

1) Is this legal, according to the rules? I know the rules explicitly ban trying to match up patient data.

2) Can anyone share how they are matching up this data, since the provider ids are masked?

Thanks

 