_JeremyA's image
Posts 23
Thanks 6
Joined 5 Apr '11

http://www.heritageprovidernetwork.com/?p=medical-groups

http://www.calhospitalcompare.org/

I'll probably use the data on these webpages as inputs at some point. I assume this is considered 'external data'?

~jba

 
DavidChudzicki's image
DavidChudzicki
Kaggle Admin
Posts 447
Thanks 107
Joined 21 Nov '10
From Kaggle

JeremyA wrote:

http://www.heritageprovidernetwork.com/?p=medical-groups

http://www.calhospitalcompare.org/

I'll probably use the data on these webpages as inputs at some point. I assume this is considered 'external data'?

~jba

Yes, anything other than the data sets provided with the competition is "external data."

 
G's image
G
Posts 1
Joined 18 Mar '11

In section "7. USE OF OTHER DATA" of the rules it states: "You may not, however, link the Data Sets to records in other external databases such that new demographic, socioeconomic or clinical information about the members in the Data Sets is gained."

Is a concise definition available for what exactly constitutes demographic, socioeconomic, and clinical information in the context of this sentence?

thanks

 
_JeremyA's image
Posts 23
Thanks 6
Joined 5 Apr '11

G wrote:

In section "7. USE OF OTHER DATA" of the rules it states: "You may not, however, link the Data Sets to records in other external databases such that new demographic, socioeconomic or clinical information about the members in the Data Sets is gained."

Is a concise definition available for what exactly constitutes demographic, socioeconomic, and clinical information in the context of this sentence?

thanks

Do the two links I've provided fall under this rule? The avg LoS for California, as well as the in-service provider info from the Heritage Health website, certainly qualify as "new demographic, socioeconomic or clinical information about the members", just not for the purposes of 'Patient Identification/Privacy', which is what I thought the rule was geared towards...?

Thanks in advance,

~jba

 
DavidChudzicki's image
DavidChudzicki
Kaggle Admin
Posts 447
Thanks 107
Joined 21 Nov '10
From Kaggle
G-- I'm sorry, but that's what we have. I think we'll just have to figure out how that applies on a case-by-case basis.

JeremyA-- We'll need to have a look at that data and think about it with HHN. I'll be sure to give a response by Friday next week (March 16).

Thanks,

David
 
DavidChudzicki's image
DavidChudzicki
Kaggle Admin
Posts 447
Thanks 107
Joined 21 Nov '10
From Kaggle

JeremyA-- I'm sorry. I'm still trying to find out what HHN thinks of this. I'll be in touch again as soon as I can.

 
_JeremyA's image
Posts 23
Thanks 6
Joined 5 Apr '11

Don't worry.  I anticipated it might elicit some difficulty.
And there's lots of time left in the competition.

~jba

 
Kno.e.sis's image
Posts 4
Joined 28 Nov '11

We haven't submitted a prediction model to the competition yet, but I will get to it at some point. I am planning to use some external data sources, though. Please let me know whether I should post links to these datasets here.

Thanks a lot in advance!
Pramod.

 
DavidChudzicki's image
DavidChudzicki
Kaggle Admin
Posts 447
Thanks 107
Joined 21 Nov '10
From Kaggle

Yes, you should post links here.

7. USE OF OTHER DATA

Entrants may use data other than the Data Sets to develop and test their Prediction Algorithms and Entries provided that (i) such data are freely available to all other Entrants and (ii) the data and/or a link to the data are published in the "External Data" topic in the Forums section of the Website within one (1) week of the date on which an Entry that uses such data is submitted to the Website. Entrants may not use new external data in connection with the development of their Entries after 11:59:59 UTC on April 4, 2012 without the prior written permission of Sponsor. Any third-party service provider, consultant or contractor of Sponsor that received or receives data or other information in connection with work performed for or on behalf of Sponsor may not use such data or other information in connection with the Competition.

You may not, however, link the Data Sets to records in other external databases such that new demographic, socioeconomic or clinical information about the members in the Data Sets is gained. Sponsor reserves the right in its sole discretion to disqualify any Entrant who Sponsor discovers has undertaken or attempted to undertake such linking of the Data Sets.

 
Kno.e.sis's image
Posts 4
Joined 28 Nov '11

Thanks a lot for the reply!

Here are some of the external data sets I'm planning to use:

Pubmed: http://www.ncbi.nlm.nih.gov/pubmed/ 

datasets on LODD : http://www.w3.org/wiki/HCLSIG/LODD/Data 

ICD9 data: http://www.cdc.gov/nchs/icd/icd9.htm 

Mortality statistics: http://www.cdc.gov/nchs/deaths.htm 

Disease ontology: http://do-wiki.nubic.northwestern.edu/index.php/Main_Page 

 
DavidChudzicki's image
DavidChudzicki
Kaggle Admin
Posts 447
Thanks 107
Joined 21 Nov '10
From Kaggle

JeremyA, I'm sorry -- I think we have to say not to use it.

-David

 
theafh's image
Posts 1
Joined 11 Oct '11

Hi Kaggle Admins,

census.gov was already mentioned in this thread… I'm thinking about using other external data from that source with a socioeconomic dimension, like the data linked from this page: http://www.census.gov/hhes/www/income/income.html

Would it be OK to integrate that data into my models?

-theafh

 
DavidChudzicki's image
DavidChudzicki
Kaggle Admin
Posts 447
Thanks 107
Joined 21 Nov '10
From Kaggle

Hi Theafh,

I'll have to look into it and get back to you within a week, but I fear the answer will be the same as for JeremyA's question.

Thanks,

David

 
Becky's image
Posts 1
Joined 22 Feb '12

Hi,

We are planning to use the following data and information, which are freely available to the public:

http://www.dartmouthatlas.org

http://www.cdc.gov/nchs/data/nvsr/nvsr59/nvsr59_09.pdf

ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHDS/NHDS_2009_Documentation.pdf

http://www.cdc.gov/nchs/data/nvsr/nvsr60/nvsr60_04.pdf

http://www.statehealthfacts.org

http://www.cdc.gov/nchs/fastats/hospital.htm

www.ahrq.gov

Thanks!

 
Varun Mazumdar's image
Posts 1
Joined 8 Jun '12

Hi Admins,

I just wanted to know what I need to do to get approval for the use of external data sets after the April 4th deadline. In addition, if I have compiled data via automated data mining from published and freely available journal articles, must I provide links to each article, or can I just provide the compiled dataset? It may be easier to provide the compiled dataset, as the number of articles used would be huge.

Cheers!

 
DavidChudzicki's image
DavidChudzicki
Kaggle Admin
Posts 447
Thanks 107
Joined 21 Nov '10
From Kaggle

As a general rule, external data won't be approved after the deadline.

 
David Gainer's image
Posts 1
Joined 13 Jun '12

Hi,

I'm just starting with the contest. Did any external data sources get approved? It didn't look like it from this forum, but I wanted to be sure.

Thanks,

David

 
DavidChudzicki's image
DavidChudzicki
Kaggle Admin
Posts 447
Thanks 107
Joined 21 Nov '10
From Kaggle

David Gainer wrote:

I'm just starting with the contest. Did any external data sources get approved? It didn't look like it from this forum, but I wanted to be sure.

Prior to April 4, 2012, external data didn't need approval (as long as all of the conditions in the rules were satisfied). That's why you see people posting it here (without approval).

After that date, external data requires approval, which is unlikely to happen.

 
Mercicle's image
Posts 1
Joined 27 Jul '12

David,

Can you provide a final list of specific external data sources that we can use? Some people have listed whole websites, which seems vague. If any data from any URL posted before the deadline can be used, then you can just confirm that.

Thanks,

John

 
DavidChudzicki's image
DavidChudzicki
Kaggle Admin
Posts 447
Thanks 107
Joined 21 Nov '10
From Kaggle

If there are particular cases that aren't clear from questions & responses on the forum thread, can you ask about those specifically?

 