Entrants are welcome to use other data to develop and test their algorithms and entries until 11:59:59 UTC on April 4, 2012 if the data are (i) freely available to all other Entrants and (ii) published (or a link provided) to the data in the “External Data” on this Forum topic within one (1) week of an entry submission using the other data. Entrants may not use any data other than the Data Sets after 11:59:59 UTC on April 4, 2012 without prior approval.
External Data
|
Thanks 4 Joined 5 Aug '10 Email user |
I wonder how you can prove that people do not use other data to develop and test their algorithms. People may use other data but deny it in practice, and explain the constants they use or the conditions they check as simple guesses. Even people who do not use other data guess, and a doctor with some experience may simply guess better based on her or his experience. |
|
Joined 12 Dec '10 Email user |
I am also curious to know what the answer to Uri's question would be, since we're not required to provide any model equation. Of course it can be made a requirement for the top performers at a later stage. If there is any other way of checking the use of external data, admin, can you please let us know? |
|
Thanks 4 Joined 5 Aug '10 Email user |
I do not use external data, but I am certainly guessing things by trial and error on the leaderboard because I have no way to test. The problem is that we have no set of people with data from year 1 to year 4, so we have no way to test models of how to predict year 4 based on years 1, 2 and 3. We can try to predict year 3 based on year 1 and year 2 data, but that is clearly a different problem from predicting year 4 based on years 1, 2 and 3. Data about different people (whom we do not need to predict) for years 1, 2, 3 and 4 certainly could help. |
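The proxy test mentioned in the post above (predicting year 3 from years 1 and 2) could be sketched roughly as follows. This is only an illustration under stated assumptions: the member records, the `predict_next_year` baseline, and the choice of RMSLE as the error measure are all hypothetical and are not taken from the competition data or rules.

```python
import math

# Hypothetical toy data: member id -> days in hospital for years 1..3.
# Year 4 is the unknown contest target, so it is deliberately absent.
days_in_hospital = {
    "m1": [0, 2, 1],
    "m2": [3, 0, 0],
    "m3": [0, 0, 0],
    "m4": [1, 1, 4],
}


def predict_next_year(history):
    """Hypothetical baseline: the mean of all earlier years."""
    return sum(history) / len(history)


def rmsle(predicted, actual):
    """Root mean squared error on log(1 + x); the metric is an assumption."""
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared) / len(squared))


def backtest(data, target_year):
    """Score predictions of `target_year` (1-based) using only earlier years."""
    preds, actuals = [], []
    for history in data.values():
        preds.append(predict_next_year(history[: target_year - 1]))
        actuals.append(history[target_year - 1])
    return rmsle(preds, actuals)


score_y2 = backtest(days_in_hospital, target_year=2)  # year 1 -> year 2
score_y3 = backtest(days_in_hospital, target_year=3)  # years 1, 2 -> year 3
print(f"year-2 backtest RMSLE: {score_y2:.3f}")
print(f"year-3 backtest RMSLE: {score_y3:.3f}")
```

As the post notes, a score obtained this way is only a stand-in: the year-3 backtest sees two years of history, whereas the real task of predicting year 4 has three, so the two problems are related but not identical.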
|
Thanks 15 Joined 26 Aug '10 Email user |
@Jeremy. I just ran a number of queries on HCUPnet and downloaded the results in spreadsheet format. At no point was I presented with any terms to accept and I can find no mention of limitations on the use or redistribution of the data on the site. What I could find was... "You can purchase many HCUP databases to do more detailed analyses not possible through HCUPnet" and "Many of the databases that are featured in HCUPnet can be purchased through the HCUP Central Distributor or from the States. If you find that HCUPnet does not answer all your questions, or you need more sophisticated statistics, then you may wish to purchase the databases and do your own analysis... You will need to complete an application form and sign a data use agreement before purchasing your data." This suggests that the aggregate data available through the site is entirely free to use in any way. Are you in a position to confirm that this is so and that we can use it in this contest? Despite its limitations, it looks extremely useful! |
|
Thanks 15 Joined 26 Aug '10 Email user |
FYI, I have just submitted the following query to AHRQ, who manage HCUPnet: "Hello. I cannot find on your websites any information concerning limitations on the use of the aggregate data that is freely available through the HCUPnet query interface. Can I ask you to clarify: Are there any limitations and, if so, where are these described? Or are any tables of figures produced by the free query interface essentially in the public domain?" |
|
Thanks 58 Joined 13 Oct '10 Email user |
According to HCUPnet, "It is the responsibility of the user to contact and obtain the needed copyright permissions prior to reproducing materials in any form" (http://www.ahrq.gov/news/gdlcopyr.htm). So I think we should wait for you to receive a reply to your query to HCUPnet, or directly contact the copyright owner of the data you wish to use. |
|
Thanks 15 Joined 26 Aug '10 Email user |
That link would seem to apply to "clinical practice guidelines" only, which I do not think have anything to do with the databases. At any rate, I received this today, in direct response to the email which I copied above... "Dear Mr. Washtell: Thank you for your e-mail and interest in HCUPnet. Your e-mail was forwarded to the HCUP User Support inbox. Information obtained through HCUPnet is considered public information and no special permission is required to publish the statistics. We do, however, request that you source the information with the appropriate citation. Recommendations for citing are located on the HCUP-US Website at http://hcup-us.ahrq.gov/tech_assist/citations.jsp. If you have any additional questions, please contact User Support at this address. Sincerely, HCUP User Support" Can I take it from this, then, that I/we can build models using this data as long as I/we post a copy of the data (and/or sufficient information for other users to generate the exact same data through the HCUPnet interface) on here, along with the requisite citation of course? I can forward the actual email I received to Kaggle/HPN if necessary. |
|
Thanks 15 Joined 26 Aug '10 Email user |
Further... Dear Mr. Washtell: I am responding to your inquiry on behalf of Randie Siegel, AHRQ's associate director for publishing and electronic dissemination. You want to know about limitations to publishing research based on HCUPnet data. The tables produced by HCUPnet are in the public domain, but source citation is greatly appreciated and encouraged. As far as I can tell, no data use agreement is needed. There is a page on the HCUP Web site that addresses the issue of publication requirements, “Requirements for Publishing Results with HCUP Data” (http://www.hcup-us.ahrq.gov/db/publishing.jsp). I see from the description page about HCUPnet, that HCUPnet is programmed to automatically abide by the privacy rules set down for using any of the HCUP databases. Otherwise, the publishing requirements page links to a page of suggested citations (http://www.hcup-us.ahrq.gov/tech_assist/citations.jsp), with a section on citing HCUPnet: Citing HCUPnet: First list HCUPnet, then HCUP, followed by the appropriate data years, and then AHRQ and the related Web link. Lastly, include the date of access. Consider the following example: If you still have questions about using HCUPnet data, contact HCUP User Support via email (hcup@ahrq.gov). Sincerely, |
|
Thanks 7 Joined 5 Sep '11 Email user |
"Prognostic Indices" - Study behind paywall: http://jama.ama-assn.org/content/307/2/182.short Graphic: http://www.eprognosis.org/ (Note: the Charlson index is not mentioned) NYT summary: http://well.blogs.nytimes.com/2012/01/19/why-doctors-cant-predict-how-long-a-patient-will-live/?ref=health
|
Posts 8 Thanks 1 Joined 1 Nov '11 Email user |
I am sharing an external data source that might be useful for this competition. It is attached to this post. It is the 2010 census data that is publicly available from the census.gov website: http://www.census.gov/prod/cen2010/briefs/c2010br-03.pdf. I just contacted census.gov, and this is what I was told: "Census data is "public domain", you do not need our permission to use it, copy it, publish it, or cite it." |
|
Thanks 6 Joined 5 Apr '11 Email user |
http://www.heritageprovidernetwork.com/?p=medical-groups
~jba |
|
Thanks 106 Joined 21 Nov '10 Email user |
JeremyA wrote: http://www.heritageprovidernetwork.com/?p=medical-groups
~jba
Yes, anything other than the data sets provided with the competition is "external data." |
|
Joined 18 Mar '11 Email user |
In section "7. USE OF OTHER DATA" of the rules it states: "You may not, however, link the Data Sets to records in other external databases such that new demographic, socioeconomic or clinical information about the members in the Data Sets is gained." Is a concise definition available for what exactly constitutes demographic, socioeconomic, and clinical information in the context of this sentence? thanks |
|
Thanks 6 Joined 5 Apr '11 Email user |
G wrote: In section "7. USE OF OTHER DATA" of the rules it states: "You may not, however, link the Data Sets to records in other external databases such that new demographic, socioeconomic or clinical information about the members in the Data Sets is gained." Is a concise definition available for what exactly constitutes demographic, socioeconomic, and clinical information in the context of this sentence? thanks
Do the two links I've provided fall under this rule? The avg LoS for California as well as the in-service provider info from the Heritage Health website certainly qualify as "new demographic, socioeconomic or clinical information about the members", just not for the purposes of 'Patient Identification/Privacy', which is what I thought the rule was geared towards...?
Thanks in Advance, ~jba |
|
Thanks 106 Joined 21 Nov '10 Email user |
Yes, you should post links here.
|
|
Joined 28 Nov '11 Email user |
Thanks a lot for the reply! Here are some of the external data sets I'm planning to use:
Pubmed: http://www.ncbi.nlm.nih.gov/pubmed/
Datasets on LODD: http://www.w3.org/wiki/HCLSIG/LODD/Data
ICD9 data: http://www.cdc.gov/nchs/icd/icd9.htm
Mortality statistics: http://www.cdc.gov/nchs/deaths.htm
Disease ontology: http://do-wiki.nubic.northwestern.edu/index.php/Main_Page
|
|
Joined 11 Oct '11 Email user |
Hi Kaggle Admins, census.gov was already mentioned in this thread… I’m thinking about using other external data from that source with a socioeconomic dimension, like the data linked from this page: http://www.census.gov/hhes/www/income/income.html Would it be ok to integrate that data into my models? -theafh |
|
Joined 22 Feb '12 Email user |
Hi, We are planning to leverage the following data and information, which is free to the public:
http://www.cdc.gov/nchs/data/nvsr/nvsr59/nvsr59_09.pdf
ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHDS/NHDS_2009_Documentation.pdf
http://www.cdc.gov/nchs/data/nvsr/nvsr60/nvsr60_04.pdf
http://www.statehealthfacts.org
http://www.cdc.gov/nchs/fastats/hospital.htm
Thanks! |
|
Joined 8 Jun '12 Email user |
Hi Admins,
Cheers! |
|
Thanks 106 Joined 21 Nov '10 Email user |
David Gainer wrote: I'm just starting with the contest. Did any external data sources get approved? It didn't look like it from this forum, but I wanted to be sure.
Prior to April 4, 2012 external data didn't need approval (as long as all of the conditions in the rules were satisfied). That's why you see people posting it here (without approval). After that date, external data requires approval, which is unlikely to happen. |
|
Joined 25 Aug '12 Email user |
Hi David, I have the same question as Mercicle and have read through the whole forum. I think it is a little disorganized as a means of declaring which external data people are using and what has been approved. I am sure I can pick through it and pull out what I think fits the bill. I do have a couple of questions: 1) If I do not see that a Kaggle admin has explicitly said not to use a posted source, then is it fair game? This is assuming you all have actually checked the sources out at this point. Don't get me wrong, I will check them myself, but I wanted to see if this assumption was correct. 2) I see there was a reply posted to theafh about the Census Bureau data that was never fully confirmed, and it was stated that it most likely cannot be used. This is a little confusing because the rules just say "You may not, however, link the Data Sets to records in other external databases such that new demographic, socioeconomic or clinical information about the members in the Data Sets is gained." But Census Bureau data is anonymous and should not give insight into demographic, socioeconomic or clinical information about an individual member. I would think this is to cover the privacy of any individuals in the data, but maybe you do mean it to cover people as a whole?
|
|
Thanks 106 Joined 21 Nov '10 Email user |
(1) According to Rule 7, you don't need special permission for external data, as long as you satisfy the requirements. In some cases, we've clarified that certain external data isn't allowed. If there are particular ones you're still wondering about, feel free to ask. (2) It's a good point, but I guess the sponsor just wanted to be totally safe. |
|
Posts 12 Joined 18 Sep '12 Email user |
Becky, was all of this information you listed approved? It says posted 6 months ago (not the exact date), which is right around the deadline... so I'm not sure whether it's usable or not. I am also new to the competition, so still figuring out how things work. Like others, I tend to think it would be nice if someone could summarize all of the approved info that made it in before the deadline... I guess someone could try to go through and compile it, then double-check with others and/or the admins to verify everything is approved and nothing is missing. I might give that a shot later.
|
|
Joined 4 Feb '12 Email user |
From some of the links here, it seems that people are trying to link up publicly available provider-specific and hospital-specific information with the HPN data. I have two questions: 1) Is this legal, according to the rules? I know the rules explicitly ban trying to match up patient data. 2) Can anyone share how they are matching up this data, since the provider ids are masked? Thanks |