Call to Boycott Heritage Health Prize

Uri Blass's image Posts 253
Thanks 4
Joined 5 Aug '10 Email user
I wonder what happens if somebody who does not participate in the competition gets the data from one of the participants and publishes research based on it. Of course the participant who gave him the data is guilty, but the sponsor is going to have a problem finding him (considering that many people compete, and the guilty person may not even be one of the participants but a hacker who stole the data from one of them). Can the sponsor do something against a researcher who can claim that he never agreed not to use the data for research purposes, and what exactly can the sponsor do about it? Do you know if something similar has happened in the past? I am not a lawyer and I would like to know what can practically happen in this case, based on past experience.
 
arbuckle's image
arbuckle
HHP Advisor
Posts 38
Thanks 21
Joined 5 May '11 Email user
I think you would be complicit. If it’s illegal to download pirated movies, it must be the same with this data.
 
Justin Washtell's image Posts 48
Thanks 15
Joined 26 Aug '10 Email user

Perhaps the data providers could use some kind of steganographic or watermarking approach to uniquely associate each downloaded version of future datasets with the user who downloaded it - at the cost of introducing a very small amount of noise to the data. In order to try and remove any traces of the identification, many users would have to work together, and this kind of activity would be strictly against the rules. For one person working alone to be *sure* that they had removed it, the data would have to be significantly "dumbed-down", rendering it of less interest to third parties.

Just an idea. Probably all sorts of problems with that in practice. Perhaps not least that the data has already been "significantly dumbed-down".
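
To make the idea concrete, here is a minimal sketch in Python of per-user watermarking on a single numeric column. It is an illustration under stated assumptions only, not anything HPN or Kaggle is known to do: the function names, the `demo-secret` key, and the noise `strength` are all hypothetical, and it assumes the provider keeps the original values so that a leaked copy can be compared against them.

```python
from __future__ import annotations

import hashlib
import random


def user_pattern(user_id: str, n: int, secret: str = "demo-secret") -> list[int]:
    """Derive a reproducible +1/-1 pattern from a user ID and a secret key (hypothetical scheme)."""
    seed = hashlib.sha256(f"{secret}:{user_id}".encode()).hexdigest()
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(n)]


def watermark(values: list[float], user_id: str, strength: float = 0.01) -> list[float]:
    """Add a small user-specific perturbation to every value."""
    pattern = user_pattern(user_id, len(values))
    return [v + strength * p for v, p in zip(values, pattern)]


def identify(leaked: list[float], original: list[float], user_ids: list[str]) -> str:
    """Return the user whose pattern correlates best with the leaked-minus-original noise."""
    diff = [l - o for l, o in zip(leaked, original)]

    def score(uid: str) -> float:
        pattern = user_pattern(uid, len(diff))
        return sum(d * p for d, p in zip(diff, pattern))

    return max(user_ids, key=score)


# Example: three hypothetical users download the same column; one copy leaks.
original = [float(i % 7) for i in range(500)]
users = ["alice", "bob", "carol"]
leaked = watermark(original, "bob")
culprit = identify(leaked, original, users)
```

In this sketch, averaging the copies of several colluding users shrinks each individual's correlation score, which is the collusion scenario described above. A real scheme would also have to survive rounding, row shuffling, and categorical columns, none of which is handled here.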

 
ijvaughn's image Posts 6
Joined 6 May '11 Email user

Uri Blass wrote:

I wonder what happens if somebody who does not participate in the competition gets the data from one of the participants and publishes research based on it. Of course the participant who gave him the data is guilty, but the sponsor is going to have a problem finding him (considering that many people compete, and the guilty person may not even be one of the participants but a hacker who stole the data from one of them). Can the sponsor do something against a researcher who can claim that he never agreed not to use the data for research purposes, and what exactly can the sponsor do about it? Do you know if something similar has happened in the past? I am not a lawyer and I would like to know what can practically happen in this case, based on past experience.

 

I wasn't advocating this path.  What I meant was, where can researchers obtain relevant data similar to the Heritage Prize data, but without the ridiculous licensing restrictions.  Perhaps someone with more to gain, like an insurance company, would release a properly anonymized dataset with little to no IP restrictions.

 
Jeremy Howard (Kaggle)'s image Posts 166
Thanks 58
Joined 13 Oct '10 Email user
From Kaggle
The rules and the FAQ explain that you can use the data for research purposes, as long as you first get permission from HPN. So if you're interested in doing research, please just ask!
 
ijvaughn's image Posts 6
Joined 6 May '11 Email user

Jeremy Howard (Kaggle) wrote:

The rules and the FAQ explain that you can use the data for research purposes, as long as you first get permission from HPN. So if you're interested in doing research, please just ask!

This is (probably, as I won't agree to the current terms so haven't actually looked at it yet) a good test set for various algorithms.  Sure, I could ask, but my main point was the following:

The terms and license agreement as they currently stand are ambiguous and leave open the possibility of my research, including the algorithm that I am developing for a different purpose, being forcibly transferred to Heritage in court.  This is due to

1) the term "ALGORITHM" not being properly defined in the standard industry sense

2) contradictory and ambiguous terms about what constitutes the "ALGORITHM" and what exactly the exclusive, non-transferable license that we are granting Heritage is actually for.  Is it for the specific parameters and set of software, applied to the data, to obtain the score?  This would seem the most reasonable, but the wording allows for any developed software used on the data to be granted to Heritage.

 
Signipinnis's image Posts 94
Thanks 25
Joined 8 Apr '11 Email user

ijvaughn wrote:

The terms and license agreement as they currently stand are ambiguous and leave open the possibility of my research, including the algorithm that I am developing for a different purpose, being forcibly transferred to Heritage in court.

 

If you win, and think your "algorithm" is just too darn special to share ... then don't submit the required documentation within the stated time limits ... that disqualifies you.

 

Game ON !

 

 
Jeremy Howard (Kaggle)'s image Posts 166
Thanks 58
Joined 13 Oct '10 Email user
From Kaggle

ijvaughn wrote:

The terms and license agreement as they currently stand are ambiguous and leave open the possibility of my research, including the algorithm that I am developing for a different purpose, being forcibly transferred to Heritage in court.  This is due to

1) the term "ALGORITHM" not being properly defined in the standard industry sense

2) contradictory and ambiguous terms about what constitutes the "ALGORITHM" and what exactly the exclusive, non-transferable license that we are granting Heritage is actually for.

The license is extremely specific and uses terms that are given an exact definition. The license is to the: "Entry and Prediction Algorithm used to produce the Entry". "Entry" is defined as: "the data submitted in the manner and format specified on the Website via the Website on Entry form". "Prediction Algorithm" is defined as "the algorithm used to produce the data in an Entry taken as a whole (i.e., its particular total configuration) but does not include individual components of the Prediction Algorithm or tools used for analysis or development of the Prediction Algorithm".

This is an extremely competitor-friendly definition - only the actual final configuration of parameters, inputs, etc. taken as a whole is being licensed to HPN. The document both explicitly says it's the "particular total configuration" that's being covered, and even goes so far as to remove any possible ambiguity by explicitly stating that the component parts are not covered.

Thanked by Eric Jackson
 
alexanderr's image Posts 42
Thanks 2
Joined 5 Apr '11 Email user
If people want to develop algorithms to benefit themselves in the future, they should not do this competition. This competition is about helping people get better healthcare, not helping competitors to further their careers. I would try just as hard to win if there was no prize money at all. Apart from anything else, this is an interesting problem to solve and not an easy one.
 
Justin Washtell's image Posts 48
Thanks 15
Joined 26 Aug '10 Email user

alexanderr wrote:

If people want to develop algorithms to benefit themselves in the future, they should not do this competition. This competition is about helping people get better healthcare, not helping competitors to further their careers. I would try just as hard to win if there was no prize money at all. Apart from anything else, this is an interesting problem to solve and not an easy one.

You've given at least two good reasons for participating. I would observe that Kaggle is about furthering transferable technology as much as it is about solving immediate problems. I doubt that anybody who is motivated solely by their career (rather than a passion for the technical or people-oriented problems) will make much progress. But if they should, and solutions to those other two problems are achieved incidentally in the process, then I would say that is a good outcome. Wouldn't you? I expect, by and large, that these causes benefit from having as many participants as possible.
 
Seref Arikan's image Posts 5
Thanks 1
Joined 4 Apr '11 Email user
Just to provide the point of view of someone who has been waiting for this contest for quite some time: as far as I can understand, every submission grants rights to the contents of the entry, in whatever way that is written in the rules at the moment. This is a big problem. I am doing a PhD in AI in healthcare (roughly) and I certainly have a few ideas about tackling the problem in this contest. However, if I move forward and submit an entry, whether or not my entry wins, I'm giving away rights to it anyway. I don't think there will be a winning entry which is going to clearly outperform everyone else. The winner may get a good reward, but what about the rest? Their work, which is likely to be almost as good as the winning one, is given away for free under the current rules. I would submit just for fun, just to see if what I have in mind can match the high-ranking entries in terms of performance, or to have discussions with others like me, not to win, but to share insight and improve what I (we) are doing. This is what I do all the time in the department: talk to researchers from other disciplines, exchange views. Please limit the restrictions to the winning entry, or to the best-performing ones at milestones. Believe me, there would be a lot of entries and sharing if the restrictions applied only to the winner. If I'm wrong about the way the rules are formed, that'd be great news for me.
 
Jeremy Howard (Kaggle)'s image Posts 166
Thanks 58
Joined 13 Oct '10 Email user
From Kaggle

Seref, see my earlier answer - surely you are not "giving away" rights to anything in a way that will restrict your academic progress in any way? The total package required to create a submission requires many steps - it's only the complete package of all of them taken as a whole that you're allowing HPN to have. Surely your PhD doesn't require the exact same file import steps, submission file export steps, etc? If it's not the exact same complete package, then there's no impact of the license exclusivity.

 
Seref Arikan's image Posts 5
Thanks 1
Joined 4 Apr '11 Email user

Hi Jeremy, 

I really appreciate your effort to keep this going, thanks. This sounds like really good news. May I simply ask if the flow of events below (which is likely to be the case for many others) is possible? Do you see any issues here?:

1) I use an approach that is either directly used in my PhD, or is similar to an approach that I'm using in my PhD or other research, and I submit.

2) I fail to take the top spot (almost guaranteed) 

3) I repeat step 1, talk to others, maybe come up with some new ideas and try them too. Meanwhile, I publish a paper that uses the algorithm(s) in my submission(s), and I also use some of them at work (I'm also working as a software developer). I only use the algorithm, no data or anything else from HHP.

4) The contest is over, someone won (not me)

5) I keep publishing and working, using some of the methods/algorithms I've developed during steps 1-4. Nobody has any reason or basis to sue me in the future.

 

Is this correct? This is probably the case for 90% of people asking for clarifications.

 

Regards

Seref

Thanked by Information Man
 
Jeremy Howard (Kaggle)'s image Posts 166
Thanks 58
Joined 13 Oct '10 Email user
From Kaggle

That all sounds fine Seref.

 
Tom SF Haines's image Posts 15
Joined 5 May '11 Email user
So what exactly is the line between what HPN declares exclusivity over and what it does not? I.e., what kind of change to the code base is required before you no longer claim exclusive rights over it? The description being presented makes it sound like slightly tweaking even a single parameter circumvents exclusivity, which would make the use of the term irrelevant.
 
