|
Thanks 4 Joined 5 Aug '10 Email user |
|
|
Thanks 21 Joined 5 May '11 Email user |
|
|
Thanks 15 Joined 26 Aug '10 Email user |
Perhaps the data providers could use some kind of steganographic or watermarking approach to uniquely associate each downloaded version of future datasets with the user who downloaded it - at the cost of introducing a very small amount of noise to the data. In order to try and remove any traces of the identification, many users would have to work together, and this kind of activity would be strictly against the rules. For one person working alone to be *sure* that they had removed it, the data would have to be significantly "dumbed-down", rendering it of less interest to third parties. Just an idea. Probably all sorts of problems with that in practice. Perhaps not least that the data has already been "significantly dumbed-down". |
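To make the watermarking idea above concrete, here is a minimal sketch in Python. It is an illustration only, not anything the organizers are known to use. It assumes the dataset has at least one numeric column that can tolerate a small perturbation, that the provider keeps the original values and a secret key, and that all function names (`user_signs`, `watermark`, `identify`) and the `eps` parameter are hypothetical. Each user's copy gets a tiny, reproducible pattern of +/- noise derived from a keyed hash of their ID; a leaked copy is then attributed to whichever user's pattern best matches the difference from the original.

```python
import hashlib
import hmac
import random


def user_signs(secret: bytes, user_id: str, n: int) -> list:
    """Derive a reproducible +1/-1 sequence for one user from a keyed hash."""
    seed = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).digest()
    rng = random.Random(seed)  # seeded only by the keyed hash of the user ID
    return [1 if rng.random() < 0.5 else -1 for _ in range(n)]


def watermark(values, secret: bytes, user_id: str, eps: float = 0.01):
    """Return a copy of `values` with a small user-specific perturbation added."""
    signs = user_signs(secret, user_id, len(values))
    return [v + eps * s for v, s in zip(values, signs)]


def identify(leaked, original, secret: bytes, candidates):
    """Guess which candidate's pattern best correlates with the leaked noise."""
    diff = [l - o for l, o in zip(leaked, original)]

    def score(user_id: str) -> float:
        signs = user_signs(secret, user_id, len(diff))
        return sum(d * s for d, s in zip(diff, signs))

    return max(candidates, key=score)


if __name__ == "__main__":
    original = [float(i) for i in range(500)]  # stand-in for a numeric column
    leaked = watermark(original, b"demo-secret", "bob")
    print(identify(leaked, original, b"demo-secret", ["alice", "bob", "carol"]))
```

Because the score sums over every row, the pattern survives moderate additional noise, which is why a lone user would have to degrade the data considerably to be sure of erasing it. Several users averaging their copies together would weaken each individual pattern, which matches the collusion concern raised in the post.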
|
Joined 6 May '11 Email user |
Uri Blass wrote: I wonder what happens if somebody who does not participate in the competition gets the data from one of the participants and publishes research based on it. Of course the participant who gave him the data is guilty, but the sponsor is going to have a problem finding him (considering that many people compete, and the guilty person may not even be one of the participants but a hacker who stole the data from one). Can the sponsor do something against a researcher who claims that he never agreed not to use the data for research purposes, and what exactly can the sponsor do about it? Do you know if something similar has happened in the past? I am not a lawyer, and I would like to know what can practically happen in this case based on past experience.
I wasn't advocating this path. What I meant was: where can researchers obtain relevant data similar to the Heritage Prize data, but without the ridiculous licensing restrictions? Perhaps someone with more to gain, like an insurance company, would release a properly anonymized dataset with little to no IP restrictions. |
|
Thanks 58 Joined 13 Oct '10 Email user |
|
|
Joined 6 May '11 Email user |
Jeremy Howard (Kaggle) wrote: The rules and the FAQ explain that you can use the data for research purposes, as long as you first get permission from HPN. So if you're interested in doing research, please just ask!
This is (probably, as I won't agree to the current terms so haven't actually looked at it yet) a good test set for various algorithms. Sure, I could ask, but my main point was the following: the terms and license agreement as they currently stand are ambiguous and leave open the possibility of my research, including the algorithm that I am developing for a different purpose, being transferred to Heritage forcibly in court. This is due to 1) the term "ALGORITHM" not being properly defined in the standard industry sense and 2) contradictory and ambiguous terms about what constitutes the "ALGORITHM" and what exactly the exclusive, non-transferable license that we are granting Heritage is actually for. Is it for the specific parameters and set of software, applied to the data, to obtain the score? This would seem the most reasonable, but the wording allows for any developed software used on the data to be granted to Heritage. |
|
Thanks 25 Joined 8 Apr '11 Email user |
ijvaughn wrote: The terms and license agreement as they currently stand are ambiguous and leave open the possibility of my research, including the algorithm that I am developing for a different purpose, being transferred to Heritage forcibly in court.
If you win, and think your "algorithm" is just too darn special to share .... then don't submit the required documentation within the stated time limits ... that disqualifies you.
Game ON !
|
|
Thanks 58 Joined 13 Oct '10 Email user |
ijvaughn wrote: The terms and license agreement as they currently stand are ambiguous and leave open the possibility of my research, including the algorithm that I am developing for a different purpose, being transferred to Heritage forcibly in court. This is due to 1) the term "ALGORITHM" not being properly defined in the standard industry sense and 2) contradictory and ambiguous terms about what constitutes the "ALGORITHM" and what exactly the exclusive, non-transferable license that we are granting Heritage is actually for.
The license is extremely specific and uses terms that are given an exact definition. The license is to the: "Entry and Prediction Algorithm used to produce the Entry". "Entry" is defined as: "the data submitted in the manner and format specified on the Website via the Website on Entry form". "Prediction Algorithm" is defined as "the algorithm used to produce the data in an Entry taken as a whole (i.e., its particular total configuration) but does not include individual components of the Prediction Algorithm or tools used for analysis or development of the Prediction Algorithm". This is an extremely competitor-friendly definition: only the actual final configuration of parameters, inputs, etc., taken as a whole, is being transferred to HPN. The document explicitly says it's the "particular total configuration" that's being covered, and even goes so far as to remove any possible ambiguity by explicitly stating that the component parts are not covered.
Thanked by
Eric Jackson
|
|
Thanks 2 Joined 5 Apr '11 Email user |
|
|
Thanks 15 Joined 26 Aug '10 Email user |
alexanderr wrote:
If people want to develop algorithms to benefit themselves in future, they should not do this competition. This competition is about helping people get better healthcare, not helping competitors to further their careers. I would try just as hard to win if there was no prize money at all. Apart from anything else, this is an interesting problem to solve and not an easy one.
You've given at least two good reasons for participating. I would observe that Kaggle is about furthering transferable technology as much as it is about solving immediate problems. I doubt that anybody who is motivated solely by their career (rather than a
passion for the technical or people-oriented problems) will make much progress. But if they should, and solutions to those other two problems are achieved incidentally in the process, then I would say that is a good outcome. Wouldn't you? I expect, by and
large, that these causes benefit from having as many participants as possible.
|
|
Thanks 1 Joined 4 Apr '11 Email user |
|
|
Thanks 58 Joined 13 Oct '10 Email user |
Seref, see my earlier answer - surely you are not "giving away" rights to anything in a way that will restrict your academic progress in any way? The total package required to create a submission requires many steps - it's only the complete package of all of them taken as a whole that you're allowing HPN to have. Surely your PhD doesn't require the exact same file import steps, submission file export steps, etc? If it's not the exact same complete package, then there's no impact of the license exclusivity. |
|
Thanks 1 Joined 4 Apr '11 Email user |
Hi Jeremy, I really appreciate your effort to keep this going, thanks. This sounds like really good news. May I simply ask if the flow of events below (which is likely to be the case for many others) is possible? Do you see any issues here?
1) I use an approach that is either directly used in my PhD, or is similar to an approach that I'm using in my PhD or other research, and I submit.
2) I fail to take the top spot (almost guaranteed).
3) I repeat step 1, talk to others, maybe come up with some new ideas and try them too. Meanwhile, I publish a paper which uses the algorithm(s) in my submission(s), and I also use some of them at work (I'm also working as a software developer). I only use the algorithm, no data or anything else from HHP.
4) The contest is over; someone won (not me).
5) I keep publishing and working, using some of the methods/algorithms I've developed during steps 1-4. Nobody has any reason or basis to sue me in the future.
Is this correct? This is probably the case for 90% of people asking for clarifications.
Regards Seref
Thanked by
Information Man
|
|
Thanks 58 Joined 13 Oct '10 Email user |
|
|
Joined 5 May '11 Email user |
|