Rickr
Posts 2
Joined 4 Apr '11

Several coworkers and I would like to form a team. We plan to use proprietary software from our company to learn a model that makes the necessary predictions. The software is general-purpose (domain-independent) learning software: it learns a predictive model (algorithm), which is then used to assess each exemplar and predict a number of inpatient days. Our question concerns the licensing agreement. If we submit entries, would we be required to give you all of the learning software, or just the predictive model that makes predictions in this specific domain? We can't give away our proprietary software, so this would be a showstopper for us...

 
Jeremy Howard (Kaggle)
Posts 166
Thanks 58
Joined 13 Oct '10
From Kaggle
For HPN to be able to update the model as more years of data become available in the future, would they need your proprietary system to re-train it?
 
Justin Washtell
Posts 48
Thanks 15
Joined 26 Aug '10

I am in a similar position.

Isn't the purpose of the extensive evaluation to ensure that the *resultant model* is robust and will apply well to future data? Or will HPN want to adapt the winning model to contend with very long-term health trends and changing data availability, on the assumption that the best underlying model on the *current* data is the best starting point for that? The winning solution in the first case will not necessarily be the best solution in the second, and vice versa, but I suppose they would like to strike a balance.

That being the case, would providing HPN with a perpetual licence to use any proprietary software (and source code and full technical documentation as necessary) in order to re-learn the winning model (and any variants thereof) be acceptable? (E.g. with the proviso that no derivative of the software itself, nor the detail of its workings, is made publicly available or resold.)

 
Rickr
Posts 2
Joined 4 Apr '11

jphoward wrote:
For HPN to be able to update the model as more years of data become available in the future, would they need your proprietary system to re-train it?

Yes, "retraining" the model would require the proprietary learning software.

 
Tony F.
Posts 5
Joined 1 Apr '11

I have a question along the same lines. I will be using prewritten algorithms that are part of the SPSS Modeler software. I have taken some graduate-level data mining courses, but I am by no means an expert in comparison to many of you posting messages on this board.

I think I read a posting earlier today that stated the purpose of the competition is to write your own algorithm and that's why the contest runs for two years. I guess I have a slightly different interpretation of the purpose of the contest and I would like to get some clarity. 

It seems to me that if you can show that whatever tool you use is the most predictive, and others are able to replicate those results to help identify at-risk patients, then you should win the contest. If someone were to use a commercially available data mining product like Clementine and win the competition, would they be eligible for the monetary award?

I'm assuming that the algorithms that are part of the Clementine (SPSS Modeler) package are proprietary so they can't be modified. But isn't the contest to see who can come up with the most accurate predictions on the test set of data for the number of days a patient is likely to spend in the hospital over the next 12 months? As I mentioned I'm new to these types of contests and I don't know as much as you all do, so please help me understand what I'm missing.

If you actually have to build your own algorithm I'll still enter the contest but at least I'll know that I'm not eligible to win the 3 million dollars that I was sure I was going to win :-) 

Thanks everyone. I look forward to learning a lot from the user community as this contest continues to evolve.

 
abbas shojaee
Posts 9
Thanks 1
Joined 4 Apr '11

Commercial tools are the fruit of good, established academic work, but they are not the most recent or advanced. The contest exists because of the significant gap between the desired level of accuracy and what even the best existing commercial or laboratory tools can achieve. So you are unlikely to achieve the best results without innovating in how you use those tools.

 
Tony F.
Posts 5
Joined 1 Apr '11

Thanks for the reply ashojaee.

So these types of contests are always won by people or groups that develop their own algorithms? I guess I didn't realize that there is such a gap in accuracy between pre-packaged algorithms and homemade versions.

Can someone please address my question about whether a submission using commercially available tools (like Clementine) would be eligible for a prize if, by some strange occurrence, the most predictive model is built using a pre-fab algorithm?

 
mathlawguy
Posts 1
Joined 7 Apr '11

My team is concerned and confused on the IP issue as well.  I am hoping someone with authority to speak for Heritage can answer this question.

1. Suppose that each member of our team has a license to use software owned by X Corp. It may or may not be the case that the owners of X Corp. include members of the team. It is clear, however, that working on the prize does not violate the terms of the team members' license with X Corp.

2. We submit an entry. It is not clear to me at this point whether the entry is a list of predictions derived from a formula/algorithm the team developed or whether it is a formula/algorithm of some sort that permits Heritage to make predictions.  (Some clarity on this point would help). The entry is not, however, the underlying software that was used to develop the algorithm.

3.  Paragraph 21 of the agreement says that each Entrant -- even those that don't win -- grants Sponsor a saleable free and exclusive right to sell "the entry and the algorithm used to produce the entry ..."

4. It makes complete sense to us that Sponsor gets the (exclusive) right to use our predictions.  It makes complete sense to us that Sponsor gets the (exclusive) right to use our algorithm.  This is particularly true if it needs to validate the algorithm on other data. What doesn't make sense, however, is that Sponsor would get the (exclusive) right to use and sell the X Corp. product or algorithm that was used, in some sense, to produce the specific algorithm and the entry.  We can't believe Sponsor would be claiming that it has this right but the language of the contract appears broad enough to encompass this possibility. 

5.  To put it more abstractly, X Corp has an algorithm (we can call it the Background Algorithm) for producing algorithms (Specific Algorithms).  We are happy to cough up the Specific Algorithm produced using the X Corp. Background Algorithm.  We are not happy to cough up the Background Algorithm used to produce that Specific Algorithm.  This is true in part because we don't actually own the Background Algorithm, though we have a license to use it.  Neither we nor, we assume, the Sponsor wants trouble down the road.

6.  Are we correct in believing that Sponsor is not seeking and will not seek rights in the Background Algorithm but only in the Specific Algorithm used to produce predictions about hospitalization?  We, as I suspect most teams, would be willing to use the Background Algorithm at reasonable time intervals to develop new Specific Algorithms as more data becomes available, but the team can't give away exclusive rights in the Background Algorithm, even for a shot at $3 million.
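
To make the Background/Specific distinction in points 5 and 6 concrete, here is a minimal Python sketch. Everything in it is hypothetical: `background_learner` is a toy least-squares fitter standing in for the X Corp. product (not the real software), and the claim counts and inpatient days are made-up numbers, not contest data. It only illustrates that the learned model can be exported and used for prediction without the learner, while re-training on new data still needs the learner.

```python
import json


def background_learner(xs, ys):
    """Hypothetical 'Background Algorithm': learns a model from data.

    This toy version fits y = intercept + slope * x by ordinary least squares.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The 'Specific Algorithm' is just these learned parameters.
    return {"intercept": intercept, "slope": slope}


def predict(model, x):
    """Applies a Specific Algorithm; needs the parameters, not the learner."""
    return model["intercept"] + model["slope"] * x


# Made-up illustrative data: prior-year claim counts vs. inpatient days.
claims = [0, 1, 2, 3]
days = [0.0, 0.5, 1.0, 1.5]

model = background_learner(claims, days)  # requires the learner
exported = json.dumps(model)              # what a team could hand over
restored = json.loads(exported)           # what the recipient loads
prediction = predict(restored, 4)         # no learner needed here
print(prediction)
```

In this sketch the recipient can run `predict` from the exported parameters alone, but producing an updated model from a new year of data would mean calling `background_learner` again.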

 

Thanks

 
FineLineSysDes
Posts 27
Thanks 6
Joined 4 Apr '11

mathlawguy wrote:

My team is concerned and confused on the IP issue as well.  I am hoping someone with authority to speak for Heritage can answer this question.

1. Suppose that each member of our team has a license to use software owned by X Corp. It may or may not be the case that the owners of X Corp. include members of the team. It is clear, however, that working on the prize does not violate the terms of the team members' license with X Corp.

The rules state that you cannot use outside data that isn't publicly available.  I assume that also means you can't use outside libraries that aren't publicly available.  Since you needed to license the libraries, you can't use them (because HPN can't enforce its licensing rules on the third party in this case).

 
