The papers written by the milestone winners are now available as attachments to this forum post. As described in section 13 of the rules, if you have any concerns about these papers, you have 30 days from their posting to provide your feedback.
2 Attachments —|
votes
|
Both the docx and pdf files for Edward / Willem's paper appear to be missing formulas and symbols, at least on all the machine / software combinations I tried. |
|
votes
|
My browser also has a display problem with the Edward+Willem paper, but it works fine for me if I save it (edit: the docx version) to disk and open it with Word. |
|
votes
|
Thanks Signipinnis. I have downloaded and can open the file - there is text but the formulas appear to be missing. For example, at the bottom of page 5 I have the sentence, "Now we have obtained the following variables: " and then ", , , , and", which suggests that something is missing. If someone could confirm that they see more complete text, then the problem is at my end with the rendering. I've checked the docx xml code and there is only whitespace between the commas indicated above. The formulas do not appear in the pdf version either, again on multiple systems. |
|
votes
|
Scott Thompson wrote: I have downloaded and can open the file - there is text but the formulas appear to be missing. For example, at the bottom of page 5 I have the sentence, "Now we have obtained the following variables: " and then ", , , , and", which suggests that something is missing.
Looks like the pdf version is corrupted, but the docx is OK. Try the pdf in the attachment (converted from the docx). 1 Attachment — |
|
votes
|
Congratulations to the Willem & Edward team on your win. Our team has a question regarding Table 1 (Claims distribution of paydelay vs DSFS) in your paper. The paper says this about what is counted in the table: "Important is that only claims are used that belong to members of Year3, that have a maximum DSFS value of "11-12" (month), because we then know to which real month all the claims belong." When we select the number of claims with DSFS=12 and Year=3, we get exactly the same counts as in the table: 16044 claims with paydelay=0, 152 claims with paydelay=1...10, etc. However, for any other DSFS, we get counts much larger than in the table. Could you please let us know what we are missing here? |
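One possible reading of the quoted sentence is that the filter applies per member rather than per claim: a member is kept only if their largest Year3 DSFS is "11-12", and then all of that member's Year3 claims are counted. The sketch below illustrates that reading on made-up records; it has not been confirmed by the paper's authors, and the record layout, the function name `paydelay_by_dsfs`, and the encoding of DSFS as its upper month are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical claim records: (member_id, year, dsfs_month, paydelay).
# dsfs_month is the upper month of the DSFS bucket, e.g. 12 for "11-12".
claims = [
    ("A", 3, 1, 30), ("A", 3, 12, 0),
    ("B", 3, 1, 45), ("B", 3, 6, 20),
    ("C", 3, 12, 0), ("C", 2, 1, 10),
]

def paydelay_by_dsfs(claims, year=3, full_year_dsfs=12):
    """Count claims per (dsfs, paydelay), keeping only members whose
    maximum DSFS in the given year equals full_year_dsfs."""
    max_dsfs = defaultdict(int)
    for member, yr, dsfs, _ in claims:
        if yr == year:
            max_dsfs[member] = max(max_dsfs[member], dsfs)
    keep = {m for m, d in max_dsfs.items() if d == full_year_dsfs}

    table = Counter()
    for member, yr, dsfs, paydelay in claims:
        if yr == year and member in keep:
            table[(dsfs, paydelay)] += 1
    return table

table = paydelay_by_dsfs(claims)

# Per-claim selection (Year=3 and DSFS=1) versus the per-member filter.
naive_dsfs1 = sum(1 for _, yr, dsfs, _ in claims if yr == 3 and dsfs == 1)
restricted_dsfs1 = sum(n for (dsfs, _), n in table.items() if dsfs == 1)
print(naive_dsfs1, restricted_dsfs1)
```

Under this reading the two selections agree for DSFS=12, since any member with a DSFS=12 claim necessarily has a maximum of 12, while for smaller DSFS values the per-claim selection also counts members such as "B" whose claims stop earlier in the year.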
|
votes
|
Hi Team Crescendo. Congratulations on your milestone achievement. |
|
votes
|
Congrats to the Willem & Edward team on winning the milestone! The modelling on DSFS looks interesting. I agree with Oleg Vasilyev that the numbers do not match the paper. The paper also lacks detail about the simulations used to obtain the constants or interpolated values that are used to evaluate the offset values. Could you please provide more detail on this? Thanks |
|
votes
|
Congratulations to the milestone winners! Willem & Edward, very impressive improvement yet again. Crescendo, impressive improvement on the private data compared to the public leaderboard! The Crescendo report is very well written, thanks for making the effort. I do have a few remaining questions though:
Thanks in advance! |
|
votes
|
A question for team Crescendo: how did you deal with missing values? For example, in feature set m1, what did you do for the numeric Age value if the member's age is missing? |
|
votes
|
Hi Tim,
We used our own implementation.
Our code can handle sparse data efficiently.
Thanks for the suggestions, but we chose not to provide those numbers, which are not needed for replicating the winning entry. In the course of replicating the results based on the current documentation, one would obtain those numbers as side products. It would be just too tedious to list or even read so many numbers. I hope you understand. Rie |
|
votes
|
B Yang> how did you deal with missing values? For example, in feature set m1, what did you do for numeric Age value if the member's age is missing?
Please see the third bullet under "Notation" in A.2 "Features derived from Claim data" for how missing values in Claim data were treated. As for "numeric Age", the documentation refers to [4] for conversion of categorical values to numerical values; however, I just realized that we assigned -1 for missing age, whereas [4] assigned 80. |
|
votes
|
infty wrote:
Please see the third bullet under "Notation" in A.2 "Features derived from Claim data" for how missing values in Claim data were treated. As for "numeric Age", the documentation refers to [4] for conversion of categorical values to numerical values; however, I just realized that we assigned -1 for missing age, whereas [4] assigned 80.
Thanks for the quick reply. Do your algorithms handle missing values or just use them as is? For the numeric Age case, do they treat -1 as 'age missing' and do something special, or do they treat it the same as other valid age values? How about values like avgs and SDs of various fields, for example LengthOfStay? For members who have at least one LengthOfStay value, do you just calculate min/max/avg/whatever while ignoring the missing values? What about members who have all LengthOfStay values missing? |
|
vote
|
Oleg wrote:
|
|
votes
|
B Yang> Do your algorithms handle missing values or just use them as is? For the numeric Age case, do they treat -1 as 'age missing' and do something special, or do they treat it same as other valid age values? How about values like avgs and SDs of various fields, for example LengthOfStay. For members who have at least one LengthOfStay value, do you just calculate min/max/avg/whatever while ignoring the missing values? What about members who have all LengthOfStay values missing?
In our system, the algorithms (RGF, GBDT, random forests, etc.) do not handle missing values. You can't add or compare missing values, so yes, missing values were ignored in (i.e., excluded from) the computation of min/max/avg etc. For the members with all the values missing, min/max/avg etc. were set to zero except for "range" which was set to -1. I hope that this answers all of your questions. |
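For readers trying to reproduce this, the rule described above might be sketched as follows. This is only an illustration of the stated behaviour, not the team's code: the function name `summarize`, the use of `None` for a missing value, and the definition of "range" as max minus min are assumptions.

```python
def summarize(values):
    """Aggregate one member's values for a field such as LengthOfStay.

    Missing entries (None, an assumed encoding) are excluded. If every
    entry is missing, min/max/avg are set to 0 and "range" to -1, as
    described in the post above.
    """
    present = [v for v in values if v is not None]
    if not present:
        return {"min": 0, "max": 0, "avg": 0, "range": -1}
    lo, hi = min(present), max(present)
    return {
        "min": lo,
        "max": hi,
        "avg": sum(present) / len(present),
        "range": hi - lo,  # assumed definition: max minus min
    }

some_missing = summarize([2, None, 4])
all_missing = summarize([None, None])
print(some_missing)
print(all_missing)
```

Setting "range" to -1 when everything is missing gives the learner a value that cannot occur for a member with at least one observation, assuming range is max minus min, so the "all missing" case stays distinguishable from a genuine range of 0.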
|
votes
|
Hi Rie,
|