infty's image Rank 4th
Posts 7
Thanks 2
Joined 27 Mar '12 Email user

B Yang> How did you deal with missing values? For example, in feature set m1, what did you do for the numeric Age value if the member's age is missing?

Please see the third bullet under "Notation" in A.2 "Features derived from Claim data" for how missing values in Claim data were treated. As for "numeric Age", the documentation refers to [4] for conversion of categorical values to numerical values; however, I just realized that we assigned -1 for missing age, whereas [4] assigned 80.

 
B Yang's image Rank 2nd
Posts 195
Thanks 46
Joined 12 Nov '10 Email user

infty wrote:
Please see the third bullet under "Notation" in A.2 "Features derived from Claim data" for how missing values in Claim data were treated. As for "numeric Age", the documentation refers to [4] for conversion of categorical values to numerical values; however, I just realized that we assigned -1 for missing age, whereas [4] assigned 80.

Thanks for the quick reply. Do your algorithms handle missing values, or just use them as is? For the numeric Age case, do they treat -1 as 'age missing' and do something special, or do they treat it the same as other valid age values?

What about values like the averages and SDs of various fields, for example LengthOfStay? For members who have at least one LengthOfStay value, do you just calculate min/max/avg/whatever while ignoring the missing values? What about members who have all LengthOfStay values missing?

 
Edward's image Rank 4th
Posts 5
Thanks 7
Joined 16 Feb '11 Email user
Oleg wrote:
"Could you please let us know what do we miss here?"

Good observation about table 1.
Table 1 was generated in order to clarify the paydelay distribution.
The routine used to generate that table contained an error which resulted in incorrect counts.
The correct table is presented below.
Sorry for the confusion, and thanks for finding and reporting this fault.

DSFS:         0-1    1-2    2-3    3-4    4-5    5-6    6-7    7-8    8-9   9-10  10-11  11-12
month:        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
paydelay:
0....0        898    518    525    561    532    708    769    932   1666   2599   6907  16044
1...10        688    484    564    527    371    145     66    103     87    139    250    152
11...20      4978   3457   2810   2510   1569   1267   1196   1371   3991   4531   3692   1258
21...30      8391   4729   5255   5426   5668   5709   5597   6280   5626   4401   4294    327
31...40      3984   2433   1995   2009   1672   2282   2129   2336   1987   2041   1515      0
41...50      3016   1618   1314   1254   1041   1617   1170   1395   1173   1443    615      0
51...60      2186    770    864    602    692   1160    978    822    689    904     64      0
61...70       891    657    411    347    509    602    983    560    423    622      0      0
71...80       454    533    330    321    661    449    528    350    564    290      0      0
81...90       328    328    356    391    329    277    219    174    481     35      0      0
91..100       408    240    218    352    211    219    143    184    164      0      0      0
101..110      478    166    194    264    117    121     76    126     29      0      0      0
111..120      219    105    138    191    127    110     59     44      9      0      0      0
121..130       97     42    106    107     43     70     28     38      0      0      0      0
131..140       66     50     61     43     24     51     74     61      0      0      0      0
141..150       63     53     89     24     25     35     25      0      0      0      0      0
151..161       41     42     39     46     27     24     25      0      0      0      0      0
162..162      517    263    204    225    121     76      9      0      0      0      0      0


In the milestone 3 report, the text that refers to month(6-7) should now read month(7-8).
We will update this in the milestone 3 report.

Edward.
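
For readers who want to rebuild the counts themselves, the row labels of the corrected table imply the binning below. This is only a sketch of that binning, not the routine Edward used: the function name `paydelay_row` is hypothetical, and treating values outside 0..162 as errors is an assumption.

```cpp
#include <stdexcept>

// Map a PayDelay value (in days) to the row index, 0..17, of the table above.
// Rows: 0, 1-10, 11-20, ..., 141-150, 151-161, 162.
int paydelay_row(int paydelay) {
    if (paydelay < 0 || paydelay > 162) {
        // Assumption of this sketch: anything outside 0..162 is invalid input.
        throw std::out_of_range("paydelay outside 0..162");
    }
    if (paydelay == 0) return 0;      // row "0....0"
    if (paydelay == 162) return 17;   // row "162..162"
    if (paydelay >= 151) return 16;   // row "151..161" is 11 days wide
    return (paydelay - 1) / 10 + 1;   // 1..10 -> 1, 11..20 -> 2, ..., 141..150 -> 15
}
```

Counting claims per (row, DSFS column) pair with this mapping gives a table of the same shape as the one posted.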
Thanked by Oleg Vasilyev
 
infty's image Rank 4th
Posts 7
Thanks 2
Joined 27 Mar '12 Email user

B Yang> Do your algorithms handle missing values, or just use them as is? For the numeric Age case, do they treat -1 as 'age missing' and do something special, or do they treat it the same as other valid age values? What about values like the averages and SDs of various fields, for example LengthOfStay? For members who have at least one LengthOfStay value, do you just calculate min/max/avg/whatever while ignoring the missing values? What about members who have all LengthOfStay values missing?

In our system, the algorithms (RGF, GBDT, random forests, etc.) do not handle missing values. Missing values cannot be added or compared, so yes, they were ignored in (i.e., excluded from) the computation of min/max/avg etc. For members with all values missing, min/max/avg etc. were set to zero, except for "range", which was set to -1. I hope that this answers all of your questions.
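
The rule described above can be sketched as follows. This is an illustration under stated assumptions, not the team's code: the names `FieldSummary` and `summarize` are hypothetical, missing values are encoded as NaN for the purpose of the sketch, and "range" is assumed to mean max minus min.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Per-member summary of one claim field, e.g. LengthOfStay.
struct FieldSummary {
    double min;
    double max;
    double avg;
    double range;  // assumed here to be max - min
};

// Missing observations are represented as NaN (an assumption of this sketch).
FieldSummary summarize(const std::vector<double>& values) {
    double lo = 0.0, hi = 0.0, sum = 0.0;
    std::size_t count = 0;
    for (double v : values) {
        if (std::isnan(v)) continue;  // exclude missing values from every statistic
        if (count == 0) {
            lo = hi = v;
        } else {
            if (v < lo) lo = v;
            if (v > hi) hi = v;
        }
        sum += v;
        ++count;
    }
    if (count == 0) {
        // All values missing: everything is zero except range, which is -1.
        return FieldSummary{0.0, 0.0, 0.0, -1.0};
    }
    return FieldSummary{lo, hi, sum / static_cast<double>(count), hi - lo};
}
```

With this rule a member whose values are all missing gets range -1, which no member with at least one observed value can have, since an observed range is never negative.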

 
DanB's image Rank 2nd
Posts 58
Thanks 46
Joined 6 Apr '11 Email user

Hi Rie,
I'm trying to use the RGF code you linked to in your paper. When I try to compile the code by running the makefile on OS X, I get the following error:


Dans-Laptop:rgf1.2 dan2$ make
/bin/rm -f bin/rgf
g++ src/tet/driv_rgf.cpp src/com/AzDmat.cpp src/tet/AzFindSplit.cpp src/com/AzIntPool.cpp src/com/AzLoss.cpp src/tet/AzOptOnTree_TreeReg.cpp src/tet/AzOptOnTree.cpp src/com/AzParam.cpp src/tet/AzReg_Tsrbase.cpp src/tet/AzReg_TsrOpt.cpp src/tet/AzReg_TsrSib.cpp src/tet/AzRgf_FindSplit_Dflt.cpp src/tet/AzRgf_FindSplit_TreeReg.cpp src/tet/AzRgf_Optimizer_Dflt.cpp src/tet/AzRgforest.cpp src/tet/AzRgfTree.cpp src/com/AzSmat.cpp src/tet/AzSortedFeat.cpp src/com/AzStrPool.cpp src/com/AzSvDataS.cpp src/com/AzTaskTools.cpp src/tet/AzTETmain.cpp src/tet/AzTETproc.cpp src/com/AzTools.cpp src/tet/AzTree.cpp src/tet/AzTreeEnsemble.cpp src/tet/AzTrTree.cpp src/tet/AzTrTreeFeat.cpp src/com/AzUtil.cpp -Isrc/com -Isrc/tet_tools -O2 -o bin/rgf
src/tet/AzTree.cpp: In member function ‘void AzTree::finfo(AzIFarr*, AzIFarr*) const’:
src/tet/AzTree.cpp:232: error: call of overloaded ‘abs(double&)’ is ambiguous
/usr/include/stdlib.h:146: note: candidates are: int abs(int)
/usr/include/c++/4.2.1/cstdlib:174: note: long long int __gnu_cxx::abs(long long int)
/usr/include/c++/4.2.1/cstdlib:143: note: long int std::abs(long int)
make: *** [all] Error 1


Hopefully it's just a matter of changing the type declarations, but I'm years out of practice in C++.
Thanks for any help,
Dan

 
infty's image Rank 4th
Posts 7
Thanks 2
Joined 27 Mar '12 Email user

Hi DanB,

Interesting. That "abs" is a typo for "fabs", but the two compilers I tested happened to have "double abs(double)", so I missed it. Thanks much for catching this.

Could you please change "abs(w)" in Line 232 of rgf1.2/src/tet/AzTree.cpp to "fabs(w)"?
If you run into further problems, please email the "Contact" address in the readme.

Rie
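
For context, the error log above lists only integer candidates for abs (int, long int, long long int), so a call with a double argument is ambiguous on that toolchain. The sketch below shows the portable form of the fix. The variable name `w` comes from the post; the wrapper function `weight_magnitude` is hypothetical and only there to make the snippet self-contained.

```cpp
#include <cmath>

// Absolute value of a double weight, written so that it does not depend on
// whether a double overload of abs() happens to be visible to the compiler.
double weight_magnitude(double w) {
    return std::fabs(w);  // fabs from <cmath> takes and returns a double here
}
```

Including <cmath> and calling std::abs(w) should also work, since that header declares floating-point overloads of std::abs, but fabs is the smaller change to the line in question.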

Thanked by DanB
 
David J. Slate's image Rank 13th
Posts 65
Thanks 25
Joined 5 Aug '10 Email user

Congratulations to the Milestone 3 winners.
Question for crescendo:

Hi Rie,

In section 4.1 of crescendo_milestone3.pdf (Blending to produce the winning submission), you discuss the approximate ridge regression solution using Leaderboard scores, and note that:

||p(j) - y||^2 can be approximated by n*sj where sj is the Leaderboard score of the j-th run pj.

If the leaderboard scores are root mean squared errors, shouldn't that be n*sj*sj rather than n*sj, or am I confused?

Thanks,

-- Dave Slate

 
infty's image Rank 4th
Posts 7
Thanks 2
Joined 27 Mar '12 Email user

Hi David,

David wrote> shouldn't that be n*sj*sj rather than n*sj, or am I confused?

Right. That's a typo. That s_j should have a superscript "2".

Rie
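
The corrected relation follows directly from the definition of root mean squared error. As a short worked step, assume s_j is the RMSE of run p_j over the same n examples, on the scale the Leaderboard uses:

```latex
s_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(p_{j,i} - y_i\bigr)^2}
    = \frac{\lVert p_j - y \rVert}{\sqrt{n}}
\quad\Longrightarrow\quad
\lVert p_j - y \rVert^2 = n\, s_j^2 .
```

So the quantity in section 4.1 should read n s_j^2, as Dave suggested.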

 