Interesting submissions with scores?

Allan Engelhardt's image Posts 77
Thanks 29
Joined 28 May '10 Email user

Did anybody do any interesting submissions they want to share?

I submitted $p_i = 0.18584427052136$ for all $i$ giving a public score of  0.486849.

If anybody has submitted all zeros, then we can calculate the mean of $a_i$ for the sample.

Thanked by Domcastro
 
Allan Engelhardt's image Posts 77
Thanks 29
Joined 28 May '10 Email user
If those submissions at 0.522226 on the leaderboard are from constant zero submissions, then I make $mean(\log(a_i+1)) = 0.189941$ for the 30% sample, compared with 0.1863221 and 0.178212 in Y2 and Y3, respectively. Can anybody check my math, please?
 
Valentin Tiriac's image Posts 16
Thanks 4
Joined 6 May '10 Email user

Thanks, you just saved me a submission. You're right about 0.522226 being the score for constant 0. I get the same value for the mean of log(a+1).

EDIT:

I calculate the score for the mean should be 0.486435. Can anyone confirm?

Thanked by trezza , and Domcastro
 
Zaccak Solutions's image Posts 39
Thanks 7
Joined 10 Feb '11 Email user
     

I submitted the constant value 0.18584427052136 and got a score of 0.486849.

 
Tatiana McClintock's image Posts 9
Joined 15 Apr '11 Email user
What am I doing here? You are speaking some odd language to me. Submitting a constant value? What the h is that? And getting the score, hmm, who is scoring?
 
Chris Raimondi's image Rank 38th
Posts 194
Thanks 90
Joined 9 Jul '10 Email user
If you forget to suppress the rownames in R and therefore have "ClaimsTruncated" as your prediction, you get 0.509697.
 
Allan Engelhardt's image Posts 77
Thanks 29
Joined 28 May '10 Email user

Valentin Tiriac wrote:

I calculate the score for the mean should be 0.486435. Can anyone confirm?

Hmm, isn't it

> print(sqrt(0.522226^2 - 0.189941^2))
0.4864591

Or maybe I am just sleepy again but it does agree with two submissions on the leaderboard.
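
For anyone following along, here is a sketch of why that expression gives the score of the best constant submission. It assumes the log-scale RMSE discussed in this thread and writes \\(A_i = \log(a_i + 1)\\) and \\(P = \log(p + 1)\\) for a constant prediction \\(p\\); the notation is only for illustration.

$$
\epsilon(P)^2 = \operatorname{mean}\big((P - A_i)^2\big) = P^2 - 2 P \operatorname{mean}(A_i) + \operatorname{mean}(A_i^2)
$$

This is a parabola in \\(P\\) with its minimum at \\(P = \operatorname{mean}(A_i)\\), where

$$
\epsilon_{\min} = \sqrt{\operatorname{mean}(A_i^2) - \operatorname{mean}(A_i)^2}
$$

With \\(\operatorname{mean}(A_i^2) = 0.522226^2\\) (the constant zero score squared) and \\(\operatorname{mean}(A_i) = 0.189941\\) this is the square root computed above.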

 
Valentin Tiriac's image Posts 16
Thanks 4
Joined 6 May '10 Email user

Allan Engelhardt wrote:

Valentin Tiriac wrote:

I calculate the score for the mean should be 0.486435. Can anyone confirm?

Hmm, isn't it

> print(sqrt(0.522226^2 - 0.189941^2))
0.4864591

Or maybe I am just sleepy again but it does agree with two submissions on the leaderboard.

Yep. The way I calculated it was by fitting a parabola in Excel, so it had rounding errors, but your way is better.

 
Zach's image Rank 31st
Posts 292
Thanks 64
Joined 2 Mar '11 Email user

Tatiana McClintock wrote:

What am I doing here? You are speaking some odd language to me. Submitting a constant value? What the h is that?

Predicting that EVERYONE in the dataset will be hospitalized for about .19 days.

Tatiana McClintock wrote:

And getting the score hmm who is scoring?

Kaggle is: http://www.heritagehealthprize.com/c/hhp/Leaderboard
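
To make the idea of scoring a constant submission concrete, here is a minimal R sketch. It assumes the score is the log-scale RMSE implied by the calculations in this thread; the function name `rmsle` and the toy vector `a` are hypothetical and are not competition data.

```r
# Log-scale RMSE as discussed in this thread (an assumption about the metric):
# p = predicted days in hospital, a = actual days in hospital.
rmsle <- function(p, a) {
  sqrt(mean((log1p(p) - log1p(a))^2))
}

# Hypothetical toy data (NOT competition data): actual days for six members.
a <- c(0, 0, 0, 1, 0, 3)

# A constant submission predicts the same value for every member.
score_zero  <- rmsle(rep(0, length(a)), a)     # all-zeros submission
score_const <- rmsle(rep(0.19, length(a)), a)  # constant 0.19 days for everyone

print(c(zero = score_zero, constant = score_const))
```

The real leaderboard score is computed by Kaggle on actual values that are not released, so a function like this can only be run against Y2 or Y3 data, not against the scored sample.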

 
Eric Jackson's image Posts 21
Thanks 9
Joined 9 Sep '10 Email user

Allan Engelhardt wrote:

If those submissions at 0.522226 on the leaderboard are from constant zero submissions, then I make $mean(\log(a_i+1)) = 0.189941$ for the 30% sample, compared with 0.1863221 and 0.178212 in Y2 and Y3, respectively. Can anybody check my math, please?

Hmm, I get 0.188965 using your figures (0.522226 RMSE for constant zero submission, 0.486849 for constant 0.18584427 submission).

 
Allan Engelhardt's image Posts 77
Thanks 29
Joined 28 May '10 Email user

Eric Jackson wrote:

Allan Engelhardt wrote:

If those submissions at 0.522226 on the leaderboard are from constant zero submissions, then I make \\(mean(\log(a_i+1)) = 0.189941\\) for the 30% sample, compared with 0.1863221 and 0.178212 in Y2 and Y3, respectively. Can anybody check my math, please?

Hmm, I get 0.188965 using your figures (0.522226 RMSE for constant zero submission, 0.486849 for constant 0.18584427 submission).

Too much maths: my head will hurt :-)  Anyhow, my calculations and reasoning are:

Let \\(A_i = \log(a_i + 1)\\).  For a submission of \\(p_i = 0\\) the score is \\(\epsilon = \sqrt{ \mean(A_{i}^2) }\\) and we have the value \\(0.522226\\) for this.

For a constant submission \\(p_i = p\\) and setting \\(P = \log(p + 1)\\) we have \\(\epsilon = \sqrt{P^2 + \mean(A_{i}^2) - 2 P \mean(A_i)}\\).  For \\(p = 0.18584427052136\\) we got a score of \\(\epsilon = 0.486849\\).  Solving for \\(\mean(A_i)\\) gives

$$
\mean(A_i) = \frac{P^2 + \mean(A_i^2) - \epsilon^2}{2 P}
$$

Using \\(\mean(A_i^2) = (0.522226)^2\\) and the \\(\epsilon\\) of \\(0.486849\\) gives \\(\mean(A_i) = 0.189941\\).

In R, I do the calculation as

P E MAI2 (P^2 + MAI2 - E^2)/(2*P)
## [1] 0.1899415

I still can't see my mistake, but that certainly doesn't mean there isn't one!

 
Uri Blass's image Posts 253
Thanks 4
Joined 5 Aug '10 Email user

Eric Jackson,

Your mistake is probably that you used 0.18584427052136 in your formula instead of log(1.18584427052136).

Allan Engelhardt seems to be right: we get mean(log(a+1)) = 0.189941, which corresponds to a = 0.209179, so the best constant submission should be 0.209179 days for everybody (I hope that I have no errors).
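
A short R sketch of the two calculations side by side, using only the figures quoted in this thread. The function name `mean_log_actual` and its `log_transform` switch are hypothetical, introduced here just to show where the slip happens.

```r
# Recover mean(log(a + 1)) from the scores of two constant submissions.
# e0: score of the all-zeros submission; e: score of the constant-p submission.
mean_log_actual <- function(p, e, e0, log_transform = TRUE) {
  # log_transform = FALSE plugs p in directly, which reproduces the slip.
  P <- if (log_transform) log1p(p) else p
  (P^2 + e0^2 - e^2) / (2 * P)
}

p  <- 0.18584427052136  # constant value submitted
e  <- 0.486849          # public score for that constant
e0 <- 0.522226          # public score for constant zero

m_wrong <- mean_log_actual(p, e, e0, log_transform = FALSE)  # about 0.188965
m_right <- mean_log_actual(p, e, e0)                         # about 0.189941

# Best constant on the original (days) scale: invert log(a + 1).
best_constant <- expm1(m_right)                              # about 0.209179

print(c(wrong = m_wrong, right = m_right, best = best_constant))
```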

 

Thanked by Eric Jackson
 
Uri Blass's image Posts 253
Thanks 4
Joined 5 Aug '10 Email user

Allan, are you sure that you got 0.1863221 and 0.178212 in Y2 and Y3?

The first number matches, but for the second I get 0.178228, so I may have an error in reading the Y3 data.

 
Uri Blass's image Posts 253
Thanks 4
Joined 5 Aug '10 Email user
It seems that my error was not in reading the data but in my calculation: I averaged over the first 71345 numbers instead of all 71435 numbers.
 
Allan Engelhardt's image Posts 77
Thanks 29
Joined 28 May '10 Email user

Allan Engelhardt wrote:

[...]In R, I do the calculation as

P E MAI2 (P^2 + MAI2 - E^2)/(2*P)
## [1] 0.1899415

I still can't see my mistake, but that certainly doesn't mean there isn't one!

Hmm, Jeff Moser edited that after I posted so it now makes no sense at all.  Let me try again and see if Jeff can keep his fingers off the edit button this time....:

P <- log1p(0.18584427052136)
E <- 0.486849
MAI2 <- (0.522226)^2
(P^2 + MAI2 - E^2)/(2*P)

 
Jeff Moser's image
Jeff Moser
Kaggle Admin
Posts 356
Thanks 178
Joined 21 Aug '10 Email user
From Kaggle

Allan Engelhardt wrote:

Hmm, Jeff Moser edited that after I posted so it now makes no sense at all.  Let me try again and see if Jeff can keep his fingers off the edit button this time....:

Sorry about that. Your post has shown me that I need to tweak how inline MathJax is rendered; that's what I was experimenting with. Currently only display-mode math works (i.e. math surrounded by double dollar signs on each side). Display mode puts equations on a separate line, which looks odd. I'd like to get inline math to work as well (i.e. with single dollar sign delimiters). The problem is that some programming languages use dollar signs and confuse MathJax.

Just wanted to give an update on what I was doing.

 
Eric Jackson's image Posts 21
Thanks 9
Joined 9 Sep '10 Email user

Uri was absolutely right on my error.  I now get 0.189941 like Allan.

 

More significantly for me, what Uri's correction made me realize was that I had been mistakenly producing predictions in the log domain rather than the real domain.  In other words, I have been submitting files with predictions for log(D+1) rather than predictions for D.  It certainly helped me to fix that problem, although not as much as you might have thought - my leaderboard score improved by only 0.001.

 
Allan Engelhardt's image Posts 77
Thanks 29
Joined 28 May '10 Email user

@Jeff Moser: No problem, thanks.  And don't worry about \mean; I have

#+LATEX_HEADER: \usepackage[fleqn]{amsmath}
#+LATEX_HEADER: \DeclareMathOperator \mean {mean}

in my Org-Mode headers.

 
Jeff Moser's image
Jeff Moser
Kaggle Admin
Posts 356
Thanks 178
Joined 21 Aug '10 Email user
From Kaggle

More details on making beautiful math posts in these forums can be found at http://www.kaggle.com/forums/t/581/tips-for-beautiful-math-posts

Thanked by inf2207
 
Allan Engelhardt's image Posts 77
Thanks 29
Joined 28 May '10 Email user

Eric Jackson wrote:

More significantly for me, what Uri's correction made me realize was that I had been mistakenly producing predictions in the log domain rather than the real domain.  In other words, I have been submitting files with predictions for log(D+1) rather than predictions for D.  It certainly helped me to fix that problem, although not as much as you might have thought - my leaderboard score improved by only 0.001.

For small numbers \\(\log(1+x) \approx x\\) so your score wouldn't change much.  For example, \\(\log(1+0.189941) = 0.1739037\\).
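
A minimal R sketch of the back-transformation, assuming a model that outputs predictions of log(D+1). The vector `log_pred` is made up for illustration and is not from any real model.

```r
# Hypothetical model output on the log scale: predictions of log(D + 1).
log_pred <- c(0.05, 0.1739037, 0.40, 1.50)

# Back-transform to days before submitting: D = exp(log_pred) - 1.
days_pred <- expm1(log_pred)

# The gap between the two scales is tiny near zero and grows with the prediction.
print(data.frame(log_pred, days_pred, gap = days_pred - log_pred))
```

Since most predictions in this competition are small, submitting on the wrong scale shifts most values only slightly, which is consistent with the small change in score.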

 
Andreas's image Posts 8
Thanks 5
Joined 20 Mar '11 Email user
So far I have only worked with the Y2 data. With that and a constant zero prediction I get close to 0.522 as the error, quite a surprise when watching the leaderboard! I sampled prediction vectors directly from the Y2 values and got a mean of 0.69. Then I plotted RMSE versus mean squared error of simulated predictions and found a correlation (duh). Though the math isn't completely clear to me, I think optimizing the model for MSE rather than RMSE will yield suboptimal results... I also simulated predictions that are exactly right with some percentage (0..100%) and sampled from a uniform (0..15) distribution otherwise. Under this model, you need to get 95% of all predictions exactly right to be better than the current leaders.
 
Allan Engelhardt's image Posts 77
Thanks 29
Joined 28 May '10 Email user

A simple linear model on Sex and AgeAtFirstClaim gives a public score of 0.478118

http://www.heritagehealthprize.com/c/hhp/forums/t/648/cases-missing-sex-and-age-code-in-release-3/4217#post4217



Thanked by Chris Raimondi
 
