thonda
Posts 2
Joined 4 Oct '11

I have a quick question for Willem Mestrom.

You are using Provider ID, PCP, and Vendor ID for the MC2 dataset. But some providers, PCPs, and vendors appear only in the prediction dataset. How do you create the fi, gi for these categories? Are you just using the average fi, gi values, or do you have a better way to estimate these parameters?

Thanks.

 
Willem Mestrom
Rank 4th
Posts 24
Thanks 9
Joined 28 Feb '11

Hi thonda,

That is a good question. I didn't think of it, so I'm not doing anything smart with it. The fi and gi are initialized with random values (uniform between -0.01 and +0.01). If a category has no data in the learning set, its parameters are never updated and still hold the original random values when the predictions are made. It would probably be better to set them to the overall mean, or perhaps to the mean of just the categories with few observations if that is significantly different.
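A minimal sketch of the mean-fallback idea, not the code used in the milestone solution. It assumes each category has a single scalar parameter (the real fi, gi may be vectors), and the names `init_factors` and `fill_unseen` are hypothetical helpers made up for this illustration.

```python
import random


def init_factors(ids, low=-0.01, high=0.01, seed=0):
    """Give every category id a small random starting value (uniform in [low, high])."""
    rng = random.Random(seed)
    return {i: rng.uniform(low, high) for i in ids}


def fill_unseen(factors, seen_ids):
    """Replace parameters of ids never seen in training with the mean of the seen ones.

    Assumption: `seen_ids` is the set of ids that occurred in the learning set,
    so only those parameters were updated by training.
    """
    seen_values = [v for i, v in factors.items() if i in seen_ids]
    if not seen_values:
        # Nothing was trained, so there is no mean to fall back on.
        return dict(factors)
    mean = sum(seen_values) / len(seen_values)
    return {i: (v if i in seen_ids else mean) for i, v in factors.items()}


# Hypothetical example: providers A and B occur in the learning set, C does not.
factors = init_factors(["A", "B", "C"])
factors["A"], factors["B"] = 0.2, 0.4  # stand-in for values learned by training
filled = fill_unseen(factors, seen_ids={"A", "B"})
```

Here `filled["C"]` ends up at the average of the two trained values instead of its random start. To use the mean of rarely observed categories instead, the same function could be called with `seen_ids` restricted to ids that have only a few observations.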

@John: Browsing through the topic I noticed I missed your final question. I don't know any rule of thumb to find a good value for the alpha parameter. Trial and error is not going to work: you will be using the alpha parameter to prevent overfitting the leaderboard and improve the private score, so you don't get any feedback. An alpha of zero is probably going to give the best leaderboard score. I tried to find a good value by taking a similar set of predictions for Y1 and simulating the leaderboard scoring and blending procedure.

Willem

 
Jeremy Howard (Kaggle)
Posts 166
Thanks 58
Joined 13 Oct '10
From Kaggle

Quick update: I've now received the final judge's comments. Hopefully I'll have it all compiled by tomorrow, Monday at the latest.

 
tim245
Posts 2
Joined 21 May '12

I have a question for Willem Mestrom.
In your 1st milestone solution you are using stochastic gradient descent. You gave a detailed example of the update in model CatVec1. My question is about that example.
What is the 'e' between nf and (\sum gi) in the update for \hat(f)i? I expected only nf and the gradient (which is the sum), so what is 'e'?
Thank you, and congratulations on your results!

 
yu123
Posts 2
Joined 6 Nov '12

Hello everyone,
I am a college student thinking of choosing this topic as a data mining project for my class. I only found out about this competition now and signed up to access the forums, but I could not view the winners' papers because I have not accepted the rules, which I cannot do because "This competition is CLOSED TO NEW ENTRANTS". Could anybody share the winners' papers, if that is allowed and legal? Thanks in advance.

 
DavidChudzicki
Kaggle Admin
Posts 424
Thanks 106
Joined 21 Nov '10
From Kaggle

Anyone can access the papers here: https://www.kaggle.com/wiki/HeritageMilestonePapers

I'll work on getting the links fixed.

 