plotting the leaderboard in R

Sali Mali's image Rank 4th
Posts 292
Thanks 113
Joined 22 Jun '10 Email user

I cobbled together some R code that will plot the live leaderboard and show you where you are. If you can enhance this or make it more efficient then please let us know.

 

http://anotherdataminingblog.blogspot.com/2011/06/scraping-up-leaderboard.html

 

 
Jeff Moser's image
Jeff Moser
Kaggle Admin
Posts 356
Thanks 178
Joined 21 Aug '10 Email user
From Kaggle

Nice!

You might also want to look at the ZIP file containing the raw data for the leaderboard that's at the bottom of the leaderboard page. It contains a CSV file that tracks every team's improvement since the start of the competition. Effectively, it gives you the same data as if you had continually scraped the leaderboard page to track changes.

Thanked by Sarkis
 
Ford Prefect's image Posts 23
Thanks 9
Joined 2 Dec '10 Email user

@Jeff If you're feeling up to it, it would be neat to have an interactive plot with the top teams on one side, so that if you mouse over a team name, then the history of their submissions gets highlighted on the graph. That way we can see who's hovering and who's on the bleeding edge over time.

 
Jeff Moser's image
Jeff Moser
Kaggle Admin
Posts 356
Thanks 178
Joined 21 Aug '10 Email user
From Kaggle

Ford Prefect wrote:

@Jeff If you're feeling up to it, it would be neat to have an interactive plot with the top teams on one side, so that if you mouse over a team name, then the history of their submissions gets highlighted on the graph. That way we can see who's hovering and who's on the bleeding edge over time.

I tried to get a simple start on this with the arrows in the leaderboard showing trends. One hidden leaderboard feature is that you can set the "delta" period that the trend arrows cover, using "h" for hours, "d" for days, "w" for weeks, and "m" for months. For example:

http://www.heritagehealthprize.com/c/hhp/Leaderboard?delta=2w

will show trend indicators comparing each team's position 2 weeks ago with its position now, and

http://www.heritagehealthprize.com/c/hhp/Leaderboard?delta=12h

will show the past half day (as would http://www.heritagehealthprize.com/c/hhp/Leaderboard?delta=0.5d).
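
A minimal R sketch of how these delta strings could be interpreted on the client side, for example when scraping several periods. The unit letters and base URL come from the post; the function names and the choice of 30 days per month are assumptions, since the thread does not say how the server treats "m".

```r
# Convert a leaderboard "delta" string such as "2w", "12h" or "0.5d" to hours.
# Assumption: a month counts as 30 days (not stated in the thread).
delta_to_hours <- function(delta) {
    m <- regmatches(delta, regexec("^([0-9]*\\.?[0-9]+)([hdwm])$", delta))[[1]]
    if (length(m) != 3) stop("unrecognised delta: ", delta)
    hours.per.unit <- c(h = 1, d = 24, w = 24 * 7, m = 24 * 30)
    as.numeric(m[2]) * hours.per.unit[[m[3]]]
}

# Build the leaderboard URL for a given delta string.
leaderboard_url <- function(delta) {
    paste0("http://www.heritagehealthprize.com/c/hhp/Leaderboard?delta=", delta)
}

delta_to_hours("12h")   # same period as "0.5d"
leaderboard_url("2w")
```

Under these assumptions "12h" and "0.5d" map to the same number of hours, which matches the equivalence described above.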

I tried to write a basic algorithm that slowly grows the delta period towards one week as a competition progresses. I'd be up for suggestions on tweaking this algorithm (is 1 week too much? too little?)
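
One possible shape for such a rule, sketched in R purely for discussion. The actual algorithm is not shown in this thread; the function name, the linear growth, and the default fraction of 0.25 are all hypothetical. Only the one-week cap comes from the post.

```r
# Hypothetical rule: the default delta is a fixed fraction of the
# competition's age in days, capped at one week.
growing_delta_days <- function(age.days, fraction = 0.25, cap.days = 7) {
    pmin(cap.days, pmax(0, age.days) * fraction)
}

# Default delta (in days) after 1, 4, 14 and 60 days of competition.
growing_delta_days(c(1, 4, 14, 60))
```

With these made-up parameters the period reaches the one-week cap after 28 days; a smaller fraction would make it grow more slowly.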

I like the idea of more interactive graphs and would be willing to write the backend code if someone knows of a great graphing (JavaScript) tool that's relatively easy to integrate.

 
Ford Prefect's image Posts 23
Thanks 9
Joined 2 Dec '10 Email user

Jeff Moser wrote:

I like the idea of more interactive graphs and would be willing to write the backend code if someone knows of a great graphing (JavaScript) tool that's relatively easy to integrate.

Jeff, maybe this suggestion isn't web 2.0 enough, but couldn't you have a simple image map with rollovers? That way, the static images are just swapped and the processing logic is entirely segregated to the backend.

 
Pablo Ruggia's image Posts 7
Thanks 8
Joined 3 Jun '11 Email user

How about this motion chart?

http://code.google.com/apis/chart/interactive/docs/gallery/motionchart.html#Example

You can create a motion chart that shows how each team's score evolved over time; it's automatically animated. Don't know how fast it will be with more than 200 circles though ...

 

 

 
Allan Engelhardt's image Posts 77
Thanks 29
Joined 28 May '10 Email user

Good one, thanks.  I have been scraping it regularly for a while:

Evolution of best score

 

PDF: http://static.cybaea.net/Kaggle/HHP/history.pdf

library("XML", quietly = TRUE)
start.time <- Sys.time()
start.time.num <- round(as.numeric(start.time))
lb <- readHTMLTable("http://www.heritagehealthprize.com/c/hhp/Leaderboard",
                    which = 1, stringsAsFactors = FALSE)

names(lb) <- make.names(names(lb), unique = TRUE)
names(lb)[1] <- "Rank"
names(lb)[6] <- "Last.Best"

lb[["Rank"]] <- as.numeric(lb[["Rank"]])

for (change in grep("^Δ", names(lb), value = TRUE)) {
    delta <- rep(0L, NROW(lb))
    up <- grepl("^↑", lb[[change]])
    down <- grepl("^↓", lb[[change]])
    delta[up] <- as.numeric(substring(lb[[change]][up], 2))
    # Downward moves are stored as negative so the direction is not lost
    delta[down] <- -as.numeric(substring(lb[[change]][down], 2))
    lb[[change]] <- delta
}

lb[["Team.Name"]] <-
    sapply(strsplit(lb[["Team.Name"]], "[\r\n]"), function (x) x[1])

lb[["RMSLE"]] <- as.numeric(lb[["RMSLE"]])

lb[["Entries"]] <- as.integer(lb[["Entries"]])

save(lb, start.time,
     file = paste("lb", start.time.num, "RData", sep = "."),
     compress = "xz")

args <- c(list(date = start.time), score=lb[1:10, "RMSLE"])
my.score <- do.call(data.frame, args)

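# Append to the saved history; on the first run there is no file yet
if (file.exists("history.RData")) {
    load(file = "history.RData")
    score <- rbind(score, my.score)
} else {
    score <- my.score
}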
save(score, file = "history.RData")

PlotIt <- function () {
    plot(score1 ~ date, data = score, type = "s",
         ylim = c(0.39, 0.47),
         main = "Evolution of best score",
         sub = "Heritage Health Prize (http://www.heritagehealthprize.com/)",
         xlab = "Date", ylab = "RMSLE score (less is better)")
    abline(h = 0.4, col = "red")
}

pdf(file = "history.pdf",
    title = "Evolution of best score in Heritage Health Prize")
PlotIt()
dev.off()  # close the PDF device so the file is written
png(file = "history.png", width = 21, height = 21, units = "cm", res = 300)
PlotIt()
dev.off()  # close the PNG device


Thanked by Sali Mali
 
Sali Mali's image Rank 4th
Posts 292
Thanks 113
Joined 22 Jun '10 Email user

Thanks Allan. Viewing your code educated me on how to remove half the lines from mine. Function to extract the team name now updated.

 
_JeremyA's image Posts 23
Thanks 6
Joined 5 Apr '11 Email user

If the entire leaderboard dataset is made available, I can put the teams into a google motion chart.

~jba

 
Momchil Georgiev's image Rank 80th
Posts 158
Thanks 92
Joined 6 Apr '11 Email user

JeremyA wrote:

If the entire leaderboard dataset is made available, I can put the teams into a google motion chart.

~jba

There is a link at the bottom to download "raw data". Combine that with a quick script to run through all historical boards by day and you'll get what you need.
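
A rough R sketch of such a script, rebuilding the board as it stood at the end of each day from a log of submissions. The column names (team, date, score) and the sample rows are made up for illustration; the layout of the real raw data CSV is not described in this thread, so the names would need adjusting.

```r
# Rebuild the leaderboard as of each day from a submission log.
# Assumed (hypothetical) columns: team, date (Date), score (RMSLE).
daily_boards <- function(log) {
    days <- sort(unique(log$date))
    boards <- lapply(seq_along(days), function(i) {
        seen <- log[log$date <= days[i], ]
        # Best (lowest) score per team so far; lower RMSLE is better
        best <- aggregate(score ~ team, data = seen, FUN = min)
        best <- best[order(best$score), ]
        data.frame(date = days[i], rank = seq_len(nrow(best)),
                   team = best$team, score = best$score)
    })
    do.call(rbind, boards)
}

# Made-up sample log, not real competition data
log <- data.frame(
    team  = c("A", "B", "A", "C"),
    date  = as.Date(c("2011-06-01", "2011-06-01", "2011-06-02", "2011-06-03")),
    score = c(0.48, 0.47, 0.46, 0.465),
    stringsAsFactors = FALSE
)
boards <- daily_boards(log)
print(boards)
```

The result is a long table with one row per team per day, which is the general shape a motion chart or a per-team history plot would need.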

 
