1) What is the best tutorial for learning the parts of R relevant to this competition? 2) Is there a function in R to do binary search? (Note that I found I can use order to rearrange the rows so that one vector is in non-decreasing order.)
Members<-read.csv(file="Members.csv",header=TRUE,sep=",")
OrderMembers<-Members[order(Members$MemberID),]
Now the question: if I want to find the position of MemberID 78832045 in this file by binary search, how do I do it in R? I need to find 22222 in this example, because OrderMembers$MemberID[22222]==78832045, but I want to do a binary search and use the fact that OrderMembers$MemberID is an increasing sequence.

Not sure if this is what you mean, but you can use which to find where something is, as in:
idx <- which(Members$MemberID=="12345678")
Note the two equal signs, not one. You can then use Members[idx,] to show all rows with that MemberID.
Edited to add: I have two books on R, R in a Nutshell and The R Book. I think R in a Nutshell would be my first choice, but that might be because I read it first. Also, the vignettes for various packages (not available for all) are sometimes very helpful. I would certainly recommend reading all the vignettes for the caret package.
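To make the which suggestion concrete, here is a minimal sketch. The small Members table and its IDs are made up for illustration (the real Members.csv is not reproduced here), and the comparison is against a number rather than a quoted string, so it does not rely on R converting the IDs to text first.

```r
# Hypothetical stand-in for Members.csv; the IDs and Sex values are made up.
Members <- data.frame(
  MemberID = c(78832045, 12345678, 55500011, 12345678),
  Sex      = c("M", "F", "F", "M")
)

# which() returns the positions where the logical condition is TRUE.
idx <- which(Members$MemberID == 12345678)
idx

# Index the rows with those positions to see the matching records.
Members[idx, ]
```

Note that which scans the whole vector, so it makes no use of any sort order.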
 The vignettes are here, towards the bottom: http://cran.r-project.org/web/packages/caret/index.html
 I've said this before, but I think Jeremy's tutorial is really excellent although it is not focussed on HHP. He is hoping to get the opportunity to do an HHP tutorial in the next few months.
Thanks, I did not know about the which command; I thought I would need a special function for that purpose. But it is not exactly what I asked. My question is about finding it faster. The which command can help me find a member with a specific member ID, but it does not assume anything about the order of the vector. I have
Members<-read.csv(file="Members.csv",header=TRUE,sep=",")
OrderMembers<-Members[order(Members$MemberID),]
After doing this, I have for every i OrderMembers$MemberID[i]
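One base R option that does exploit a sorted vector is findInterval, which requires its second argument to be non-decreasing and is documented as an O(n log N) interval search. The sketch below is only an illustration: the ID vector is made up, and find_sorted is a hypothetical helper name, not something from this thread or from R itself.

```r
# Hypothetical sorted ID vector standing in for OrderMembers$MemberID.
ids <- sort(c(78832045, 10000123, 45678901, 99000000, 23456789))

# findInterval() locates the largest i with sorted_ids[i] <= target.
# find_sorted is an illustrative helper, not a built-in function.
find_sorted <- function(sorted_ids, target) {
  i <- findInterval(target, sorted_ids)
  # i is 0 when target is smaller than every element; otherwise confirm
  # that the element found is an exact match.
  if (i > 0 && sorted_ids[i] == target) i else NA_integer_
}

find_sorted(ids, 78832045)   # position of an ID that is present
find_sorted(ids, 11111111)   # NA for an ID that is absent
```

If the vector contains duplicate IDs, this returns the position of the last matching element.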
 library("data.table") does what you want.  I use it all the time.
I do not understand how to use library("data.table"). If I simply type it, I get the error: there is no package called 'data.table'.
Here you go, Uri. This should help you out. I wanted to create a list of all the claims for each member ID, so I wrote a function that accepts a data.table and a vector of the unique member IDs as inputs. Here is the function:

CreateListOfMemberClaims <- function(dt, ids) {
  ### FUNCTION NAME: CreateListOfMemberClaims
  # INPUTS:  dt  = a data.table of the claims for a particular year, keyed by MemberID
  #          ids = the member IDs to look for
  # OUTPUTS: memberList = list with one data.frame of claims per member ID
  memberList <- vector("list", length(ids))
  for (i in seq_along(ids)) {
    # J() looks up the rows whose key (MemberID) equals ids[i]
    memberList[[i]] <- data.frame(dt[J(ids[i])])
  }
  return(memberList)
}

Here's how you use the function. Notice that I have converted only the data.frame for Y1 claims into a data.table. I called this new data.table qq for lack of a better name. You also need to specify what the key of this table is; I chose MemberID as the key.

library(data.table)
# Rows of claims that belong to year Y1
t1 <- which(claims$Year == "Y1")
qq <- data.table(claims[t1, ])
setkey(qq, MemberID)
# Now use the function. You need the data.table and a vector of all the unique IDs
uniqueVectorOfIDs <- unique(claims$MemberID[t1])
y1MemberList <- CreateListOfMemberClaims(qq, uniqueVectorOfIDs)

For all this to work, simply copy and paste the function above into R, make sure you install the package data.table, and then copy and paste the rest of the code. To install the package, type install.packages("data.table"). It's a two-step process: first you install a package, then you load it by typing either library(packageName) or require(packageName).
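For comparison, a similar per-member list can be sketched in base R with split, without installing anything. This is not the data.table approach above, just an alternative under assumptions: the small claims table and its PayDelay values are made up, and only the MemberID and Year column names are taken from the post.

```r
# Hypothetical stand-in for the claims table; all values are made up.
claims <- data.frame(
  MemberID = c(3, 1, 3, 2, 1),
  Year     = c("Y1", "Y1", "Y2", "Y1", "Y1"),
  PayDelay = c(10, 20, 30, 40, 50)
)

# Keep the Y1 rows, then split them into one data.frame per member.
y1 <- claims[claims$Year == "Y1", ]
y1MemberList <- split(y1, y1$MemberID)

names(y1MemberList)     # member IDs, stored as character strings
y1MemberList[["1"]]     # all Y1 claims for member 1
```

The list is named by member ID, so an element is retrieved with the ID as a string rather than by position.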
Harry, I do not understand this function, maybe because I am a relative beginner in R and did not use a good tutorial to learn it. But if you use the file claims.csv, I would expect to see the name of that file in your code, and I do not see it. Note that I do not want to use the claims file at this point; I prefer to understand how I can use the other files to make a simple prediction based only on age and gender. After failing to understand how to do binary search in R, I tried to do without it, and the result is clearly disappointing (relative to C without binary search). With C I could make a prediction based only on age and gender in less than a minute, even with a bad algorithm that does not do binary search. With R it seems that something like this without binary search is going to take many hours. As a result I do not even like the idea of using binary search with R: if R is slower than C by a factor of 5 or 10 I can live with it, but not if it is slower by a factor of more than 1000.
Thanks for the code. It works to generate a submission based on age and gender, but the problem is that I do not understand exactly what it does. It uses many R functions that I simply did not know about, and I do not know where to learn them. I searched Google for log1p and expm1 and understood them. I also searched for merge with by and understood that it simply builds a new table that has all the ages and genders (I guess that merge uses a binary search in order to do it efficiently). I do not understand exactly what lm and predict do; searching Google I found that they are about regression and linear models, but I would also like to see a mathematical formula to be sure exactly what they do.
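For readers meeting these functions for the first time, here is a small sketch of log1p, expm1 and merge with by. The two tables and their values are made up; only the column names MemberID and Sex echo the competition files, and DaysInHospital is assumed here as the target column name.

```r
# log1p(x) is log(1 + x), which stays finite when x is 0;
# expm1(y) is exp(y) - 1, so it undoes log1p.
x <- c(0, 1, 5, 15)
log1p(x)
expm1(log1p(x))   # recovers x, up to floating-point rounding

# Hypothetical tables sharing a MemberID column.
members <- data.frame(MemberID = c(1, 2, 3), Sex = c("F", "M", "F"))
days    <- data.frame(MemberID = c(3, 1, 2), DaysInHospital = c(0, 2, 5))

# merge() pairs up rows with equal MemberID, whatever order they start in.
merged <- merge(members, days, by = "MemberID")
merged
```

The merged table has one row per MemberID found in both inputs, with the columns of both tables side by side.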
 @Uri: As you correctly noticed, lm is the linear regression function. You can read about its mathematical coverage here: predict is another R function that uses the regression model in order to produce predicted values that are subsequently compared to real values to calculate the accuracy of the model. Refer to the R reference to know more or simply type help(function_name) in the console. P.S. Not that I'm a huge R expert either :)
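On the request for a formula: lm fits ordinary least squares, that is, it chooses the coefficients beta that minimise the sum over i of (y_i - x_i . beta)^2, and when the design matrix X has full column rank the solution is beta = (X'X)^(-1) X'y. predict then returns X_new %*% beta for the new rows. The sketch below checks both statements on made-up data; the column names age, sex and y are illustrative and are not the competition's. (Internally lm uses a QR decomposition rather than this formula, but the result is the same.)

```r
# Made-up data, for illustration only.
d <- data.frame(
  age = c(5, 15, 25, 35, 45, 55),
  sex = factor(c("F", "M", "F", "M", "F", "M")),
  y   = c(0.1, 0.3, 0.2, 0.6, 0.5, 0.9)
)

fit <- lm(y ~ age + sex, data = d)

# The design matrix lm builds: an intercept, age, and a 0/1 column for sex.
X <- model.matrix(y ~ age + sex, data = d)

# Least-squares solution from the normal equations.
beta <- solve(t(X) %*% X, t(X) %*% d$y)

# predict() multiplies the new rows' design matrix by the coefficients.
new <- data.frame(age = c(20, 60),
                  sex = factor(c("M", "F"), levels = levels(d$sex)))
X_new <- model.matrix(~ age + sex, data = new)

cbind(lm = coef(fit), normal_equations = as.vector(beta))
cbind(predict = predict(fit, newdata = new), manual = as.vector(X_new %*% beta))
```

Both pairs of columns should agree up to rounding error.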
Uri Blass wrote: Gender and Age are strings in the file so it is not clear how lm uses them, and if lm simply translates them to integers then it is not clear to me how you translate them.
Gender is actually a factor, not a string. Factors are a rather unique data type used to represent categorical data. Age can also be represented as an ordered factor, but I suggest you find a way to convert it to a continuous variable, as that makes intuitive sense.
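A short sketch of both points follows. It assumes age arrives as band labels such as "0-9", "10-19" and "80+" (check the actual file), and the choice of lower bound plus 5 as a numeric age is an arbitrary assumption for illustration, not a recommendation from this thread.

```r
# A factor stores integer codes plus a table of level labels.
sex <- factor(c("M", "F", "F", "M"))
levels(sex)        # levels are sorted alphabetically by default
as.integer(sex)    # the internal codes, one per element

# In a model the factor becomes a 0/1 indicator column, not the raw codes.
model.matrix(~ sex)

# Assumed age-band labels; strip everything from the first "-" or "+" on
# to get the lower bound of each band as a number.
age_band <- c("0-9", "30-39", "80+", "10-19")
lower    <- as.numeric(sub("[-+].*$", "", age_band))

# Hypothetical continuous age: lower bound plus 5 (arbitrary for "80+").
age_num <- lower + 5
age_num
```

With two levels, lm estimates a single coefficient for the indicator column, which is the difference between the second level and the first.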
