# R questions

Posts 253 Thanks 4 Joined 5 Aug '10

1) What is the best tutorial for learning the parts of R relevant to this competition?

2) Is there a function in R that does binary search? Note that I found I can use order to reorder the rows so that one vector is in non-decreasing order:

    Members <- read.csv(file="Members.csv", header=TRUE, sep=",")
    OrderMembers <- Members[order(Members$MemberID), ]

Now, if I want to find the position of MemberID 78832045 in this file by binary search, how do I do it in R? In this example I need to find 22222, because OrderMembers$MemberID[22222] = 78832045, but I want to do a binary search and use the fact that OrderMembers$MemberID is an increasing sequence.

#1 / Posted 24 months ago

Rank 38th Posts 194 Thanks 90 Joined 9 Jul '10

Not sure if this is what you mean, but you can use "which" to find where something is, as in:

    idx <- which(Members$MemberID == "12345678")

Note the two equal signs, not one. You can then use:

    Members[idx, ]

to show all rows with that MemberID.

Edited to add: Oh, and I have two books on R: R in a Nutshell and The R Book. I think R in a Nutshell would be my first choice, but that might be because I read it first. Also, the vignettes for various packages (not available for all) are sometimes very helpful. I would certainly recommend reading all the vignettes for the "caret" package.

#2 / Posted 24 months ago
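
For the binary-search question in post #1, here is a minimal sketch in base R. It assumes a small made-up ID vector in place of OrderMembers$MemberID; the helper names binary_search and lookup_sorted are hypothetical and not part of R. The first helper is a hand-written binary search, the second relies on findInterval(), which expects a sorted vector.

```r
# Toy sorted ID vector standing in for OrderMembers$MemberID (values are made up).
ids <- sort(c(78832045, 10457, 99112233, 4211009, 55500012))

# Hand-written binary search: returns the position of `target` in the
# non-decreasing vector `v`, or NA if it is absent.
binary_search <- function(v, target) {
  lo <- 1L
  hi <- length(v)
  while (lo <= hi) {
    mid <- lo + (hi - lo) %/% 2L
    if (v[mid] == target) {
      return(mid)
    } else if (v[mid] < target) {
      lo <- mid + 1L
    } else {
      hi <- mid - 1L
    }
  }
  NA_integer_
}

# Base R alternative: findInterval() returns the largest i with v[i] <= target,
# so an exact-match check is still needed afterwards.
lookup_sorted <- function(v, target) {
  i <- findInterval(target, v)
  if (i > 0L && v[i] == target) i else NA_integer_
}

pos_manual <- binary_search(ids, 78832045)
pos_base   <- lookup_sorted(ids, 78832045)
pos_which  <- which(ids == 78832045)   # linear scan, for comparison
```

All three give the same position here. For many lookups at once, match(targets, ids) is vectorized and does not need the vector to be sorted (it uses hashing rather than binary search).
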

Rank 31st Posts 292 Thanks 64 Joined 2 Mar '11

The vignettes are here, towards the bottom: http://cran.r-project.org/web/packages/caret/index.html

#3 / Posted 24 months ago

Anthony Goldbloom (Kaggle) Competition Admin Kaggle Admin Posts 382 Thanks 72 Joined 20 Jan '10

I've said this before, but I think Jeremy's tutorial is really excellent, although it is not focussed on HHP. He is hoping to get the opportunity to do an HHP tutorial in the next few months.

#4 / Posted 24 months ago

Posts 253 Thanks 4 Joined 5 Aug '10

Thanks. I did not know about the which command, and I had thought of using a special function for that purpose, but it is not exactly what I asked. My question is about finding it faster. The which command can help me find a member with a specific member ID, but it does not assume anything about the order of the vector. I have:

    Members <- read.csv(file="Members.csv", header=TRUE, sep=",")
    OrderMembers <- Members[order(Members$MemberID), ]

After doing it I have for every i OrderMembers$MemberID[i]

Posts 77 Thanks 29 Joined 28 May '10

Uri Blass wrote:

I did not know about the which command and I thought to use a special function for that purpose but it is not exactly what I asked. My question is about finding it faster.[...]

library("data.table") does what you want. I use it all the time.

#6 / Posted 24 months ago
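
A minimal sketch of the kind of lookup data.table offers, assuming the package is installed and using a made-up table in place of Members.csv. Setting a key sorts the table, and a keyed subset then finds rows by binary search rather than by scanning the whole column.

```r
library(data.table)  # install once with install.packages("data.table")

# Toy table standing in for Members.csv (values are made up).
members <- data.table(
  MemberID        = c(78832045L, 10457L, 99112233L, 4211009L),
  AgeAtFirstClaim = c("30-39", "10-19", "80+", "50-59"),
  Sex             = c("M", "F", "F", "M")
)

# setkey() sorts the table by MemberID and marks it as keyed.
setkey(members, MemberID)

# A keyed subset looks the ID up by binary search on the sorted key.
hit <- members[.(78832045L)]

# which = TRUE returns row positions in the sorted table instead of the rows.
pos <- members[.(78832045L), which = TRUE]
```

Here pos is the row number within the keyed (sorted) table, which is the quantity asked for in post #1.
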

Posts 253 Thanks 4 Joined 5 Aug '10

I do not understand how to use library("data.table"). If I simply type it, I get the error: there is no package called 'data.table'.

#7 / Posted 24 months ago

Posts 3 Thanks 2 Joined 28 Mar '11

Here you go, Uri. This should help you out.

I wanted to create a list of all the claims for each member ID, so I wrote a function that accepts a data.table and a vector of the unique member IDs as inputs. Here is the function:

    CreateListOfMemberClaims <- function(dt, ids) {
      ### FUNCTION NAME: CreateListOfMemberClaims
      # INPUTS:  dt  = A data.table of the claims for a particular year
      #          ids = The member IDs to look for
      #
      # OUTPUTS: memberList = list of all the claims, as a data.frame, for each member ID
      memberList <- vector("list", length(ids))
      for (i in seq_along(ids)) {
        # dt must be keyed on MemberID for this keyed lookup to work
        memberList[[i]] <- data.frame(dt[J(ids[i])])
      }
      return(memberList)
    }

Here's how you use the function. Notice that I have converted only the data.frame for Y1 claims into a data.table. I called this new data.table qq for lack of a better name. You then need to specify the key of this table; I chose MemberID.

    library(data.table)
    qq <- data.table(claims[which(claims$Year == "Y1"), ])
    setkey(qq, MemberID)

    # Now use the function. You need the data.table and a vector of all the unique IDs.
    t1 <- which(claims$Year == "Y1")
    uniqueVectorOfIDs <- unique(claims$MemberID[t1])

    y1MemberList <- CreateListOfMemberClaims(qq, uniqueVectorOfIDs)

For all this to work, simply copy and paste the function from above into R, make sure you install the package data.table, and then copy and paste the rest of the code. To install the package, type install.packages("data.table"). It's a two-step process: first you install a package, then you load it by typing either library(packageName) or require(packageName).

#8 / Posted 24 months ago

Posts 253 Thanks 4 Joined 5 Aug '10

Harry, I do not understand this function, maybe because I am a relative beginner in R and did not learn it from a good tutorial. But if you use the file claims.csv, I would expect to see the name of that file in your code, and I do not see it.

Note that I do not want to use the claims file at this point; I would rather first understand how to use the other files to make a simple prediction based only on age and gender.

After failing to understand how to do binary search in R, I tried without it, and the result is clearly disappointing (relative to C without binary search). With C I could make a prediction based only on age and gender, even with a bad algorithm (one that does no binary search), in less than a minute. With R, it seems that something like this without binary search is going to take many hours. As a result I do not even like the idea of using binary search with R: if R is slower than C by a factor of 5 or 10 I can live with it, but not if it is slower by a factor of more than 1000.

#9 / Posted 24 months ago

Rank 38th Posts 194 Thanks 90 Joined 9 Jul '10

The following code loads the needed files (not the claims), fits a simple model based on age and sex, then uses it to predict the next year's DaysInHospital. It computes an error score (for what was just predicted), then makes another prediction for year 4 and saves the file. This took just under 5 seconds (including loading the files) on my computer.

    aa <- Sys.time()

    #
    # Import files (have your files in a subdirectory called "hhp2",
    # or just delete the characters "hhp2/")
    #
    members.all <- read.csv(file="hhp2/Members.csv")
    hospital.y2 <- read.csv(file="hhp2/DaysInHospital_Y2.csv")
    hospital.y3 <- read.csv(file="hhp2/DaysInHospital_Y3.csv")
    hospital.y4 <- read.csv(file="hhp2/Target.csv")

    hospital.y2$logdays <- log1p(hospital.y2$DaysInHospital)
    hospital.y3$logdays <- log1p(hospital.y3$DaysInHospital)
    hospital.y4$logdays <- NA

    #
    # Make right-hand files
    #
    right.a <- merge(hospital.y2, members.all, by.x="MemberID", by.y="MemberID", all.x=TRUE, sort=FALSE)
    right.b <- merge(hospital.y3, members.all, by.x="MemberID", by.y="MemberID", all.x=TRUE, sort=FALSE)
    right.c <- merge(hospital.y4, members.all, by.x="MemberID", by.y="MemberID", all.x=TRUE, sort=FALSE)

    # Compute a simple model
    lm.1.log.based <- lm(logdays ~ AgeAtFirstClaim + Sex, data=right.a)
    pred.b.log.based <- predict(lm.1.log.based, right.b)
    pred.b.log.based <- expm1(pred.b.log.based)

    # Compute RMSLE
    err <- function(obs, pred) sqrt(1/length(obs) * sum((log(pred+1) - log(obs+1))^2))
    err(right.b$DaysInHospital, pred.b.log.based)

    # Predict for unknown year four
    pred.c.log.based <- predict(lm.1.log.based, right.c)
    pred.c.log.based <- expm1(pred.c.log.based)

    # Place in target file and save
    hospital.y4$DaysInHospital <- pred.c.log.based
    hospital.y4 <- hospital.y4[, 1:3]
    write.csv(hospital.y4, file="my.submission.csv", quote=FALSE, row.names=FALSE)

    # Elapsed time (end minus start)
    ab <- Sys.time()
    ab - aa

Thanked by Uri Blass, pim#, Heuristic, Wei-shou Hsu, Vtaylor, and jujung

#10 / Posted 24 months ago
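
To see what the merge calls with all.x=TRUE do, here is a small sketch on made-up data frames (the values do not come from the competition files). It also repeats the same lookup by hand with match(), which may be easier to reason about.

```r
# Toy stand-ins for DaysInHospital_Y2.csv and Members.csv (values are made up).
hospital <- data.frame(MemberID = c(30L, 10L, 40L),
                       DaysInHospital = c(0L, 2L, 1L))
members  <- data.frame(MemberID = c(10L, 20L, 30L),
                       Sex = c("F", "M", "M"))

# Left join: keep every row of `hospital`, attach member columns where the
# MemberID matches, and fill with NA where it does not (MemberID 40 here).
joined <- merge(hospital, members, by = "MemberID", all.x = TRUE, sort = FALSE)

# The same lookup done by hand: match() returns, for each hospital row,
# the position of its MemberID in `members` (or NA if it is missing).
idx <- match(hospital$MemberID, members$MemberID)
by_hand <- cbind(hospital, Sex = members$Sex[idx])
```

With sort=FALSE the row order of the merged result is unspecified, so compare rows by MemberID rather than by position.
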

Posts 253 Thanks 4 Joined 5 Aug '10

Thanks for the code. It works and generates a submission based on age and gender, but the problem is that I do not understand exactly what it does. It uses many R functions that I simply did not know about, and I do not know where to learn them.

I searched Google for log1p and expm1 and understood them. I also searched for merge and its by arguments and understood that it simply builds a new table that has all the ages and genders (I guess that merge uses a binary search in order to do this efficiently).

I do not understand exactly what lm and predict do. Searching Google, I found that they are about regression and linear models, but I would also like to see a mathematical formula to be sure exactly what they do.

Thanked by lenne20

#11 / Posted 24 months ago
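
As a sketch of the formula being asked for, assuming the script from post #10 and assuming AgeAtFirstClaim and Sex are treated as categorical variables (so each becomes a set of 0/1 indicator columns), the fit, the prediction, and the error score can be written as follows. Note that lm solves the least-squares problem with a QR decomposition rather than by forming the inverse explicitly; the closed form below gives the same coefficients when the design matrix has full column rank.

```latex
% Model fitted by lm(logdays ~ AgeAtFirstClaim + Sex), with z_i = \log(1 + y_i),
% y_i the DaysInHospital of member i, and x_i that member's indicator columns:
z_i = \beta_0 + x_i^\top \beta + \varepsilon_i

% Ordinary least squares picks the coefficients that minimise the squared residuals:
(\hat\beta_0, \hat\beta) = \arg\min_{\beta_0, \beta} \sum_{i=1}^{n} \bigl(z_i - \beta_0 - x_i^\top \beta\bigr)^2

% With X the design matrix (including a column of ones) and b the full coefficient vector:
\hat b = (X^\top X)^{-1} X^\top z

% predict() returns \hat z = X_{\mathrm{new}} \hat b, and expm1() maps it back to days:
\hat y_i = \exp(\hat z_i) - 1

% The err() function is the root mean squared logarithmic error:
\mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \bigl(\log(\hat y_i + 1) - \log(y_i + 1)\bigr)^2}
```
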

Posts 1 Joined 23 May '11

@Uri: As you correctly noticed, lm is the linear regression function. You can read about its mathematical coverage here:

predict is another R function; it uses the regression model to produce predicted values, which are subsequently compared to the real values to calculate the accuracy of the model. Refer to the R reference to learn more, or simply type help(function_name) in the console.

P.S. Not that I'm a huge R expert either :)

#12 / Posted 24 months ago

Posts 253 Thanks 4 Joined 5 Aug '10

Thanks for Chris's code. It clearly helped me understand how R can be used with vectors rather than with single numbers.

I will probably not use the lm function, because I dislike using formulas when I am not sure exactly how they work. The problem is not what linear regression is but how it is calculated. Gender and Age are strings in the file, so it is not clear how lm uses them, and if lm simply translates them to integers then it is not clear to me how that translation is done. I prefer to use my own functions, where I know exactly what I am doing.

#13 / Posted 24 months ago

Rank 31st Posts 292 Thanks 64 Joined 2 Mar '11

Use "?" and "??" to read the manual files for a given function, e.g.:

    ?lm
    ??lm

The lm command in R is extremely well-tested and would probably take you many hours of effort to replicate. Furthermore, it has interfaces to other R functions, such as "summary" and "plot", that you may find useful in the future. If you're really suspicious of code written in R, you can usually type the name of the function with no parentheses to view its source code, e.g.:

    lm

This is a good way to learn about programming conventions in R. I suggest you compare the predictions from R's lm command to predictions from your own regression function to assure yourself that they match.

#14 / Posted 24 months ago
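
One way to carry out the comparison suggested above, as a minimal sketch on simulated data (the column names age, sex and logdays and all coefficients are made up, not taken from the competition files). The hand-rolled version builds the design matrix with model.matrix() and solves the normal equations directly.

```r
set.seed(1)  # toy data only; none of this comes from the competition files
n   <- 200
toy <- data.frame(
  age = runif(n, 0, 90),
  sex = factor(sample(c("F", "M"), n, replace = TRUE))
)
toy$logdays <- 0.05 + 0.002 * toy$age + 0.03 * (toy$sex == "M") + rnorm(n, sd = 0.1)

# Reference fit with lm().
fit <- lm(logdays ~ age + sex, data = toy)

# Hand-rolled least squares: build the design matrix, then solve the
# normal equations (X'X) b = X'y.
X <- model.matrix(~ age + sex, data = toy)
y <- toy$logdays
beta_manual <- solve(crossprod(X), crossprod(X, y))

# Predictions from both should agree up to floating-point error.
pred_lm     <- predict(fit, toy)
pred_manual <- as.vector(X %*% beta_manual)
max_diff    <- max(abs(pred_lm - pred_manual))
```

If max_diff is on the order of machine precision, the two approaches agree on this data. Solving the normal equations is fine for a small, well-conditioned toy problem; lm uses a QR decomposition, which is numerically more robust.
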

Rank 31st Posts 292 Thanks 64 Joined 2 Mar '11

Uri Blass wrote:

Gender and Age are strings in the file so it is not clear how lm use them and if lm simply translates them to integers then it is not clear to me how you translate them.

Gender is actually a factor, not a string. Factors are a rather unique data type used to represent categorical data. Age can also be represented as an ordered factor, but I suggest you find a way to convert it to a continuous variable, as that makes intuitive sense.

#15 / Posted 24 months ago
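
A small sketch of both points, on made-up rows. It assumes the age bands are labelled like "0-9", "30-39" and "80+"; the helper age_to_numeric is hypothetical, and mapping each band to its lower bound plus 5 (so "80+" becomes 85) is an arbitrary choice made only for illustration.

```r
# Toy values; the age-band labels are assumed to look like those in Members.csv.
members <- data.frame(
  AgeAtFirstClaim = c("0-9", "30-39", "80+", "30-39"),
  Sex             = factor(c("M", "F", "F", "M"))
)

# lm() expands a factor into 0/1 indicator columns. model.matrix() shows the
# encoding: "F" is the baseline level, and the SexM column is 1 for males.
sex_design <- model.matrix(~ Sex, data = members)

# One way to make age continuous: take the lower bound of each band and add
# half the band width (5 years).
age_to_numeric <- function(band) {
  lower <- as.numeric(sub("^([0-9]+).*$", "\\1", band))
  lower + 5
}
members$AgeNum <- age_to_numeric(members$AgeAtFirstClaim)
```

Printing sex_design shows exactly which numbers lm would see for Sex, and AgeNum can be used in a formula in place of AgeAtFirstClaim.
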