Zach:
Thanks - I am going to give it a shot again in a week and might take you up on that....
Uri:
I am by no means an R expert and some of this stuff isn't easy for me - I saw your question, but your data appears to be organized differently than mine. I have my data split up by year, matched to the hospital tables. It appears you are organizing into one master "table" for lack of a better word. I could do it in theory, but it would take me a while. It took me quite some time to write my code.
I would suggest you start by cleaning the data - get rid of the spaces and replace the empty strings:
for example (assuming the claims are in a data.frame called claims.all):
claims.all$Specialty <- gsub(" ", "", claims.all$Specialty)
claims.all$PlaceSvc <- gsub(" ", "", claims.all$PlaceSvc)
Those are just two examples - you will have to spend a decent amount of time on this step before going to the next. There are a lot of decisions to be made here - and some of them I probably should have made differently.
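The gsub lines above only handle the spaces. For the empty strings, something like the following might work - a rough sketch on made-up data, where claims.toy, cleanColumn and the "Unknown" label are all just names I picked for illustration, not anything from the competition files:

```r
# Toy stand-in for claims.all (made-up values; the real file has more columns)
claims.toy <- data.frame(
  MemberID  = c(1, 2, 3),
  Specialty = c("Internal", "", "General Practice"),
  PlaceSvc  = c("Office", "Urgent Care", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip spaces, then give empty/missing values an explicit label
cleanColumn <- function(v, missing.label = "Unknown") {
  v <- gsub(" ", "", as.character(v))
  v[is.na(v) | v == ""] <- missing.label
  v
}

claims.toy$Specialty <- cleanColumn(claims.toy$Specialty)
claims.toy$PlaceSvc  <- cleanColumn(claims.toy$PlaceSvc)
```

Whether to label the blanks, set them to NA, or drop those rows is one of the decisions you have to make - I am not saying "Unknown" is the right one.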
Allan wrote an excellent post on this (some of it is outdated as it deals with release one of the data, but 95% of it I think still works):
http://www.cybaea.net/Blogs/Data/Getting-started-with-HHP.html
Then - use the following function to break it up by year.
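getCleanClaims <- function(x="Y1", y=hospital.y2) {
# Keep the claims from year x (uses the global claims.all)
sand <- claims.all[claims.all$Year==x,]
# Then keep only the members that appear in hospital table y
all.in <- sand$MemberID %in% y$MemberID
sand[all.in,]
}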
Read in the hospital files:
hospital.y2 <- read.csv(file="hhp2/DaysInHospital_Y2.csv")
hospital.y3 <- read.csv(file="hhp2/DaysInHospital_Y3.csv")
hospital.y4 <- read.csv(file="hhp2/Target.csv")
hospital.y2$logdays <- log1p(hospital.y2$DaysInHospital)
hospital.y3$logdays <- log1p(hospital.y3$DaysInHospital)
hospital.y4$logdays <- NA
hospital.y2$bindays <- ifelse(hospital.y2$DaysInHospital > 0, 1, 0)
hospital.y3$bindays <- ifelse(hospital.y3$DaysInHospital > 0, 1, 0)
hospital.y4$bindays <- NA
Use the function from above:
clean.1 <- getCleanClaims("Y1", hospital.y2)
clean.2 <- getCleanClaims("Y2", hospital.y3)
clean.3 <- getCleanClaims("Y3", hospital.y4)
Now you have three separate sets of claims - one for each year - and you know that every claim in each one belongs to a member in the matching hospital table.
Then it appears you are trying to get counts by PrimaryConditionGroup - I don't remember how much cleaning I had to do on that column, so if you don't clean up the empty strings and such, you will run into problems. But once they are cleaned, you can use something like:
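If you want to double check the matching, a quick sanity check could look like this. It is only a sketch on toy data (claims.toy and hospital.toy are made up), using the same filter as getCleanClaims written inline. Note the filter only guarantees one direction: members in the hospital table who had no claims that year will simply be absent from the claims.

```r
# Toy stand-ins for claims.all and one hospital table (made-up values)
claims.toy <- data.frame(
  MemberID = c(1, 1, 2, 3, 4),
  Year     = c("Y1", "Y1", "Y1", "Y2", "Y1"),
  stringsAsFactors = FALSE
)
hospital.toy <- data.frame(MemberID = c(1, 2, 5), DaysInHospital = c(0, 3, 1))

# Same filter as getCleanClaims, written inline
clean.toy <- claims.toy[claims.toy$Year == "Y1" &
                        claims.toy$MemberID %in% hospital.toy$MemberID, ]

# Every remaining claim should belong to a member in the hospital table
claims.ok <- all(clean.toy$MemberID %in% hospital.toy$MemberID)

# Members in the hospital table with no claims at all in that year
no.claims <- setdiff(hospital.toy$MemberID, clean.toy$MemberID)
```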
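makeTab <- function(x,y) {
# Cross-tabulate: one row per value of x (MemberID), one count column per value of y
temp <- table(x,y)
# Turn the table into a plain data.frame of counts (same as unclassing it first)
temp <- as.data.frame.matrix(temp)
# The member IDs end up as row names - copy them into a numeric first column
temp <- cbind(row.names(temp),temp)
temp[,1] <- as.numeric(as.character(temp[,1]))
colnames(temp) <- c("MemberID",colnames(temp)[-1])
temp
}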
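To see what the cross-tabulation produces, here is the same idea on a few made-up claims (the member IDs and condition codes below are invented for illustration). It is a minimal sketch of the table step, not a replacement for the function:

```r
# Made-up claims: one entry per claim
member <- c(101, 101, 102, 103, 103, 103)
cond   <- c("AMI", "CHF", "AMI", "CHF", "CHF", "RENAL")

# Count claims per member and condition, then widen into a data.frame
counts <- table(member, cond)
wide   <- as.data.frame.matrix(counts)

# Member IDs are stored as row names; copy them into a proper column
wide <- cbind(MemberID = as.numeric(row.names(wide)), wide)
```

You end up with one row per member and one count column per condition. If the empty strings are still in there, one of those columns will have an empty name, which is where the problems start.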
# There is probably a better way, but I couldn't figure out how
#
# Assuming you have the members file in a data.frame called "members.all"
#
right.a <- merge(hospital.y2, members.all, by.x="MemberID",by.y="MemberID", all.x=TRUE, sort=FALSE)
right.b <- merge(hospital.y3, members.all, by.x="MemberID",by.y="MemberID", all.x=TRUE, sort=FALSE)
right.c <- merge(hospital.y4, members.all, by.x="MemberID",by.y="MemberID", all.x=TRUE, sort=FALSE)
#
# Now make another short function to make it shorter...
#
mergeIt <- function(x,y=temp) { merge(x, y, by.x="MemberID",by.y="MemberID", all.x=TRUE, sort=FALSE)}
#
# Expand PrimaryConditionGroup into one count column per condition
#
temp <- makeTab(clean.1$MemberID, clean.1$PrimaryConditionGroup)
right.a <- mergeIt(right.a)
temp <- makeTab(clean.2$MemberID, clean.2$PrimaryConditionGroup)
right.b <- mergeIt(right.b)
temp <- makeTab(clean.3$MemberID, clean.3$PrimaryConditionGroup)
right.c <- mergeIt(right.c)
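One thing to watch for: because the merges use all.x=TRUE, any member with no claims in that year gets NA in every count column. If you would rather treat those as zero claims (an assumption on my part - you may prefer to keep the NAs), a sketch on toy data could be:

```r
# Toy hospital table and toy count table (made-up values)
hospital.toy <- data.frame(MemberID = c(1, 2, 3), DaysInHospital = c(0, 2, 0))
counts.toy   <- data.frame(MemberID = c(1, 3), AMI = c(1, 0), CHF = c(0, 2))

# Same kind of left merge as mergeIt: member 2 has no claims, so gets NA counts
merged <- merge(hospital.toy, counts.toy, by = "MemberID", all.x = TRUE, sort = FALSE)

# Replace NA with 0 in the count columns only
count.cols <- setdiff(colnames(counts.toy), "MemberID")
for (col in count.cols) {
  merged[[col]][is.na(merged[[col]])] <- 0
}
```

Only the count columns are touched, so logdays and bindays for the target year would stay NA.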
Also - get:
"R in a Nutshell"
http://www.amazon.com/Nutshell-Desktop-Quick-Reference-OReilly/dp/059680170X
You probably also will want to look at the packages plyr and reshape.
I have spent a whole bunch of time cleaning the data - and will need to spend a whole bunch more time.
It is the boring part of this, but necessary.