I hate strings, and I wonder whether there is a program that simply translates every string in the data to an integer, so that different strings get different integers (with the program treating both 0234 and 234 as the same integer, 234).
A missing value in a column can be translated to -1 (or, if the column already contains -1, to a different number that is not in the column, with the program telling me which number means a missing value).
The program should also generate files that explain the meaning of the numbers in every column (except columns that contain only numbers).
For example, for the 6th column of claims.csv it might generate a file with the following content:
claims.csc
Anesthesiology=0,Diagnostic Imaging=1,Emergency=2,...
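
The idea described above could be sketched roughly as follows in Python. This is only a minimal sketch under several assumptions, not an existing tool: the names encode_columns and format_mapping are made up for illustration, an empty cell is assumed to mean a missing value, and string codes are assigned in alphabetical order to match the example. A column that mixes numbers and text is treated entirely as strings here.

```python
def is_int(text):
    """Return True if text parses as an integer (so "0234" counts as 234)."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def encode_columns(rows):
    """Encode a table of strings (no header row) as integers.

    Returns (encoded_rows, mappings, missing_codes), where mappings maps a
    column index to {string: code} for non-numeric columns only, and
    missing_codes maps every column index to its missing-value code.
    Assumption: an empty cell means "missing".
    """
    n_cols = len(rows[0])
    encoded = [[None] * n_cols for _ in rows]
    mappings = {}
    missing_codes = {}
    for c in range(n_cols):
        values = [row[c].strip() for row in rows]
        present = [v for v in values if v != ""]
        if all(is_int(v) for v in present):
            # Numeric column: keep the numbers, pick a missing code not in use.
            used = {int(v) for v in present}
            missing = -1
            while missing in used:
                missing -= 1
            lookup = int
        else:
            # String column: distinct strings get distinct codes 0, 1, 2, ...
            codes = {s: i for i, s in enumerate(sorted(set(present)))}
            mappings[c] = codes
            missing = -1  # codes are never negative, so -1 is always free
            lookup = codes.__getitem__
        missing_codes[c] = missing
        for r, v in enumerate(values):
            encoded[r][c] = missing if v == "" else lookup(v)
    return encoded, mappings, missing_codes


def format_mapping(codes):
    """Render one column's mapping in the name=code,name=code style."""
    ordered = sorted(codes.items(), key=lambda item: item[1])
    return ",".join(f"{name}={code}" for name, code in ordered)
```

Each entry of mappings could then be written to its own explanation file with format_mapping, and missing_codes reports which number stands for a missing value in each column.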
I think it would be easier if people who participate in this competition did not need to deal with strings. Having to deal with strings is part of the reason that I have not yet made a submission in this contest.