Data Files

File Name                   Available Formats
HHP_release1                .zip (7.28 mb)
HHP_release2                .zip (46.58 mb)
SampleEntry                 .csv (1.61 mb)
Data_Dictionary_release3    .pdf (150.76 kb)
HHP_release3                .zip (52.69 mb)

Note: This competition is NOW CLOSED. DATA IS NO LONGER AVAILABLE FOR ANY REASON.


IMPORTANT NOTE: The information provided below is intended only to provide general guidance to participants in the Heritage Health Prize Competition and is subject to the Competition Official Rules. Any capitalized term not defined below is defined in the Competition Official Rules. Please consult the Competition Official Rules for complete details.

Heritage Provider Network is providing Competition Entrants with deidentified member data collected during a forty-eight month period that is allocated among three data sets (the "Data Sets"). Competition Entrants will use the Data Sets to develop and test their algorithms for accurately predicting the number of days that the members will spend in a hospital (inpatient or emergency room visit) during the 12-month period following the Data Set cut-off date.

HHP_release3.zip contains the latest files, so you can ignore HHP_release2.zip. SampleEntry.csv shows how an entry should look.

Data Sets will be released to Entrants after registration on the Website according to the following schedule:

April 4, 2011 Claims Table - Y1 and DaysInHospital Table - Y2
May 4, 2011 All other Data Sets except Labs Table and Rx Table
June 4, 2011 Labs Table and Rx Table

Entrants are welcome to use other data to develop and test their algorithms and entries until 11:59:59 UTC on April 4, 2012 if the data are (i) freely available to all other Entrants and (ii) published (or linked to) in the External Data portion of the Forum within one (1) week of an entry submission using the other data. Entrants may not use any data other than the Data Sets after 11:59:59 UTC on April 4, 2012 without prior approval.

Tables
Each of the Data Sets will consist of tables as follows:

  • a. Members Table, which will include:
    • i. MemberID (a unique member ID)
    • ii. AgeAtFirstClaim (member's age when first claim was made in the Data Set period)
    • iii. Sex
  • b. Claims Table, which will include:
    • i. MemberID
    • ii. ProviderID (the ID of the doctor or specialist providing the service)
    • iii. Vendor (the company that issues the bill)
    • iv. PCP (member's primary care physician)
    • v. Year (the year of the claim, Y1, Y2, Y3)
    • vi. Specialty
    • vii. PlaceSvc (place where the member was treated)
    • viii. PayDelay (the delay between the claim and the day the claim was paid for)
    • ix. LengthOfStay
    • x. DSFS (days since first service that year)
    • xi. PrimaryConditionGroup (a generalization of the primary diagnosis codes)
    • xii. CharlsonIndex (a generalization of the diagnosis codes in the form of a categorized comorbidity score)
    • xiii. ProcedureGroup (a generalization of the CPT code or treatment code)
    • xiv. SupLOS (a flag that indicates if LengthOfStay is null because it has been suppressed)
  • c. Labs Table, which will contain certain details of lab tests provided to members.
  • d. Rx Table, which will contain certain details of prescriptions filled by members.
  • e. DaysInHospital Tables - Y2 and Y3, which will contain the number of days of hospitalization for each eligible member during Y2 and Y3 and will include:
    • i. MemberID
    • ii. ClaimsTruncated (a flag for members who have had claims suppressed. If the flag is 1 for member xxx in DaysInHospital_Y2, some claims for member xxx will have been suppressed in Y1).
    • iii. DaysInHospital (the number of days in hospital Y2 or Y3, as applicable).
    These two Tables are intended for use by Entrants to train and validate their algorithms. DaysInHospital Tables are based on the Claims Table with admissions in Y2 or Y3, as applicable. As a privacy measure, any member who spent more than two weeks in hospital is grouped; they are treated as though they spent 15 days in hospital.
  • f. Target Table ("DaysInHospital_Y4"), which does not include DaysInHospital values. DaysInHospital data for Y4 are to be filled in by Entrants to produce entries. See SampleEntry.csv for an example.
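
The grouping rule for long stays and the step of filling in the Target can be sketched in Python as follows. This is only an illustration, not the Competition's own tooling: the helper names cap_days and fill_target are hypothetical, and the Target column layout (MemberID, ClaimsTruncated, and an empty DaysInHospital column) is an assumption based on the table descriptions above. Check SampleEntry.csv for the actual format.

```python
import csv
import io

MAX_DAYS = 15  # value recorded for any stay longer than two weeks


def cap_days(days):
    """Group stays of more than 14 days into the single value 15."""
    return MAX_DAYS if days > 14 else days


def fill_target(target_csv, predictions, default=0.0):
    """Return CSV text with the DaysInHospital column filled in.

    target_csv  : CSV text with MemberID, ClaimsTruncated and an empty
                  DaysInHospital column (assumed layout, see lead-in)
    predictions : dict mapping MemberID (as a string) to predicted days
    default     : value written for members missing from predictions
    """
    reader = csv.DictReader(io.StringIO(target_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["DaysInHospital"] = predictions.get(row["MemberID"], default)
        writer.writerow(row)
    return out.getvalue()
```

A call such as fill_target(target_text, {"1": 0.5}) would write 0.5 for member 1 and the default for every other member. The default of 0.0 is an arbitrary placeholder, not a recommended prediction.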

For more information on Competition Data, please see the Official Rules (particularly Rules 5-7), the FAQs or the Forum or send us a message through the "Contact Us" function.