Health Language Blog

3 Reasons Data Normalization is Critical for Data Warehousing

Posted on 09/26/14


The “big data” technology trend that has swept across various industries is also impacting the healthcare sector. 

With big data, organizations grapple with the task of analyzing extremely large data sets and identifying important patterns. The healthcare industry’s big data focal point is the clinical data repository, or CDR. A CDR functions as a data warehouse, pulling together patient-oriented health data from a variety of IT systems. Use cases include monitoring the caregiving process, collecting data for quality programs, creating predictive models, determining the cost of care, and managing population health.

Data warehoused in a health delivery organization’s CDR could include everything from diagnostic data represented by text strings or ICD-9/10 codes to problem lists encoded in SNOMED CT, all sourced from various electronic health record systems. And therein lies the problem: the diversity of terminologies used across these disparate data types complicates the data analytics task. Many big data initiatives fall short because of the complexity of normalizing all this data into a usable format.

Data normalization is all about mapping. Specifically, normalization provides semantic mapping between disparate reference terminologies, classification systems, localized proprietary codes that have meaning only for internal applications, and even unstructured blocks of text. Data normalization addresses the problem of finding ‘meaning’ in an information model that is semantically fragmented.  

Here are three ways data normalization can bolster a data warehouse:

1. Bridging the Gap Between Clinician Terms

Take, for example, a CDR that stores diagnosis data as text strings. A simple query asking for all the instances in which “diabetes mellitus” is the admitting diagnosis will retrieve only records in which the diagnosis was recorded as the exact text string “diabetes mellitus.” Such a query will not recognize “diabetes,” “IDDM,” “AODM,” “DM,” or “adult onset diabetes,” all legitimate terms used by clinicians. This unrecognized equivalence between terms is one of the primary roadblocks to retrieving meaningful information from these types of data warehouses. A normalization solution could automatically identify all of the terms used to describe “diabetes mellitus.”
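The difference between an exact-string query and a normalized query can be sketched in a few lines of Python. This is a minimal illustration, not a real terminology service: the synonym table, the record layout, and the function names are all assumptions made for the example, and the table contains only the terms mentioned above.

```python
# Hypothetical synonym table: maps clinician terms (lowercased) to one
# canonical concept. A real normalization solution would draw on a
# maintained terminology, not a hand-written dictionary.
SYNONYMS = {
    "diabetes mellitus": "diabetes mellitus",
    "diabetes": "diabetes mellitus",
    "iddm": "diabetes mellitus",
    "aodm": "diabetes mellitus",
    "dm": "diabetes mellitus",
    "adult onset diabetes": "diabetes mellitus",
}


def normalize_term(raw):
    """Return the canonical concept for a free-text term, or None if unknown."""
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    return SYNONYMS.get(key)


def exact_query(records, text):
    """Naive query: matches only the exact stored string."""
    return [r for r in records if r["admitting_diagnosis"] == text]


def normalized_query(records, concept):
    """Normalized query: matches any term that maps to the concept."""
    return [r for r in records
            if normalize_term(r["admitting_diagnosis"]) == concept]


# Illustrative records (assumed layout, not taken from any real CDR).
RECORDS = [
    {"id": 1, "admitting_diagnosis": "diabetes mellitus"},
    {"id": 2, "admitting_diagnosis": "DM"},
    {"id": 3, "admitting_diagnosis": "Adult  Onset Diabetes"},
    {"id": 4, "admitting_diagnosis": "asthma"},
]
```

With these sample records, `exact_query(RECORDS, "diabetes mellitus")` finds only the first record, while `normalized_query(RECORDS, "diabetes mellitus")` also picks up the records entered as “DM” and “Adult Onset Diabetes.”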

2. Improving Data Queries

Suppose that patient records are stored by ICD-9-CM (diagnosis) codes. A researcher seeking to retrieve all records containing diagnoses of “asthma” might begin by looking up “asthma” in an ICD-9 code book. Asthma is classified in section 493, and three 5-digit ICD codes that begin with 493 involve asthma. The researcher might design a CDR query requesting records containing “ICD 493.” Unfortunately, the ICD hierarchies were not designed with clinical utility in mind, and several forms of asthma are classified in ICD sections other than 493. The researcher would fail to retrieve records that contain diagnoses of these other forms of asthma, including platinosis and drug-induced asthma. A data normalization solution would enable the researcher to identify all of the “asthma” ICD-9 codes using any of the words caregivers use to express them.
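A rough sketch of why a section-prefix query misses records, and how a concept-to-code index avoids that, might look like the following. The codes here are deliberately placeholders rather than real ICD-9-CM codes: “S1” stands in for the main asthma section, and “S2.40” and “S3.15” stand in for forms of asthma classified elsewhere. The index, record layout, and function names are assumptions for illustration only.

```python
# Placeholder codes only. "S1.xx" stands in for the main asthma section;
# "S2.40" and "S3.15" stand in for asthma forms classified in other
# sections. None of these are real ICD-9-CM codes.
CONCEPT_TO_CODES = {
    "asthma": {"S1.01", "S1.02", "S1.90", "S2.40", "S3.15"},
}


def query_by_prefix(records, prefix):
    """Naive query: every record whose code starts with the section prefix."""
    return [r for r in records if r["code"].startswith(prefix)]


def query_by_concept(records, concept):
    """Normalized query: every record whose code belongs to the concept."""
    codes = CONCEPT_TO_CODES.get(concept.lower(), set())
    return [r for r in records if r["code"] in codes]


# Illustrative records (assumed layout).
RECORDS = [
    {"id": 1, "code": "S1.01"},  # asthma, main section
    {"id": 2, "code": "S2.40"},  # asthma form classified elsewhere
    {"id": 3, "code": "S3.15"},  # asthma form classified elsewhere
    {"id": 4, "code": "S9.99"},  # unrelated diagnosis
]
```

On this sample, the prefix query for “S1” returns one record, while the concept query for “asthma” returns three, including the two that sit outside the main section.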

3. Reconciling Claims and Clinical Data

A healthcare researcher striving for an accurate picture of health across identified patient populations relies on the ability to normalize clinical and claims data into standardized terminologies. The integration of clinically sourced and claims data ranks among the toughest challenges in the field of data sharing. Claims and clinical data have been collected separately over the years and maintained in separate IT systems, making them hard to reconcile. Data normalization, however, can rationalize the differences between claims and clinical data, providing the ability to match vastly different terminology and coding schemas to a system of shared meaning.
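One way to picture matching two coding schemas to a system of shared meaning is a crosswalk keyed by source system and code. The sketch below is hypothetical throughout: the codes (“CLM-001”, “CLN-9001”, and so on), the crosswalk contents, and the row layout are invented for illustration and do not correspond to any real claims or clinical terminology.

```python
from collections import defaultdict

# Hypothetical crosswalk: (source system, local code) -> shared concept.
# The codes are invented placeholders, not real claims or clinical codes.
CROSSWALK = {
    ("claims", "CLM-001"): "diabetes mellitus",
    ("clinical", "CLN-9001"): "diabetes mellitus",
    ("clinical", "CLN-9002"): "asthma",
}


def reconcile(claims_rows, clinical_rows):
    """Group shared concepts per patient and note which sources report them.

    Rows whose code has no crosswalk entry are skipped in this sketch;
    a real pipeline would likely queue them for review instead.
    """
    by_patient = defaultdict(lambda: defaultdict(set))
    for source, rows in (("claims", claims_rows), ("clinical", clinical_rows)):
        for row in rows:
            concept = CROSSWALK.get((source, row["code"]))
            if concept is not None:
                by_patient[row["patient_id"]][concept].add(source)
    # Convert nested defaultdicts and sets into plain dicts and sorted lists.
    return {
        patient: {concept: sorted(sources)
                  for concept, sources in concepts.items()}
        for patient, concepts in by_patient.items()
    }
```

For a patient with claims code “CLM-001” and clinical code “CLN-9001”, `reconcile` reports a single concept, “diabetes mellitus”, seen in both sources, which is the kind of agreement a researcher needs before combining the two data sets.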

Overcoming Challenges

Data normalization can assist in the transformation of stored data into an integrated, focused repository accessible to users across your enterprise.

What are your top data warehousing challenges? Have you considered adopting a data normalization solution? Leave your comments below.

Topics: data normalization

About the Author

Jason D. Wolfson is the Vice President of Product Management. From upstream strategy identification to the planning & execution processes required to link strategy to operations, Jason is responsible for the systemic, holistic business management of a portfolio of solutions within Wolters Kluwer, Clinical Solutions.