The software tools within the FHIN consortium are based on the open-source OHDSI software tools. The coding system for all healthcare data in the FHIN consortium is based on the OHDSI vocabularies, which can be consulted online via their Athena tool. FHIN mainly relies on additional tools built on top of the existing OHDSI tools. These additional tools were originally developed by the RADar innovation center and further refined within the FHIN consortium.
The FHIN software tools follow a logic where data is sequentially processed from the care provider's local source data, via ETL tooling, to the final OMOP Common Data Model, where it can serve multiple purposes in data-driven healthcare. This is represented schematically in the following 'Data Centralisation Scheme'.
A Unique System for Guaranteed Harmonization
Simply put, harmonization is a method that maximizes the probability that different persons, at different times, map source concept descriptions from different hospitals onto an unambiguously defined target concept. For example, "AF", "A.F.", "AFib" and "Attr. Fib." are all mapped onto the OMOP concept representing "Atrial Fibrillation". This is a far from trivial task.
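The many-to-one nature of this mapping can be sketched as a simple lookup. This is a minimal illustration, not the FHIN implementation; the OMOP concept_id shown is purely an assumed placeholder.

```python
# Illustrative sketch of concept harmonization: many source spellings
# from different hospitals map onto one standardized target concept.
# The concept_id value below is a placeholder, not an authoritative code.
SOURCE_TO_OMOP = {
    "AF": 313217,
    "A.F.": 313217,
    "AFib": 313217,
    "Attr. Fib.": 313217,
}

def harmonize(source_value):
    """Return the OMOP concept_id for a source description, or None if unmapped."""
    return SOURCE_TO_OMOP.get(source_value.strip())

print(harmonize("AFib"))          # 313217
print(harmonize("unknown term"))  # None
```

The hard part in practice is not the lookup itself but building and maintaining the mapping so that every local spelling resolves to the intended concept.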
The method proposed is based on pre-defined reference datasets per disease. Such reference datasets are constructed from a transmural, patient-centric point of view: from the first symptom to the final disease outcome. Each step in this clinical process provides the context for the data points required to deliver top-quality care and medicine. All hospitals in the FHIN consortium map their source data onto these predefined reference datasets.
Source Data
It all starts from the data-storing systems in a hospital (the source data silos). There are two main systems for this:
A database is a collection of a large number of related tables (similar to familiar Excel spreadsheets). 'Large' here means thousands of tables, and a hospital easily has hundreds of databases.
The second major data-storage system in a hospital is file storage on disks, such as the images produced at radiology.
ETL
All this data has to be Extracted from these sources, Transformed to the correct codes from the standardized OMOP vocabularies, and finally Loaded into the correct tables of the OMOP Common Data Model. This "Extract, Transform, Load" process is abbreviated as ETL and is the workhorse of data processing. OHDSI offers tools to design the ETL process (White Rabbit and Rabbit In A Hat). After designing the ETL process, it needs to be implemented. The RADar innovation center developed Rabbit In A Blender (RiaB) for this purpose. It is a unique tool, capable of transferring more than 8 billion data points from a hospital's data sources in a fully automated manner in less than half an hour. This means the tool is capable of providing near real-time data harmonization and standardization.
The RiaB tool requires two inputs:
The connection with the hospital source systems (more technically: queries on the source databases)
A mapping table, which is nothing more than a long list providing the OMOP standardized code for each data point that is used daily in the hospital.
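How these two inputs come together can be sketched in a few lines: rows are extracted from a source query, transformed via the mapping table, and loaded into a target structure. This is a conceptual sketch only; the table names, column names, and concept_ids are assumptions for illustration and do not reflect RiaB's actual implementation.

```python
# Conceptual Extract-Transform-Load sketch. The mapping table pairs a
# source system's local codes with OMOP concept_ids; all identifiers and
# concept_ids below are hypothetical placeholders.
mapping_table = {
    ("diagnosis", "I48.0"): 313217,   # placeholder concept_id
    ("lab", "GLU"): 3004501,          # placeholder concept_id
}

def etl(source_rows):
    loaded = []
    for row in source_rows:                              # Extract
        key = (row["domain"], row["source_code"])
        concept_id = mapping_table.get(key)              # Transform
        if concept_id is not None:
            loaded.append({"person_id": row["person_id"],
                           "concept_id": concept_id})    # Load
    return loaded

rows = [{"domain": "diagnosis", "source_code": "I48.0", "person_id": 42}]
print(etl(rows))  # [{'person_id': 42, 'concept_id': 313217}]
```

In a real deployment the "extract" step would be queries against the hospital's source databases and the "load" step would write into the OMOP CDM tables; the mapping table is what makes the transform step purely mechanical.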
Advanced methods to deal with unstructured data
Large Language Models (LLMs) represent text as numerical vectors. Because semantically similar texts are mapped to nearby vectors, this gives computers a form of semantic understanding of human text and the ability to process it.
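The idea that "nearby vectors mean similar meaning" can be illustrated with cosine similarity. The three-dimensional vectors below are invented toy values; real LLM embeddings have hundreds or thousands of dimensions and are produced by a trained model.

```python
import math

# Toy illustration of semantic similarity between text embeddings.
# These 3-d vectors are invented for the example; real embeddings
# come from a trained language model.
embeddings = {
    "atrial fibrillation": [0.90, 0.10, 0.20],
    "AFib":                [0.85, 0.15, 0.25],
    "broken leg":          [0.10, 0.90, 0.30],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_related = cosine(embeddings["AFib"], embeddings["atrial fibrillation"])
sim_unrelated = cosine(embeddings["AFib"], embeddings["broken leg"])
print(sim_related > sim_unrelated)  # True
```

This is what makes embeddings useful for unstructured data: two clinically equivalent phrasings end up close together in vector space even when they share no characters.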