Data representation and communication standards

Topic Completed: 6 May 2021

Minor changes: 6 May 2021

Copyright: 2021,, Inc.

PubMed Search: Data [title] communication [title] standards

Ngoc Tran, M.D., M.S.
Chris Williams, M.D.
Cite this page: Tran N, Williams C. Data representation and communication standards. website. Accessed May 15th, 2021.
Definition / general
  • Laboratory information is stored and transferred using a variety of formats and standards
  • Forethought about what information is captured, the format used to capture data and how it is organized can help maximize the future value of laboratory information
Essential features
  • Capturing information as structured data, utilizing common code sets when possible, facilitates future research, analysis and collaboration (PLoS Comput Biol 2016;12:e1005097)
  • Laboratory instrumentation and information systems are increasingly operating within the larger ecosystems of enterprise information systems, increasing the complexity of networking and connectivity
  • Health Level Seven (HL7) is the common messaging standard used for intralaboratory and interlaboratory communication, as well as healthcare information exchange at the enterprise level
  • Binary data: the native data representation for digital systems as a collection of 1s and 0s
  • String: a series of binary data representing alphanumeric characters, such as a, b, c, etc.
  • Transmission Control Protocol (TCP) / Internet Protocol (IP): a collection of protocols that govern how data is transferred across the internet
  • American Society for Testing and Materials (ASTM): in the context of pathology, ASTM generally refers to the E1381 and E1394 standards that define the interface used by laboratory instruments
  • HL7: a collection of standards designed specifically for the exchange of healthcare information
  • Structured data: data stored in an organized and unambiguous format, generally machine readable
  • Unstructured data: data stored as blocks of text, human readable but not easily interpreted by a machine and may be difficult to use for analytics
  • Binary data
    • 1s and 0s
      • Values stored in computers as electronic signals that are either on (having value 1) or off (having value 0)
      • The 2 digits, 1 and 0, map directly onto the 2 states of this electronic system
    • Bits / bytes:
      • Each 0 or 1 is a bit (binary digit)
      • Byte is a set of 8 bits or a storage measurement equal to 8 bits
        • Bytes have no inherent context but must be interpreted by the computer as instructions, numeric data, textual data, etc.
        • e.g. the byte 0110 0001 can represent a variety of concepts, including the decimal value 97 or the letter a
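As a minimal Python sketch of the point above (the variable names are illustrative only), the same 8 bit pattern can be read either as a number or as a character, depending on how the program chooses to interpret it:

```python
# The bit pattern 0110 0001, written as a Python binary literal.
raw = 0b01100001

# Interpreted as an unsigned integer, the byte is the decimal value 97.
as_number = raw

# Interpreted as a character code, the same byte is the letter "a".
as_text = chr(raw)

# Formatting the value as 8 binary digits recovers the original pattern.
as_bits = format(raw, "08b")

print(as_number, as_text, as_bits)
```

The byte itself never changes; only the interpretation applied by the software does.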
  • Numeric data
    • Integers: whole numbers without fractional parts; can be positive, negative or zero (e.g. -3, 0, 202)
    • Unsigned: an unsigned number is never negative; it is either zero or positive (e.g. age)
    • Signed: a signed number can be positive or negative (e.g. -7.2, 13)
    • Floating point: contains a decimal point (e.g. 1.034, 9.81)
    • Numbers can be represented by varying number of bytes, depending on the range of values required
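The relationship between byte count and value range can be sketched in Python. This example assumes two's complement storage for signed integers and IEEE 754 single precision for the 4 byte floating point value; both are common conventions but are assumptions here, and the helper names are hypothetical.

```python
import struct


def unsigned_range(n_bytes):
    """Smallest and largest values of an unsigned integer stored in n_bytes."""
    return 0, 2 ** (8 * n_bytes) - 1


def signed_range(n_bytes):
    """Smallest and largest values of a two's complement signed integer."""
    half = 2 ** (8 * n_bytes - 1)
    return -half, half - 1


# The same single byte (all bits set) read as unsigned and as signed.
one_byte = bytes([0b11111111])
as_unsigned = int.from_bytes(one_byte, byteorder="big", signed=False)
as_signed = int.from_bytes(one_byte, byteorder="big", signed=True)

# A floating point value packed into 4 bytes is stored as a close
# approximation of the decimal number, not as the exact value.
packed = struct.pack(">f", 9.81)
round_trip = struct.unpack(">f", packed)[0]

print(unsigned_range(1), signed_range(1), unsigned_range(2))
print(as_unsigned, as_signed, len(packed), round_trip)
```

Choosing a wider type extends the range at the cost of storage, which matters when millions of results are archived.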
  • Textual data
    • American Standard Code for Information Interchange (ASCII)
      • 7 bit character coding scheme (128 values), usually stored as 1 character per 8 bit byte, originally developed for use with teletype machines
      • Typical computer code, used in most personal computers
      • Example: this byte, 0010 1011, represents a decimal number of 43 and an ASCII character of a plus sign (+)
      • However, ASCII (128 values, or 256 in 8 bit extended variants) does not have enough codes to represent characters from many international alphabets, while Unicode can accommodate much larger character sets
      • UTF-8 is a variable length Unicode encoding (1 to 4 bytes per character) that is backward compatible with ASCII and is the most commonly encountered encoding in many applications
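The plus sign example and the backward compatibility of UTF-8 can be checked with a short Python sketch. The micro sign (as in µg/mL) is chosen here only as an illustrative character that falls outside ASCII.

```python
# The plus sign has the code 43, which is 0010 1011 in binary.
plus = "+"
code_point = ord(plus)
bits = format(code_point, "08b")

# For ASCII characters, the ASCII and UTF-8 encodings are the same bytes.
same_bytes = plus.encode("ascii") == plus.encode("utf-8")

# The micro sign (U+00B5) is not part of ASCII.
micro = "\u00b5"
micro_utf8 = micro.encode("utf-8")  # UTF-8 needs 2 bytes for this character
try:
    micro.encode("ascii")
    ascii_can_encode_micro = True
except UnicodeEncodeError:
    ascii_can_encode_micro = False

print(code_point, bits, same_bytes, len(micro_utf8), ascii_can_encode_micro)
```

Mismatched encodings between two interfaced systems are a common source of garbled units and names, so it helps to state the encoding explicitly in interface specifications.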
  • Unstructured data
    • Is not structured via predefined data formats or is not stored in a fixed record length format
    • Examples: free text comments or notes in pathology reports, free style diagnosis texts, paper based test orders
    • Can be nontextual data, such as specimen images, microscopic images or data charts (e.g. flow cytometry)
    • Can be mined for research purposes using natural language processing but searches may be difficult to construct, results may be unreliable and processing large data sets can be computationally intensive
  • Structured data
    • Data is confined to a predefined, constrained vocabulary
    • May be numeric or textual data, often stored as unsigned integers (machine readable) that are mapped to text labels (human readable)
    • Facilitates sharing information between information systems, acts as a shared vocabulary (see Coding) (PLoS Comput Biol 2016;12:e1005097)
    • Facilitates searching for research purposes, queries are user friendly and large data sets can be searched quickly and reliably
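A minimal sketch of integer codes mapped to text labels is shown below. The `SpecimenType` code set, its numbers and its labels are hypothetical and do not correspond to any published standard.

```python
from enum import IntEnum


class SpecimenType(IntEnum):
    """Hypothetical local code set; numbers and labels are illustrative."""
    BLOOD = 1
    SERUM = 2
    URINE = 3


def label_for(code):
    """Translate a stored integer code into its human readable label."""
    return SpecimenType(code).name.title()


def is_valid(code):
    """True only when the code belongs to the constrained vocabulary."""
    return code in {member.value for member in SpecimenType}


# Stored records hold compact integer codes, so searching is a simple,
# exact comparison rather than a free text match.
records = [2, 3, 2, 1]
urine_count = sum(1 for code in records if code == SpecimenType.URINE)

print(label_for(2), is_valid(3), is_valid(9), urine_count)
```

Because every value must come from the predefined set, a query cannot miss records because of spelling variants or synonyms.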
  • Data structures
    • It is often necessary to aggregate numerous observations for analysis
    • Organization or lack thereof may greatly influence what type of analysis may be performed and the efficiency of processing (Int J Med Inform 2017;97:293)
    • Data may need to be transformed prior to processing into a format defined by the target algorithm
    • Common structures include:
      • Arrays
        • Simple series of data of the same data type
        • Often a collection of primitive data types
          • e.g. strings, numbers, Booleans, etc.
        • Often 1 dimensional but may have 2 or more dimensions
      • Objects
        • Data points composed of multiple constituent elements
        • Elements may be primitive data types, arrays or even other objects
        • Series of objects may also be collected into an array of objects
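The structures listed above can be sketched in Python as follows. The `Result` class, the patient identifiers and the glucose values are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean

# Array: a 1 dimensional series of values of the same primitive type.
glucose_values = [92.0, 110.0, 87.0]

# Array with 2 dimensions: each row holds repeated measurements for 1 patient.
repeat_measurements = [[92.0, 95.0], [110.0, 108.0]]


@dataclass
class Result:
    """Object: a data point composed of several constituent elements."""
    patient_id: str
    test: str
    values: list  # an element may itself be an array


# Array of objects.
results = [
    Result("P1", "glucose", [92.0, 95.0]),
    Result("P2", "glucose", [110.0, 108.0]),
]

# Transformation prior to analysis: flatten to 1 row per observation.
rows = [(r.patient_id, r.test, v) for r in results for v in r.values]
mean_glucose = mean(v for _, test, v in rows if test == "glucose")

print(rows)
print(mean_glucose)
```

The flattened, 1 row per observation layout is the form many statistical tools expect, which is why a transformation step often precedes analysis.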
Digital images
  • Pixel (or picture element)
    • Fundamental, constituent element of a digital image
    • Represents the color and intensity of light at each location within an image
    • Typically described by a value of 0 - 255 (1 byte) for each of the red, green and blue channels (3 bytes/pixel)
    • Optional alpha channel may be included for each pixel in image formats that support translucency (resulting in 4 bytes/pixel in an image)
  • Arrays of pixels (J Pathol Inform 2020;11:23)
    • Digital image is represented by a 2 dimensional array of pixels
    • Higher pixel counts, e.g. 1360 x 768 (HD) versus 800 x 600 (SVGA), provide higher resolution
    • Higher resolution comes with the tradeoff of higher storage and bandwidth requirements
  • Compression
    • Images can be compressed to a fraction of their original size with compression algorithms for storage and transfer and then uncompressed as needed for display (J Am Med Inform Assoc 2008;15:794)
    • Compression can result in a file size less than 1% of the original
    • Lossy compression
      • Sacrifices quality to achieve maximal compression
      • Uncompressing the image results in a similar, although not exact, representation of the original
      • Degree of compression may be adjusted based on needed quality
      • JPEG is the ubiquitous lossy format most often encountered
    • Lossless compression
      • Uncompressing from a lossless format results in an exact bitwise copy of the original image
      • Compression ratios are less substantial than comparable lossy algorithms
      • PNG and GIF are examples of lossless formats; TIFF and JPEG 2000 can also store images losslessly, although both support lossy compression as well
      • Whole slide image formats generally use lossless compression to maintain fidelity
      • Images intended for image analysis should likely be stored in lossless formats because compression artifacts could affect algorithm performance
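The storage arithmetic and the defining property of lossless compression can be sketched with the Python standard library. The example uses `zlib` (DEFLATE, the algorithm family also used inside PNG) on a synthetic, perfectly uniform image, which is a best case assumption; real microscopic images compress far less.

```python
import zlib


def uncompressed_bytes(width, height, channels=3):
    """Raw storage for an image with 1 byte per channel per pixel."""
    return width * height * channels


svga_size = uncompressed_bytes(800, 600)       # red, green, blue
hd_size = uncompressed_bytes(1360, 768)
svga_with_alpha = uncompressed_bytes(800, 600, channels=4)

# Synthetic 800 x 600 image in which every pixel has the same color.
pixel = bytes([230, 200, 220])
raw = pixel * (800 * 600)

# Lossless round trip: decompressing returns an exact bitwise copy.
compressed = zlib.compress(raw)
restored = zlib.decompress(compressed)

print(svga_size, hd_size, svga_with_alpha)
print(len(compressed), restored == raw)
```

A lossy codec such as JPEG would not satisfy the `restored == raw` check, which is the practical difference between the 2 families.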
Diagrams / tables

Contributed by Chris Williams, M.D.
Highlighting pixels

PNG (lossless)

JPG (lossy) high quality

JPG (lossy) low quality

Images hosted on other servers:

Tagged diagnostic data within CCD

Untidy dataset and tidy equivalent

Relative pixel density and size of display resolution

  • Coding systems are common examples of structured data in healthcare
  • Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) and Logical Observation Identifiers Names and Codes (LOINC) are particularly useful for searching data in the Laboratory Information System (LIS), if available
  • International Classification of Diseases (ICD)
    • Coding scheme to capture diagnoses (ICD-10-CM) (AJNR Am J Neuroradiol 2016;37:596)
    • Also can capture procedures performed in the healthcare setting (ICD-10-PCS)
    • Generally used for billing but can also aid the laboratory by providing context as to why a test is being performed
    • ICD-10 was first used in 1994 and has had many versions; it will be replaced by ICD-11, which is fully electronic, is easier to implement and takes effect on 01/01/2022 (WHO: International Statistical Classification of Diseases and Related Health Problems [Accessed 6 April 2021])
    • ICD for oncology (ICD-O):
      • Primarily used in tumor and cancer registries
      • ICD-O-3.2 is the latest version
      • Morphology section of ICD-O is incorporated into SNOMED
    • ICDs are managed and published by WHO
  • SNOMED CT (SNOMED International [Accessed 6 April 2021])
    • Originally developed by the College of American Pathologists, now maintained by the International Health Terminology Standards Development Organization (IHTSDO)
    • Hierarchical ontology classifying diseases, clinical findings, procedures and more
    • Multilingual
    • Key feature is compositional approach, making SNOMED CT easily extensible
      • Terms can be combined
      • Modifiers can be added
    • One of the most comprehensive medical terminology systems (> 1 million terms)
  • Current Procedural Terminology (CPT) (AMA: CPT [Accessed 6 April 2021])
    • Classification of procedures performed by physicians
    • Required for reimbursement by government and health insurance companies in the U.S.
    • Developed and maintained by the American Medical Association (AMA)
  • LOINC (Arch Pathol Lab Med 2020;144:478)
    • “A common language (a set of identifiers, names and codes) for identifying health measurements, observations and documents” (LOINC: About LOINC [Accessed 6 April 2021])
    • Has 6 parts: component, property, time, specimen, scale and method
    • Examples:
      • Blood glucose → GLUCOSE:MCNC:PT:BLD:QN:
      • Serum glucose → GLUCOSE:MCNC:PT:SER:QN:
      • Urine glucose concentration → GLUCOSE:MCNC:PT:UR:QN:
      • Urine glucose by dipstick → GLUCOSE:MCNC:PT:UR:SQ:TEST STRIP
    • Developed and maintained by Regenstrief Institute
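The 6 part structure of the names in the examples above can be sketched as a small parser. The class and function names are hypothetical, and the field labels follow the part names used in this topic.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LoincName:
    component: str
    prop: str      # the "property" part; renamed to avoid the Python builtin
    timing: str
    specimen: str
    scale: str
    method: str    # empty when no method is specified


def parse_loinc_name(name):
    """Split a colon separated 6 part name into its labeled parts."""
    parts = name.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon separated parts, got {len(parts)}")
    return LoincName(*parts)


def differing_parts(a, b):
    """List the part labels on which 2 names disagree."""
    left, right = asdict(a), asdict(b)
    return [key for key in left if left[key] != right[key]]


serum = parse_loinc_name("GLUCOSE:MCNC:PT:SER:QN:")
urine = parse_loinc_name("GLUCOSE:MCNC:PT:UR:QN:")
dipstick = parse_loinc_name("GLUCOSE:MCNC:PT:UR:SQ:TEST STRIP")

print(differing_parts(serum, urine))
print(differing_parts(urine, dipstick))
```

Comparing names part by part makes it clear why 2 glucose tests are not interchangeable: they may differ in specimen, scale or method even when the component is identical.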
  • Diagnosis Related Groups (DRG)
    • Set of patient classes that relate a patient’s case mix complexity to the resource demands and hospital associated costs (Centers for Medicare & Medicaid Services: Design and development of the Diagnosis Related Group (DRG) [Accessed 6 April 2021])
    • Basic unit of payment in Medicare’s hospital reimbursement system
    • Some health insurance companies also have adopted the DRG system
    • Medicare reimburses hospitals a flat fee based on the assigned Medicare severity DRG (MS-DRG) in order to encourage cost savings by hospitals
    • MS-DRG are assigned based on the ICD diagnosis and procedure codes
    • DRG system initially developed at Yale University in the 1970s for statistical classification of hospital cases
    • Due to the financial implications, getting an accurate DRG, supported by the appropriate ICD codes, is of critical importance
      • Undercoding an admission results in lost revenue
      • Upcoding of an admission puts the institution at risk for financial penalties
  • Networking
    • Transmission Control Protocol and the Internet Protocol (TCP / IP)
      • Foundational protocols used to transmit data across the internet
      • Describes an end to end communication channel capable of spanning numerous, disparate networks, including encoding and packetizing of data, addressing, routing, transmittal and acknowledgement of receipt of each packet
      • Many, if not all, newly installed laboratory instruments will come with a Network Interface Card (NIC) to be connected directly to a network or will need to be attached to a computer for network connectivity of some kind
      • Digital pathology is an emerging area where networking requirements should be considered carefully during planning and implementation
      • Potential networking needs include admit / discharge / transfer (ADT), orders and results feeds, quality control (QC) monitoring and direct vendor connections for monitoring and remote support
    • IP address / ports
      • Each device attached to computer networks, both wired and wireless, must have a unique IP address in order to transmit or receive information
      • This IP address is analogous to a post office box and is required to ensure information is delivered to the intended recipient
      • In order to facilitate multiple simultaneous connections from the same IP address, each application can communicate using a unique port number, 1-65535
      • Commonly used ports include port 80 for Hypertext Transfer Protocol (HTTP) and port 443 for Hypertext Transfer Protocol Secure (HTTPS) communication
      • During the implementation phase of new projects with networked devices, it is often helpful to create a network diagram to visually depict the various devices with the IP address and required ports for each service
    • Firewalls
      • May be either a physical device or a software application that monitors network traffic and is capable of filtering or blocking unwanted network traffic
      • May be configured to let all traffic through by default and block only user identified traffic
      • Conversely, a firewall may be configured to block all traffic by default and allow only specifically designated traffic
      • Firewall rules are generally configured to filter traffic based on IP addresses or port numbers; however, deep packet inspection to allow finer granularity is possible
      • Laboratory will often need to work with information technology (IT) when installing new systems to ensure firewalls are configured correctly to allow required communication in a secure manner
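The block by default configuration described above can be sketched as a simple allow list keyed on source network and destination port. The addresses, the port 6661 and the rule set are hypothetical and serve only to illustrate the logic; real firewalls are configured with vendor specific tools.

```python
import ipaddress


def valid_port(port):
    """Port numbers usable by applications fall in the range 1-65535."""
    return 1 <= port <= 65535


# Hypothetical allow list of (source network, destination port) pairs.
ALLOW_RULES = [
    (ipaddress.ip_network("10.20.0.0/16"), 443),     # HTTPS from the lab subnet
    (ipaddress.ip_network("10.20.30.15/32"), 6661),  # 1 instrument, 1 interface port
]


def is_allowed(source_ip, dest_port, rules=ALLOW_RULES):
    """Default deny: traffic passes only when an explicit rule matches."""
    if not valid_port(dest_port):
        return False
    address = ipaddress.ip_address(source_ip)
    return any(
        address in network and dest_port == port
        for network, port in rules
    )


print(is_allowed("10.20.5.9", 443))     # lab workstation using HTTPS
print(is_allowed("10.20.5.9", 80))      # same workstation, HTTP not listed
print(is_allowed("10.20.30.15", 6661))  # the designated instrument
print(is_allowed("192.0.2.7", 443))     # address outside the lab subnet
```

A table of this kind, listing each device with its IP address and required ports, is also the information that IT typically needs from the laboratory when a new system is installed.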
    • Virtual private network (VPN)
      • Creates a "secure tunnel" to access resources, across the internet, located on an otherwise inaccessible private network
      • Uses encryption to protect information traversing public networks
      • May be used for remote access for individual users, such as accessing the hospital network from home
      • Can be used to link multiple geographically diverse buildings, clinics, etc. onto a single, unified virtual network for securely sharing and accessing information resources
      • Accessing a network via a VPN bypasses many firewall protections, which is intended for authorized users but may be detrimental if unauthorized users are able to gain access to the VPN
  • Interfaces
    • Internal: information systems within a laboratory or within a healthcare system require the ability to exchange information (Am J Clin Pathol 1996;105:S48)
      • American Society for Testing and Materials (ASTM)
        • In the context of laboratory instrument communication, refers to standards maintained by ASTM International (ASTM International [Accessed 6 April 2021])
        • Standards E1381 and E1394 describe low level communication protocols that could be used to transfer information between clinical laboratory equipment and computer systems
        • These standards have been superseded by Clinical & Laboratory Standards Institute (CLSI) LAB01 (CLSI [Accessed 6 April 2021])
        • Originally used to connect instruments directly to an LIS and later to middleware over a serial connection
        • Standards have been adapted to function over a TCP / IP network connection
        • While no longer the preferred interface for newer instruments, this interface may be encountered in legacy hardware and many interfaces still support this as an option to maintain backward compatibility
      • HL7
        • Please refer to the HL7 topic for more detail
        • HL7 messaging can be exchanged via TCP / IP networking, serial connections and file transfer
        • Adopted by healthcare IT vendors, both within and outside the laboratory
        • Specific message types of note:
          • Admit / discharge / transfer (ADT)
            • Most common message type found on the message bus
            • Tracks and updates patient, demographic and billing information as patients enter and traverse the system
            • Required functionality for most information systems to ensure patient information remains current
          • Orders / results
            • Order message (ORM) and observation result - unsolicited (ORU) are the HL7 order and result message types, respectively, and are of particular interest to the LIS
            • Order messages are matched to patient information received via ADT
            • Following testing, the result message will be returned to the originating system, as well as any other system listening for the result message type
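The general shape of an HL7 version 2 result message can be sketched with a minimal parser. The message below is synthetic (the patient, facility names and values are invented), and the parser is deliberately simplified: it assumes the default delimiters, ignores escape sequences and field repetitions and is not a substitute for an interface engine or a maintained HL7 library.

```python
# Synthetic ORU message; segments are separated by carriage returns,
# fields by "|" and components by "^".
SEGMENTS = [
    r"MSH|^~\&|LIS|LAB|EHR|HOSP|202104061200||ORU^R01|MSG00001|P|2.5.1",
    r"PID|1||123456^^^HOSP^MR||DOE^JANE",
    r"OBR|1|||2345-7^Glucose [Mass/volume] in Serum or Plasma^LN",
    r"OBX|1|NM|2345-7^Glucose [Mass/volume] in Serum or Plasma^LN||95|mg/dL|70-99|N|||F",
]
message = "\r".join(SEGMENTS)


def parse_hl7(text):
    """Return a dict mapping each segment name to a list of field lists."""
    parsed = {}
    for line in text.split("\r"):
        if line:
            fields = line.split("|")
            parsed.setdefault(fields[0], []).append(fields)
    return parsed


def message_type(parsed):
    """Read the message code from MSH-9. The field separator itself counts
    as MSH-1, so MSH-9 sits at index 8 after splitting on "|"."""
    return parsed["MSH"][0][8].split("^")[0]


def observations(parsed):
    """Collect the code, name, value and units from each OBX segment."""
    found = []
    for obx in parsed.get("OBX", []):
        code, name = obx[3].split("^")[:2]
        found.append({"code": code, "name": name,
                      "value": obx[5], "units": obx[6]})
    return found


parsed = parse_hl7(message)
print(message_type(parsed), parsed["MSH"][0][11])
print(observations(parsed))
```

The observation identifier in the OBX segment carries a LOINC code, which shows how the messaging standard and the coding systems work together.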
      • Digital imaging and communications in medicine (DICOM)
        • A standard for storing and transmitting medical images, most commonly in radiology
        • Universally supported by picture archiving and communication systems vendors
        • DICOM Supplements 122 and 145 have been added to accommodate whole slide imaging (J Pathol Inform 2018;9:37)
        • Wide scale adoption of DICOM within the digital pathology ecosystem has been slow but vendor support is increasing and demand to consolidate medical imaging into a single repository is likely to drive further adoption
      • Enterprise message bus
        • Many organizations have a single messaging bus or integration engine, similar to middleware in the laboratory, connecting the various information systems, as opposed to point to point connections
        • Usually exchanges HL7 messages; however, other communication can be accommodated
        • Facilitates adding new systems to the existing milieu with minimal effort
    • External: in addition to internal communication, the need also exists and is now federally mandated to exchange healthcare information with the government and among healthcare systems electronically
      • Continuity of Care Document (CCD) (Am J Public Health 2012;102:e1)
        • Extensible Markup Language (XML) based standard and a joint effort of HL7 and ASTM
        • Promotes interoperability of clinical data
          • Physicians can send electronic health information to other providers without loss of meaning
          • Improvement of patient care
      • HL7 2.5 reportable conditions
        • HL7 version 2.5.1 is widely used as a messaging standard protocol that helps transmit laboratory results and clinical information from one party to another
          • Transmitting reportable conditions from the laboratory to public health agencies is primarily accomplished via HL7 2.5.1
      • College of American Pathologists (CAP) checklists / cancer registries / cancer staging, etc.
        • CAP checklists have been used in 3 formats: dictation from the case summary template, Word macros and the computer readable XML case summary format known as CAP electronic Cancer Checklists (eCCs), which was used before 2019 and replaced by Structured Data Capture (SDC) in 2019 (CAP: electronic Cancer Checklists (eCCs) [Accessed 6 April 2021])
        • SDC based pathology reports can be transmitted automatically to registries in a well defined, computer readable structure that does not require manual parsing or natural language processing
        • Each section, question and answer in an SDC template has a unique identifier (the ID known as a Ckey or composite key)
        • SDC IDs are mapped to ICD-O-3 codes, to the North American Association of Central Cancer Registries (NAACCR) data item fields and codes and to SNOMED codes (NAACCR: Laboratory Electronic Reporting for Pathology [Accessed 6 April 2021])
        • This mapping work enables interoperability between registries and clinical research; this interoperability is unavailable for text based reports because they cannot be readily transformed into registry conformant codes
  • Security
    • All information sets containing protected health information are governed by HIPAA (CDC: Health Insurance Portability and Accountability Act of 1996 (HIPAA) [Accessed 6 April 2021])
    • Information systems should encompass access controls, such as authentication, authorization (role based access) and audit capture
    • In addition to other safeguards, data should be protected in transit between information systems
    • Specific encryption schemes are not mandated but any encryption used should meet Federal Information Processing Standard (FIPS) 140 (CSRC: FIPS 140-3 - Security Requirements for Cryptographic Modules [Accessed 6 April 2021])
    • Firewalls, as described above, are another security measure to prevent or at least deter unauthorized access to data in transit
    • Healthcare IT vendors should be able to provide proof, preferably independent third party verification, that their solutions meet the current applicable standards
    • Security should be an initial concern when considering new laboratory initiatives

Healthcare data standards


Board review style question #1
Which of the following is structured data?

  1. Comment or note in a pathology report
  2. CPT code
  3. Paper based test order
  4. Specimen barcode
  5. Specimen gross description
Board review style answer #1
Board review style question #2
A result message forwarded to a state health department for a reportable disease should be in which format?

  1. ASTM
  2. CSV
  3. Excel
  4. HL7
  5. HTML
Board review style answer #2