Informatics, digital & computational pathology

Database

Database fundamentals

Authors: Michael Olp, M.D., Ph.D., Jerome Cheng, M.D.

Editorial Board Member: Lewis A. Hassell, M.D.

Last author update: 26 September 2023

Last staff update: 22 January 2024

Copyright: 2023-2024, PathologyOutlines.com, Inc.

Cite this page: Olp M, Cheng J. Database fundamentals. PathologyOutlines.com website. https://www.pathologyoutlines.com/topic/informaticsdatabasefundamentals.html. Accessed April 19th, 2024.

Definition / general

Database fundamentals involve the principles and techniques for designing, creating and managing structured data in a centralized repository to allow for efficient data management, analysis and retrieval

Essential features

Data modeling involves designing the database structure and defining the relationships between tables and fields
Data normalization is a process that involves organizing data to minimize redundancy and ensure data consistency
Data storage and retrieval involves storing and retrieving data in a way that is efficient and reliable while ensuring data security and integrity
Database management system (DBMS) software allows users to create, manage and manipulate databases; it provides tools for data storage, retrieval and analysis, as well as data security and backup features
Data querying and reporting involves using tools and languages like SQL to retrieve data from the database and generate reports and visualizations
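
As a minimal sketch of the normalization feature listed above, the following Python example uses the standard library sqlite3 module to split a repetitive flat list into 2 related tables. The table names, column names and sample rows are hypothetical and chosen only for illustration; a production laboratory schema would differ.

```python
import sqlite3

# Hypothetical denormalized rows: patient details repeat on every result.
flat_rows = [
    ("MRN001", "Ada Example", "1980-02-01", "Hemoglobin", 13.5),
    ("MRN001", "Ada Example", "1980-02-01", "Platelets", 250.0),
    ("MRN002", "Ben Sample", "1975-07-15", "Hemoglobin", 14.1),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE patient (
        mrn TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        date_of_birth TEXT NOT NULL
    );
    CREATE TABLE result (
        result_id INTEGER PRIMARY KEY,
        mrn TEXT NOT NULL REFERENCES patient(mrn),
        test_name TEXT NOT NULL,
        value REAL NOT NULL
    );
""")

for mrn, name, dob, test_name, value in flat_rows:
    # INSERT OR IGNORE stores each patient once, even if repeated in the input.
    conn.execute("INSERT OR IGNORE INTO patient VALUES (?, ?, ?)", (mrn, name, dob))
    conn.execute(
        "INSERT INTO result (mrn, test_name, value) VALUES (?, ?, ?)",
        (mrn, test_name, value),
    )
conn.commit()

patient_count = conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0]
result_count = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]
print(patient_count, "patients,", result_count, "results")
```

After normalization, a correction to a patient's name or date of birth is made in a single row of the patient table instead of in every result row, which is how redundancy and inconsistency are reduced.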

Terminology

An entity refers to a distinct object or concept represented in a database, such as a patient, an order or a laboratory test
An attribute is a characteristic of an entity represented in a database, such as a patient's name, date of birth or laboratory test result
A table is a collection of records for 1 type of entity, organized into rows (records) and columns (attributes)
A record is a single instance of an entity stored in a table
A primary key is a unique identifier for each record in a table to ensure that records can be uniquely identified and accessed
A foreign key is a field in a table that refers to the primary key of another table and is used to establish a relationship between the 2 tables
A query is a request to retrieve data from a database, usually using a language like SQL
An index is a data structure used to improve the performance of queries by allowing data to be searched more efficiently
Atomicity ensures that all operations within a transaction are completed or none of them are completed, preserving the consistency of the database
Consistency ensures that every transaction takes the database from 1 valid state to another, with all constraints and rules applied correctly
Isolation ensures that multiple transactions can be executed concurrently without interfering with each other
Durability ensures that once a transaction is committed, its changes are permanent and will survive system failures
Normalization is the process of organizing data in a database in a way that minimizes redundancy and improves data consistency and integrity
Reference: Pantanowitz: Pathology Informatics - Theory & Practice, 1st Edition, 2012
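
Several of the terms above (table, record, primary key, foreign key, query and index) can be seen together in a short sketch. The example below uses Python's standard library sqlite3 module with an in-memory database; the patient and lab_result tables and their contents are hypothetical and serve only to illustrate the vocabulary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    -- Entity: patient; attributes: name, date_of_birth
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,   -- primary key
        name TEXT NOT NULL,
        date_of_birth TEXT NOT NULL
    );
    CREATE TABLE lab_result (
        result_id INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL
            REFERENCES patient(patient_id),   -- foreign key
        test_name TEXT NOT NULL,
        value REAL NOT NULL
    );
    -- Index to speed up lookups of results by patient
    CREATE INDEX idx_lab_result_patient ON lab_result(patient_id);
""")

# Each inserted row is 1 record.
conn.execute("INSERT INTO patient VALUES (1, 'Ada Example', '1980-02-01')")
conn.executemany(
    "INSERT INTO lab_result (patient_id, test_name, value) VALUES (?, ?, ?)",
    [(1, "Hemoglobin", 13.5), (1, "Platelets", 250.0)],
)
conn.commit()

# Query: join the 2 tables through the foreign key.
rows = conn.execute(
    """
    SELECT p.name, r.test_name, r.value
    FROM lab_result AS r
    JOIN patient AS p ON p.patient_id = r.patient_id
    WHERE p.patient_id = ?
    ORDER BY r.test_name
    """,
    (1,),
).fetchall()

# The foreign key rejects a result that points to a nonexistent patient.
try:
    conn.execute(
        "INSERT INTO lab_result (patient_id, test_name, value) "
        "VALUES (99, 'Sodium', 140.0)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Here rows holds 1 tuple per matching result and orphan_rejected records whether the DBMS refused the record that referred to a missing patient.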

Types of databases

A hierarchical database model organizes data in a tree-like structure, where each record (other than the root) has 1 parent and may have multiple children; this model helps manage data with a fixed structure and simple one to many relationships but it may not be ideal for more complex data
In a flat database model, data are stored in a single table with a list of records without any relationship or structure; this model is simple and easy to understand but can be less efficient for managing large amounts of data
A network database model organizes data in a network-like structure, where each record can have multiple parents and children; this model helps manage complex relationships but may be less intuitive and more complicated to implement than the hierarchical model
A relational database model organizes data into tables, each representing a specific entity or relationship; relationships between tables are established through keys, such as primary keys and foreign keys
- This model is the most widely used and is well suited for managing large amounts of data and complex relationships
In an object oriented database model, data are organized into objects, which are instances of classes and can include methods and properties; this model helps store complex and dynamic data structures but can be more challenging to implement and manage than relational databases
In a graph database model, data are organized as nodes and edges, where nodes represent entities and edges represent relationships between them; this model helps manage data with complex relationships, such as social networks, recommendation systems or knowledge graphs
NoSQL refers to any database management system that does not use a traditional relational database model; instead, NoSQL databases use a variety of data models (such as key value, document, column family or graph) to store and retrieve data
- NoSQL databases are often used for managing large volumes of unstructured or semistructured data and for handling high levels of traffic and scalability (IBM: What Are NoSQL Databases? [Accessed 5 June 2023])

Applications

Laboratory information management
- Databases can manage laboratory information, such as patient demographics, test orders, results and billing information
Quality control and assurance
- Databases can be used to track and manage quality control and assurance data, such as instrument calibration data, proficiency testing results and corrective actions (Ann Lab Med 2023;43:418)
Clinical decision support
- Databases can provide pathologists and clinicians with decision support tools that integrate patient data, laboratory test results and other clinical information to help guide diagnosis and treatment decisions (J Pathol Inform 2023;14:100303, Clin Biochem 2023;113:70)
Digital pathology image management
- Databases can store and manage digital pathology images, allowing pathologists to easily access and share images with colleagues (J Pathol Inform 2011;2:32)
Research and data analysis
- Databases can store and analyze large amounts of pathology data, such as tissue sample characteristics, diagnostic criteria and treatment outcomes, to support research and inform clinical decision making (Mod Pathol 2022;35:23)
Biobanking and specimen tracking
- Databases can track biobank specimens, including information about specimen collection, storage and distribution (Cell Genome 2022;2:100192)

Implementation

Identify the need
- Determine the specific area(s) in pathology where a database could be helpful (e.g., you may want to create a database to store and manage digital images, track laboratory testing or facilitate research collaborations)
Choose a database management system (DBMS)
- Once you have identified the need, choose a DBMS appropriate for your requirements; some commonly used DBMSs in pathology include MySQL, Microsoft SQL Server, Oracle and PostgreSQL
Define data requirements
- Determine the specific data elements that will be included in the database (e.g., if you are creating a database to store digital images, you may need to include information such as patient demographics, sample type, staining method and imaging modality)
Design the database schema
- Create a schema that defines the database structure and how the data elements will be organized and related
Develop the database
- Using the chosen DBMS, create and populate the database with data; this process may involve importing data from existing systems or entering data manually
Test the database
- Thoroughly test the database to ensure that it functions as expected and that data can be accessed and manipulated accurately
Train users
- Provide training and support to users accessing and using the database, including pathologists, laboratory technicians and other healthcare professionals
Monitor and maintain the database
- Regularly monitor the database for errors and performance and perform routine maintenance tasks such as backing up data and optimizing database performance
Reference: PLoS Comput Biol 2016;12:e1005097
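
The design, develop, test and maintain steps above can be sketched on a very small scale. The example below assumes a hypothetical slide image catalog exported as CSV from an existing system and uses Python's standard library csv and sqlite3 modules; the schema, file contents and function names are illustrative assumptions rather than a recommended design.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export from an existing system.
legacy_csv = io.StringIO(
    "slide_id,stain,modality\n"
    "S-001,H&E,brightfield\n"
    "S-002,PAS,brightfield\n"
)

# Design: a schema describing how the data elements are organized.
SCHEMA = """
CREATE TABLE slide_image (
    slide_id TEXT PRIMARY KEY,
    stain TEXT NOT NULL,
    modality TEXT NOT NULL
);
"""


def build_database(csv_file):
    """Develop: create the database and import the existing data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        (r["slide_id"], r["stain"], r["modality"])
        for r in csv.DictReader(csv_file)
    ]
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.executemany("INSERT INTO slide_image VALUES (?, ?, ?)", rows)
    return conn


def check_database(conn, expected_rows):
    """Test: confirm the row count and run SQLite's built-in integrity check."""
    count = conn.execute("SELECT COUNT(*) FROM slide_image").fetchone()[0]
    status = conn.execute("PRAGMA integrity_check").fetchone()[0]
    return count == expected_rows and status == "ok"


def back_up(conn):
    """Maintain: copy the database to a second connection as a backup."""
    backup_conn = sqlite3.connect(":memory:")
    conn.backup(backup_conn)
    return backup_conn


db = build_database(legacy_csv)
passed = check_database(db, expected_rows=2)
backup = back_up(db)
backup_count = backup.execute("SELECT COUNT(*) FROM slide_image").fetchone()[0]
```

In practice the backup target would be a file on separate storage rather than a second in-memory database, and testing would cover far more than a row count.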

Advantages

Improved efficiency
- Databases can help streamline pathology workflows by making it easier to access and analyze patient and laboratory data; this functionality can help reduce the time and resources required to perform pathology tasks and improve turnaround times for laboratory test results (Singapore Med J 2018;59:597)
Increased accuracy
- Databases can help reduce errors in pathology by providing a centralized location for data that can be easily accessed and validated; this ability can help improve the accuracy of laboratory test results and reduce the risk of misdiagnosis or treatment error (J Am Med Inform Assoc 2001;8:527)
Better collaboration
- Databases can facilitate collaboration between pathologists and other healthcare professionals by providing a platform for sharing patient data and laboratory test results; this function can help improve communication and coordination of care
Improved patient outcomes
- By providing access to accurate and up to date patient data, databases can help pathologists make more informed diagnostic and treatment decisions, improving patient outcomes (NPJ Digit Med 2020;3:17)
Enhanced research
- Databases can store and manage large volumes of research data, which can be analyzed to identify patterns and relationships that may be useful in diagnosis, treatment or research; this process can help advance the field of pathology and improve patient outcomes (Diabetes Technol Ther 2011;13:343)

Limitations

Data quality
- Databases are only as good as the quality of their data; pathologists must ensure that data entered into the database are accurate, complete and up to date, which may require additional resources and training
Data privacy and security
- Patient data stored in databases are subject to privacy and security concerns, including unauthorized access and potential breaches; pathologists must ensure proper security measures are in place to protect patient data
Complexity
- Databases can be complex and require specialized skills and expertise to manage and maintain; pathologists may need additional training or support to use and manage a database effectively
Compatibility
- Databases may not be compatible with all pathology information systems or laboratory equipment, making integrating data from multiple sources challenging
Cost
- Developing and maintaining a database can be costly, requiring significant resources for hardware, software and personnel
Legal and regulatory compliance
- Pathologists must ensure that their use of databases complies with legal and regulatory requirements, including data privacy laws such as HIPAA and data sharing agreements (CDC: Health Insurance Portability and Accountability Act of 1996 (HIPAA) [Accessed 5 June 2023])
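
Part of the data quality burden described above can be shifted to the database itself by declaring constraints. The sketch below, which uses Python's standard library sqlite3 module and a hypothetical specimen table, shows a DBMS rejecting incomplete, implausible and duplicate records; constraints of this kind complement, but do not replace, careful data entry and review.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE specimen (
        specimen_id TEXT PRIMARY KEY,             -- no duplicate identifiers
        specimen_type TEXT NOT NULL,              -- required field
        collected_on TEXT NOT NULL,               -- required field
        volume_ml REAL CHECK (volume_ml > 0)      -- implausible values rejected
    )
""")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits if the insert succeeds, rolls back if it raises
            conn.execute("INSERT INTO specimen VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


outcomes = [
    try_insert(("SP-1", "serum", "2023-06-05", 2.0)),    # valid record
    try_insert(("SP-2", None, "2023-06-05", 2.0)),       # missing specimen type
    try_insert(("SP-3", "serum", "2023-06-05", -1.0)),   # negative volume
    try_insert(("SP-1", "plasma", "2023-06-06", 1.5)),   # duplicate identifier
]
stored = conn.execute("SELECT COUNT(*) FROM specimen").fetchone()[0]
```

Only the first record is stored; the other 3 are refused before they can affect downstream reports.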

Software

MySQL is an open source relational database management system (RDBMS) widely used in pathology and healthcare applications; it is popular for managing large volumes of structured data, such as laboratory test results and patient information
Microsoft SQL Server is a commercial RDBMS commonly used in healthcare and pathology settings; it provides tools for managing large volumes of data and can be used to store and analyze laboratory test results, medical images and patient records
Oracle Database is a commercial RDBMS widely used in healthcare and pathology applications; it provides data management, analysis and security features and can store and manage large volumes of structured data
PostgreSQL is an open source RDBMS commonly used in pathology and healthcare settings; it is designed to handle large volumes of data and provides data management, analysis and security features
SQLite is a public domain, embedded database engine commonly used in mobile and web applications; it is lightweight, requires no separate server process and is easy to use, making it a popular choice for managing small to medium sized pathology research and development datasets

Additional references

Connolly: Database Systems - A Practical Approach to Design, Implementation, and Management, 6th Edition, 2014, Perspect Health Inf Manag 2004;1:6

Board review style question #1

Which of the following is a primary goal of data modeling?

A. Create a replica of the data in a database
B. Ensure the accuracy and consistency of data
C. Improve the efficiency of data storage
D. Increase the speed of data processing

Board review style answer #1

B. Ensure the accuracy and consistency of data. Data modeling is the process of creating a conceptual representation of data and their relationships. One of the primary goals of data modeling is to ensure the accuracy and consistency of data by defining rules and constraints that govern how data can be stored and manipulated. Answer A is incorrect because creating a replica of the data in a database is not a goal of data modeling, as data modeling is concerned with making a conceptual representation of the data rather than a physical representation. Answers C and D are incorrect because while data modeling can improve the efficiency of data storage and increase the speed of data processing, these are secondary goals achieved through appropriate data structures and algorithms rather than through data modeling itself (Pantanowitz: Pathology Informatics - Theory & Practice, 1st Edition, 2012).

Board review style question #2

Which of the following best describes atomicity in database fundamentals?

A. Ability to ensure that a transaction either succeeds completely or fails completely
B. Ability to ensure that only authorized users can access the database
C. Ability to handle multiple transactions simultaneously
D. Ability to store and retrieve large amounts of data quickly

Board review style answer #2

A. Ability to ensure that a transaction either succeeds completely or fails completely. Atomicity is a fundamental property of database transactions that ensures that a transaction is treated as a single, indivisible unit of work. In other words, a transaction is either completed in full or rolled back to its initial state, ensuring that the database remains consistent and reliable (Pantanowitz: Pathology Informatics - Theory & Practice, 1st Edition, 2012). Answer B is incorrect because it describes authentication and access control, which ensure that only authorized users can access the database. Answer C is incorrect because it describes concurrency control, which manages and coordinates multiple transactions executing at the same time; this corresponds to isolation, which is related to atomicity but not synonymous with it. Answer D is incorrect because it describes database performance and scalability, which are unrelated to atomicity; atomicity concerns the consistency and reliability of transactions rather than the speed of data storage and retrieval.
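
The all or nothing behavior described in this answer can be demonstrated with a minimal sketch. The example below uses Python's standard library sqlite3 module and a hypothetical freezer inventory table; moving vials requires 2 updates and if the second update violates a constraint, the first is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE freezer (
        name TEXT PRIMARY KEY,
        vial_count INTEGER NOT NULL CHECK (vial_count >= 0)
    );
    INSERT INTO freezer VALUES ('A', 5), ('B', 0);
""")


def move_vials(src, dst, n):
    """Move n vials from src to dst as a single transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE freezer SET vial_count = vial_count + ? WHERE name = ?",
                (n, dst),
            )
            # Fails the CHECK constraint if src does not hold enough vials.
            conn.execute(
                "UPDATE freezer SET vial_count = vial_count - ? WHERE name = ?",
                (n, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def counts():
    return dict(conn.execute("SELECT name, vial_count FROM freezer ORDER BY name"))


first_ok = move_vials("A", "B", 3)     # enough vials: both updates are kept
after_first = counts()
second_ok = move_vials("A", "B", 10)   # too few vials: both updates are undone
after_second = counts()
```

After the failed second move, freezer B does not keep the 10 vials that were briefly added to it, because the whole transaction is rolled back rather than only the statement that failed.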
