Molecular markers

NGS library prep



Last author update: 26 September 2024
Last staff update: 26 September 2024

Copyright: 2023-2025, PathologyOutlines.com, Inc.


Stuart Hanviriyapunt, M.D.
Megan Parilla, M.D.
Cite this page: Hanviriyapunt S, Parilla M. NGS library prep. PathologyOutlines.com website. https://www.pathologyoutlines.com/topic/molecularNGSlibprep.html. Accessed January 17th, 2025.
Definition / general
  • Library preparation consists of the steps required to ready nucleic acids for analysis by next generation sequencing (NGS)
Essential features
  • Although there are many types of NGS, the most common forms used in clinical labs today consist of 4 steps: nucleic acid extraction, library preparation, sequencing by synthesis and data analysis / reporting
  • Second step, library preparation, is itself 3 steps: modification of nucleic acids, target selection (enrichment) and application to a solid phase for sequencing
  • Each of these steps varies depending on the breadth of the genome sequenced and the technology used (e.g., optical sequencing versus hydrogen ion based technology)
Diagrams / tables

Contributed by Megan Parilla, M.D.
WGS, WES and targeted sequencing

Optical and ion semiconductor sequencing

Clinical images

Contributed by Megan Parilla, M.D.
Macrodissection for NGS

NGS overview
Library preparation overview
  • Method of library preparation will depend on the amount of the genome sequenced and the type of sequencer used
  • For clarity, only the most widely used technologies / clinically applicable methods are discussed here
Library preparation stratified by amount of the genome sequenced
Whole genome sequencing (WGS)
  • In clinical WGS, the entire human genome is examined; most commonly, DNA is extracted from whole blood for the purposes of identifying constitutional / germline genetic alterations (Genet Med 2021;23:1399, Nat Commun 2017;8:1377)
  • Modification of nucleic acids
    • After DNA extraction / purification, the DNA is sheared into smaller fragments; this is most commonly done by acoustic energy / sonication or with enzymes
    • Fragmentation may damage the ends of the DNA; the ends are repaired and the repaired fragments are then often subjected to a process called dA tailing, in which an adenosine is added to the 3' ends of the now blunt ended DNA molecules
    • Adapter ligation: oligonucleotides with a known sequence are added to the ends of each fragment of DNA; these oligonucleotides are called adapters (sometimes loosely called barcodes, after the patient specific barcode they carry)
      • Structurally, the adapters consist of 3 regions and a thymidine overhang that complements the newly added adenosine; the 3 regions are:
        • A sequence that is complementary to the solid phase (see Library preparation stratified by technology (platform) used)
        • A universal primer binding site (sequencing binding site) for the synthesis reaction
        • A patient specific barcode so that multiple patients can be sequenced at once and demultiplexed after sequencing
  • Selection / enrichment
    • There is no selection of DNA in WGS; everything is applied to the solid phase
  • Application to a solid phase (see Library preparation stratified by technology (platform) used)
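
The role of the patient specific barcode in demultiplexing can be illustrated with a minimal Python sketch. It assumes, as a simplification, that the barcode is read as the first bases of each read; the barcode sequences, sample names and the 1 mismatch tolerance are hypothetical and not taken from any particular kit.

```python
from collections import defaultdict

# Hypothetical barcode -> sample mapping (illustrative sequences only).
SAMPLE_BARCODES = {"ACGTAC": "patient_1", "TGCATG": "patient_2"}


def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes=SAMPLE_BARCODES, max_mismatches=1):
    """Assign each read to the single sample whose barcode matches its prefix.

    The barcode is trimmed from assigned reads. Reads matching no barcode,
    or more than one, are kept untrimmed under 'undetermined'.
    """
    bins = defaultdict(list)
    for read in reads:
        hits = []
        for barcode, sample in barcodes.items():
            prefix = read[:len(barcode)]
            if len(prefix) == len(barcode) and hamming(prefix, barcode) <= max_mismatches:
                hits.append((sample, read[len(barcode):]))
        if len(hits) == 1:
            sample, insert = hits[0]
            bins[sample].append(insert)
        else:
            bins["undetermined"].append(read)
    return dict(bins)


if __name__ == "__main__":
    pooled = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTAAGGGCCC", "GGGGGGTTTT"]
    for sample, inserts in demultiplex(pooled).items():
        print(sample, inserts)
```

Allowing a small number of mismatches lets a read with a single sequencing error in its barcode still be assigned, provided the barcodes in a pool differ from one another at enough positions.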

Whole exome sequencing (WES)
  • In clinical WES, the entire protein coding portion of the genome (the exome), which is ~2% of the human genome, is sequenced; most commonly, DNA is extracted from whole blood for the purposes of identifying constitutional / germline genetic alterations (Genet Med 2021;23:1399, G3 (Bethesda) 2015;5:1543)
  • Modification of nucleic acids
    • DNA is fragmented, as described in WGS above
    • DNA is repaired and dA tailed, as described in WGS above
    • Adapters are ligated, as described in WGS above
  • Selection / enrichment
    • Because only a portion of the DNA is required for sequencing, the protein coding regions must be selected from the rest of the genome; this is most commonly done using hybrid capture
    • Hybrid capture consists of 3 steps
      • Incubation with biotinylated probes: biotinylated probes (sometimes called baits) are oligonucleotides that are complementary to all regions of the genome to be sequenced; these oligonucleotides are covalently bonded to biotin, an organic compound with a high affinity for streptavidin
      • Magnetic beads coated with streptavidin bind the probes and capture complementary DNA from the sample; all other noncomplementary DNA is washed away
      • Captured target DNA from the sample is eluted from the streptavidin beads and can subsequently be used in sequencing
  • Application to a solid phase (see Library preparation stratified by technology (platform) used)
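
The logic of hybrid capture can be sketched in silico. This is a toy model under stated assumptions: a fragment counts as captured only if one of its strands contains an exact stretch complementary to a bait, whereas real hybridization tolerates mismatches and depends on temperature and probe design. The function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def hybrid_capture(fragments, baits):
    """Split fragments into (captured, washed_away).

    A bait binds the sequence complementary to it, so each fragment is
    searched on both strands for the reverse complement of every bait.
    """
    targets = [reverse_complement(bait) for bait in baits]
    captured, washed_away = [], []
    for fragment in fragments:
        other_strand = reverse_complement(fragment)
        if any(t in fragment or t in other_strand for t in targets):
            captured.append(fragment)
        else:
            washed_away.append(fragment)
    return captured, washed_away


if __name__ == "__main__":
    baits = ["GATTACA"]  # hypothetical bait sequence
    library = ["CCTGTAATCGG", "AAGATTACATT", "CCCCGGGGAAAA"]
    kept, discarded = hybrid_capture(library, baits)
    print("captured:", kept)
    print("washed away:", discarded)
```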

Targeted sequencing
  • Targeted sequencing, excluding WES, has a broad range of clinical purposes and any portion of the genome can be selected for sequencing, whether it is 5 genes, 500 genes or 5,000 genes; most somatic / tumor specific sequencing is targeted sequencing (Diagnostics (Basel) 2022;12:1539)
  • Modification of nucleic acids
    • DNA is fragmented if selection is hybrid capture
    • DNA is repaired and dA tailed if selection is hybrid capture
    • Adapters are ligated if selection is hybrid capture
  • Selection / enrichment
    • Because only a portion of the DNA is required for sequencing, one must select the regions of interest to use in the sequencing reaction
    • Unlike WES, which typically relies on hybrid capture for target selection, smaller panels can use either hybrid capture or amplicon / PCR selection
      • Hybrid capture selection: catching the DNA molecules of interest with complementary DNA and washing away the remaining DNA; described in WES above
        • Pros: good for large panels (panels with many genes)
        • Cons: requires more DNA than amplicon technology
      • Amplicon / PCR selection: multiplex PCR amplifies the regions of interest above the background DNA
        • Pros: requires less DNA than capture technology
        • Cons: large multiplex panels often have issues with amplicon dropout, where some primer pairs perform worse than others and some regions of interest may not be well represented in the final sequencing reaction; additionally, PCR amplification can introduce artifacts, which may be seen in the sequencing reaction as variants
      • In amplicon library preparation, DNA selection and adapter ligation may happen simultaneously (with modified primers that include an adapter) or the adapter ligation will happen after the PCR selection; additionally, because PCR creates small fragments, initial DNA fragmentation is not required
      • Target selection steps for amplicon based enrichment are as follows
        • Extracted DNA is added to pools of primers and PCR with a low number of cycles is performed
        • If needed, adapters are ligated to the PCR products
  • Application to a solid phase (see Library preparation stratified by technology (platform) used)
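
Amplicon dropout, described above as a drawback of large multiplex panels, can be screened for by comparing each amplicon's read count with the panel median. The sketch below is only an illustration: the amplicon names, the counts and the cutoff of 20% of the median are hypothetical, and a laboratory would set its own validated thresholds.

```python
from statistics import median


def flag_amplicon_dropout(read_counts, min_fraction=0.2):
    """Return the names of amplicons whose read count falls below
    min_fraction of the median read count across the panel."""
    if not read_counts:
        return []
    threshold = min_fraction * median(read_counts.values())
    return sorted(name for name, count in read_counts.items() if count < threshold)


if __name__ == "__main__":
    # Hypothetical per-amplicon read counts from one sample.
    counts = {
        "KRAS_ex2": 1000,
        "BRAF_ex15": 900,
        "EGFR_ex19": 50,
        "TP53_ex5": 1100,
        "NRAS_ex3": 0,
    }
    print(flag_amplicon_dropout(counts))
```

Using the median rather than the mean keeps a few very high or very low amplicons from shifting the reference point.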
Library preparation stratified by technology (platform) used
Optical analysis specific details (Illumina)
  • This NGS platform uses sequencing by synthesis where nucleotides (A, C, T, G) are added one at a time; each nucleotide is labeled with a unique fluorophore and photographs are taken after each addition (Curr Protoc Mol Biol 2018;122:e59)
  • Modification of nucleic acids
    • Adapters are platform specific
    • At present, Illumina adapters are actually of 2 types, termed i5 and i7
    • These adapters bind such that the DNA of interest is sandwiched between 2 different adapters, allowing paired end information to assist in mapping and resolution of complex rearrangements
  • Selection / enrichment
    • Not platform specific
  • Application to a solid phase
    • Fragments with adapters attached are complementary to oligonucleotide sequences on the surface of the flow cell, a specialized glass slide on which the sequencing reaction happens
    • Modified DNA is added to the flow cell and adapted molecules bind to the flow cell's oligonucleotides somewhat randomly
      • Ideally, bound molecules are neither too close to nor too far from one another (termed overclustering and underclustering, respectively)
    • A modified form of PCR (technically isothermal bridge amplification) occurs to locally amplify the bound molecule and create a cluster
      • This cluster is composed of molecules that are identical to the original molecule that bound
      • Reason for this amplification is so that there are enough molecules to be seen with a high powered camera
    • Flow cell is now ready for the sequencing reaction
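
Once clusters exist, the optical readout described at the start of this section reduces to choosing, for each cycle, the fluorophore with the strongest signal. The following minimal sketch assumes one intensity value per nucleotide per cycle for a single cluster; the intensity values are invented and real base calling applies further corrections and assigns quality scores.

```python
CHANNELS = ("A", "C", "G", "T")


def call_bases(intensities):
    """Call one base per cycle for a single cluster.

    intensities: list of (A, C, G, T) signal tuples, one tuple per cycle.
    The base whose channel has the highest signal in a cycle is called.
    """
    read = []
    for cycle in intensities:
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


if __name__ == "__main__":
    # Hypothetical signals for 3 sequencing cycles of one cluster.
    cluster_signal = [
        (0.90, 0.10, 0.00, 0.10),
        (0.10, 0.20, 0.80, 0.10),
        (0.00, 0.10, 0.10, 0.95),
    ]
    print(call_bases(cluster_signal))
```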

Ion semiconductor specific details (Ion Torrent)
  • This NGS platform uses sequencing by synthesis, where nucleotides are unmodified and the release of a hydrogen ion, which occurs naturally during DNA synthesis, is detected as a change in pH; one type of nucleotide (A, T, G or C) is flooded in at a time and the pH is measured after each addition (Curr Protoc Mol Biol 2018;122:e59)
  • Modification of nucleic acids
    • Adapters are platform specific
    • Ion Torrent adapters are complementary to oligonucleotides on beads, which act as the surface for the sequencing reaction in place of the flow cell used by Illumina
  • Selection / enrichment
    • Not platform specific
  • Application to a solid phase
    • Fragments with adapters attached are complementary to oligonucleotide sequences on the surface of specialized beads, the surface on which the sequencing reaction happens
    • Ideally 1 molecule binds to a bead; postsequencing data analysis will eliminate data coming from polyclonal beads
    • Emulsion PCR occurs to locally amplify the bound molecule and coat the bead
      • Bead is coated in molecules that are identical to the original molecule that bound
      • Reason for this amplification is so that there are enough molecules to release hydrogen ions so the pH will change during the sequencing reaction
    • Beads are added to a chip, which contains millions of wells (baskets) that can each hold a single bead
      • Each well contains a sensitive pH sensor that detects pH changes in that well only
    • Chip is now ready for the sequencing reaction
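
Because one nucleotide type is flooded in at a time, the signal from a well in a given flow reflects how many bases were incorporated in that flow (0, 1 or several in a homopolymer run). The sketch below simulates this for an idealized, noise free well; the flow order "TACG" and the function names are assumptions made for illustration.

```python
def simulate_flowgram(template, flow_order="TACG", n_flows=12):
    """Return a list of (nucleotide, incorporations) pairs, one per flow.

    template: the sequence that will be synthesized on the bead, 5' to 3'.
    In each flow, every consecutive base matching the flowed nucleotide is
    incorporated, so a homopolymer run gives a proportionally larger signal.
    """
    position = 0
    flows = []
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        incorporated = 0
        while position < len(template) and template[position] == nucleotide:
            incorporated += 1
            position += 1
        flows.append((nucleotide, incorporated))
    return flows


def flowgram_to_sequence(flows):
    """Rebuild the read by repeating each nucleotide by its incorporation count."""
    return "".join(nucleotide * count for nucleotide, count in flows)


if __name__ == "__main__":
    flows = simulate_flowgram("TTACGG", n_flows=4)
    print(flows)
    print(flowgram_to_sequence(flows))
```

In a real instrument the measured signal is noisy, which is why distinguishing, for example, 5 from 6 identical bases in a row is harder than calling a single base.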
Additional caveats
  • RNA sequencing (Diagnostics (Basel) 2020;10:521)
    • Sequencing RNA may be desired over sequencing DNA, especially in the detection of gene fusions
    • RNA cannot directly be sequenced using the technologies described above but must be converted to cDNA first
    • After the conversion of RNA to cDNA, sequencing is as described above
  • Cell free DNA (cfDNA) sequencing and minimal residual disease (MRD) testing (Science 2021;372:eaaw3616, DNA Res 2015;22:269)
    • When sequencing DNA to look for very low frequency events, as in the case of cfDNA and MRD testing, specialized adapters with a fourth element, molecular barcodes, can be used to assist in discriminating PCR artifacts from true variants in the sample
    • This is done via a process of collapsing duplicates, where multiple molecules containing the same molecular barcode sequence are treated as 1 molecule with a consensus sequence
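
Collapsing duplicates can be sketched as grouping reads by molecular barcode and taking a per position majority vote. This is a simplified illustration: it assumes all reads sharing a barcode have the same length and are already aligned to one another, and it ignores mapping position and base quality, which production pipelines usually also consider. The barcode and read sequences are hypothetical.

```python
from collections import Counter, defaultdict


def collapse_by_molecular_barcode(reads):
    """Collapse reads into one consensus sequence per molecular barcode.

    reads: iterable of (molecular_barcode, sequence) pairs.
    Returns {molecular_barcode: consensus_sequence}. At each position the
    most frequent base wins; ties go to the base seen first.
    """
    families = defaultdict(list)
    for barcode, sequence in reads:
        families[barcode].append(sequence)

    consensus = {}
    for barcode, sequences in families.items():
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*sequences)
        )
    return consensus


if __name__ == "__main__":
    reads = [
        ("AAT", "ACGT"),
        ("AAT", "ACGT"),
        ("AAT", "ACTT"),  # third base differs: likely a PCR or sequencing artifact
        ("GGC", "ACTT"),
        ("GGC", "ACTT"),  # same change in every copy: more likely a true variant
    ]
    print(collapse_by_molecular_barcode(reads))
```

A change seen in only some copies of one original molecule is voted out, while a change present in every copy of a molecule survives into the consensus.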
Board review style question #1

Which of the following is true regarding whole genome sequencing (WGS)?

  A. It is commonly used to identify constitutional (germline) disorders
  B. It is the most common sequencing done on tumor tissue for biomarker identification
  C. It requires amplicon based target selection
  D. It requires hybrid capture target selection
Board review style answer #1
A. It is commonly used to identify constitutional (germline) disorders. Whole genome sequencing covers the entire genome (large sequencing breadth); as such, it has low sequencing depth and a limited ability to identify low frequency variants. It is mostly used to identify variants at a VAF of 50% or 100%, as seen in constitutional disorders. Answer B is incorrect because targeted sequencing is the most common sequencing done on tumor tissue for biomarker identification. Tumor tissue is never 100% tumor cells; inflammatory cells, blood vessels and desmoplastic stroma are always present. Tumor specific mutations are thus diluted by background normal tissue DNA and a greater sequencing depth is required to identify low frequency variants (e.g., VAF 10%). Sequencing depth comes at the expense of sequencing breadth. Answers C and D are incorrect because whole genome sequencing by definition does not target or select any regions of the genome by amplicon or hybrid capture methods.
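
The relationship between tumor dilution, VAF and sequencing depth in the answer above can be made concrete with a short calculation. The sketch assumes a heterozygous variant in a copy neutral region, independent reads modeled with a binomial distribution and a hypothetical requirement of at least 5 variant supporting reads; none of these values come from a specific assay.

```python
from math import comb


def expected_vaf(tumor_fraction, allele_fraction_in_tumor=0.5):
    """Expected VAF when tumor DNA is diluted by normal tissue DNA."""
    return tumor_fraction * allele_fraction_in_tumor


def prob_at_least(k, depth, vaf):
    """P(X >= k) for X ~ Binomial(depth, vaf): the chance of seeing at
    least k variant reads among `depth` reads covering the position."""
    below_k = sum(comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    vaf = expected_vaf(tumor_fraction=0.2)  # 20% tumor cells, heterozygous variant
    for depth in (30, 500):
        print(depth, round(prob_at_least(5, depth, vaf), 3))
```

At the same VAF, the probability of collecting enough supporting reads rises with depth, which is the reason low frequency variants call for deeper (and therefore narrower) sequencing.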
