Mechanism of De Novo Sequencing
De Novo sequencing is a genome sequencing method that does not rely on reference genome sequences and is mainly used for the assembly and annotation of new species genomes. With the advancement of high-throughput sequencing technology, De Novo sequencing has played an essential role in biological research.
In De Novo sequencing, DNA is extracted directly from a biological sample and sequenced, and the resulting reads are then processed and assembled into a genome using bioinformatics tools. The process has four main stages: sample preparation, sequencing, data processing, and genome assembly.
Sample Preparation
Sample preparation is the first step in De Novo sequencing and directly affects the quality of all downstream data. It includes DNA extraction, purification, and library construction. High-quality DNA extraction is critical for successful sequencing. The extracted DNA is then purified to remove proteins, RNA, and other impurities. Library construction involves fragmenting the purified DNA and ligating sequencing adapters to the fragments so that the sequencing platform can capture and read them.
Sequencing Technology
Currently, the primary high-throughput sequencing technologies include Illumina sequencing, PacBio sequencing, and Oxford Nanopore sequencing. Illumina sequencing offers high accuracy and throughput but produces short reads. PacBio and Oxford Nanopore sequencing produce much longer reads, but their raw reads have historically had higher error rates, although PacBio HiFi reads now reach accuracy comparable to short reads. In De Novo sequencing, several technologies are often combined so that long reads resolve repeats and large-scale structure while accurate short reads correct base-level errors, yielding high-quality genome data.
Data Processing
The raw data generated by sequencing needs to undergo quality control, error correction, and data filtering. Quality control includes removing low-quality reads and adapter sequences. Error correction mainly targets data from PacBio and Nanopore sequencing to improve data accuracy. Data filtering removes contaminant sequences and duplicates to ensure data purity.
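The quality control and filtering steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not a replacement for a dedicated read-processing tool: the function names, the thresholds, and the adapter string are hypothetical choices made for this example, and quality strings are assumed to use the common Phred+33 encoding.

```python
from typing import List, Tuple

Read = Tuple[str, str]  # (bases, quality string), as in a FASTQ record


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def trim_adapter(seq: str, qual: str, adapter: str) -> Read:
    """Cut the read at the first exact occurrence of the adapter, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def filter_reads(reads: List[Read], adapter: str,
                 min_mean_q: float = 20.0, min_len: int = 4) -> List[Read]:
    """Trim adapters, then drop short, low-quality, and duplicate reads."""
    seen = set()
    kept = []
    for seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        if len(seq) < min_len:
            continue  # too short after trimming
        if mean_phred(qual) < min_mean_q:
            continue  # low-quality read
        if seq in seen:
            continue  # exact duplicate of a read already kept
        seen.add(seq)
        kept.append((seq, qual))
    return kept


# Toy input; the adapter string is illustrative only.
ADAPTER = "AGATCGG"
reads = [
    ("ACGTACGTAGATCGG", "I" * 15),  # carries the adapter at its 3' end
    ("ACGTACGT", "I" * 8),          # duplicate of the trimmed first read
    ("ACGTTTTT", "!" * 8),          # every base has Phred score 0
    ("GGGTTTAA", "I" * 8),          # clean read
]
clean = filter_reads(reads, ADAPTER)
```

Real pipelines also tolerate mismatches when matching adapters, trim low-quality read ends, and screen reads against contaminant databases, none of which is attempted here.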
Genome Assembly
Genome assembly is the core step of De Novo sequencing, aiming to stitch a large number of short or long reads into contiguous sequences (contigs) and, ideally, a complete genome sequence. Common assembly algorithms include the de Bruijn graph algorithm and the Overlap-Layout-Consensus (OLC) algorithm. The de Bruijn graph algorithm breaks reads into overlapping k-mers and is well suited to short-read data, while the OLC algorithm compares whole reads against each other and is more appropriate for long-read data. To improve assembly quality, multiple algorithms are often combined. Where a reference genome from a closely related species is available, it can also be used to guide scaffolding or to check and correct the assembly, although the assembly itself does not depend on it.
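To make the two algorithm families concrete, the sketch below builds a small de Bruijn graph from error-free toy reads and walks its single unambiguous path to recover a contig, then computes the suffix-prefix overlap that OLC-style assemblers rely on. It is a simplified illustration under strong assumptions (no sequencing errors, no repeats, one strand only), and all function names are hypothetical rather than taken from any particular assembler.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_de_bruijn(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a k-mer."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph: Dict[str, Set[str]], start: str) -> str:
    """Follow the path from `start` while it stays unambiguous."""
    in_degree: Dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1

    contig, node, visited = start, start, {start}
    # Stop at a branch (several successors), a merge (several
    # predecessors), a dead end, or a node already visited (cycle).
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if in_degree[nxt] != 1 or nxt in visited:
            break
        contig += nxt[-1]  # each step adds one new base
        visited.add(nxt)
        node = nxt
    return contig


def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b` (OLC overlap step)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


# Three overlapping toy reads sampled from the sequence ATGGCGTGCAAT.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = extend_contig(graph, "ATG")
overlap = suffix_prefix_overlap(reads[0], reads[1])
```

With these reads the graph is a single linear path, so the walk reconstructs the full 12-base sequence. Real genomes contain repeats and sequencing errors that create branches and bubbles in the graph, which is where most of the complexity of production assemblers lies.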
Applications
De Novo sequencing has broad applications in new species genome assembly, evolutionary research, complex genome analysis, and pathogen detection. For example, De Novo sequencing can assemble a complete genome of a new species, revealing its genome structure and function and providing foundational data for evolutionary biology research. In pathogen detection, De Novo sequencing can quickly decode the genome of unknown pathogens, helping to determine their classification and infection mechanisms.
As a powerful genomic tool, De Novo sequencing has significantly advanced biological research. By combining modern sequencing technology with efficient bioinformatics algorithms, it covers the full workflow from DNA extraction to an assembled genome.