Mass Spectrometry-Based De Novo Protein Sequencing

De novo protein sequencing refers to the process of determining the amino acid sequence of peptides using tandem mass spectrometry (MS/MS) without relying on pre-existing sequence databases.

The core principle of de novo protein sequencing is to calculate the mass of amino acid residues in the peptide backbone by measuring the mass difference between fragment ions. The mass-to-charge ratio (m/z) of these ions typically allows for the unique identification of the residue type, thereby determining the amino acid sequence of an unknown peptide.

Advantages

1. No Dependence on Known Information

(1) Suitable for novel proteins: For newly discovered proteins or those lacking database entries, traditional sequencing methods that rely on database comparisons are ineffective. De novo protein sequencing directly determines the amino acid sequence, offering a viable solution in the absence of existing database information, and providing essential data for the structural and functional studies of novel proteins.

(2) Handling complex samples: In cases involving complex or specially treated protein samples, sequence information may be incomplete or difficult to match with existing databases. For example, proteins extracted from extreme environments or artificially modified proteins may present such challenges. De novo protein sequencing can accurately determine the sequence of these proteins, providing reliable data for their structural and functional analysis.

2. Comprehensive Sequence Information

(1) High sequence coverage: By utilizing multiple proteases for digestion, complementary peptide fragments obtained from different enzyme cleavages enable high sequence coverage, ensuring that the full amino acid sequence is captured. This comprehensive sequencing is essential for accurately understanding protein structure and function, as even small sequence gaps can hinder the correct interpretation of these characteristics.

(2) Detection of post-translational modifications: In addition to determining the primary amino acid sequence, de novo protein sequencing can also identify various post-translational modifications (PTMs) such as phosphorylation, glycosylation, and acetylation. These modifications are crucial to protein function and structure, and de novo protein sequencing enables precise identification and localization of these PTMs.

3. Assisting Structural Analysis

(1) Supporting protein 3D structure prediction: The primary structure (amino acid sequence) forms the foundation for predicting a protein's three-dimensional structure. Accurate sequence data obtained via de novo protein sequencing provide key information for 3D structure prediction, which can be further refined using computational modeling to understand protein folding and its potential 3D configuration.

(2) Discovering novel structural features: In some cases, newly identified proteins may possess unique structural features that traditional methods cannot fully uncover. De novo sequencing can reveal entirely new sequence information, helping researchers identify novel domains or motifs within the protein, offering insights for further studies on protein function.

4. Supporting Functional Research

(1) Identifying functional sites: Determining the protein sequence allows for the identification of critical functional sites, such as enzyme active sites or regions involved in molecular interactions. These sites are pivotal for understanding protein function and designing experiments to validate their roles.

(2) Studying protein interactions: Accurate sequence data is essential for investigating protein interactions with other molecules. De novo sequencing provides the foundational sequence data needed for studying interactions with ligands, receptors, and antibodies, offering valuable insights into the mechanisms and specificity of these interactions.

Analysis Workflow

The protein sample is subjected to digestion by multiple proteases, and the resulting peptides are analyzed by mass spectrometry. The generated spectra are processed with de novo algorithms to derive peptide sequences. Through alignment and assembly of the peptide sequences, the complete amino acid sequence of the protein is reconstructed, from the N-terminus to the C-terminus.

Data Analysis

1. Peptide Coverage

The use of both specific and non-specific proteases (e.g., Trypsin, Chymotrypsin) ensures 100% protein coverage and produces abundant, highly overlapping peptide fragments, providing high-quality data for sequencing.

2. Peptide Assembly

High-energy collisional dissociation (HCD) generates B/Y ions, which are used to deduce peptide sequences. However, due to the redundancy of peptides produced by multiple enzyme digestions, peptide selection and refinement are necessary to ensure accurate sequence determination.

Applications

De novo protein sequencing is employed in numerous fields, including disease biomarker discovery, cancer research, and antibody drug development. In antibody drug development, it facilitates rapid and accurate analysis of monoclonal antibody primary structures, which is essential for advancing both drug development and clinical applications.

MtoZ Biolabs, an integrated chromatography and mass spectrometry (MS) services provider.

Related Services

De Novo Protein Sequencing Service

Submit Inquiry

How to order?