Chapter 7 Annotation

Once a peak table or feature table has been generated, annotation becomes the key step that connects mass spectrometry signals to chemical meaning. For broader overviews, see (Domingo-Almenara, Montenegro-Burke, Benton, et al. 2018; Chaleckis et al. 2019; Lai et al. 2018; Nash and Dunn 2019; Viant et al. 2017; Allard et al. 2017). The first of these reviews (Domingo-Almenara, Montenegro-Burke, Benton, et al. 2018) proposed five levels for current computational annotation strategies:

  • Level 1: Peak grouping: MS pseudospectra extraction based on peak shape similarity and peak abundance correlation

  • Level 2: Peak annotation: adducts, neutral losses, isotopes, and other mass relationships based on mass distances

  • Level 3: Biochemical knowledge based on putative identification, potential biochemical reactions, and related statistical analysis

  • Level 4: Use and integration of tandem MS data based on data-dependent or data-independent acquisition mode, or in silico prediction

  • Level 5: Retention time prediction based on library-available retention indices or quantitative structure-retention relationship (QSRR) models.

Most software operates at level 1 or 2. If only the compound structure is known, we can predict which ions it will form under different ionization methods. If a mass spectrum is available, we can match it against a database by spectral similarity. In metabolomics, we usually have only mass spectra or mass-to-charge ratios, and a single mass-to-charge ratio is not enough for identification. This is one bottleneck for annotation, and it is why structure prediction is generally performed on MS/MS data.

7.1 A practical annotation workflow

In practice, annotation is easier to understand as a workflow rather than as a list of tools. A useful order is:

  1. understand annotation limits and confidence levels
  2. constrain candidate formulas from accurate mass and isotopic information
  3. collapse redundant MS1 features such as adducts, isotopes, and in-source fragments
  4. connect representative MS1 features to MS2 or MSn data
  5. rank candidate structures with MS/MS libraries and in silico tools
  6. refine the result with retention, biochemical knowledge, and experimental context
  7. report annotation confidence clearly instead of overclaiming identification

The dark metabolome debate is part of this workflow rather than a side topic: before interpreting unknowns, we need to ask how many features represent real unique compounds and how many are redundant signals from the same chemistry.

7.2 Step 0: understand annotation limits

The major issue in annotation is the redundant peaks that arise from the same metabolite. Unlike genes in a genome, the peaks or features produced by peak picking are not independent of each other. Adducts, in-source fragments, and isotopes can all lead to wrong annotations. A common solution is to compare the mass distances between co-eluting features against those expected for known adducts, neutral losses, molecular multimers, or multiply charged ions.
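The mass-distance comparison can be sketched as follows. This is a minimal illustration, not the method of any particular package: the distance table is a small hand-picked subset for positive mode, and the function name `related_pairs` and the tolerances `mz_tol` and `rt_tol` are assumptions made for this example.

```python
from itertools import combinations

# Illustrative mass differences (Da) between ions of the same compound.
# A real adduct/neutral-loss list would be much longer.
KNOWN_DISTANCES = {
    "13C isotope": 1.003355,
    "[M+NH4]+ vs [M+H]+": 17.026549,
    "water loss": 18.010565,
    "[M+Na]+ vs [M+H]+": 21.981944,
    "[M+K]+ vs [M+H]+": 37.955882,
}


def related_pairs(features, mz_tol=0.005, rt_tol=5.0):
    """Return (i, j, label) for co-eluting feature pairs whose m/z
    difference matches a known distance. features: list of (mz, rt_s)."""
    hits = []
    for (i, (mz1, rt1)), (j, (mz2, rt2)) in combinations(enumerate(features), 2):
        if abs(rt1 - rt2) > rt_tol:
            continue  # ions of one compound should co-elute
        delta = abs(mz1 - mz2)
        for label, dist in KNOWN_DISTANCES.items():
            if abs(delta - dist) <= mz_tol:
                hits.append((i, j, label))
    return hits


# Hypothetical features: three glucose ions plus one unrelated peak.
features = [(181.0707, 60.0), (203.0526, 60.5), (182.0740, 60.2), (255.2324, 300.0)]
for i, j, label in related_pairs(features):
    print(features[i][0], features[j][0], label)
```

A matching mass distance is only a hint of redundancy. Peak shape correlation and abundance correlation across samples are usually needed before two features are merged.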

Another issue is the coverage of MS/MS databases. Only 10% of known metabolites in databases have experimental spectral data, so in silico prediction is required. Some work tries to bridge the gap by combining experimental data, theoretical values (from chemical databases such as ChemSpider), and prediction. Hufsky et al. provide a useful review of MS/MS prediction (Hufsky et al. 2014).

7.2.1 Common sources of peak misidentification

  • Isomer

Isomers can be resolved with separation methods such as chromatography, ion mobility MS, or MS/MS. Reversed-phase ion-pairing chromatography and HILIC are useful, and chemical derivatization is another option.

  • Interfering compounds

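Co-eluting compounds with similar masses can be mistaken for the target ion. A mass tolerance of 20 ppm is the loosest that should be accepted for HRMS data.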

  • In-source degradation products
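The 20 ppm figure mentioned for interfering compounds can be made concrete with a small mass-error helper. This is a minimal sketch. The function names and the default tolerance are choices made for this example only.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz, theoretical_mz, tol_ppm=20.0):
    """True if the measured m/z lies inside the +/- tol_ppm window."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


# Theoretical [M+H]+ of glucose (C6H12O6) is about 181.070665.
print(round(ppm_error(181.0725, 181.070665), 2))
print(within_tolerance(181.0750, 181.070665))
```

Because the window is relative, the absolute width grows with m/z. At m/z 181, a 20 ppm window is about ±0.0036 Da.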

7.2.2 Annotation vs. Identification

According to the Chemical Analysis Working Group of the Metabolomics Standards Initiative (Sumner et al. 2007; Viant et al. 2017), four levels of confidence can be assigned to an identification:

  • Level 1 ‘Identified metabolites’
  • Level 2 ‘Putatively annotated compounds’
  • Level 3 ‘Putatively characterised compound classes’
  • Level 4 ‘Unknown’

Schymanski et al. proposed a complementary five-level confidence scheme specifically designed for high-resolution mass spectrometry that has become widely adopted in environmental and exposome studies(Schymanski et al. 2014). The levels range from Level 1 (confirmed structure by reference standard) through Level 5 (exact mass only), with intermediate levels for probable structure (Level 2, diagnostic evidence but no reference standard), tentative candidates (Level 3, evidence for possible structures), and unequivocal molecular formula (Level 4). This framework provides more granularity than the MSI scheme and is especially useful for non-target screening where reference standards are unavailable for most detected features.

In practice, annotation based on data analysis alone can reach Level 2. Level 1 requires additional evidence such as MS/MS, retention time, accurate mass, or 2D NMR spectra to confirm the compound, and authentic standards are always required for solid proof.

For specific groups of compounds such as PFASs, confidence levels may be communicated slightly differently (Charbonnet et al. 2022).

Though MS/MS seems to be a required step for identification, recent studies found that ESI can also generate fragment ions that are useful for structure identification (Xue, Guijas, et al. 2020; Xue et al. 2021, 2023; Bernardo-Bermejo et al. 2023).

7.2.3 The Dark Metabolome

In a typical untargeted LC-MS experiment, thousands of features are detected, but only a small fraction can be annotated with known metabolite identities. The vast majority of detected features remain unidentified, a phenomenon often referred to as the “dark matter” of the metabolome(Silva et al. 2015). Even in well-studied matrices like blood plasma, if a feature is detected in over 50% of samples, there is roughly a 50% chance it has a known annotation; for low-abundance features, annotation rates often fall below 5%. This annotation gap is one of the central challenges in metabolomics and has sparked active debate about its causes and solutions(Petras et al. 2018; Koelmel et al. 2025).

7.2.3.1 What makes up the dark matter?

The detected features in an untargeted experiment are not all independent metabolites. A single compound can generate multiple features through several mechanisms:

  • In-source fragmentation (ISF): Molecules can fragment during the electrospray ionization process before entering the mass analyzer, generating fragment ions that appear as separate features. The extent of ISF is a subject of active debate (see below).

  • Adducts and multimers: Beyond the protonated or deprotonated molecular ion, compounds readily form sodium, potassium, ammonium adducts, as well as dimers and trimers, each appearing as a separate feature.

  • Isotope peaks: Natural isotope distributions (especially 13C, 34S, 37Cl) generate satellite peaks for every compound.

  • Multiple charge states: Some compounds, particularly larger ones, can carry multiple charges.

After accounting for these redundancies, the number of unique compounds in a sample is substantially smaller than the number of detected features. However, even after deduplication, the majority of unique compounds remain unannotated because they are absent from existing spectral databases.
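One of these redundancy sources, multiple charge states, can be recognised from the spacing of isotope peaks, because the 13C spacing of about 1.003355 Da is divided by the charge. The helper below is a minimal sketch. The function name is hypothetical, and it assumes the two peaks are adjacent isotopologues of the same ion.

```python
C13_SPACING = 1.003355  # mass difference between 13C and 12C in Da


def charge_from_isotope_spacing(mz_mono, mz_next):
    """Estimate the charge state from the m/z spacing between the
    monoisotopic peak and the next isotope peak of the same ion."""
    spacing = mz_next - mz_mono
    if spacing <= 0:
        raise ValueError("mz_next must be larger than mz_mono")
    return round(C13_SPACING / spacing)


print(charge_from_isotope_spacing(300.0000, 301.0034))  # spacing near 1.0
print(charge_from_isotope_spacing(500.0000, 500.5017))  # spacing near 0.5
```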

7.2.3.2 The in-source fragmentation debate

In 2024, Giera et al. analyzed the METLIN database of 931,000 molecular standards and reported that in-source fragmentation could account for over 70% of the peaks observed in typical LC-MS metabolomic datasets(Giera et al. 2024). This finding suggested that the dark metabolome might be largely a measurement artifact rather than representing genuine molecular diversity. A follow-up study further examined the relationship between ISF and the dark metabolome/lipidome(Uritboonthai et al. 2025).

However, this conclusion was challenged. Li and Mahieu performed a systematic analysis of 61 representative public LC-MS datasets and found that in-source fragments contribute to less than 10% of features in real experimental data(Li and Mahieu 2025). Their khipu-based pre-annotation approach showed that the majority of abundant features have identifiable ion patterns (adducts, isotopes, etc.), and that the dark matter is explainable in an abundance-dependent manner: most features come from real compounds, but the number of unique compounds is much smaller than the number of features. A separate perspective from de la Briere et al. further examined the impact of unintentional fragments on molecular networking and public repository-scale analysis(Briere et al. 2025).

This debate reflects a key tension in the field: the extent of ISF depends heavily on the instrument, ionization source design, mobile phase composition and applied voltages, making it difficult to generalize from standard databases to real experimental conditions. What is clear is that both ISF and genuine molecular diversity contribute to the dark metabolome, and the relative contribution varies by sample type and analytical conditions.

7.2.3.3 A missing perspective: data-driven deduplication

Notably, the above debate has largely focused on either standard-library-based estimation (Giera/Siuzdak) or ion-pattern-based pre-annotation (Li/Mahieu). Neither side has systematically addressed the question from a mass distance statistics perspective. The Paired Mass Distance (PMD) approach(M. Yu et al. 2019) offers a complementary, data-driven strategy: by analyzing the frequency distribution of mass differences between all feature pairs in a dataset, high-frequency PMDs that correspond to known chemical relationships (adducts, in-source fragments, biotransformations) can be identified directly from the data without relying on a predefined adduct list or a standard compound library.

The GlobalStd algorithm(M. Yu et al. 2019), built on PMD analysis, takes this further by selecting independent “precursor ions” from the feature list — features whose mass differences to other features cannot be explained by high-frequency PMDs are more likely to represent unique parent compounds. This effectively collapses a feature list of thousands into a much smaller set of candidate independent compounds, providing a quantitative and data-driven estimate of the true chemical complexity in a sample. Such an approach could provide additional evidence for the dark matter debate: by applying GlobalStd to the same public datasets used by either side, one could independently estimate the ratio of redundant features to unique compounds without assumptions about ISF rates from standard libraries.
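The frequency idea behind PMD analysis can be illustrated with a toy example. This is not the implementation in the pmd package: it ignores retention time and intensity, and the rounding precision `digits` and threshold `min_count` are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pmds(mzs, digits=2, min_count=3):
    """Count rounded paired mass distances over all feature pairs and
    return those seen at least min_count times, most frequent first."""
    counts = Counter(round(abs(a - b), digits) for a, b in combinations(mzs, 2))
    return [(pmd, n) for pmd, n in counts.most_common() if n >= min_count]


# Hypothetical feature list: four ions, each with a partner 21.9819 Da
# heavier (the [M+Na]+ vs [M+H]+ distance).
mzs = [150.0000, 171.9819, 203.1234, 225.1053,
       267.4521, 289.4340, 331.9876, 353.9695]
print(frequent_pmds(mzs))
```

In real data, the high-frequency distances would then be interpreted chemically, for example as adducts, in-source fragments, or biotransformations. Restricting pairs to a retention time window separates within-compound redundancy from between-compound reactions.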

Furthermore, the PMDDA workflow(Yu et al. 2022) extends this logic to connect MS1 features with MS2 data acquisition: by using GlobalStd to prioritize truly independent precursor ions for targeted MS2 collection, it addresses both the redundancy problem and the annotation gap simultaneously.

7.2.3.4 Strategies to address the dark matter

Several complementary approaches are being developed to reduce the annotation gap:

  • Pre-annotation and feature deduplication: Tools like khipu(Li and Mahieu 2025) and CAMERA(Kuhl et al. 2012) group related features (adducts, isotopes, fragments) based on ion pattern recognition and co-elution. The PMD/GlobalStd approach(M. Yu et al. 2019) provides an orthogonal, data-driven method by identifying redundant features through mass distance frequency analysis without requiring a predefined adduct list.

  • Molecular networking: GNPS-based molecular networking(Wang et al. 2016) connects structurally related compounds through MS/MS spectral similarity, enabling propagation of annotations from known to unknown compounds within the same spectral family.

  • In silico prediction: Tools like SIRIUS/CSI:FingerID(Dührkop et al. 2019), CFM-ID(Allen et al. 2014), and MS2DeepScore(Huber et al. 2020) predict molecular fingerprints from spectra, spectra from structures, or structural similarity between spectra, expanding the searchable chemical space beyond experimentally characterized compounds.

  • Machine learning approaches: Deep learning models are increasingly used to embed spectral data into chemical space, enabling analogue searching and compound class prediction even without exact database matches (see Statistical Analysis chapter for details).

For a comprehensive review of bioinformatics tools tackling the dark metabolome, see Koelmel et al.(Koelmel et al. 2025).

7.3 Step 1: constrain candidates by molecular formula

Once the annotation limits are clear, the next step is to reduce the candidate space using accurate mass, isotope patterns, and chemical plausibility rules before structural searching.

Cheminformatics helps with MS annotation. The first task is molecular formula assignment. For a given accurate mass, candidate formulas should be constrained by the allowed element types and atom counts, the mass error window, and rules of chemical bonding such as the double bond equivalent (DBE) and the nitrogen rule. The nitrogen rule states that an odd nominal molecular mass implies an odd number of nitrogen atoms; it should only be applied to nominal (integer) masses. The degree of unsaturation, or DBE, is expressed as the rings-plus-double-bonds equivalent (RDBE), which must be an integer for a neutral molecule. Divalent elements such as oxygen and sulphur do not enter the calculation. A formula that yields a non-integer RDBE cannot be a valid neutral molecule.

\[ \mathrm{RDBE} = C + Si - \frac{1}{2}(H + F + Cl + Br + I) + \frac{1}{2}(N + P) + 1 \]
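The RDBE formula and the nitrogen rule are easy to check in code. The sketch below represents a formula as a dictionary of element counts. The function names and the nominal mass table are assumptions for this example, and only the elements listed are supported.

```python
# Nominal (integer) masses of the most abundant isotopes.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "P": 31, "S": 32,
                "F": 19, "Cl": 35, "Br": 79, "I": 127, "Si": 28}


def rdbe(counts):
    """Rings-plus-double-bonds equivalent; O and S do not contribute."""
    n = lambda element: counts.get(element, 0)
    monovalent = n("H") + n("F") + n("Cl") + n("Br") + n("I")
    return n("C") + n("Si") - 0.5 * monovalent + 0.5 * (n("N") + n("P")) + 1


def nitrogen_rule_ok(counts):
    """Odd nominal mass must go with an odd number of N atoms, and even
    with even (neutral molecules, nominal masses only)."""
    nominal = sum(NOMINAL_MASS[e] * k for e, k in counts.items())
    return nominal % 2 == counts.get("N", 0) % 2


caffeine = {"C": 8, "H": 10, "N": 4, "O": 2}
print(rdbe(caffeine), nitrogen_rule_ok(caffeine))
```

A candidate such as C6H8N fails both checks: its RDBE is 3.5 and its nominal mass of 94 is even although it contains one nitrogen atom.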

To assign a molecular formula to a mass-to-charge ratio, the Seven Golden Rules (Kind and Fiehn 2007) for heuristic filtering of molecular formulas should be considered:

  • Apply heuristic restrictions for number of elements during formula generation. This is the table for known compounds:
##   Mass.Range.[Da] Library C.max H.max N.max O.max P.max S.max F.max Cl.max Br.max Si.max
## 1           < 500     DNP    29    72    10    18     4     7    15      8      5     NA
## 2           < 500   Wiley    39    72    20    20     9    10    16     10      4      8
## 3          < 1000     DNP    66   126    25    27     6     8    16     11      8     NA
## 4          < 1000   Wiley    78   126    20    27     9    14    34     12      8     14
## 5          < 2000     DNP   115   236    32    63     6     8    16     11      8     NA
## 6          < 2000   Wiley   156   180    20    40     9    14    48     12     10     15
## 7          < 3000     DNP   162   208    48    78     6     9    16     11      8     NA
  • Perform LEWIS and SENIOR check. The LEWIS rule demands that molecules consisting of main group elements, especially carbon, nitrogen and oxygen, share electrons in a way that all atoms have completely filled s, p-valence shells (‘octet rule’). Senior’s theorem requires three essential conditions for the existence of molecular graphs:

    • The sum of valences or the total number of atoms having odd valences is even;

    • The sum of valences is greater than or equal to twice the maximum valence;

    • The sum of valences is greater than or equal to twice the number of atoms minus 1.

  • Perform isotopic pattern filter. Isotope ratio abundance is included in the algorithm as an additional orthogonal constraint, assuming high quality data acquisition, specifically sufficient ion statistics and a high signal/noise ratio for the detection of the M+1 and M+2 abundances. For monoisotopic elements (F, Na, P, I) this rule has no impact. Isotope patterns are especially useful for brominated and chlorinated small molecules and sulphur-containing peptides.

  • Perform H/C ratio check (hydrogen/carbon ratio). In most cases the H/C ratio does not exceed 3, with rare exceptions such as methylhydrazine (CH6N2). Conversely, the H/C ratio is usually smaller than 2 and should not be less than 0.125, as in the case of tetracyanopyrrole (C8HN5).

  • Perform NOPS ratio check (N, O, P, S/C ratios).

##   Element.ratios Common.range.(covering.99.7%) Extended.range.(covering.99.99%) Extreme.range.(beyond.99.99%)
## 1            H/C                       0.2–3.1                            0.1–6                 < 0.1 and 6–9
## 2            F/C                         0–1.5                              0–6                         > 1.5
## 3           Cl/C                         0–0.8                              0–2                         > 0.8
## 4           Br/C                         0–0.8                              0–2                         > 0.8
## 5            N/C                         0–1.3                              0–4                         > 1.3
## 6            O/C                         0–1.2                              0–3                         > 1.2
## 7            P/C                         0–0.3                              0–2                         > 0.3
## 8            S/C                         0–0.8                              0–3                         > 0.8
## 9           Si/C                         0–0.5                              0–1                         > 0.5
  • Perform heuristic HNOPS probability check (H, N, O, P, S/C high probability ratios)
# Heuristic HNOPS probability rules from the Seven Golden Rules
df <- data.frame(
  stringsAsFactors = FALSE,
  # combination of element counts that triggers the rule
  Element.counts = c("NOPS all > 1", "NOP all > 3", "OPS all > 1",
                     "PSN all > 1", "NOS all > 6"),
  # upper limits applied when the trigger is met
  Heuristic.Rule = c("N < 10, O < 20, P < 4, S < 3",
                     "N < 11, O < 22, P < 6",
                     "O < 14, P < 3, S < 3",
                     "P < 3, S < 3, N < 4",
                     "N < 19, O < 14, S < 8"),
  # database formulas that reach the maximum values
  DB.examples.for.maximum.values = c("C15H34N9O8PS, C22H44N4O14P2S2, C24H38N7O19P3S",
                                     "C20H28N10O21P4, C10H18N5O20P5",
                                     "C22H44N4O14P2S2, C16H36N4O4P2S2",
                                     "C22H44N4O14P2S2, C16H36N4O4P2S2",
                                     "C59H64N18O14S7")
)
df
##   Element.counts               Heuristic.Rule
## 1   NOPS all > 1 N < 10, O < 20, P < 4, S < 3
## 2    NOP all > 3        N < 11, O < 22, P < 6
## 3    OPS all > 1         O < 14, P < 3, S < 3
## 4    PSN all > 1          P < 3, S < 3, N < 4
## 5    NOS all > 6        N < 19, O < 14, S < 8
##                  DB.examples.for.maximum.values
## 1 C15H34N9O8PS, C22H44N4O14P2S2, C24H38N7O19P3S
## 2                 C20H28N10O21P4, C10H18N5O20P5
## 3               C22H44N4O14P2S2, C16H36N4O4P2S2
## 4               C22H44N4O14P2S2, C16H36N4O4P2S2
## 5                                C59H64N18O14S7
  • Perform TMS check (for GC-MS if a silylation step is involved). For TMS derivatized molecules detected in GC/MS analyses, the rules on element ratio checks and valence tests are hence best applied after TMS groups are subtracted, in a similar manner as adducts need to be first recognized and subtracted in LC/MS analyses.
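Two of the rules above, the SENIOR check and the element ratio check, can be sketched in a few lines. This is an illustration under stated assumptions rather than the published implementation. It uses the lowest common valence for each element, reads the third Senior condition as "sum of valences ≥ 2 × (atoms − 1)", and takes the ratio limits from the common-range column of the table above. The function names are hypothetical.

```python
# Assumed valences (lowest common state for each element).
VALENCE = {"C": 4, "H": 1, "N": 3, "O": 2, "P": 3, "S": 2,
           "F": 1, "Cl": 1, "Br": 1, "I": 1, "Si": 4}

# Element/carbon ratio limits, common range (covering 99.7%).
COMMON_RATIO_RANGE = {"H": (0.2, 3.1), "F": (0, 1.5), "Cl": (0, 0.8),
                      "Br": (0, 0.8), "N": (0, 1.3), "O": (0, 1.2),
                      "P": (0, 0.3), "S": (0, 0.8), "Si": (0, 0.5)}


def senior_check(counts):
    """Apply the three Senior conditions to a dict of element counts."""
    total_valence = sum(VALENCE[e] * k for e, k in counts.items())
    atoms = sum(counts.values())
    max_valence = max(VALENCE[e] for e, k in counts.items() if k > 0)
    return (total_valence % 2 == 0                 # condition 1
            and total_valence >= 2 * max_valence   # condition 2
            and total_valence >= 2 * (atoms - 1))  # condition 3


def ratios_in_common_range(counts):
    """Check H/C, N/C, O/C, ... against the common ranges."""
    carbon = counts.get("C", 0)
    if carbon == 0:
        return False  # ratios are undefined without carbon
    for element, (low, high) in COMMON_RATIO_RANGE.items():
        ratio = counts.get(element, 0) / carbon
        if not low <= ratio <= high:
            return False
    return True


glucose = {"C": 6, "H": 12, "O": 6}
print(senior_check(glucose), ratios_in_common_range(glucose))
```

Methane passes the SENIOR check but has H/C = 4, outside the common range, which shows why the ratio filters are best treated as soft, probability-based constraints rather than hard rules.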

The Seven Golden Rules were built for GC-MS, while the Hydrogen Rearrangement Rules were mainly designed for LC-CID-MS/MS (Tsugawa et al. 2016). Based on extensively curated database records and enthalpy calculations, the “hydrogen rearrangement (HR) rules” extend the even-electron rule for carbon (C) and the heteroatoms oxygen (O), nitrogen (N), phosphorus (P), and sulfur (S). The authors used high-abundance MS/MS peaks exceeding 10% of the base peak to identify common features, summarized as 4 HR rules for positive mode and 5 HR rules for negative mode.

The Seven Golden Rules and the Hydrogen Rearrangement Rules might also be captured by statistical models. Nevertheless, such heuristic rules reduce the search space of possible formulas.

molgen generates all structures (connectivity isomers, constitutions) that correspond to a given molecular formula, with optional further restrictions such as the presence or absence of particular substructures (Gugisch et al. 2015).

mfFinder can predict formula based on accurate mass (Patiny and Borel 2013).
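Formula prediction from accurate mass can be illustrated with a brute-force enumeration. This sketch is not how mfFinder works internally. It is restricted to C, H, N, and O, uses the DNP element limits for masses below 500 Da from the table above, and the function name and default tolerance are assumptions for this example.

```python
# Monoisotopic masses (Da) and assumed maximum element counts (< 500 Da, DNP).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
MAX_COUNTS = {"C": 29, "H": 72, "N": 10, "O": 18}


def candidate_formulas(neutral_mass, tol_ppm=5.0):
    """Enumerate CHNO formulas whose monoisotopic mass lies within
    tol_ppm of neutral_mass. Returns (formula, ppm_error) pairs."""
    hits = []
    for c in range(1, MAX_COUNTS["C"] + 1):
        for n in range(MAX_COUNTS["N"] + 1):
            for o in range(MAX_COUNTS["O"] + 1):
                heavy = c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
                # The remaining mass must be made up by hydrogen atoms.
                h = round((neutral_mass - heavy) / MONO["H"])
                if not 0 <= h <= MAX_COUNTS["H"]:
                    continue
                mass = heavy + h * MONO["H"]
                ppm = (mass - neutral_mass) / neutral_mass * 1e6
                if abs(ppm) <= tol_ppm:
                    formula = "".join(
                        f"{e}{k if k > 1 else ''}"
                        for e, k in (("C", c), ("H", h), ("N", n), ("O", o))
                        if k > 0
                    )
                    hits.append((formula, ppm))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Neutral monoisotopic mass of caffeine (C8H10N4O2) is about 194.0804.
for formula, ppm in candidate_formulas(194.0804):
    print(formula, round(ppm, 2))
```

The enumeration returns every formula that fits the mass window, including chemically implausible ones, which is why the heuristic filters described above are applied afterwards.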

RAMSI is a robust automated mass spectra interpretation and chemical formula calculation method that uses mixed integer linear programming optimization (Baran and Northen 2013).

Here are some other cheminformatics tools that can be used to assign meaningful formulas or structures to mass spectra:

  • RDKit Open-Source Cheminformatics Software
  • cdk The Chemistry Development Kit (CDK) is a scientific, LGPL-ed library for bio- and cheminformatics and computational chemistry written in Java (Guha 2007).
  • Open Babel Open Babel is a chemical toolbox designed to speak the many languages of chemical data (O’Boyle et al. 2011).
  • ClassyFire is a tool for automated chemical classification with a comprehensive, computable taxonomy (Djoumbou Feunang et al. 2016).
  • BUDDY can perform molecular formula discovery via bottom-up MS/MS interrogation(Xing et al. 2023).

7.4 Step 2: collapse redundant MS1 features

Before MS/MS ranking or database matching, it is usually necessary to reduce the feature table into a smaller set of representative precursor candidates.

Full-scan mass spectra always contain many redundant peaks such as adducts, isotopes, fragments, multiply charged ions, and oligomers. Such peaks dominate the feature table (Xu et al. 2015; Sindelar and Patti 2020; Mahieu and Patti 2017). Annotation tools can label those peaks either from a known list or by frequency analysis of the paired mass distances (Ju et al. 2020; Kouřil et al. 2020).

7.4.1 Adducts list

An adduct list is available from the commonMZ project.
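An adduct list is typically used to convert a neutral monoisotopic mass into expected m/z values. The table below is a small hand-written subset for illustration and is not taken from commonMZ. Each entry stores a multimer count, a charge, and a mass shift in Da that already accounts for the electron mass.

```python
# adduct name: (number of M, absolute charge, mass shift in Da)
ADDUCTS = {
    "[M+H]+": (1, 1, 1.007276),
    "[M+NH4]+": (1, 1, 18.033826),
    "[M+Na]+": (1, 1, 22.989221),
    "[2M+H]+": (2, 1, 1.007276),
    "[M+2H]2+": (1, 2, 2.014553),
    "[M-H]-": (1, 1, -1.007276),
}


def adduct_mz(neutral_mass, adduct):
    """Expected m/z of an adduct ion for a neutral monoisotopic mass."""
    multimer, charge, shift = ADDUCTS[adduct]
    return (multimer * neutral_mass + shift) / charge


glucose_mass = 180.063388  # C6H12O6
for name in ADDUCTS:
    print(name, round(adduct_mz(glucose_mass, name), 4))
```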

7.4.2 Isotope

Isotope patterns can be predicted from a candidate formula.
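A coarse isotope pattern can be computed by convolving the natural isotope abundances of each atom. The sketch below works at nominal mass resolution (M, M+1, M+2, ...) and does not resolve isotopic fine structure. The abundance table contains rounded natural abundances for a few elements only, and the function name is an assumption for this example.

```python
# Natural isotope abundances keyed by nominal mass offset from the
# lightest isotope (rounded values, limited element set).
ISOTOPES = {
    "C": {0: 0.9893, 1: 0.0107},
    "H": {0: 0.999885, 1: 0.000115},
    "N": {0: 0.99636, 1: 0.00364},
    "O": {0: 0.99757, 1: 0.00038, 2: 0.00205},
    "S": {0: 0.9499, 1: 0.0075, 2: 0.0425, 4: 0.0001},
    "Cl": {0: 0.7576, 2: 0.2424},
    "Br": {0: 0.5069, 2: 0.4931},
}


def isotope_pattern(counts, max_offset=4):
    """Relative intensities of M, M+1, ... up to M+max_offset,
    scaled so that the most intense peak equals 1."""
    pattern = {0: 1.0}
    for element, n_atoms in counts.items():
        for _ in range(n_atoms):  # add one atom at a time
            updated = {}
            for offset, prob in pattern.items():
                for delta, abundance in ISOTOPES[element].items():
                    if offset + delta <= max_offset:
                        key = offset + delta
                        updated[key] = updated.get(key, 0.0) + prob * abundance
            pattern = updated
    base = max(pattern.values())
    return {offset: prob / base for offset, prob in sorted(pattern.items())}


# Dichloromethane (CH2Cl2) shows the characteristic two-chlorine pattern.
for offset, rel in isotope_pattern({"C": 1, "H": 2, "Cl": 2}).items():
    print(f"M+{offset}: {rel:.3f}")
```

Comparing such a predicted pattern with the measured M+1 and M+2 abundances is the basis of the isotopic pattern filter in the Seven Golden Rules.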

7.4.3 CAMERA

Common annotation for xcms workflow(Kuhl et al. 2012).

7.4.4 RAMClustR

The software could be found here (Broeckling et al. 2014; Broeckling et al. 2016). The package included a vignette to follow.

7.4.5 BioCAn

BioCAn combines the results from database searches and in silico fragmentation analyses and places these results into a relevant biological context for the sample as captured by a metabolic model (Alden et al. 2017).

7.4.6 mzMatch

mzMatch is a modular, open source, and platform-independent data processing pipeline for metabolomics LC/MS data written in Java (Chokkathukalam et al. 2013; Scheltema et al. 2011). MetAssign, which is part of mzMatch, is a probabilistic annotation method using a Bayesian clustering approach (Daly et al. 2014).

7.4.7 xMSannotator

The software could be found here(Uppal et al. 2017).

7.4.8 mWise

mWise is an algorithm for context-based annotation of liquid chromatography–mass spectrometry features through diffusion in graphs (Barranco-Altirriba et al. 2021).

7.4.9 MAIT

You could find source code here(Fernández-Albert et al. 2014).

7.4.10 pmd

Paired Mass Distance(PMD) analysis for GC/LC-MS based nontarget analysis to remove redundant peaks(M. Yu et al. 2019).

7.4.11 nontarget

nontarget performs isotope and adduct peak grouping as well as homologue series detection (Loos and Singer 2017).
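Homologue series detection can be illustrated with the Kendrick mass defect (KMD), which is constant for compounds that differ only by CH2 units. This is a generic sketch and not the algorithm used by the nontarget package. The rounding precision `digits` and the minimum series length `min_members` are assumptions, and retention time trends are ignored.

```python
from collections import defaultdict

CH2_EXACT = 14.01565006  # monoisotopic mass of CH2 in Da


def kendrick_mass_defect(mz):
    """KMD on the CH2 scale: nominal Kendrick mass minus Kendrick mass."""
    kendrick_mass = mz * 14.0 / CH2_EXACT
    return round(kendrick_mass) - kendrick_mass


def ch2_series(mzs, digits=3, min_members=3):
    """Group m/z values that share the same rounded KMD."""
    groups = defaultdict(list)
    for mz in mzs:
        groups[round(kendrick_mass_defect(mz), digits)].append(mz)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_members)


# Neutral masses of C16, C17 and C18 saturated fatty acids plus glucose.
masses = [256.2402, 180.0634, 270.2559, 284.2715]
print(ch2_series(masses))
```

Members of a true homologue series should also show a regular retention time shift, so a shared KMD alone is only a candidate grouping.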

7.4.12 Binner

Binner performs deep annotation of untargeted LC-MS metabolomics data (Kachman et al. 2020).

7.4.13 mz.unity

mz.unity detects and explores complex relationships in accurate-mass mass spectrometry data; the source code is available here (Mahieu, Spalding, Gelman, et al. 2016).

7.4.14 MS-FLO

MS-FLO is a tool to minimize false positive peak reports in untargeted LC-MS data processing (DeFelice et al. 2017).

7.4.15 CliqueMS

CliqueMS is a computational tool for annotating in-source metabolite ions from LC-MS untargeted metabolomics data based on a coelution similarity network (Senan et al. 2019).

7.4.16 InterpretMSSpectrum

This package annotates and interprets deconvoluted mass spectra (mass*intensity pairs) from high resolution mass spectrometry devices. It can be used to find molecular ions in GC-MS data (Jaeger et al. 2016).

7.4.17 NetID

NetID is a global network optimization approach to annotate untargeted LC-MS metabolomics data(Chen et al. 2021).

7.4.18 ISfrag

ISfrag performs de novo recognition of in-source fragments for liquid chromatography–mass spectrometry data (Guo et al. 2021).

7.4.19 FastEI

FastEI provides ultra-fast and accurate electron ionization mass spectrum matching for compound identification with a million-scale in silico library (Yang et al. 2023).

7.5 Step 3: connect MS1 features to MS2 acquisition

After redundant features are reduced, the next task is to connect the representative MS1 features to informative MS2 data for structural ranking.

7.5.1 PMDDA

PMDDA is a three-step workflow: MS1 full-scan peak picking, selection of precursor ions for MS2 from the MS1 data with the GlobalStd algorithm, and collection of the MS2 data followed by annotation with GNPS (Yu et al. 2022).

7.5.2 HERMES

HERMES is a molecular-formula-oriented method to target the metabolome (Giné et al. 2021).

7.5.3 dpDDA

A similar approach, dpDDA, uses an inclusion list of differential and preidentified ions (Y. Zhang et al. 2023).

7.5.4 Extending from MS2 to MSn

A computational approach can generate a database of high-resolution MSn spectra by converting existing low-resolution MSn spectra using complementary high-resolution MS2 spectra generated by beam-type CAD (Lieng et al. 2023).

7.6 Step 4: score and prioritize candidates with MS/MS

MS/MS evidence is usually the most informative step for moving from formula-level annotation to candidate structure ranking.

MS/MS annotation generates a matching score between experimental and library spectra. The most popular matching algorithm is dot product similarity. A recent study found that the spectral entropy algorithm outperformed dot product similarity (Li et al. 2021; Li and Fiehn 2023). A comparison of cosine, modified cosine, and neutral loss based spectrum alignment showed that modified cosine similarity outperformed both neutral loss matching and cosine similarity in all cases, and that the performance of MS/MS spectrum alignment depends on the location and type of the modification as well as the chemical class of the fragmented molecules (Bittremieux et al. 2022). Engler Hart et al. proposed weighting low-intensity MS/MS ions and m/z frequency for spectral library annotation, which helps annotate unknown spectra (Engler Hart et al. 2024). BLINK enables ultrafast tandem mass spectrometry cosine similarity scoring (Harwood et al. 2023). MS2Query enables reliable and scalable analogue search of MS2 mass spectra by machine learning (de Jonge et al. 2023). However, a spectroscopic test suggests that fragment ion structure annotations in MS/MS libraries are frequently incorrect (van Tetering et al. 2024).
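The two scoring families mentioned here can be sketched compactly. The code below is a simplified illustration. Peaks are aligned by rounding m/z to two decimals instead of tolerance-based matching. The cosine score uses raw intensities without m/z weighting. The entropy score is the unweighted form, 1 − (2·S_AB − S_A − S_B)/ln 4, without the intensity re-weighting applied to low-entropy spectra in the published method. The function names are assumptions for this example.

```python
import math


def _binned(spectrum, digits=2):
    """Sum intensities of peaks that fall into the same rounded m/z bin."""
    bins = {}
    for mz, intensity in spectrum:
        key = round(mz, digits)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def cosine_similarity(spec_a, spec_b):
    """Normalized dot product of two binned spectra."""
    a, b = _binned(spec_a), _binned(spec_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def _entropy(dist):
    """Shannon entropy of an intensity distribution that sums to 1."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def entropy_similarity(spec_a, spec_b):
    """Unweighted spectral entropy similarity between two spectra."""
    a, b = _binned(spec_a), _binned(spec_b)
    a = {k: v / sum(a.values()) for k, v in a.items()}
    b = {k: v / sum(b.values()) for k, v in b.items()}
    mixed = {k: 0.5 * (a.get(k, 0.0) + b.get(k, 0.0)) for k in a.keys() | b.keys()}
    return 1.0 - (2 * _entropy(mixed) - _entropy(a) - _entropy(b)) / math.log(4)


# Hypothetical spectra as (m/z, intensity) pairs.
query = [(85.03, 30.0), (127.04, 100.0), (145.05, 55.0)]
library = [(85.03, 25.0), (127.04, 100.0), (163.06, 40.0)]
print(cosine_similarity(query, library), entropy_similarity(query, library))
```

Both scores range from 0 for spectra with no shared peaks to 1 for identical spectra. They differ in how strongly low-intensity fragments influence the result.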

Machine learning can also be applied for MS2 annotation(Codrean et al. 2023; Guo et al. 2023; Bilbao et al. 2023). Recent advancements in 2024 and 2025 have seen a surge in deep learning and large language model (LLM) applications. For instance, MSBERT uses mask learning and contrastive learning to embed tandem mass spectra into a chemically rational space(H. Zhang et al. 2024). Graph neural networks and transformers are also being adapted; Graph Transformers have been used for tandem mass spectrum prediction(Young et al. 2024), and graph embedding techniques are enhancing precursor-product ion pair analysis(Zheng et al. 2024). Furthermore, contrastive learning frameworks like CSU-MS2 are improving cross-modal compound identification(Xie et al. 2025), and LLMs are now being empowered to derive spectral embeddings(Xu et al. 2025) and predict collision cross-sections(Zhu et al. 2025).

See the Workflow section for popular platforms. Here are some stand-alone annotation tools:

7.6.1 Matchms

Matchms is an open-source Python package to import, process, clean, and compare mass spectrometry data (MS/MS). It allows users to implement and run an easy-to-follow, easy-to-reproduce workflow from raw mass spectra to pre- and post-processed spectral data. Spectral data can be imported from common formats such as mzML, mzXML, msp, metabolomics-USI, MGF, or json (e.g. GNPS-style json files). Matchms then provides filters for metadata cleaning and checking, as well as for basic peak filtering. Finally, matchms was built to import and apply different similarity measures to compare large numbers of spectra. This includes common cosine scores, but it can also easily be extended with custom measures. Examples of spectrum similarity measures designed to work with matchms are Spec2Vec and MS2DeepScore (Huber et al. 2020).

7.6.2 MetDNA

MetDNA performs metabolic reaction network-based recursive metabolite annotation for untargeted metabolomics (Shen et al. 2019).

7.6.3 MetFusion

Java based integration of compound identification strategies. You could access the application here (Gerlich and Neumann 2013).

7.6.4 MS2Analyzer

MS2Analyzer annotates small molecule substructures from accurate tandem mass spectra (Ma et al. 2014).

7.6.5 MetFrag

MetFrag can be used for in silico prediction and matching of MS/MS data (Ruttkies et al. 2016; Wolf et al. 2010).

7.6.6 CFM-ID

CFM-ID uses Metlin’s data to make predictions (Allen et al. 2014); a later release, CFM-ID 4.0, is also available (Allen et al. 2014).

7.6.7 LC-MS2Struct

LC-MS2Struct is a machine learning framework for structural annotation of small-molecule data arising from liquid chromatography–tandem mass spectrometry (LC-MS2) measurements (Bach et al. 2022).

7.6.8 LipidFrag

LipidFrag can be used for in silico prediction and matching of lipid-related MS/MS data (Witting et al. 2017).

7.6.9 Lipidmatch

LipidMatch performs in silico lipid mass spectrum search (Koelmel et al. 2017).

7.6.10 BarCoding

Bar coding selects the mass-to-charge regions containing the most informative metabolite fragments and designates them as bins. Each metabolite fragmentation pattern is then translated into a binary code by assigning 1 to bins containing fragments and 0 to bins without fragments. Such coding annotation can be used for MRM data (Spalding et al. 2016).
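The translation of a fragmentation pattern into a binary code can be sketched as follows. The bin boundaries here are hypothetical placeholders. In the published method the bins are chosen as the most informative m/z regions, which this example does not attempt to reproduce.

```python
def barcode(fragment_mzs, bins):
    """Return a string of 1s and 0s, one character per bin, marking
    whether any fragment m/z falls inside that bin (low <= mz < high)."""
    return "".join(
        "1" if any(low <= mz < high for mz in fragment_mzs) else "0"
        for low, high in bins
    )


# Hypothetical bins (m/z ranges) and fragment list for one metabolite.
bins = [(50.0, 60.0), (80.0, 90.0), (100.0, 110.0), (130.0, 140.0)]
fragments = [55.05, 105.03, 133.10]
print(barcode(fragments, bins))
```

Two metabolites can then be compared by their codes alone, which is convenient when only a few transitions are monitored.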

7.6.11 iMet

This online application is a network-based computation method for annotation (Aguilar-Mogas et al. 2017).

7.6.12 DNMS2Purifier

DNMS2Purifier is an XGBoost-based MS/MS spectral cleaning tool that uses intensity ratio fluctuation, appearance rate, and relative intensity (Zhao et al. 2023).

7.6.13 IDSL.CSA

IDSL.CSA performs composite spectra analysis for chemical annotation of untargeted metabolomics datasets (Baygi et al. 2023).

7.7 Step 5: refine annotation with retention, biology, and prior knowledge

After candidate structures are ranked by MS/MS evidence, orthogonal information can further improve confidence or reject implausible assignments.

7.7.1 Experimental design

Physicochemical properties can be used for annotation with a specific experimental design (Abrahamsson et al. 2023).

7.7.3 ProbMetab

ProbMetab provides a probability ranking of candidate compounds assigned to masses, with the prior assumption of a connected sample and additional prior and spectral information modeled by the user. You could find source code here (Silva et al. 2014).

7.7.4 MI-Pack

You could find python software here (Weber and Viant 2010).

7.7.5 MetExpert

MetExpert is an expert system to assist users with limited expertise in informatics to interpret GCMS data for metabolite identification without querying spectral databases (Qiu et al. 2018).

7.7.6 MycompoundID

MycompoundID could be used to search known and unknown metabolites online (L. Li et al. 2013).

7.7.7 MetFamily

Shiny app for MS and MS/MS data annotation (Treutler et al. 2016).

7.7.8 CoA-Blast

For certain groups of compounds such as acyl-CoAs, you might build a class-level in silico database to annotate compounds with a certain structure (Keshet et al. 2022).

7.7.9 KGMN

Knowledge-guided multi-layer network (KGMN) integrates three-layer networks, including knowledge-based metabolic reaction network, knowledge-guided MS/MS similarity network, and global peak correlation network for annotation (Zhou et al. 2022).

7.7.10 CCMN

CCMNs are constructed from metabolic features that share chemical classes, which facilitates structure or class annotation for completely unknown metabolic features (X. Zhang et al. 2024).

7.8 Step 6: search spectral databases

Spectral databases are most useful after candidate space has already been narrowed by formula, feature grouping, and MS/MS evidence.

7.8.1 MS

7.8.2 MS/MS

LibGen can generate high-quality spectral libraries of natural products for EAD-, UVPD-, and HCD-high-resolution mass spectrometers (Kong et al. 2023).

  • MoNA: a platform that collects spectra from other open-source databases

  • MassBank

  • GNPS uses correlations within the data and performs network analysis at the peak level, rather than at the level of annotated compounds, to annotate the data.

  • ReSpect: phytochemicals

  • METLIN is another useful online application for annotation (Guijas et al. 2018).

  • LipidBlast: in silico prediction

  • Lipid Maps

  • mzCloud

  • NIST: Not free

  • GMDB is a multistage tandem mass spectral database built from a variety of structurally defined glycans.

  • HMDB is a freely available electronic database containing detailed information about small molecule metabolites found in the human body. HMDB 5.0 (Wishart et al. 2022) greatly expanded the database to over 217,000 metabolite entries with enhanced spectral data coverage and improved chemical taxonomy.

  • KEGG is a collection of small molecules, biopolymers, and other chemical substances that are relevant to biological systems.
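Whatever library is chosen, an MS/MS search usually comes down to filtering reference spectra by precursor m/z and then scoring spectral similarity. The sketch below shows a greedy cosine score on centroided spectra. It is a simplified illustration, not the scoring used by any particular database above; the tolerances, the score threshold, and the library entries are assumptions made for the example.

```python
import math


def cosine_score(query, reference, tol=0.01):
    """Greedy cosine similarity between two centroided spectra.

    Each spectrum is a list of (mz, intensity) pairs. Peaks are matched
    one-to-one when their m/z values differ by at most `tol` Da.
    """
    pairs = []
    for i, (q_mz, q_int) in enumerate(query):
        for j, (r_mz, r_int) in enumerate(reference):
            if abs(q_mz - r_mz) <= tol:
                pairs.append((q_int * r_int, i, j))
    pairs.sort(reverse=True)  # assign the largest intensity products first
    used_q, used_r, dot = set(), set(), 0.0
    for product, i, j in pairs:
        if i not in used_q and j not in used_r:
            used_q.add(i)
            used_r.add(j)
            dot += product
    norm_q = math.sqrt(sum(inten ** 2 for _, inten in query))
    norm_r = math.sqrt(sum(inten ** 2 for _, inten in reference))
    return dot / (norm_q * norm_r) if norm_q and norm_r else 0.0


def search_library(query, precursor_mz, library, precursor_tol=0.01, min_score=0.7):
    """Return (name, score) hits with a matching precursor, best score first."""
    hits = []
    for entry in library:
        if abs(entry["precursor_mz"] - precursor_mz) > precursor_tol:
            continue
        score = cosine_score(query, entry["peaks"])
        if score >= min_score:
            hits.append((entry["name"], score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up spectra for illustration only.
library = [
    {"name": "reference_1", "precursor_mz": 181.0707,
     "peaks": [(85.03, 30.0), (127.04, 100.0), (145.05, 55.0)]},
    {"name": "reference_2", "precursor_mz": 181.0707,
     "peaks": [(69.03, 100.0), (99.04, 20.0)]},
]
query = [(85.03, 25.0), (127.04, 100.0), (145.05, 60.0)]
print(search_library(query, 181.0707, library))
```

Production tools add intensity weighting, noise filtering, and alternative metrics such as modified cosine or spectral entropy, so scores from this sketch are not comparable with scores reported by those tools.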

7.9 Step 7: search compound databases

Compound databases extend the searchable chemical space beyond spectral libraries, but they should be used with explicit uncertainty because many candidate structures will share the same elemental formula or similar exact mass.

  • PubChem is an open chemistry database at the National Institutes of Health (NIH).

  • ChemSpider is a free chemical structure database providing fast text and structure search access to over 67 million structures from hundreds of data sources.

  • ChEBI is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.

  • RefMet is a reference list of metabolite names.

  • CAS is the largest substance database.

  • CompTox is a database of compounds, exposure, and toxicity. Related data are also available.

  • T3DB is a unique bioinformatics resource that combines detailed toxin data with comprehensive toxin target information.

  • FooDB is the world’s largest and most comprehensive resource on food constituents, chemistry and biology.

  • Phenol-Explorer is the first comprehensive database on polyphenol content in foods.

  • DrugBank is a unique bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information.

  • LMDB is a freely available electronic database containing detailed information about small molecule metabolites found in different livestock species.

  • HPV is the High Production Volume Information System.
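The uncertainty of compound database searches is easy to demonstrate with an exact-mass query. The sketch below searches a small in-memory table instead of a real database; the table, the function name, the 5 ppm window, and the [M+H]+ assumption are all choices made for this example.

```python
PROTON = 1.00727646  # mass added to a neutral molecule to form [M+H]+

# Toy compound table: (name, formula, monoisotopic mass).
COMPOUNDS = [
    ("leucine", "C6H13NO2", 131.09463),
    ("isoleucine", "C6H13NO2", 131.09463),
    ("creatine", "C4H9N3O2", 131.06948),
    ("glucose", "C6H12O6", 180.06339),
    ("fructose", "C6H12O6", 180.06339),
]


def search_by_mz(observed_mz, ppm=5.0, adduct_shift=PROTON):
    """Return (name, formula) for compounds within `ppm` of the neutral mass."""
    neutral_mass = observed_mz - adduct_shift
    window = neutral_mass * ppm / 1e6
    return [
        (name, formula)
        for name, formula, mass in COMPOUNDS
        if abs(mass - neutral_mass) <= window
    ]


hits = search_by_mz(132.1019)
print(hits)
```

Even at 5 ppm the query returns both leucine and isoleucine, because isomers share the same formula and exact mass. Only orthogonal evidence such as retention time or MS/MS can separate them, which is why a mass-only hit should be reported as a candidate list and not as an identification.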

There are also metabolite atlases for specific domains.

References

Abrahamsson, Dimitri, Christopher L. Brueck, Carsten Prasse, et al. 2023. “Extracting Structural Information from Physicochemical Property Measurements Using Machine Learning-A New Approach for Structure Elucidation in Non-targeted Analysis.” Environmental Science & Technology, ahead of print, September. https://doi.org/10.1021/acs.est.3c03003.
Aguilar-Mogas, Antoni, Marta Sales-Pardo, Miriam Navarro, Roger Guimerà, and Oscar Yanes. 2017. “iMet: A Network-Based Computational Tool To Assist in the Annotation of Metabolites from Tandem Mass Spectra.” Analytical Chemistry 89 (6): 3474–82. https://doi.org/10.1021/acs.analchem.6b04512.
Alden, Nicholas, Smitha Krishnan, Vladimir Porokhin, et al. 2017. “Biologically Consistent Annotation of Metabolomics Data.” Analytical Chemistry, ahead of print, November. https://doi.org/10.1021/acs.analchem.7b02162.
Allard, Pierre-Marie, Grégory Genta-Jouve, and Jean-Luc Wolfender. 2017. “Deep Metabolome Annotation in Natural Products Research: Towards a Virtuous Cycle in Metabolite Identification.” Current Opinion in Chemical Biology, Omics, vol. 36 (February): 40–49. https://doi.org/10.1016/j.cbpa.2016.12.022.
Allen, Felicity, Allison Pon, Michael Wilson, Russ Greiner, and David Wishart. 2014. “CFM-ID: A Web Server for Annotation, Spectrum Prediction and Metabolite Identification from Tandem Mass Spectra.” Nucleic Acids Research 42 (W1): W94–99. https://doi.org/10.1093/nar/gku436.
Bach, Eric, Emma L. Schymanski, and Juho Rousu. 2022. “Joint Structural Annotation of Small Molecules Using Liquid Chromatography Retention Order and Tandem Mass Spectrometry Data.” Nature Machine Intelligence 4 (12): 1224–37. https://doi.org/10.1038/s42256-022-00577-2.
Baran, Richard, and Trent R. Northen. 2013. “Robust Automated Mass Spectra Interpretation and Chemical Formula Calculation Using Mixed Integer Linear Programming.” Analytical Chemistry 85 (20): 9777–84. https://doi.org/10.1021/ac402180c.
Barranco-Altirriba, Maria, Pol Solà-Santos, Sergio Picart-Armada, Samir Kanaan-Izquierdo, Jordi Fonollosa, and Alexandre Perera-Lluna. 2021. “mWISE: An Algorithm for Context-Based Annotation of Liquid Chromatography–Mass Spectrometry Features Through Diffusion in Graphs.” Analytical Chemistry 93 (31): 10772–78. https://doi.org/10.1021/acs.analchem.1c00238.
Baygi, Sadjad Fakouri, Yashwant Kumar, and Dinesh Kumar Barupal. 2023. “IDSL.CSA: Composite Spectra Analysis for Chemical Annotation of Untargeted Metabolomics Datasets.” Analytical Chemistry, ahead of print, June. https://doi.org/10.1021/acs.analchem.3c00376.
Bernardo-Bermejo, Samuel, Jingchuan Xue, Linh Hoang, et al. 2023. “Quantitative Multiple Fragment Monitoring with Enhanced in-Source Fragmentation/Annotation Mass Spectrometry.” Nature Protocols, February, 1–20. https://doi.org/10.1038/s41596-023-00803-0.
Bilbao, Aivett, Nathalie Munoz, Joonhoon Kim, et al. 2023. “PeakDecoder Enables Machine Learning-Based Metabolite Annotation and Accurate Profiling in Multidimensional Mass Spectrometry Measurements.” Nature Communications 14 (1): 2461. https://doi.org/10.1038/s41467-023-37031-9.
Bittremieux, Wout, Robin Schmid, Florian Huber, Justin J. J. van der Hooft, Mingxun Wang, and Pieter C. Dorrestein. 2022. “Comparison of Cosine, Modified Cosine, and Neutral Loss Based Spectrum Alignment For Discovery of Structurally Related Molecules.” Journal of the American Society for Mass Spectrometry 33 (9): 1733–44. https://doi.org/10.1021/jasms.2c00153.
Bonini, Paolo, Tobias Kind, Hiroshi Tsugawa, Dinesh Kumar Barupal, and Oliver Fiehn. 2020. “Retip: Retention Time Prediction for Compound Annotation in Untargeted Metabolomics.” Analytical Chemistry 92 (11): 7515–22. https://doi.org/10.1021/acs.analchem.9b05765.
Briere, Yaset Caicedo de la, Louis-Felix Nothias, and Pieter C Dorrestein. 2025. “A Perspective on Unintentional Fragments and Their Impact on the Dark Metabolome, Untargeted Profiling, Molecular Networking, Public Data, and Repository Scale Analysis.” JACS Au 5 (7): 2766–76. https://doi.org/10.1021/jacsau.5c01063.
Broeckling, C. D., F. A. Afsar, S. Neumann, A. Ben-Hur, and J. E. Prenni. 2014. “RAMClust: A Novel Feature Clustering Method Enables Spectral-Matching-Based Annotation for Metabolomics Data.” Analytical Chemistry 86 (14): 6812–17. https://doi.org/10.1021/ac501530d.
Broeckling, Corey D., Andrea Ganna, Mark Layer, et al. 2016. “Enabling Efficient and Confident Annotation of LC-MS Metabolomics Data Through MS1 Spectrum and Time Prediction.” Analytical Chemistry 88 (18): 9226–34. https://doi.org/10.1021/acs.analchem.6b02479.
Chaleckis, Romanas, Isabel Meister, Pei Zhang, and Craig E Wheelock. 2019. “Challenges, Progress and Promises of Metabolite Annotation for LC–MS-based Metabolomics.” Current Opinion in Biotechnology, Analytical Biotechnology, vol. 55 (February): 44–50. https://doi.org/10.1016/j.copbio.2018.07.010.
Charbonnet, Joseph A., Carrie A. McDonough, Feng Xiao, et al. 2022. “Communicating Confidence of Per- and Polyfluoroalkyl Substance Identification via High-Resolution Mass Spectrometry.” Environmental Science & Technology Letters, ahead of print, May. https://doi.org/10.1021/acs.estlett.2c00206.
Chen, Li, Wenyun Lu, Lin Wang, et al. 2021. “Metabolite Discovery Through Global Annotation of Untargeted Metabolomics Data.” Nature Methods 18 (11): 1377–85. https://doi.org/10.1038/s41592-021-01303-3.
Chokkathukalam, Achuthanunni, Andris Jankevics, Darren J. Creek, Fiona Achcar, Michael P. Barrett, and Rainer Breitling. 2013. “mzMatchISO: An R Tool for the Annotation and Relative Quantification of Isotope-Labelled Mass Spectrometry Data.” Bioinformatics 29 (2): 281–83. https://doi.org/10.1093/bioinformatics/bts674.
Codrean, S., B. Kruit, N. Meekel, D. Vughs, and F. Béen. 2023. “Predicting the Diagnostic Information of Tandem Mass Spectra of Environmentally Relevant Compounds Using Machine Learning.” Analytical Chemistry, ahead of print, October. https://doi.org/10.1021/acs.analchem.3c03470.
Daly, Rónán, Simon Rogers, Joe Wandy, Andris Jankevics, Karl E. V. Burgess, and Rainer Breitling. 2014. “MetAssign: Probabilistic Annotation of Metabolites from LC–MS Data Using a Bayesian Clustering Approach.” Bioinformatics 30 (19): 2764–71. https://doi.org/10.1093/bioinformatics/btu370.
de Jonge, Niek F., Joris J. R. Louwen, Elena Chekmeneva, et al. 2023. “MS2Query: Reliable and Scalable MS2 Mass Spectra-Based Analogue Search.” Nature Communications 14 (1): 1752. https://doi.org/10.1038/s41467-023-37446-4.
DeFelice, Brian C., Sajjan Singh Mehta, Stephanie Samra, et al. 2017. “Mass Spectral Feature List Optimizer (MS-FLO): A Tool To Minimize False Positive Peak Reports in Untargeted Liquid Chromatography–Mass Spectroscopy (LC-MS) Data Processing.” Analytical Chemistry 89 (6): 3250–55. https://doi.org/10.1021/acs.analchem.6b04372.
Djoumbou Feunang, Yannick, Roman Eisner, Craig Knox, et al. 2016. “ClassyFire: Automated Chemical Classification with a Comprehensive, Computable Taxonomy.” Journal of Cheminformatics 8 (1): 61. https://doi.org/10.1186/s13321-016-0174-y.
Domingo-Almenara, Xavier, J. Rafael Montenegro-Burke, H. Paul Benton, and Gary Siuzdak. 2018. “Annotation: A Computational Solution for Streamlining Metabolomics Analysis.” Analytical Chemistry 90 (1): 480–89. https://doi.org/10.1021/acs.analchem.7b03929.
Dührkop, Kai, Markus Fleischauer, Marcus Ludwig, et al. 2019. “SIRIUS 4: A Rapid Tool for Turning Tandem Mass Spectra into Metabolite Structure Information.” Nature Methods 16 (4): 299–302. https://doi.org/10.1038/s41592-019-0344-8.
Dyar, Kenneth A., Dominik Lutter, Anna Artati, et al. 2018. “Atlas of Circadian Metabolism Reveals System-wide Coordination and Communication Between Clocks.” Cell 174 (6): 1571–1585.e11. https://doi.org/10.1016/j.cell.2018.08.042.
Engler Hart, Chloe, Tobias Kind, Pieter C. Dorrestein, David Healey, and Daniel Domingo-Fernández. 2024. “Weighting Low-Intensity MS/MS Ions and m/z Frequency for Spectral Library Annotation.” Journal of the American Society for Mass Spectrometry 35 (2): 266–74. https://doi.org/10.1021/jasms.3c00353.
Fernández-Albert, Francesc, Rafael Llorach, Cristina Andrés-Lacueva, and Alexandre Perera. 2014. “An R Package to Analyse LC/MS Metabolomic Data: MAIT (Metabolite Automatic Identification Toolkit).” Bioinformatics 30 (13): 1937–39. https://doi.org/10.1093/bioinformatics/btu136.
Gerlich, Michael, and Steffen Neumann. 2013. “MetFusion: Integration of Compound Identification Strategies.” Journal of Mass Spectrometry 48 (3): 291–98. https://doi.org/10.1002/jms.3123.
Giera, Martin, Aries Aisporna, Winnie Uritboonthai, and Gary Siuzdak. 2024. “The Hidden Impact of in-Source Fragmentation in Metabolic and Chemical Mass Spectrometry Data Interpretation.” Nature Metabolism, June, 1–2. https://doi.org/10.1038/s42255-024-01076-x.
Giné, Roger, Jordi Capellades, Josep M. Badia, et al. 2021. “HERMES: A Molecular-Formula-Oriented Method to Target the Metabolome.” Nature Methods 18 (11): 1370–76. https://doi.org/10.1038/s41592-021-01307-z.
Gugisch, Ralf, Adalbert Kerber, Axel Kohnert, et al. 2015. “Chapter 6 - MOLGEN 5.0, A Molecular Structure Generator.” In Advances in Mathematical Chemistry and Applications, edited by Subhash C. Basak, Guillermo Restrepo, and José L. Villaveces. Bentham Science Publishers. https://doi.org/10.1016/B978-1-68108-198-4.50006-0.
Guha, Rajarshi. 2007. “Chemical Informatics Functionality in R.” Journal of Statistical Software 18 (1): 1–16. https://doi.org/10.18637/jss.v018.i05.
Guijas, Carlos, J. Rafael Montenegro-Burke, Xavier Domingo-Almenara, et al. 2018. “METLIN: A Technology Platform for Identifying Knowns and Unknowns.” Analytical Chemistry 90 (5): 3156–64. https://doi.org/10.1021/acs.analchem.7b04424.
Guo, Hao, Kebing Xue, Haiming Sun, Weihao Jiang, and Shiliang Pu. 2023. “Contrastive Learning-Based Embedder for the Representation of Tandem Mass Spectra.” Analytical Chemistry, ahead of print, May. https://doi.org/10.1021/acs.analchem.3c00260.
Guo, Jian, Sam Shen, Shipei Xing, Huaxu Yu, and Tao Huan. 2021. “ISFrag: De Novo Recognition of In-Source Fragments for Liquid Chromatography–Mass Spectrometry Data.” Analytical Chemistry, ahead of print, July. https://doi.org/10.1021/acs.analchem.1c01644.
Harwood, Thomas V., Daniel G. C. Treen, Mingxun Wang, Wibe de Jong, Trent R. Northen, and Benjamin P. Bowen. 2023. “BLINK Enables Ultrafast Tandem Mass Spectrometry Cosine Similarity Scoring.” Scientific Reports 13 (1): 13462. https://doi.org/10.1038/s41598-023-40496-9.
Huber, Florian, Stefan Verhoeven, Christiaan Meijer, et al. 2020. “Matchms - Processing and Similarity Evaluation of Mass Spectrometry Data.” Journal of Open Source Software 5 (52): 2411. https://doi.org/10.21105/joss.02411.
Hufsky, Franziska, Kerstin Scheubert, and Sebastian Böcker. 2014. “Computational Mass Spectrometry for Small-Molecule Fragmentation.” TrAC Trends in Analytical Chemistry 53 (January): 41–48. https://doi.org/10.1016/j.trac.2013.09.008.
Jaeger, Carsten, Friederike Hoffmann, Clemens A. Schmitt, and Jan Lisec. 2016. “Automated Annotation and Evaluation of In-Source Mass Spectra in GC/Atmospheric Pressure Chemical Ionization-MS-Based Metabolomics.” Analytical Chemistry 88 (19): 9386–90. https://doi.org/10.1021/acs.analchem.6b02743.
Ju, Ran, Xinyu Liu, Fujian Zheng, et al. 2020. “A Graph Density-Based Strategy for Features Fusion from Different Peak Extract Software to Achieve More Metabolites in Metabolic Profiling from High-Resolution Mass Spectrometry.” Analytica Chimica Acta 1139 (December): 8–14. https://doi.org/10.1016/j.aca.2020.09.029.
Kachman, Maureen, Hani Habra, William Duren, et al. 2020. “Deep Annotation of Untargeted LC-MS Metabolomics Data with Binner.” Bioinformatics 36 (6): 1801–6. https://doi.org/10.1093/bioinformatics/btz798.
Keshet, Uri, Tobias Kind, Xinchen Lu, Sarita Devi, and Oliver Fiehn. 2022. “Acyl-CoA Identification in Mouse Liver Samples Using the In Silico CoA-Blast Tandem Mass Spectral Library.” Analytical Chemistry 94 (6): 2732–39. https://doi.org/10.1021/acs.analchem.1c03272.
Kind, Tobias, and Oliver Fiehn. 2007. “Seven Golden Rules for Heuristic Filtering of Molecular Formulas Obtained by Accurate Mass Spectrometry.” BMC Bioinformatics 8 (1): 105. https://doi.org/10.1186/1471-2105-8-105.
Koelmel, Jeremy P et al. 2025. “Unveiling the Dark Matter of the Metabolome: A Narrative Review of Bioinformatics Tools for LC-HRMS-Based Compound Annotation.” Talanta 290: 127633. https://doi.org/10.1016/j.talanta.2025.127633.
Koelmel, Jeremy P., Nicholas M. Kroeger, Candice Z. Ulmer, et al. 2017. “LipidMatch: An Automated Workflow for Rule-Based Lipid Identification Using Untargeted High-Resolution Tandem Mass Spectrometry Data.” BMC Bioinformatics 18 (July): 331. https://doi.org/10.1186/s12859-017-1744-3.
Kong, Fanzhou, Uri Keshet, Tong Shen, Elys Rodriguez, and Oliver Fiehn. 2023. “LibGen: Generating High Quality Spectral Libraries of Natural Products for EAD-, UVPD-, and HCD-High Resolution Mass Spectrometers.” Analytical Chemistry, ahead of print, November. https://doi.org/10.1021/acs.analchem.3c02263.
Kouřil, Štěpán, Julie de Sousa, Jan Václavík, David Friedecký, and Tomáš Adam. 2020. “CROP: Correlation-Based Reduction of Feature Multiplicities in Untargeted Metabolomic Data.” Bioinformatics 36 (9): 2941–42. https://doi.org/10.1093/bioinformatics/btaa012.
Kuhl, Carsten, Ralf Tautenhahn, Christoph Böttcher, Tony R. Larson, and Steffen Neumann. 2012. “CAMERA: An Integrated Strategy for Compound Spectra Extraction and Annotation of Liquid Chromatography/Mass Spectrometry Data Sets.” Analytical Chemistry 84 (1): 283–89. https://doi.org/10.1021/ac202450g.
Lai, Zijuan, Hiroshi Tsugawa, Gert Wohlgemuth, et al. 2018. “Identifying Metabolites by Integrating Metabolome Databases with Mass Spectrometry Cheminformatics.” Nature Methods 15 (1): 53–56. https://doi.org/10.1038/nmeth.4512.
Li, Liang, Ronghong Li, Jianjun Zhou, et al. 2013. “MyCompoundID: Using an Evidence-Based Metabolome Library for Metabolite Identification.” Analytical Chemistry 85 (6): 3401–8. https://doi.org/10.1021/ac400099b.
Li, Shuzhao, and Nathaniel G Mahieu. 2025. “Systematic Pre-Annotation Explains the ‘Dark Matter’ in LC-MS Metabolomics.” Analytical Chemistry 97 (7): 3680–88. https://doi.org/10.1021/acs.analchem.4c05537.
Li, Yuanyue, and Oliver Fiehn. 2023. “Flash Entropy Search to Query All Mass Spectral Libraries in Real Time.” Nature Methods 20 (10): 1475–78. https://doi.org/10.1038/s41592-023-02012-9.
Li, Yuanyue, Tobias Kind, Jacob Folz, Arpana Vaniya, Sajjan Singh Mehta, and Oliver Fiehn. 2021. “Spectral Entropy Outperforms MS/MS Dot Product Similarity for Small-Molecule Compound Identification.” Nature Methods 18 (12): 1524–31. https://doi.org/10.1038/s41592-021-01331-z.
Lieng, Brandon Y., Andrew T. Quaile, Xavier Domingo-Almenara, Hannes L. Röst, and J. Rafael Montenegro-Burke. 2023. “Computational Expansion of High-Resolution-MSn Spectral Libraries.” Analytical Chemistry, ahead of print, November. https://doi.org/10.1021/acs.analchem.3c03343.
Loos, Martin, and Heinz Singer. 2017. “Nontargeted Homologue Series Extraction from Hyphenated High Resolution Mass Spectrometry Data.” Journal of Cheminformatics 9 (February). https://doi.org/10.1186/s13321-017-0197-z.
Ma, Yan, Tobias Kind, Dawei Yang, Carlos Leon, and Oliver Fiehn. 2014. “MS2Analyzer: A Software for Small Molecule Substructure Annotations from Accurate Tandem Mass Spectra.” Analytical Chemistry 86 (21): 10724–31. https://doi.org/10.1021/ac502818e.
Mahieu, Nathaniel G., and Gary J. Patti. 2017. “Systems-Level Annotation of a Metabolomics Data Set Reduces 25 000 Features to Fewer Than 1000 Unique Metabolites.” Analytical Chemistry 89 (19): 10397–406. https://doi.org/10.1021/acs.analchem.7b02380.
Mahieu, Nathaniel G., Jonathan L. Spalding, Susan J. Gelman, and Gary J. Patti. 2016. “Defining and Detecting Complex Peak Relationships in Mass Spectral Data: The Mz.unity Algorithm.” Analytical Chemistry 88 (18): 9037–46. https://doi.org/10.1021/acs.analchem.6b01702.
Menikarachchi, Lochana C., Shannon Cawley, Dennis W. Hill, et al. 2012. “MolFind: A Software Package Enabling HPLC/MS-Based Identification of Unknown Chemical Structures.” Analytical Chemistry 84 (21): 9388–94. https://doi.org/10.1021/ac302048x.
Nash, William J., and Warwick B. Dunn. 2019. “From Mass to Metabolite in Human Untargeted Metabolomics: Recent Advances in Annotation of Metabolites Applying Liquid Chromatography-Mass Spectrometry Data.” TrAC Trends in Analytical Chemistry 120 (November): 115324. https://doi.org/10.1016/j.trac.2018.11.022.
O’Boyle, Noel M., Michael Banck, Craig A. James, Chris Morley, Tim Vandermeersch, and Geoffrey R. Hutchison. 2011. “Open Babel: An Open Chemical Toolbox.” Journal of Cheminformatics 3 (1): 33. https://doi.org/10.1186/1758-2946-3-33.
Patiny, Luc, and Alain Borel. 2013. “ChemCalc: A Building Block for Tomorrow’s Chemical Infrastructure.” Journal of Chemical Information and Modeling 53 (5): 1223–28. https://doi.org/10.1021/ci300563h.
Petras, Daniel, Louis-Felix Nothias, Robert A Quinn, et al. 2018. “Dark Matter in Host-Microbiome Metabolomics: Tackling the Unknowns – a Review.” Analytica Chimica Acta 1037: 13–27. https://doi.org/10.1016/j.aca.2017.11.043.
Qiu, Feng, Dennis D. Fine, Daniel J. Wherritt, Zhentian Lei, and Lloyd W. Sumner. 2016. “PlantMAT: A Metabolomics Tool for Predicting the Specialized Metabolic Potential of a System and for Large-Scale Metabolite Identifications.” Analytical Chemistry 88 (23): 11373–83. https://doi.org/10.1021/acs.analchem.6b00906.
Qiu, Feng, Zhentian Lei, and Lloyd W. Sumner. 2018. “MetExpert: An Expert System to Enhance Gas Chromatography-Mass Spectrometry-Based Metabolite Identifications.” Analytica Chimica Acta, Analytical Metabolomics, vol. 1037 (December): 316–26. https://doi.org/10.1016/j.aca.2018.03.052.
Ruttkies, Christoph, Emma L. Schymanski, Sebastian Wolf, Juliane Hollender, and Steffen Neumann. 2016. “MetFrag Relaunched: Incorporating Strategies Beyond in Silico Fragmentation.” Journal of Cheminformatics 8 (January): 3. https://doi.org/10.1186/s13321-016-0115-9.
Scheltema, Richard A., Andris Jankevics, Ritsert C. Jansen, Morris A. Swertz, and Rainer Breitling. 2011. “PeakML/mzMatch: A File Format, Java Library, R Library, and Tool-Chain for Mass Spectrometry Data Analysis.” Analytical Chemistry 83 (7): 2786–93. https://doi.org/10.1021/ac2000994.
Schymanski, Emma L., Junho Jeon, Rebekka Gulde, et al. 2014. “Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence.” Environmental Science & Technology 48 (4): 2097–98. https://doi.org/10.1021/es5002105.
Senan, Oriol, Antoni Aguilar-Mogas, Miriam Navarro, et al. 2019. “CliqueMS: A Computational Tool for Annotating in-Source Metabolite Ions from LC-MS Untargeted Metabolomics Data Based on a Coelution Similarity Network.” Bioinformatics 35 (20): 4089–97. https://doi.org/10.1093/bioinformatics/btz207.
Shen, Xiaotao, Ruohong Wang, Xin Xiong, et al. 2019. “Metabolic Reaction Network-Based Recursive Metabolite Annotation for Untargeted Metabolomics.” Nature Communications 10 (1): 1–14. https://doi.org/10.1038/s41467-019-09550-x.
Silva, Ricardo R da, Pieter C Dorrestein, and Robert A Quinn. 2015. “Illuminating the Dark Matter in Metabolomics.” Proceedings of the National Academy of Sciences 112 (41): 12549–50. https://doi.org/10.1073/pnas.1516878112.
Silva, Ricardo R., Fabien Jourdan, Diego M. Salvanha, et al. 2014. “ProbMetab: An R Package for Bayesian Probabilistic Annotation of LC–MS-based Metabolomics.” Bioinformatics 30 (9): 1336–37. https://doi.org/10.1093/bioinformatics/btu019.
Sindelar, Miriam, and Gary J. Patti. 2020. “Chemical Discovery in the Era of Metabolomics.” Journal of the American Chemical Society, ahead of print, April. https://doi.org/10.1021/jacs.9b13198.
Spalding, Jonathan L., Kevin Cho, Nathaniel G. Mahieu, et al. 2016. “Bar Coding MS2 Spectra for Metabolite Identification.” Analytical Chemistry 88 (5): 2538–42. https://doi.org/10.1021/acs.analchem.5b04925.
Sumner, Lloyd W., Alexander Amberg, Dave Barrett, et al. 2007. “Proposed Minimum Reporting Standards for Chemical Analysis Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI).” Metabolomics : Official Journal of the Metabolomic Society 3 (3): 211–21. https://doi.org/10.1007/s11306-007-0082-2.
Tian, Zhitao, Xin Hu, Yingying Xu, et al. 2023. “PMhub 1.0: A Comprehensive Plant Metabolome Database.” Nucleic Acids Research, October, gkad811. https://doi.org/10.1093/nar/gkad811.
Torigoe, Taihei, Masatomo Takahashi, Omidreza Heravizadeh, et al. 2024. “Predicting Retention Time in Unified-Hydrophilic-Interaction/Anion-Exchange Liquid Chromatography High-Resolution Tandem Mass Spectrometry (Unified-HILIC/AEX/HRMS/MS) for Comprehensive Structural Annotation of Polar Metabolome.” Analytical Chemistry, ahead of print, January. https://doi.org/10.1021/acs.analchem.3c04618.
Treutler, Hendrik, Hiroshi Tsugawa, Andrea Porzel, et al. 2016. “Discovering Regulated Metabolite Families in Untargeted Metabolomics Studies.” Analytical Chemistry 88 (16): 8082–90. https://doi.org/10.1021/acs.analchem.6b01569.
Tsugawa, Hiroshi, Tobias Kind, Ryo Nakabayashi, et al. 2016. “Hydrogen Rearrangement Rules: Computational MS/MS Fragmentation and Structure Elucidation Using MS-FINDER Software.” Analytical Chemistry 88 (16): 7946–58. https://doi.org/10.1021/acs.analchem.6b00770.
Uppal, Karan, Douglas I. Walker, and Dean P. Jones. 2017. “xMSannotator: An R Package for Network-Based Annotation of High-Resolution Metabolomics Data.” Analytical Chemistry 89 (2): 1063–67. https://doi.org/10.1021/acs.analchem.6b01214.
Uritboonthai, Winnie, Linh Hoang, Aries Aisporna, Martin Giera, and Gary Siuzdak. 2025. “The Dark Metabolome/Lipidome and In-Source Fragmentation.” Analytical Science Advances 6 (1): e70012. https://doi.org/10.1002/ansa.70012.
van Tetering, Lara, Sylvia Spies, Quirine D. K. Wildeman, et al. 2024. “A Spectroscopic Test Suggests That Fragment Ion Structure Annotations in MS/MS Libraries Are Frequently Incorrect.” Communications Chemistry 7 (1): 1–11. https://doi.org/10.1038/s42004-024-01112-7.
Viant, Mark R, Irwin J Kurland, Martin R Jones, and Warwick B Dunn. 2017. “How Close Are We to Complete Annotation of Metabolomes?” Current Opinion in Chemical Biology, Omics, vol. 36 (February): 64–69. https://doi.org/10.1016/j.cbpa.2017.01.001.
Wang, Mingxun, Jeremy J. Carver, Vanessa V. Phelan, et al. 2016. “Sharing and Community Curation of Mass Spectrometry Data with Global Natural Products Social Molecular Networking.” Nature Biotechnology 34 (8): 828–37. https://doi.org/10.1038/nbt.3597.
Weber, Ralf J. M., and Mark R. Viant. 2010. “MI-Pack: Increased Confidence of Metabolite Identification in Mass Spectra by Integrating Accurate Masses and Metabolic Pathways.” Chemometrics and Intelligent Laboratory Systems, OMICS, vol. 104 (1): 75–82. https://doi.org/10.1016/j.chemolab.2010.04.010.
Wishart, David S, AnChi Guo, Eponine Oler, et al. 2022. “HMDB 5.0: the Human Metabolome Database for 2022.” Nucleic Acids Research 50 (D1): D1003–11. https://doi.org/10.1093/nar/gkab1062.
Witting, Michael, Christoph Ruttkies, Steffen Neumann, and Philippe Schmitt-Kopplin. 2017. “LipidFrag: Improving Reliability of in Silico Fragmentation of Lipids and Application to the Caenorhabditis Elegans Lipidome.” PLOS ONE 12 (3): e0172311. https://doi.org/10.1371/journal.pone.0172311.
Wolf, Sebastian, Stephan Schmidt, Matthias Müller-Hannemann, and Steffen Neumann. 2010. “In Silico Fragmentation for Computer Assisted Identification of Metabolite Mass Spectra.” BMC Bioinformatics 11 (March): 148. https://doi.org/10.1186/1471-2105-11-148.
Xie, Ting, Hailiang Zhang, Qiong Yang, et al. 2025. “CSU-MS2: A Contrastive Learning Framework for Cross-Modal Compound Identification from MS/MS Spectra to Molecular Structures.” Analytical Chemistry, ahead of print, June. https://doi.org/10.1021/acs.analchem.5c01594.
Xing, Shipei, Sam Shen, Banghua Xu, Xiaoxiao Li, and Tao Huan. 2023. “BUDDY: Molecular Formula Discovery via Bottom-up MS/MS Interrogation.” Nature Methods, April, 1–10. https://doi.org/10.1038/s41592-023-01850-x.
Xu, Yang, Yixiao Ma, Weijie Xu, Zuliang Yang, and Kai Ming Ting. 2025. “A Large Language Model for Deriving Spectral Embeddings for Accurate Compound Identification in Mass Spectrometry.” Communications Chemistry 8 (1): 326. https://doi.org/10.1038/s42004-025-01708-7.
Xu, Yi-Fan, Wenyun Lu, and Joshua D. Rabinowitz. 2015. “Avoiding Misannotation of In-Source Fragmentation Products as Cellular Metabolites in Liquid Chromatography–Mass Spectrometry-Based Metabolomics.” Analytical Chemistry 87 (4): 2273–81. https://doi.org/10.1021/ac504118y.
Xue, Jingchuan, Rico J. E. Derks, Bill Webb, et al. 2021. “Single Quadrupole Multiple Fragment Ion Monitoring Quantitative Mass Spectrometry.” Analytical Chemistry 93 (31): 10879–89. https://doi.org/10.1021/acs.analchem.1c01246.
Xue, Jingchuan, Carlos Guijas, H. Paul Benton, Benedikt Warth, and Gary Siuzdak. 2020. “METLIN MS2 Molecular Standards Database: A Broad Chemical and Biological Resource.” Nature Methods 17 (10): 953–54. https://doi.org/10.1038/s41592-020-0942-5.
Xue, Jingchuan, Jiamin Zhu, Lixin Hu, et al. 2023. “EISA-EXPOSOME: One Highly Sensitive and Autonomous Exposomic Platform with Enhanced in-Source Fragmentation/Annotation.” Analytical Chemistry, ahead of print, November. https://doi.org/10.1021/acs.analchem.3c02697.
Xue, Jun, Bingyi Wang, Hongchao Ji, and WeiHua Li. 2024. “RT-Transformer: Retention Time Prediction for Metabolite Annotation to Assist in Metabolite Identification.” Bioinformatics 40 (3): btae084. https://doi.org/10.1093/bioinformatics/btae084.
Yang, Qiong, Hongchao Ji, Zhenbo Xu, et al. 2023. “Ultra-Fast and Accurate Electron Ionization Mass Spectrum Matching for Compound Identification with Million-Scale in-Silico Library.” Nature Communications 14 (1): 3722. https://doi.org/10.1038/s41467-023-39279-7.
Young, Adamo, Hannes Röst, and Bo Wang. 2024. “Tandem Mass Spectrum Prediction for Small Molecules Using Graph Transformers.” Nature Machine Intelligence, April, 1–13. https://doi.org/10.1038/s42256-024-00816-8.
Yu, Miao, Georgia Dolios, and Lauren Petrick. 2022. “Reproducible Untargeted Metabolomics Workflow for Exhaustive MS2 Data Acquisition of MS1 Features.” Journal of Cheminformatics 14 (1): 6. https://doi.org/10.1186/s13321-022-00586-8.
Yu, Miao, Mariola Olkowicz, and Janusz Pawliszyn. 2019. “Structure/Reaction Directed Analysis for LC-MS Based Untargeted Analysis.” Analytica Chimica Acta 1050 (March): 16–24. https://doi.org/10.1016/j.aca.2018.10.062.
Zhang, Hailiang, Qiong Yang, Ting Xie, Yue Wang, Zhimin Zhang, and Hongmei Lu. 2024. “MSBERT: Embedding Tandem Mass Spectra into Chemically Rational Space by Mask Learning and Contrastive Learning.” Analytical Chemistry 96 (42): 16599–608. https://doi.org/10.1021/acs.analchem.4c02426.
Zhang, Xiuqiong, Zaifang Li, Chunxia Zhao, et al. 2024. “Leveraging Unidentified Metabolic Features for Key Pathway Discovery: Chemical Classification-driven Network Analysis in Untargeted Metabolomics.” Analytical Chemistry, ahead of print, February. https://doi.org/10.1021/acs.analchem.3c04591.
Zhang, Yuhao, Jingyu Liao, Wanqi Le, Gaosong Wu, and Weidong Zhang. 2023. “Improving the Data Quality of Untargeted Metabolomics Through a Targeted Data-Dependent Acquisition Based on an Inclusion List of Differential and Preidentified Ions.” Analytical Chemistry 95 (34): 12964–73. https://doi.org/10.1021/acs.analchem.3c02888.
Zhao, Tingting, Shipei Xing, Huaxu Yu, and Tao Huan. 2023. “De Novo Cleaning of Chimeric MS/MS Spectra for LC-MS/MS-Based Metabolomics.” Analytical Chemistry 95 (35): 13018–28. https://doi.org/10.1021/acs.analchem.3c00736.
Zheng, Fujian, Lei You, Xinjie Zhao, Xin Lu, and Guowang Xu. 2024. “Predicting Tandem Mass Spectra of Small Molecules Using Graph Embedding of Precursor-Product Ion Pair Graph.” Analytical Chemistry, ahead of print, November. https://doi.org/10.1021/acs.analchem.4c04375.
Zhou, Zhiwei, Mingdu Luo, Haosong Zhang, Yandong Yin, Yuping Cai, and Zheng-Jiang Zhu. 2022. “Metabolite Annotation from Knowns to Unknowns Through Knowledge-Guided Multi-Layer Metabolic Networking.” Nature Communications 13 (1): 6656. https://doi.org/10.1038/s41467-022-34537-6.
Zhu, Yuxuan et al. 2025. “Large Language Models Empowered to Predict Collision Cross-Section Values from Mass Spectra.” Analytical Chemistry.