Reactomics
PMD-based reactomics: introduction and monthly literature collection.

Reactomics

Reactomics is the study of chemical reactions as a system — specifically, using paired mass distances (PMDs) observed in mass spectrometry data to identify and map the reaction networks operating in biological, environmental, and chemical systems. The concept was formally introduced in "Reactomics: using mass spectrometry as a chemical reaction detector" (Communications Chemistry, 2020), where it was shown that the mass differences between pairs of ions in a metabolomics dataset directly encode the chemical reactions that connect them.

The core insight is simple but powerful: a fixed mass difference between two molecules corresponds to a fixed net change in elemental formula, and hence to a specific class of chemical reaction or biotransformation. By cataloguing these paired mass distances across an untargeted metabolomics dataset, one can reconstruct the reaction networks active in a sample without prior knowledge of compound identities.

Why reactomics matters

Traditional metabolomics workflows identify compounds and correlate their abundance with phenotypes. This approach is valuable, but it treats metabolites as independent entities rather than as nodes in a reaction network. In reality, metabolites are produced, consumed, and transformed by enzymes and spontaneous chemistry — they are connected by reactions.

Reactomics addresses this gap by treating reactions as first-class objects. Rather than asking "which metabolites differ between groups?", reactomics asks "which reactions differ between groups?" and "what reaction network is active in this sample?". This shift has several practical consequences:

  • Reaction-based analysis is more robust to annotation gaps than compound-based analysis, because PMDs can be computed for any peak pair regardless of whether the peaks have been annotated.
  • Reactions are chemically interpretable: a PMD of 2.0157 Da (H₂) means reduction (or dehydrogenation, read in the other direction); a PMD of 14.0157 Da (CH₂) means methylation or chain elongation. The edges can be read without first identifying the compounds they connect.
  • Reaction networks can be compared across conditions, across species, and across sample types in a way that metabolite lists often cannot.
  • The approach connects naturally to biochemical pathway databases (KEGG, HMDB reactions, Reactome) while remaining usable when compound annotation is incomplete.

The practical appeal above rests on a deeper question worth taking seriously: why does this work at all? Why should mass differences encode chemistry so cleanly, and why should the same set of reactions recur across organisms? The figure below is not just decoration — it points to a real structural reason for the discreteness, with roots in physics and evolution.

Figure: PMDs visualised as quantized orbital shells around a precursor ion, with each shell labeled by a biochemical reaction. Like electron orbitals in quantum physics, paired mass distances are not continuous but quantized; each orbital shell encodes a distinct biochemical transformation.

Where the quantization comes from

The orbital analogy above is more than a teaching aid. It has the same logical structure as the physics it borrows from: in both cases, a continuous space is forced into discrete units by a universal underlying constraint.

In quantum mechanics the constraint is the Schrödinger equation. Any bound electron must occupy one of a countable set of eigenstates, and because the equation is universal, every atom in the universe shares the same orbital architecture.

In metabolism the constraint is the early cofactor budget. Almost every reaction in living chemistry is driven by transferring a specific group from a small set of cofactors that were already in use before the last universal common ancestor (LUCA). Each cofactor delivers a fixed mass increment, and that increment is the PMD:

Cofactor (or oxidant) | Group transferred | PMD (Da) | Reaction class
NAD(P)H | H₂ | 2.0157 | Reduction
SAM | CH₂ (methyl) | 14.0157 | Methylation
O₂ / Fe–O / cytochromes | O | 15.9949 | Oxidation, including hydroxylation (C–H → C–OH)
(same systems) | H₂O | 18.0106 | Hydration / dehydration
Acetyl-CoA | C₂H₂O | 42.0106 | Acetylation
ATP / kinases | HPO₃ | 79.9663 | Phosphorylation
UDP-glucose | C₆H₁₀O₅ | 162.0528 | Glycosylation

The PMDs that dominate real datasets are exactly the ones the cofactor-and-enzyme infrastructure can deliver cheaply. PMDs that would be chemically interesting but require unavailable cofactors or higher activation barriers simply do not appear at high frequency. The discreteness is not arbitrary — it is the shadow of which transformations early life could afford.

Why mass spectrometry can read the shells

There is another layer of physical quantization at work, easy to miss but essential. The PMDs we count are not integer masses: they sit at specific fractional values that separate chemically distinct changes sharing the same nominal mass. Methylation (+CH₂) has a PMD of 14.0157 Da, whereas a net gain of one nitrogen atom (+N) has a PMD of 14.0031 Da; isotope substitutions such as ¹³C for ¹²C (1.0034 Da) sit at their own characteristic values. These small offsets do not come from organic chemistry; they come from nuclear binding energy.

When nucleons bind into a nucleus, energy is released, and through E = mc² that energy registers as a mass deficit. Carbon-12 is exactly 12 by convention; every other nuclide carries a mass defect that reflects how tightly its nucleons are bound. Hydrogen-1 is 1.00783 Da, oxygen-16 is 15.9949 Da, nitrogen-14 is 14.0031 Da. These fractional offsets are quantum-mechanical observables — they encode the eigenstates of the strong force the way electronic orbitals encode the eigenstates of the Coulomb potential.
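As a worked check of the nominal-mass-14 example, using standard monoisotopic masses rounded to six decimals (¹²C = 12 Da exactly, ¹H = 1.007825 Da, ¹⁴N = 14.003074 Da):

```latex
\begin{aligned}
\Delta m(\mathrm{CH_2}) &= m(^{12}\mathrm{C}) + 2\,m(^{1}\mathrm{H}) = 12.000000 + 2(1.007825) = 14.015650\ \mathrm{Da}\\
\Delta m(\mathrm{N}) &= m(^{14}\mathrm{N}) = 14.003074\ \mathrm{Da}\\
\Delta m(\mathrm{CH_2}) - \Delta m(\mathrm{N}) &= 14.015650 - 14.003074 = 0.012576\ \mathrm{Da}
\end{aligned}
```

For ions near m/z 300, a gap of 0.012576 Da is about 42 ppm of the ion mass (0.012576 / 300 × 10⁶ ≈ 41.9), comfortably above the few-ppm error of a high-resolution instrument.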

The consequence is direct: the same physical quantization that gives atoms their discrete energy levels also gives molecular reactions their discrete mass signatures. Without it, PMD shells would collapse into each other on the integer-mass axis and reactomics would be impossible. High-resolution mass spectrometry (Orbitrap, FT-ICR) is essentially a nuclear-binding-energy detector dressed up as a chemistry tool — it reads out chemistry by measuring how nucleons are arranged.

So the orbital picture is doubly grounded in physics. The shells exist because cofactor inheritance fixed which transformations early life could perform; the shells are observable because nuclear binding gives every transformation its own characteristic, quantum-mechanically distinct mass defect. Quantization at the electronic level made the chemistry possible; quantization at the nuclear level made it readable.

The evolutionary freeze

Once a cofactor became central to cellular metabolism, it became almost impossible to remove. Every downstream pathway came to depend on it; the cofactor was now infrastructure, not a choice. This is the same logic that froze the genetic code in place — local mutations cannot rewire it because the entire cell now reads it.

The consequence is striking. The high-frequency PMD spectrum is deeply conserved across all domains of life. A bacterium, a plant, and a human share roughly the same set of dominant PMDs because they share roughly the same cofactors. What differs between organisms is not the structure of the spectrum but the intensity of each shell — which reactions are upregulated, which are suppressed.

This is a strong parallel to physics. Every atom shares the same orbital structure because every atom obeys the same Schrödinger equation; every organism shares the same PMD shell structure because every organism inherited the same frozen cofactor set. Universality through a common underlying constraint is the same explanatory move in both fields.

A note for chemists

The take-home is not that metabolism is like quantum mechanics in some loose poetic sense. It is that chemistry has its own genuine quantization principle, derivable from cofactor inheritance and evolutionary lock-in. Chemists do not need to borrow the prestige of physics to make this argument — the discreteness of the metabolic reaction set is a phenomenon in its own right, with its own physical mechanism (cofactor-bounded catalytic feasibility) and its own conservation law (evolutionary freeze).

If physics gets to say "every atom in the universe has discrete energy levels because the Schrödinger equation is universal," chemistry gets to say "every cell on Earth has discrete reaction levels because the cofactor pool is universal." The shapes of those two statements are the same, and so is their explanatory weight.

Paired mass distance and chemical reactions

The PMD concept

A paired mass distance (PMD) is the absolute difference in accurate mass between two ions detected in the same mass spectrometry dataset. For example:

PMD (Da) | Formula change | Reaction type
2.0157 | H₂ | Reduction / hydrogenation
14.0157 | CH₂ | Methylation, chain elongation
15.9949 | O | Oxidation (single oxygen insertion, including hydroxylation)
18.0106 | H₂O | Hydration / dehydration
28.0313 | C₂H₄ | Ethylation
42.0106 | C₂H₂O | Acetylation
79.9663 | HPO₃ | Phosphorylation
162.0528 | C₆H₁₀O₅ | Hexose addition (glycosylation)
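Each PMD in the table is simply the monoisotopic mass of the net formula change. The sketch below recomputes the listed values from standard monoisotopic atomic masses. The tiny formula parser (element symbols followed by optional counts, no brackets or charges) is a simplification made for this illustration and is not part of the pmd package.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope; 12C is 12 by definition.
MONO = {
    "C": 12.000000,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376199,
    "S": 31.97207117,
}


def formula_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C6H10O5' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


# Net formula changes from the PMD table.
changes = ["H2", "CH2", "O", "H2O", "C2H4", "C2H2O", "HPO3", "C6H10O5"]
for f in changes:
    print(f"{f:>8}  {formula_mass(f):.4f}")
```

The same function can be used to check any candidate PMD before adding it to a custom reaction list.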

When two ions have a PMD matching a known reaction, they are candidates for a substrate–product pair of that reaction. When many such pairs are found in a dataset, the reaction is inferred to be active in the biological or chemical system under study.
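A minimal sketch of this matching step in Python. It assumes a hypothetical list of m/z values from ions that share the same adduct type (so ion-to-ion differences equal neutral mass differences) and a fixed absolute tolerance in Da; both the peak values and the 0.002 Da tolerance are illustrative choices, not settings taken from the pmd package.

```python
from collections import Counter
from itertools import combinations

# Reference PMDs (Da) from the table above, keyed by reaction label.
REACTIONS = {
    "H2 (reduction)": 2.0157,
    "CH2 (methylation)": 14.0157,
    "O (oxidation)": 15.9949,
    "H2O (hydration)": 18.0106,
    "C2H2O (acetylation)": 42.0106,
    "C6H10O5 (glycosylation)": 162.0528,
}


def match_pairs(mz_values, reactions, tol=0.002):
    """Return (low_mz, high_mz, label) for every ion pair whose mass
    difference lies within `tol` Da of a reference PMD."""
    hits = []
    for a, b in combinations(sorted(mz_values), 2):
        diff = b - a
        for label, pmd in reactions.items():
            if abs(diff - pmd) <= tol:
                hits.append((a, b, label))
    return hits


# Hypothetical peak list (m/z), chosen so that some pairs match.
peaks = [180.0634, 194.0790, 196.0583, 342.1162, 222.0740]
pairs = match_pairs(peaks, REACTIONS)
counts = Counter(label for _, _, label in pairs)  # pairs supporting each reaction
print(len(pairs))  # 4
```

The all-pairs loop is quadratic in the number of peaks, which is fine for illustration; for peak lists with thousands of features one would sort the masses and search only a narrow window around each expected partner.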

PMD network

Ion pairs connected by chemically meaningful PMDs form a PMD network. Nodes are ions (detected m/z values); edges are labeled by the PMD (i.e., the reaction type). The PMD network summarizes the full reaction landscape of a sample in a single data structure.

Key properties of the PMD network:

  • It is computable from any untargeted LC-MS dataset without annotation.
  • Node degree (the number of PMD edges attached to an ion) indicates which ions take part in the most reactions, a rough proxy for the most metabolically active compounds.
  • Comparison of PMD networks between conditions reveals which reactions are up- or down-regulated, analogous to differential expression analysis.
  • Subnetworks often correspond to known biochemical pathways, providing a path to biological interpretation.

The PMD network is constructed using the getchain() function in the pmd R package, which traces reaction chains through the ion list by following consecutive PMD edges.
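The network idea itself can be sketched in a few lines of Python. This is not how getchain() is implemented; it simply assumes a list of already-matched, labeled ion pairs (hypothetical values), stores them as an undirected adjacency map, reads node degree off that map, and groups nodes into connected subnetworks with a breadth-first search.

```python
from collections import defaultdict, deque

# Hypothetical matched pairs: (m/z of ion A, m/z of ion B, reaction label).
edges = [
    (180.0634, 194.0790, "CH2"),
    (180.0634, 196.0583, "O"),
    (194.0790, 208.0947, "CH2"),
    (342.1162, 504.1690, "C6H10O5"),
]


def build_network(edges):
    """Undirected adjacency map: node -> list of (neighbour, reaction label)."""
    graph = defaultdict(list)
    for a, b, label in edges:
        graph[a].append((b, label))
        graph[b].append((a, label))
    return graph


def connected_subnetworks(graph):
    """Group nodes into connected components using breadth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        queue, component = deque([start]), []
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour, _ in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


graph = build_network(edges)
degree = {node: len(nbrs) for node, nbrs in graph.items()}
subnetworks = connected_subnetworks(graph)
print(degree[180.0634], len(subnetworks))  # 2 2
```

Here the two CH₂ edges form a short methylation chain (180.0634 → 194.0790 → 208.0947), and the glycosylation pair stays a separate subnetwork.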

Reaction-level quantification: what makes reactomics an omics

The single most important — and most under-appreciated — feature of reactomics is quantification at the reaction level, without compound identification. Much of the published work using PMDs has focused on building reaction networks and then interpreting individual nodes by going back to compound annotation. That is useful, but it leaves the analysis tied to identification. If a reaction's two end-points must be named before the reaction can be counted, the workflow has not really moved past traditional metabolomics — it has only added a graph layer on top.

Reactomics was proposed precisely to bypass that bottleneck. The unit of analysis is the reaction, not the molecule. A particular PMD — say 15.9949 Da, oxygen addition — may appear thousands of times across an untargeted dataset, on hundreds of substrate–product ion pairs. Each occurrence is one observed instance of that reaction happening. By measuring how active the PMD as a whole is across samples, one can quantify oxidation activity, methylation activity, or glycosylation activity directly, with no compound list required. This is what makes the approach an omics: reactions are the analytes.

The getreact() function in the pmd R package implements this idea. For each ion pair connected by a target PMD, it examines how the pair behaves across samples and chooses one of two quantification modes accordingly.

Static reactions: substrate and product change together

Sometimes the substrate and product of a reaction rise and fall together across samples — the ratio between them stays roughly constant while their absolute intensities both go up or down. Biologically, this is the picture of a reaction whose enzyme is not the rate-limiting step. The enzyme operates at a stable conversion efficiency, and what changes between samples is how much substrate is being supplied upstream or how much product is being drawn off downstream.

For these static reactions, the most informative quantity is the total throughput: substrate intensity plus product intensity. A larger sum means the whole substrate–product pool is larger, i.e., upstream supply is higher. Differences in this sum between groups of samples reveal which reactions are being regulated upstream or downstream of the catalytic step.
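A minimal sketch of static quantification in plain Python, assuming a hypothetical intensity table (one list of per-sample intensities per peak) and a known set of substrate–product pairs for a single PMD. Per pair, the per-sample value is substrate plus product; summing these over all pairs of the same PMD to obtain one reaction-level value is an assumption of this sketch, not necessarily how getreact() aggregates.

```python
# Hypothetical peak intensities across three samples (one list per peak).
intensity = {
    "M180": [100.0, 120.0, 200.0],
    "M196": [50.0, 60.0, 100.0],  # +O partner of M180
    "M300": [10.0, 12.0, 20.0],
    "M316": [5.0, 6.0, 10.0],     # +O partner of M300
}
# Substrate-product pairs sharing the same PMD (+O, 15.9949 Da).
oxidation_pairs = [("M180", "M196"), ("M300", "M316")]


def static_pair(intensity, substrate, product):
    """Per-sample throughput of one pair: substrate + product."""
    return [s + p for s, p in zip(intensity[substrate], intensity[product])]


def static_reaction(intensity, pairs):
    """Reaction-level value per sample: throughput summed over all pairs."""
    per_pair = [static_pair(intensity, s, p) for s, p in pairs]
    return [sum(values) for values in zip(*per_pair)]


oxidation_activity = static_reaction(intensity, oxidation_pairs)
print(oxidation_activity)  # [165.0, 198.0, 330.0]
```

In this toy data the product/substrate ratio is 0.5 in every sample, which is the co-varying pattern that makes the static readout appropriate; the rising sum then reflects a larger substrate–product pool in the third sample.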

Dynamic reactions: substrate and product change independently

In other cases, substrate and product do not move together. The ratio between them shifts from sample to sample. This is what one expects when the enzyme itself is the regulated component — its activity or abundance changes between samples, so the conversion of substrate into product proceeds at different efficiencies even at similar substrate levels. Substrate accumulates when the enzyme is suppressed; product accumulates when it is induced.

For these dynamic reactions, the meaningful quantity is the ratio, with the more stable peak in the numerator (acting as an internal reference) and the more variable peak in the denominator. The resulting per-sample value tracks how the actively-changing partner moves relative to the stable reference, isolating changes that originate in catalytic activity from sample-level abundance shifts.
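A minimal sketch of the dynamic readout. Which peak counts as "more stable" is not pinned down above; here it is assumed to be the peak with the lower coefficient of variation across samples, and the intensities are hypothetical (and must be positive).

```python
from statistics import mean, pstdev


def cv(values):
    """Coefficient of variation: population standard deviation / mean."""
    return pstdev(values) / mean(values)


def dynamic_pair(a, b):
    """Per-sample ratio with the more stable peak (lower CV) as numerator
    and the more variable peak as denominator."""
    stable, variable = (a, b) if cv(a) <= cv(b) else (b, a)
    return [s / v for s, v in zip(stable, variable)]


# Hypothetical pair: substrate roughly constant, product rising across samples.
substrate = [100.0, 100.0, 100.0, 100.0]
product = [20.0, 40.0, 50.0, 80.0]
ratios = dynamic_pair(substrate, product)
print(ratios)  # [5.0, 2.5, 2.0, 1.25]
```

Because the stable peak is picked from the data, the argument order does not matter: dynamic_pair(product, substrate) gives the same ratios.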

Two regulation regimes, two readouts

Together, the static and dynamic modes cover the two basic ways a reaction's quantitative signal can encode biological regulation:

  • Static PMD ⇒ upstream/downstream control. The enzyme is operating stably; what changes is the supply of substrate or the removal of product. Quantify by intensity sum.
  • Dynamic PMD ⇒ enzyme-level control. Enzyme activity is the variable; substrate supply is roughly constant. Quantify by ratio.
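One way to decide between the two regimes, sketched in Python: compute the product/substrate ratio in every sample and call the pair static if that ratio barely varies. The coefficient-of-variation criterion and the 0.2 cutoff are assumptions made for illustration; getreact() may use a different rule.

```python
from statistics import mean, pstdev


def classify_pair(substrate, product, cutoff=0.2):
    """Return 'static' if the product/substrate ratio is nearly constant across
    samples (coefficient of variation below `cutoff`), else 'dynamic'."""
    ratio = [p / s for s, p in zip(substrate, product)]
    ratio_cv = pstdev(ratio) / mean(ratio)
    return "static" if ratio_cv < cutoff else "dynamic"


# Hypothetical pairs measured in four samples: (substrate, product).
co_varying = ([100.0, 200.0, 300.0, 400.0], [50.0, 100.0, 150.0, 200.0])
independent = ([100.0, 100.0, 100.0, 100.0], [20.0, 40.0, 50.0, 80.0])
print(classify_pair(*co_varying))   # prints static: quantify by intensity sum
print(classify_pair(*independent))  # prints dynamic: quantify by ratio
```

The cutoff sets how much ratio drift is tolerated before a pair is treated as enzyme-regulated; loosening it (e.g. cutoff=1.0) would reclassify the second pair as static.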

This is conceptually parallel to metabolic control analysis, but it is operationalised entirely from untargeted LC-MS data — no kinetic measurements, no isotopically labelled tracers, no compound identification required.

Why this is the part of reactomics worth pushing forward

The PMD network has rightly received attention as a way to organise untargeted MS data. But network construction alone does not free the analysis from compound identification: to interpret a node, the molecule still has to be named. Reaction-level quantification is what separates reactomics from "yet another network method" — the reactions themselves carry quantitative biological meaning, even when the molecules at their endpoints remain unknown. Treating that reaction layer as the analyte, rather than as a stepping-stone toward compound annotation, is the part of reactomics that we believe is most worth developing further.

Methods and tools

Computing PMDs

PMD analysis requires accurate mass measurements, typically from high-resolution instruments such as Orbitrap or Q-TOF mass spectrometers. The mass accuracy required to distinguish between nominally isobaric reactions (e.g., CO at 27.9949 Da vs. C₂H₄ at 28.0313 Da) is approximately 5 ppm or better.
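A back-of-the-envelope check of whether a given mass accuracy can tell two candidate PMDs apart. The sketch assumes a simple worst-case error model: each peak's m/z is off by at most the stated ppm, so the PMD error is bounded by the sum of both peaks' absolute errors, and two assignments are unambiguous only if their error windows do not overlap.

```python
def pmd_error_bound(mz_low, mz_high, ppm):
    """Worst-case absolute error (Da) of a PMD when each peak's m/z is
    known only to within +/- `ppm` parts per million."""
    return (mz_low + mz_high) * ppm * 1e-6


def separable(pmd_a, pmd_b, mz_low, ppm):
    """True if the gap between two candidate PMDs exceeds the combined
    worst-case error windows (+/- err around each candidate)."""
    gap = abs(pmd_a - pmd_b)
    # Use the larger PMD for the partner peak so the bound stays conservative.
    err = pmd_error_bound(mz_low, mz_low + max(pmd_a, pmd_b), ppm)
    return gap > 2 * err


print(separable(27.9949, 28.0313, 1000, 5))  # True  (CO vs C2H4)
print(separable(14.0157, 14.0031, 500, 5))   # True  (CH2 vs N)
print(separable(14.0157, 14.0031, 1000, 5))  # False (CH2 vs N)
```

Under this conservative model, 5 ppm separates CO from C₂H₄ even at m/z 1000, while the much tighter CH₂ versus N pair is resolved at m/z 500 but not at m/z 1000.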

The workflow is:

  1. Peak detection — extract a peak list with accurate m/z values from raw LC-MS data (e.g., using XCMS, MZmine, or similar tools).
  2. PMD calculation — compute all pairwise mass differences; filter to retain only those matching a curated list of chemically meaningful reactions.
  3. Network construction — build the PMD network using getchain(), which links ions into reaction chains.
  4. Quantitative analysis — use getreact() to quantify reaction activity in each sample. Static reactions are quantified by intensity sum (substrate + product); dynamic reactions by ratio (stable peak / variable peak). See Reaction-level quantification above for the conceptual rationale.

The pmd R package

The pmd package provides a complete implementation of reactomics analysis in R:

  • getpaired() — identifies ion pairs linked by specific PMDs
  • getchain() — constructs the PMD network by tracing reaction chains through the ion list
  • getreact() — quantifies reaction activity per sample, with method = "static" (intensity sum, for upstream/downstream-regulated reactions) or method = "dynamic" (stable/variable peak ratio, for enzyme-regulated reactions); returns a reaction-by-sample matrix for statistical comparison
  • getstd(): selects the independent ions within each retention-time cluster based on PMD and isotope relationships (the ion-selection step of the globalstd workflow)
  • Visualization functions for network plots and reaction heatmaps

The package handles both positive and negative ionization mode data and integrates with standard metabolomics workflows.

PMD databases and reaction lists

Reactomics relies on curated PMD reference lists corresponding to known biochemical reactions. The pmd package ships several built-in datasets:

  • keggrall — PMDs derived from KEGG enzyme-catalyzed reactions, with reaction formula and KEGG ID
  • hmdb — high-frequency PMDs from HMDB human metabolite entries
  • omics — a merged multi-database reaction PMD table covering major omics reactions
  • sda — common PMDs for substructure differences, ion replacements, and reactions
  • MaConDa — mass spectrometry contaminant PMDs for background checking

In-source reactions and independent ion selection

In-source reactions — adduct formation, in-source fragmentation, and isotope patterns — also produce characteristic mass differences between ion pairs in an LC-MS dataset. These are analytical artefacts rather than biological reactions, yet they follow the same PMD logic: a fixed mass difference between two ions encodes a specific process connecting them.

This observation underlies the globalstd algorithm, introduced in Yu, Olkowicz & Pawliszyn (2019) and implemented in the pmd package. Crucially, globalstd is data-driven: it does not rely on a predefined adduct list. Instead, it discovers which mass differences are genuinely widespread in the current dataset and uses that evidence to define redundancy. The algorithm works in three steps:

  1. Retention-time clustering — co-eluting ions are grouped together as likely originating from the same compound.
  2. Data-driven high-frequency PMD detection — pairwise mass differences are computed within each RT group. PMDs that appear at high frequency across many groups are inferred to represent widely-occurring adducts and neutral losses (e.g., Na/H exchange, ¹³C isotope, common solvent adducts). Because every compound that undergoes a given in-source reaction contributes the same PMD, these mass differences accumulate to anomalously high counts — a signal that is entirely derived from the data itself.
  3. Independent ion screening — using the discovered high-frequency PMDs, one representative ion is retained per compound cluster; redundant adducts, isotopologue peaks, and in-source fragments are removed.

The result is a non-redundant set of independent ions that preserves full chemical diversity while eliminating peak multiplicity. No prior knowledge of which adducts to expect is required.
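The three steps can be illustrated with a deliberately simplified Python sketch. It is not the globalstd implementation in the pmd package: the gap-based RT grouping, rounding PMDs to two decimals, the "seen in at least two RT groups" frequency rule, and keeping the most intense ion of each linked cluster are all assumptions chosen for clarity, and the peak list is hypothetical.

```python
from collections import Counter
from itertools import combinations


def rt_groups(peaks, rt_tol=0.1):
    """Step 1: sort peaks (mz, rt, intensity) by retention time and start a
    new group wherever the gap to the previous peak exceeds rt_tol."""
    ordered = sorted(peaks, key=lambda p: p[1])
    groups, current = [], [ordered[0]]
    for prev, peak in zip(ordered, ordered[1:]):
        if peak[1] - prev[1] > rt_tol:
            groups.append(current)
            current = []
        current.append(peak)
    groups.append(current)
    return groups


def frequent_pmds(groups, digits=2, min_groups=2):
    """Step 2: count in how many RT groups each rounded PMD occurs; PMDs seen
    in at least min_groups groups are treated as in-source relationships."""
    counts = Counter()
    for group in groups:
        counts.update({round(abs(a[0] - b[0]), digits)
                       for a, b in combinations(group, 2)})
    return {pmd for pmd, n in counts.items() if n >= min_groups}


def _root(parent, node):
    """Follow parent links to the representative of node's cluster."""
    while parent[node] != node:
        node = parent[node]
    return node


def independent_ions(groups, high_freq, digits=2):
    """Step 3: within each RT group, link ions whose PMD is high-frequency
    and keep only the most intense ion of every linked cluster."""
    kept = []
    for group in groups:
        parent = {peak: peak for peak in group}
        for a, b in combinations(group, 2):
            if round(abs(a[0] - b[0]), digits) in high_freq:
                parent[_root(parent, a)] = _root(parent, b)
        clusters = {}
        for peak in group:
            clusters.setdefault(_root(parent, peak), []).append(peak)
        kept.extend(max(c, key=lambda p: p[2]) for c in clusters.values())
    return kept


# Hypothetical peaks (m/z, RT in min, intensity): two compounds, each with
# [M+H]+, [M+Na]+ and a 13C isotopologue, plus one unrelated ion.
peaks = [
    (181.0707, 1.00, 1000.0), (203.0526, 1.01, 400.0), (182.0740, 1.01, 70.0),
    (301.1410, 5.00, 800.0), (323.1229, 5.01, 300.0), (302.1443, 5.01, 60.0),
    (250.0000, 5.02, 500.0),
]
groups = rt_groups(peaks)
high = frequent_pmds(groups)
kept = independent_ions(groups, high)
print(sorted(p[0] for p in kept))  # [181.0707, 250.0, 301.141]
```

In the toy data the Na/H exchange (21.98 Da) and ¹³C (1.00 Da) differences recur in both RT groups, so they are flagged as high-frequency without being listed in advance, and each compound collapses to its most intense ion; the unrelated ion at m/z 250 survives because its differences occur only once.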

Applications in drug metabolism

Drug metabolism generates a predictable set of biotransformation products. Phase I reactions (oxidation, reduction, hydrolysis) and Phase II reactions (conjugation) each correspond to specific PMDs. Reactomics enables untargeted drug metabolism profiling: given a sample from a drug-treated organism, the PMD network can identify which Phase I and Phase II transformations occurred without pre-specifying which metabolites to look for.

Applications in environmental transformation

Environmental samples contain complex mixtures of chemicals undergoing abiotic and biotic transformations. By computing the PMD network of a water, sediment, or biological tissue sample, one can identify which transformation reactions are active without knowing the identities of the parent compounds.

Applications in endogenous metabolomics

In human and animal metabolomics, reactomics connects measured metabolite abundances to the enzyme activities that produced them. The PMD network of a plasma or urine sample reflects the metabolic state of the organism — which biosynthetic and catabolic reactions are most active.

Monthly literature collection

New papers related to reactomics and PMD-based analysis, collected monthly from PubMed.

2026-04

All publications

Full collection of publications using or extending PMD-based reactomics, from the original paper (2020) to present. Updated monthly.

Publications are grouped by topic: Methods and tools; In-source reactions and independent ion selection; PMD network; Applications in environmental transformation; DOM transformation; Applications in drug metabolism; Applications in endogenous metabolomics; Reviews.

75 papers total. Last updated 2026-05-03.

Monthly archive