
Reactomics and Paired Mass Distance Analysis

Miao Yu

2018/11/11 (updated: 2019-12-10)

1 / 45

MS based Target/Untargeted Analysis

Purpose         Target            Untargeted
Target          Known compounds   Unknown compounds
MS mode         SIM/MRM           Full scan
Quantification  Absolute          Relative
Qualification   Standards         Semi-qualitative
Study           Validation        Discovery
Information     Subset analysis   Global analysis
  • Target analysis and untargeted analysis are designed for different purposes
  • They can be part of a single workflow in some studies
2 / 45

Workflow for Untargeted Analysis

  • [Sample collection]
  • [Pretreatment]
  • [Instrumental analysis (Mass Spectrometry)]
  • [From raw data to peaks in each sample]
  • Align peaks across samples to correct retention time shifts
  • Fill in missing peaks in the aligned peak list
  • Peak list
    • Rows: peaks, labeled by mass-to-charge ratio @ retention time
    • Columns: samples
  • Annotation for peaks
  • Validation by standards (targeted analysis)
  • [Prediction/Inference for scientific purpose]
3 / 45

Demo of XC-MS Data

Demo of GC/LC-MS data

4 / 45

Demo of Peaks

5 / 45

Demo of Retention Time Correction

Demo of Obiwarp

Prince, J. T., & Marcotte, E. M. (2006). Chromatographic Alignment of ESI-LC-MS Proteomics Data Sets by Ordered Bijective Interpolated Warping. Analytical Chemistry, 78(17), 6140–6152. doi:10.1021/ac0605344

6 / 45

Loess alignment uses local regions to align the peaks, whereas obiwarp aligns runs with ordered bijective interpolated warping, a form of dynamic time warping.

Raw data from two LC-MS runs, whether successive fractions or across different biological conditions, (1) is interpolated into a (2) uniform matrix (or rectilinear matrix). (3) An all vs all similarity matrix of the spectra is constructed. (4) The similarity matrix distribution is mean centered and normalized by the standard deviation. (5) Dynamic programming is performed by adding similarity scores along a recursively generated optimal path while off-diagonal transitions are penalized by either a local or global gap penalty to give (6) an additive score matrix. (7) Pointers are kept in a traceback matrix used to deliver (8) the optimal alignment path. (9) High scoring points in the optimal path are selected to create a bijective (one-to-one) mapping, which is used as anchors for PCHIP interpolation to generate a smooth warp function.

(II) Verification and optimization. (11) MS/MS spectra from the raw MS runs are searched via SEQUEST and Peptide/Protein Prophet to determine peak identities. (12) High-confidence identifications are selected and (13) the overlapping set of peptide identifications (after filtering outliers) is used as the alignment standard. (14) The warp function produced through the comparison of MS data is applied to the standards. (15) The ideal alignment would shift all standards to the diagonal. The accuracy of an alignment is calculated as the sum of the squared residuals from the diagonal.
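
Steps (3) to (8) of the alignment can be sketched in a few lines of Python. This is a minimal illustration, not the obiwarp implementation: the cosine similarity, the single global gap penalty, the forced start and end at the first and last scans, and the names `cosine` and `align` are all assumptions made for the example. The anchor selection and PCHIP interpolation of step (9) are left out.

```python
from statistics import mean, pstdev

def cosine(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def align(run_a, run_b, gap=1.0):
    """Return the best path [(i, j), ...] pairing scans of two non-empty runs."""
    n, m = len(run_a), len(run_b)
    # (3) all-vs-all similarity matrix of the spectra
    sim = [[cosine(a, b) for b in run_b] for a in run_a]
    # (4) mean-center and scale by the standard deviation
    flat = [s for row in sim for s in row]
    mu, sd = mean(flat), pstdev(flat) or 1.0
    sim = [[(s - mu) / sd for s in row] for row in sim]
    # (5)-(7) dynamic programming with a traceback matrix;
    # off-diagonal moves pay the (assumed global) gap penalty
    score = [[float("-inf")] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    score[0][0] = sim[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:
                options.append((score[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:
                options.append((score[i - 1][j] - gap, (i - 1, j)))
            if j > 0:
                options.append((score[i][j - 1] - gap, (i, j - 1)))
            best, prev = max(options)
            score[i][j] = best + sim[i][j]
            back[i][j] = prev
    # (8) follow the pointers back from the last cell
    path = [(n - 1, m - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        i, j = path[-1]
        path.append(back[i][j])
    return path[::-1]

# Toy runs (made up): run_b repeats the first scan, i.e. it elutes later
run_a = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
run_b = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
path = align(run_a, run_b)
```

In the toy data the path pairs the first scan of `run_a` with both copies in `run_b` and then follows the diagonal, which is the kind of shift a warp function would later smooth.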

Demo of Peaks Filling

7 / 45

Demo of Many XC-MS Data

Demo of many GC/LC-MS data

8 / 45

Major issue

Annotation is like finding the real cat in this picture

9 / 45

Annotation for peaks

  • Predefined rules between peaks/features and compounds

  • Generate pseudo-spectrum

  • Search database or in silico prediction to identify compounds

  • Build the links between compounds by pathway/network analysis

Features -> Compounds -> Relationship among compounds

  • Problems

    • Time consuming - too many peaks
    • Sensitivity - DDA or MS/MS
    • Standards coverage
10 / 45

My Idea

Features -> Compounds -> Relationship among compounds

  • You don't ACTUALLY need people's (compounds') names to know their relationships

From Wikimedia Commons: A Sunday on La Grande Jatte, Georges Seurat

11 / 45
  • All compounds in a metabolomics study form a snapshot of metabolites and parent compounds
  • We can find the relationships among people without knowing each person's name
  • Mass spectrometry can measure the distances without knowing the compounds' names

My Idea

Features -> Compounds -> Relationship among compounds

  • Mass spectrometry can directly measure reactions

12 / 45
  • Annotation is not really necessary for certain scientific problems
  • What matters is the relationships among compounds, i.e. the reactions

Why Reactions?

  • Unit: Gene(5) < Protein(20+2) < Metabolite(100K) < Compound(100M)

  • Combination: Gene(20,000-25,000) < Protein(20,000-25,000) < Compound(???)

  • A combination of small molecules is a chemical reaction, i.e. a paired mass distance

13 / 45

Why PMD?

$$\Delta m = Z m_H + N m_n - M$$

  • The missing mass was converted into energy ($E = mc^2$) and emitted when the atom was formed

  • Atoms -> Compounds -> Mass distances between compounds

  • Paired mass distances (PMDs) are unique

  • High resolution mass spectrometry WINs

14 / 45
  • Mass defects carry over from atoms to paired mass distances
  • HRMS can measure PMDs for qualitative analysis
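
As a worked example of the mass defect equation on this slide, take helium-4 ($Z = 2$, $N = 2$). The masses below are rounded tabulated values in unified atomic mass units and are not from the slides; $m_H$ is the hydrogen atom mass, $m_n$ the neutron mass and $M$ the atomic mass of helium-4.

$$\Delta m = 2(1.007825) + 2(1.008665) - 4.002603 = 0.030377\ \mathrm{u}$$

$$E = \Delta m\, c^2 \approx 0.030377 \times 931.494\ \mathrm{MeV} \approx 28.3\ \mathrm{MeV}$$

The defect is small compared with the atomic mass, but it differs from element to element, which is why distances between accurate masses can be told apart.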

Sources of PMDs in the real data

Where is PMD?

  • Isotopologues

    • [M]+ [M+1]+
    • 1.006 Da
  • In-source reaction

    • [M+H]+ [M+Na]+
    • 21.982 Da
  • Homologous series

    • Lipid [CH2]
    • 14.016 Da
  • Xenobiotic metabolism

    • Phase I hydroxylation
    • 15.995 Da
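
A minimal sketch of how such PMDs could be searched for in a peak list: every pair of m/z values is checked against a known distance within a tolerance. The function name `find_pmd_pairs`, the 0.002 Da tolerance and the m/z values are assumptions made up for illustration.

```python
from itertools import combinations

def find_pmd_pairs(mzs, pmd, tol=0.002):
    """Return (low, high) m/z pairs whose distance is within tol of pmd."""
    return [
        (low, high)
        for low, high in combinations(sorted(mzs), 2)
        if abs((high - low) - pmd) <= tol
    ]

# Hypothetical m/z values, for illustration only
mzs = [181.0707, 203.0526, 195.0863, 211.0812, 300.1234]

sodium_pairs = find_pmd_pairs(mzs, 21.982)  # [M+H]+ vs [M+Na]+
ch2_pairs = find_pmd_pairs(mzs, 14.016)     # homologous series step
oxygen_pairs = find_pmd_pairs(mzs, 15.995)  # hydroxylation
```

The same scan works for any PMD in the list above; only the distance and the tolerance change.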
15 / 45

Quantitative and Qualitative analysis for Reaction

KEGG reaction database

PMD      Freq  Example
1.008    2037  NAD(+) + succinate <=> fumarate + H(+) + NADH
2.016    1748  NAD(+) + propanoyl-CoA <=> acryloyl-CoA + H(+) + NADH
15.995   1170  ATP + GDP <=> ADP + GTP
13.979   1122  deoxynogalonate + O2 <=> H(+) + H2O + nogalonate
17.003    929  H2O + hypotaurine + NAD(+) <=> H(+) + NADH + taurine
79.966    750  ATP + H2O <=> ADP + H(+) + phosphate
14.016    611  acetyl-CoA + propanoate <=> acetate + propanoyl-CoA
0         533  L-glutamate <=> D-glutamate
162.053   365  H2O + lactose <=> D-galactose + D-glucose
18.011    361  L-serine <=> 2-aminoprop-2-enoate + H2O
  • Real reactions contain ions
  • Skewed by known reactions
16 / 45

Quantitative and Qualitative analysis for Reaction

HMDB compounds database

PMD     C  H  O
14.016  1  2  0
2.016   0  2  0
28.031  2  4  0
26.016  2  2  0
15.995  0  0  1
12      1  0  0
56.063  4  8  0
42.047  3  6  0
30.011  1  2  1
24      2  0  0
  • Dominated by C, H and O
  • Structure or reaction?
17 / 45
  • We need a quantitative, mass-ready database for PMD annotation

Quantitative and Qualitative analysis for Reaction

HMDB compounds database

Change  PMD     Frequency  Accuracy  PMD    Frequency  Accuracy
+CH2    14.016  4934       0.9755    14.02  8003       0.6014
+H2     2.016   4909       0.9703    2.02   7959       0.5984
+C2H4   28.031  4878       0.9783    28.03  7799       0.6119
+C2H2   26.016  4229       0.9775    26.02  7343       0.5630
+O      15.995  4214       0.9808    15.99  7731       0.5346
+C      12.000  3861       0.9826    12.00  7145       0.5310
+C4H8   56.063  3861       0.9653    56.06  6699       0.5564
+C3H6   42.047  3771       0.9737    42.05  6558       0.5599
+CH2O   30.011  3698       0.9440    30.01  6761       0.5163
+C2     24.000  3689       0.9810    24.00  6963       0.5197
18 / 45

Quantitative and Qualitative analysis for Reaction

HMDB compounds database

Change  PMD   Frequency  Accuracy  PMD  Frequency  Accuracy
+CH2    14.0  50419      0.0955    14   156245     0.0354
+H2     2.0   50467      0.0944    2    156260     0.0352
+C2H4   28.0  50797      0.0939    28   155410     0.0356
+C2H2   26.0  48517      0.0852    26   154346     0.0309
+O      16.0  51278      0.0806    16   155811     0.0307
+C      12.0  49335      0.0769    12   155339     0.0283
+C4H8   56.1  36417      0.1026    56   151894     0.0286
+C3H6   42.0  49808      0.0737    42   153764     0.0275
+CH2O   30.0  51241      0.0681    30   154369     0.0260
+C2     24.0  48099      0.0752    24   154278     0.0273
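
The tables show frequency rising and accuracy falling as the PMD is rounded to fewer decimals. One plausible reason can be sketched with a toy example: distinct elemental differences collapse onto the same rounded value. The function name `pmd_frequency` and the three masses (a base mass, the base plus CH2, the base plus N) are assumptions chosen for illustration, not HMDB data.

```python
from collections import Counter
from itertools import combinations

def pmd_frequency(masses, digits):
    """Count how often each rounded pairwise mass distance occurs."""
    return Counter(
        round(abs(a - b), digits) for a, b in combinations(masses, 2)
    )

# Hypothetical masses: base, base + CH2 (14.01565), base + N (14.00307)
masses = [100.0, 114.01565, 114.00307]

fine = pmd_frequency(masses, 3)    # three decimals keep CH2 and N apart
coarse = pmd_frequency(masses, 0)  # unit resolution merges them
```

With three decimals the CH2 and N distances are counted separately; at unit resolution both land on 14, so the count doubles while only one of the two pairs is a true CH2 difference.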
19 / 45

Quantitative and Qualitative analysis for Reaction

Static vs. dynamic

  • Static mass pairs: the paired intensity ratio is stable across samples
  • Dynamic mass pairs: the paired intensity ratio changes across samples
  • For example, [A,B], [C,D] and [E,F] are involved in the same PMD:
A     B    Intensity ratio  C    D   Intensity ratio  E    F    Intensity ratio
100   50   2:1              100  50  2:1              30   40   3:4
1000  500  2:1              10   95  2:19             120  160  3:4
  • [A,B] and [E,F] can be used for quantitative analysis of that PMD (ratio RSD below the 30% cutoff)
  • [C,D] can be used to check the dynamics of a specific reaction
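
The example on this slide can be checked with a short sketch that labels each pair by the relative standard deviation (RSD) of its intensity ratio across samples. The 30% cutoff comes from the slide; using the sample standard deviation and the names `ratio_rsd` and `classify` are assumptions.

```python
from statistics import mean, stdev

def ratio_rsd(intensities_a, intensities_b):
    """RSD (%) of the a/b intensity ratio across samples."""
    ratios = [a / b for a, b in zip(intensities_a, intensities_b)]
    return 100 * stdev(ratios) / mean(ratios)

def classify(pairs, cutoff=30.0):
    """Label each pair 'static' (RSD <= cutoff) or 'dynamic'."""
    return {
        name: "static" if ratio_rsd(a, b) <= cutoff else "dynamic"
        for name, (a, b) in pairs.items()
    }

# Intensities of the two samples from the table on this slide
pairs = {
    "A,B": ([100, 1000], [50, 500]),
    "C,D": ([100, 10], [50, 95]),
    "E,F": ([30, 120], [40, 160]),
}
labels = classify(pairs)
```

[A,B] and [E,F] keep the same ratio in both samples and come out as static, while the ratio of [C,D] swings from 2:1 to 2:19 and is labeled dynamic.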
20 / 45
  • The response factor is the slope of the calibration curve for a given compound
  • Use the total intensity of all pairs with the same PMD
  • Ions involved in multiple reactions are counted once
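
One way to read the last two notes is sketched below: collect the ions of all pairs sharing a PMD into a set, so an ion that appears in several pairs contributes its intensity only once. The function name, ion labels and intensities are hypothetical.

```python
def pmd_total_intensity(pairs, intensity):
    """Sum the intensities of all ions in the pairs, counting each ion once."""
    ions = set()
    for a, b in pairs:
        ions.update((a, b))
    return sum(intensity[ion] for ion in ions)

# Hypothetical pairs for one PMD; ion2 takes part in two pairs
pairs_for_pmd = [("ion1", "ion2"), ("ion2", "ion3")]
intensity = {"ion1": 100.0, "ion2": 50.0, "ion3": 25.0}

total = pmd_total_intensity(pairs_for_pmd, intensity)
```

Summing per pair instead would give 225 here, because `ion2` would be counted twice.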

Reactomics Application

Exhaustive screen

21 / 45

Sensitivity matters

  • Target analysis can capture peaks with low intensity

  • Untargeted analysis loses sensitivity because it tries to capture all peaks

  • Send unknown but independent peaks for MS/MS

22 / 45

How many real compounds among features?

Mahieu, N. G., & Patti, G. J. (2017). Systems-Level Annotation of a Metabolomics Data Set Reduces 25 000 Features to Fewer than 1000 Unique Metabolites. Analytical Chemistry, 89(19), 10397–10406. doi:10.1021/acs.analchem.7b02380

23 / 45

Gap between features and compounds

24 / 45

GlobalStd Algorithm

Yu, M., Olkowicz, M., & Pawliszyn, J. (2019). Structure/reaction directed analysis for LC-MS based untargeted analysis. Analytica Chimica Acta, 1050, 16–24. doi:10.1016/j.aca.2018.10.062

25 / 45

GlobalStd Algorithm Step 1

Retention time cluster analysis
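
A minimal sketch of the idea behind this step, assuming a simple gap-based grouping: peaks are sorted by retention time and a new cluster starts whenever the gap to the previous peak exceeds a cutoff. This is an illustration only and may differ from the clustering the pmd package actually uses; the function name, the 10 s cutoff and the retention times are assumptions.

```python
def rt_clusters(rts, cutoff=10.0):
    """Return a cluster id per retention time (input order is preserved)."""
    order = sorted(range(len(rts)), key=lambda i: rts[i])
    labels = [0] * len(rts)
    cluster = 0
    for prev, cur in zip(order, order[1:]):
        # A gap larger than the cutoff starts a new cluster
        if rts[cur] - rts[prev] > cutoff:
            cluster += 1
        labels[cur] = cluster
    return labels

# Hypothetical retention times in seconds
rts = [61.2, 60.8, 120.5, 121.0, 300.3]
labels = rt_clusters(rts)
```

Peaks that share a cluster co-elute and are candidates for being different ions of the same compound.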

26 / 45

GlobalStd Algorithm Step 2

High frequency PMD analysis across RT clusters - example

  • Because the analysis is based on the data itself, the adducts, multiply charged ions, neutral losses and isotopologues do not need to be known in advance
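
A rough sketch of this step under simplifying assumptions: PMDs are computed only between peaks in the same retention time cluster, rounded to two decimals, and a PMD is kept when it shows up in at least a minimum number of clusters. The function name, the threshold and the peak list are hypothetical and not taken from the pmd package.

```python
from collections import Counter
from itertools import combinations

def high_frequency_pmds(peaks, digits=2, min_clusters=2):
    """peaks: (mz, cluster_id) tuples. Return {pmd: number of clusters}."""
    by_cluster = {}
    for mz, cluster in peaks:
        by_cluster.setdefault(cluster, []).append(mz)
    seen_in = Counter()
    for mzs in by_cluster.values():
        # A set makes each cluster count a given PMD only once
        pmds = {round(abs(a - b), digits) for a, b in combinations(mzs, 2)}
        seen_in.update(pmds)
    return {pmd: n for pmd, n in seen_in.items() if n >= min_clusters}

# Hypothetical peaks: (m/z, retention time cluster)
peaks = [
    (181.0707, 0), (203.0526, 0),
    (255.2319, 1), (277.2138, 1), (256.2353, 1),
    (300.1234, 2),
]
frequent = high_frequency_pmds(peaks)
```

Only 21.98 Da recurs across clusters in this toy data, so it is flagged without any prior adduct list, which matches the point of the bullet above.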

27 / 45

GlobalStd Algorithm Step 3

Independent peaks selection

28 / 45

GlobalStd Algorithm Step 3

Independent peaks selection - example

29 / 45

GlobalStd Algorithm Step 3

Why redundant?

  • ~14.3% of the peaks capture variance similar to that of all peaks
  • For CAMERA/RAMclust, the peak with the highest intensity in each pcgroup was selected as the independent peak

30 / 45
  • Similar to isotope labeling results (5% of peaks)
  • Untargeted analysis does not mean big data

Target compounds validation

Method     Independent peaks  Target compounds found
pmd        985                18
CAMERA     1297               15
RAMclust   461                12
profinder  6628               7
  • 103 compounds for validation
  • 36 compounds were found among the 6885 features from xcms
  • 7 compounds were found among the 6628 features from profinder untargeted analysis
31 / 45

Untargeted MS/MS analysis - PMDDA

  • Only use GlobalStd peaks for MS/MS analysis

    • Multiple injections
  • MS/MS spectral library annotation on GNPS

  • Comparison with Data Dependent Acquisition (DDA) (173 compounds)

    • 235 extra compounds annotated, with 59 compounds overlapping
    • Fewer contaminant ions
32 / 45
  • GNPS MS/MS annotation
  • 235:59:114 PMDDA:overlap:DDA

Untargeted MS/MS analysis - PMDDA

33 / 45
  • GNPS MS/MS annotation
  • 235:59:114 PMDDA:overlap:DDA

Untargeted MS/MS analysis - PMMD Annotation

  • Use pmd and rank of pmd for annotation

  • Intensity filter (10%); robust to noise

  • 957/1098 PMDR/HMDB QqQ data

  • some compounds share the same pmd 87%

34 / 45

Reactomics Application

Metabolites Discovery

35 / 45

Metabolites of exogenous compound

  • Environmental pollution metabolites
  • Drug metabolites

Xenobiotic metabolism

  • Phase I

    • Oxidation (R-H ⇒ R-OH, pmd 15.995 Da)
    • Reduction (R-C=O ⇒ R-C-OH, pmd 2.016 Da)
  • Phase II

    • Methylation (R-OH ⇒ R-O-CH3, pmd 14.016 Da)
    • Sulfation (R-OH ⇒ R-O-SO3H, pmd 79.957 Da)
    • Acetylation (R-OH ⇒ R-O-COCH3, pmd 42.011 Da)
    • Glucuronidation (R-NH2 ⇒ R-NH-C6H9O6, pmd 176.032 Da)
    • Glycosylation (R-OH ⇒ R-O-C6H11O5, pmd 162.053 Da)
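
Several of the PMDs above can be reproduced from the net elemental change of each reaction. The sketch below assumes rounded monoisotopic masses for C, H and O and a deliberately simple formula parser; the names `MONO`, `mono_mass` and `changes` are illustrative, and only reactions whose net change contains C, H and O alone are included.

```python
import re

# Rounded monoisotopic masses in Da (assumed values, C, H and O only)
MONO = {"C": 12.0, "H": 1.007825, "O": 15.994915}

def mono_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C2H2O'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total

# Net elemental change between product and substrate
changes = {
    "oxidation": "O",            # R-H -> R-OH
    "reduction": "H2",           # R-C=O -> R-C-OH
    "methylation": "CH2",        # R-OH -> R-O-CH3
    "acetylation": "C2H2O",      # R-OH -> R-O-COCH3
    "glycosylation": "C6H10O5",  # R-OH -> R-O-C6H11O5
}
pmds = {name: round(mono_mass(f), 3) for name, f in changes.items()}
```

The rounded results agree with the values on the slide for these five reactions; extending `MONO` with other elements would cover the remaining ones.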
36 / 45

Metabolites of TBBPA in Pumpkin

  • Mass defect analysis to screen Brominated Compounds

  • Confirmation by synthesized standards

Hou, X., Yu, M., Liu, A., Wang, X., Li, Y., Liu, J., … Jiang, G. (2019). Glycosylation of Tetrabromobisphenol A in Pumpkin. Environmental Science & Technology. doi:10.1021/acs.est.9b02122

37 / 45

Metabolites of TBBPA in Pumpkin

  • TBBPA Metabolites PMD network

Hou, X., Yu, M., Liu, A., Wang, X., Li, Y., Liu, J., … Jiang, G. (2019). Glycosylation of Tetrabromobisphenol A in Pumpkin. Environmental Science & Technology. doi:10.1021/acs.est.9b02122

38 / 45

KEGG reaction network

  • Metabolites of four compounds

39 / 45

Endogenous vs Exogenous

  • T3DB Endogenous (255) vs Exogenous (705)

  • Use top 20 high frequency PMDs

40 / 45

Reactomics Application

Biomarker Reaction

41 / 45

Lung cancer

  • MTBLS28 1005 human urine samples

  • PMD 2.02 Da shows differences between control and disease groups

42 / 45

How

Paper method vs. practical method in metabolomics

43 / 45

Software

enviGCMS package

  • Target analysis
  • Mass defect analysis

pmd package

  • Untargeted analysis
  • GlobalStd algorithm
  • Reactomics analysis

rmwf package

  • NIST 1950 data
  • Script
44 / 45

Thanks

Q&A

miao.yu@mssm.edu

45 / 45
