Structure/reaction directed analysis for environmental metabolites

class: center, middle, inverse, title-slide

# Structure/reaction directed analysis for environmental metabolites
## ACS Fall 2019 National Meeting & Expo, San Diego, CA
### Miao Yu, Georgia Dolios, Lauren Petrick
### 2019-08-27

---

## Current Workflow for environmental metabolites

- Peak/feature extraction

- Compound identification from peaks/features

- Finding the relationships among compounds

### Problem

- Time consuming - too many peaks

- Annotation - DDA or MS/MS

- Environmental metabolites - how to define them

???

Explain environmental metabolites: find the compounds, then find their metabolites in environmental samples. The compounds can be endogenous or exogenous.

P1: Too many peaks to check.
P2: Untargeted annotation is complex for both DDA and DIA, with sensitivity and data analysis issues.
P3: How do we define metabolites?

---

## Idea

#### Time consuming

- Step 1: reduce the number of peaks

#### Annotation

- Step 2: untargeted MS/MS annotation

#### Environmental metabolites

- Step 3: co-occurrence analysis

#### Method: paired mass distance (PMD) analysis

#### Sample: Standard Reference Material for Human Plasma (NIST 1950)

???
S1: reduce the number of peaks while keeping the same information
S2: untargeted search for molecular ions for MS/MS
S3: parent compounds and their metabolites should co-occur in the samples and fit some other rules
Paired mass distance can link these three steps.
The sample is SRM 1950, which is well suited to method validation and reproducible research.

---

## Idea - PMD analysis

`$$\mathrm{PMD} = \mathrm{exact\ mass}_{1} - \mathrm{exact\ mass}_{2}$$`

- [Nuclear Binding Energy](https://en.wikipedia.org/wiki/Nuclear_binding_energy)

`$$\Delta m = Zm_{H} + Nm_{n} - M$$`

- The missing mass is converted into energy (`$E=mc^2$`) and released when the atom forms

- Atoms -> Compounds -> Mass distances between compounds

- A **paired mass distance (PMD)** is specific to a difference in elemental composition, which is associated with structures/reactions

- **High-resolution** mass spectrometry wins (exact mass -> mass-to-charge ratio)

???

PMD originates from the tiny mass defect that arises when atoms form. These tiny but constant differences carry over from atoms to paired mass distances.
PMD can be described by elemental composition.
HRMS can measure PMD via mass-to-charge ratios.
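
A worked example of the mass defect formula on the slide, as a minimal sketch. The constants are standard atomic masses in unified atomic mass units; the function name `mass_defect` is only illustrative.

```python
# Mass defect: delta_m = Z*m_H + N*m_n - M, all in unified atomic mass units (u).
M_H = 1.00782503   # atomic mass of 1H
M_N = 1.00866492   # mass of a free neutron

def mass_defect(z, n, atomic_mass):
    """Return the mass defect of an atom with z protons and n neutrons."""
    return z * M_H + n * M_N - atomic_mass

carbon12 = mass_defect(6, 6, 12.0)          # 12C is exactly 12 u by definition
oxygen16 = mass_defect(8, 8, 15.99491462)   # monoisotopic mass of 16O

print(f"12C mass defect: {carbon12:.4f} u")
print(f"16O mass defect: {oxygen16:.4f} u")
```

The defect differs from element to element, which is why a mass difference measured to three or four decimals points back to an elemental composition.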

---

## Idea - PMD analysis

### Where is PMD?

.pull-left[
- Isotopologues
  - `$[M]^+$` `$[M+1]^+$`
  - 1.003 Da

- In-source reactions
  - `$[M+H]^+$` `$[M+Na]^+$`
  - 21.982 Da
]

.pull-right[
- Homologous series
  - Lipid `$-[CH_2]-$`
  - 14.016 Da

- Xenobiotic metabolism
  - Phase I hydroxylation
  - 15.995 Da
]

???

The left column shows PMDs within the same compound; the right column shows PMDs between different compounds. Steps 1 and 2 need the left PMDs; Step 3 needs the right PMDs.
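
The four PMDs on this slide can be reproduced from monoisotopic masses. This is a small sketch; the dictionary names are illustrative, and the masses are standard reference values rounded to eight decimals.

```python
# Monoisotopic masses in u
MASS = {"H": 1.00782503, "C": 12.0, "13C": 13.00335484,
        "O": 15.99491462, "Na": 22.98976928}

pmd_examples = {
    "isotopologue [M+1]+ vs [M]+ (13C - 12C)": MASS["13C"] - MASS["C"],
    "adduct [M+Na]+ vs [M+H]+ (Na - H)": MASS["Na"] - MASS["H"],
    "homologous series (CH2)": MASS["C"] + 2 * MASS["H"],
    "hydroxylation (+O)": MASS["O"],
}

for name, pmd in pmd_examples.items():
    print(f"{name}: {pmd:.3f} Da")
```

Both adduct ions carry the same single charge, so the electron mass cancels in the Na - H difference.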

---

## Step 1: reduce the number of peaks

### Gap between features and compounds

???
One compound can generate isotopologues, adducts, neutral losses, multiply charged ions and fragment ions. We can also find noise and co-eluting compounds, and sometimes even more complex ions such as multiply charged fragment ions. Before discussing how to find molecular ions, let's check how many real compounds there are in a peak profile.

---

## Step 1: reduce the number of peaks

### How many real compounds among features?

.half[
Mahieu, N. G., & Patti, G. J. (2017). Systems-Level Annotation of a Metabolomics Data Set Reduces 25 000 Features to Fewer than 1000 Unique Metabolites. Analytical Chemistry, 89(19), 10397–10406. doi:10.1021/acs.analchem.7b02380
]

???
About 5% of the features are real compounds.

---

## Step 1: reduce the number of peaks

.half[
Yu, M., Olkowicz, M., & Pawliszyn, J. (2019). Structure/reaction directed analysis for LC-MS based untargeted analysis. Analytica Chimica Acta, 1050, 16–24. doi:10.1016/j.aca.2018.10.062
]

???
GlobalStd tries to find one peak for each compound.
S1: retention time clustering, as in the CAMERA package
S2: PMD frequency analysis to find potential isotopologues, adducts, neutral losses, multiply charged ions and fragment ions
S3: find the molecular ions, or the ions that reflect PMDs between compounds (independent peaks)
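
The first two steps above can be sketched in a few lines. This is a simplified illustration, not the pmd package implementation: the gap-based retention time clustering, the function names, the tolerances and the peak list are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def rt_clusters(peaks, rt_tol=2.0):
    """Group (mz, rt) peaks; a new cluster starts when the RT gap exceeds rt_tol seconds."""
    clusters, current = [], []
    for mz, rt in sorted(peaks, key=lambda p: p[1]):
        if current and rt - current[-1][1] > rt_tol:
            clusters.append(current)
            current = []
        current.append((mz, rt))
    if current:
        clusters.append(current)
    return clusters

def pmd_frequency(clusters, digits=2):
    """Count in how many RT clusters each rounded within-cluster PMD occurs."""
    freq = Counter()
    for cluster in clusters:
        pmds = {round(abs(a[0] - b[0]), digits) for a, b in combinations(cluster, 2)}
        freq.update(pmds)
    return freq

# Hypothetical peaks (m/z, RT in seconds): three compounds seen as [M+H]+ and [M+Na]+,
# one of them with a 13C isotopologue, plus one single peak.
peaks = [
    (181.071, 60.0), (203.053, 60.3),
    (256.264, 120.0), (257.267, 120.1), (278.246, 120.4),
    (300.290, 200.0), (322.272, 200.2),
    (415.211, 260.0),
]

clusters = rt_clusters(peaks)
freq = pmd_frequency(clusters)
# PMDs that recur across clusters are likely adducts, isotopologues or similar.
high_freq = {pmd for pmd, n in freq.items() if n >= 2}
print(high_freq)
```

Here only the Na/H adduct distance recurs across clusters, so it is flagged without any predefined adduct list.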

---

## Step 1: reduce the number of peaks

### Independent peaks selection - example

???
Examples of independent peak selection:
G1: single peak
G2: no isotopologues, no adducts
G3: isotopologues only
G4: adducts only
G5: combination
G6: co-eluting groups

Isotopologues, adducts, neutral losses, multiply charged ions and fragment ions are found from the data instead of from predefined lists.

---

## Step 1: reduce the number of peaks

### Independent peaks selection - Why redundant?

- ~14.3% of the peaks capture variance similar to that of all peaks
- For CAMERA/RAMclust, the highest-intensity peak in each pcgroup was selected as the independent peak

???
Comparison with the CAMERA and RAMclust packages.

PCA shows that 985 peaks retain the variance of 6885 peaks. Compared with CAMERA, GlobalStd selects fewer peaks with similar variance explained by the PCs. Compared with RAMclust, GlobalStd does not change the sample information.

---

## Step 2: untargeted MS/MS annotation

### Untargeted MS/MS analysis - PMDDA

- Only use GlobalStd peaks for MS/MS analysis
    - Multiple injections (15)

- MS/MS spectral library annotation on [GNPS](https://gnps.ucsd.edu)

- Compared with data-dependent acquisition (DDA, 173 compounds)
    - 235 extra compounds annotated, 59 compounds overlapping
    - Fewer contaminant ions

???
- Multiple injections to increase sensitivity
- GNPS MS/MS annotation with a similarity threshold of 0.5
- 235:59:114 for PMDDA only:overlap:DDA only
- Contaminant ions in DDA
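
The three counts above add up as follows. This is just the arithmetic; the variable names are illustrative.

```python
# Counts from the PMDDA vs DDA comparison
pmdda_only, overlap, dda_only = 235, 59, 114

pmdda_total = pmdda_only + overlap           # compounds annotated by PMDDA
dda_total = overlap + dda_only               # compounds annotated by DDA
combined = pmdda_only + overlap + dda_only   # compounds annotated by either method

print(pmdda_total, dda_total, combined)
```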

---

## Step 2: untargeted MS/MS annotation

### Untargeted MS/MS analysis - PMDDA

???
Contaminant ion issue in DDA:
Constant, repeatedly selected contaminant ions might come from the sample matrix, which makes them hard to remove by exclusion. Such peaks do not generate high-frequency PMDs and can be avoided by PMDDA.
In any case, combining PMDDA and DDA gives more compound names, which can be used to search for metabolites.

---

## Step 3: co-occurrence analysis

### Xenobiotic metabolism

- Phase I
    - Oxidation (R-H ⇒ R-OH, PMD 15.995 Da)
    - Reduction (R-C=O ⇒ R-C-OH, PMD 2.016 Da)

- Phase II
    - Methylation (R-OH ⇒ R-O-CH3, PMD 14.016 Da)
    - Sulfation (R-OH ⇒ R-SO4, PMD 46.976 Da)
    - Acetylation (R-OH ⇒ R-O-COCH3, PMD 42.011 Da)
    - Glucuronidation (R-NH2 ⇒ R-NH-C6H9O7, PMD 192.027 Da)
    - Glycosylation (R-OH ⇒ R-O-C6H11O5, PMD 162.053 Da)

???
Xenobiotic metabolism is common for exogenous environmental compounds. Here I list the associated reactions and their PMDs. Let's start with an example.
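
For the listed reactions whose net change involves only C, H and O, the PMD follows directly from the elemental difference between product and substrate. This is a minimal sketch: `formula_mass` is an illustrative helper that handles only simple formulas, and the net changes are read off the structures written on the slide.

```python
import re

# Monoisotopic masses in u
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

def formula_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C6H10O5' (no brackets or charges)."""
    return sum(MONO[el] * int(n or 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))

# Net elemental change of each reaction (product minus substrate)
reaction_delta = {
    "oxidation": "O",              # R-H -> R-OH
    "reduction": "H2",             # R-C=O -> R-C-OH
    "methylation": "CH2",          # R-OH -> R-O-CH3
    "acetylation": "C2H2O",        # R-OH -> R-O-COCH3
    "glucuronidation": "C6H8O7",   # R-NH2 -> R-NH-C6H9O7, as written on the slide
    "glycosylation": "C6H10O5",    # R-OH -> R-O-C6H11O5
}

reaction_pmd = {name: round(formula_mass(f), 3) for name, f in reaction_delta.items()}
for name, pmd in reaction_pmd.items():
    print(f"{name}: {pmd} Da")
```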

---

## Step 3: co-occurrence analysis

### Metabolites of TBBPA in Pumpkin

- Mass defect analysis to screen for brominated compounds

- Confirmation by synthesized standards

.half[
Hou, X., Yu, M., Liu, A., Wang, X., Li, Y., Liu, J., … Jiang, G. (2019). Glycosylation of Tetrabromobisphenol A in Pumpkin. Environmental Science & Technology, 53(15), 8805–8812. doi:10.1021/acs.est.9b02122
]

???
This work was published this year: we found 7 TBBPA metabolites in pumpkin roots after exposure to TBBPA.
Manual checking showed that the major metabolic pathways are glycosylation and malonylation.
After the paper was published, I realized these metabolites could form a stepwise pathway, so I selected five reaction-related PMDs and checked the raw data again.

---

## Step 3: co-occurrence analysis

### Metabolites of TBBPA in Pumpkin

- TBBPA metabolite PMD network (22 metabolites)

???
Here I generated a PMD network by searching for 5 fixed PMDs, computing the correlation coefficients of the peak pairs, and starting from TBBPA. I found a region that the original paper missed because one hub compound was not screened. In total, the data reveal 22 TBBPA metabolites, with peak-pair correlation coefficients larger than 0.6.
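
The network search described above can be sketched as a breadth-first search from a seed peak. This is an illustration under stated assumptions, not the pmd package code: the peak table, the two PMDs, the m/z tolerance and all function names are hypothetical; only the 0.6 correlation cutoff comes from the analysis above.

```python
from collections import deque

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def pmd_network(peaks, pmds, seed, tol=0.002, min_cor=0.6):
    """Return the peaks reachable from seed via PMD-matched, correlated peak pairs.

    peaks maps a peak name to (mz, intensities across samples).
    """
    def linked(a, b):
        diff = abs(peaks[a][0] - peaks[b][0])
        return (any(abs(diff - p) <= tol for p in pmds)
                and pearson(peaks[a][1], peaks[b][1]) > min_cor)

    seen, queue = {seed}, deque([seed])
    while queue:
        node = queue.popleft()
        for other in peaks:
            if other not in seen and linked(node, other):
                seen.add(other)
                queue.append(other)
    return seen

# Hypothetical peak table
peaks = {
    "parent": (500.000, [1, 2, 3, 4, 5]),
    "m1": (662.053, [2, 4, 6, 8, 11]),       # parent + 162.053, correlated
    "m2": (824.106, [1, 3, 2, 5, 6]),        # m1 + 162.053, correlated
    "m3": (514.016, [5, 1, 4, 2, 3]),        # parent + 14.016, but not correlated
    "unrelated": (700.500, [1, 2, 3, 4, 5]), # correlated, but no PMD match
}

network = pmd_network(peaks, pmds=[14.016, 162.053], seed="parent")
print(sorted(network))
```

Because the search walks from peak to peak, `m2` is reached through `m1` even though its distance to the parent is not one of the fixed PMDs, which is how a stepwise pathway shows up.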

---

## Step 3: co-occurrence analysis

### KEGG reaction database

|   PMD   | Freq |                        Example                        |
|:-------:|:----:|:-----------------------------------------------------:|
|  1.008  | 2037 |     NAD(+) + succinate <=> fumarate + H(+) + NADH     |
|  2.016  | 1748 | NAD(+) + propanoyl-CoA <=> acryloyl-CoA + H(+) + NADH |
| 15.995  | 1170 |                ATP + GDP <=> ADP + GTP                |
| 13.979  | 1122 |   deoxynogalonate + O2 <=> H(+) + H2O + nogalonate    |
| 17.003  | 929  | H2O + hypotaurine + NAD(+) <=> H(+) + NADH + taurine  |
| 79.966  | 750  |         ATP + H2O <=> ADP + H(+) + phosphate          |
| 14.016  | 611  |  acetyl-CoA + propanoate <=> acetate + propanoyl-CoA  |
|    0    | 533  |              L-glutamate <=> D-glutamate              |
| 162.053 | 365  |       H2O + lactose <=> D-galactose + D-glucose       |
| 18.011  | 361  |        L-serine <=> 2-aminoprop-2-enoate + H2O        |

???
For endogenous compounds, I generated a PMD list from the KEGG reaction database. It contains more than 10k reactions, and the top 25 high-frequency PMDs (three decimal places) cover 8080 reactions. Some PMDs involve ions or are hard to detect by HRMS. I selected 12 PMDs to test on our NIST samples.
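
Counting PMD frequencies amounts to rounding each substrate/product mass difference to three decimals and tallying. This is a toy sketch using a few compound pairs taken from the example reactions in the table; how reactions are split into compound pairs here is an assumption, and `formula_mass` is an illustrative helper.

```python
import re
from collections import Counter

# Monoisotopic masses in u
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307400,
        "O": 15.99491462, "P": 30.97376200}

def formula_mass(formula):
    """Monoisotopic mass of a simple formula (no brackets or charges)."""
    return sum(MONO[el] * int(n or 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))

# Compound pairs (neutral formulas) picked from the example reactions
pairs = [
    ("C5H9NO4", "C5H9NO4"),               # L-glutamate / D-glutamate
    ("C3H7NO3", "C3H5NO2"),               # L-serine / 2-aminoprop-2-enoate
    ("C12H22O11", "C6H12O6"),             # lactose / D-glucose
    ("C10H16N5O13P3", "C10H15N5O10P2"),   # ATP / ADP
    ("C10H16N5O14P3", "C10H16N5O13P3"),   # GTP / ATP
    ("C10H15N5O11P2", "C10H15N5O10P2"),   # GDP / ADP
]

pmd_counts = Counter(round(abs(formula_mass(a) - formula_mass(b)), 3)
                     for a, b in pairs)
for pmd, n in pmd_counts.most_common():
    print(pmd, n)
```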

---

## Step 3: co-occurrence analysis

### Metabolites of L-Octanoylcarnitine in NIST 1950

- L-Octanoylcarnitine metabolite PMD network (187 metabolites)

- Metabolites screened with 12 high-frequency KEGG reaction PMDs

???
We found 187 L-Octanoylcarnitine metabolites with the PMD network. A PMD network is not a pathway, although it may be associated with one. Regular pathway analysis always needs a known pathway, while a PMD network can find new pathways or screen for new metabolites.

It seems that endogenous compounds generate a larger PMD network, while exogenous compounds might not have too many metabolites. To test this hypothesis, I mined T3DB, which contains a source annotation for each compound.

---

## Step 3: co-occurrence analysis

### Endogenous vs Exogenous

- T3DB Endogenous (255) vs Exogenous (705)

- Use the top 20 high-frequency PMDs

???
As an exploratory analysis, I found that endogenous compounds can be knit into a large PMD network, while exogenous compounds are less favored by reactions. Exogenous compounds tend to be excreted by the biological system or to react directly with macromolecules and cause toxicity. This phenomenon may help us find the source of certain compounds or PMD networks (via their topological properties).

---

## Summary

- Step 1: reduce the number of peaks
  - remove redundant isotopologues, adducts, and multiply charged ions

- Step 2: untargeted MS/MS annotation
  - multiple injections of GlobalStd peaks

- Step 3: co-occurrence analysis
  - PMD network

- [pmd package](http://yufree.github.io/pmd/)
  - functions for the GlobalStd algorithm and PMD network analysis

- [rmwf package](https://github.com/yufree/rmwf)
  - reproducible research for this study, with data and scripts

???
Most of the analysis can be done with the pmd package (on CRAN).
The rmwf package contains 5 injections of NIST 1950 and 6 matrix blanks. The data processing script is available as an Rmd template for reproducible research.

---

# Thanks

#### ISMMS: Lautenberg Laboratory, Dr. Petrick’s Lab

- Georgia Dolios
- Dr. Vladimir Yong-Gonzalez
- Kelsey Chetnik

#### CAS: Dr. Xingwang Hou

#### NIEHS: **P30ES23515** and **U2C ES026561**

#### miao.yu@mssm.edu

#### @yu_free

???

The slide links are on Twitter, so you can view the slides online.