class: center, middle, inverse, title-slide .title[ # Reactomics and Paired Mass Distance Analysis ] .author[ ### Miao Yu ] .date[ ### 2018/11/11 (updated: 2023-03-26) ] --- ## Typical workflow for HRMS metabolomics - Sample collection - Acquire data (peaks) using mass spectrometry - Annotate peaks to get the compound name - Build links between compounds using pathway/network analysis > Sample -> Peaks -> Compounds -> Relationship among compounds - Problems - Time consuming - too many peaks ~20k - Standards coverage - unknown unknown ??? Hello everyone, my name is Miao Yu and I am from Icahn School of medicine at Mount Sinai. My title is 'Reactomics and Paired Mass Distance Analysis'. I fully understand that you are tired of different 'omics' and I hope this one will make a difference. Before the discussion of the concepts, let's review the general workflow for metabolomics or untargeted analysis. We collect samples and then perform the instrument analysis. In terms of mass spectrometry, we will collect peaks signals. Then we will process the data and turn the peaks signals into featuers table and annotate the featuers as compounds by either database or prediction. The next step is building the links between those compounds by pathway analysis or network analysis. In summary, samples turn into peaks, peaks turn into compounds and compounds turn into relationship among compounds. The final purpose of metabolomics is not identification of a compound and the real scientific purpose is always the relationship among compounds. However, there are some problems at current stage. The first one is time consuming. We need to deal with about around 20 thousands peaks from each sample or featuers from multiple samples. It's just impossible to check them one by one and we need to focus on a reasonable number of compounds of interests. The second issue is the standards coverage. I doubt how many of you have received comments from reviewer 3 about standards confirmation or annotation of unkown peaks. In most cases, metabolomics and untarget analysis are facing the unknown unknown paradox. If we never find some compounds before, how could we make annotation based on current database. --- ## Idea > Features -> Compounds -> Relationship among compounds - You ACTUALLY don't need people (compounds) name to know their relationship <img src="https://upload.wikimedia.org/wikipedia/commons/7/7d/A_Sunday_on_La_Grande_Jatte%2C_Georges_Seurat%2C_1884.jpg" width="70%" style="display: block; margin: auto;" /> From [Wikipedia Commons](https://commons.wikimedia.org/wiki/File:A_Sunday_on_La_Grande_Jatte,_Georges_Seurat,_1884.jpg):A Sunday on La Grande Jatte, Georges Seurat ??? - all compounds from metabolomcis study is a snapshot with metabolites and parent compounds - We could find the relationship among people without know the name of each person - mass spec could measure the distance without known the name of compounds --- ## Idea > Sample -> Peaks -> ~~Compounds~~ -> Relationship among compounds - Mass spectrum could directly measure relationship (reactions) <img src="https://yufree.github.io/presentation/figure/srda.png" width="100%" style="display: block; margin: auto;" /> ??? - Annotation is not really necessary for certain scientific problem - Relationship among compounds or reaction matters Here is my idea to those problems. Let's review the information flow. Samples, peaks, compounds, relationship among compounds... Wait, if we try to find the relationship among compounds, is there a way to skip the compounds identification phase? The answer should be yes. Let's take oxidation process as an example. If it happens, no matter which compounds are involved, there should always be a mass differnces of sixteen dalton between those compounds. Since our mass spectrometry could detect their mass to charge ratio, we actually could directly extract all those relationship by checking paired mass distance among the unknown peaks. The unit of the relationship between compounds should be chemical reactions. --- ## Why Reactions? <img src="https://yufree.github.io/presentation/figure/cas.jpg" style="display: block; margin: auto;" /> --- ## Why Reactions? <img src="https://yufree.github.io/presentation/figure/database.png" style="display: block; margin: auto;" /> - Unit: Gene (5) < Protein (20+2) < Metabolite (100K) < Compound (100M) - Combination: Gene (20,000-25,000) < Protein (20,000-25,000) < Compound(???) - Small molecule **combination** is a chemical reaction or paired mass distance ??? So why reactions? Compared with other omics, the major difference in metabolomics is the missing of unit of relationship. For genomics, five moleculars are the unit and their combination unit is base pair. For proteomics, twenty two amino acids are the molecular unit and their combination unit is peptide bond. For small molecular, over one hundred million molecular are unit while actually we only care the one who will show activity with other molecular. In this case, the combination unit should be chemical reactions. In terms of mass spectrometry, reactions mean paired mass distance among different compounds. --- ## Why PMD? - [Nuclear Binding Energy](https://en.wikipedia.org/wiki/Nuclear_binding_energy) `$$\Delta m = Zm_{H} + Nm_{n} - M$$` - The missing mass was converted into energy ( `\(E=mc^2\)` ) and emitted when the atom was made - Atoms -> Compounds -> Mass distances between compounds - **Paired Mass Distances (PMD)** is unique - **High resolution** mass spectrometry WINs ??? - Mass defects could be transferred from atom to paired mass distance - HRMS could measure PMDs for qualitative analysis So why PMD? The best feature for PMD is that it comes from the nuclear binding energy and almost unique to a specific element composition. In this case, if we have high resolution mass spectrometry, we could annotate a certain paired mass distance with element composition and check if it's linked to a known reaction. --- ## Why PMD? - Mass defect at atom level <img src="https://yufree.github.io/presentation/figure/massdefect.jpg" style="display: block; margin: auto;" /> .half[ Hall, M. P., Ashrafi, S., Obegi, I., Petesch, R., Peterson, J. N., & Schneider, L. V. (2003). ?Mass defect? tags for biomolecular mass spectrometry. Journal of Mass Spectrometry, 38(8), 809–816. https://doi.org/10.1002/jms.493 ] --- ## Sources of PMDs in real data ### Where is PMD? .pull-left[- Isotopologues - `\([M]^+\)` `\([M+1]^+\)` - 1.006 Da - in source reaction - `\([M+H]^+\)` `\([M+Na]^+\)` - 21.982 Da ] .pull-right[ - Homologous series - Lipid `\(-[CH_2]-\)` - 14.016 Da - Xenobiotic metabolism - Phase I hydrolation - 15.995 Da ] ??? However, not all of the PMD is linked to reaction among compounds. Some of the PMD could be generated from one compounds such as isotopologue or in source reaction or MSMS fragmental ions. Some PMDs could also show the homologous series like lipid, which is structure similarity instead of reaction relationship. --- ## Quantitative and Qualitative analysis in Reactomics ### Reaction PMD - Reaction: `\(S_1 + S_2 + … + S_n = P_1 + P_2 + … + P_m \ (n>=1,m>=1)\)` - PMD matrix <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> \(S_1\) </th> <th style="text-align:center;"> \(S_2\) </th> <th style="text-align:center;"> ... </th> <th style="text-align:center;"> \(S_n\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(P_1\) </td> <td style="text-align:center;"> \(|S_1 - P_1|\) </td> <td style="text-align:center;"> \(|S_2 - P_1|\) </td> <td style="text-align:center;"> ... </td> <td style="text-align:center;"> \(|S_n - P_1|\) </td> </tr> <tr> <td style="text-align:left;"> \(P_2\) </td> <td style="text-align:center;"> \(|S_1 - P_2|\) </td> <td style="text-align:center;"> \(|S_2 - P_2|\) </td> <td style="text-align:center;"> ... </td> <td style="text-align:center;"> \(|S_n - P_2|\) </td> </tr> <tr> <td style="text-align:left;"> ... </td> <td style="text-align:center;"> ... </td> <td style="text-align:center;"> ... </td> <td style="text-align:center;"> ... </td> <td style="text-align:center;"> ... </td> </tr> <tr> <td style="text-align:left;"> \(P_m\) </td> <td style="text-align:center;"> \(|S_1 - P_m|\) </td> <td style="text-align:center;"> \(|S_2 - P_m|\) </td> <td style="text-align:center;"> ... </td> <td style="text-align:center;"> \(|S_n - P_m|\) </td> </tr> </tbody> </table> - Paired mass distances for certain parent compound: `$$PMD_{S_{k}} = min(|S_k-P_1|,|S_k-P_2|,...,|S_k-P_m|) (1 \le k \le n )$$` - Paired mass distances for certain reaction: `$$PMD_R = \{PMD_{S_{1}}, PMD_{S_{2}},...,PMD_{S_{n}}\}$$` --- ## Quantitative and Qualitative analysis in Reactomics ### KEGG reaction database | Chemical Composition | PMD | Freq | Example Reaction Class | Example Enzyme | |:--------------------:|:-------:|:----:|:----------------------:|:---------------------------:| | +2H | 2.016 | 1732 | RC00095 | acrylyl-CoA reductase | | +O | 15.995 | 1169 | RC00046 | quinaldate 4-oxidoreductase | | +HP3O | 79.966 | 729 | RC00002 | Phosphorus-oxygen lyases | | +C2H | 14.016 | 594 | RC00060 | lipid methyl transferase | | Isomer | 0 | 532 | RC00302 | glutamate racemase | | +2HO | 18.011 | 359 | RC00680 | allantoinase | | +6C10H5O | 162.053 | 365 | RC00049 | beta-lactosidase | ??? In this case, though we don't need to annotate all the compounds, a PMD database are still needed for PMD annotate as element composition and their corresponding reaction class or enzyme. Here, I analyzed all the reactions PMD from KEGG reaction database and removed the ions related PMD, which is not detectable by mass spectrometry. I found our known reactions are actually highly enriched by a few PMDs. The double bonds formation or broken reaction appeared in 1732 reactions, almost fifteen percent of the reactions in KEGG. In this case, for reactomics we could skip quantitative analysis of endless unknown compounds and focus on the changes of those enriched PMDs. --- ## Quantitative and Qualitative analysis in Reactomics ### KEGG reaction database | PMD | Freq | Example | |:-------:|:----:|:-----------------------------------------------------:| | 2.016 | 1459 | NAD(+) + succinate <=> fumarate + H(+) + NADH | | 15.995 | 1033 | NAD(+) + propanoyl-CoA <=> acryloyl-CoA + H(+) + NADH | | 79.966 | 664 | ATP + GDP <=> ADP + GTP | | 14.016 | 585 | deoxynogalonate + O2 <=> H(+) + H2O + nogalonate | | 0 | 495 | H2O + hypotaurine + NAD(+) <=> H(+) + NADH + taurine | | 18.011 | 344 | ATP + H2O <=> ADP + H(+) + phosphate | | 162.053 | 276 | acetyl-CoA + propanoate <=> acetate + propanoyl-CoA | | 159.933 | 251 | L-glutamate <=> D-glutamate | | 1.032 | 246 | H2O + lactose <=> D-galactose + D-glucose | | 42.011 | 224 | L-serine <=> 2-aminoprop-2-enoate + H2O | - Real reactions contain ions - Biased by known reactions --- ## Quantitative and Qualitative analysis in Reactomics ### HMDB compounds database | Elemental Composition | PMD(digits = 3) | Frequency | Accuracy | |:---------------------:|:---------------:|:---------:|:--------:| | +C2H | 14.016 | 4934 | 0.98 | | +2H | 2.016 | 4909 | 0.97 | | +2C4H | 28.031 | 4878 | 0.99 | | +2C2H | 26.016 | 4229 | 0.98 | | +O | 15.995 | 4214 | 0.99 | | +C | 12.000 | 3861 | 0.99 | | +4C8H | 56.063 | 3861 | 0.97 | | +3C6H | 42.047 | 3771 | 0.98 | | +C2HO | 30.011 | 3698 | 0.95 | | +2C | 24.000 | 3689 | 0.99 | --- ## Quantitative and Qualitative analysis in Reactomics ### HMDB compounds database | Elemental Composition | PMD(digits = 2) | Frequency | Accuracy | |:---------------------:|:---------------:|:---------:|:--------:| | +C2H | 14.02 | 8003 | 0.63 | | +2H | 2.02 | 7959 | 0.62 | | +2C4H | 28.03 | 7799 | 0.61 | | +2C2H | 26.02 | 7343 | 0.58 | | +O | 15.99 | 7731 | 0.56 | | +C | 12.00 | 7145 | 0.56 | | +4C8H | 56.06 | 6699 | 0.56 | | +3C6H | 42.05 | 6558 | 0.58 | | +C2HO | 30.01 | 6761 | 0.54 | | +2C | 24.00 | 6963 | 0.53 | --- ## Quantitative and Qualitative analysis in Reactomics ### HMDB compounds database | Elemental Composition | PMD(digits = 1) | Frequency | Accuracy | |:---------------------:|:---------------:|:---------:|:--------:| | +C2H | 14.0 | 50419 | 0.10 | | +2H | 2.0 | 50467 | 0.10 | | +2C4H | 28.0 | 50797 | 0.09 | | +2C2H | 26.0 | 48517 | 0.09 | | +O | 16.0 | 51278 | 0.09 | | +C | 12.0 | 49335 | 0.08 | | +4C8H | 56.1 | 36417 | 0.10 | | +3C6H | 42.0 | 49808 | 0.08 | | +C2HO | 30.0 | 51241 | 0.07 | | +2C | 24.0 | 48099 | 0.08 | --- ## Quantitative and Qualitative analysis in Reactomics ### HMDB compounds database | Elemental Composition | PMD(digits = 0) | Frequency | Accuracy | |:---------------------:|:---------------:|:---------:|:--------:| | +C2H | 14 | 156245 | 0.03 | | +2H | 2 | 156260 | 0.03 | | +2C4H | 28 | 155410 | 0.03 | | +2C2H | 26 | 154346 | 0.03 | | +O | 16 | 155811 | 0.03 | | +C | 12 | 155339 | 0.03 | | +4C8H | 56 | 151894 | 0.02 | | +3C6H | 42 | 153764 | 0.03 | | +C2HO | 30 | 154369 | 0.02 | | +2C | 24 | 154278 | 0.02 | --- ## Quantitative and Qualitative analysis in Reactomics ### Static and dynamic <img src="https://yufree.github.io/presentation/figure/qreaction.png" style="display: block; margin: auto;" /> --- ## Quantitative and Qualitative analysis in Reactomics ### Static v.s. dynamic - Static mass pairs: paired intensity ratio is stable across samples - Dynamic mass pairs: paired intensity ratio is not stable across samples - For example, [A,B], [C,D], [E,F] and [G,H] are involved in the same PMD: | A | B | Ins ratio | C | D | Ins ratio | E | F | Ins ratio | G | H | |:----:|:---:|:---------:|:---:|:--:|:---------:|:---:|:---:|:---------:|:---:|:---:| | 100 | 50 | 2:1 | 100 | 50 | 2:1 | 30 | 40 | 3:4 | 100 | 100 | | 1000 | 500 | 2:1 | 10 | 95 | 2:19 | 120 | 160 | 3:4 | 100 | 200 | - [A,B] and [E,F] are static mass pairs so can be used in quantitative analysis, rsd cutoff 30% - reaction amounts for the sum of [A,B] (150, 1500) and [E,F] (70, 280) are [220, 1780] for group I and group II ??? - Response factor is the slope of calibration curve for certain compound - Total intensity of all pairs with the same PMD - Count once for ions involved in multiple reactions When we have the list of enriched PMDs, we need to define how to make quantitative analysis of the paired mass distance. Here I proposed to use static PMD pairs, which always show stable intensity ratios across samples, for quantitative analysis of reactions. After the discussion of concept and technique issues, now we could see some application of reactomics. --- ## Quantitative and Qualitative analysis in Reactomics ### Static v.s. dynamic - Dynamic mass pairs scenario: none of the paired peaks is stable across samples - For example, [A,B], [C,D], [E,F] and [G,H] are involved in the same PMD: | A | B | Ins ratio | C | D | Ins ratio | E | F | Ins ratio | G | H | |:----:|:---:|:---------:|:---:|:--:|:---------:|:---:|:---:|:---------:|:---:|:---:| | 100 | 50 | 2:1 | 100 | 50 | 2:1 | 30 | 40 | 3:4 | 100 | 100 | | 1000 | 500 | 2:1 | 10 | 95 | 2:19 | 120 | 160 | 3:4 | 100 | 200 | - [C,D] is dynamic mass pair - Normalization by the minimized intensity of C or D, then quantify certain reactions on D or C - Normalization on C, Group I has 0.5 for D, Group II has 9.5 for D, reaction amounts are [0.5,9.5] for group I and group II --- # General reactomics data analysis framework ## Step 1: Remove redundant peaks - regular metabolomics / NTA workflow ## Step 2: Extract high frequency PMDs - reaction level evaluation ## Step 3: Relationship network - network based evaluation --- # General reactomics data analysis framework ## Demo data: [ST000560](https://www.metabolomicsworkbench.org/data/DRCCMetadata.php?Mode=Study&StudyID=ST000560) - Immortalized immunoglobulin-producing cell lines - 4 replicates from 6 cell lines - 3 patients with IgA nephropathy and 3 healthy controls - 1918 features or peaks --- class: inverse, center, middle # Reactomics Workflow ## STEP1: Remove redundant peaks --- ## Gap between features and compounds <img src="https://yufree.github.io/presentation/figure/peakcom.png" style="display: block; margin: auto;" /> --- ## GlobalStd Algorithm: before (1918) and after (196) <img src="https://yufree.github.io/presentation/figure/bags.png" style="display: block; margin: auto;" /> --- ## For annotation - PMDDA <img src="https://yufree.github.io/presentation/figure/rpppmdda2.png" width="60%" style="display: block; margin: auto;" /> .half[ Yu, M., Dolios, G., & Petrick, L. (2021). Reproducible Untargeted Metabolomics Data Analysis Workflow for Exhaustive MS/MS Annotation. https://doi.org/10.26434/chemrxiv.13565159.v1 ] --- class: inverse, center, middle # Reactomics Workflow ## STEP2: Extract high frequency PMDs --- ## High frequency PMDs for retrieving chemical relationship <img src="https://yufree.cn/en/2021/02/09/reactomics-workflow-template-within-rmwf-package/index_files/figure-html/unnamed-chunk-6-1.png" width="87%" style="display: block; margin: auto;" /> --- ## Reaction level changes <img src="https://yufree.cn/en/2021/02/09/reactomics-workflow-template-within-rmwf-package/index_files/figure-html/unnamed-chunk-7-1.png" width="87%" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Reactomics Workflow ## STEP3: Relationship network --- ## PMD Network <img src="https://yufree.github.io/presentation/figure/netdemo.png" width="100%" style="display: block; margin: auto;" /> --- # Applications ## PMDDA Workflow ## Metabolites Discovery ## Endogenous or Exogenous ## Biomarker Reaction --- class: inverse, center, middle # Reactomics Application ## PMDDA Workflow --- ## Sensitivity matters - Target analysis could capture peaks with low intensity - Untargeted analysis would loss sensitivity to capture all peaks - Send unknown while independent peaks for MS/MS <img src="https://yufree.github.io/presentation/figure/missxcms.png" width="300px" style="display: block; margin: auto;" /> --- ## How many real compounds among features? <img src="https://yufree.github.io/presentation/figure/gapfc.jpg" style="display: block; margin: auto;" /> .half[ Mahieu, N. G., & Patti, G. J. (2017). Systems-Level Annotation of a Metabolomics Data Set Reduces 25 000 Features to Fewer than 1000 Unique Metabolites. Analytical Chemistry, 89(19), 10397–10406. https://doi.org/10.1021/acs.analchem.7b02380 ] --- ## Redundant peaks - Untargeted analysis would loss sensitivity to capture all peaks - Send unknown while independent peaks for MS/MS <img src="https://yufree.github.io/presentation/figure/peakcom.png" style="display: block; margin: auto;" /> --- ## GlobalStd Algorithm <img src="https://yufree.github.io/presentation/figure/GlobalStd.png" style="display: block; margin: auto;" /> .half[ Yu, M., Olkowicz, M., & Pawliszyn, J. (2019). Structure/reaction directed analysis for LC-MS based untargeted analysis. Analytica Chimica Acta, 1050, 16–24. https://doi.org/10.1016/j.aca.2018.10.062 ] --- ## GlobalStd Algorithm Step 1 ### Retention time cluster analysis <img src="pres_files/figure-html/unnamed-chunk-25-1.png" width="450px" style="display: block; margin: auto;" /> --- ## GlobalStd Algorithm Step 2 ### High frequency PMD analysis across RT clusters <img src="pres_files/figure-html/unnamed-chunk-26-1.png" width="450px" style="display: block; margin: auto;" /> --- ## GlobalStd Algorithm Step 2 ### High frequency PMD analysis across RT clusters - example - Based on data itself, those adducts/multiply charged ions/neutral loss/isotopologues can be unknown <img src="pres_files/figure-html/unnamed-chunk-27-1.png" width="300px" style="display: block; margin: auto;" /> --- ## GlobalStd Algorithm Step 3 ### Independent peaks selection <img src="pres_files/figure-html/unnamed-chunk-28-1.png" width="430px" style="display: block; margin: auto;" /> --- ## GlobalStd Algorithm Step 3 ### Independent peaks selection - example <img src="pres_files/figure-html/unnamed-chunk-29-1.png" width="450px" style="display: block; margin: auto;" /> --- ## GlobalStd Algorithm Step 3 ### Why redundant? - ~14.3% peaks can capture similar variances of all peaks - For CAMERA/RAMclust, peaks with highest intensity from pcgroup were selected as independent peaks <img src="https://yufree.github.io/presentation/figure/software.png" width="500px" style="display: block; margin: auto;" /> ??? - Similar to isotope labeled results (5% peaks) - Untargeted analysis does not mean big data --- ## Target compounds validation <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Independent peaks </th> <th style="text-align:right;"> Target compounds found </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> pmd </td> <td style="text-align:right;"> 985 </td> <td style="text-align:right;"> 18 </td> </tr> <tr> <td style="text-align:left;"> CAMERA </td> <td style="text-align:right;"> 1297 </td> <td style="text-align:right;"> 15 </td> </tr> <tr> <td style="text-align:left;"> RAMclust </td> <td style="text-align:right;"> 461 </td> <td style="text-align:right;"> 12 </td> </tr> <tr> <td style="text-align:left;"> profinder </td> <td style="text-align:right;"> 6628 </td> <td style="text-align:right;"> 7 </td> </tr> </tbody> </table> - 103 compounds for validation - 36 compounds could be found by xcms 6885 features - 7 could be found by profinder untargeted analysis 6628 features --- ## Untargeted MS/MS analysis - PMDDA - Only use GlobalStd peaks for MS/MS analysis - Multiple injections - MS/MS spectral library annotation on [GNPS](https://gnps.ucsd.edu) - Compare with Data Dependent Acquisition (DDA) (173 compounds) - Annotated 235 extra compounds and overlap 59 compounds - Less contaminant ions ??? - GNPS MS/MS annotation - 235:59:114 PMDDS:overlap:DDA --- ## Untargeted MS/MS analysis - PMDDA <img src="https://yufree.github.io/presentation/figure/rpppmdda.png" width="87%" style="display: block; margin: auto;" /> ??? - GNPS MS/MS annotation - 235:59:114 PMDDS:overlap:DDA --- ## Untargeted MS/MS analysis - PMMD Annotation - Use pmd and rank of pmd for annotation - Intensity filter(10%) and robust for noise - 957/1098 PMDR/HMDB QqQ data - some compounds share the same pmd 87% <img src="https://yufree.github.io/presentation/figure/pmmd.png" width="400px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Reactomics Application ## Metabolites Discovery ??? The first application is metabolites discovery. Current workflow for metabolites discovery is the prediction of metabolites based on MS/MS database or pathway database and then match the predicted ions with real data. Here, based on reactomics, we proposed a local recursive search for metabolites discovery. --- ## Metabolites of exogenous compound - Environmental pollution metabolites - Drug metabolites ### Xenobiotic metabolism - Phase I - Oxidation (R-H => R-OH, pmd 15.995 Da) - Reduction (R-C=O => R-C-OH, pmd 2.016 Da) - Phase II - Methylation (R-OH => R-O-C,pmd 14.016 Da) - Sulfation (R-OH => R-SO4, pmd 46.976 Da) - Acetylation (R-OH => R-O-COCH3, pmd 42.011 Da) - Glucuronidation (R-NH2 => R-NH-C6H9O7, pmd 192.027 Da) - Glycosylation (R-OH => R-O-C6H11O5, pmd 162.053 Da) ??? Metabolites discovery is commen in two types of studies: environmental pollution metabolites and drug metabolites discovery. In general, it involved xenbiotic metabolism pathway. It's usually contain phase one reaction, which will be oxdation or reduction of original compounds. And then phase two reaction, which will add functional group to increase the polarity of compounds and help to release those exdogenous compounds into the environment. At chemical reaction level, there are only a few PMDs associated with this process. --- ## Metabolites of TBBPA in Pumpkin - Mass defect analysis to screen Brominated Compounds - Confirmation by synthesized standards <img src="https://yufree.github.io/presentation/figure/TBBPAmet.jpeg" width="600px" style="display: block; margin: auto;" /> .half[ Hou, X., Yu, M., Liu, A., Wang, X., Li, Y., Liu, J., Schnoor, J. L., & Jiang, G. (2019). Glycosylation of Tetrabromobisphenol A in Pumpkin. Environmental Science & Technology, 53(15), 8805–8812. https://doi.org/10.1021/acs.est.9b02122 ] ??? Here I will use an example to demo such process. Here is a study to find the metabolites of tetrabromobisphenol A in pumpkin. I am actually one of the authors of this published work. Here we use mass defect analysis to screen all the Brominated compounds and then make confirmation by synthesized standards. You can see that we found seven metabolites of TBBPA. If you checked the structure, you will find it seems the metabolites follow a reaction flow. Metabolites one is the glycosylation of TBBPA and metabolites two is the mylonylation of metabolites one. Metabolites three is the glycosylation of both side of the hydroxlation groups of TBBPA. It seems the metabolites could be extended by certain PMDs. --- ## Metabolites of TBBPA in Pumpkin - TBBPA Metabolites PMD network <img src="https://yufree.github.io/presentation/figure/TBBPA.png" width="500px" style="display: block; margin: auto;" /> .half[ Hou, X., Yu, M., Liu, A., Wang, X., Li, Y., Liu, J., Schnoor, J. L., & Jiang, G. (2019). Glycosylation of Tetrabromobisphenol A in Pumpkin. Environmental Science & Technology, 53(15), 8805–8812. https://doi.org/10.1021/acs.est.9b02122 ] ??? Since I am one of the authors of this work, I have the raw data of this study and I reanalyzed the data in terms of reactomics. What I did is that I gather the five PMDs reported in this study. They are debrominated, glycosylation, mylonylation, methylation and hydroxylation process. You might notice that the PMDs are different from formula. For example, the debrominated process involved both adding Bromine atom and removing hydrogen atom and the PMD could not be shown as chemical formula but a plus and minus format. Anyway, I labeled all the previous found compounds as node of a network. In this case, the metabolites discovery is not based on prediction but based on the data itself. If the searching could find a peak with certain PMD, the node will be extanded and new search will begin at the new node for predefined PMD. In this case, we could find the metabolites of metabolites of metabolites of metabolites of metabolites in few seconds and it turn out that we still miss lots of metabolites of TBBPA when we didn't expect those reaction flow. However, reactomics show the power to reveal all the possibility of metabolites. --- class: inverse, center, middle # Reactomics Application ## Endogenous or Exogenous ??? The second application of reactomics is an issue confusing lots of researchers. Do we have a way to separate endogenous compounds with exogenous compounds in our samples? For example, if we collect serum samples and get the untargeted metabolites profile. We want to link a disease with certain unknown compounds show differences among control and disease group. However, we have no clue whether this compound is from ourself or the environment. So, how could reactomics help? --- ## Endogenous vs Exogenous - T3DB Endogenous (255) vs Exogenous (705) - Use top 10 high frequency PMDs <img src="https://raw.githubusercontent.com/yufree/presentation/gh-pages/figure/t3db3.png" width="450px" style="display: block; margin: auto;" /> ??? Here I use T3DB, which is the only database with the annotation of endogenous or exogenous for their compounds. I collected all of the 255 endogenous compounds and 705 exogenous compounds and polled their exact mass together. Then I performed PMD analysis to get all of the possible PMD among those compounds. As I mentioned before, PMD will always enriched in few high frequency ones. Here we didn't predefine those set but directly use top 10 high frequency PMD and construct the PMD network as we did for pumpkin study and here is the results. I removed all the compounds without any high frequency PMDs relationship with other and you could find interesting pattern for endogenous compounds and exogenous compounds. Most of the endogeous compounds could be connected by few large clusters. As for exogenous compounds, through they have larger numbers, they could not form a larger network. In this case, we need to think what does endogenous means. It means this compound could be generated in a biological process and biological process is alway enriched by few high freqency PMDs. As for exogenous compounds, they are new to biological process and we might not have two many pathways to metabolize them. --- ## KEGG reaction network - Metabolites of four compounds - Average network distance: Glucose(9.74), 5-Cholestene(6.55), Caffeine(3.31), Bromophenol(1.8), TBBPA(3.88 for pumpkin) <img src="https://raw.githubusercontent.com/yufree/presentation/gh-pages/figure/kegg.png" width="450px" style="display: block; margin: auto;" /> ??? Let's see some real reactions from KEGG database. Here I use the top 10 high freqency PMDs from KEGG and generate the reaction network for four compounds. This is Glucose, which is very important to any biological process and it's a exogenous compound. You can see its network could be extended in all of high freqency PMD and lots of compounds could be coupled into this network. Then we could see Cholestene, which is also a endogenous compounds while not that important as Glucose. It's network is smaller. Then we could see Caffine, which is exogenous compounds for most animals. It also show a network while the network structure is much simpler than endogenous compounds. Then we could see bromophenol, which is also endogenous compounds and its network shows it didn't involved in too much reactions. OK, we could not always use our eye to make annotation of endogenous or exogenous. Based on reacotmics, the average network distance of their high frequency PMD network could be a good option. The average network distance means the average numbers of how many PMD connections within a network. If such number is large, we will see a much larger network for certain compounds. Those number for Glucose is 9.74 and for Bromophenol is 1.8. To be honest, there is no hard cut off for whether a compound is endogenous or exogenous. Some compounds will be endogenous for plant sample while exogenous for animal sample. In terms of reactomics, I suggest such cutoff should be generated from the samples themselves. Using the high freqency PMDs of your samples to get the PMD network for your specific samples. In my experience, you will always see few large clusters and lots of smaller clusters. Then when you find some compounds of your interests, check if this compound is on the larger PMD cluster or smaller cluster to guess whether this compound is from environment or not. Based on known compounds or T3DB, this method works in most cases. --- class: inverse, center, middle # Reactomics Application ## Biomarker Reaction ??? This is my favourite application of reactomics. I am from a medical school and all of our samples should be linked to healthcares. Biomarker is always of the most interests to our collaborator for certain disease. In the past, biomarker always means biomarker compounds and that's why lots of analytical paper spent lots of time on identification of unknown compounds. However, in reactomics such identification has been skipped by PMD. In this case, what we will check will not be biomarker compound, but biomarker reactions. I don't believe one compound could serve as biomarker well and when we talk about reactions, it always means at least two compounds and reaction level changes will be relatively stable for prediction. --- ## Lung cancer - MTBLS28 1005 human urine samples - PMD 2.02 Da show differences among control and diseases <img src="https://yufree.github.io/presentation/figure/mtbls28.png" width="500px" style="display: block; margin: auto;" /> ??? Though we had some exciting data, I prefer to use public data to avoid the bias of our methods. Here we use MTBLS28 dataset from metabolights. It contains 1005 human urine samples and linked to lung cancer. What I did is simply check the high freqency PMDs from the data itself and some known high freqency PMD from KEGG. As I mentioned in the quanlitative analysis of reactomics, only the pairs with stable intensity ratios were used to check if it's biomarker reaction or not. Here I found PMD 2.02 Da could be a biomarker reaction for lung cancer. As you could see here in the figure, lung cancer group will show a relatively lower PMD 2.02 Da level compared with control group. If this is true, urine samples could be used to predict disease at reaction level. --- ## How <div class="figure" style="text-align: center"> <img src="https://yufree.github.io/presentation/figure/owl.png" alt="Paper method v.s. Practical method in Metabolomics" width="72%" /> <p class="caption">Paper method v.s. Practical method in Metabolomics</p> </div> --- ## Software ### [pmd package](http://yufree.github.io/pmd/) - GlobalStd algorithm - pmd-reaction database based on KEGG/HMDB - quantitative pmd analysis - pmd network anaysis - pmd MS/MS annotation algorithm - GUI application based on shiny ### [rmwf package](https://github.com/yufree/rmwf) - NIST 1950 data - Rmarkdown template - From raw data to report - part of `xcmsrocker` docker image ??? Here I show three application of reactomics and you might find more. So what I did is write a R package to include all the functions for pmd analysis or reactomics analysis such as the pmd-reaction database based on KEGG/HMDB, quantitative pmd analysis, PMD network analysis. To make it clear, I only cover the reaction between different compounds instead of reactions from one compound. However, we do have functions to make MS/MS annotation based on pmd and you could check poster 304118 for details. Meanwhile, to make reactomics fully reproducible at the very beginning, I developed another R package called rmwf, which means reproducible metabolomics work flow. I included our lab data from NIST 1950, which is standard reference material. I included lots of R code and template to help the user to process the raw data into a report. This package is also part of xcmsrocker project, which try to creat a docker image for fully reproducible metabolomics data analysis environemnt based on RStudio. --- ## Take-home message ### Reaction is the "base pair" for small molecules ### Reactomics means the omics study at reaction or PMD level ### Real data always enriched in few PMDs or reactions ### Reaction level changes could be quantitative and predictable ??? Here is the take-home message for reactomics: reaction is the "base pairs" for small molecular, Reactomics means the omics study at reaction or PMD level, real data always enriched in few PMDs or reactions, reaction level changes could be quantitative and predictable. --- class: inverse, center, middle # Thanks ## Q&A ## miao.yu@mssm.edu