Chapter 5 Workflow

For an overview of metabolomics data analysis, see this book (S. Li 2020).

DiagrammeR::mermaid("
flowchart TB
I(peak-picking) --> C
C(visualization) --> D(normalization/batch correction)
D --> A(annotation/identification)
A --> H(statistical analysis)
C --> A --> B(omics analysis)
D --> H
B --> H
H --> E(experimental validation)
A --> E
H --> A
B --> E
C --> H
")

5.1 Platform for metabolomics data analysis

Here is a list of related open source projects.

5.1.1 XCMS & XCMS online

XCMS Online is hosted by the Scripps Research Institute. If your datasets are not large, XCMS Online would be the best option for you. It was recently updated to support more functions for systems biology. It uses METLIN and isoMETLIN to annotate MS/MS data, and pathway analysis is also supported. To accelerate processing, XCMS Online also offers stream (Windows only), which connects your instrument workstation to their server so that data are processed automatically during acquisition. They also developed apps for XCMS Online, but I think Slack apps would be an even cooler way to control the data processing.

The xcms R package is different from XCMS Online, although they might share the same code. I use it almost every day to run metabolomics data analysis locally. The developers are moving to xcms 3, a major update to the object classes. Its data format integrates with the MSnbase package, and the parameters are easier to set up for each step. Normally, I use msconvert-IPO-xcms-xMSannotator-MetaboAnalyst as the workflow to process data offline. This workflow can be accelerated by parallel processing. However, if you are not familiar with R, you had better choose one of the software packages below. With xcms, 1000 files need around 5 hours to generate the peak list on a regular workstation.

IPO is a tool for automated optimization of XCMS parameters (Libiseller et al. 2015), and Warpgroup is used for chromatogram subregion detection, consensus integration bound determination, and accurate missing value integration (Mahieu, Spalding, and Patti 2016). A case study comparing different xcms parameters with IPO is available for GC-MS (Dos Santos and Canuto 2023). Another option is AutoTuner, which is much faster than IPO (McLean and Kujawinski 2020). MetaboAnalystR 3.0 can also optimize the parameters for xcms, although you then need to perform the downstream analysis within that software (Pang et al. 2020). With IPO, ten files need ~12 hours to generate the optimized results on a regular workstation. Paramounter directly measures universal parameters to process metabolomics data in a “White Box” (J. Guo, Shen, and Huan 2022). Another study used machine learning to compare different optimization methods and found them all better than the default xcms settings (Lassen et al. 2021). Such workflows could also be extended to include ion mobility (Dodds et al. 2022).

See these papers for XCMS-based workflows (Forsberg et al. 2018; Huan et al. 2017; Mahieu et al. 2016; Montenegro-Burke et al. 2017; Domingo-Almenara and Siuzdak 2020; Stancliffe et al. 2022). For METLIN-related annotation, see these papers (Guijas et al. 2018; Tautenhahn et al. 2012; Xue, Guijas, et al. 2020; Domingo-Almenara, Montenegro-Burke, Ivanisevic, et al. 2018).

MAIT is based on xcms, and its source code is available online (Fernández-Albert et al. 2014).

iMet-Q is an automated tool with a friendly user interface for quantifying metabolites in full-scan liquid chromatography-mass spectrometry (LC-MS) data (Chang et al. 2016).

compMS2Miner is an automatable metabolite identification, visualization, and data-sharing R package for high-resolution LC–MS data sets. Here are related papers (Edmands et al. 2017; Edmands, Hayes, and Rappaport 2018; Edmands, Barupal, and Scalbert 2015).

mzMatch is a modular, open source, and platform-independent data processing pipeline for metabolomics LC/MS data written in Java, which can be coupled with xcms (Scheltema et al. 2011; Creek et al. 2012). It can also be used for annotation with MetAssign (Daly et al. 2014).

5.1.2 PRIMe

PRIMe is from RIKEN and UC Davis. They update their database frequently (Tsugawa et al. 2016). It supports mzML and major MS vendor formats. They defined their own file format, ABF, and an ecosystem for omics studies. The software is updated almost every day. You could use MS-DIAL for untargeted analysis and MRMPROBS for targeted analysis. For annotation, they developed MS-FINDER, along with statistical tools based on Excel. This platform could replace expensive commercial software and is well prepared for MS/MS data analysis and lipidomics. The tools are open source, work on Windows, and could also run within Mathematica. However, they don’t cover pathway analysis. Another feature is that they always show the most recent spectral records from public repositories, so you can always get updated MSP spectra files for your own data analysis.

For PRIMe-based workflows, see these papers (Lai et al. 2018; Matsuo et al. 2017; Treutler et al. 2016; Tsugawa et al. 2015; Tsugawa et al. 2016; Kind et al. 2018). There are also extensions of their workflow (Uchino et al. 2022) and a workflow for environmental science (Bonnefille et al. 2023).

5.1.3 GNPS

GNPS is an open-access knowledge base for community-wide organization and sharing of raw, processed, or identified tandem mass (MS/MS) spectrometry data. It offers a straightforward annotation method for MS/MS data. Feature-based molecular networking (FBMN) within GNPS can be coupled with xcms, OpenMS, MS-DIAL, MZmine 2, and other popular software. GNPS also has a dashboard for online mass spectrometry data analysis (Petras et al. 2021).

See these papers for GNPS and related projects (Aron et al. 2020; Nothias et al. 2020; Scheubert et al. 2017; Ricardo R. da Silva et al. 2018; M. Wang et al. 2016; Bittremieux et al. 2023).

5.1.4 OpenMS & SIRIUS

OpenMS is another good platform for mass spectrometry data analysis, developed in C++. You can use it as a KNIME plugin. I suggest that anyone who wants to be a data scientist get familiar with a platform like KNIME, because it supplies APIs for different programming languages, is easy to use, and shows every step to others. TOPPView in OpenMS might also be the best software for visualizing MS data. You could use the metabolomics workflow to train beginners in the details of data processing. pyOpenMS and OpenSWATH are also part of this platform. If you want to move into industry, this platform fits you best because it gives you a clear idea of solutions and workflows.

See these papers for OpenMS-based workflows (Bertsch et al. 2011; Pfeuffer et al. 2017, 2024; Röst et al. 2014, 2016; Rurik et al. 2020; Alka et al. 2020).

OpenMS can be coupled to SIRIUS 4 for annotation. SIRIUS is a Java-based software framework for de novo identification of metabolites using single and tandem mass spectrometry. The SIRIUS 4 project integrates a collection of tools, including CSI:FingerID, ZODIAC, and CANOPUS. See these papers for SIRIUS-based workflows (Dührkop et al. 2019, 2020; Alka et al. 2020; Ludwig et al. 2020).

5.1.5 MZmine 2

MZmine 2 has three versions developed on the Java platform, and the latest version is included in MSDK. MZmine 2 offers functions similar to those of XCMS Online. However, MZmine 2 does not have pathway analysis; you could use MetaboAnalyst for that purpose. You could also look into MSDK to find similar functions supplied by ProteoSuite and OpenChrom. If you are an experienced Java coder, you should start here.

See these papers for MZmine-based workflows (Pluskal et al. 2010; Pluskal et al. 2020).

5.1.6 Emory MaHPIC

This platform consists of several R packages from Emory University, including apLCMS to collect the data, xMSanalyzer to handle the automated pipeline for large-scale, non-targeted metabolomics data, xMSannotator for annotation of LC-MS data, and Mummichog for pathway and network analysis of high-throughput metabolomics. This platform would suit researchers in environmental science who study the exposome.

See these papers for the Emory workflow (Uppal et al. 2013; Uppal, Walker, and Jones 2017; T. Yu et al. 2009; S. Li et al. 2013; Q. Liu et al. 2020).

5.1.7 Others

PMDDA is a reproducible workflow for exhaustive MS2 data acquisition of MS1 features (M. Yu, Dolios, and Petrick 2022), with data and scripts available online.
tidymass is an object-oriented reproducible analysis framework for LC–MS data (Shen et al. 2022).
R for Mass Spectrometry is an R software collection for the analysis and interpretation of high-throughput mass spectrometry assays.
MAVEN is from Princeton University (Melamud, Vastag, and Rabinowitz 2010; Clasquin, Melamud, and Rabinowitz 2012).
metabolomics is a CRAN package for analysis of metabolomics data.
autoGCMSDataAnal is a Matlab-based comprehensive data analysis strategy for GC-MS-based untargeted metabolomics, and AntDAS2 provides an automatic data analysis strategy for UPLC-HRMS-based metabolomics (Y.-J. Yu et al. 2019; Y.-Y. Zhang et al. 2020).
enviGCMS is for environmental non-targeted analysis and rmwf is for reproducible metabolomics workflows (M. Yu et al. 2020; M. Yu, Olkowicz, and Pawliszyn 2019).
Pseudotargeted metabolomics method (Zheng et al. 2020; Y. Wang et al. 2016).
pySM provides a reference implementation of a pipeline for False Discovery Rate-controlled metabolite annotation of high-resolution imaging mass spectrometry data (Palmer et al. 2017).
TinyMS is a Python-based pipeline for preprocessing LC–MS data for untargeted metabolomics workflows (Riquelme et al. 2020).
MetaboliteDetector is a QT4 based software package for the analysis of GC/MS based metabolomics data (Hiller et al. 2009).
W4M and metaX can analyze data online (Giacomoni et al. 2015; Wen et al. 2017; Jalili et al. 2020).
FTMSVisualization is a suite of tools for visualizing complex mixture FT-MS data (Kew et al. 2017).
magma can predict and match MS/MS files.
metabCombiner: paired untargeted LC-HRMS metabolomics feature matching and concatenation of disparately acquired data sets (Habra et al. 2021).
SLAW is a scalable and self-optimizing processing workflow for untargeted LC-MS with a Docker image (Delabriere et al. 2021).
patRoon: open source software platform for environmental mass spectrometry based non-target screening (Helmus et al. 2021).
‘shape-orientated’ algorithm: a new ‘shape-orientated’ continuous wavelet transform (CWT)-based algorithm employing an adapted Marr wavelet (AMW) with a shape matching index (SMI), defined as peak height normalized wavelet coefficient for feature filtering, was developed for chromatographic peak detection and quantification (Bai et al. 2022).
automRm: an R package for fully automatic LC-QQQ-MS data preprocessing powered by machine learning (Eilertz, Mitterer, and Buescher 2022).
IDSL.UFAIntrinsic Peak Analysis (IPA) for HRMS Data. (Baygi et al. 2022)
DEIMoS: an open-source tool for processing high-dimensional mass spectrometry data (Colby et al. 2022).
Omics Untargeted Key Script is a toolbox that makes untargeted LC-MS metabolomic profiling with the latest computational features readily accessible to the research community in a ready-to-use, unified manner (Plyushchenko et al. 2022).
MetEx is a targeted extraction strategy for improving the coverage and accuracy of metabolite annotation (Zheng et al. 2022).
Asari: trackable and scalable LC-MS metabolomics data processing software in Python (S. Li et al. 2023).
NOMspectra: an open-source Python package for processing high resolution mass spectrometry data on natural organic matter (Volikov, Rukhovich, and Perminova 2023).
MARS: a multipurpose software for untargeted LC-MS-based metabolomics and exposomics with a GUI in C++ (Goracci et al. 2024).
MeRgeION: a multifunctional R pipeline for small molecule LC-MS/MS data processing, searching, and organizing (Y. Liu et al. 2023).

5.1.8 Workflow Comparison

Here are some comparisons of different workflows; you could make your selection based on them (Myers et al. 2017; Weber et al. 2017; Z. Li et al. 2018; Liao et al. 2023).

xcmsrocker is a Docker image for metabolomics that provides templates for comparing R-based software (M. Yu, Dolios, and Petrick 2022).

5.2 Project Setup

I suggest building your data analysis projects in RStudio (click File - New Project - New Directory - Empty Project), then assigning a name to your project. I also recommend the following tips if you are familiar with RStudio.

Use Git/GitHub for version control of your code and to sync your project online.
Don’t use your name for your project, because other people might collaborate with you and someone might check your data when you publish your papers. Each project should cover the work for one paper or one chapter of your thesis.
Use a workflow document (txt or doc) in your project to record all of the steps and code you performed for this project. Treat this document as the digital version of your experiment notebook.
Use a data folder in your project folder for the raw data and the results you get from data analysis.
Use a figure folder in your project folder for the figures.
Use a manuscript folder in your project folder for the manuscript (you could write the paper in RStudio with the help of an R Markdown template).
Just double-click [yourprojectname].Rproj to start your project.
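
The folder layout described in these tips can also be created from the R console. The following is a minimal sketch in base R, not a required step: the helper name `setup_project`, the example project name, and the file name `workflow.txt` are all assumptions made for illustration, and the `.Rproj` file itself is still created by RStudio's New Project dialog.

```r
# Hypothetical helper (not part of RStudio or any package): create the
# suggested folders and an empty workflow document inside `path`.
setup_project <- function(path,
                          folders = c("data", "figure", "manuscript"),
                          workflow_file = "workflow.txt") {
  # Create the project directory itself if it does not exist yet
  dir.create(path, recursive = TRUE, showWarnings = FALSE)

  # One sub-folder each for raw data/results, figures, and the manuscript
  for (folder in folders) {
    dir.create(file.path(path, folder), showWarnings = FALSE)
  }

  # Start the workflow document only if it is missing, so that existing
  # notes are never overwritten
  notes <- file.path(path, workflow_file)
  if (!file.exists(notes)) {
    writeLines(c("Workflow notes", paste("Created:", Sys.Date())), notes)
  }

  invisible(list.files(path))
}

# Example: a project named after the paper, not after the analyst.
# tempdir() is used here only so the sketch can run anywhere.
project_dir <- file.path(tempdir(), "metabolomics-paper-1")
setup_project(project_dir)
list.files(project_dir)
```

Running the helper a second time is harmless, because existing folders and an existing workflow document are left untouched.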

5.4 Contest

CASMI (Critical Assessment of Small Molecule Identification) is a contest for identifying small molecules (Blaženović et al. 2017).

References

Alka, Oliver, Timo Sachsenberg, Leon Bichmann, Julianus Pfeuffer, Hendrik Weisser, Samuel Wein, Eugen Netz, Marc Rurik, Oliver Kohlbacher, and Hannes Röst. 2020. “Chapter 6: OpenMS and KNIME for Mass Spectrometry Data Processing.” In Processing Metabolomics and Proteomics Data with Open Software, 201–31. https://doi.org/10.1039/9781788019880-00201.

Aron, Allegra T., Emily C. Gentry, Kerry L. McPhail, Louis-Félix Nothias, Mélissa Nothias-Esposito, Amina Bouslimani, Daniel Petras, et al. 2020. “Reproducible Molecular Networking of Untargeted Mass Spectrometry Data Using GNPS.” Nature Protocols 15 (6): 1954–91. https://doi.org/10.1038/s41596-020-0317-5.

Bai, Caihong, Suyun Xu, Jingyi Tang, Yuxi Zhang, Jiahui Yang, and Kaifeng Hu. 2022. “A ‘Shape-Orientated’ Algorithm Employing an Adapted Marr Wavelet and Shape Matching Index Improves the Performance of Continuous Wavelet Transform for Chromatographic Peak Detection and Quantification.” Journal of Chromatography A 1673 (June): 463086. https://doi.org/10.1016/j.chroma.2022.463086.

Baygi, Sadjad Fakouri, Sanjay K. Banerjee, Praloy Chakraborty, Yashwant Kumar, and Dinesh Kumar Barupal. 2022. “IDSL.UFA Assigns High-Confidence Molecular Formula Annotations for Untargeted LC/HRMS Data Sets in Metabolomics and Exposomics.” Analytical Chemistry 94 (39): 13315–22. https://doi.org/10.1021/acs.analchem.2c00563.

Bertsch, Andreas, Clemens Gröpl, Knut Reinert, and Oliver Kohlbacher. 2011. “OpenMS and TOPP: Open Source Software for LC-MS Data Analysis.” In Data Mining in Proteomics: From Standards to Applications, edited by Michael Hamacher, Martin Eisenacher, and Christian Stephan, 353–67. Methods in Molecular Biology. Totowa, NJ: Humana Press. https://doi.org/10.1007/978-1-60761-987-1_23.

Bittremieux, Wout, Nicole E. Avalon, Sydney P. Thomas, Sarvar A. Kakhkhorov, Alexander A. Aksenov, Paulo Wender P. Gomes, Christine M. Aceves, et al. 2023. “Open Access Repository-Scale Propagated Nearest Neighbor Suspect Spectral Library for Untargeted Metabolomics.” Nature Communications 14 (1): 8488. https://doi.org/10.1038/s41467-023-44035-y.

Blaženović, Ivana, Tobias Kind, Hrvoje Torbašinović, Slobodan Obrenović, Sajjan S. Mehta, Hiroshi Tsugawa, Tobias Wermuth, et al. 2017. “Comprehensive Comparison of in Silico MS/MS Fragmentation Tools of the CASMI Contest: Database Boosting Is Needed to Achieve 93% Accuracy.” Journal of Cheminformatics 9 (1): 32. https://doi.org/10.1186/s13321-017-0219-x.

Bonnefille, Bénilde, Oskar Karlsson, May Britt Rian, Rubhana Raqib, Faruque Parvez, Stefano Papazian, M. Sirajul Islam, and Jonathan W. Martin. 2023. “Nontarget Analysis of Polluted Surface Waters in Bangladesh Using Open Science Workflows.” Environmental Science & Technology, April. https://doi.org/10.1021/acs.est.2c08200.

Carroll, Adam J., Murray R. Badger, and A. Harvey Millar. 2010. “The MetabolomeExpress Project: Enabling Web-Based Processing, Analysis and Transparent Dissemination of GC/MS Metabolomics Datasets.” BMC Bioinformatics 11 (1): 376. https://doi.org/10.1186/1471-2105-11-376.

Chang, Hui-Yin, Ching-Tai Chen, T. Mamie Lih, Ke-Shiuan Lynn, Chiun-Gung Juo, Wen-Lian Hsu, and Ting-Yi Sung. 2016. “iMet-Q: A User-Friendly Tool for Label-Free Metabolomics Quantitation Using Dynamic Peak-Width Determination.” PLOS ONE 11 (1): e0146112. https://doi.org/10.1371/journal.pone.0146112.

Clasquin, Michelle F., Eugene Melamud, and Joshua D. Rabinowitz. 2012. “LC-MS Data Processing with MAVEN: A Metabolomic Analysis and Visualization Engine.” Current Protocols in Bioinformatics 37 (1): 14.11.1–23. https://doi.org/10.1002/0471250953.bi1411s37.

Colby, Sean M., Christine H. Chang, Jessica L. Bade, Jamie R. Nunez, Madison R. Blumer, Daniel J. Orton, Kent J. Bloodsworth, et al. 2022. “DEIMoS: An Open-Source Tool for Processing High-Dimensional Mass Spectrometry Data.” Analytical Chemistry 94 (16): 6130–38. https://doi.org/10.1021/acs.analchem.1c05017.

Creek, Darren J., Andris Jankevics, Karl E. V. Burgess, Rainer Breitling, and Michael P. Barrett. 2012. “IDEOM: An Excel Interface for Analysis of LC–MS-based Metabolomics Data.” Bioinformatics 28 (7): 1048–49. https://doi.org/10.1093/bioinformatics/bts069.

Daly, Rónán, Simon Rogers, Joe Wandy, Andris Jankevics, Karl E. V. Burgess, and Rainer Breitling. 2014. “MetAssign: Probabilistic Annotation of Metabolites from LC–MS Data Using a Bayesian Clustering Approach.” Bioinformatics 30 (19): 2764–71. https://doi.org/10.1093/bioinformatics/btu370.

Delabriere, Alexis, Philipp Warmer, Vincenth Brennsteiner, and Nicola Zamboni. 2021. “SLAW: A Scalable and Self-Optimizing Processing Workflow for Untargeted LC-MS.” Analytical Chemistry 93 (45): 15024–32. https://doi.org/10.1021/acs.analchem.1c02687.

Dodds, James N., Lingjue Wang, Gary J. Patti, and Erin S. Baker. 2022. “Combining Isotopologue Workflows and Simultaneous Multidimensional Separations to Detect, Identify, and Validate Metabolites in Untargeted Analyses.” Analytical Chemistry 94 (5): 2527–35. https://doi.org/10.1021/acs.analchem.1c04430.

Domingo-Almenara, Xavier, J. Rafael Montenegro-Burke, Julijana Ivanisevic, Aurelien Thomas, Jonathan Sidibé, Tony Teav, Carlos Guijas, et al. 2018. “XCMS-MRM and METLIN-MRM: A Cloud Library and Public Resource for Targeted Analysis of Small Molecules.” Nature Methods 15 (9): 681–84. https://doi.org/10.1038/s41592-018-0110-3.

Domingo-Almenara, Xavier, and Gary Siuzdak. 2020. “Metabolomics Data Processing Using XCMS.” In Computational Methods and Data Analysis for Metabolomics, edited by Shuzhao Li, 11–24. Methods in Molecular Biology. New York, NY: Springer US. https://doi.org/10.1007/978-1-0716-0239-3_2.

Dos Santos, Emile Kelly Porto, and Gisele André Baptista Canuto. 2023. “Optimizing XCMS Parameters for GC-MS Metabolomics Data Processing: A Case Study.” Metabolomics: Official Journal of the Metabolomic Society 19 (4): 26. https://doi.org/10.1007/s11306-023-01992-1.

Dührkop, Kai, Markus Fleischauer, Marcus Ludwig, Alexander A. Aksenov, Alexey V. Melnik, Marvin Meusel, Pieter C. Dorrestein, Juho Rousu, and Sebastian Böcker. 2019. “SIRIUS 4: A Rapid Tool for Turning Tandem Mass Spectra into Metabolite Structure Information.” Nature Methods 16 (4): 299–302. https://doi.org/10.1038/s41592-019-0344-8.

Dührkop, Kai, Louis-Félix Nothias, Markus Fleischauer, Raphael Reher, Marcus Ludwig, Martin A. Hoffmann, Daniel Petras, et al. 2020. “Systematic Classification of Unknown Metabolites Using High-Resolution Fragmentation Mass Spectra.” Nature Biotechnology, November, 1–10. https://doi.org/10.1038/s41587-020-0740-8.

Edmands, William M. B., Dinesh K. Barupal, and Augustin Scalbert. 2015. “MetMSLine: An Automated and Fully Integrated Pipeline for Rapid Processing of High-Resolution LC–MS Metabolomic Datasets.” Bioinformatics 31 (5): 788–90. https://doi.org/10.1093/bioinformatics/btu705.

Edmands, William M. B., Josie Hayes, and Stephen M. Rappaport. 2018. “SimExTargId: A Comprehensive Package for Real-Time LC-MS Data Acquisition and Analysis.” Bioinformatics 34 (20): 3589–90. https://doi.org/10.1093/bioinformatics/bty218.

Edmands, William M. B., Lauren Petrick, Dinesh K. Barupal, Augustin Scalbert, Mark J. Wilson, Jeffrey K. Wickliffe, and Stephen M. Rappaport. 2017. “compMS2Miner: An Automatable Metabolite Identification, Visualization, and Data-Sharing R Package for High-Resolution LC–MS Data Sets.” Analytical Chemistry 89 (7): 3919–28. https://doi.org/10.1021/acs.analchem.6b02394.

Eilertz, Daniel, Michael Mitterer, and Joerg M. Buescher. 2022. “automRm: An R Package for Fully Automatic LC-QQQ-MS Data Preprocessing Powered by Machine Learning.” Analytical Chemistry 94 (16): 6163–71. https://doi.org/10.1021/acs.analchem.1c05224.

Fernández-Albert, Francesc, Rafael Llorach, Cristina Andrés-Lacueva, and Alexandre Perera. 2014. “An R Package to Analyse LC/MS Metabolomic Data: MAIT (Metabolite Automatic Identification Toolkit).” Bioinformatics 30 (13): 1937–39. https://doi.org/10.1093/bioinformatics/btu136.

Forsberg, Erica M., Tao Huan, Duane Rinehart, H. Paul Benton, Benedikt Warth, Brian Hilmers, and Gary Siuzdak. 2018. “Data Processing, Multi-Omic Pathway Mapping, and Metabolite Activity Analysis Using XCMS Online.” Nature Protocols 13 (4): 633–51. https://doi.org/10.1038/nprot.2017.151.

Giacomoni, Franck, Gildas Le Corguillé, Misharl Monsoor, Marion Landi, Pierre Pericard, Mélanie Pétéra, Christophe Duperier, et al. 2015. “Workflow4Metabolomics: A Collaborative Research Infrastructure for Computational Metabolomics.” Bioinformatics 31 (9): 1493–95. https://doi.org/10.1093/bioinformatics/btu813.

Goracci, Laura, Paolo Tiberi, Stefano Di Bona, Stefano Bonciarelli, Giovanna Ilaria Passeri, Marta Piroddi, Simone Moretti, Claudia Volpi, Ismael Zamora, and Gabriele Cruciani. 2024. “MARS: A Multipurpose Software for Untargeted LC–MS-Based Metabolomics and Exposomics.” Analytical Chemistry, January. https://doi.org/10.1021/acs.analchem.3c03620.

Guijas, Carlos, J. Rafael Montenegro-Burke, Xavier Domingo-Almenara, Amelia Palermo, Benedikt Warth, Gerrit Hermann, Gunda Koellensperger, et al. 2018. “METLIN: A Technology Platform for Identifying Knowns and Unknowns.” Analytical Chemistry 90 (5): 3156–64. https://doi.org/10.1021/acs.analchem.7b04424.

Guo, Jian, Sam Shen, and Tao Huan. 2022. “Paramounter: Direct Measurement of Universal Parameters To Process Metabolomics Data in a ‘White Box’.” Analytical Chemistry, March. https://doi.org/10.1021/acs.analchem.1c04758.

Habra, Hani, Maureen Kachman, Kevin Bullock, Clary Clish, Charles R. Evans, and Alla Karnovsky. 2021. “metabCombiner: Paired Untargeted LC-HRMS Metabolomics Feature Matching and Concatenation of Disparately Acquired Data Sets.” Analytical Chemistry 93 (12): 5028–36. https://doi.org/10.1021/acs.analchem.0c03693.

Haug, Kenneth, Reza M Salek, and Christoph Steinbeck. 2017. “Global Open Data Management in Metabolomics.” Current Opinion in Chemical Biology, Omics, 36 (February): 58–63. https://doi.org/10.1016/j.cbpa.2016.12.024.

Helmus, Rick, Thomas L. ter Laak, Annemarie P. van Wezel, Pim de Voogt, and Emma L. Schymanski. 2021. “patRoon: Open Source Software Platform for Environmental Mass Spectrometry Based Non-Target Screening.” Journal of Cheminformatics 13 (1): 1. https://doi.org/10.1186/s13321-020-00477-w.

Hiller, Karsten, Jasper Hangebrauk, Christian Jäger, Jana Spura, Kerstin Schreiber, and Dietmar Schomburg. 2009. “MetaboliteDetector: Comprehensive Analysis Tool for Targeted and Nontargeted GC/MS Based Metabolome Analysis.” Analytical Chemistry 81 (9): 3429–39. https://doi.org/10.1021/ac802689c.

Huan, Tao, Erica M. Forsberg, Duane Rinehart, Caroline H. Johnson, Julijana Ivanisevic, H. Paul Benton, Mingliang Fang, et al. 2017. “Systems Biology Guided by XCMS Online Metabolomics.” Nature Methods 14 (5): 461–62. https://doi.org/10.1038/nmeth.4260.

Jalili, Vahid, Enis Afgan, Qiang Gu, Dave Clements, Daniel Blankenberg, Jeremy Goecks, James Taylor, and Anton Nekrutenko. 2020. “The Galaxy Platform for Accessible, Reproducible and Collaborative Biomedical Analyses: 2020 Update.” Nucleic Acids Research 48 (W1): W395–402. https://doi.org/10.1093/nar/gkaa434.

Kew, William, John W. T. Blackburn, David J. Clarke, and Dušan Uhrín. 2017. “Interactive van Krevelen Diagrams – Advanced Visualisation of Mass Spectrometry Data of Complex Mixtures.” Rapid Communications in Mass Spectrometry 31 (7): 658–62. https://doi.org/10.1002/rcm.7823.

Kind, Tobias, Hiroshi Tsugawa, Tomas Cajka, Yan Ma, Zijuan Lai, Sajjan S. Mehta, Gert Wohlgemuth, et al. 2018. “Identification of Small Molecules Using Accurate Mass MS/MS Search.” Mass Spectrometry Reviews 37 (4): 513–32. https://doi.org/10.1002/mas.21535.

Lai, Zijuan, Hiroshi Tsugawa, Gert Wohlgemuth, Sajjan Mehta, Matthew Mueller, Yuxuan Zheng, Atsushi Ogiwara, et al. 2018. “Identifying Metabolites by Integrating Metabolome Databases with Mass Spectrometry Cheminformatics.” Nature Methods 15 (1): 53–56. https://doi.org/10.1038/nmeth.4512.

Lassen, Johan, Kirstine Lykke Nielsen, Mogens Johannsen, and Palle Villesen. 2021. “Assessment of XCMS Optimization Methods with Machine-Learning Performance.” Analytical Chemistry 93 (40): 13459–66. https://doi.org/10.1021/acs.analchem.1c02000.

Li, Shuzhao. 2020. Computational Methods and Data Analysis for Metabolomics. Springer.

Li, Shuzhao, Youngja Park, Sai Duraisingham, Frederick H. Strobel, Nooruddin Khan, Quinlyn A. Soltow, Dean P. Jones, and Bali Pulendran. 2013. “Predicting Network Activity from High Throughput Metabolomics.” PLOS Computational Biology 9 (7): e1003123. https://doi.org/10.1371/journal.pcbi.1003123.

Li, Shuzhao, Amnah Siddiqa, Maheshwor Thapa, Yuanye Chi, and Shujian Zheng. 2023. “Trackable and Scalable LC-MS Metabolomics Data Processing Using Asari.” Nature Communications 14 (1): 4113. https://doi.org/10.1038/s41467-023-39889-1.

Li, Zhucui, Yan Lu, Yufeng Guo, Haijie Cao, Qinhong Wang, and Wenqing Shui. 2018. “Comprehensive Evaluation of Untargeted Metabolomics Data Processing Software in Feature Detection, Quantification and Discriminating Marker Selection.” Analytica Chimica Acta 1029 (October): 50–57. https://doi.org/10.1016/j.aca.2018.05.001.

Liao, Jingyu, Yuhao Zhang, Wendan Zhang, Yuanyuan Zeng, Jing Zhao, Jingfang Zhang, Tingting Yao, et al. 2023. “Different Software Processing Affects the Peak Picking and Metabolic Pathway Recognition of Metabolomics Data.” Journal of Chromatography A 1687 (January): 463700. https://doi.org/10.1016/j.chroma.2022.463700.

Libiseller, Gunnar, Michaela Dvorzak, Ulrike Kleb, Edgar Gander, Tobias Eisenberg, Frank Madeo, Steffen Neumann, et al. 2015. “IPO: A Tool for Automated Optimization of XCMS Parameters.” BMC Bioinformatics 16 (April): 118. https://doi.org/10.1186/s12859-015-0562-8.

Liu, Qin, Douglas Walker, Karan Uppal, Zihe Liu, Chunyu Ma, ViLinh Tran, Shuzhao Li, Dean P. Jones, and Tianwei Yu. 2020. “Addressing the Batch Effect Issue for LC/MS Metabolomics Data in Data Preprocessing.” Scientific Reports 10 (1): 13856. https://doi.org/10.1038/s41598-020-70850-0.

Liu, Youzhong, Yingjie Zhang, Tom Vennekens, Jennifer L. Lippens, Luc Duijsens, Danh Bui-Thi, Kris Laukens, and Thomas de Vijlder. 2023. “MeRgeION: A Multifunctional R Pipeline for Small Molecule LC-MS/MS Data Processing, Searching, and Organizing.” Analytical Chemistry 95 (22): 8433–42. https://doi.org/10.1021/acs.analchem.2c04343.

Ludwig, Marcus, Louis-Félix Nothias, Kai Dührkop, Irina Koester, Markus Fleischauer, Martin A. Hoffmann, Daniel Petras, et al. 2020. “Database-Independent Molecular Formula Annotation Using Gibbs Sampling Through ZODIAC.” Nature Machine Intelligence 2 (10): 629–41. https://doi.org/10.1038/s42256-020-00234-6.

Mahieu, Nathaniel G., Jonathan L. Spalding, Susan J. Gelman, and Gary J. Patti. 2016. “Defining and Detecting Complex Peak Relationships in Mass Spectral Data: The Mz.unity Algorithm.” Analytical Chemistry 88 (18): 9037–46. https://doi.org/10.1021/acs.analchem.6b01702.

Mahieu, Nathaniel G., Jonathan L. Spalding, and Gary J. Patti. 2016. “Warpgroup: Increased Precision of Metabolomic Data Processing by Consensus Integration Bound Analysis.” Bioinformatics 32 (2): 268–75. https://doi.org/10.1093/bioinformatics/btv564.

Matsuo, Teruko, Hiroshi Tsugawa, Hiromi Miyagawa, and Eiichiro Fukusaki. 2017. “Integrated Strategy for Unknown EI–MS Identification Using Quality Control Calibration Curve, Multivariate Analysis, EI–MS Spectral Database, and Retention Index Prediction.” Analytical Chemistry 89 (12): 6766–73. https://doi.org/10.1021/acs.analchem.7b01010.

McLean, Craig, and Elizabeth B. Kujawinski. 2020. “AutoTuner: High Fidelity and Robust Parameter Selection for Metabolomics Data Processing.” Analytical Chemistry 92 (8): 5724–32. https://doi.org/10.1021/acs.analchem.9b04804.

Melamud, Eugene, Livia Vastag, and Joshua D. Rabinowitz. 2010. “Metabolomic Analysis and Visualization Engine for LC-MS Data.” Analytical Chemistry 82 (23): 9818–26. https://doi.org/10.1021/ac1021166.

Montenegro-Burke, J. Rafael, Aries E. Aisporna, H. Paul Benton, Duane Rinehart, Mingliang Fang, Tao Huan, Benedikt Warth, et al. 2017. “Data Streaming for Metabolomics: Accelerating Data Processing and Analysis from Days to Minutes.” Analytical Chemistry 89 (2): 1254–59. https://doi.org/10.1021/acs.analchem.6b03890.

Myers, Owen D., Susan J. Sumner, Shuzhao Li, Stephen Barnes, and Xiuxia Du. 2017. “Detailed Investigation and Comparison of the XCMS and MZmine 2 Chromatogram Construction and Chromatographic Peak Detection Methods for Preprocessing Mass Spectrometry Metabolomics Data.” Analytical Chemistry 89 (17): 8689–95. https://doi.org/10.1021/acs.analchem.7b01069.

Nothias, Louis-Félix, Daniel Petras, Robin Schmid, Kai Dührkop, Johannes Rainer, Abinesh Sarvepalli, Ivan Protsyuk, et al. 2020. “Feature-Based Molecular Networking in the GNPS Analysis Environment.” Nature Methods 17 (9): 905–8. https://doi.org/10.1038/s41592-020-0933-6.

Palmer, Andrew, Prasad Phapale, Ilya Chernyavsky, Regis Lavigne, Dominik Fay, Artem Tarasov, Vitaly Kovalev, et al. 2017. “FDR-controlled Metabolite Annotation for High-Resolution Imaging Mass Spectrometry.” Nature Methods 14 (1): 57–60. https://doi.org/10.1038/nmeth.4072.

Pang, Zhiqiang, Jasmine Chong, Shuzhao Li, and Jianguo Xia. 2020. “MetaboAnalystR 3.0: Toward an Optimized Workflow for Global Metabolomics.” Metabolites 10 (5): 186. https://doi.org/10.3390/metabo10050186.

Petras, Daniel, Vanessa V. Phelan, Deepa Acharya, Andrew E. Allen, Allegra T. Aron, Nuno Bandeira, Benjamin P. Bowen, et al. 2021. “GNPS Dashboard: Collaborative Exploration of Mass Spectrometry Data in the Web Browser.” Nature Methods, December, 1–3. https://doi.org/10.1038/s41592-021-01339-5.

Pfeuffer, Julianus, Chris Bielow, Samuel Wein, Kyowon Jeong, Eugen Netz, Axel Walter, Oliver Alka, et al. 2024. “OpenMS 3 Enables Reproducible Analysis of Large-Scale Mass Spectrometry Data.” Nature Methods 21 (3): 365–67. https://doi.org/10.1038/s41592-024-02197-7.

Pfeuffer, Julianus, Timo Sachsenberg, Oliver Alka, Mathias Walzer, Alexander Fillbrunn, Lars Nilse, Oliver Schilling, Knut Reinert, and Oliver Kohlbacher. 2017. “OpenMS – A Platform for Reproducible Analysis of Mass Spectrometry Data.” Journal of Biotechnology, Bioinformatics Solutions for Big Data Analysis in Life Sciences presented by the German Network for Bioinformatics Infrastructure, 261 (November): 142–48. https://doi.org/10.1016/j.jbiotec.2017.05.016.

Pluskal, Tomáš, Sandra Castillo, Alejandro Villar-Briones, and Matej Orešič. 2010. “MZmine 2: Modular Framework for Processing, Visualizing, and Analyzing Mass Spectrometry-Based Molecular Profile Data.” BMC Bioinformatics 11: 395. https://doi.org/10.1186/1471-2105-11-395.

Pluskal, Tomáš, Ansgar Korf, Aleksandr Smirnov, Robin Schmid, Timothy R. Fallon, Xiuxia Du, and Jing-Ke Weng. 2020. “Chapter 7: Metabolomics Data Analysis Using MZmine.” In Processing Metabolomics and Proteomics Data with Open Software, 232–54. https://doi.org/10.1039/9781788019880-00232.

Plyushchenko, Ivan V., Elizaveta S. Fedorova, Natalia V. Potoldykova, Konstantin A. Polyakovskiy, Alexander I. Glukhov, and Igor A. Rodin. 2022. “Omics Untargeted Key Script: R-Based Software Toolbox for Untargeted Metabolomics with Bladder Cancer Biomarkers Discovery Case Study.” Journal of Proteome Research 21 (3): 833–47. https://doi.org/10.1021/acs.jproteome.1c00392.

Riquelme, Gabriel, Nicolás Zabalegui, Pablo Marchi, Christina M. Jones, and María Eugenia Monge. 2020. “A Python-Based Pipeline for Preprocessing LC–MS Data for Untargeted Metabolomics Workflows.” Metabolites 10 (10): 416. https://doi.org/10.3390/metabo10100416.

Röst, Hannes L., Timo Sachsenberg, Stephan Aiche, Chris Bielow, Hendrik Weisser, Fabian Aicheler, Sandro Andreotti, et al. 2016. “OpenMS: A Flexible Open-Source Software Platform for Mass Spectrometry Data Analysis.” Nature Methods 13 (9): 741–48. https://doi.org/10.1038/nmeth.3959.

Röst, Hannes L., Uwe Schmitt, Ruedi Aebersold, and Lars Malmström. 2014. “pyOpenMS: A Python-based Interface to the OpenMS Mass-Spectrometry Algorithm Library.” PROTEOMICS 14 (1): 74–77. https://doi.org/10.1002/pmic.201300246.

Rurik, Marc, Oliver Alka, Fabian Aicheler, and Oliver Kohlbacher. 2020. “Metabolomics Data Processing Using OpenMS.” In Computational Methods and Data Analysis for Metabolomics, edited by Shuzhao Li, 49–60. Methods in Molecular Biology. New York, NY: Springer US. https://doi.org/10.1007/978-1-0716-0239-3_4.

Scheltema, Richard A., Andris Jankevics, Ritsert C. Jansen, Morris A. Swertz, and Rainer Breitling. 2011. “PeakML/mzMatch: A File Format, Java Library, R Library, and Tool-Chain for Mass Spectrometry Data Analysis.” Analytical Chemistry 83 (7): 2786–93. https://doi.org/10.1021/ac2000994.

Scheubert, Kerstin, Franziska Hufsky, Daniel Petras, Mingxun Wang, Louis-Félix Nothias, Kai Dührkop, Nuno Bandeira, Pieter C. Dorrestein, and Sebastian Böcker. 2017. “Significance Estimation for Large Scale Metabolomics Annotations by Spectral Matching.” Nature Communications 8 (1): 1494. https://doi.org/10.1038/s41467-017-01318-5.

Shen, Xiaotao, Hong Yan, Chuchu Wang, Peng Gao, Caroline H. Johnson, and Michael P. Snyder. 2022. “TidyMass an Object-Oriented Reproducible Analysis Framework for LC–MS Data.” Nature Communications 13 (1): 4365. https://doi.org/10.1038/s41467-022-32155-w.

Silva, Ricardo R. da, Mingxun Wang, Louis-Félix Nothias, Justin J. J. van der Hooft, Andrés Mauricio Caraballo-Rodríguez, Evan Fox, Marcy J. Balunas, Jonathan L. Klassen, Norberto Peporine Lopes, and Pieter C. Dorrestein. 2018. “Propagating Annotations of Molecular Networks Using in Silico Fragmentation.” PLOS Computational Biology 14 (4): e1006089. https://doi.org/10.1371/journal.pcbi.1006089.

Stancliffe, Ethan, Michaela Schwaiger-Haber, Miriam Sindelar, Matthew J. Murphy, Mette Soerensen, and Gary J. Patti. 2022. “An Untargeted Metabolomics Workflow That Scales to Thousands of Samples for Population-Based Studies.” Analytical Chemistry, December. https://doi.org/10.1021/acs.analchem.2c01270.

Tautenhahn, Ralf, Kevin Cho, Winnie Uritboonthai, Zhengjiang Zhu, Gary J. Patti, and Gary Siuzdak. 2012. “An Accelerated Workflow for Untargeted Metabolomics Using the METLIN Database.” Nature Biotechnology 30 (9): 826–28. https://doi.org/10.1038/nbt.2348.

Treutler, Hendrik, Hiroshi Tsugawa, Andrea Porzel, Karin Gorzolka, Alain Tissier, Steffen Neumann, and Gerd Ulrich Balcke. 2016. “Discovering Regulated Metabolite Families in Untargeted Metabolomics Studies.” Analytical Chemistry 88 (16): 8082–90. https://doi.org/10.1021/acs.analchem.6b01569.

Tsugawa, Hiroshi, Tomas Cajka, Tobias Kind, Yan Ma, Brendan Higgins, Kazutaka Ikeda, Mitsuhiro Kanazawa, Jean VanderGheynst, Oliver Fiehn, and Masanori Arita. 2015. “MS-DIAL: Data-Independent MS/MS Deconvolution for Comprehensive Metabolome Analysis.” Nature Methods 12 (6): 523–26. https://doi.org/10.1038/nmeth.3393.

Tsugawa, Hiroshi, Tobias Kind, Ryo Nakabayashi, Daichi Yukihira, Wataru Tanaka, Tomas Cajka, Kazuki Saito, Oliver Fiehn, and Masanori Arita. 2016. “Hydrogen Rearrangement Rules: Computational MS/MS Fragmentation and Structure Elucidation Using MS-FINDER Software.” Analytical Chemistry 88 (16): 7946–58. https://doi.org/10.1021/acs.analchem.6b00770.

Uchino, Haruki, Hiroshi Tsugawa, Hidenori Takahashi, and Makoto Arita. 2022. “Computational Mass Spectrometry Accelerates C=C Position-Resolved Untargeted Lipidomics Using Oxygen Attachment Dissociation.” Communications Chemistry 5 (1): 1–13. https://doi.org/10.1038/s42004-022-00778-1.

Uppal, Karan, Quinlyn A. Soltow, Frederick H. Strobel, W. Stephen Pittard, Kim M. Gernert, Tianwei Yu, and Dean P. Jones. 2013. “xMSanalyzer: Automated Pipeline for Improved Feature Detection and Downstream Analysis of Large-Scale, Non-Targeted Metabolomics Data.” BMC Bioinformatics 14 (1): 15. https://doi.org/10.1186/1471-2105-14-15.

Uppal, Karan, Douglas I. Walker, and Dean P. Jones. 2017. “xMSannotator: An R Package for Network-Based Annotation of High-Resolution Metabolomics Data.” Analytical Chemistry 89 (2): 1063–67. https://doi.org/10.1021/acs.analchem.6b01214.

Volikov, Alexander, Gleb Rukhovich, and Irina V. Perminova. 2023. “NOMspectra: An Open-Source Python Package for Processing High Resolution Mass Spectrometry Data on Natural Organic Matter.” Journal of the American Society for Mass Spectrometry, June. https://doi.org/10.1021/jasms.3c00003.

Wang, Mingxun, Jeremy J. Carver, Vanessa V. Phelan, Laura M. Sanchez, Neha Garg, Yao Peng, Don Duy Nguyen, et al. 2016. “Sharing and Community Curation of Mass Spectrometry Data with Global Natural Products Social Molecular Networking.” Nature Biotechnology 34 (8): 828–37. https://doi.org/10.1038/nbt.3597.

Wang, Yang, Fang Liu, Peng Li, Chengwei He, Ruibing Wang, Huanxing Su, and Jian-Bo Wan. 2016. “An Improved Pseudotargeted Metabolomics Approach Using Multiple Ion Monitoring with Time-Staggered Ion Lists Based on Ultra-High Performance Liquid Chromatography/Quadrupole Time-of-Flight Mass Spectrometry.” Analytica Chimica Acta 927 (July): 82–88. https://doi.org/10.1016/j.aca.2016.05.008.

Weber, Ralf J. M., Thomas N. Lawson, Reza M. Salek, Timothy M. D. Ebbels, Robert C. Glen, Royston Goodacre, Julian L. Griffin, et al. 2017. “Computational Tools and Workflows in Metabolomics: An International Survey Highlights the Opportunity for Harmonisation Through Galaxy.” Metabolomics 13 (2). https://doi.org/10.1007/s11306-016-1147-x.

Wen, Bo, Zhanlong Mei, Chunwei Zeng, and Siqi Liu. 2017. “metaX: A Flexible and Comprehensive Software for Processing Metabolomics Data.” BMC Bioinformatics 18 (March): 183. https://doi.org/10.1186/s12859-017-1579-y.

Xue, Jingchuan, Carlos Guijas, H. Paul Benton, Benedikt Warth, and Gary Siuzdak. 2020. “METLIN MS2 Molecular Standards Database: A Broad Chemical and Biological Resource.” Nature Methods 17 (10): 953–54. https://doi.org/10.1038/s41592-020-0942-5.

Yu, Miao, Georgia Dolios, and Lauren Petrick. 2022. “Reproducible Untargeted Metabolomics Workflow for Exhaustive MS2 Data Acquisition of MS1 Features.” Journal of Cheminformatics 14 (1): 6. https://doi.org/10.1186/s13321-022-00586-8.

Yu, Miao, Sofia Lendor, Anna Roszkowska, Mariola Olkowicz, Leslie Bragg, Mark Servos, and Janusz Pawliszyn. 2020. “Metabolic Profile of Fish Muscle Tissue Changes with Sampling Method, Storage Strategy and Time.” Analytica Chimica Acta 1136 (November): 42–50. https://doi.org/10.1016/j.aca.2020.08.050.

Yu, Miao, Mariola Olkowicz, and Janusz Pawliszyn. 2019. “Structure/Reaction Directed Analysis for LC-MS Based Untargeted Analysis.” Analytica Chimica Acta 1050 (March): 16–24. https://doi.org/10.1016/j.aca.2018.10.062.

Yu, Tianwei, Youngja Park, Jennifer M. Johnson, and Dean P. Jones. 2009. “apLCMS—Adaptive Processing of High-Resolution LC/MS Data.” Bioinformatics 25 (15): 1930–36. https://doi.org/10.1093/bioinformatics/btp291.

Yu, Yong-Jie, Qing-Xia Zheng, Yue-Ming Zhang, Qian Zhang, Yu-Ying Zhang, Ping-Ping Liu, Peng Lu, et al. 2019. “Automatic Data Analysis Workflow for Ultra-High Performance Liquid Chromatography-High Resolution Mass Spectrometry-Based Metabolomics.” Journal of Chromatography A 1585 (January): 172–81. https://doi.org/10.1016/j.chroma.2018.11.070.

Zhang, Yu-Ying, Qian Zhang, Yue-Ming Zhang, Wei-Wei Wang, Li Zhang, Yong-Jie Yu, Chang-Cai Bai, Ji-Zhao Guo, Hai-Yan Fu, and Yuanbin She. 2020. “A Comprehensive Automatic Data Analysis Strategy for Gas Chromatography-Mass Spectrometry Based Untargeted Metabolomics.” Journal of Chromatography A 1616 (April): 460787. https://doi.org/10.1016/j.chroma.2019.460787.

Zheng, Fujian, Lei You, Wangshu Qin, Runze Ouyang, Wangjie Lv, Lei Guo, Xin Lu, Enyou Li, Xinjie Zhao, and Guowang Xu. 2022. “MetEx: A Targeted Extraction Strategy for Improving the Coverage and Accuracy of Metabolite Annotation in Liquid Chromatography–High-Resolution Mass Spectrometry Data.” Analytical Chemistry 94 (24): 8561–69. https://doi.org/10.1021/acs.analchem.1c04783.

Zheng, Fujian, Xinjie Zhao, Zhongda Zeng, Lichao Wang, Wangjie Lv, Qingqing Wang, and Guowang Xu. 2020. “Development of a Plasma Pseudotargeted Metabolomics Method Based on Ultra-High-Performance Liquid Chromatography–Mass Spectrometry.” Nature Protocols 15 (8): 2519–37. https://doi.org/10.1038/s41596-020-0341-5.