Chapter 5 Workflow

This chapter focuses on practical workflow choices for metabolomics data analysis, from preprocessing platforms to project organization and data sharing (Li 2020).

DiagrammeR::mermaid("
flowchart TB
I(peak-picking) --> C
C(visualization) --> D(normalization/batch correction)
D --> A(annotation/identification)
A --> H(statistical analysis)
C --> A --> B(omics analysis)
D --> H
B --> H
H --> E(experimental validation)
A --> E
H --> A
B --> E
C --> H
")

5.1 Platform for metabolomics data analysis

Many open-source metabolomics projects are available, and a useful overview can be found here.

5.1.2 XCMS & XCMS online

XCMS online is hosted by Scripps Research. If your datasets are not large and you want a web-based workflow, XCMS online is still one of the most accessible starting points. It uses METLIN and isoMETLIN to annotate MS/MS data, and it also supports pathway analysis. This is a reasonable option for teaching, pilot studies, or users who are not ready to script their workflow locally.

xcms is different from XCMS online although they share some conceptual background. For local metabolomics data analysis, xcms remains one of the most flexible and reproducible options, especially for users who are comfortable with R. A practical default workflow is msconvert -> IPO or AutoTuner -> xcms -> annotation tools -> MetaboAnalyst or R-based downstream analysis. If you want full scripting, parameter tracking, and scalability, this is still one of the strongest starting points. If you are not familiar with R, the learning curve is real and a GUI-centered platform may be easier.
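The xcms step of the local workflow above can be sketched in R. This is a minimal sketch under stated assumptions, not a recommended configuration: it assumes the raw files were already converted to mzML (for example with msconvert), that xcms and MSnbase are installed from Bioconductor, and that the files in the folder sort in the same order as the supplied group labels. The function name `run_xcms_sketch` is hypothetical, and every parameter value shown (`ppm`, `peakwidth`, `binSize`, `bw`, `minFraction`) is a placeholder to be replaced by values tuned on your own QC files, for instance with IPO or AutoTuner.

```r
# Minimal sketch of an xcms preprocessing run (xcms >= 3 style interface).
# All parameter values are placeholders, not tuned recommendations.
run_xcms_sketch <- function(mzml_dir, sample_groups) {
  # Fail early with a clear message if the Bioconductor packages are missing
  for (pkg in c("xcms", "MSnbase")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package '", pkg, "' is required; install it from Bioconductor.")
    }
  }

  # Files are returned in alphabetical order; sample_groups must match it
  files <- list.files(mzml_dir, pattern = "\\.mzML$", full.names = TRUE)
  stopifnot(length(files) > 0, length(files) == length(sample_groups))

  # Read the data without loading all spectra into memory
  raw <- MSnbase::readMSData(files, mode = "onDisk")

  # 1. Chromatographic peak detection with centWave
  cwp <- xcms::CentWaveParam(ppm = 15, peakwidth = c(5, 30))
  xdata <- xcms::findChromPeaks(raw, param = cwp)

  # 2. Retention time alignment with obiwarp
  xdata <- xcms::adjustRtime(xdata, param = xcms::ObiwarpParam(binSize = 0.6))

  # 3. Correspondence: group peaks across samples into features
  pdp <- xcms::PeakDensityParam(sampleGroups = sample_groups,
                                bw = 5, minFraction = 0.5)
  xdata <- xcms::groupChromPeaks(xdata, param = pdp)

  # 4. Fill in missing peaks, then return the feature-by-sample intensity table
  xdata <- xcms::fillChromPeaks(xdata)
  xcms::featureValues(xdata, value = "into")
}
```

Wrapping the steps in one function keeps the parameters in a single place, which makes it easier to record them alongside the results. The returned matrix can then be passed to annotation tools or to downstream statistics.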

IPO is a tool for automated optimization of xcms parameters (Libiseller et al. 2015), and Warpgroup is used for chromatogram subregion detection, consensus integration bound determination, and accurate missing value integration (Mahieu, Spalding, and Patti 2016). A case study comparing different xcms parameters with IPO is available for GC-MS data (Dos Santos and Canuto 2023). Another option is AutoTuner, which is much faster than IPO (McLean and Kujawinski 2020). In practice, parameter optimization is most useful when you have representative QC files and enough time to test settings. It is not necessary for every small project, but default settings should not be treated as universally safe.

See these papers for XCMS-based workflows (Forsberg et al. 2018; Huan et al. 2017; Mahieu, Spalding, Gelman, et al. 2016; Montenegro-Burke et al. 2017; Domingo-Almenara and Siuzdak 2020; Stancliffe et al. 2022). For METLIN-related annotation, see these papers (Guijas et al. 2018; Tautenhahn et al. 2012; Xue, Guijas, et al. 2020; Domingo-Almenara, Montenegro-Burke, Ivanisevic, et al. 2018).

MAIT is built on xcms, and its source code is publicly available (Fernández-Albert et al. 2014).

iMet-Q is an automated tool with a user-friendly interface for quantifying metabolites in full-scan liquid chromatography-mass spectrometry (LC-MS) data (Chang et al. 2016).

compMS2Miner is an automatable metabolite identification, visualization, and data-sharing R package for high-resolution LC–MS data sets. See the related papers (Edmands et al. 2017; Edmands et al. 2018, 2015).

mzMatch is a modular, open-source, and platform-independent data processing pipeline for metabolomics LC/MS data written in Java, which can be coupled with xcms (Scheltema et al. 2011; Creek et al. 2012). It can also be used for annotation with MetAssign (Daly et al. 2014).

5.1.3 PRIMe

PRIMe is from RIKEN and UC Davis. Its databases are updated frequently (Tsugawa et al. 2016). You can use MS-DIAL for untargeted analysis and MRMPROBS for targeted analysis. For annotation, the developers provide MS-FINDER, along with Excel-based statistical tools. This platform is especially strong for MS/MS-rich workflows, lipidomics, and users who want a mature GUI. In my view, MS-DIAL is one of the best first choices for users who want serious untargeted analysis without committing to an R-based workflow from the start. The main limitation is that pathway analysis is not the center of this ecosystem, so downstream interpretation may still move to other tools.

MS-DIAL 4 added support for lipidomics with an integrated CCS and retention time atlas (Tsugawa et al. 2020). The latest version, MS-DIAL 5, further extends the platform with multimodal mass spectrometry data mining capabilities, including improved DIA deconvolution (Tsugawa et al. 2024).

For PRIMe-based workflows, see these papers (Lai et al. 2018; Matsuo et al. 2017; Treutler et al. 2016; Tsugawa et al. 2015; Tsugawa et al. 2016; Kind et al. 2018). There are also extensions of the workflow (Uchino et al. 2022) and a workflow for environmental science (Bonnefille et al. 2023).

5.1.4 GNPS

GNPS is an open-access knowledge base for community-wide organization and sharing of raw, processed, or identified tandem mass spectrometry (MS/MS) data. It is not a full replacement for primary preprocessing software, but it is one of the most useful platforms for MS/MS-centered annotation, feature-based molecular networking, and community data sharing. Feature-based molecular networking within GNPS can be coupled with xcms, OpenMS, MS-DIAL, MZmine, and other popular software. If your study relies heavily on tandem MS interpretation, GNPS should be considered early rather than added only at the end.

See these papers for GNPS and related projects (Aron et al. 2020; Nothias et al. 2020; Scheubert et al. 2017; Silva et al. 2018; Wang et al. 2016; Bittremieux et al. 2023; Schmid et al. 2021).

5.1.5 OpenMS & SIRIUS

OpenMS is another good platform for mass spectrometry data analysis, written in C++. It can also be used as a KNIME plugin. OpenMS is a strong option when transparency of workflow steps, interoperability, and scalable processing are more important than a minimal learning curve. TOPPView is also one of the better tools for visualizing MS data. If you want a workflow that is explicit, modular, and suitable for engineering-style data pipelines, OpenMS is a good choice.

See these papers for OpenMS-based workflows (Bertsch et al. 2011; Pfeuffer et al. 2017, 2024; Röst et al. 2014, 2016; Rurik et al. 2020; Alka et al. 2020).

OpenMS can be coupled to SIRIUS for annotation. SIRIUS is a software framework for de novo identification of metabolites using single and tandem mass spectrometry. It integrates tools such as CSI:FingerID, ZODIAC, and CANOPUS. If your project emphasizes formula assignment, structural class prediction, and in silico annotation, SIRIUS is one of the most important downstream tools to learn.

5.1.6 MZmine

MZmine was originally developed on the Java platform. In 2023, MZmine 3 was released with a completely rewritten architecture, adding support for ion mobility spectrometry data, improved feature detection, and native integration with GNPS molecular networking and SIRIUS (Schmid et al. 2023). MZmine 3 is now one of the most actively maintained open-source platforms for untargeted metabolomics. If you want a GUI workflow with strong modern integration to annotation tools, MZmine is one of the best current choices. Like MS-DIAL, it usually needs to be paired with other tools for pathway analysis.

See these papers for MZmine-based workflows (Pluskal et al. 2010; Pluskal et al. 2020; Schmid et al. 2023).

5.1.7 Emory MaHPIC

This platform consists of several tools from Emory University, most of them R packages: apLCMS for peak detection and preprocessing, xMSanalyzer for automated pipelines for large-scale, non-targeted metabolomics data, xMSannotator for annotation of LC-MS data, and Mummichog (written in Python) for pathway and network analysis of high-throughput metabolomics data. Note that the original Mummichog is no longer actively maintained; its algorithm is now integrated into MetaboAnalyst (Pang et al. 2024). This platform is likely to appeal to environmental scientists studying the exposome.

See these papers for the Emory workflow (Uppal et al. 2013, 2017; Yu et al. 2009; S. Li et al. 2013; Liu et al. 2020).

5.1.8 Others

5.1.9 Workflow Comparison

Several studies compare different workflows, and their results can guide your selection (Myers et al. 2017; Weber et al. 2017; Li et al. 2018; Liao et al. 2023).

xcmsrocker is a Docker image for metabolomics that provides templates for comparing R-based software (Yu et al. 2022).

5.1.10 A simple opinionated choice guide

If a short list is needed instead of a long catalog:

  • For reproducible R-based untargeted metabolomics: xcms and related R tools

  • For GUI-centered untargeted workflows with strong MS/MS support: MS-DIAL or MZmine

  • For MS/MS networking and community annotation: GNPS

  • For in silico structural annotation: SIRIUS

  • For downstream statistics and pathway analysis: MetaboAnalyst or scripted analysis in R

This is enough for many real projects. The best workflow is usually not the one with the most software, but the one where every step is documented, reproducible, and appropriate for the study objective.

5.2 Project Setup

I suggest building your data analysis projects in RStudio (click File - New project - New Directory - Empty project), and then giving the project a name. I also recommend the following practices if you are comfortable with these tools.

  • Use Git/GitHub for version control of your code and to sync your project online.

  • Don’t use your own name as the project name, because other people might collaborate with you and someone might check your data when you publish your papers. Each project should correspond to one paper or one chapter of your thesis.

  • Keep a workflow document (txt or doc) in your project to record all of the steps and code you performed for this project. Treat this document as the digital version of your lab notebook.

  • Use a data folder in your project folder for the raw data and the results of your data analysis.

  • Use a figure folder in your project folder for figures.

  • Use a manuscript folder in your project folder for the manuscript (you can write the paper in RStudio with the help of an R Markdown template).

  • Double-click [yourprojectname].Rproj to open your project.
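The folder layout in the list above can also be created from the R console. The following is a minimal sketch that uses only base R; the function name `create_project_skeleton` and the log file name `workflow.txt` are illustrative choices, not part of any package. It does not create the .Rproj file, which RStudio generates when you set up the project.

```r
# Create the project skeleton described above. Folder names follow the text;
# the function name and the workflow file name are illustrative choices.
create_project_skeleton <- function(project_dir) {
  folders <- file.path(project_dir, c("data", "figure", "manuscript"))
  for (folder in folders) {
    # recursive = TRUE also creates project_dir itself if it is missing
    dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  }
  # Plain-text workflow log that acts as the digital lab notebook;
  # an existing log is never overwritten
  log_file <- file.path(project_dir, "workflow.txt")
  if (!file.exists(log_file)) {
    writeLines(paste("Workflow log started on", format(Sys.Date())), log_file)
  }
  invisible(c(folders, log_file))
}

# Example: build the skeleton in a temporary directory and list its contents
demo_dir <- file.path(tempdir(), "demo_project")
create_project_skeleton(demo_dir)
list.files(demo_dir)
```

Running the function a second time is harmless, so it can sit at the top of a setup script that collaborators run after cloning the repository.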

5.3 Data Standards and Metadata

Reproducible metabolomics research requires not only sharing raw data but also providing well-structured metadata that describe how the data were generated and processed. The FAIR principles (Findable, Accessible, Interoperable, Reusable) provide a general framework for scientific data management (Wilkinson et al. 2016), and their adoption in metabolomics is critical for cross-study comparisons and meta-analyses.

5.3.1 Metadata Standards

The Investigation-Study-Assay (ISA) framework is the most widely adopted metadata standard in metabolomics (Sansone et al. 2012). ISA-Tab provides a structured format to describe the experimental design (Investigation), the samples and their biological context (Study), and the analytical measurements (Assay). For metabolomics specifically, mwTab is the format used by the Metabolomics Workbench (Sud et al. 2016).

When submitting data to public repositories, the minimum reporting standards proposed by the Metabolomics Standards Initiative (MSI) should be followed (Salek et al. 2013). These standards cover the biological context, chemical analysis, data processing, and statistical analysis metadata. In practice, compliance with these minimum reporting standards remains a challenge in the community (R. A. Spicer et al. 2017).

5.3.2 FAIR in Metabolomics

FAIR principles (Wilkinson et al. 2016) have been increasingly adopted in metabolomics workflows. The following practices help make metabolomics data more FAIR (Rocca-Serra et al. 2016):

  • Use standard structural identifiers (e.g., InChI, SMILES) for compounds and persistent identifiers such as DOIs for datasets

  • Deposit raw data in open formats (mzML) to public repositories (MetaboLights (Haug et al. 2020), Metabolomics Workbench (Sud et al. 2016))

  • Document the complete analytical and computational workflow with version-controlled parameters

  • Use controlled vocabularies and ontologies (e.g., Chemical Entities of Biological Interest, ChEBI) for annotation

The metaRbolomics initiative provides an overview of R-based tools that support FAIR-compliant metabolomics workflows (Stanstrup et al. 2019). For quality assurance and quality control standards in practice, check the mQACC consortium guidelines (O’Brien et al. 2024).

5.3.3 Practical Recommendations

For new metabolomics practitioners, the following checklist can help improve data quality and reproducibility (Rampler et al. 2021; Broadhurst et al. 2018a):

  • Record all instrument parameters, column information and mobile phase composition in a machine-readable format

  • Include pooled QC and blank samples in every analytical batch and document their preparation

  • Convert vendor-specific raw files to open formats (mzML via ProteoWizard) immediately after acquisition

  • Use standardized file naming conventions that encode sample metadata (group, batch, injection order)

  • Track all data processing parameters (software version, peak picking thresholds, alignment settings) in a reproducible script or workflow file
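The last two items of the checklist can be illustrated with a short base R sketch. The naming pattern used here (`<group>_B<batch>_R<injection order>.mzML`) is only one hypothetical convention, and the function names and the record file name are assumptions made for this example. Adapt the pattern to your own study design.

```r
# Build and parse file names of the hypothetical form
#   <group>_B<batch>_R<injection order>.mzML, e.g. "QC_B01_R004.mzML"
make_sample_name <- function(group, batch, order) {
  stopifnot(!grepl("_", group, fixed = TRUE))  # "_" is reserved as separator
  sprintf("%s_B%02d_R%03d.mzML", group, as.integer(batch), as.integer(order))
}

parse_sample_name <- function(file_name) {
  pattern <- "^([^_]+)_B([0-9]+)_R([0-9]+)\\.mzML$"
  base <- basename(file_name)
  ok <- grepl(pattern, base)
  if (!all(ok)) {
    stop("Unexpected file name(s): ", paste(base[!ok], collapse = ", "))
  }
  # Recover the sample metadata from the three captured fields
  data.frame(
    file = base,
    group = sub(pattern, "\\1", base),
    batch = as.integer(sub(pattern, "\\2", base)),
    injection_order = as.integer(sub(pattern, "\\3", base)),
    stringsAsFactors = FALSE
  )
}

# Save processing parameters together with the R version and the date
write_processing_record <- function(params, path) {
  record <- c(list(r_version = R.version.string,
                   recorded_on = format(Sys.Date())),
              params)
  dput(record, file = path)  # plain-text R syntax, readable back with dget()
  invisible(record)
}

# Example: two file names, their parsed metadata, and a parameter record
files <- c(make_sample_name("control", 1, 3), make_sample_name("QC", 1, 4))
parse_sample_name(files)

record_file <- file.path(tempdir(), "processing_record.R")
write_processing_record(list(ppm = 15, peakwidth = c(5, 30)), record_file)
dget(record_file)$ppm
```

Because the record is plain text, it can be committed to version control next to the analysis script. In a real project you would probably also store the versions of the processing packages, for example with `packageVersion()`.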

5.4 Data sharing

For an overview of open data management in metabolomics, see Haug et al. (2017). The main repositories and portals are:

  • MetaboLights is a major general-purpose international repository and a good default choice for many studies.

  • The Metabolomics Workbench is another major repository with strong adoption, especially in the United States, and uses the mwTab ecosystem.

  • MetaboBank is a useful repository in Japan and Asia-Pacific contexts.

  • MetabolomeXchange is best treated as a discovery portal rather than the primary home for your submission.

  • MetabolomeExpress is a public platform for processing, interpreting, and sharing GC/MS metabolomics datasets (Carroll et al. 2010).

In practice, the decision can be simple:

  • choose MetaboLights if you want a broadly recognized default repository with strong international visibility

  • choose Metabolomics Workbench if your community, funder, journal, or collaborators already work within that ecosystem

  • use MetaboBank when it best matches your regional infrastructure or collaboration network

Whichever repository you choose, the important point is to deposit raw data in open formats when possible, include metadata that satisfy MSI-style minimum reporting, and provide enough information for another group to reproduce the computational workflow.

5.5 Contest

References

Alka, Oliver, Timo Sachsenberg, Leon Bichmann, et al. 2020. “Chapter 6: OpenMS and KNIME for Mass Spectrometry Data Processing.” In Processing Metabolomics and Proteomics Data with Open Software. https://doi.org/10.1039/9781788019880-00201.
Aron, Allegra T., Emily C. Gentry, Kerry L. McPhail, et al. 2020. “Reproducible Molecular Networking of Untargeted Mass Spectrometry Data Using GNPS.” Nature Protocols 15 (6): 1954–91. https://doi.org/10.1038/s41596-020-0317-5.
Bai, Caihong, Suyun Xu, Jingyi Tang, Yuxi Zhang, Jiahui Yang, and Kaifeng Hu. 2022. “A ‘Shape-Orientated’ Algorithm Employing an Adapted Marr Wavelet and Shape Matching Index Improves the Performance of Continuous Wavelet Transform for Chromatographic Peak Detection and Quantification.” Journal of Chromatography A 1673 (June): 463086. https://doi.org/10.1016/j.chroma.2022.463086.
Baygi, Sadjad Fakouri, Sanjay K. Banerjee, Praloy Chakraborty, Yashwant Kumar, and Dinesh Kumar Barupal. 2022. “IDSL.UFA Assigns High-Confidence Molecular Formula Annotations for Untargeted LC/HRMS Data Sets in Metabolomics and Exposomics.” Analytical Chemistry 94 (39): 13315–22. https://doi.org/10.1021/acs.analchem.2c00563.
Bertsch, Andreas, Clemens Gröpl, Knut Reinert, and Oliver Kohlbacher. 2011. “OpenMS and TOPP: Open Source Software for LC-MS Data Analysis.” In Data Mining in Proteomics: From Standards to Applications, edited by Michael Hamacher, Martin Eisenacher, and Christian Stephan. Methods in Molecular Biology. Humana Press. https://doi.org/10.1007/978-1-60761-987-1_23.
Bittremieux, Wout, Nicole E. Avalon, Sydney P. Thomas, et al. 2023. “Open Access Repository-Scale Propagated Nearest Neighbor Suspect Spectral Library for Untargeted Metabolomics.” Nature Communications 14 (1): 8488. https://doi.org/10.1038/s41467-023-44035-y.
Blaženović, Ivana, Tobias Kind, Hrvoje Torbašinović, et al. 2017. “Comprehensive Comparison of in Silico MS/MS Fragmentation Tools of the CASMI Contest: Database Boosting Is Needed to Achieve 93% Accuracy.” Journal of Cheminformatics 9 (1): 32. https://doi.org/10.1186/s13321-017-0219-x.
Bonnefille, Bénilde, Oskar Karlsson, May Britt Rian, et al. 2023. “Nontarget Analysis of Polluted Surface Waters in Bangladesh Using Open Science Workflows.” Environmental Science & Technology, ahead of print, April. https://doi.org/10.1021/acs.est.2c08200.
Broadhurst, David, Royston Goodacre, Stacey N Reinke, et al. 2018a. “Guidelines and Considerations for the Use of System Suitability and Quality Control Samples in Mass Spectrometry Assays Applied in Untargeted Clinical Metabolomic Studies.” Metabolomics 14: 72. https://doi.org/10.1007/s11306-018-1367-3.
Carroll, Adam J., Murray R. Badger, and A. Harvey Millar. 2010. “The MetabolomeExpress Project: Enabling Web-Based Processing, Analysis and Transparent Dissemination of GC/MS Metabolomics Datasets.” BMC Bioinformatics 11 (1): 376. https://doi.org/10.1186/1471-2105-11-376.
Chang, Hui-Yin, Ching-Tai Chen, T. Mamie Lih, et al. 2016. “iMet-Q: A User-Friendly Tool for Label-Free Metabolomics Quantitation Using Dynamic Peak-Width Determination.” PLOS ONE 11 (1): e0146112. https://doi.org/10.1371/journal.pone.0146112.
Clasquin, Michelle F., Eugene Melamud, and Joshua D. Rabinowitz. 2012. “LC-MS Data Processing with MAVEN: A Metabolomic Analysis and Visualization Engine.” Current Protocols in Bioinformatics 37 (1): 14.11.1–23. https://doi.org/10.1002/0471250953.bi1411s37.
Colby, Sean M., Christine H. Chang, Jessica L. Bade, et al. 2022. “DEIMoS: An Open-Source Tool for Processing High-Dimensional Mass Spectrometry Data.” Analytical Chemistry 94 (16): 6130–38. https://doi.org/10.1021/acs.analchem.1c05017.
Creek, Darren J., Andris Jankevics, Karl E. V. Burgess, Rainer Breitling, and Michael P. Barrett. 2012. “IDEOM: An Excel Interface for Analysis of LC–MS-based Metabolomics Data.” Bioinformatics 28 (7): 1048–49. https://doi.org/10.1093/bioinformatics/bts069.
Daly, Rónán, Simon Rogers, Joe Wandy, Andris Jankevics, Karl E. V. Burgess, and Rainer Breitling. 2014. “MetAssign: Probabilistic Annotation of Metabolites from LC–MS Data Using a Bayesian Clustering Approach.” Bioinformatics 30 (19): 2764–71. https://doi.org/10.1093/bioinformatics/btu370.
Delabriere, Alexis, Philipp Warmer, Vincenth Brennsteiner, and Nicola Zamboni. 2021. “SLAW: A Scalable and Self-Optimizing Processing Workflow for Untargeted LC-MS.” Analytical Chemistry 93 (45): 15024–32. https://doi.org/10.1021/acs.analchem.1c02687.
Domingo-Almenara, Xavier, J. Rafael Montenegro-Burke, Julijana Ivanisevic, et al. 2018. “XCMS-MRM and METLIN-MRM: A Cloud Library and Public Resource for Targeted Analysis of Small Molecules.” Nature Methods 15 (9): 681–84. https://doi.org/10.1038/s41592-018-0110-3.
Domingo-Almenara, Xavier, and Gary Siuzdak. 2020. “Metabolomics Data Processing Using XCMS.” In Computational Methods and Data Analysis for Metabolomics, edited by Shuzhao Li. Methods in Molecular Biology. Springer US. https://doi.org/10.1007/978-1-0716-0239-3_2.
Dos Santos, Emile Kelly Porto, and Gisele André Baptista Canuto. 2023. “Optimizing XCMS Parameters for GC-MS Metabolomics Data Processing: A Case Study.” Metabolomics: Official Journal of the Metabolomic Society 19 (4): 26. https://doi.org/10.1007/s11306-023-01992-1.
Edmands, William M. B., Dinesh K. Barupal, and Augustin Scalbert. 2015. “MetMSLine: An Automated and Fully Integrated Pipeline for Rapid Processing of High-Resolution LC–MS Metabolomic Datasets.” Bioinformatics 31 (5): 788–90. https://doi.org/10.1093/bioinformatics/btu705.
Edmands, William M. B., Josie Hayes, and Stephen M. Rappaport. 2018. “SimExTargId: A Comprehensive Package for Real-Time LC-MS Data Acquisition and Analysis.” Bioinformatics 34 (20): 3589–90. https://doi.org/10.1093/bioinformatics/bty218.
Edmands, William M. B., Lauren Petrick, Dinesh K. Barupal, et al. 2017. “compMS2Miner: An Automatable Metabolite Identification, Visualization, and Data-Sharing R Package for High-Resolution LC–MS Data Sets.” Analytical Chemistry 89 (7): 3919–28. https://doi.org/10.1021/acs.analchem.6b02394.
Eilertz, Daniel, Michael Mitterer, and Joerg M. Buescher. 2022. “automRm: An R Package for Fully Automatic LC-QQQ-MS Data Preprocessing Powered by Machine Learning.” Analytical Chemistry 94 (16): 6163–71. https://doi.org/10.1021/acs.analchem.1c05224.
Fernández-Albert, Francesc, Rafael Llorach, Cristina Andrés-Lacueva, and Alexandre Perera. 2014. “An R Package to Analyse LC/MS Metabolomic Data: MAIT (Metabolite Automatic Identification Toolkit).” Bioinformatics 30 (13): 1937–39. https://doi.org/10.1093/bioinformatics/btu136.
Forsberg, Erica M., Tao Huan, Duane Rinehart, et al. 2018. “Data Processing, Multi-Omic Pathway Mapping, and Metabolite Activity Analysis Using XCMS Online.” Nature Protocols 13 (4): 633–51. https://doi.org/10.1038/nprot.2017.151.
Giacomoni, Franck, Gildas Le Corguillé, Misharl Monsoor, et al. 2015. “Workflow4Metabolomics: A Collaborative Research Infrastructure for Computational Metabolomics.” Bioinformatics 31 (9): 1493–95. https://doi.org/10.1093/bioinformatics/btu813.
Goracci, Laura, Paolo Tiberi, Stefano Di Bona, et al. 2024. “MARS: A Multipurpose Software for Untargeted LC–MS-Based Metabolomics and Exposomics.” Analytical Chemistry, ahead of print, January. https://doi.org/10.1021/acs.analchem.3c03620.
Guijas, Carlos, J. Rafael Montenegro-Burke, Xavier Domingo-Almenara, et al. 2018. “METLIN: A Technology Platform for Identifying Knowns and Unknowns.” Analytical Chemistry 90 (5): 3156–64. https://doi.org/10.1021/acs.analchem.7b04424.
Habra, Hani, Maureen Kachman, Kevin Bullock, Clary Clish, Charles R. Evans, and Alla Karnovsky. 2021. “metabCombiner: Paired Untargeted LC-HRMS Metabolomics Feature Matching and Concatenation of Disparately Acquired Data Sets.” Analytical Chemistry 93 (12): 5028–36. https://doi.org/10.1021/acs.analchem.0c03693.
Haug, Kenneth, Keeva Cochrane, Venkata Chandrasekhar Nainala, et al. 2020. “MetaboLights: A Resource Evolving in Response to the Needs of Its Scientific Community.” Nucleic Acids Research 48 (D1): D440–44. https://doi.org/10.1093/nar/gkz1019.
Haug, Kenneth, Reza M Salek, and Christoph Steinbeck. 2017. “Global Open Data Management in Metabolomics.” Current Opinion in Chemical Biology, Omics, vol. 36 (February): 58–63. https://doi.org/10.1016/j.cbpa.2016.12.024.
Helmus, Rick, Thomas L. ter Laak, Annemarie P. van Wezel, Pim de Voogt, and Emma L. Schymanski. 2021. “patRoon: Open Source Software Platform for Environmental Mass Spectrometry Based Non-Target Screening.” Journal of Cheminformatics 13 (1): 1. https://doi.org/10.1186/s13321-020-00477-w.
Hiller, Karsten, Jasper Hangebrauk, Christian Jäger, Jana Spura, Kerstin Schreiber, and Dietmar Schomburg. 2009. “MetaboliteDetector: Comprehensive Analysis Tool for Targeted and Nontargeted GC/MS Based Metabolome Analysis.” Analytical Chemistry 81 (9): 3429–39. https://doi.org/10.1021/ac802689c.
Huan, Tao, Erica M. Forsberg, Duane Rinehart, et al. 2017. “Systems Biology Guided by XCMS Online Metabolomics.” Nature Methods 14 (5): 461–62. https://doi.org/10.1038/nmeth.4260.
Jalili, Vahid, Enis Afgan, Qiang Gu, et al. 2020. “The Galaxy Platform for Accessible, Reproducible and Collaborative Biomedical Analyses: 2020 Update.” Nucleic Acids Research 48 (W1): W395–402. https://doi.org/10.1093/nar/gkaa434.
Kew, William, John W. T. Blackburn, David J. Clarke, and Dušan Uhrín. 2017. “Interactive van Krevelen Diagrams – Advanced Visualisation of Mass Spectrometry Data of Complex Mixtures.” Rapid Communications in Mass Spectrometry 31 (7): 658–62. https://doi.org/10.1002/rcm.7823.
Kind, Tobias, Hiroshi Tsugawa, Tomas Cajka, et al. 2018. “Identification of Small Molecules Using Accurate Mass MS/MS Search.” Mass Spectrometry Reviews 37 (4): 513–32. https://doi.org/10.1002/mas.21535.
Lai, Zijuan, Hiroshi Tsugawa, Gert Wohlgemuth, et al. 2018. “Identifying Metabolites by Integrating Metabolome Databases with Mass Spectrometry Cheminformatics.” Nature Methods 15 (1): 53–56. https://doi.org/10.1038/nmeth.4512.
Li, Shuzhao. 2020. Computational Methods and Data Analysis for Metabolomics. Springer.
Li, Shuzhao, Youngja Park, Sai Duraisingham, et al. 2013. “Predicting Network Activity from High Throughput Metabolomics.” PLOS Computational Biology 9 (7): e1003123. https://doi.org/10.1371/journal.pcbi.1003123.
Li, Shuzhao, Amnah Siddiqa, Maheshwor Thapa, Yuanye Chi, and Shujian Zheng. 2023. “Trackable and Scalable LC-MS Metabolomics Data Processing Using Asari.” Nature Communications 14 (1): 4113. https://doi.org/10.1038/s41467-023-39889-1.
Li, Zhucui, Yan Lu, Yufeng Guo, Haijie Cao, Qinhong Wang, and Wenqing Shui. 2018. “Comprehensive Evaluation of Untargeted Metabolomics Data Processing Software in Feature Detection, Quantification and Discriminating Marker Selection.” Analytica Chimica Acta 1029 (October): 50–57. https://doi.org/10.1016/j.aca.2018.05.001.
Liao, Jingyu, Yuhao Zhang, Wendan Zhang, et al. 2023. “Different Software Processing Affects the Peak Picking and Metabolic Pathway Recognition of Metabolomics Data.” Journal of Chromatography A 1687 (January): 463700. https://doi.org/10.1016/j.chroma.2022.463700.
Libiseller, Gunnar, Michaela Dvorzak, Ulrike Kleb, et al. 2015. “IPO: A Tool for Automated Optimization of XCMS Parameters.” BMC Bioinformatics 16 (April): 118. https://doi.org/10.1186/s12859-015-0562-8.
Liu, Qin, Douglas Walker, Karan Uppal, et al. 2020. “Addressing the Batch Effect Issue for LC/MS Metabolomics Data in Data Preprocessing.” Scientific Reports 10 (1): 13856. https://doi.org/10.1038/s41598-020-70850-0.
Liu, Youzhong, Yingjie Zhang, Tom Vennekens, et al. 2023. “MeRgeION: A Multifunctional R Pipeline for Small Molecule LC-MS/MS Data Processing, Searching, and Organizing.” Analytical Chemistry 95 (22): 8433–42. https://doi.org/10.1021/acs.analchem.2c04343.
Mahieu, Nathaniel G., Jonathan L. Spalding, Susan J. Gelman, and Gary J. Patti. 2016. “Defining and Detecting Complex Peak Relationships in Mass Spectral Data: The Mz.unity Algorithm.” Analytical Chemistry 88 (18): 9037–46. https://doi.org/10.1021/acs.analchem.6b01702.
Mahieu, Nathaniel G., Jonathan L. Spalding, and Gary J. Patti. 2016. “Warpgroup: Increased Precision of Metabolomic Data Processing by Consensus Integration Bound Analysis.” Bioinformatics 32 (2): 268–75. https://doi.org/10.1093/bioinformatics/btv564.
Matsuo, Teruko, Hiroshi Tsugawa, Hiromi Miyagawa, and Eiichiro Fukusaki. 2017. “Integrated Strategy for Unknown EIMS Identification Using Quality Control Calibration Curve, Multivariate Analysis, EIMS Spectral Database, and Retention Index Prediction.” Analytical Chemistry 89 (12): 6766–73. https://doi.org/10.1021/acs.analchem.7b01010.
McLean, Craig, and Elizabeth B. Kujawinski. 2020. “AutoTuner: High Fidelity and Robust Parameter Selection for Metabolomics Data Processing.” Analytical Chemistry 92 (8): 5724–32. https://doi.org/10.1021/acs.analchem.9b04804.
Melamud, Eugene, Livia Vastag, and Joshua D. Rabinowitz. 2010. “Metabolomic Analysis and Visualization Engine for LC-MS Data.” Analytical Chemistry 82 (23): 9818–26. https://doi.org/10.1021/ac1021166.
Montenegro-Burke, J. Rafael, Aries E. Aisporna, H. Paul Benton, et al. 2017. “Data Streaming for Metabolomics: Accelerating Data Processing and Analysis from Days to Minutes.” Analytical Chemistry 89 (2): 1254–59. https://doi.org/10.1021/acs.analchem.6b03890.
Myers, Owen D., Susan J. Sumner, Shuzhao Li, Stephen Barnes, and Xiuxia Du. 2017. “Detailed Investigation and Comparison of the XCMS and MZmine 2 Chromatogram Construction and Chromatographic Peak Detection Methods for Preprocessing Mass Spectrometry Metabolomics Data.” Analytical Chemistry 89 (17): 8689–95. https://doi.org/10.1021/acs.analchem.7b01069.
Nothias, Louis-Félix, Daniel Petras, Robin Schmid, et al. 2020. “Feature-Based Molecular Networking in the GNPS Analysis Environment.” Nature Methods 17 (9): 905–8. https://doi.org/10.1038/s41592-020-0933-6.
O’Brien, Elizabeth J, Robert Ward, Padma Bhatt, et al. 2024. “Standards and Best Practices in Metabolomics: The Metabolomics Quality Assurance and Quality Control Consortium (mQACC).” Metabolomics 20: 48. https://doi.org/10.1007/s11306-024-02109-0.
Palmer, Andrew, Prasad Phapale, Ilya Chernyavsky, et al. 2017. “FDR-controlled Metabolite Annotation for High-Resolution Imaging Mass Spectrometry.” Nature Methods 14 (1): 57–60. https://doi.org/10.1038/nmeth.4072.
Pang, Zhiqiang, Lei Xu, Charles Viau, et al. 2024. “MetaboAnalystR 4.0: A Unified LC-MS Workflow for Global Metabolomics.” Nature Communications 15 (1): 3675. https://doi.org/10.1038/s41467-024-48009-6.
Pfeuffer, Julianus, Chris Bielow, Samuel Wein, et al. 2024. “OpenMS 3 Enables Reproducible Analysis of Large-Scale Mass Spectrometry Data.” Nature Methods 21 (3): 365–67. https://doi.org/10.1038/s41592-024-02197-7.
Pfeuffer, Julianus, Timo Sachsenberg, Oliver Alka, et al. 2017. “OpenMS – A Platform for Reproducible Analysis of Mass Spectrometry Data.” Journal of Biotechnology, Bioinformatics Solutions for Big Data Analysis in Life Sciences presented by the German Network for Bioinformatics Infrastructure, vol. 261 (November): 142–48. https://doi.org/10.1016/j.jbiotec.2017.05.016.
Pluskal, Tomáš, Sandra Castillo, Alejandro Villar-Briones, and Matej Orešič. 2010. “MZmine 2: Modular Framework for Processing, Visualizing, and Analyzing Mass Spectrometry-Based Molecular Profile Data.” BMC Bioinformatics 11: 395. https://doi.org/10.1186/1471-2105-11-395.
Pluskal, Tomáš, Ansgar Korf, Aleksandr Smirnov, et al. 2020. “Chapter 7: Metabolomics Data Analysis Using MZmine.” In Processing Metabolomics and Proteomics Data with Open Software. https://doi.org/10.1039/9781788019880-00232.
Plyushchenko, Ivan V., Elizaveta S. Fedorova, Natalia V. Potoldykova, Konstantin A. Polyakovskiy, Alexander I. Glukhov, and Igor A. Rodin. 2022. “Omics Untargeted Key Script: R-Based Software Toolbox for Untargeted Metabolomics with Bladder Cancer Biomarkers Discovery Case Study.” Journal of Proteome Research 21 (3): 833–47. https://doi.org/10.1021/acs.jproteome.1c00392.
Rampler, Evelyn, Yasin El Abiead, Harald Schoeny, et al. 2021. “Recurrent Topics in Mass Spectrometry-Based Metabolomics and Lipidomics—Standardization, Coverage, and Throughput.” Analytical Chemistry 93 (1): 519–45. https://doi.org/10.1021/acs.analchem.0c04698.
Riquelme, Gabriel, Nicolás Zabalegui, Pablo Marchi, Christina M. Jones, and María Eugenia Monge. 2020. “A Python-Based Pipeline for Preprocessing LC–MS Data for Untargeted Metabolomics Workflows.” Metabolites 10 (10): 416. https://doi.org/10.3390/metabo10100416.
Rocca-Serra, Philippe, Reza M Salek, Masanori Arita, et al. 2016. “Data Standards Can Boost Metabolomics Research, and If There Is a Will, There Is a Way.” Metabolomics 12: 14. https://doi.org/10.1007/s11306-015-0879-3.
Röst, Hannes L., Timo Sachsenberg, Stephan Aiche, et al. 2016. “OpenMS: A Flexible Open-Source Software Platform for Mass Spectrometry Data Analysis.” Nature Methods 13 (9): 741–48. https://doi.org/10.1038/nmeth.3959.
Röst, Hannes L., Uwe Schmitt, Ruedi Aebersold, and Lars Malmström. 2014. “pyOpenMS: A Python-based Interface to the OpenMS Mass-Spectrometry Algorithm Library.” PROTEOMICS 14 (1): 74–77. https://doi.org/10.1002/pmic.201300246.
Rurik, Marc, Oliver Alka, Fabian Aicheler, and Oliver Kohlbacher. 2020. “Metabolomics Data Processing Using OpenMS.” In Computational Methods and Data Analysis for Metabolomics, edited by Shuzhao Li. Methods in Molecular Biology. Springer US. https://doi.org/10.1007/978-1-0716-0239-3_4.
Salek, Reza M, Christoph Steinbeck, Mark R Viant, Royston Goodacre, and Warwick B Dunn. 2013. “The Role of Reporting Standards for Metabolite Annotation and Identification in Metabolomic Studies.” GigaScience 2 (1): 13. https://doi.org/10.1186/2047-217X-2-13.
Sansone, Susanna-Assunta, Philippe Rocca-Serra, Dawn Field, et al. 2012. “Toward Interoperable Bioscience Data.” Nature Genetics 44 (2): 121–26. https://doi.org/10.1038/ng.1054.
Scheltema, Richard A., Andris Jankevics, Ritsert C. Jansen, Morris A. Swertz, and Rainer Breitling. 2011. “PeakML/mzMatch: A File Format, Java Library, R Library, and Tool-Chain for Mass Spectrometry Data Analysis.” Analytical Chemistry 83 (7): 2786–93. https://doi.org/10.1021/ac2000994.
Scheubert, Kerstin, Franziska Hufsky, Daniel Petras, et al. 2017. “Significance Estimation for Large Scale Metabolomics Annotations by Spectral Matching.” Nature Communications 8 (1): 1494. https://doi.org/10.1038/s41467-017-01318-5.
Schmid, Robin, Steffen Heuckeroth, Ansgar Korf, et al. 2023. “Integrative Analysis of Multimodal Mass Spectrometry Data in MZmine 3.” Nature Biotechnology 41: 447–49. https://doi.org/10.1038/s41587-023-01690-2.
Schmid, Robin, Daniel Petras, Louis-Félix Nothias, et al. 2021. “Ion Identity Molecular Networking for Mass Spectrometry-Based Metabolomics in the GNPS Environment.” Nature Communications 12: 3832. https://doi.org/10.1038/s41467-021-23953-9.
Shen, Xiaotao, Hong Yan, Chuchu Wang, Peng Gao, Caroline H. Johnson, and Michael P. Snyder. 2022. “TidyMass an Object-Oriented Reproducible Analysis Framework for LC–MS Data.” Nature Communications 13 (1): 4365. https://doi.org/10.1038/s41467-022-32155-w.
Silva, Ricardo R. da, Mingxun Wang, Louis-Félix Nothias, et al. 2018. “Propagating Annotations of Molecular Networks Using in Silico Fragmentation.” PLOS Computational Biology 14 (4): e1006089. https://doi.org/10.1371/journal.pcbi.1006089.
Spicer, Rachel A, Reza Salek, and Christoph Steinbeck. 2017. “Compliance with Minimum Information Guidelines in Public Metabolomics Repositories.” Scientific Data 4: 170137. https://doi.org/10.1038/sdata.2017.137.
Stancliffe, Ethan, Michaela Schwaiger-Haber, Miriam Sindelar, Matthew J. Murphy, Mette Soerensen, and Gary J. Patti. 2022. “An Untargeted Metabolomics Workflow That Scales to Thousands of Samples for Population-Based Studies.” Analytical Chemistry, ahead of print, December. https://doi.org/10.1021/acs.analchem.2c01270.
Stanstrup, Jan, Corey D Broeckling, Rick Helmus, et al. 2019. “The metaRbolomics Toolbox in Bioconductor and Beyond.” Metabolites 9 (10): 200. https://doi.org/10.3390/metabo9100200.
Sud, Manish, Eoin Fahy, Dawn Cotter, et al. 2016. “Metabolomics Workbench: An International Repository for Metabolomics Data and Metadata, Metabolite Standards, Protocols, Tutorials and Training, and Analysis Tools.” Nucleic Acids Research 44 (D1): D463–70. https://doi.org/10.1093/nar/gkv1042.
Tautenhahn, Ralf, Kevin Cho, Winnie Uritboonthai, Zhengjiang Zhu, Gary J. Patti, and Gary Siuzdak. 2012. “An Accelerated Workflow for Untargeted Metabolomics Using the METLIN Database.” Nature Biotechnology 30 (9): 826–28. https://doi.org/10.1038/nbt.2348.
Treutler, Hendrik, Hiroshi Tsugawa, Andrea Porzel, et al. 2016. “Discovering Regulated Metabolite Families in Untargeted Metabolomics Studies.” Analytical Chemistry 88 (16): 8082–90. https://doi.org/10.1021/acs.analchem.6b01569.
Tsugawa, Hiroshi, Tomas Cajka, Tobias Kind, et al. 2015. “MS-DIAL: Data-Independent MS/MS Deconvolution for Comprehensive Metabolome Analysis.” Nature Methods 12 (6): 523–26. https://doi.org/10.1038/nmeth.3393.
Tsugawa, Hiroshi, Kazutaka Ikeda, Mikiko Takahashi, et al. 2020. “MS-DIAL 4: Accelerating Lipidomics Using an MS/MS, CCS, and Retention Time Atlas.” Nature Biotechnology 38: 1159–63. https://doi.org/10.1038/s41587-020-0531-2.
Tsugawa, Hiroshi, Tobias Kind, Ryo Nakabayashi, et al. 2016. “Hydrogen Rearrangement Rules: Computational MS/MS Fragmentation and Structure Elucidation Using MS-FINDER Software.” Analytical Chemistry 88 (16): 7946–58. https://doi.org/10.1021/acs.analchem.6b00770.
Tsugawa, Hiroshi, Haruki Uchino, Tomas Cajka, Takuma Ono, and Oliver Fiehn. 2024. “MS-DIAL 5 Multimodal Mass Spectrometry Data Mining.” Nature Biotechnology, ahead of print. https://doi.org/10.1038/s41587-024-02292-y.
Uchino, Haruki, Hiroshi Tsugawa, Hidenori Takahashi, and Makoto Arita. 2022. “Computational Mass Spectrometry Accelerates C=C Position-Resolved Untargeted Lipidomics Using Oxygen Attachment Dissociation.” Communications Chemistry 5 (1): 1–13. https://doi.org/10.1038/s42004-022-00778-1.
Uppal, Karan, Quinlyn A. Soltow, Frederick H. Strobel, et al. 2013. “xMSanalyzer: Automated Pipeline for Improved Feature Detection and Downstream Analysis of Large-Scale, Non-Targeted Metabolomics Data.” BMC Bioinformatics 14 (1): 15. https://doi.org/10.1186/1471-2105-14-15.
Uppal, Karan, Douglas I. Walker, and Dean P. Jones. 2017. “xMSannotator: An R Package for Network-Based Annotation of High-Resolution Metabolomics Data.” Analytical Chemistry 89 (2): 1063–67. https://doi.org/10.1021/acs.analchem.6b01214.
Volikov, Alexander, Gleb Rukhovich, and Irina V. Perminova. 2023. “NOMspectra: An Open-Source Python Package for Processing High Resolution Mass Spectrometry Data on Natural Organic Matter.” Journal of the American Society for Mass Spectrometry, ahead of print, June. https://doi.org/10.1021/jasms.3c00003.
Wang, Mingxun, Jeremy J. Carver, Vanessa V. Phelan, et al. 2016. “Sharing and Community Curation of Mass Spectrometry Data with Global Natural Products Social Molecular Networking.” Nature Biotechnology 34 (8): 828–37. https://doi.org/10.1038/nbt.3597.
Weber, Ralf J. M., Thomas N. Lawson, Reza M. Salek, et al. 2017. “Computational Tools and Workflows in Metabolomics: An International Survey Highlights the Opportunity for Harmonisation Through Galaxy.” Metabolomics 13 (2). https://doi.org/10.1007/s11306-016-1147-x.
Wen, Bo, Zhanlong Mei, Chunwei Zeng, and Siqi Liu. 2017. “metaX: A Flexible and Comprehensive Software for Processing Metabolomics Data.” BMC Bioinformatics 18 (March): 183. https://doi.org/10.1186/s12859-017-1579-y.
Wilkinson, Mark D, Michel Dumontier, IJsbrand Jan Aalbersberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3: 160018. https://doi.org/10.1038/sdata.2016.18.
Xue, Jingchuan, Carlos Guijas, H. Paul Benton, Benedikt Warth, and Gary Siuzdak. 2020. “METLIN MS2 Molecular Standards Database: A Broad Chemical and Biological Resource.” Nature Methods 17 (10): 953–54. https://doi.org/10.1038/s41592-020-0942-5.
Yu, Miao, Georgia Dolios, and Lauren Petrick. 2022. “Reproducible Untargeted Metabolomics Workflow for Exhaustive MS2 Data Acquisition of MS1 Features.” Journal of Cheminformatics 14 (1): 6. https://doi.org/10.1186/s13321-022-00586-8.
Yu, Tianwei, Youngja Park, Jennifer M. Johnson, and Dean P. Jones. 2009. “apLCMS—Adaptive Processing of High-Resolution LC/MS Data.” Bioinformatics 25 (15): 1930–36. https://doi.org/10.1093/bioinformatics/btp291.
Yu, Yong-Jie, Qing-Xia Zheng, Yue-Ming Zhang, et al. 2019. “Automatic Data Analysis Workflow for Ultra-High Performance Liquid Chromatography-High Resolution Mass Spectrometry-Based Metabolomics.” Journal of Chromatography A 1585 (January): 172–81. https://doi.org/10.1016/j.chroma.2018.11.070.
Zhang, Yu-Ying, Qian Zhang, Yue-Ming Zhang, et al. 2020. “A Comprehensive Automatic Data Analysis Strategy for Gas Chromatography-Mass Spectrometry Based Untargeted Metabolomics.” Journal of Chromatography A 1616 (April): 460787. https://doi.org/10.1016/j.chroma.2019.460787.
Zheng, Fujian, Lei You, Wangshu Qin, et al. 2022. “MetEx: A Targeted Extraction Strategy for Improving the Coverage and Accuracy of Metabolite Annotation in Liquid Chromatography–High-Resolution Mass Spectrometry Data.” Analytical Chemistry 94 (24): 8561–69. https://doi.org/10.1021/acs.analchem.1c04783.