Chapter 10 Statistical Analysis

The statistical strategy in a metabolomics study depends strongly on the research goal. Because many metabolomics experiments are performed in an untargeted mode, analysis often begins with exploratory methods before moving to formal hypothesis testing or predictive modeling. The basic goals of exploratory analysis are:

  • Find the relationships among variables

  • Find the relationships among samples or groups of samples.

These goals are typically addressed with unsupervised analysis.

However, many studies also include group labels or continuous outcomes that can be used to identify biomarkers or model associations between variables and phenotype. These settings require supervised methods. For a broader discussion of statistical analysis in metabolic phenotyping, see (Blaise et al. 2021). Before discussing specific algorithms, it is useful to review a few core concepts.

10.1 Basic Statistical Analysis

A statistic is a numerical summary used to describe properties of variables or samples. It can be designed to extract signal, reduce noise, or support inference. Statistical models and hypothesis tests are built from such summaries rather than from raw observations alone.

\[Statistic = f(sample_1,sample_2,...,sample_n)\]

Null Hypothesis Significance Testing (NHST) is commonly used for statistical inference. A p-value is the probability of observing a statistic at least as extreme as the one obtained, assuming the null hypothesis holds.

In omics studies, the multiple-comparison problem appears as soon as many metabolites are tested at the same time. False discovery rate (FDR) control is therefore essential to limit false positives. Common approaches include the Benjamini-Hochberg adjustment of raw p-values or direct estimation of q-values using Storey’s method.
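As a concrete illustration, here is a minimal NumPy sketch of the Benjamini-Hochberg step-up adjustment mentioned above; the ten p-values are made up for illustration:

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                        # ascending p-values
    ranked = p[order] * n / np.arange(1, n + 1)  # p_(i) * n / i
    # enforce monotonicity from the largest rank downward
    adj = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adj, 0.0, 1.0)
    return out

# made-up p-values for ten peaks
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
adj = benjamini_hochberg(pvals)
```

In practice the same adjustment is available in standard packages (e.g. `p.adjust(method = "BH")` in R); the sketch only shows what happens under the hood.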

NHST is widely used, but p-values are often misinterpreted and do not by themselves quantify evidence in favor of an alternative model. Bayesian hypothesis testing can address some of these limitations by using Bayes factors to compare the null hypothesis with an alternative hypothesis.

\[Bayes\ factor = \frac{p(D \mid H_a)}{p(D \mid H_0)} = \frac{posterior\ odds}{prior\ odds}\]
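A worked toy example makes the formula concrete. For k successes in n Bernoulli trials, comparing the point null \(H_0: \theta = 0.5\) against \(H_1: \theta \sim \mathrm{Uniform}(0,1)\), the marginal likelihood under \(H_1\) integrates to \(1/(n+1)\). This is a standard textbook case, not a metabolomics-specific method:

```python
from math import comb

def bf10_binomial(k, n):
    """Bayes factor for H1: theta ~ Uniform(0, 1) against H0: theta = 0.5,
    given k successes in n Bernoulli trials (a standard textbook example)."""
    p_d_h0 = comb(n, k) * 0.5 ** n   # binomial likelihood under the point null
    p_d_h1 = 1.0 / (n + 1)           # marginal likelihood over the uniform prior
    return p_d_h1 / p_d_h0

bf_even = bf10_binomial(50, 100)   # data consistent with theta = 0.5: BF < 1
bf_skew = bf10_binomial(70, 100)   # data far from theta = 0.5: strong evidence for H1
```

A Bayes factor below 1 favors the null; large values favor the alternative.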

Statistical models use measured data to support prediction or explanation. Most models require parameter tuning to achieve good performance, and they can be evaluated with metrics such as \(R^2\) or ROC curves. Once candidate models are fitted, model selection can be used to compare their performance and choose an appropriate balance between interpretability and predictive accuracy.

\[Target = g(Statistic) = g(f(sample_1,sample_2,...,sample_n))\]

The bias-variance tradeoff is a core concept in statistical modeling. Poorly tuned models may be overfitted (low bias, high variance) or underfitted (high bias, low variance).

\[E[(y - \hat f)^2] = \sigma^2 + Var[\hat f] + Bias[\hat f]^2\]
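A small simulation can verify the variance-plus-squared-bias decomposition empirically. Here we compare the unbiased sample mean with a deliberately shrunken (biased, lower-variance) version; the numbers are synthetic and the irreducible \(\sigma^2\) term is excluded because we evaluate estimators of the mean rather than predictions of a new noisy observation:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 1.0, 10, 100_000

# sampling distribution of the sample mean over many repeated experiments
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

results = {}
for shrink in (1.0, 0.8):                  # 1.0 = unbiased mean, 0.8 = shrunken mean
    est = shrink * xbar
    mse = np.mean((est - mu) ** 2)
    var = est.var()                        # variance of the estimator
    bias2 = (est.mean() - mu) ** 2         # squared bias of the estimator
    results[shrink] = (mse, var, bias2)
    # sample-moment identity: MSE = variance + bias^2
    assert abs(mse - (var + bias2)) < 1e-8
```

Shrinking trades variance for bias: the 0.8 estimator has lower variance but a nonzero squared bias.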

Cross-validation is commonly used to compare models through repeated training-testing schemes such as jackknife resampling, bootstrap resampling, or k-fold cross-validation.
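The k-fold scheme can be sketched in a few lines; the straight-line fit and the data are made up purely to show the train/test mechanics:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train, test) index arrays for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

# toy use: 5-fold cross-validated MSE of a straight-line fit
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 100)
y = 2 * x + rng.normal(0, 0.1, 100)

fold_mse = []
for train, test in kfold_indices(100, 5):
    slope, intercept = np.polyfit(x[train], y[train], 1)
    fold_mse.append(np.mean((slope * x[test] + intercept - y[test]) ** 2))
cv_mse = float(np.mean(fold_mse))
```

Each sample is used for testing exactly once, and the averaged fold error estimates out-of-sample performance.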

Regularization can also improve predictive performance and robustness. Ridge regression, LASSO, and related penalties are widely used for this purpose.
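A minimal sketch of ridge regression using its closed form, on synthetic data (50 samples, 20 collinear-free features, only three of which carry signal); setting the penalty to zero recovers ordinary least squares:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge regression: solve (X'X + lam*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
beta = np.zeros(20)
beta[:3] = [2.0, -1.0, 0.5]                # only three informative features
y = X @ beta + rng.normal(0, 0.5, 50)

b_ols = ridge(X, y, 0.0)                   # lam = 0 reduces to OLS
b_reg = ridge(X, y, 10.0)                  # the penalty shrinks all coefficients
```

LASSO replaces the squared (L2) penalty with an absolute-value (L1) penalty, which has no closed form but can set coefficients exactly to zero.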

For supervised analysis, linear models and tree-based models are two common starting points. Linear models are useful when the goal is to estimate interpretable relationships between variables and outcomes. Tree-based models, including bagging, random forest, and boosting, instead learn hierarchical decision structures and often perform well for prediction. Other model families, such as Support Vector Machines (SVMs), artificial neural networks (ANNs), and deep learning methods, make different assumptions and may also be useful depending on data size and study goals. If prediction is the main objective, comparing several model classes is usually more informative than relying on a single algorithm.

10.2 A practical method-choice guide

For most metabolomics studies, statistical analysis can be organized around a few common goals:

  1. Do I want to test individual metabolites?
    Use univariate methods such as t-test, ANOVA, LIMMA, or regression models with multiple-testing correction.

  2. Do I want to explore overall sample structure without using group labels?
    Use unsupervised methods such as PCA, clustering, and exploratory visualization.

  3. Do I want to build a supervised separation model between groups?
    Consider PLS-DA, sPLS-DA, OPLS-DA, regularized regression, random forest, or SVM, but only with careful validation.

  4. Do I want feature selection for biomarker discovery?
    Use sparse supervised methods, penalized regression, or tree-based models, and compare them with simpler baselines.

  5. Do I want network-level or systems-level interpretation?
    Use pathway analysis and network analysis after normalization, annotation, and differential analysis are already stable.

As a rough guide:

  • Small two-group untargeted study: start with PCA, QC checks, t-test or LIMMA, and simple effect-size reporting.

  • Multi-group or covariate-rich study: use ANOVA, linear models, LIMMA, and adjusted regression rather than many separate t-tests.

  • Biomarker panel discovery: compare univariate methods with sPLS-DA, regularized regression, and random forest instead of relying on one classifier.

  • Mechanism-oriented study: combine univariate analysis, PCA, pathway analysis, and network interpretation.

  • Prediction-focused study with external validation: supervised ML is reasonable, but model selection, nested validation, and batch control become critical.

10.3 Differences analysis

After peaks have been corrected across samples, the next step is to find differences between two groups; for comparisons among more than two groups, ANOVA or the Kruskal-Wallis test can be used. The basic idea behind the statistical analysis is to find meaningful differences between groups and extract the corresponding ions or peak groups.

So how do we find the differences? In most metabolomics software, this task is handled by a t-test, which reports a p-value and a fold change for each peak. If you compare two groups on a single peak, that is fine. However, if you compare two groups across thousands of peaks, any statistics textbook will warn you about false positives. For one comparison at a significance level of 0.05, there is a 5% chance of a false positive. For two independent comparisons, that chance rises to \(1-0.95^2\); for 10 comparisons, to \(1-0.95^{10} = 0.4012631\); and for 100 comparisons, to \(1-0.95^{100} = 0.9940795\). With thousands of peaks you are almost certain to make false discoveries.
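This inflation is easy to check by simulation. Below, both "groups" are drawn from the same distribution for 1000 synthetic peaks, so every significant result is a false positive; a normal cutoff of 1.96 is used in place of the exact t critical value for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
n_peaks, n_per_group = 1000, 30
# both groups drawn from the same distribution: every "hit" is a false positive
a = rng.normal(0.0, 1.0, (n_peaks, n_per_group))
b = rng.normal(0.0, 1.0, (n_peaks, n_per_group))

se = np.sqrt(a.var(axis=1, ddof=1) / n_per_group +
             b.var(axis=1, ddof=1) / n_per_group)
stat = (a.mean(axis=1) - b.mean(axis=1)) / se
false_positives = int(np.sum(np.abs(stat) > 1.96))   # roughly 5% of 1000 expected
```

Around fifty peaks come out "significant" even though no real difference exists, which is exactly why multiple-testing correction is needed.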

In omics studies with many simultaneous tests, false discovery rate (FDR) control is routinely applied. I suggest using q-values to control the FDR. If a q-value is less than 0.05, we expect that fewer than 5% of all peaks with q-values at or below that threshold are false discoveries. We can also use the local false discovery rate, which estimates the FDR for an individual peak; however, local FDR values are hard to estimate accurately.

Ortmayr and colleagues argued that fold change may be more reliable than p-values for finding differences (Ortmayr et al. 2016).

10.3.1 T-test or ANOVA

If a peak shows significant differences between two groups or among multiple groups, a t-test or ANOVA can be used to find it. However, when many hypothesis tests are performed, the probability of false positives increases, so false discovery rate (FDR) control is required. Q-values or adjusted p-values can be used in this situation. At a chosen confidence level, we can then report the peaks that remain significant after FDR control.

10.3.2 LIMMA

Linear Models for MicroArray Data (LIMMA) can also be used for high-dimensional data such as metabolomics. LIMMA uses a moderated t-statistic, estimated by an empirical Bayes procedure for differential expression: a hierarchical model shrinks each peak's variance estimate toward a pooled estimate across all peaks, which makes the per-peak tests more robust. In LIMMA, a known batch variable can be included as a covariate in the model. Although LIMMA differs from the plain t-test or ANOVA, its results are still p-values, so the same FDR control can be applied.
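The shrinkage idea can be sketched numerically. In this toy version the prior degrees of freedom `d0` and prior variance `s0_sq` are fixed by hand; real LIMMA estimates both from all peaks jointly, so this is an illustration of the mechanism, not the actual estimator:

```python
import numpy as np

def moderated_t(mean_diff, s2, n_per_group, d0=4.0, s0_sq=0.05):
    """Toy moderated t-statistic in the spirit of limma's empirical Bayes step.

    d0 (prior df) and s0_sq (prior variance) are fixed here for illustration;
    real limma estimates them from the whole data set.
    """
    d = 2 * n_per_group - 2                       # residual df, two-group design
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)    # variance shrunk toward the prior
    se = np.sqrt(s2_post * (2.0 / n_per_group))
    return mean_diff / se

# a peak with an implausibly tiny sample variance gets a tamer statistic
t_naive = 0.5 / np.sqrt(0.001 * (2.0 / 5))
t_mod = moderated_t(0.5, 0.001, 5)
```

Peaks whose sample variance is unrealistically small no longer produce huge t-statistics, which is the main practical benefit with small sample sizes.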

10.3.3 Bayesian mixture model

Another way to perform difference analysis is a Bayesian mixture model, which avoids p-values altogether. Such a model does not use hypothesis testing; instead it directly generates posterior estimates of the parameters, and a posterior probability can be used to check whether a given peak is associated with the condition of interest. To compare a classical model such as LIMMA with a Bayesian mixture model, simulation is needed to find comparable cutoffs.

10.4 Exploratory multivariate analysis

10.4.1 PCA

In metabolomics, PCA is primarily an exploratory data analysis method. It is widely used for visualization, but its value goes beyond plotting alone. PCA compresses a high-dimensional data matrix into a smaller set of orthogonal components that capture as much variance as possible. For example, instead of visualizing 100 metabolites across 100 samples directly, PCA projects the data into a smaller number of principal components that summarize the dominant structure of the dataset.

This projection makes it easier to inspect sample relationships, outliers, QC clustering, and batch structure. Samples with similar metabolite patterns will tend to project near each other in score plots, while dissimilar samples will separate along the principal components. Although two principal components are commonly plotted for convenience, they are usually not sufficient to represent the whole dataset. In many cases, several PCs are needed to explain most of the variance.

PCA can also be used to explore metabolite relationships through loadings, which describe how strongly each metabolite contributes to a principal component. Metabolites with similar loadings often show similar abundance patterns across samples. In this way, PCA helps summarize both sample-level structure and variable-level structure.

Besides visualization, PCA loadings can sometimes be used for downstream interpretation. Yamamoto et al. (2014) used statistical testing on loading-related correlations to identify peaks associated with biologically meaningful PCs. Such approaches are most useful when a relatively small number of PCs explain a large proportion of the variance.
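The scores, loadings, and explained-variance quantities described above can be computed directly from the singular value decomposition of the centered data matrix. The sketch below uses a made-up intensity matrix driven by two latent factors, so the first two PCs should dominate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 100
# made-up intensity matrix driven by two latent factors plus small noise
true_scores = rng.normal(size=(n, 2))
true_loadings = rng.normal(size=(2, p))
X = true_scores @ true_loadings + 0.1 * rng.normal(size=(n, p))

Xc = X - X.mean(axis=0)                  # center each metabolite
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                           # sample coordinates on each PC
loadings = Vt                            # metabolite contributions to each PC
explained = S ** 2 / np.sum(S ** 2)      # fraction of variance per PC
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the familiar PCA score plot, and `explained` tells you how representative that two-dimensional view actually is.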

10.4.2 Cluster Analysis

After collecting many samples and measuring the concentrations of many compounds in them, we may ask about the relationships between the samples. If sampling information such as date and position is available, boxplots or violin plots can be used to explore differences across those categorical variables. However, the data themselves can also reveal potential relationships.

But how? If two samples' data are almost the same, we might suspect that they come from the same underlying group. On the other hand, how do we define "the same" in the data?

Cluster analysis answers this by defining a "distance" that measures the similarity between samples. Mathematically, such a distance can be defined in many ways, for example as the sum of the absolute differences between samples.

For example, suppose we measured the amounts of compounds A, B, and C in two samples and obtained the following results:

| Compounds (ng) | A  | B  | C  |
|----------------|----|----|----|
| Sample 1       | 10 | 13 | 21 |
| Sample 2       | 54 | 23 | 16 |

The distance could be:

\[ distance = |10-54|+|13-23|+|21-16| = 59 \]

You could also use the sum of squared differences or another measure of similarity. Once a distance is defined, you can compute the distances between all pairs of samples. The two samples with the smallest distance are merged into one group; the distances are then recalculated, and small groups are merged into larger groups until all samples are included in a single group. A dendrogram can be drawn to display this process.
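The distance computation above is trivial to reproduce; here it is for the two example samples, with the squared-difference alternative alongside:

```python
import numpy as np

sample1 = np.array([10, 13, 21])   # compounds A, B, C in sample 1
sample2 = np.array([54, 23, 16])   # compounds A, B, C in sample 2

# Manhattan (city-block) distance: |10-54| + |13-23| + |21-16|
manhattan = int(np.abs(sample1 - sample2).sum())

# squared Euclidean distance as an alternative similarity measure
euclid_sq = int(((sample1 - sample2) ** 2).sum())
```

With more than two samples, the same calculation over all pairs yields the distance matrix that hierarchical clustering merges step by step.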

The next issue is how to cut the samples into clusters. You could set a cut-off and read the groups directly from the dendrogram. However, sometimes you need to cluster the samples into a fixed number of groups, say three. In that situation you can use k-means cluster analysis.

The basic idea behind k-means is to generate three virtual samples (centroids) and calculate the distances between these centroids and all of the real samples. Each sample gets three distance values and is assigned to the group of its nearest centroid, so the samples are partitioned into three groups. The center of each group is then computed to give three new centroids, and the process is repeated until the group memberships no longer change.
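The iteration just described is Lloyd's algorithm, and it fits in a few lines. The three toy groups and the choice of one seed point per group are assumptions made so the example converges cleanly; real initializations are usually random and repeated:

```python
import numpy as np

def kmeans(X, k, init_idx, n_iter=100):
    """Minimal Lloyd's algorithm: assign each sample to the nearest centroid,
    then move each centroid to the mean of its members, until stable."""
    centers = X[np.asarray(init_idx)].astype(float)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# three well-separated toy groups of samples in two dimensions
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.2, size=(30, 2)) for c in (0.0, 5.0, 10.0)])
labels, centers = kmeans(X, 3, init_idx=[0, 30, 60])  # one seed point per group
```

With well-separated groups the algorithm recovers them exactly; with overlapping groups the result depends on initialization, which is why multiple restarts are standard practice.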

In short, the idea behind cluster analysis is to define a distance, set a cut-off, and find the groups. In this way you may reveal potential relationships among samples.

10.5 Supervised multivariate discrimination

10.5.1 PCA / PLS-DA / sPLS-DA / OPLS-DA

These methods are often used together in metabolomics papers, but they answer different questions and should not be treated as interchangeable.

| Method | Uses class labels | Main purpose | Strengths | Main risks / limitations |
|--------|-------------------|--------------|-----------|--------------------------|
| PCA | No | Explore global structure and major variance directions | Simple, robust, useful for QC, outliers, and batch patterns | May miss group separation when biological variance is smaller than technical variance |
| PLS-DA | Yes | Find latent components related to class separation | Handles collinearity well and gives interpretable loadings | Easily overfits in small high-dimensional studies |
| sPLS-DA | Yes | Class separation with embedded feature selection | More suitable for biomarker discovery because it removes many irrelevant variables | Feature selection can still be unstable without resampling |
| OPLS-DA | Yes | Separate predictive variation from orthogonal variation | Often gives clearer visual separation and easier interpretation | Can appear more convincing than justified if validation is weak |

In practical terms:

  • PCA is usually the first multivariate method to run because it does not use labels and is useful for checking QC clustering, outliers, batch effects, and dominant variance structure.

  • PLS-DA is useful when the goal is supervised discrimination and interpretation of class-related latent variables, but it should never be trusted from score plots alone.

  • sPLS-DA is often a better choice than standard PLS-DA when the goal is feature prioritization, because sparse penalties reduce noise from irrelevant metabolites.

  • OPLS-DA can be useful when you want to isolate variation related to the class from variation unrelated to the class, but it requires especially careful validation because attractive separation plots can be misleading.

No supervised separation method should be interpreted without cross-validation, permutation testing, and comparison with simpler alternatives.

10.5.2 PLSDA

PLS-DA, OPLS-DA, and modified variants such as HPSO-OPLS-DA (Yang et al. 2017) can all be used for supervised discrimination.

Partial least squares discriminant analysis (PLS-DA) was first used in the 1990s, although partial least squares (PLS) itself was proposed in the 1960s by Herman Wold. Principal component analysis produces a weight matrix reflecting the covariance structure among the variables, while partial least squares produces a weight matrix reflecting the covariance structure between the variables and the classes. After rotation by the weight matrix, the new variables capture the relationship with the classes.

The classification performance of PLS-DA is identical to that of linear discriminant analysis (LDA) if the class sizes are balanced or the columns are centered using the mean of the class means. If the number of variables exceeds the number of samples, LDA can instead be performed on the principal components. Quadratic discriminant analysis (QDA) can model nonlinear relationships between variables, while PLS-DA is better suited to collinear variables. As a pure classifier, however, PLS-DA has little advantage; its real strength is that the model reveals relationships among the variables, which is not the goal of a regular classifier.

Different algorithms for PLS-DA (Andersson 2009) yield different scores, while PCA always gives the same scores for a fixed algorithm. In PCA, both the scores and the loadings are orthogonal; in the Wold PLS algorithm only the scores are orthogonal, and in the Martens algorithm only the loadings are orthogonal. Brereton and Lloyd (2018) discuss the practical details of these algorithms.

Sparse PLS discriminant analysis (sPLS-DA) adds an L1 penalty for variable selection to remove the influence of unrelated variables, which makes sense for high-throughput omics data (Lê Cao et al. 2011).

For OPLS-DA, the S-plot can be used to find important features (Wiklund et al. 2008).

10.5.3 When to choose which method

For typical metabolomics study designs:

  • Exploratory cohort profiling: use PCA first, then clustering if you want to inspect natural sample grouping.

  • Two-class discrimination with moderate sample size: use PCA for diagnostics and then compare PLS-DA with simpler classifiers or univariate baselines.

  • Biomarker shortlist generation in high-dimensional data: prefer sPLS-DA or regularized regression because they embed feature selection.

  • Need to separate class-related from unrelated structured variation: consider OPLS-DA, but validate with permutation tests and external or nested cross-validation.

  • Very small sample size with many features: avoid over-interpreting any supervised latent-variable model; rely more on PCA, univariate analysis, and conservative reporting.

  • Prediction as the main goal: compare PLS-DA family methods against random forest, SVM, or penalized regression rather than assuming latent-variable methods are best.

10.6 Network analysis

10.6.1 Vertex and edge

Each node is a vertex, and a connection between nodes is an edge of the network. Connections can be directed or undirected depending on the relationship.

10.6.2 Build the network

Adjacency matrices are commonly used to represent a network. An adjacency matrix is an n × n square matrix in which the entry in row i and column j equals 1 if and only if vertices i and j are connected. In a directed network, the entry can be set to 1 for an edge from i to j and −1 for an edge from j to i.
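For the undirected case, a small example makes the representation concrete; the four-vertex edge list is made up for illustration:

```python
import numpy as np

# undirected network on 4 vertices with edges (0,1), (1,2), (2,3), (0,2)
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
n = 4
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1       # symmetric entries for an undirected graph

degrees = A.sum(axis=0)          # row/column sums give vertex degrees
```

Symmetry of the matrix encodes undirectedness, and the row sums immediately give the degree of each vertex.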

10.6.3 Network attributes

Vertex and edge attributes can hold group information or metadata about the nodes and connections; edges can also carry weights as an attribute.

A path is a route from one node to another in the network, and the shortest path between two nodes can be computed. The greatest shortest-path distance in a graph is called its diameter.
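Shortest paths in an unweighted network can be found with breadth-first search, and the diameter follows by taking the maximum over all start vertices. The sketch reuses a small made-up four-vertex graph:

```python
from collections import deque

def bfs_distances(adj, start):
    """Shortest path lengths (in edges) from start to every reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# toy undirected graph as an adjacency list: edges 0-1, 1-2, 2-3, 0-2
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
diameter = max(max(bfs_distances(adj, s).values()) for s in adj)
```

For weighted networks, Dijkstra's algorithm plays the same role; dedicated packages such as igraph provide both.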

An undirected network is connected if there is a path from any vertex to any other. Connected networks can be further classified according to the strength of their connectedness: an undirected network with at least two independent paths between each pair of nodes is said to be biconnected.

The transitivity of a network is a crude summary of its structure. A high value means that nodes are well connected locally, with dense subgraphs. Real network data sets typically show high transitivity.

Maximum flows and minimum cuts can be used to find the largest volume that can pass between two nodes and the bottleneck that limits it. For example, if two hubs are connected through a single node, the maximum flow and minimum cut between nodes in different hubs are both determined at that connecting node.

A sparse network has a number of edges comparable to its number of nodes, while in a dense network the number of edges grows as a quadratic function of the number of nodes.

10.7 Machine Learning and Deep Learning

Machine learning (ML) and deep learning (DL) have been increasingly applied across the metabolomics workflow, from peak detection to biomarker discovery. However, proper application requires understanding both the opportunities and the pitfalls specific to high-dimensional, low-sample-size metabolomics data(Galal et al. 2022; Sen et al. 2021).

10.7.1 Where ML/DL is applied in metabolomics

ML/DL methods appear at multiple stages of the metabolomics pipeline:

  • Peak detection and deconvolution: Deep learning models can learn peak shapes from training data and improve detection in noisy spectra. NeatMS uses a convolutional neural network for peak quality assessment(Gloaguen et al. 2022), and 2D pseudo-color image-based deep learning has been applied to LC-MS feature detection(Zhao et al. 2021).

  • Spectral annotation: ML models predict molecular fingerprints from MS/MS spectra. Tools like CSI:FingerID(Dührkop et al. 2019), CANOPUS(Dührkop et al. 2020), MS2DeepScore(Huber et al. 2020), and MSBERT(H. Zhang et al. 2024) use various architectures (random forest, deep neural networks, transformers) to match or predict spectra. MS2Query enables analogue searching using learned spectral embeddings(de Jonge et al. 2023).

  • Retention time prediction: Models such as RT-Transformer(Xue et al. 2024) and Retip(Bonini et al. 2020) predict chromatographic retention times from molecular structure, adding an orthogonal dimension for annotation confidence.

  • Biomarker discovery and classification: Supervised models (random forest, SVM, PLS-DA, gradient boosting, neural networks) are commonly used to classify samples and select discriminating features. Ensemble methods and model stacking often outperform single models.

  • Data integration: Multi-omics integration benefits from ML approaches that can handle heterogeneous data types, such as MOFA (Multi-Omics Factor Analysis) and similarity network fusion.

10.7.2 Common pitfalls

Metabolomics datasets are particularly susceptible to ML overfitting due to the high number of features relative to the sample size (the “p >> n” problem). Common mistakes include:

  • Data leakage: Performing feature selection, normalization, or missing value imputation before splitting data into training/test sets. All preprocessing steps that use information across samples must be performed within the cross-validation loop.

  • Inadequate validation: Using only a single random train/test split instead of repeated k-fold cross-validation. For small sample sizes (n < 100), nested cross-validation is recommended to get unbiased performance estimates.

  • Ignoring batch effects: If batch structure correlates with the biological groups, ML models may learn batch signatures rather than biology. Always check whether your model predictions correlate with known technical factors.

  • Reporting bias: Reporting only the best-performing model among many tried, without adjusting for the model selection process. Permutation testing (shuffling labels to establish null performance) is essential to verify that classification accuracy exceeds what is expected by chance.
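The data-leakage point is worth seeing in code. Below, scaling and the "model" (a simple nearest-class-mean rule, standing in for any classifier) are fit inside each cross-validation fold, on pure-noise data with made-up group labels, so the honest accuracy should hover near chance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))          # 60 samples, 200 features, pure noise
y = np.repeat([0, 1], 30)               # two made-up groups

def cv_accuracy_nearest_mean(X, y, k=5):
    """Leakage-free CV: scaling and class means come from training folds only."""
    idx = np.random.default_rng(1).permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        mu = X[train].mean(axis=0)                 # preprocessing statistics
        sd = X[train].std(axis=0) + 1e-9           # from the training fold only
        Xtr, Xte = (X[train] - mu) / sd, (X[test] - mu) / sd
        m0 = Xtr[y[train] == 0].mean(axis=0)       # nearest-class-mean "model"
        m1 = Xtr[y[train] == 1].mean(axis=0)
        pred = (np.linalg.norm(Xte - m1, axis=1) <
                np.linalg.norm(Xte - m0, axis=1)).astype(int)
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))

acc = cv_accuracy_nearest_mean(X, y)    # pure noise: accuracy should be near 0.5
```

If feature selection or scaling were instead performed on the full matrix before splitting, the same pipeline could report optimistically inflated accuracy on exactly this kind of null data.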

10.7.3 Best practices

Practical guidance for training and reporting ML models in metabolomics has been proposed(Zhao et al. 2025). The EMBRACE checklist provides a framework for environmental ML, emphasizing baseline reporting and comprehensive evaluation(Zhu et al. 2024). Key recommendations include:

  • Always report the full cross-validation scheme, including how hyperparameters were tuned

  • Include a permutation test to verify statistical significance of classification accuracy

  • Report multiple metrics (accuracy, AUC, sensitivity, specificity, F1) rather than a single number

  • Make code and processed data available for reproducibility

  • Compare ML results against simple baselines (e.g., univariate t-test + fold change) to justify the added complexity

  • For feature importance, use model-agnostic methods (e.g., SHAP values, permutation importance) that provide interpretable results
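Permutation importance, mentioned in the last recommendation, is model-agnostic and simple to sketch: shuffle one feature at a time and measure the drop in score. The toy "model" below is a hand-written rule that uses only feature 0, an assumption made so the expected importances are obvious:

```python
import numpy as np

def permutation_importance(predict, X, y, metric, seed=0):
    """Score drop when each feature is shuffled, relative to the baseline."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    imps = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])   # destroy feature j only
        imps.append(baseline - metric(y, predict(Xp)))
    return np.array(imps)

# toy setup: the outcome and the "model" depend only on feature 0
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
predict = lambda M: (M[:, 0] > 0).astype(int)
accuracy = lambda truth, pred: float(np.mean(truth == pred))

imp = permutation_importance(predict, X, y, accuracy)
```

Only the feature the model actually uses shows a large importance; the irrelevant features come out at zero, which is the interpretability property that makes the method attractive.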

10.7.4 Deep learning considerations

Deep learning models (autoencoders, variational autoencoders, transformers, graph neural networks) are powerful but typically require larger training sets than classical ML. In metabolomics, they are most useful when:

  • Large public spectral databases are available for pretraining (e.g., MS/MS spectra prediction)

  • The task involves pattern recognition in complex spectral or image data

  • Transfer learning from related domains can compensate for limited sample sizes

For classification tasks with fewer than a few hundred samples, classical methods (random forest, gradient boosting, regularized regression) often perform as well as or better than deep learning, with the added benefit of interpretability(Sen et al. 2021).

10.8 Causal inference in metabolomics

Standard metabolomics analyses identify associations between metabolites and outcomes, but association does not imply causation. Confounding, reverse causation, and measurement error can all produce spurious associations. Several approaches are now being applied to strengthen causal claims from metabolomics data.

10.8.1 Mendelian randomization

Mendelian randomization (MR) uses genetic variants as instrumental variables to estimate the causal effect of an exposure (e.g., a metabolite level) on an outcome (e.g., disease risk). Because genetic variants are randomly allocated at conception, they are generally not affected by the confounders that complicate observational studies. With the availability of large-scale genome-wide association studies (GWAS) for hundreds of circulating metabolites, two-sample MR has become a popular approach to test whether a metabolite is causally related to a health outcome(Holmes et al. 2018). The MR-Base platform and the TwoSampleMR R package provide accessible tools for conducting such analyses(Hemani et al. 2018).

Key assumptions of MR include: the genetic instrument must be associated with the metabolite (relevance), must not be associated with confounders (independence), and must affect the outcome only through the metabolite (exclusion restriction). Violations of these assumptions, particularly horizontal pleiotropy, can bias results. Sensitivity analyses such as MR-Egger, weighted median, and MR-PRESSO are essential to assess robustness.
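The core two-sample MR calculation is small enough to sketch: each variant's Wald ratio (SNP-outcome effect divided by SNP-exposure effect) estimates the causal effect, and the inverse-variance-weighted (IVW) estimate averages them. This uses simplified first-order weights that ignore uncertainty in the SNP-exposure betas, and the summary statistics are made up; real analyses should use dedicated tooling such as the TwoSampleMR package:

```python
import numpy as np

def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance-weighted MR estimate from summary stats.

    First-order weights only: uncertainty in the SNP-exposure betas is
    ignored, as in a basic two-sample MR analysis."""
    beta_exp = np.asarray(beta_exp, float)
    beta_out = np.asarray(beta_out, float)
    se_out = np.asarray(se_out, float)
    wald = beta_out / beta_exp               # per-variant causal estimate
    weights = (beta_exp / se_out) ** 2       # inverse variance of each Wald ratio
    est = np.sum(weights * wald) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    return float(est), float(se)

# made-up summary statistics where every variant implies a causal effect of 0.5
est, se = ivw_estimate([0.10, 0.15, 0.08, 0.12],
                       [0.05, 0.075, 0.04, 0.06],
                       [0.01, 0.01, 0.01, 0.01])
```

When the per-variant Wald ratios disagree strongly, that heterogeneity itself is a warning sign of pleiotropy, which is what the MR-Egger and weighted-median sensitivity analyses probe.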

10.8.2 Practical considerations

For beginners applying causal inference methods in metabolomics, it is worth noting that MR works best for metabolites with strong genetic determinants (high heritability) and is less powerful for metabolites primarily driven by recent environmental exposure. Combining MR evidence with traditional observational associations and biological pathway knowledge provides the strongest basis for causal claims. Mediation analysis and structural equation modeling are additional tools for understanding how metabolites sit on the causal path between exposure and disease.

10.9 Software

  • MetaboAnalystR (Chong et al. 2019)

  • caret could employ more than 200 statistical models in a general framework to build/select models. You could also show the variable importance for some of the models.

  • caretEnsemble Functions for creating ensembles of caret models

  • pROC Tools for visualizing, smoothing and comparing receiver operating characteristic (ROC curves). (Partial) area under the curve (AUC) can be compared with statistical tests based on U-statistics or bootstrap. Confidence intervals can be computed for (p)AUC or ROC curves.

  • gWQS Fits Weighted Quantile Sum (WQS) regressions for continuous, binomial, multinomial and count outcomes.

  • Community ecology tools can also be used to analyze metabolomics data (Passos Mansoldo et al. 2022).

References

Andersson, Martin. 2009. “A Comparison of Nine PLS1 Algorithms.” Journal of Chemometrics 23 (10): 518–29. https://doi.org/10.1002/cem.1248.
Blaise, Benjamin J., Gonçalo D. S. Correia, Gordon A. Haggart, et al. 2021. “Statistical Analysis in Metabolic Phenotyping.” Nature Protocols, July, 1–28. https://doi.org/10.1038/s41596-021-00579-1.
Bonini, Paolo, Tobias Kind, Hiroshi Tsugawa, Dinesh Kumar Barupal, and Oliver Fiehn. 2020. “Retip: Retention Time Prediction for Compound Annotation in Untargeted Metabolomics.” Analytical Chemistry 92 (11): 7515–22. https://doi.org/10.1021/acs.analchem.9b05765.
Brereton, Richard G., and Gavin R. Lloyd. 2018. “Partial Least Squares Discriminant Analysis for Chemometrics and Metabolomics: How Scores, Loadings, and Weights Differ According to Two Common Algorithms.” Journal of Chemometrics 32 (4): e3028. https://doi.org/10.1002/cem.3028.
Chong, Jasmine, David S. Wishart, and Jianguo Xia. 2019. “Using MetaboAnalyst 4.0 for Comprehensive and Integrative Metabolomics Data Analysis.” Current Protocols in Bioinformatics 68 (1): e86. https://doi.org/10.1002/cpbi.86.
de Jonge, Niek F., Joris J. R. Louwen, Elena Chekmeneva, et al. 2023. “MS2Query: Reliable and Scalable MS2 Mass Spectra-Based Analogue Search.” Nature Communications 14 (1): 1752. https://doi.org/10.1038/s41467-023-37446-4.
Dührkop, Kai, Markus Fleischauer, Marcus Ludwig, et al. 2019. “SIRIUS 4: A Rapid Tool for Turning Tandem Mass Spectra into Metabolite Structure Information.” Nature Methods 16 (4): 299–302. https://doi.org/10.1038/s41592-019-0344-8.
Dührkop, Kai, Louis-Félix Nothias, Markus Fleischauer, et al. 2020. “Systematic Classification of Unknown Metabolites Using High-Resolution Fragmentation Mass Spectra.” Nature Biotechnology, November, 1–10. https://doi.org/10.1038/s41587-020-0740-8.
Galal, Aya, Marwa Talal, and Ahmed Moustafa. 2022. “Applications of Machine Learning in Metabolomics: Disease Modeling and Classification.” Frontiers in Genetics 13: 1017340. https://doi.org/10.3389/fgene.2022.1017340.
Gloaguen, Yoann, Jennifer A. Kirwan, and Dieter Beule. 2022. “Deep Learning-Assisted Peak Curation for Large-Scale LC-MS Metabolomics.” Analytical Chemistry 94 (12): 4930–37. https://doi.org/10.1021/acs.analchem.1c02220.
Hemani, Gibran, Jie Zheng, Benjamin Elsworth, et al. 2018. “The MR-Base Platform Supports Systematic Causal Inference Across the Human Phenome.” eLife 7: e34408. https://doi.org/10.7554/eLife.34408.
Holmes, Michael V, Iona Y Millwood, Christiana Kartsonaki, et al. 2018. “Lipids, lipoproteins, and metabolites and risk of myocardial infarction and stroke.” Journal of the American College of Cardiology 71 (6): 620–32. https://doi.org/10.1016/j.jacc.2017.12.006.
Huber, Florian, Stefan Verhoeven, Christiaan Meijer, et al. 2020. “Matchms - Processing and Similarity Evaluation of Mass Spectrometry Data.” Journal of Open Source Software 5 (52): 2411. https://doi.org/10.21105/joss.02411.
Lê Cao, Kim-Anh, Simon Boitard, and Philippe Besse. 2011. “Sparse PLS Discriminant Analysis: Biologically Relevant Feature Selection and Graphical Displays for Multiclass Problems.” BMC Bioinformatics 12 (June): 253. https://doi.org/10.1186/1471-2105-12-253.
Ortmayr, Karin, Verena Charwat, Cornelia Kasper, Stephan Hann, and Gunda Koellensperger. 2016. “Uncertainty Budgeting in Fold Change Determination and Implications for Non-Targeted Metabolomics Studies in Model Systems.” The Analyst 142 (1): 80–90. https://doi.org/10.1039/C6AN01342B.
Passos Mansoldo, Felipe Raposo, Rafael Garrett, Veronica da Silva Cardoso, Marina Amaral Alves, and Alane Beatriz Vermelho. 2022. “Metabology: Analysis of Metabolomics Data Using Community Ecology Tools.” Analytica Chimica Acta 1232 (November): 340469. https://doi.org/10.1016/j.aca.2022.340469.
Sen, Partho, Santosh Lamichhane, Vivek B Mathema, et al. 2021. “Deep Learning Meets Metabolomics: A Methodological Perspective.” Briefings in Bioinformatics 22 (2): bbaa204. https://doi.org/10.1093/bib/bbaa204.
Wiklund, Susanne, Erik Johansson, Lina Sjöström, et al. 2008. “Visualization of GC/TOF-MS-Based Metabolomics Data for Identification of Biochemically Interesting Compounds Using OPLS Class Models.” Analytical Chemistry 80 (1): 115–22. https://doi.org/10.1021/ac0713510.
Xue, Jun, Bingyi Wang, Hongchao Ji, and WeiHua Li. 2024. “RT-Transformer: Retention Time Prediction for Metabolite Annotation to Assist in Metabolite Identification.” Bioinformatics 40 (3): btae084. https://doi.org/10.1093/bioinformatics/btae084.
Yamamoto, Hiroyuki, Tamaki Fujimori, Hajime Sato, Gen Ishikawa, Kenjiro Kami, and Yoshiaki Ohashi. 2014. “Statistical Hypothesis Testing of Factor Loading in Principal Component Analysis and Its Application to Metabolite Set Enrichment Analysis.” BMC Bioinformatics 15 (February): 51. https://doi.org/10.1186/1471-2105-15-51.
Yang, Qin, Shan-Shan Lin, Jiang-Tao Yang, Li-Juan Tang, and Ru-Qin Yu. 2017. “Detection of Inborn Errors of Metabolism Utilizing GC-MS Urinary Metabolomics Coupled with a Modified Orthogonal Partial Least Squares Discriminant Analysis.” Talanta 165 (April): 545–52. https://doi.org/10.1016/j.talanta.2017.01.018.
Zhang, Hailiang, Qiong Yang, Ting Xie, Yue Wang, Zhimin Zhang, and Hongmei Lu. 2024. “MSBERT: Embedding Tandem Mass Spectra into Chemically Rational Space by Mask Learning and Contrastive Learning.” Analytical Chemistry 96 (42): 16599–608. https://doi.org/10.1021/acs.analchem.4c02426.
Zhao, Chen, Tingting Zhao, Qiming Shen, Zhifeng Tang, Xiaoxiao Li, and Tao Huan. 2025. “Practical Guidance for Training Machine Learning Models in Metabolomics and Mass Spectrometry Research.” Analytical Chemistry, ahead of print, October. https://doi.org/10.1021/acs.analchem.5c01417.
Zhao, Fan, Shuai Huang, and Xiaozhe Zhang. 2021. “High Sensitivity and Specificity Feature Detection in Liquid Chromatography–Mass Spectrometry Data: A Deep Learning Framework.” Talanta 222 (January): 121580. https://doi.org/10.1016/j.talanta.2020.121580.
Zhu, Yuming, Miao Yu, and Yanqiu Ge. 2024. “EMBRACE the Uncertainty: An Evaluation Framework for Environmental Machine Learning.” Environmental Science & Technology 58 (51): 22505–16. https://doi.org/10.1021/acs.est.4c07726.