Paired Mass Distance (PMD) analysis for GC/LC-MS based non-targeted analysis

Introduction of Paired Mass Distance analysis

The pmd package uses Paired Mass Distance (PMD) relationships to analyze GC/LC-MS based non-targeted data. PMD means the distance between two masses or two mass-to-charge ratios (m/z). In mass spectrometry, a PMD keeps the same value whether it is measured between two masses or between two mass-to-charge ratios (for singly charged ions). There are two kinds of PMD involved in this package: PMD from the same compound and PMD from different compounds. In GC/LC-MS or XCMS based non-targeted data analysis, peaks are separated by chromatography, and "same compound" means ions with similar retention times, or ions co-eluted on a certain column.

PMD from the same compound

For MS1 full scan data, we can build retention time (RT) bins and assign peaks to RT groups by hierarchical clustering of retention times. Within each RT group, the peaks should come from the same compound or from co-eluting compounds. If a certain PMD appears in multiple RT groups, it is likely related to adducts, neutral losses, isotopologues or common fragment ions.

PMD from different compounds

Peaks from different retention time groups are likely to be different compounds separated by chromatography. The PMDs between them reflect homologous series or chemical reactions.

The GlobalStd algorithm uses the PMDs within the same RT group to find independent peaks in a data set. Structure/reaction directed analysis then uses the PMDs from different RT groups to screen important compounds or reactions.

Data format

The input data should be a list object with at least two elements from a peak list:

mass-to-charge ratio, named mz (high resolution mass spectrometry is required)
retention time, named rt

However, I suggest adding intensity and group information to the list to validate the PMD analysis.
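As a minimal sketch of such a list, the following builds one by hand in base R. The numbers and sample names are made up for illustration, and the element names data and group simply mirror the attached example data set shown below.

```r
# Toy peak list: three peaks measured in two samples (all values are made up)
mz <- c(100.0756, 122.0575, 118.0862)   # high resolution m/z
rt <- c(170.2, 170.4, 348.1)            # retention time
# intensity matrix: one row per peak, one column per sample
data <- matrix(c(1095, 980,
                 560, 610,
                 2797, 3010),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, c("sample1", "sample2")))
# sample annotation used for validation
group <- data.frame(sample_name = colnames(data),
                    sample_group = c("fish1", "fish2"))
peaks <- list(data = data, group = group, mz = mz, rt = rt)
str(peaks)
```

Only mz and rt are required; data and group are there for the validation steps.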

In this package, a data set from in vivo solid phase micro-extraction (SPME) is attached. This data set contains 9 samples from 3 fish, with triplicate samples for each fish. Here is the data structure:

library(pmd)
data("spmeinvivo")
str(spmeinvivo)
#> List of 4
#>  $ data : num [1:1459, 1:9] 1095 10439 10154 2797 90211 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:1459] "100.1/170" "100.5/86" "101/85" "103.1/348" ...
#>   .. ..$ : chr [1:9] "1405_Fish1_F1" "1405_Fish1_F2" "1405_Fish1_F3" "1405_Fish2_F1" ...
#>  $ group:'data.frame':   9 obs. of  2 variables:
#>   ..$ sample_name : chr [1:9] "1405_Fish1_F1" "1405_Fish1_F2" "1405_Fish1_F3" "1405_Fish2_F1" ...
#>   ..$ sample_group: chr [1:9] "fish1" "fish1" "fish1" "fish2" ...
#>  $ mz   : num [1:1459] 100 101 101 103 104 ...
#>  $ rt   : num [1:1459] 170.2 86.3 84.9 348.1 48.8 ...

You can build this list, or mzrt object, from xcms objects via the enviGCMS package. When you have an xcmsSet or XCMSnExp object named xset, use enviGCMS::getmzrt(xset) to get such a list. Of course, you can also build the list yourself.

GlobalStd algorithm

The GlobalStd algorithm tries to find independent peaks in a peak list. The first step is retention time hierarchical clustering analysis. The second step is to find the relationships among adducts, neutral losses, isotopologues and common fragment ions. The third step is to screen the independent peaks.

Here is a workflow for this algorithm:

knitr::include_graphics('https://yufree.github.io/presentation/figure/GlobalStd.png')

STEP1: Retention time hierarchical clustering

pmd <- getpaired(spmeinvivo)
#> 75 retention time clusters found.
#> Using ng = 15
#> 5 unique PMDs retained.
#> The unique within RT clusters high frequency PMD(s) is(are)  28.03 21.98 44.03 17.03 18.01.
#> 409 isotope peaks found.
#> 109 multiple charged isotope peaks found.
#> 251 multiple charged peaks found.
#> 346 paired peaks found.
plotrtg(pmd)

This plot shows the distribution of RT groups. The rtcutoff argument of getpaired sets the cutoff of the distances in the retention time hierarchical clustering analysis. The retention time cluster cutoff should fit the peak picking algorithm. For HPLC, 10 is suggested, and 5 could be used for UPLC.
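The idea of cutting a retention time dendrogram at a distance cutoff can be sketched with base R alone. This is an illustration under stated assumptions, not the code inside getpaired: the toy retention times are made up, and the complete linkage used here is simply the hclust default.

```r
# Toy retention times; the first three and the next two elute close together
rt <- c(100.2, 101.0, 103.5, 200.1, 202.3, 400.0)
rtcutoff <- 10

# Hierarchical clustering on pairwise retention time distances
hc <- hclust(dist(rt))            # complete linkage is the hclust default
# Cut the tree at the cutoff height to get retention time groups
rtg <- cutree(hc, h = rtcutoff)
table(rtg)
```

A smaller cutoff splits the peaks into more, tighter groups, which is why the value should match the peak width of the chromatography.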

The global PMD’s retention time group number should be around 20 percent of the number of retention time clusters. For example, if you find 100 retention time clusters, I suggest using 20 as the cutoff of the empirical global PMD’s retention time group number. If you don’t assign a value to ng, the algorithm uses this recommendation by default.

Take care with retention time clusters that contain lots of peaks. Such a cluster could contain compounds co-eluted on a certain column. It would be wise to trim the retention time window to keep high quality peaks. Another important hint is to pre-filter your peak list using blank samples or other quality control samples. Otherwise the running time will be long and lots of PMD relationships will come from noise.

STEP2: Relationship among adducts, neutral loss, isotopologues and common fragments ions

The ng argument of getpaired sets the cutoff of the global PMD’s retention time group number. If ng is 10, a PMD relationship must appear in at least 10 retention time groups to be retained. You can use plotpaired to show the distribution.

plotpaired(pmd)
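The role of ng can be sketched in base R as counting, for each rounded PMD, the number of retention time groups in which it occurs. The toy peaks, the group labels and the helper names below are assumptions for illustration; the package's own implementation may differ in detail.

```r
# Toy peaks with a retention time group label (values are made up)
peaks <- data.frame(
  mz  = c(100.0000, 121.9819, 200.0000, 221.9819, 218.0106, 300.0000, 318.0106),
  rtg = c(1, 1, 2, 2, 2, 3, 3)
)

# Unique rounded PMDs inside each retention time group
pmd_by_group <- lapply(split(peaks$mz, peaks$rtg), function(mz) {
  if (length(mz) < 2) return(numeric(0))
  unique(round(as.numeric(dist(mz)), 2))
})

# Number of groups in which each PMD appears
freq <- table(unlist(pmd_by_group, use.names = FALSE))

# Keep the PMDs seen in at least ng groups
ng <- 2
high_freq_pmd <- as.numeric(names(freq)[freq >= ng])
high_freq_pmd
```

In this toy case the distance 3.97 occurs in one group only and is dropped, while 18.01 and 21.98 each occur in two groups and are kept.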

You could also show the distribution of PMD relationship by index:

# show each unique PMD found by the getpaired function
diffs <- unique(round(pmd$paired$diff, 2))
for (d in diffs) {
        index <- round(pmd$paired$diff, 2) == d
        plotpaired(pmd, index)
}

This is an easy way to find potential adducts in the data from the high frequency PMDs of the same compound. For example, 21.98 Da could be the mass distance between $[M+H]^+$ and $[M+Na]^+$. In this way, users can find potential adducts or neutral losses even when they have no preferred adduct list. If an adduct exists in a certain analytical system, the high frequency PMDs will reveal the relationship. The high frequency PMD list can also be used to check the fragmentation patterns of in-source reactions, as long as such patterns are common among all collected ions.
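The 21.98 Da example can be checked with a few lines of arithmetic on monoisotopic masses. The interpretation of 17.03 and 18.01 as an ammonia and a water difference is my own reading of the output above, added as a plausible illustration rather than something the package reports.

```r
# Monoisotopic masses in Da
mass <- c(H = 1.007825, N = 14.003074, O = 15.994915, Na = 22.989770)

pmd_candidates <- c(
  "Na vs H adduct" = mass[["Na"]] - mass[["H"]],       # [M+Na]+ minus [M+H]+
  "NH3"            = mass[["N"]] + 3 * mass[["H"]],    # e.g. [M+NH4]+ minus [M+H]+
  "H2O"            = 2 * mass[["H"]] + mass[["O"]]     # water loss
)
round(pmd_candidates, 2)
```

The three rounded values are 21.98, 17.03 and 18.01, which match three of the high frequency PMDs reported by getpaired for this data set.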

STEP3: Screen the independent peaks

You can use the getstd function to get the independent peaks. Independent peaks are the peak list after removing redundant peaks such as the adducts, neutral losses, isotopologues and common fragment ions found by PMD analysis in STEP2. Ideally, those peaks are molecular ions, although they might still contain redundant peaks.

std <- getstd(pmd)
#> 8 group(s) have single peaks 14 23 32 33 54 55 56 75
#> 11 group(s) with multiple peaks while no isotope/paired relationship 4 5 7 8 11 ... 42 49 68 72 73
#> 9 group(s) with isotope without paired relationship 2 9 22 26 52 62 64 66 70
#> 4 group(s) with paired without isotope relationship 1 10 15 18
#> 43 group(s) with both paired and isotope relationship 3 6 12 13 16 ... 65 67 69 71 74
#> 292 standard masses identified.

You can plot the peaks with the plotstd function to show the distribution of independent peaks:

plotstd(std)

You can also plot the peak distribution of a given retention time group via plotstdrt:

par(mfrow = c(2,3))
plotstdrt(std,rtcluster = 23,main = 'Retention time group 23')
plotstdrt(std,rtcluster = 9,main = 'Retention time group 9')
plotstdrt(std,rtcluster = 18,main = 'Retention time group 18')
plotstdrt(std,rtcluster = 67,main = 'Retention time group 67')
plotstdrt(std,rtcluster = 49,main = 'Retention time group 49')
plotstdrt(std,rtcluster = 6,main = 'Retention time group 6')

Extra filter with correlation coefficient cutoff

The original GlobalStd algorithm only uses the mass-to-charge ratio and retention time of peaks to select independent peaks. However, if intensity data across samples are available, the correlation coefficient of paired ions can be used to further filter random noise out of the high frequency PMDs. You can set a cutoff on the Pearson correlation coefficient between peaks to refine the peaks selected by GlobalStd within the same retention time groups. In this case, the number of selected independent peaks will be further reduced. When you use this parameter, make sure the intensity data come from real samples instead of blank samples, since blank samples will distort the calculation of the correlation coefficient.

std2 <- globalstd(pmd,corcutoff = 0.9)
#> 75 retention time clusters found.
#> Using ng = 15
#> 2 unique PMDs retained.
#> The unique within RT clusters high frequency PMD(s) is(are)  21.98 17.03.
#> 242 isotope peaks found.
#> 63 multiple charged isotope peaks found.
#> 150 multiple charged peaks found.
#> 120 paired peaks found.
#> 8 group(s) have single peaks 14 23 32 33 54 55 56 75
#> 23 group(s) with multiple peaks while no isotope/paired relationship 2 4 5 7 8 ... 68 69 70 72 73
#> 17 group(s) with isotope without paired relationship 9 12 20 22 24 ... 60 64 66 67 71
#> 3 group(s) with paired without isotope relationship 1 53 74
#> 24 group(s) with both paired and isotope relationship 3 6 13 16 17 ... 48 58 61 63 65
#> 209 standard masses identified.
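The effect of a correlation cutoff on a single ion pair can be sketched in base R. The intensity vectors and variable names are made up for illustration; the sketch only shows the Pearson test itself, not how globalstd applies it internally.

```r
# Intensities of three peaks across six samples (made-up values)
peak_a <- c(1, 2, 3, 4, 5, 6)
peak_b <- c(2.1, 3.9, 6.2, 8.1, 9.8, 12.0)   # tracks peak_a closely
peak_c <- c(5, 1, 4, 2, 6, 3)                # unrelated to peak_a

corcutoff <- 0.9
# A pair is kept only when its Pearson correlation exceeds the cutoff
keep_ab <- cor(peak_a, peak_b) > corcutoff
keep_ac <- cor(peak_a, peak_c) > corcutoff
c(a_b = keep_ab, a_c = keep_ac)
```

Here the pair a/b passes the cutoff while a/c does not, so a PMD found only between a and c would be treated as noise.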

Validation by principal components analysis (PCA)

You need to check the GlobalStd algorithm’s results by principal components analysis (PCA). If we removed too many informative peaks, the score plot of the reduced data set would change greatly.

library(enviGCMS)
par(mfrow = c(2,2),mar = c(4,4,2,1)+0.1)
plotpca(std$data,lv = as.numeric(as.factor(std$group$sample_group)),main = "all peaks")
plotpca(std$data[std$stdmassindex,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(std$stdmassindex),"independent peaks"))
plotpca(std2$data[std2$stdmassindex,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(std2$stdmassindex),"reduced independent peaks"))

You might find that the original GlobalStd algorithm shows a PCA score plot similar to that of the original data, while the GlobalStd algorithm considering intensity data seems to change the profile. The major reason is that the correlation coefficient option removes paired ions without strong correlation. It is aggressive in removing low intensity peaks, which are vulnerable to baseline noise. However, this option is helpful if you are only concerned with high quality peaks for the following analysis. Otherwise, the original GlobalStd keeps the most information for discovery purposes.

Comparison with other pseudo spectra extraction methods

The GlobalStd algorithm in the pmd package can be treated as a method to extract pseudo spectra. You can use getpseudospectrum to get peak group information for all GlobalStd peaks. This function merges GlobalStd peaks when a certain peak is involved in multiple clusters. You can then choose to export the peaks with the highest intensities, or the base peaks, in each merged GlobalStd peak group. You can also include the correlation coefficient cutoff to further improve the data quality.

stdcluster <- getpseudospectrum(std)
#> 75 retention time clusters found.
#> Using ng = 15
#> 5 unique PMDs retained.
#> The unique within RT clusters high frequency PMD(s) is(are)  28.03 21.98 44.03 17.03 18.01.
#> 409 isotope peaks found.
#> 109 multiple charged isotope peaks found.
#> 251 multiple charged peaks found.
#> 346 paired peaks found.
#> 245 pseudo spectrum found.
#> 0.84 peaks covered by PMD relationship.
# extract the first pseudospectra for retention time cluster 37
idx <- stdcluster$pseudo$sid=='37@1'
plot(stdcluster$pseudo$mz[idx],stdcluster$pseudo$ins[idx],type = 'h',xlab = 'm/z',ylab = 'intensity',main = 'pseudo spectra for retention time cluster 37')

# considering the correlation coefficient cutoff
stdcluster2 <- getpseudospectrum(std, corcutoff = 0.9)
#> 75 retention time clusters found.
#> Using ng = 15
#> 2 unique PMDs retained.
#> The unique within RT clusters high frequency PMD(s) is(are)  21.98 17.03.
#> 242 isotope peaks found.
#> 63 multiple charged isotope peaks found.
#> 150 multiple charged peaks found.
#> 120 paired peaks found.
#> 211 pseudo spectrum found.
#> 0.52 peaks covered by PMD relationship.

We also supply getcorpseudospectrum to find peak groups by correlation analysis only. The base peaks of the correlation clusters are selected to stand for the compounds.

corcluster <- getcorpseudospectrum(spmeinvivo)
#> 75 retention time cluster found.
#> 141 pseudo spectrum found.
#> 0.72 peaks covered by PMD relationship.
# extract pseudospectra 37@1
peak <- corcluster$pseudo[corcluster$pseudo$sid == '37@1',]
plot(peak$ins~peak$mz,type = 'h',xlab = 'm/z',ylab = 'intensity',main = 'pseudo spectra for correlation cluster')

Then we can compare the reduced results using the PCA similarity factor. A good peak selection algorithm shows a high PCA similarity factor compared with the original data set while retaining a minimal number of peaks.

par(mfrow = c(2,3),mar = c(4,4,2,1)+0.1)
plotpca(std$data[std$stdmassindex,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(std$stdmassindex),"independent peaks"))
plotpca(std$data[stdcluster$stdmassindex2,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(stdcluster$stdmassindex2),"independent base peaks"))
plotpca(std$data[stdcluster2$stdmassindex2,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(stdcluster2$stdmassindex2),"independent reduced base peaks"))
plotpca(std$data[corcluster$stdmassindex,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(corcluster$stdmassindex),"peaks without correlationship"))
plotpca(std$data[corcluster$stdmassindex2,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(corcluster$stdmassindex2),"base peaks without correlationship"))
plotpca(std$data,lv = as.numeric(as.factor(std$group$sample_group)),main = paste(nrow(std$data),"all peaks"))

pcasf(std$data, std$data[std$stdmassindex,])
#>     pcasf 
#> 0.9833183
pcasf(std$data, std$data[stdcluster$stdmassindex2,])
#>     pcasf 
#> 0.9550097
pcasf(std$data, std$data[stdcluster2$stdmassindex2,])
#>     pcasf 
#> 0.9751328
pcasf(std$data, std$data[corcluster$stdmassindex,])
#>     pcasf 
#> 0.7021908
pcasf(std$data, std$data[corcluster$stdmassindex2,])
#>     pcasf 
#> 0.9852427

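In this case, four of the five peak selections represent the original peaks well, with PCA similarity factors larger than 0.9. The exception is the full set of peaks selected by correlation analysis only (corcluster$stdmassindex), with a factor of about 0.70.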

Structure/Reaction directed analysis

The getsda function performs structure/reaction directed analysis. The frequency cutoff is automatically determined by PMD network analysis as the one giving the largest mean distance of all nodes.

sda <- getsda(std)
#> PMD frequency cutoff is 9 by PMD network analysis with largest network average distance 7.26 .
#> 8 groups were found as high frequency PMD group.
#> 0 was found as high frequency PMD. 
#> 2.02 was found as high frequency PMD. 
#> 12 was found as high frequency PMD. 
#> 28.03 was found as high frequency PMD. 
#> 30.05 was found as high frequency PMD. 
#> 42.05 was found as high frequency PMD. 
#> 58.04 was found as high frequency PMD. 
#> 116.08 was found as high frequency PMD.

This largest mean distance of all nodes is calculated for the top 1 to 100 (if possible) high frequency PMDs. Here is a demo of the network generation process.

library(igraph)
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union
cdf <- sda$sda
# get the PMDs and their frequencies, most frequent first
freq <- sort(table(cdf$diff2), decreasing = TRUE)
pmds <- as.numeric(names(freq))
# keep the PMDs with frequency larger than 10 for the demo
pmds <- pmds[freq > 10]
cdf <- sda$sda[sda$sda$diff2 %in% pmds, ]
g <- igraph::graph_from_data_frame(cdf, directed = FALSE)
l <- igraph::layout_with_fr(g)
# remove the edges of the i most frequent PMDs and plot what remains
for (i in seq_along(pmds)) {
  g2 <- igraph::delete_edges(g, which(igraph::E(g)$diff2 %in% pmds[1:i]))
  plot(g2, edge.width = 1, vertex.label = "", vertex.size = 1, layout = l,
       main = paste('Top', length(pmds) - i, 'high frequency PMDs'))
}

Here we can see that more and more compounds are connected as more high frequency PMDs are included. Meanwhile, the mean distance of all network nodes increases. However, some PMDs are generated by random combinations of ions. If we include those PMDs in the network, the mean distance of all network nodes decreases. The largest mean distance therefore means that no more information can be found for the data set, and the corresponding frequency is used as the cutoff for high frequency PMD selection.
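The cutoff selection described above can be sketched on a toy network in base R. Everything here is an assumption for illustration: the edge list is made up, and mean_distance_simple is a hypothetical helper that computes the mean shortest path length with a Floyd-Warshall loop (in practice igraph::mean_distance would be the natural choice). It is not the code used by getsda.

```r
# Toy PMD edge list: each row links two peaks by a rounded PMD (made-up values)
edges <- data.frame(
  from  = c("a", "b", "c", "a", "d", "e"),
  to    = c("b", "c", "d", "e", "f", "f"),
  diff2 = c(14.02, 14.02, 14.02, 2.02, 2.02, 42.06),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean shortest path length over connected node pairs
mean_distance_simple <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  n <- length(nodes)
  d <- matrix(Inf, n, n, dimnames = list(nodes, nodes))
  diag(d) <- 0
  d[cbind(edges$from, edges$to)] <- 1
  d[cbind(edges$to, edges$from)] <- 1
  # Floyd-Warshall: allow each node in turn as an intermediate stop
  for (k in seq_len(n)) {
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  reachable <- d[upper.tri(d)]
  mean(reachable[is.finite(reachable)])
}

# PMDs ordered by frequency, most frequent first
freq <- sort(table(edges$diff2), decreasing = TRUE)
pmds <- as.numeric(names(freq))

# Mean distance of the network built from the top k PMDs
md <- sapply(seq_along(pmds), function(k) {
  mean_distance_simple(edges[edges$diff2 %in% pmds[seq_len(k)], ])
})
best_k <- which.max(md)
pmds[seq_len(best_k)]
```

In this toy case the mean distance grows when the second PMD extends the chain, then drops when the third PMD closes a loop, so only the top two PMDs are retained.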

You could use plotstdsda to show the distribution of the selected paired peaks.

plotstdsda(sda)

You could also use index to show the distribution of certain PMDs.

par(mfrow = c(1,3),mar = c(4,4,2,1)+0.1)
plotstdsda(sda,sda$sda$diff2 == 2.02)
plotstdsda(sda,sda$sda$diff2 == 28.03)
plotstdsda(sda,sda$sda$diff2 == 58.04)

Structure/reaction directed analysis can also be performed directly on all the peaks, although this is slow:

sdaall <- getsda(spmeinvivo)
#> PMD frequency cutoff is 104 by PMD network analysis with largest network average distance 14.06 .
#> 6 groups were found as high frequency PMD group.
#> 0 was found as high frequency PMD. 
#> 2.02 was found as high frequency PMD. 
#> 28.03 was found as high frequency PMD. 
#> 31.01 was found as high frequency PMD. 
#> 58.04 was found as high frequency PMD. 
#> 116.08 was found as high frequency PMD.
par(mfrow = c(1,3),mar = c(4,4,2,1)+0.1)
plotstdsda(sdaall,sdaall$sda$diff2 == 2.02)
plotstdsda(sdaall,sdaall$sda$diff2 == 28.03)
plotstdsda(sdaall,sdaall$sda$diff2 == 58.04)

Extra filter with correlation coefficient cutoff

Structure/reaction directed analysis can also use correlation to restrict the paired ions. However, similar to the GlobalStd algorithm, such a cutoff will remove low intensity data. Researchers should have a clear reason to use this cutoff.

sda2 <- getsda(std, corcutoff = 0.9)
#> PMD frequency cutoff is 9 by PMD network analysis with largest network average distance 7.26 .
#> 7 groups were found as high frequency PMD group.
#> 0 was found as high frequency PMD. 
#> 2.02 was found as high frequency PMD. 
#> 12 was found as high frequency PMD. 
#> 28.03 was found as high frequency PMD. 
#> 30.05 was found as high frequency PMD. 
#> 58.04 was found as high frequency PMD. 
#> 116.08 was found as high frequency PMD.
plotstdsda(sda2)

Structure/reaction directed analysis for peaks/compounds only data

When you only have peak data without retention times, or only a compound list, structure/reaction directed analysis can still be done with the getrda function.

sda <- getrda(spmeinvivo$mz)
#> 164462 pmd found.
#> 20 pmd used.
# check high frequency pmd
colnames(sda)
#>  [1] "0"       "1.001"   "1.002"   "1.003"   "1.004"   "2.015"   "2.016"  
#>  [8] "14.015"  "17.026"  "18.011"  "21.982"  "28.031"  "28.032"  "44.026" 
#> [15] "67.987"  "67.988"  "88.052"  "116.192" "135.974" "135.975"
# get certain pmd related m/z
idx <- sda[,'2.016']
# show the m/z
spmeinvivo$mz[idx]
#>  [1] 118.0651 118.0652 120.0812 159.1575 162.0552 170.0330 170.0932 170.1541
#>  [9] 174.1363 174.9917 175.0873 176.0305 176.0418 181.9872 184.1695 188.6484
#> [17] 192.1487 192.1604 226.9522 226.9523 228.1969 228.1973 259.1148 261.1317
#> [25] 270.3185 271.3217 272.3230 272.3234 273.8902 274.8744 284.2955 285.3002
#> [33] 285.3002 286.3101 286.3101 291.0712 293.1755 294.9392 296.2961 304.3081
#> [41] 305.2480 305.3118 308.0889 308.2953 308.2954 309.1672 309.2046 315.1781
#> [49] 317.9344 319.3005 319.3002 319.9302 320.3041 320.3322 321.3165 322.3185
#> [57] 323.3221 324.3266 325.3294 327.2022 327.3449 329.0052 331.0031 350.3426
#> [65] 352.3214 352.3215 353.3244 354.3365 355.0696 359.2410 361.2353 372.3197
#> [73] 375.3066 383.2804 383.3723 384.3350 385.2753 385.3480 387.2851 397.1907
#> [81] 399.3274 400.9174 401.3420 403.2859 432.8860 433.2781 445.8289 447.1173
#> [89] 451.3633 462.8615 522.3557 524.1178 525.9831 526.4841 705.7223 708.8218
#> [97] 976.3139 976.8122 982.7763
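The idea behind a mass-only analysis can be sketched in base R: compute all pairwise mass distances, round them, and look for the most frequent ones. The toy masses and the tolerance are assumptions for illustration, and this is a simplified stand-in rather than the getrda implementation.

```r
# Toy m/z values; three pairs differ by 2.016 (made-up values)
mz <- c(100.000, 102.016, 104.032, 150.000, 152.016, 300.000)

# All pairwise distances, rounded to 3 digits, ranked by frequency
pmd_all <- round(as.numeric(dist(mz)), 3)
freq <- sort(table(pmd_all), decreasing = TRUE)
top_pmd <- as.numeric(names(freq)[1])

# Flag the peaks that take part in at least one pair with the top PMD
d <- abs(outer(mz, mz, "-"))
involved <- apply(abs(d - top_pmd) < 1e-6, 1, any)
mz[involved]
```

The most frequent distance here is 2.016, and every peak except 300.000 takes part in at least one such pair.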

Wrapper function for GlobalStd algorithm

The globalstd function is a wrapper that runs the GlobalStd algorithm and structure/reaction directed analysis in one line. All the plot functions can be used directly on the list objects returned by globalstd. If you want to perform structure/reaction directed analysis, set sda = TRUE in the globalstd function.

result <- globalstd(spmeinvivo, sda=FALSE)
#> 75 retention time clusters found.
#> Using ng = 15
#> 5 unique PMDs retained.
#> The unique within RT clusters high frequency PMD(s) is(are)  28.03 21.98 44.03 17.03 18.01.
#> 409 isotope peaks found.
#> 109 multiple charged isotope peaks found.
#> 251 multiple charged peaks found.
#> 346 paired peaks found.
#> 8 group(s) have single peaks 14 23 32 33 54 55 56 75
#> 11 group(s) with multiple peaks while no isotope/paired relationship 4 5 7 8 11 ... 42 49 68 72 73
#> 9 group(s) with isotope without paired relationship 2 9 22 26 52 62 64 66 70
#> 4 group(s) with paired without isotope relationship 1 10 15 18
#> 43 group(s) with both paired and isotope relationship 3 6 12 13 16 ... 65 67 69 71 74
#> 292 standard masses identified.

Use independent peaks for MS/MS validation (PMDDA)

Independent peaks are supposed to be generated from different compounds. We can use those peaks for targeted MS/MS analysis instead of DIA or DDA. Multiple injections of one sample are needed, since it might be impossible to acquire fragment ions for all ions in one injection with good sensitivity. You can use gettarget to generate the injection index and output the peaks for each run.

# you need retention time for independent peaks
index <- gettarget(std$rt[std$stdmassindex])
#> You need 11 injections!
# output the ions for each injection
table(index)
#> index
#>  1  2  3  4  5  6  7  8  9 10 11 
#> 39 15 15 33 32 32 26 22 22 21 35
# show the ions for the first injection
# index follows the independent peaks, so subset them before indexing
std$mz[std$stdmassindex][index==1]
std$rt[std$stdmassindex][index==1]
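One way to picture why several injections are needed is a greedy schedule in which two precursor ions share an injection only if their retention times are far enough apart. The sketch below is my own illustration in base R: assign_injections and the gap parameter are hypothetical names, and this is not the algorithm implemented in gettarget.

```r
# Hypothetical helper: assign each peak to the first injection in which
# the previously scheduled peak elutes at least `gap` earlier
assign_injections <- function(rt, gap = 10) {
  last <- numeric(0)               # latest RT scheduled in each injection
  inj <- integer(length(rt))
  for (i in order(rt)) {
    ok <- which(rt[i] - last >= gap)
    if (length(ok) == 0) {
      last <- c(last, rt[i])       # open a new injection
      inj[i] <- length(last)
    } else {
      inj[i] <- ok[1]              # reuse the first compatible injection
      last[ok[1]] <- rt[i]
    }
  }
  inj
}

rt <- c(100, 102, 104, 115, 130, 131)   # made-up retention times
injection <- assign_injections(rt, gap = 10)
table(injection)
```

Three peaks elute within 4 time units of each other in this toy example, so at least three injections are required regardless of how the remaining peaks are placed.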

Shiny application

An interactive document is included in this package to perform PMD analysis. You need to prepare a csv file with the m/z and retention time of peaks. Such a csv file can be generated by running enviGCMS::getcsv() on the list object from enviGCMS::getmzrt(xset). The xset should be an XCMSnExp or xcmsSet object. You can also generate the csv file with enviGCMS::getmzrt(xset,name = 'test'). You will find the csv file, named test.csv, in the working directory.

Then you can run runPMD() to start the graphical user interface (GUI) for the GlobalStd algorithm and structure/reaction directed analysis.

Conclusion

The pmd package can be used to reduce redundant peaks in GC/LC-MS based research and to perform structure/reaction directed analysis to screen known and unknown important compounds or reactions.

Miao Yu

2025-05-06
