Data analysis of GC-MS and LC-MS in Environmental Science
Miao Yu
2025-01-14
Source:vignettes/GCMSDA.Rmd
GCMSDA.RmdQualitative and quantitative analysis of contaminants are the core of
the Environmental Science. GC/LC-MS might be one of the most popular
instruments for such analytical methods. Previous works such as
xcms were developed for GC-MS data. However, such packages
have limited functions for environmental analysis. In this package. I
added functions for various GC/LC-MS data analysis purposes used in
environmental analysis. Such feature could not only reveal certain
problems, but also help the user find out the unknown patterns in the
dataset of GC/LC-MS.
Data analysis for single sample
Data structure of GC/LC-MS full scan mode
GC/LC is used for separation and MS is used for detection in a
GC/LC-MS system. The collected data are intensities of certain mass at
different retention time. When we perform analysis on certain column in
full scan mode, the counts of different mass were collected in each
scan. The dwell time for each scan might only last for 500ms or less.
Then the next scan begins with a different retention time. Here we could
use a matrix to stand for those data. Each column stands for each mass
and row stands for the retention time of that scan. Such matrix could be
treated as time series data. In this package, we treat such data as
matrix type.
For high-resolution MS, building such matrix is tricky. We might need
to bin the RAW data to make alignment for different scans into a matrix.
Such works could be done by xcms.
Data structure of GC-MS SIM mode
When you perform a selected ions monitor(SIM) mode analysis, only few
mass data were collected and each mass would have counts and retention
time as a time series data. In this package, we treat such data as
data.frame type.
Data input
Since version 0.7.4, those functions have been removed from the package.
You could use getmd to import the mass spectrum data as
supported by xcms and get the profile of GC-MS data matrix.
mzstep is the bin step for mass:
data <- enviGCMS:::getmd('data/data1.CDF', mzstep = 0.1)
You could also subset the data by the mass(m/z 100-1000) or retention
time range(40-100s) in getmd function:
data <- enviGCMS:::getmd(data,mzrange=c(100,1000),rtrange=c(40,100))
You could also combined the mass full-scan data with the same range
of retention time by cbmd:
data <- cbmd(data1,data2,data3)
Visualization of mass spectrum data
You could plot the Total Ion Chromatogram(TIC) for certain RT and mass range.
plottic(data,rt=c(3.1,25),ms=c(100,1000))
You could use plotms or plotmz to show the
heatmap or scatter plot for LC/GC-MS data, which is very useful for
exploratory data analysis.
plotms(data)
plotmz(data)
You could change the retention time into the temperature if it is a constant speed of temperature rising process. But you need show the temperature range.
plott(data,temp = c(100,320))
Data analysis for influence from GC-MS
enviGCMS supplied many functions for decreasing the
noise during the analysis process. findline could be used
for find line of the boundary regression model for noise.
comparems could be used to make a point-to-point data
subtraction of two full-scan mass spectrum data. plotgroup
could be used convert the data matrix into a 0-1 heatmap according to
threshold. plotsub could be used to show the self
background subtraction of full-scan data. plotsms shows the
RSD of the intensity of full scan data. plothist could be
used to find the data distribution of the histogram of the intensities
of full scan data.
Data analysis for molecular isotope ratio
Some functions could be used to calculate the molecular isotope
ratio. EIC data could be import into GetIntergration and
return the information of found peaks. Getisotoplogues
could be used to calculate the molecular isotope ratio of certain
molecular. Some shortcut function such as batch and
qbatch could be used to calculate molecular isotope ratio
for multiple and single molecular in EIC data.
Quantitative analysis for short-chain chlorinated paraffins(SCCPs)
enviGCMS supply function to perform Quantitative
analysis for short-chain chlorinated paraffins(SCCPs) with Q-tof data.
Use getsccp to make Quantitative analysis for SCCPs.
If you want a graphical user interface for SCCPs analysis, a shiny
application is developed in this package. You could use
runsccp() to power on the application in a browser.
Data analysis for multiple samples
In environmental non-target analysis, when multiple samples are
collected, problem will raise from the heterogeneity among samples. For
example, retention time would shift due to the column. In those cases,
xcms package could be used to get a peaks list across
samples within certain retention time and m/z. enviGCMS
package has some wrapped function to get the peaks list. Besides, some
specific functions such as group comparison, batch correction and
visualization are also included.
Wrap function for xcms package
Since version 0.7.4, those functions have been removed from the package.
getdatacould be used to get thexcmsSetobject in one step with optimized methodsgetdata2could be used to get theXCMSnExpobject in one step with optimized methodsgetmzrtcould get a list asmzrtobject with peaks list, mz, retention time and class of samples fromxcmsSet/XCMSnExpobject. You could also save relatedxcmsSetandxcmsEICobject for further analysis. It also support to output the file for metaboanalyst, xMSannotator, Mummichog pathway analysis and paired mass distance(PMD) analysis.getmzrtcsvcould read in the csv files and return a list for peaks list asmzrtobject
Data imputation and filtering
getimputationcould impute NA in the peaks list.getfiltercould filter the data based on row and column index.getdoecould filter the data based on rsd and intensity and generate group mean, standard deviation, and group rsd.getpowercould compute the power for known peaks list or the sample numbers for each peakgetoverlappeakcould get the overlap peaks by mass and retention time range for two different list objects
Visualization of peaks list data
plotmrcould plot the scatter plot for peaks list with thresholdplotmrccould plot the differences as scatter plot for peaks list with threshold between two group of dataplotrsdcould plot the rsd influences of data in different groupsgifmrcould plot scatter plot for peaks list and output gif file for multiple groupsplotpcacould plot pca score plotplothmcould plot heatmapplotdencould plot density of multiple samplesplotrlacould plot Relative Log Abundance (RLA) plotsplotridgescould plot Relative Log Abundance Ridge (RLAR) plots