<!-- README.md is generated from README.Rmd. Please edit that file -->
# provoc <a href='https://github.com/JHuguenin/provoc'><img src="https://raw.githubusercontent.com/JHuguenin/provoc/master/inst/img/imgfile.png" align="right" height="138"/></a>
<!-- badges: start -->
[DOI](https://zenodo.org/badge/latestdoi/425788381)
[GitHub Actions](https://github.com/JHuguenin/provoc/actions)
[Lifecycle: experimental](https://www.tidyverse.org/lifecycle/#experimental)
[CRAN status](https://cran.r-project.org/package=provoc)
<!-- [](https://codecov.io/gh/mitchelloharawild/icon?branch=master) -->
<!-- badges: end -->
**P**erform a **R**apid **O**verview for **V**olatile **O**rganic
**C**ompounds.
The `provoc` package has been developed to support PTR-ToF-MS users in
their analyses. It has been designed for a quick import of data into R
and visualization of the first results in a few minutes. It
automatically detects peaks and provides a matrix for further analysis.
It also offers a few chemometrics functions.
It is still a young and wild package that will appreciate feedback and
new ideas for its development. Do not hesitate to contact the author to
get or provide help.
To cite this package :
- Joris Huguenin, UMR 5175 CEFE, CNRS, University of Montpellier.
provoc: analyze data of VOC by PTR-ToF-MS Vocus. DOI :
10.5281/zenodo.6642830.
# Installation
The **development** version can be installed from GitHub using:
``` r
# install.packages("remotes")
remotes::install_github("jhuguenin/provoc")
```
The package requires recent versions of several dependencies:
- `baseline`(>= 1.3.0)
- `dygraphs`(>= 1.1.0)
- `graphics`(>= 4.0.0)
- `grDevices`(>= 4.0.0)
- `Iso`(>= 0.0-18.1)
- `magrittr`(>= 2.0.0)
- `MALDIquant`(>= 1.19.0)
- `nnls`(>= 1.4)
- `rhdf5`(>= 2.34.0)
- `rmarkdown`(>= 2.11.0)
- `scales`(>= 1.1.0)
- `stats`(>= 4.0.0)
- `stringr`(>= 1.4.0)
- `viridis`(>= 0.6.0)
- `usethis`(>= 2.1.0)
- `utils`(>= 4.0.0)
- `xts`(>= 0.12.0)
# Description of functions
## Main functions
- **import.h5** imports your data. Easy.
- **import.meta** customizes your analysis plan.
- **dy.spectra** & **fx.spectra** display your spectra as dynamic or
fixed plots.
- **kinetic.plot** plots the kinetics of your VOCs.
- **mcr.voc** performs an MCR analysis on your data.
## Secondary functions
### Manage your data
- If you can’t import your files, inspect them with **info.h5**, then
delete any corrupted spectra with **delete.spectra.h5**.
- Check your meta data with **meta.ctrl** and the peak alignment with
**peak.ctrl**.
- Manage the time of your data with **re.calc.T.para**, or revert to
the original data with **re.init.T.para**.
- Create a new “meta\_empty.csv” file with **empty.meta**.
### Find index or position
- **ind.acq** returns the indices of the spectra of each acquisition.
- **ind.pk** returns the indices of the peaks.
- **det.c** returns the index of the value closest to a given decimal
number in a numeric vector.
- **M.Z** finds all peaks matching a mass number.
- **M.Z.max** finds the highest peak matching a mass number.
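To make the idea of a "closest index" concrete, here is a minimal base R sketch. It is not the code of `det.c`: the helper `closest_index` and the vector `mz_axis` are hypothetical names, and the mass values are only illustrative.

``` r
# Hypothetical helper: index of the element of `x` closest to `value`.
closest_index <- function(value, x) {
  which.min(abs(x - value))
}

# Hypothetical mass axis, for illustration only.
mz_axis <- c(59.049, 69.055, 137.132, 157.021, 205.158)

idx <- closest_index(137.1, mz_axis)
idx          # position of the closest value in mz_axis
mz_axis[idx] # the value found at that position
```

The same pattern works for any sorted or unsorted numeric vector, because `which.min()` simply returns the first position with the smallest absolute difference.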
# Usage
``` r
library(provoc)
```
## Importation
Before importing, all h5 files must be placed in a directory named `h5`.
This directory must be placed in the working directory. The name of the
h5 files placed in the directory may contain the date and time of
recording in the form \_yyyymmdd\_hhmmss. This information will be
removed during import. e.g. `00_file_PTR_ToF_MS_20210901_093055.h5` will
be renamed `00_file_PTR_ToF_MS` by the import.
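If you want to preview the names your files will get, the renaming rule can be sketched in base R. This is only an illustration of the rule described above, not the code used by `import.h5()`, and `strip_timestamp` is a hypothetical helper name.

``` r
# Hypothetical helper: drop the ".h5" extension, then a trailing
# "_yyyymmdd_hhmmss" stamp if there is one.
strip_timestamp <- function(file_name) {
  no_ext <- sub("\\.h5$", "", file_name)
  sub("_[0-9]{8}_[0-9]{6}$", "", no_ext)
}

strip_timestamp("00_file_PTR_ToF_MS_20210901_093055.h5")
strip_timestamp("01_file_PTR_ToF_MS.h5") # no stamp: only the extension is dropped
```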
Each file is an `acquisition` with several `spectra`.
``` r
# working directory
wd <- "~/R/data_test/miscalenous" # without final "/"
setwd(wd) # If you don't work in a project.
# + wd/
# | \- h5/
# | \- 00_file_PTR_ToF_MS.h5
# | \- 01_file_PTR_ToF_MS.h5
# | \- 02_file_PTR_ToF_MS.h5
# import
sp <- import.h5(wd) # or just sp <- import.h5() if you work in a project.
```
The `import.h5()` function automatically creates a directory named
“Figures”, a csv file for the meta data named “meta\_empty.csv”, and a
list `sp`. This list contains :
- `MS` : a large matrix with all the data.
- `peaks` : a smaller matrix with the peak intensities.
- `xMS` : the vector of the MS abscissa.
- `names` : the folder names used in this meta set.
- `wd` : the working directory.
- `acq` : the IDs used in this meta set.
- `nbr_sp` : the number of spectra for each acquisition.
- `names_acq` : the names of the spectra.
- `Tinit` : the original date (and time) of the spectra.
- `Trecalc` : the recalculated time of the spectra (used to chain
several acquisitions).
- `workflow` : the list of each operation in the project.
- `mt` : the R meta data.
- `meta` : the Vocus meta data.
You can control part of the analysis with the “meta\_empty.csv” file. It
is a table with one row per imported acquisition.
The columns are :
- `names` : name of the acquisition.
- `ID` : identifier number.
- `nbr_MS` : number of spectra in this acquisition.
- `start` : the index where this acquisition begins.
- `end` : the index where this acquisition ends.
- `used` : `TRUE`/`FALSE`, selects the acquisitions used in the
analysis.
- `blank (ID)` : (not available) the blank to subtract.
- `color` : specifies a color.
- `concentration` : (not available) specifies a concentration.
- `unit` : (not available) the unit of the concentration.
- `acq_T0 (ID)` : the T0 ID of the sequence.
- `delta_T (s)` : a time shift, in seconds, to adjust the sequence.
- `grp1`, `grp2`, `...` : other free columns for your analyses. The
column names can be changed.
You can prepare several meta files. By switching acquisitions on or off
in each file, you can run different analyses from a single import (which
is often long). You can rename the files meta\_1, meta\_2, or give them
more explicit names.
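To see how the `used`, `start` and `end` columns work together, here is a small base R sketch on a toy table. It does not read a real meta file and is not the code of `import.meta()`: the values are made up, and I assume that `start` and `end` are spectrum indices counted over the whole import.

``` r
# Toy meta table mimicking a few columns of "meta_empty.csv" (made-up values).
meta <- data.frame(
  names  = c("sample_A1", "sample_A2", "sample_B1", "blank"),
  ID     = 1:4,
  nbr_MS = c(12, 12, 12, 12),
  start  = c(1, 13, 25, 37),
  end    = c(12, 24, 36, 48),
  used   = c(TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only the acquisitions flagged as used.
kept <- meta[meta$used, ]

# Spectrum indices covered by the kept acquisitions.
sel <- unlist(Map(seq, kept$start, kept$end))
length(sel) # 3 acquisitions of 12 spectra each
```

Setting `used` to `FALSE` for one row removes its whole index range, which is why a second meta file is enough to get a different analysis without a new import.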
If the import is stopped because of a corrupted file, use `info.h5()`
and `delete.spectra.h5()` to correct this file.
If your h5 files have a different structure, let me know and I can add
an option for you.
### Optimize the importation
If you want to improve the data import, there are three options to
consider :
- First, a baseline correction step can be added with
`baseline_correction = TRUE`. When the instrument receives a large
number of molecules with the same mass, the detector fluctuates
slightly, which leaves a visible trace on the spectrogram. This
baseline correction is time-consuming but very effective at
improving the analysis.
- Then, the parameters for detecting and aligning the peaks can be
adjusted to the experiment. The package provides some predefined
parameters (see `pk_param`), but you can modify them (mainly the
peak width at half height, “halfWindowSize”, and the signal-to-noise
ratio, “SNR”). The quality of these parameters can be checked with
the `ctrl_peak` argument.
- Finally, the argument `skip = X` skips the first X spectra of each
file at import. Typically, in a sequence with several channels, the
first spectra are always polluted by the previous channel. Not
importing them saves RAM and computation time. It also makes it
possible to import longer sequences in one go.
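The effect of `skip` on the spectrum indices can be sketched in base R. This is only an illustration of the idea, not how `import.h5()` is implemented; `keep_after_skip` is a hypothetical helper and the numbers of spectra are made up.

``` r
# Hypothetical helper: indices kept when the first `skip` spectra of each
# acquisition are dropped. `nbr_sp` is the number of spectra per acquisition.
keep_after_skip <- function(nbr_sp, skip) {
  first <- cumsum(c(0, head(nbr_sp, -1))) + 1 # first spectrum of each acquisition
  last  <- cumsum(nbr_sp)                     # last spectrum of each acquisition
  keep_one <- function(f, l) {
    if (f + skip <= l) seq(f + skip, l) else integer(0)
  }
  unlist(Map(keep_one, first, last))
}

# Three acquisitions of 12 spectra, skipping the first 2 of each.
kept <- keep_after_skip(nbr_sp = c(12, 12, 12), skip = 2)
length(kept) # 10 spectra kept per acquisition
```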
## Preparation
After the import, a csv file is created in the working directory. This
file should look like this:
<img src="https://raw.githubusercontent.com/JHuguenin/provoc/master/inst/img/meta_empty.PNG" align="center" />
It is designed to be opened and saved easily with Excel on Windows 10.
If your default settings do not allow this, let me know and I can add
an option for you.
Once opened, you can edit the information inside to fill in different
information such as the color or modality of your samples. Here, the
example shows the first three cycles of a sequence with four samples,
two with modality A, one with modality B and one blank. There are 12
spectra per sample.
With the column acq\_T0, I indicate which acquisitions belong to each
sample. This is useful for the “time” option when producing the graph.
<img src="https://raw.githubusercontent.com/JHuguenin/provoc/master/inst/img/meta_1.PNG" align="center" />
``` r
sp <- import.meta("meta_1") # without '.csv'
```
Then, to refine my analysis, I decided to superimpose the T0 of each
sample to make the comparison easier. For this, I subtracted 300 (then
600 and 900) seconds, because each sample is analyzed for 5 minutes. I
also excluded my sample 2 with the `used` column. Finally, I selected
only the last 6 spectra (out of 12) by modifying the `start` column.
<img src="https://raw.githubusercontent.com/JHuguenin/provoc/master/inst/img/meta_2.PNG" align="center" />
``` r
sp <- import.meta("meta_2")
```
To be able to switch from one graphical representation to another
quickly, I created two files meta\_1.csv and meta\_2.csv that I import
according to my needs.
All operations performed during the analysis are recorded, and this
trace is easy to save. Replaying a saved workflow automatically is not
available yet.
``` r
saveRDS(sp$workflow, "workflow.rds")
wf <- readRDS("workflow.rds")
```
After preparing the meta file, recalculate the time with
`re.calc.T.para()` if you need to. `re.init.T.para()` resets the time
to its original values.
``` r
sp <- re.calc.T.para(sp)
sp <- re.init.T.para(sp)
```
Be careful. By default, the “time” option uses a relative T0 from the
first spectrum of each acquisition and the “date” option uses the actual
date and time of each spectrum. Using the acq\_T0 column with the “time”
option allows different acquisitions to be sequenced using the T0 of the
specified acquisition. Using the acq\_T0 column with the “date” option
allows you to overlap acquisitions on the T0 of the specified
acquisition.
The delta\_T column is used to add the specified time (in seconds) to
the acquisition.
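As a worked example of the shift, here is the arithmetic behind the superimposed T0 described earlier, in base R. The vectors `t0` and `delta_T` are hypothetical and only mirror the text: four samples analyzed one after another for 5 minutes each, and negative shifts because the `delta_T` value is added to the time.

``` r
# Start time of each hypothetical sample, in seconds since the first one
# (5 minutes = 300 s per sample).
t0 <- c(0, 300, 600, 900)

# Shifts one could enter in the delta_T column to bring every start back to 0.
delta_T <- c(0, -300, -600, -900)

t0 + delta_T # all samples now start at the same relative time
```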
## Make a plot
With the following three functions, it is really easy to make graphs to
explore your data.
`dy.spectra` and `fx.spectra` allow you to make figures of the spectra,
respectively dynamically and fixed. You have to fill `sel_sp` with a
numerical vector indicating the numbers of the spectra to use (sp$Sacq).
For `fx.spectra`, pkm and pkM are the min and max limits.
``` r
# a dynamic plot :
dy.spectra(sel_sp = sp$mt$meta[sp$acq,"end"], new_color = FALSE)
# a standard plot :
fx.spectra(sel_sp = sp$mt$meta[sp$acq,"end"], pkm = 137, pkM = 137, leg = "l")
fx.spectra(sel_sp = 1, pkm = 59, pkM = 150)
```
`kinetic.plot` plots the evolution of the peaks over time.
- `M_num` : the masses to analyze, e.g. `M.Z(c(69, 205, 157))`,
`M.Z.max(c(69, 205, 157))` or `c(69.055, 205.158, 157.021)`.
- `each_mass` : make one plot per mass or not. Logical TRUE or
FALSE.
- `group` : the name of the meta column that defines the groups, or
FALSE for no grouping. e.g. : “grp1” or FALSE.
- `graph_type` : choose “fx” for a fixed plot (.tiff) or “dy” for a
dynamic plot (.html).
- `Y_exp` : exponential y axis or not. Logical TRUE or FALSE.
- `time_format` : x axis as a time (“time”) or as a date (“date”).
``` r
kinetic.plot(M_num = M.Z.max(c(59, 137)), each_mass = TRUE,
group = "grp1", graph_type = "dy",
Y_exp = FALSE, time_format = "date")
```
## Make a MCR
After performing a univariate analysis, Provoc allows a multivariate
analysis using the MCR algorithm. This technique is detailed in the
article : *Multivariate Curve Resolution (MCR). Solving the mixture
analysis problem*(2014). Anna de Juan, Joaquim Jaumot and Roma Tauler.
<https://doi.org/10.1039/C4AY00571F>
Currently the constraints of the MCR are by default the same as those of
the “alsace” package. They are consistent with an analysis of PTR-ToF-MS
data. The function lets you set the number of components used for the
MCR (`ncMCR`) and specify a variable selection (`pk_sel`). You can also
specify a column from the meta\_xxx.csv file that you wish to use to
group the spectra according to a modality.
Arguments :
- `ncMCR` : (integer) number of components of the MCR.
- `grp` : a character string with the name of the group column.
- `pk_sel` : a vector of selected peaks, or “all”.
- `time_format` : a character string, “date” or “time”.
- `Li` : the list with the spectra (`sp`).
``` r
mcr.result <- mcr.voc(ncMCR = 3, grp = "modality", pk_sel = "all",
time_format = "date", Li = sp)
```
## Other functions
The package includes several small utility functions.
Three of them deserve a clear explanation because they let you find
the index of peaks or spectra. `ind.acq` finds the spectra related to
an acquisition. For example, if you have taken a series of 60
acquisitions, each with 8 spectra, `ind.acq(42)` locates the 8 spectra
of the 42nd acquisition and returns 329 330 331 332 333 334 335 336.
`ind.pk` works in the same way but on the position of the peaks. It
should not be confused with `M.Z` (or `M.Z.max`), which finds the peaks
detected during the import.
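The index arithmetic of this example can be checked with a short base R sketch. It is not the code of `ind.acq`: `acq_indices` is a hypothetical helper and `nbr_sp` simply reproduces the 60 acquisitions of 8 spectra from the example.

``` r
# 60 hypothetical acquisitions with 8 spectra each.
nbr_sp <- rep(8, 60)

# Hypothetical helper: spectrum indices of acquisition `i`.
acq_indices <- function(i, nbr_sp) {
  last <- cumsum(nbr_sp)[i]       # index of the last spectrum of acquisition i
  seq(last - nbr_sp[i] + 1, last) # from its first spectrum to its last
}

acq_indices(42, nbr_sp) # 329 330 331 332 333 334 335 336
```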
## And now …
You have everything you need to work with provoc. If you have specific needs,
questions or remarks, you can contact me quickly at my email address
(joris.name \[at\] cefe.cnrs.fr).
See you