diff --git a/README.md b/README.md index 032029158c6e02cedcfc9df7492c10b9245d2e5f..cba26069829914736a5a2c2c94eb15a526da9443 100644 --- a/README.md +++ b/README.md @@ -6,8 +6,27 @@ The EMOTE-tk R library focuses on exact 5'-end and/or 3'-end quantification of RNAs *via* targeted RNA-Seq. +\ +**Figure 1: Overview schema of the features provided by the EMOTE-tk R library regrouped by +phases of an RNA-end sequencing project.** EMOTE-tk provides functions (A) for FASTQ reads pre- +processing, manipulation and validation based on a read_features table describing the structure of +sequencing reads; (B) for aligning reads on reference sequences and compute abundances at the +single nucleotide resolution to obtain (C) a count table representing the abundances (count) of RNA +3' or 5' ends mapped to a reference sequence (rname) at a certain location (position) on a strand. In +this example, the reads contain UMI extracted in the pre-processing step that are used at this step +to remove PCR amplification duplicates introduced in the library preparation. (D) the count table +can be further manipulated to apply some normalization or filtering to prepare the downstream +analyses. (E) EMOTE-tk provides functions for downstream analysis of the count table for TSS +identification, analysis of RNA-ends species proportions (e.g. 5'P/5’PPP in WT vs. mutant strain), +periodic signal analysis such as ribosome translation speed profiling, or the exploration of 3' end +modifications. (F) EMOTE-tk count table can also be analyzed further by other libraries: we illustrate +this by a differential expression analysis to identify endoribonucleolytic cleavage sites by comparing +abundances between WT and endoRNase deletion mutant strains. + Features: -* Pre-processing raw reads fastq file + +* [Pre-processing raw reads fastq file](scripts/Pre-processing.ipynb) * Reads parsing and validation * Reads feature extraction * Demultiplexing into separate fastq files @@ -22,6 +41,7 @@ Features: * Differential proportion statistical analysis Applications: + * TSS identification → [notebook](scripts/TSS.identification.Prados-2016.ipynb) * Endoribonucleolytic cleavage sites identification → [notebook](scripts/Endoribonucleolytic.cleavage.sites.identification.Broglia-2020.ipynb) * co-translational degradation → [notebook](scripts/Co-translational.exoribonucleolytic.degradation.Khemici-2015.ipynb) diff --git a/img/EMOTE-tk.topics.png b/img/EMOTE-tk.topics.png new file mode 100644 index 0000000000000000000000000000000000000000..7e5e2fa711a4e40c122e94c0f3c591f2011f5333 Binary files /dev/null and b/img/EMOTE-tk.topics.png differ diff --git a/scripts/Post-transcriptional.modifications.and.differential.proportion.analysis.Xu-2023.ipynb b/scripts/Post-transcriptional.modifications.and.differential.proportion.analysis.Xu-2023.ipynb index ed766ce31cf8de34be0752377725dab049acdb66..89edda8b6e507ec25a0132bfa293015ac47dadec 100644 --- a/scripts/Post-transcriptional.modifications.and.differential.proportion.analysis.Xu-2023.ipynb +++ b/scripts/Post-transcriptional.modifications.and.differential.proportion.analysis.Xu-2023.ipynb @@ -4,19 +4,19 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# *EMOTE-tk* tutorial on 3'-end post transcriptional modifications exploration and differential proportion analysis\n", + "# *EMOTE-tk* notebook on 3'-end post-transcriptional modifications exploration and differential proportion analysis\n", "\n", "From \"MenT nucleotidyltransferase toxins extend tRNA acceptor stems and can be inhibited by asymmetrical antitoxin binding\" by Xu *et al.*, Nature commun. 2023. https://doi.org/10.1038/s41467-023-40264-3\n", "\n", "## Download dataset\n", "\n", - "Data accompanying the article publicly available on European Nucleotide Archive (ENA) at EBI https://www.ebi.ac.uk/ena/browser/view/PRJEB62085\n", + "Data accompanying the article are publicly available on European Nucleotide Archive (ENA) at EBI https://www.ebi.ac.uk/ena/browser/view/PRJEB62085\n", "\n", - "For this notebook, we will use some RNA-Seq samples multiplexed into one raw fastq file to reproduce results on the action of toxin MenT1 which adds ribonucleotides at the 3'-end of tRNAs as illustrated in [figure 3E of the original paper](https://www.nature.com/articles/s41467-023-40264-3/figures/3).\n", + "For this notebook, we will use some RNA-Seq samples multiplexed into one raw FASTQ file to reproduce some results on the action of toxin MenT1 which adds ribonucleotides at the 3'-end of tRNAs as illustrated in [figure 3E of the original paper](https://www.nature.com/articles/s41467-023-40264-3/figures/3).\n", "\n", - "The dataset will be directly downloaded in the `../data/MenT1` directory of the project (that should be readily available if the GitLab project is cloned).\n", + "The dataset will be directly downloaded in the [../data/MenT1](../data/MenT1) directory of the project (that should be readily available if the GitLab project is cloned).\n", "\n", - "The intial `data/MenT1` directory contains also the reference sequences of the tRNAs (multifasta file) to which we will align the 3'-end sequencing reads: `Msm.reference.tRNAs.with.CCA.fasta`" + "The initial [../data/MenT1](../data/MenT1) directory contains also the reference sequences of the tRNAs (multifasta file) to which we will align the 3'-end sequencing reads: [Msm.reference.tRNAs.with.CCA.fasta](../data/MenT1/Msm.reference.tRNAs.with.CCA.fasta)." ] }, { @@ -29,11 +29,14 @@ }, "outputs": [], "source": [ + "options(width = 250)\n", + "options(crayon.enabled = FALSE)\n", + "\n", "data_dir <- \"../data/MenT1\"\n", "\n", "raw_fastq <- paste0(data_dir, \"/Msm.total.RNA.MenT1.Library.fq.gz\")\n", "\n", - "if(!file.exists(raw_fastq))\n", + "if (!file.exists(raw_fastq))\n", " download.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR114/ERR11439143/Msm.total.RNA.MenT1.Library.fq.gz\",\n", " destfile=\"../data/MenT1/Msm.total.RNA.MenT1.Library.fq.gz\")" ] @@ -59,16 +62,16 @@ "name": "stderr", "output_type": "stream", "text": [ - "── \u001b[1mAttaching core tidyverse packages\u001b[22m ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──\n", - "\u001b[32m✔\u001b[39m \u001b[34mdplyr \u001b[39m 1.1.4 \u001b[32m✔\u001b[39m \u001b[34mreadr \u001b[39m 2.1.5\n", - "\u001b[32m✔\u001b[39m \u001b[34mforcats \u001b[39m 1.0.0 \u001b[32m✔\u001b[39m \u001b[34mstringr \u001b[39m 1.5.1\n", - "\u001b[32m✔\u001b[39m \u001b[34mggplot2 \u001b[39m 3.5.0 \u001b[32m✔\u001b[39m \u001b[34mtibble \u001b[39m 3.2.1\n", - "\u001b[32m✔\u001b[39m \u001b[34mlubridate\u001b[39m 1.9.3 \u001b[32m✔\u001b[39m \u001b[34mtidyr \u001b[39m 1.3.1\n", - "\u001b[32m✔\u001b[39m \u001b[34mpurrr \u001b[39m 1.0.2 \n", - "── \u001b[1mConflicts\u001b[22m ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──\n", - "\u001b[31m✖\u001b[39m \u001b[34mdplyr\u001b[39m::\u001b[32mfilter()\u001b[39m masks \u001b[34mstats\u001b[39m::filter()\n", - "\u001b[31m✖\u001b[39m \u001b[34mdplyr\u001b[39m::\u001b[32mlag()\u001b[39m masks \u001b[34mstats\u001b[39m::lag()\n", - "\u001b[36mℹ\u001b[39m Use the conflicted package (\u001b[3m\u001b[34m<http://conflicted.r-lib.org/>\u001b[39m\u001b[23m) to force all conflicts to become errors\n", + "── Attaching core tidyverse packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──\n", + "✔ dplyr 1.1.4 ✔ readr 2.1.5\n", + "✔ forcats 1.0.0 ✔ stringr 1.5.1\n", + "✔ ggplot2 3.4.4 ✔ tibble 3.2.1\n", + "✔ lubridate 1.9.3 ✔ tidyr 1.3.0\n", + "✔ purrr 1.0.2 \n", + "── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──\n", + "✖ dplyr::filter() masks stats::filter()\n", + "✖ dplyr::lag() masks stats::lag()\n", + "ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors\n", "Loading required package: GenomeInfoDb\n", "\n", "Loading required package: BiocGenerics\n", @@ -94,12 +97,8 @@ "\n", "The following objects are masked from ‘package:base’:\n", "\n", - " anyDuplicated, aperm, append, as.data.frame, basename, cbind,\n", - " colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,\n", - " get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,\n", - " match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,\n", - " Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,\n", - " table, tapply, union, unique, unsplit, which.max, which.min\n", + " anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste,\n", + " pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min\n", "\n", "\n", "Loading required package: S4Vectors\n", @@ -205,20 +204,10 @@ "\n", "The following objects are masked from ‘package:matrixStats’:\n", "\n", - " colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,\n", - " colCounts, colCummaxs, colCummins, colCumprods, colCumsums,\n", - " colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,\n", - " colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,\n", - " colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,\n", - " colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,\n", - " colWeightedMeans, colWeightedMedians, colWeightedSds,\n", - " colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,\n", - " rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,\n", - " rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,\n", - " rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,\n", - " rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,\n", - " rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,\n", - " rowWeightedMads, rowWeightedMeans, rowWeightedMedians,\n", + " colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads, colMaxs, colMeans2, colMedians,\n", + " colMins, colOrderStats, colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads, colWeightedMeans, colWeightedMedians, colWeightedSds,\n", + " colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs,\n", + " rowMeans2, rowMedians, rowMins, rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads, rowWeightedMeans, rowWeightedMedians,\n", " rowWeightedSds, rowWeightedVars\n", "\n", "\n", @@ -226,9 +215,7 @@ "\n", "Welcome to Bioconductor\n", "\n", - " Vignettes contain introductory material; view with\n", - " 'browseVignettes()'. To cite Bioconductor, see\n", - " 'citation(\"Biobase\")', and for packages 'citation(\"pkgname\")'.\n", + " Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation(\"Biobase\")', and for packages 'citation(\"pkgname\")'.\n", "\n", "\n", "\n", @@ -294,8 +281,7 @@ } ], "source": [ - "source(\"../src/emote-tk.R\")\n", - "options(width = 250)" + "source(\"../src/emote-tk.R\")" ] }, { @@ -304,21 +290,24 @@ "source": [ "## Pre-processing of reads prior mapping\n", "\n", - "In this section, we will parde the raw reads to\n", + "In this section, we will parse the raw reads to:\n", "* validate their content\n", - "* split the fastq into 4 different fastq files, each one of them corresponding to a sample: control without toxin or tRNAs with toxin and replicate 1 or 2\n", + "* split the FASTQ file into 4 different FASTQ files, each one of them corresponding to a sample: control without toxin or tRNAs with toxin and replicate 1 or 2\n", + "\n", + "For more information on the other pre-processing features available, see the notebook on [pre-processing](Pre-processing.ipynb).\n", "\n", "The reads are 100 nt long and their structure can be schematized as follows:\n", "```\n", - "read pos: 1 2 6 25 40 46\n", - " N · multiplexing barcode · recognition sequence · UMI · control sequence · 3'-5' tRNA-end reverse complement\n", - " │ │ │ │ │ └ 55nt long\n", - " │ │ │ │ └ must be AGATAC\n", - " │ │ │ └ 15 random nucleotides (A, T, C or G)\n", - " │ │ └ must be CGGCACCAACCGAGGCTCA\n", - " │ └ must be one of ACAC, ATTG, GACG or GGTA\n", - " │\n", - " └ one random nucleotide at the beginning\n", + "read position: 1 2 6 25 40 46 \n", + " N · multiplexing barcode · recognition sequence · UMI · control sequence · 3'-5' tRNA-end reverse complement \n", + " │ │ │ │ │ └ 55nt long \n", + " │ │ │ │ └ must be AGATAC \n", + " │ │ │ └ 15 random nucleotides (A, T, C or G) serving as \n", + " │ │ │ Unique Molecule Identifier (UMI) for PCR duplicates removal\n", + " │ │ └ must be CGGCACCAACCGAGGCTCA \n", + " │ └ must be one of ACAC, ATTG, GACG or GGTA \n", + " │ \n", + " └ one random nucleotide at the beginning \n", "```\n", "\n", "Here, we define the authorized barcodes and their associated experimental condition (in same order) so that we can reuse them later." @@ -334,7 +323,7 @@ }, "outputs": [], "source": [ - "barcodes <- c(\"ACAC\", \"ATTG\", \"GACG\", \"GGTA\")\n", + "barcodes <- c(\"ACAC\", \"ATTG\", \"GACG\", \"GGTA\")\n", "conditions <- c(\"ctrl.r1\", \"ctrl.r2\", \"MenT1.r1\", \"MenT1.r2\")" ] }, @@ -344,7 +333,7 @@ "source": [ "The above schematized structure is translated to an `EMOTE_read_features` table that will be used for reads parsing and validation.\n", "\n", - "For its intialisation, the user must specify which region on the reads contains the sequence destined to be aligned on the reference sequences." + "For its initialization, the user must specify which region on the reads contains the sequence destined to be aligned on the reference sequences." ] }, { @@ -432,7 +421,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The UMI from position 25 to 39 are composed of A, T, C or G, thus is a feature of type 1 which is a string composed of a certain alphabet. This feature will be extracted and stored in the read identifier for future uses in the quantification step, thus the `readid_prepend = TRUE` argument)." + "The UMI from position 25 to 39 are composed of A, T, C or G, thus is a feature of type 1 which is a string composed of a certain alphabet. This feature will be extracted and stored in the read identifier for future use in the quantification step, thus the `readid_prepend = TRUE` argument)." ] }, { @@ -458,7 +447,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Then, the recognition sequence and the control sequence correspond to constant character strings and will be used to validate the reads. These are feature of type 2." + "Then, the recognition sequence and the control sequence correspond to constant character strings and will be used to validate the reads. These are features of type 2." ] }, { @@ -551,11 +540,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Once the `EMOTE_parse_read` table is defined, parsing of read can be performed. As we need also to demultiplex the samples here, we will directly call `EMOTE_demultiplex` which will call the `EMOTE_parse_read` internally to parse and validate the reads.\n", + "Once the `EMOTE_features` table is defined, parsing of read can be performed. As we need also to demultiplex the samples here, we will directly call `EMOTE_demultiplex` which will call the `EMOTE_parse_read` internally to parse and validate the reads.\n", "\n", - "## Demultiplex original raw fastq file(s) into separate fastq files\n", + "## Demultiplex original raw FASTQ file(s) into separate FASTQ files\n", "\n", - "Reads are validated against the read features and only valid reads are found in the demultiplexed fastq files." + "Reads are validated against the read features and only valid reads are found in the demultiplexed FASTQ files." ] }, { @@ -672,7 +661,7 @@ "|| Repeat threshold : 100 repeats ||\n", "|| Gapped index : no ||\n", "|| ||\n", - "|| Free / total memory : 11.5GB / 31.3GB ||\n", + "|| Free / total memory : 21.5GB / 62.5GB ||\n", "|| ||\n", "|| Input files : 1 file in total ||\n", "|| o Msm.reference.tRNAs.with.CCA.fasta ||\n", @@ -700,7 +689,7 @@ "|| [ 90.0% finished ] ||\n", "|| [ 100.0% finished ] ||\n", "|| ||\n", - "|| Total running time: 0.1 minutes. ||\n", + "|| Total running time: 0.2 minutes. ||\n", "||Index ../data/MenT1/Msm.reference.tRNAs.with.CCA.fasta.subread/index w ... ||\n", "|| ||\n", "\\\\============================================================================//\n", @@ -712,8 +701,9 @@ "reference_genome <- \"Msm.reference.tRNAs.with.CCA.fasta\"\n", "reference_fasta <- paste0(data_dir, \"/\", reference_genome)\n", "reference_index <- paste0(data_dir, \"/\", reference_genome, \".subread\")\n", - "dir.create(reference_index, showWarnings=FALSE)\n", - "buildindex(basename = paste0(reference_index, \"/index\"), reference = reference_fasta)" + "dir.create(reference_index, showWarnings = FALSE)\n", + "buildindex(basename = paste0(reference_index, \"/index\"), \n", + " reference = reference_fasta)" ] }, { @@ -743,14 +733,14 @@ " reference_genome)\n", "\n", "# perform mapping\n", - "bam_files = mclapply(\n", + "bam_files <- mclapply(\n", " fastq_files,\n", " EMOTE_map_fastq,\n", " fasta_name = reference_fasta,\n", " index_name = reference_index,\n", " max_map_mismatch = max_mismatch,\n", " aligner = aligner,\n", - " #force = TRUE,\n", + " force = FALSE,\n", " bam_dir = bam_dir,\n", " threads = 4,\n", " mc.cores = 4\n", @@ -763,7 +753,7 @@ "source": [ "## Convert bam files to count table with EMOTE_quantify_3prime\n", "\n", - "The `EMOTE_quantify_3prime` is called on each BAM file. The `read_features` table is passed to take into account the UMIs that were present in the reads to remove duplicates." + "The `EMOTE_quantify_3prime` is called on each BAM file. The `EMOTE_features` table is passed to take into account the UMIs that were present in the reads to remove duplicates." ] }, { @@ -839,13 +829,13 @@ ], "source": [ "count_table <- EMOTE_quantify_from_bam_files(bam_files,\n", - " conditions,\n", - " EMOTE_quantify_3prime,\n", - " cores = 4,\n", - " features = read_features,\n", - " min_mapped_length = 20,\n", - " rev_comp_reads = TRUE,\n", - " keep_UMI = FALSE)\n", + " conditions,\n", + " EMOTE_quantify_3prime,\n", + " cores = 4,\n", + " features = read_features,\n", + " min_mapped_length = 20,\n", + " rev_comp_reads = TRUE,\n", + " keep_UMI = FALSE)\n", "count_table %>%\n", " head()" ] @@ -856,7 +846,7 @@ "source": [ "## Downstream analysis: 3'-end post-transciptional modifications exploration\n", "\n", - "Pivot count_table to have counts for each sample for the same 3'-end. For noise reduction, we will only keep rows for which there are at least 10 reads in one sample." + "Pivot `count_table` to have counts for each sample for the same 3'-end. For noise reduction, we will only keep rows for which there are at least 10 reads in one sample." ] }, { @@ -933,7 +923,7 @@ "source": [ "count_table_w <- count_table %>%\n", " select(-reads) %>%\n", - " pivot_wider(names_from = \"barcode\", values_from=\"count\", values_fill = 0) %>%\n", + " pivot_wider(names_from = \"barcode\", values_from = \"count\", values_fill = 0) %>%\n", " filter(if_any(all_of(conditions), ~ . >= 10))\n", "count_table_w %>%\n", " head()" @@ -949,7 +939,7 @@ "source": [ "Next, we are going to focus on full length tRNAs. Thus, we will need their length.\n", "\n", - "tRNAs with expected length from genome reference fasta file:" + "tRNAs with expected length from reference sequences FASTA file:" ] }, { @@ -1024,9 +1014,11 @@ } ], "source": [ - "tRNAs_DNASS <- readDNAStringSet( paste0(data_dir, \"/\", reference_genome))\n", - "tRNAs <- tibble(rlength=width(tRNAs_DNASS), seq=as.character(tRNAs_DNASS), name=names(tRNAs_DNASS)) %>%\n", - " mutate(id = str_extract(name, 'trna\\\\d+.\\\\S+'),\n", + "tRNAs_DNASS <- readDNAStringSet(paste0(data_dir, \"/\", reference_genome))\n", + "tRNAs <- tibble(rlength = width(tRNAs_DNASS),\n", + " seq = as.character(tRNAs_DNASS),\n", + " name = names(tRNAs_DNASS)) %>%\n", + " mutate(id = str_extract(name, \"trna\\\\d+.\\\\S+\"),\n", " rname = as.factor(str_extract(name, \"^\\\\S+\"))) %>%\n", " select(rname, rlength, seq)\n", "tRNAs %>%\n", @@ -1345,7 +1337,7 @@ "source": [ "## Visualize proportions of post-transcriptional modifications with bar plots\n", "\n", - "Longer format to compute proportions per tRNA per condition" + "Pivot to longer format to compute proportions per tRNA per condition" ] }, { @@ -1472,9 +1464,9 @@ ")\n", "options(repr.plot.width = 7, repr.plot.height = 7)\n", "\n", - "tRNAlabs = count_table_l$rname\n", - "labs = tRNAlabs %>% as.character %>% unique %>% str_sub(5,11)\n", - "names(labs) = tRNAlabs %>% as.character %>% unique \n", + "tRNA_labs = count_table_l$rname\n", + "labs = tRNA_labs %>% as.character %>% unique %>% str_sub(5,11)\n", + "names(labs) = tRNA_labs %>% as.character %>% unique \n", "\n", "count_table_l %>%\n", " mutate(rname = fct_relevel(rname,\n", @@ -1492,7 +1484,7 @@ " theme_classic() +\n", " xlab(NULL) +\n", " ylab(NULL) +\n", - " facet_wrap(facets = . ~ rname, ncol = 6, labeller = labeller(rname = labs)) + \n", + " facet_wrap(facets = . ~ rname, ncol = 6, labeller = labeller(rname = labs)) +\n", " labs(fill = \"3'-end modification\")" ] }, @@ -1506,7 +1498,7 @@ "\n", "For this, we will compare the proportion of modified tRNAs in the conrtol *vs.* the proportion of modified tRNAs in the samples treated with MenT1.\n", "\n", - "In the following, we regroup the modified tRNAs as the \"fraction\" and we sum up unmodfied and modified tRNAs in the \"total\" as expected by `EMOTE_differential_proportion`." + "In the following, we regroup the modified tRNAs as the \"fraction\" and we sum up unmodified and modified tRNAs in the \"total\" as expected by `EMOTE_differential_proportion` function." ] }, { @@ -1801,14 +1793,14 @@ } ], "source": [ - "proportion_table <- count_table_l %>% \n", + "proportion_table <- count_table_l %>%\n", " select(-rlength) %>%\n", - " mutate(count_of = ifelse(downstream_seq==\"\", \"total\", \"fraction\")) %>%\n", + " mutate(count_of = ifelse(downstream_seq == \"\", \"total\", \"fraction\")) %>%\n", " group_by(rname, strand, position, condition, count_of) %>%\n", " summarise(count = sum(count), .groups = \"keep\") %>%\n", " separate(condition, into = c(\"group\", \"replicate\")) %>%\n", - " pivot_wider(names_from=\"count_of\", values_from=\"count\", values_fill=0) %>%\n", - " mutate(total = total + fraction)%>% # for dss which expects total and fraction\n", + " pivot_wider(names_from = \"count_of\", values_from = \"count\", values_fill=0) %>%\n", + " mutate(total = total + fraction) %>% # for dss which expects total and fraction\n", " pivot_longer(cols = c(\"fraction\", \"total\"),\n", " names_to = \"count_of\",\n", " values_to = \"count\") %>%\n", @@ -1821,7 +1813,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Then, we need a `design_matrix` to specify the group and replicate for each `sample_id` and also if the counts in the `count_table` corresponds to the fractin or the total." + "Then, we need a `design_matrix` to specify the group and replicate for each `sample_id` and also if the counts in the `count_table` corresponds to the fraction or the total." ] }, { @@ -1905,9 +1897,9 @@ "source": [ "design_matrix <- tibble(sample_id = unique(sort(proportion_table$sample_id))) %>%\n", " separate(sample_id,\n", - " sep=\"\\\\.\",\n", - " into=c(\"group\", \"replicate\", \"count_of\"),\n", - " remove = F)\n", + " sep = \"\\\\.\",\n", + " into = c(\"group\", \"replicate\", \"count_of\"),\n", + " remove = FALSE)\n", "design_matrix" ] }, @@ -2124,22 +2116,11 @@ ], "source": [ "diff_prop <- EMOTE_differential_proportion(proportion_table,\n", - " design_matrix,\n", - " group1 = \"ctrl\",\n", - " group2 = \"MenT1\")\n", + " design_matrix,\n", + " group1 = \"ctrl\",\n", + " group2 = \"MenT1\")\n", "diff_prop" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "vscode": { - "languageId": "r" - } - }, - "outputs": [], - "source": [] } ], "metadata": {