RNA-sequencing: data analysis with R
This comprehensive, hands-on course provides training in the bioinformatics of gene expression analysis, focusing on bulk RNA-sequencing data and using R, RStudio, and various R packages. The course also introduces the foundational concepts of single-cell RNA-seq analysis. All instructors are practicing bioinformaticians who lecture on the training courses of the NGO Genomics UA. Each session consists of an introductory lecture followed by a practical tutorial. During tutorials, students analyze data live under the guidance of an instructor and take part in interactive assessments.
The course begins with an R boot camp that introduces newcomers to the R programming language and RStudio and serves as a refresher for those already familiar with R. It then covers the basics of expression analysis, advanced R techniques, differential expression analysis using DESeq2, data visualization, functional analysis, and single-cell expression analysis. The course concludes with a final assessment in the form of multiple-choice questions; successful completion requires a score of at least 70% on each test. The course also features a team project in which groups of 2-3 students analyze expression data from a publicly available dataset, starting from a count matrix. Two final sessions focus on student presentations and feedback from the instructors. These mini-projects are not graded, but successful teams will receive additional certificates from NGO Genomics UA, signed by the instructors.
All instruction will be delivered in Ukrainian, but as the bioinformatics packages are documented in English and the field uses established English terminology, the necessary background will be provided in each session.
Navigating RStudio and introducing the structure of an R script. The notion of data classes and types. Basic code writing, including syntax, data slicing, substitution, and transformation. Application of loops and branches (if-else), and creation of functions.
Loading and exporting files. Illustrating the essential steps in exploring the data. A brief comparison of base R and the tidyverse. Data visualization and interpretation of the graphs.
Introduction to gene expression analysis and RNA-seq data. Experiment planning and design. Raw data. Metadata. Types and methods of normalization. P-values and multiple testing corrections. Negative binomial distribution and expression data.
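As an illustration of the multiple testing correction mentioned above, the widely used Benjamini-Hochberg procedure can be sketched in a few lines. This is not course material: the tutorials themselves work in R, where `p.adjust(p, method = "BH")` performs the same adjustment. The sketch below uses plain Python, and the function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes (illustrative only).
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
print(adj_p)
```

Each raw p-value is scaled by the number of tests divided by its rank, which is why a gene with a raw p-value of 0.04 may no longer pass a 0.05 cutoff once several genes are tested together.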
Introduction to the main dataset used in the course. Walk-through of the main stages and scripts of the expression analysis using the dataset.
Normality tests, data transformation, tests for homoscedasticity. Understanding covariance and correlation. Hypothesis testing (t-test, ANOVA).
Gene pre-processing and annotation, exploratory analysis including dimensionality reduction (PCA, t-SNE), and the effect of normalization on dimensionality reduction.
DESeq2 test selection, defining thresholds for differentially expressed genes, pairwise testing, multiple group comparison, results interpretation, and differential expression tools beyond DESeq2.
Introduction to data visualization. ggplot2. Basic plots: scatterplot, density plot, histogram. Defining mean, median, and mode. Selection and application of parametric and non-parametric tests. Box and violin plots. Principal component analysis and t-SNE.
Venn diagrams and UpSet plots. Volcano plots. Heatmaps. Interactive plots. R Markdown best practices. Embedding tables and files. Bulk RNA-seq report structure.
Functional approaches: over-representation, gene set enrichment, pathway analysis. Working with the GO, Reactome, and KEGG databases.
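To make the idea of over-representation analysis concrete, one common formulation asks how likely it is to see at least the observed overlap between a list of differentially expressed genes and a gene set by chance, using the hypergeometric distribution. The sketch below is an illustration only, under that assumption; the course tutorials use R, where `phyper` provides the same tail probability. The function name, argument names, and the gene counts are hypothetical.

```python
from math import comb


def ora_p_value(universe, in_set, de_total, de_in_set):
    """Upper-tail hypergeometric probability P(X >= de_in_set).

    universe  -- number of genes tested (background)
    in_set    -- background genes annotated to the gene set
    de_total  -- number of differentially expressed (DE) genes
    de_in_set -- DE genes that are annotated to the gene set
    """
    max_overlap = min(in_set, de_total)
    # Number of ways to draw de_total genes from the background.
    total = comb(universe, de_total)
    # Count draws with k genes from the set, for every k >= observed overlap.
    tail = sum(
        comb(in_set, k) * comb(universe - in_set, de_total - k)
        for k in range(de_in_set, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the pathway,
# 4 DE genes, 2 of which fall in the pathway.
p = ora_p_value(universe=20, in_set=5, de_total=4, de_in_set=2)
print(p)
```

In practice this test is run once per gene set, so the resulting p-values are themselves subject to multiple testing correction.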
Basics of graph theory and network analysis. Gene co-expression, transcriptional regulation, identification of hub genes, and interpretation.
Introduction to single-cell RNA-seq analysis using Seurat and a PBMC dataset: normalization, clustering, differential expression between clusters and between conditions.
Single-cell RNA-seq analysis: visualizations. Methods for deconvolution of expression signatures.
Presentation of students' projects and instructors' feedback.