Introduction to R and RNA sequencing

General description

This course is designed to equip students with a robust skillset in R programming, data analysis, and advanced bioinformatics techniques, particularly in single-cell transcriptomics and spatial proteo-transcriptomic analysis. The curriculum is divided into two interconnected modules, ensuring a seamless progression from general data analysis skills to specialized applications in bioinformatics.

The first module focuses on building essential expertise in R scripting and navigating RStudio. Students will learn to implement best practices for analysis reproducibility within a controlled environment (e.g., with renv and Conda), engage in data visualization, and produce structured data analysis reports with R Markdown and Shiny. Additionally, the module introduces the application of statistical methods in R, basic text analysis, and introductory machine learning techniques, providing a solid groundwork for further exploration in the data analysis field of the student's choice. Instruction is delivered through a combination of lectures and practical hands-on seminars. During these sessions, students participate in live data analysis under the guidance of experienced instructors, engage in interactive assessments, and have ample opportunities to ask questions.

The second module focuses on the bioinformatics of single-cell transcriptomics and spatial proteo-transcriptomic analyses. This advanced segment emphasizes data processing and classical downstream analysis using R and relevant packages, while also introducing key concepts of multiomics analysis, including data integration with scATAC-seq and CITE-seq. Students will gain hands-on experience with spatial biology techniques such as 10x Visium, 10x Visium HD, and Imaging Mass Cytometry, enabling them to evaluate cell identities and marker expressions within tissue microenvironments. The module includes voluntary supervised team capstone projects, preparing students for independent work in real-world settings. Each session comprises the key activity (lecture or seminar), followed by a Q&A, troubleshooting, and discussion round. Panel discussions with multiple instructors are incorporated where applicable, ensuring comprehensive coverage of up-to-date techniques and live data analysis.

Throughout the course, students are divided into small groups to stimulate collaborative learning and are assigned homework to reinforce their knowledge and strengthen their troubleshooting skills; all sessions, however, are held as a single common track. Collaboration with groupmates and instructors and the use of generative models for troubleshooting are encouraged, although plagiarism is strictly prohibited, and students will be instructed in the ethical use of generative models. The entire course is delivered in Ukrainian, supplemented with the English terminology and background required to use R packages effectively.

The course culminates in a team project analyzing expression data from publicly available datasets, including presentations and instructor feedback.

All instructors are experienced data analysts and bioinformaticians. Their expertise, developed through routine job responsibilities, publication records, and prior teaching engagements, ensures high-quality instruction and relevant, practical insights throughout the course.

Registration form

MODULE 1. INTRODUCTION TO R FOR BIOLOGISTS & BIOINFORMATICIANS

Session 1. R syntax and data types.

Dariia MYKHAILYSHYNA

Overview of the features of R and RStudio. Introduction to basic R syntax, data types (numeric, boolean, strings, vectors, lists, matrices, data frames, dates, factors, etc.), and operations on them.

Session 2. Code optimization with loops and functions.

Dariia MYKHAILYSHYNA

Introduction to loops (while loops, for loops) and functions. Applying functions with the apply and map families of functions as an alternative to loops.

Session 3. Introduction to Tidyverse.

Dariia MYKHAILYSHYNA

Introduction to the tidyverse package and its features. Introduction to data loading, data exploration, and data manipulation with the tidyverse package.

Session 4. Data visualization.

Dariia MYKHAILYSHYNA

Introduction to data visualization with the ggplot2 package. Exploring different types of plots for different data types. Customizing your plot with different color palettes, themes, etc.

Session 5. Text data analysis.

Dariia MYKHAILYSHYNA

Loading and cleaning text data. Introduction to bigrams and n-grams. Creating word clouds. Sentiment analysis. Introduction to topic modeling.

Session 6. Rmarkdown and project reporting.

Oleksandr SHYNKARENKO

The use of RMarkdown for creating dynamic and reproducible reports. Learn how to integrate code, results, and narrative seamlessly to produce professional-quality documents. Topics include formatting, embedding visualizations, parameterizing reports for different outputs, and best practices for project documentation to facilitate collaboration and publication.

Session 7. Statistics in R 1.

Dmytro GOSPODARYOV

Methods of copying data from spreadsheets and loading data saved in comma-separated values (CSV) format. Calculation of mean, median, and variance in the base package. Testing for normality and equality of variances. Parametric pairwise comparisons and analysis of variance, using the ‘base’, ‘DescTools’, and ‘coin’ packages.

Session 8. Statistics in R 2.

Dmytro GOSPODARYOV

Graphics in ‘base’ and ‘ggplot2’ and specialized packages (‘corrplot’, ‘pROC’, etc.): boxplots, correlation tables, trendlines, biplots, heatmaps, dendrograms, receiver operating characteristic (ROC) curves, odds ratios, and response surfaces.

Session 9. Introduction to Regression Analysis.

Dariia MYKHAILYSHYNA

The use of regression analysis, starting with simple linear regression. Including categorical variables and interactions in regression models. Interpreting regression output. Dealing with multicollinearity and heteroskedasticity. Regression analysis with binary dependent variables.

Session 10. Data Wrangling 1.

Oleksandr SHYNKARENKO

Introduction to data wrangling techniques in R using packages like dplyr and tidyr. Learn how to clean, transform, and manipulate datasets to prepare them for analysis. Topics include filtering, selecting, mutating, summarizing data, and handling missing values to ensure data integrity and usability.

Session 11. Data Wrangling 2.

Oleksandr SHYNKARENKO

Advanced data manipulation strategies in R, including reshaping data, joining multiple datasets, and working with complex data structures. Explore best practices for efficient data processing, optimizing code performance, and automating repetitive tasks to streamline the data analysis workflow.

Session 12. Reproducible R pipelines. R + Docker.

Oleksandr PETRENKO

Introduction to reproducible research and the importance of reproducible pipelines in data analysis. Overview of Docker and containerization concepts tailored for R environments. A step-by-step guide to setting up Docker containers for R projects to ensure consistency across different systems. Creating and managing R scripts and their dependencies within Docker containers. Best practices for version control, automation, and documentation in reproducible workflows. Demonstration: Building a simple reproducible R pipeline using Docker, including environment setup, script execution, and container management.

Session 13. Constructing and working in Conda environments.

Oleksandr PETRENKO

Introduction to Conda and its role in environment and package management for data science projects. Setting up Conda on various operating systems and configuring basic environments. Creating, cloning, and managing Conda environments to handle different project dependencies effectively. Installing and updating packages using Conda, including handling complex dependencies and channels. Integrating Conda environments with popular IDEs and tools like Jupyter Notebook and RStudio. Best practices for environment sharing and reproducibility using environment.yml files. Troubleshooting common issues in Conda environments and optimizing environment performance.

Session 14. Introduction to machine learning with R.

Valeriia VASYLIEVA

Brief introduction to the types of machine learning. Using R packages for classification and regression (caret, randomForest, etc.).

Session 15. Webapps with Shiny.

Valeriia VASYLIEVA

Building interactive web applications with R. Putting together layouts, themes, graphics, and interaction with users (feedback, upload, download).

Session 16. R vs Python.

Oleksandr SHYNKARENKO

Comparative analysis of R and Python for bioinformatics and data science applications. Discuss the strengths and weaknesses of each language, scenarios where one may be preferred over the other, interoperability between R and Python, and best practices for integrating both tools into a cohesive workflow. Explore key libraries and frameworks that support advanced data analysis in both environments.

Session 17. Small group workshops on data analysis for team projects.

Dariia MYKHAILYSHYNA

A series of meetings dedicated to developing your own R workflow on a publicly available dataset, including exploratory and statistical analysis, visualization, and reporting, in line with best practices.

Session 18. Small group workshops on data analysis for team projects.

Oleksandr SHYNKARENKO

Session 19. Small group workshops on data analysis for team projects.

Dmytro GOSPODARYOV

Session 20. Small group workshops on data analysis for team projects.

Oleksandr PETRENKO

MODULE 2. SINGLE-CELL AND SPATIAL BIOLOGY: STATE OF THE ART AND DATA ANALYSIS

Session 1. Introduction to sequencing technologies.

Valeriia VASYLIEVA & Anna DIAMANT

NGS technologies (sequencing by synthesis, nanopore), library preparation, experimental design, output files, and general steps of read preprocessing.

Session 2. Foundations of multidimensional data visualization.

Serhiy NAUMENKO

Theory and practice of Principal Component Analysis, Multidimensional Scaling, t-SNE, and UMAP.

Session 3. Single-cell RNA-seq analysis 1.

Maryna KORSHEVNIUK

Introduction to the main dataset used in the course. A walk-through of the main stages and scripts of single-cell RNA-seq analysis using this dataset and the Seurat package. This includes quality control, feature selection, data normalization, linear and non-linear dimensionality reduction, clustering, annotating cell types, and identifying markers.

Session 4. Single-cell RNA-seq analysis 2.

Maryna KORSHEVNIUK

Advanced techniques in single-cell RNA-seq analysis, including trajectory inference, differential expression analysis across conditions, and integrating additional data modalities to enhance biological insights.

Session 5. Diverse datasets integration with Harmony.

Ihor AREFIEV

Introduction to the Harmony package. Installation and introduction to the datasets used in the tutorial. Data integration with Harmony using the Seurat package.

Session 6. Single-cell RNA-seq with long reads: advances vs short reads.

Anna DIAMANT

Exploration of long-read sequencing technologies in single-cell RNA-seq, comparing their advantages and challenges relative to short-read methods, and discussing their impact on transcriptome analysis.

Session 7. Introduction to single-cell multi-omics.

Maryna KORSHEVNIUK

Technology landscape. Main approaches in data integration.

Session 8. Single-cell ATAC-seq.

Details of ATAC-seq sample preparation and related considerations. Examples of applications.

Session 9. Bi-modal integrative analysis of single-cell RNA+ATACseq data.

Maryna KORSHEVNIUK

A walk-through of the main stages of single-cell ATAC-seq analysis with the Signac package. RNA-ATAC integration: weighted nearest neighbor analysis, identification of cell-type-specific genes and peaks, and visualization of chromatin accessibility profiles. TF binding motif enrichment analysis.

Session 10. Cellular indexing of transcriptomes and epitopes (CITE-seq).

Vladyslav KAVAKA

Principles and practices of CITE-seq, covering sample preparation, antibody tagging, and appropriate usage of the technique.

Session 11. Single-cell RNA-sequencing + CITE-seq: data integration and analysis.

Vladyslav KAVAKA

Practical introduction to CITE-seq data integration and analysis, enabling simultaneous profiling of the single-cell transcriptome and protein expression.

Session 12. Single-cell datasets: clustering and DimRed beyond textbooks and UMAPs.

Oleksandr PETRENKO

Advanced Clustering Techniques: Exploration of state-of-the-art clustering algorithms tailored for high-dimensional single-cell data, including graph-based and density-based methods. Algorithmic approaches for clustering evaluation (cluster stability, cluster silhouette). Beyond UMAP: a review of alternative dimensionality reduction techniques (e.g., PHATE).

Session 13. Overview of spatial and multiplex technologies and their application in translational oncology.

Tools for spatial analysis of tissue and cellular structures, coupled with multiplexed data collection techniques. Illustration of their use in examining tumor microenvironments and cellular heterogeneity.

Session 14. 10x Visium + HD: gene expression and tissue microenvironments.

Oleksandr PETRENKO

An overview of the 10x Visium spatial transcriptomics technology and its capabilities in mapping gene expression within tissue sections. Methods for dissecting and understanding the complex interactions between different cell types within their native tissue context. Approaches for integrating spatial transcriptomics data with other datasets and advanced visualization techniques to interpret spatial gene expression patterns.

Session 15. Introduction to Imaging Mass Cytometry.

Elena MELNIK

Overview of Imaging Mass Cytometry technology, including its principles, instrumentation, and applications in high-dimensional cellular analysis.

Session 16. Imaging Mass Cytometry Data Analysis 1.

Elena MELNIK

Fundamentals of analyzing Imaging Mass Cytometry data, including data preprocessing, normalization, and initial exploratory analysis techniques.

Session 17. Imaging Mass Cytometry Data Analysis 2.

Elena MELNIK

Advanced strategies for Imaging Mass Cytometry data analysis, such as spatial pattern recognition, cell population identification, and integration with other omics data.

Session 18. Reproducible reporting and adherence to FAIR in single-cell data analysis.

Oleksandr PETRENKO

An introduction to data and open science repositories (ENA, NCBI, OSF, Zenodo). Data management in collaborative project development and on high-performance computers/clusters. Data usage according to FAIR principles.

Session 19. Project presentations 1.

Anna DIAMANT

Presentation of students’ projects and instructors' feedback. These two sessions involve all instructors.

Session 20. Project presentations 2.

Maryna KORSHEVNIUK

Presentation of students’ projects and instructors' feedback. These two sessions involve all instructors.

Timetable

Date	Time	Topic	Instructor
13.01.2025 Mon	19:00	R syntax and data types	*Dariia MYKHAILYSHYNA*
17.01.2025 Fri	19:00	Code optimization with loops and functions	*Dariia MYKHAILYSHYNA*
20.01.2025 Mon	19:00	Introduction to Tidyverse	*Dariia MYKHAILYSHYNA*
24.01.2025 Fri	19:00	Data visualization	*Dariia MYKHAILYSHYNA*
27.01.2025 Mon	19:00	Text data analysis	*Dariia MYKHAILYSHYNA*
31.01.2025 Fri	19:00	Rmarkdown and project reporting	*Oleksandr SHYNKARENKO*
03.02.2025 Mon	19:00	Statistics in R 1	*Dmytro GOSPODARYOV*
07.02.2025 Fri	19:00	Statistics in R 2	*Dmytro GOSPODARYOV*
10.02.2025 Mon	19:00	Introduction to Regression Analysis	*Dariia MYKHAILYSHYNA*
14.02.2025 Fri	19:00	Data Wrangling 1	*Oleksandr SHYNKARENKO*
17.02.2025 Mon	19:00	Data Wrangling 2	*Oleksandr SHYNKARENKO*
19.02.2025 Wed	19:00	Introduction to sequencing technologies	*Valeriia VASYLIEVA* *Anna DIAMANT*
21.02.2025 Fri	19:00	Reproducible R pipelines. R + Docker	*Oleksandr PETRENKO*
22.02.2025 Sat	19:00	Foundations of multidimensional data visualization	*Serhiy NAUMENKO*
24.02.2025 Mon	19:00	Constructing and working in Conda environments	*Oleksandr PETRENKO*
26.02.2025 Wed	19:00	Single-cell RNA-seq analysis 1	*Maryna KORSHEVNIUK*
01.03.2025 Sat	19:00	Introduction to machine learning with R	*Valeriia VASYLIEVA*
02.03.2025 Sun	19:00	Single-cell RNA-seq analysis 2	*Maryna KORSHEVNIUK*
03.03.2025 Mon	19:00	Webapps with Shiny	*Valeriia VASYLIEVA*
05.03.2025 Wed	19:00	Diverse datasets integration with Harmony	*Ihor AREFIEV*
07.03.2025 Fri	19:00	R vs Python	*Oleksandr SHYNKARENKO*
(Throughout the M1 duration)		Small group workshops on data analysis for team projects	*Dariia MYKHAILYSHYNA* *Oleksandr SHYNKARENKO* *Dmytro GOSPODARYOV* *Oleksandr PETRENKO*
09.03.2025 Sun	19:00	Single-cell RNA-seq with long reads: advances vs short reads	*Anna DIAMANT*
12.03.2025 Wed	19:00	Introduction to single-cell multi-omics	*Maryna KORSHEVNIUK*
16.03.2025 Sun	19:00	Single-cell ATAC-seq
19.03.2025 Wed	19:00	Bi-modal integrative analysis of single-cell RNA+ATACseq data	*Maryna KORSHEVNIUK*
23.03.2025 Sun	19:00	Cellular indexing of transcriptomes and epitopes (CITE-seq)	*Vladyslav KAVAKA*
26.03.2025 Wed	19:00	Single-cell RNA-sequencing + CITE-seq: data integration and analysis	*Vladyslav KAVAKA*
30.03.2025 Sun	19:00	Single-cell datasets: clustering and DimRed beyond textbooks and UMAPs	*Oleksandr PETRENKO*
02.04.2025 Wed	19:00	Overview of spatial and multiplex technologies and their application in translational oncology
06.04.2025 Sun	19:00	10x Visium + HD: gene expression and tissue microenvironments	*Oleksandr PETRENKO*
09.04.2025 Wed	19:00	Introduction to Imaging Mass Cytometry	*Elena MELNIK*
13.04.2025 Sun	19:00	Imaging Mass Cytometry data analysis 1	*Elena MELNIK*
16.04.2025 Wed	19:00	Imaging Mass Cytometry data analysis 2	*Elena MELNIK*
20.04.2025 Sun	19:00	Reproducible reporting and adherence to FAIR in single-cell data analysis	*Oleksandr PETRENKO*
23.04.2025 Wed	19:00	Project presentations 1	*Anna DIAMANT*
27.04.2025 Sun	19:00	Project presentations 2	*Maryna KORSHEVNIUK*

Lecturers

Dmytro Gospodaryov

Maryna Korshevniuk

Oleksandr Petrenko

Oleksandr Shynkarenko

Valeriia Vasylieva