Introduction to R and RNA sequencing

This course is designed to equip students with a robust skillset in R programming, data analysis, and advanced bioinformatics techniques, particularly in single-cell transcriptomics and spatial proteo-transcriptomic analysis. The curriculum is divided into two interconnected modules, ensuring a seamless progression from general data analysis skills to specialized applications in bioinformatics.
The first module focuses on building essential expertise in R scripting and navigating RStudio. Students will learn to implement best practices for reproducible analysis within controlled environments (e.g., with renv and Conda), practice data visualization, and produce structured data analysis reports with R Markdown and Shiny. The module also introduces statistical methods in R, basic text analysis, and introductory machine learning techniques, providing a solid foundation for further exploration in data analysis fields of the student's choice. Instruction is delivered through a combination of lectures and practical hands-on seminars. During these sessions, students take part in live data analysis under the guidance of experienced instructors, engage in interactive assessments, and have ample opportunities to ask questions.
The second module focuses on the bioinformatics of single-cell transcriptomics and spatial proteo-transcriptomic analyses. This advanced segment emphasizes data processing and classical downstream analysis using R and relevant packages, while also introducing key concepts of multiomics analysis, including data integration with scATAC-seq and CITE-seq. Students will gain hands-on experience with spatial biology techniques such as 10x Visium, 10x Visium HD, and Imaging Mass Cytometry, enabling them to evaluate cell identities and marker expressions within tissue microenvironments. The module includes voluntary supervised team capstone projects, preparing students for independent work in real-world settings. Each session comprises the key activity (lecture or seminar), followed by a Q&A, troubleshooting, and discussion round. Panel discussions with multiple instructors are incorporated where applicable, ensuring comprehensive coverage of up-to-date techniques and live data analysis.
Throughout the course, students are divided into small groups to foster collaborative learning and are assigned homework to reinforce their knowledge and sharpen troubleshooting skills; however, all sessions are held as a single common track. Collaboration with groupmates and instructors, as well as the use of generative models for troubleshooting, is encouraged; plagiarism is strictly prohibited, and the ethical use of generative models is explicitly taught. The entire course content is delivered in Ukrainian, supplemented with the English terminology and background required for using R packages effectively.
The course culminates in a team project analyzing expression data from publicly available datasets, including presentations and instructor feedback.
All instructors are experienced data analysts and bioinformaticians. Their expertise, developed through routine job responsibilities, publication records, and prior teaching engagements, ensures high-quality instruction and relevant, practical insights throughout the course.
Overview of the features of R and RStudio; introduction to basic R syntax and data types (numeric, logical, character strings, vectors, lists, matrices, data frames, dates, factors, etc.) and operations on them.
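For illustration, a minimal sketch of the core data types and structures covered in this session (all values are arbitrary examples):

    # Basic R data types and structures
    x_num <- 3.14                                  # numeric
    x_log <- TRUE                                  # logical (boolean)
    x_chr <- "gene"                                # character string
    v  <- c(1, 2, 3)                               # vector
    l  <- list(id = 1, name = "A")                 # list
    m  <- matrix(1:6, nrow = 2)                    # matrix
    df <- data.frame(gene = c("TP53", "BRCA1"),    # data frame
                     expr = c(5.2, 3.8))
    d  <- as.Date("2024-01-15")                    # date
    f  <- factor(c("low", "high", "high"))         # factor
    str(df)                                        # inspect structure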
Introduction to loops (while and for loops) and functions. Applying functions over data with the apply family and map functions as an alternative to explicit loops.
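A minimal sketch of the loop-versus-apply idea using the built-in mtcars data (purrr is assumed to be installed for the map example):

    # A for loop computing the mean of each column of a data frame
    means <- numeric(ncol(mtcars))
    for (i in seq_along(mtcars)) {
      means[i] <- mean(mtcars[[i]])
    }

    # Equivalent one-liners with the apply family and purrr
    sapply(mtcars, mean)
    purrr::map_dbl(mtcars, mean)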
Introduction to the tidyverse collection of packages and its features: data loading, data exploration, and data manipulation with tidyverse tools.
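A minimal sketch of a tidyverse loading-and-exploration workflow; the file name and column names (gene, sample, counts) are hypothetical placeholders:

    library(tidyverse)

    # Load a CSV file and take a first look at its structure
    expr <- read_csv("expression.csv")
    glimpse(expr)

    # A simple manipulation chain: keep selected columns and sort by counts
    expr %>%
      select(gene, sample, counts) %>%
      arrange(desc(counts))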
Introduction to data visualization with the ggplot2 package. Exploring different types of plots for different data types. Customizing your plot with different color palettes, themes, etc.
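A minimal ggplot2 sketch using the built-in mtcars data, combining a plot type, a colour palette, axis labels, and a theme:

    library(ggplot2)

    ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
      geom_point(size = 3) +
      scale_colour_brewer(palette = "Dark2") +
      labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders") +
      theme_minimal()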
Loading and cleaning text data. Introduction to bigrams and n-grams. Creating word clouds. Sentiment analysis. Introduction to topic modeling.
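A minimal sketch of bigram counting with the tidytext package (the two-document corpus is invented for illustration):

    library(dplyr)
    library(tidytext)

    # A tiny illustrative corpus
    docs <- tibble(doc  = 1:2,
                   text = c("single cell RNA sequencing in R",
                            "spatial transcriptomics and cell identity"))

    # Tokenize into bigrams and count them
    docs %>%
      unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
      count(bigram, sort = TRUE)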
The use of R Markdown for creating dynamic and reproducible reports. Learn how to integrate code, results, and narrative seamlessly to produce professional-quality documents. Topics include formatting, embedding visualizations, parameterizing reports for different outputs, and best practices for project documentation to facilitate collaboration and publication.
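A minimal sketch of rendering a parameterized report; "report.Rmd" and its cutoff parameter are hypothetical names and would need to be declared in the document's YAML header:

    # Render a parameterized R Markdown report to HTML
    rmarkdown::render(
      "report.Rmd",
      output_format = "html_document",
      params = list(cutoff = 0.05)
    )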
Methods for copying data from spreadsheets and loading data saved in comma-separated (CSV) format. Calculating the mean, median, and variance with the base package. Testing for normality and equality of variances. Parametric pairwise comparisons and analysis of variance using the ‘base’, ‘DescTools’, and ‘coin’ packages.
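A minimal base-R sketch of the tests covered here, on two invented groups of measurements:

    # Two illustrative groups of measurements
    a <- c(5.1, 4.8, 5.6, 5.0, 4.9, 5.3)
    b <- c(5.9, 6.1, 5.7, 6.3, 6.0, 5.8)

    shapiro.test(a)                   # normality
    var.test(a, b)                    # equality of variances
    t.test(a, b, var.equal = TRUE)    # parametric pairwise comparison

    # One-way analysis of variance on a long-format data frame
    d <- data.frame(value = c(a, b), group = rep(c("A", "B"), each = 6))
    summary(aov(value ~ group, data = d))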
Graphics in ‘base’, ‘ggplot2’, and specialized packages (‘corrplot’, ‘pROC’, etc.): boxplots, correlation tables, trendlines, biplots, heatmaps, dendrograms, receiver operating characteristic (ROC) curves, odds ratios, and response surfaces.
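Two short sketches with the specialized packages named above, using built-in data (corrplot and pROC are assumed to be installed):

    library(corrplot)
    library(pROC)

    # Correlation table of the numeric mtcars variables
    corrplot(cor(mtcars), method = "color", type = "upper")

    # ROC curve for a binary outcome (transmission type) predicted by mpg
    roc_obj <- roc(mtcars$am, mtcars$mpg)
    plot(roc_obj)
    auc(roc_obj)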
The use of regression analysis, including simple linear regression. Including categorical predictors and interaction terms in regression models. Interpreting regression output. Dealing with multicollinearity and heteroskedasticity. Regression analysis with binary dependent variables.
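A minimal sketch of linear and logistic regression on the built-in mtcars data; the ‘car’ package is an assumed add-on for the multicollinearity check:

    # Linear regression with a categorical predictor and an interaction term
    fit <- lm(mpg ~ wt * factor(cyl), data = mtcars)
    summary(fit)

    # Logistic regression for a binary dependent variable
    fit_bin <- glm(am ~ wt + hp, data = mtcars, family = binomial)
    summary(fit_bin)

    # Variance inflation factors as a multicollinearity check
    car::vif(fit_bin)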
Introduction to data wrangling techniques in R using packages like dplyr and tidyr. Learn how to clean, transform, and manipulate datasets to prepare them for analysis. Topics include filtering, selecting, mutating, summarizing data, and handling missing values to ensure data integrity and usability.
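A minimal dplyr sketch on the starwars data bundled with dplyr, chaining the verbs listed above:

    library(dplyr)

    starwars %>%
      filter(!is.na(mass), !is.na(height)) %>%      # handle missing values
      select(name, species, height, mass) %>%
      mutate(bmi = mass / (height / 100)^2) %>%
      group_by(species) %>%
      summarise(mean_bmi = mean(bmi), n = n()) %>%
      arrange(desc(n))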
Advanced data manipulation strategies in R, including reshaping data, joining multiple datasets, and working with complex data structures. Explore best practices for efficient data processing, optimizing code performance, and automating repetitive tasks to streamline the data analysis workflow.
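A minimal sketch of reshaping and joining with tidyr and dplyr; the two small tables are invented for illustration:

    library(dplyr)
    library(tidyr)

    # Two small illustrative tables
    counts <- tibble(gene = c("TP53", "BRCA1"), s1 = c(10, 3), s2 = c(7, 5))
    anno   <- tibble(gene = c("TP53", "BRCA1"), chr = c("17", "17"))

    # Reshape from wide to long, then join the annotation table
    counts %>%
      pivot_longer(cols = c(s1, s2), names_to = "sample", values_to = "count") %>%
      left_join(anno, by = "gene")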
Introduction to reproducible research and the importance of reproducible pipelines in data analysis. Overview of Docker and containerization concepts tailored for R environments. A step-by-step guide to setting up Docker containers for R projects to ensure consistency across different systems. Creating and managing R scripts and their dependencies within Docker containers. Best practices for version control, automation, and documentation in reproducible workflows. Demonstration: Building a simple reproducible R pipeline using Docker, including environment setup, script execution, and container management.
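A minimal Dockerfile sketch for containerizing an R script, assuming the rocker/r-ver base image; the package list and analysis.R are illustrative:

    FROM rocker/r-ver:4.3.2

    # Install the R packages the pipeline depends on
    RUN R -e "install.packages(c('tidyverse', 'rmarkdown'), repos = 'https://cloud.r-project.org')"

    # Copy the analysis script into the image and run it by default
    COPY analysis.R /home/analysis.R
    CMD ["Rscript", "/home/analysis.R"]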
Introduction to Conda and its role in environment and package management for data science projects. Setting up Conda on various operating systems and configuring basic environments. Creating, cloning, and managing Conda environments to handle different project dependencies effectively. Installing and updating packages using Conda, including handling complex dependencies and channels. Integrating Conda environments with popular IDEs and tools like Jupyter Notebook and RStudio. Best practices for environment sharing and reproducibility using environment.yml files. Troubleshooting common issues in Conda environments and optimizing environment performance.
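A minimal command-line sketch of the Conda workflow described above; the environment name and package versions are illustrative:

    # Create and activate an environment with R from the conda-forge channel
    conda create -n r-course -c conda-forge r-base=4.3 r-tidyverse
    conda activate r-course

    # Export the environment so others can reproduce it, and recreate it elsewhere
    conda env export > environment.yml
    conda env create -f environment.yml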
Brief introduction to machine learning types and the implementation of R packages for classification and regression (caret, randomForest, etc.).
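A minimal classification sketch with the randomForest package on the built-in iris data (randomForest is assumed to be installed):

    library(randomForest)
    set.seed(42)

    # Split iris into training and test sets
    idx   <- sample(nrow(iris), 0.7 * nrow(iris))
    train <- iris[idx, ]
    test  <- iris[-idx, ]

    # Fit a random forest classifier and evaluate it on held-out data
    rf   <- randomForest(Species ~ ., data = train, ntree = 500)
    pred <- predict(rf, test)
    table(predicted = pred, observed = test$Species)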
Building interactive web applications with R and Shiny. Putting together layouts, themes, graphics, and user interaction (feedback, upload, download).
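A minimal Shiny sketch combining a layout, an input control, and a reactive plot, using the built-in faithful data:

    library(shiny)

    ui <- fluidPage(
      titlePanel("Histogram demo"),
      sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
      plotOutput("hist")
    )

    server <- function(input, output) {
      output$hist <- renderPlot({
        hist(faithful$eruptions, breaks = input$bins,
             main = "Old Faithful eruption durations", xlab = "Minutes")
      })
    }

    shinyApp(ui, server)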
Comparative analysis of R and Python for bioinformatics and data science applications. Discuss the strengths and weaknesses of each language, scenarios where one may be preferred over the other, interoperability between R and Python, and best practices for integrating both tools into a cohesive workflow. Explore key libraries and frameworks that support advanced data analysis in both environments.
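One way to illustrate interoperability is the reticulate package, which calls Python from R; this sketch assumes a Python installation with NumPy is available:

    library(reticulate)

    # Import a Python module and call it from R
    np <- import("numpy")
    x  <- np$linspace(0, 10, 50L)

    # Results come back as R objects and slot into an R workflow
    summary(as.numeric(x))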
A series of meetings dedicated to developing one's own R workflow on a publicly available dataset, including exploratory and statistical analysis, visualization, and reporting, according to best practices.