Bioinformatics 03 2024
The bioinformatics course consists of three blocks: basic analysis of DNA and protein sequences, principles of DNA sequencing, and molecular phylogenetics. The general goal of the course is to introduce students and auditors to the basics of biological data analysis. Topics include DNA and protein sequences and approaches to aligning them; DNA sequencing methods and their application to various research tasks; visualization and manipulation of genome sequences using genome browsers; and the basics of population genetics and phylogenetic analysis. The course is designed for master's and postgraduate students with basic knowledge of genetics and molecular biology. It aims to build and deepen practical skills in working with biological data. It includes a series of seminars and practical classes in which students complete individual tasks using the methods and approaches discussed in the lectures.
All classes will be given in Ukrainian.
- Definition of bioinformatics. How does it differ from computational biology?
- Information content of nucleotide and amino acid sequences. Shannon entropy equation
- Models that describe the frequency of occurrence of short substrings within a genetic sequence – Bernoulli and Markov chain models
- Likelihood and Bayesian approaches in bioinformatics – toy examples
- Concept of the complexity of genetic sequences. Low-complexity regions and their biological significance
- The concept of pairwise alignment; basic terms (match, mismatch, gap).
- Homology of sequences.
- Methods for evaluation of pairwise alignments: mechanistic and empirical approaches
- Alignment algorithms: dynamic programming (Smith-Waterman, Needleman-Wunsch) and heuristic (BLAST).
- Karlin-Altschul statistics and the Expect value (E-value).
- Newer developments: DIAMOND
- Definition of multiple sequence alignment
- Progressive methods of multiple sequence alignment – ClustalW2.
- Newer multiple sequence alignment tools (T-Coffee, MUSCLE, MSAProbs).
- Consensus sequence
- Position-specific score matrices (PSSM). WebLogo
- Concept of hidden Markov models (HMMs) and a toy example
- TMHMM, GeneMark, Pfam, HHpred
- Why biologists are obsessed with structures
- PDB
- AlphaFold
- RNA databases
- History of the development of DNA sequencing.
- Sanger and Maxam–Gilbert sequencing.
- Next-generation sequencing.
- Third-generation sequencing.
- Principles of genome assembly.
- Problems of functional annotation of genomes.
- Origin of genetic variation and basic concepts of evolutionary biology and molecular evolution.
- Evolutionary forces, the fate of alleles in the population.
- Neutral evolution, mutation–drift equilibrium.
- Wandering through genotype space.
- Phylogenetic trees: nomenclature, tree thinking, editing and format conversion.
- Three main approaches to reconstructing evolutionary history: distance-based methods, maximum likelihood methods and Bayesian inference.
- Substitution models.
- Multilocus phylogenetic analysis; coalescent methods.
- Phylogeny as a backbone and null model of evolutionary biology.
- Population history, demography and selection.
- Method of phylogenetic contrasts.
- Tests for selection. Fst outliers.
- Challenges and limitations of whole-genome phylogenetic reconstruction.
- Genotyping by sequencing and artificial data downscaling. ddRAD sequencing.
Working with NCBI databases: PubMed (DOI, PMID, PMCID); GenBank (file structure); Genome (database structure and its use); Genome browser; Taxonomy; GEO.