Statistics and data visualization for life sciences

General description

The course comprises lectures and hands-on sessions devoted to the basic methods of statistical analysis and data presentation used by biologists. It teaches how to correctly calculate confidence intervals for measured values; build and use calibration curves; compare sample means; conduct statistical tests and account for multiple comparisons; perform analysis of variance (ANOVA); select the appropriate test for an analysis; perform non-linear regression analysis; and present data in a clear and straightforward manner. All the statistical tests discussed are broadly used in scientific publications in the life sciences. The course also contains a chapter on the design of experiments and the rules for creating graphs for journal articles.

Lectures
1. Why, where, and when to use statistics in biology?
Dmytro GOSPODARYOV

Population. Range. Variational series. Sample. Sampling techniques and randomization. Mean. Median. Quartiles and percentiles. Probability distributions: normal, binomial, Poisson, Pearson, and others.

2. Types of measurement errors: systematic errors, random errors, gross errors.
Volodymyr SHVADCHAK

Causes of measurement errors: instrument errors, method errors, environmental errors, and observer errors. How can errors affect data analysis and interpretation? Minimizing measurement errors: calibration, repeated measurements, error correction techniques. Absolute and relative errors. Error propagation. Techniques for quantifying uncertainty due to measurement errors.

3. Variance.
Dmytro GOSPODARYOV

Standard deviation. Standard error of the mean. Z-test, Z-score. Confidence interval and confidence level. Statistical hypotheses. Student's t-test. P-values: definition, interpretation, common misconceptions. Tests for normality.
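
As an illustration of how the standard deviation, the standard error of the mean, and a confidence interval relate to each other, here is a minimal sketch using only the Python standard library. The measurements are hypothetical, and the interval uses the normal (z) critical value as a simplifying assumption; for a sample this small, the critical value of Student's t-distribution would normally be used instead.

```python
import math
import statistics

# Hypothetical enzyme activity measurements (arbitrary units); illustrative only.
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 4.9, 5.2, 5.0]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)   # sample standard deviation (n - 1 in the denominator)
sem = sd / math.sqrt(n)         # standard error of the mean

# Two-sided 95% critical value from the standard normal distribution.
# Assumption: normal approximation; Student's t would give a wider interval here.
z = statistics.NormalDist().inv_cdf(0.975)

ci_low = mean - z * sem
ci_high = mean + z * sem

print(f"mean = {mean:.3f}, SD = {sd:.3f}, SEM = {sem:.3f}")
print(f"95% CI (normal approximation): [{ci_low:.3f}, {ci_high:.3f}]")
```

The interval narrows as the sample size grows, because the standard error shrinks with the square root of n while the standard deviation does not.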

4. Graph preparation.
Volodymyr SHVADCHAK

Selection of graph type. When (not) to use bar graphs. Box plots and strip plots, X-Y scatter. Logic and consistency in color coding of data. The balance between size and information. Text marks, figure legend, figure caption, axis labels.

5. Types of experimental design.
Volodymyr SHVADCHAK

Randomized, randomized block, factorial, and others. Replication. How to calculate the required sample size? Surveys and questionnaire design. Reporting results.

6. Comparison between three or more groups.
Dmytro GOSPODARYOV

Analysis of variance (ANOVA). Tests for homogeneity of variances. Multiple testing. Tukey's honestly significant difference test. Dunnett's test. Scheffé's test. Corrections for multiple testing.
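
To make the idea of a correction for multiple testing concrete, the following sketch implements two common adjustments, Bonferroni and Holm, in plain Python. The function names and the raw p-values are hypothetical and chosen only for illustration; the course may rely on library routines instead.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]

print("Bonferroni:", bonferroni(raw))
print("Holm:      ", holm(raw))
```

Holm's adjusted p-values are never larger than the Bonferroni ones, so the procedure rejects at least as many hypotheses while still controlling the family-wise error rate.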

7. Non-parametric statistics.
Dmytro GOSPODARYOV

Analysis of non-normally distributed data. Chi-square (χ²) test. Survival analysis. Mann-Whitney U test. Kruskal-Wallis test.

8. Simple statistical calculations in Python.
Victor HUSAK

Mean, median, mode, range, variance, standard deviation. Performing statistical tests with Python (t-test, chi-square test, ANOVA). Handling outliers: removal, transformation, imputation.
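
A minimal sketch of the descriptive statistics listed above, using only the standard `statistics` module. The data set is hypothetical, and the 1.5 × IQR rule used to flag outliers is one common convention assumed here, not necessarily the one taught in the lecture.

```python
import statistics

# Hypothetical body-mass measurements (g); the last value is a deliberate outlier.
data = [21.5, 22.0, 22.0, 23.1, 21.8, 22.4, 22.0, 30.2]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
value_range = max(data) - min(data)
variance = statistics.variance(data)   # sample variance (n - 1 in the denominator)
sd = statistics.stdev(data)            # sample standard deviation

# Flag outliers with the 1.5 * IQR rule (an assumed convention).
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"mean = {mean:.2f}, median = {median:.2f}, mode = {mode:.2f}")
print(f"range = {value_range:.2f}, variance = {variance:.2f}, SD = {sd:.2f}")
print("outliers:", outliers)
```

In this example the single extreme value pulls the mean above the median, which is one reason to inspect both before deciding how to handle an outlier.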

9. Correlation.
Victor HUSAK

Pearson correlation coefficient. Spearman's rank correlation. Linear regression. Calibration curve. Least squares approximation. Residual analysis: definition, residual plots, checking assumptions of linear regression.
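
The link between least squares, the Pearson coefficient, residuals, and a calibration curve can be sketched in a few lines of plain Python. The concentrations and absorbance readings below are hypothetical values invented for this illustration.

```python
import math

# Hypothetical calibration standards: concentration (µM) and measured absorbance.
conc = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
absorbance = [0.02, 0.21, 0.40, 0.61, 0.79, 1.01]

n = len(conc)
mean_x = sum(conc) / n
mean_y = sum(absorbance) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in conc)
syy = sum((y - mean_y) ** 2 for y in absorbance)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, absorbance))

slope = sxy / sxx                      # least-squares slope
intercept = mean_y - slope * mean_x    # least-squares intercept
r = sxy / math.sqrt(sxx * syy)         # Pearson correlation coefficient

# Residuals: observed minus fitted values; useful for residual plots.
residuals = [y - (intercept + slope * x) for x, y in zip(conc, absorbance)]

# Invert the calibration line to estimate the concentration of an unknown sample.
unknown_absorbance = 0.50
estimated_conc = (unknown_absorbance - intercept) / slope

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, r = {r:.4f}")
print(f"estimated concentration = {estimated_conc:.2f} µM")
```

Residuals from a least-squares fit with an intercept always sum to zero, so a residual plot is informative through its pattern (curvature, funnel shape) rather than through its average.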

10. Types of non-linear functions.
Dmytro GOSPODARYOV

Exponential, logarithmic, polynomial, power, sigmoidal, and more. Techniques for estimating parameters in non-linear regression, such as least squares and maximum likelihood estimation. The process of constructing a curve that best fits the data points in a non-linear fashion. Overfitting and underfitting. Model selection. Applications of non-linear regression.

11. Basics of graphic design.
Volodymyr SHVADCHAK

Raster and vector images. Software and file types. Representation of colors (CMYK, RGB, HSL). Storytelling with data. How to guide the reader's eyes: contrast and highlights. Readability of graphs. Fonts, line thickness. Hierarchy. Alignments.

Seminars
1. Introduction to R.
Volodymyr SHVADCHAK

2. Introduction to Python for statistical calculations.
Victor HUSAK

Data import. Visualization.

3. Linear and non-linear curve fitting.
Volodymyr SHVADCHAK

Origin software.

4. Data visualization.
Volodymyr SHVADCHAK

Level: Bachelor's and master's students
Lectures: 11
Practical classes: 4
Duration: 1 month
Language: Ukrainian
Certificate: 1 ECTS credit

Lecturers

Associate Professor of the Department of Biochemistry and Biotechnology at the Vasyl Stefanyk Precarpathian National University.

Associate Professor of the Department of Biochemistry and Biotechnology at the Vasyl Stefanyk Precarpathian National University.

Employee of the Precarpathian University (Ivano-Frankivsk) and the Institute of Organic and Bioorganic Chemistry (Prague). Received a doctorate (PhD) in life sciences from Strasbourg University in 2009 for research on the development of solvatochromic fluorescent labels for studies of protein interactions.