Introduction to Bioinformatics: Overview
What's bioinformatics?
Short: Bioinformatics is about connecting computational and biological ideas.
Long: Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information. (NCBI)
Related fields
- Translational bioinformatics
- Health informatics/Biomedical informatics
- Complex systems
- Systems biology
- Biophysics
- Mathematical biology, which tackles biological problems using methods that need not be numerical and need not be implemented in software or hardware
What do you need to know?
Biology / CS (algorithms, programming) / Math / Statistics / Physics / Chemistry
Biology primer
- Prokaryotes vs. eukaryotes; model organisms (E. coli, yeast, C. elegans, fruit fly)
- Cells
- Genome, gene, genotype, phenotype
- DNA, protein, the Central Dogma, the Genetic Code
- Evolution of genes
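The Central Dogma and the Genetic Code can be made concrete in a few lines of code: DNA is transcribed into mRNA (T becomes U), and the mRNA is read three bases at a time, each codon mapping to an amino acid. The sketch below assumes the standard genetic code (NCBI translation table 1), a coding strand given 5' to 3', a single reading frame starting at position 0, and no introns or splicing; `transcribe`, `translate`, and `CODON_TABLE` are hypothetical names chosen for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Amino acids are listed for
# all 64 codons with first, second and third base each in U, C, A, G order;
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """DNA (coding strand, 5'->3') -> mRNA: thymine (T) is replaced by uracil (U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein in one reading frame starting at position 0.

    Stop codons are emitted as '*' rather than ending translation, and a
    trailing partial codon is ignored.
    """
    usable = len(mrna) - len(mrna) % 3
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    mrna = transcribe(dna)
    print(mrna)
    print(translate(mrna))  # prints MAIVMGR*KGAR*
```

Real gene expression also involves splicing, alternative genetic codes (e.g. mitochondrial), and start-codon selection, none of which this sketch models.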
From biological problems to computational problems (algorithms)
- Many biological problems can be formulated as well-known computational problems
- Examples
- How do we compare two proteins? Dynamic programming algorithm (edit distance calculation)
- How do we assemble genomes? Graph algorithms
- Computational abstractions: biological sequences as strings; networks as graphs
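As an illustration of the dynamic programming idea for comparing two sequences, here is a minimal sketch of edit (Levenshtein) distance: the cost of turning one string into another using single-character substitutions, insertions and deletions, all at unit cost. It treats protein sequences as plain strings of one-letter amino acid codes. Practical protein comparison usually replaces these unit costs with a substitution matrix (such as BLOSUM62) and gap penalties, as in Needleman-Wunsch or Smith-Waterman alignment; that refinement is not shown here, and `edit_distance` is a hypothetical name.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of unit-cost substitutions, insertions and deletions
    needed to turn string a into string b."""
    # dp[i][j] holds the edit distance between the prefixes a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i  # turn a[:i] into "" by deleting i characters
    for j in range(len(b) + 1):
        dp[0][j] = j  # turn "" into b[:j] by inserting j characters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            mismatch = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,             # delete a[i-1]
                dp[i][j - 1] + 1,             # insert b[j-1]
                dp[i - 1][j - 1] + mismatch,  # match or substitute
            )
    return dp[len(a)][len(b)]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))    # prints 3
    print(edit_distance("MKTAYIAK", "MKTAHIAK"))  # prints 1 (one substitution)
```

Each cell depends only on its three neighbours above and to the left, so the table is filled in O(len(a) * len(b)) time; this reuse of solved subproblems is what makes the approach dynamic programming.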
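One common graph formulation of genome assembly is the de Bruijn graph: reads are broken into k-mers, each k-mer becomes an edge between its (k-1)-letter prefix and suffix, and the genome is spelled out by a path that uses every edge exactly once (an Eulerian path). The sketch below is a simplification that assumes error-free reads, full coverage of the genome's k-mers, every k-mer occurring only once in the genome, and a single strand (no reverse complements). Real assemblers must handle sequencing errors, repeats and ambiguous paths. All function names are hypothetical.

```python
from collections import defaultdict


def kmers_from_reads(reads: list[str], k: int) -> set[str]:
    """Collect the distinct k-mers in the reads (duplicates are collapsed,
    which assumes every k-mer occurs only once in the genome)."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def de_bruijn_graph(kmers: set[str]) -> dict[str, list[str]]:
    """Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the result deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    out_deg = {node: len(succs) for node, succs in graph.items()}
    in_deg: dict[str, int] = defaultdict(int)
    for succs in graph.values():
        for node in succs:
            in_deg[node] += 1
    # Start at a node with one more outgoing than incoming edge, if one exists.
    start = next(
        (n for n in graph if out_deg[n] - in_deg[n] == 1), next(iter(graph))
    )
    remaining = {node: list(succs) for node, succs in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def assemble(reads: list[str], k: int) -> str:
    path = eulerian_path(de_bruijn_graph(kmers_from_reads(reads, k)))
    # Spell the sequence: the first node, then the last base of each later node.
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # overlapping reads of one genome
    print(assemble(reads, 4))  # prints ATGGCGTGCA
```

This also illustrates the abstractions in the list above: the reads are handled purely as strings, and assembly becomes a path problem on a graph.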
Bioinformatics meets big data
- Adapting bioinformatics curricula for big data [2015]
- Challenges raised by big data
- Data unification: data wrangling, i.e. obtaining the necessary data in the appropriate format, as well as the normalization necessary to make them comparable across sources.
- Computational and storage limitations: the difficulties and costs associated with keeping data, moving data and analyzing data.
- Multiple hypothesis testing: statistically addressing the likelihood of finding spurious associations in large data sets.
- Bias and confounding in the data: challenges arising from which experiments happen to have been performed and which processes are most frequently assayed.
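For the multiple hypothesis testing challenge, the outline does not name a specific method. As one widely used example, the sketch below implements the Benjamini-Hochberg procedure, which controls the false discovery rate (the expected fraction of false positives among the results called significant), alongside the stricter Bonferroni correction, which controls the probability of any false positive. It assumes independent (or positively dependent) tests, as Benjamini-Hochberg requires; the function names are hypothetical.

```python
def bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject hypothesis i if p_i <= alpha / m (family-wise error control)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Return, for each p-value, whether it is rejected while controlling the
    false discovery rate at level alpha (step-up procedure)."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis whose rank is at most cutoff_rank.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    p = [0.01, 0.02, 0.03, 0.04]
    print(bonferroni(p))          # prints [True, False, False, False]
    print(benjamini_hochberg(p))  # prints [True, True, True, True]
```

With four tests, Bonferroni requires p <= 0.0125 for every test, whereas Benjamini-Hochberg compares the k-th smallest p-value with k * 0.0125, so it calls more results significant at the cost of tolerating a controlled proportion of false discoveries.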