BioID

ArtPoon 2017

About

This undergraduate course covers a broad range of bioinformatics methods as they are applied to the study of infectious diseases. The course is organized into topics, each anchored by a recent, high-impact paper from the scientific literature. For each paper, we will cover the overall theme, the context of the specific study, and the underlying model and algorithm, and then run a simplified version of the analysis in the laboratory section.

Learning objectives

  • To develop a fundamental understanding of the concepts underlying the analysis of genetic sequence variation from infectious disease outbreaks (genetic distances, maximum likelihood).
  • To gain basic command-line literacy.
  • To become acquainted with popular software tools used for the analysis of infectious disease sequence data.
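As a small illustration of the first objective, the following sketch computes a genetic distance between two aligned sequences in Python. It is a minimal example, not part of the course materials: the function names `p_distance` and `jukes_cantor` are hypothetical, and it assumes the sequences are already aligned to the same length and that only columns with unambiguous bases (A, C, G, T) should be compared.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of compared sites that differ between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: skip columns with gaps or ambiguous bases in either sequence.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    diffs = sum(1 for a, b in pairs if a != b)
    return diffs / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        # The correction is undefined once the sequences are saturated.
        return float("inf")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor(p))
```

The corrected distance is slightly larger than the raw proportion of differences because the Jukes-Cantor model accounts for multiple substitutions occurring at the same site.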

Outline

  1. Databases
    • GenBank and the birth of bioinformatics
    • What is a database?
    • Sequence formats
  2. Alignment
    • Scoring matrices
    • BLAST queries
    • Smith-Waterman and related algorithms
  3. Genetic diversity
    • Genetic distances
    • Virus nomenclature
    • Molecular epidemiology (genetic clustering)
  4. Building trees
    • Distance-based methods (neighbor-joining)
    • Rooting (outgroup, midpoint)
    • Maximum likelihood
  5. Measuring rates of evolution
    • Diversity
    • Markov chain models (Jukes-Cantor)
    • Rates of evolution
    • Detecting selection
  6. Molecular clocks
    • Rescaling trees
    • Root-to-tip methods
    • Bayesian inference
    • The coalescent
  7. Next-generation sequencing
    • NGS data formats, databases
    • Quality scores
    • Short-read mapping
    • De novo assembly
  8. Applications
    • Pathogen discovery
    • 16S rRNA, microbial ecology
    • RNA-seq
  9. Scripting
    • Scripting languages
  10. Ethics
    • Ethics

GitHub repository

All code used to implement this website can be obtained on GitHub.

License

These course materials, with the exception of the data sets associated with publications from other parties, are released under the Creative Commons Attribution-ShareAlike 4.0 license. Under this license you are free to copy, modify and redistribute this content, even for commercial purposes, so long as you give appropriate credit and distribute any derived content under this same license.