Fall 2009, CAP 6938: Special Topic of Algorithms in Computational Molecular Biology

Instructor: Shaojie Zhang

Lectures: M/W 3:00-4:15 pm HEC 110

Office hours: M/W 1:00-2:00 pm HEC 311

Syllabus: The course will concentrate on computer science aspects of computational molecular biology and will emphasize the recent advances and open problems in the area.

The computational techniques will include algorithms, graph theory, combinatorics, machine learning, etc. Many important bioinformatics topics will be used as examples to illustrate how the formulation and solution of a computer science problem can help to answer a biological question.

This course is designed for computer science graduate students. No biology knowledge is required.

Graduate students with either biology or physical/computer science backgrounds who have taken a fundamental bioinformatics course are also welcome to take this course.

Preliminary topics to be covered: 1. Introduction to algorithms 2. Exact string matching, Data structure: Suffix tree 3. Suffix tree: applications 4. Mismatch tree and the motif finding problem 5. Breaking problems down: dynamic programming 6. Combinatorial search: intractable problems 7. Integer programming; Reductions 8. Graph algorithms: trees 9. Haplotyping via perfect phylogeny 10. Comparing trees 11. De Bruijn graph; Eulerian graphs 12. Minimum Spanning trees; Shortest paths 13. Matching problems; Network flow 14. Breakpoint graph 15. Set cover and set packing 16. Graph-based clustering 17. Heuristic algorithms

Reference Textbooks:

  • P.A. Pevzner, Computational Molecular Biology: An Algorithmic Approach. The MIT Press, 2000 (CMB)
  • D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press. 1997 (Dan)
  • N.C, Jones and P.A. Pevzner. Introduction to Bioinformatics Algorithms. The MIT Press, 2004. (BA)
  • Assignments: We will have 4 take-home assignments.

    Grading: Assignments (40%), Midterm exam (25%), Final exam (25%), Attendance (10%).

    Topics and Tentative Agenda:

    Week Date Topic Slides Notes
    1 08/24 Introduction to algorithm PDF
    08/26 Exact string matching PDF Dan 2.3, Dan 1.4
    2 08/31 Z-value, Suffix trees
    Assignment 1 (Due 09/14)
    PDF Dan 1.4, Dan 6
    09/04 Suffix trees PDF Dan 6
    3 09/07 No Class (Labor Day)
    09/09 Suffix trees See above
    4 09/14 Mismatch tree PDF CMB 8.6
    09/16 Dynamic Programming PDF CMB CMB 6.2 and 6.3
    Myers-Miller algorithm
    5 09/21 DP Applications: RNA folding, complicated parameters PDF mfold
    09/23 DP Applications: RNA alignments with guiding tree
    Assignment 2 (Due 10/7)
    PDF Paper_1
    6 09/28 DP Applications: Pseudoknotted RNA alignments PDF pseudoknot
    09/30 DP Applications: RNA folding, a different formulation PDF RNAscf
    7 10/05 DP Applications: Spliced sequence alignment PDF CMB 9.4
    10/07 NP-hard problem and multiple sequence alignment PDF
    8 10/12 Algorithmic problems related to phylogenetic trees-1 PDF
    10/14 Mid-term Exam (take-home)
    9 10/19 Algorithmic problems related to phylogenetic trees-2 (Fitch Algorithm) See above
    10/21 Algorithmic problems related to phylogenetic trees-3 (perfect phylogeny)
    Assignment 3 (Due 11/04)
    See above
    10 10/26 Algorithmic problems related to phylogenetic trees-4 (SNP) See above PPH
    10/28 Algorithmic problems related to phylogenetic trees-5 (SNP and Haplotyping) See above PPH
    11 11/02 De Bruijn graph and Eulerian graph-1 PDF CMB 5
    11/04 De Bruijn graph and Eulerian graph-2 (fragment assembly) See above CMB 5
    12 11/09 Genome rearrangements and breakpoint graph-1 PDF CMB 10
    11/11 No Class (Veterans Day)
    13 11/16 Genome rearrangements and breakpoint graph-2
    Assignment 4 (Due 11/30)
    See above CMB 10
    11/18 Computational Problems in RNA-seq data analysis
    14 11/23 Computational Proteomics (Spectrum Graphs and Spectral Alignment)-1 PDF CMB 11
    11/25 No Class (Thanksgving Day)
    15 11/30 Computational Proteomics (Spectrum Graphs and Spectral Alignment)-2 See above CMB 11
    11/02 omputational Proteomics (Spectrum Graphs and Spectral Alignment)-3 See above CMB 11

    We are always looking for motivated students. If you are looking for research projects, please get in touch.