Spring 2010 CAP6938: Special Topics in Computational Genomics

Instructor: Shaojie Zhang

Lectures: M/W 3:00-4:15pm BA 221

Office hours: Shaojie Zhang, HEC 311, M/W 2:00 pm - 3:00 pm and 4:30 pm - 5:30 pm, or by appointment.


The course should be self-contained. However, a concise introduction to biology can be found on the Bioinformatics Algorithms website (chapter 3). The text of Molecular Biology of the Cell can also be searched online.

This course will summarize computational techniques for comparing genomes at the DNA and protein sequence levels. Topics include state-of-the-art computational techniques and their applications: understanding hereditary diseases and cancer, mobile genetic elements, genome rearrangements, genome evolution, and the identification of potential drug targets in microbial genomes.

This course is designed for advanced computer science graduate students. Graduate students with an entry-level background in bioinformatics research (e.g. after taking CAP 5510 or an equivalent course) are welcome to take this course. Students with a biology background who are interested in comparative genomics are also welcome.


E. Koonin and M. Y. Galperin: Sequence-Evolution-Function: Computational Approaches in Comparative Genomics, Springer, 2002. (COMP) There is an online version of this book. We will also distribute complementary lecture notes and papers on these topics throughout the course.

Dan Gusfield: Algorithms on Strings, Trees, and Sequences. (ALG) This book covers most of the algorithms we will discuss in class.

Current research papers (2003-2010) from "Nature", "Science", "PLoS Biology", "Genome Research", "Bioinformatics", and other journals will be distributed throughout the course for the different research topics.

Grading: Summaries (40%), Paper presentations (60%).

Summary Guidelines (for Research Paper Reading and Presentation): Read the paper before the lecture. Write a one-page summary of the paper that will be discussed in class. Make sure to write down the biological problem and the computational problem hidden inside the paper. Send the summary to me by email before the lecture (12:00 pm sharp).

Paper Presentation Guidelines: Read the paper first, then meet with me 1-2 weeks before the lecture to discuss the paper. Meet with me 3-5 days before the lecture to discuss the slides. Slides are due at noon (sharp) on the day of the lecture. Please make the appointments by email.

Topics and Tentative Schedule:

Date Topic Slides Book References/Papers Note
L1: 01/11 Course Introduction PDF
L2: 01/13 1. Genome Alignments
1.1 Overview of Sequence Alignment Algorithms
01/18 No Class (MLK Day)
L3: 01/20 1.2 Overview of Sequence Alignment Algorithms (2) PDF ALG 14
L4: 01/25 1.3 Overview of Sequence Alignment Algorithms (3) PDF COMP 4.4/ALG 11
Smith-Waterman Algorithm
Myers-Miller Algorithm (Linear Space Alignment)
L5: 01/27 1.4 Overview of Sequence Alignment Algorithms (4) PDF ALG 14
L6: 02/01 1.5 Overview of Sequence Alignment Algorithms (5) PDF ALG 12.5.2
L7: 02/03 1.6 Genome Alignment Algorithms PDF LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA, Genome Research
L8: 02/08 2. Genome Rearrangements and Genome Evolution
2.1 Genome Rearrangements
PDF Towards a Computational Theory of Genome Rearrangements
L9: 02/10 2.2 Cancer Genomics PDF Reconstructing tumor genome architectures, Bioinformatics
L10: 02/15 2.3 Whole Genome Duplications PDF Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae, Nature
L11: 02/17 2.3 Micro Rearrangements Microinversions in mammalian evolution, PNAS
L12: 02/22 3. Whole Genome Sequencing Fragment assembly with short reads, Bioinformatics
De novo fragment assembly with short mate-paired reads: Does the read length matter?, Genome Research
L13: 02/24 3.2 Genome Assembly see above
L14: 03/01 4. Gene Prediction
L15: 03/03 5. Gene Regulation and Micro-array
03/08, 03/10 No Class (Spring Break)
L16: 03/15 6. Repeats in Genomes
6.1 Repeat Identifications
De novo identification of repeat families in large genomes, Bioinformatics Peter Clements
L17: 03/17 6.2 Transposable Elements Identifications Identification of transposable elements using multiple alignments of related genomes, Genome Research Stephen Fulwider
L18: 03/22 6.3 ALU Evolutions Whole-genome analysis of Alu repeat elements reveals complex evolutionary history, Genome Research Sonal Gadia
L19: 03/24 6.4 Transposable Elements and piRNA Population dynamics of PIWI-interacting RNAs (piRNAs) and their targets in Drosophila, Genome Research Travis Roe
L20: 03/29 7 Motifs Discovery in Genomes
7.1 Phylo_HMM
Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes, Genome Research Pengju Shang
L21: 03/31 7.2 Motifs Discovery Through Comparative Genomics Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals, Nature Zhengkai Wu
L22: 04/05 7.3 Regulatory Network Assigning roles to DNA regulatory motifs using comparative genomics, Bioinformatics Chris Zonca
L23: 04/07 8 Finding Non-coding RNAs in Genomes
8.1 Introduction and RNAz
1. Secondary Structure Prediction for Aligned RNA Sequences, Journal of Molecular Biology
2. Consensus Folding of Aligned Sequences as a New Measure for the Detection of Functional RNAs by Comparative Genomics, Journal of Molecular Biology
3. Fast and reliable prediction of noncoding RNAs, PNAS
4. Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome, Nature Biotechnology
L24: 04/12 8.2 RNA Clustering Inferring Noncoding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering, PLoS Computational Biology Sonal Gadia
L25: 04/14 8.3 RNA Clustering RNA stem-loops: To be or not to be cleaved by RNAse III, RNA Peter Clements
L26: 04/19 8.4 sRNA IntaRNA: efficient prediction of bacterial sRNA targets incorporating target site accessibility and seed regions, Bioinformatics Chris Zonca
L27: 04/21 9 Metagenomics MEGAN analysis of metagenomic data, Genome Research Pengju Shang
L28: 04/21 9.2 Metagenomics UniFrac: a New Phylogenetic Method for Comparing Microbial Communities, Applied and Environmental Microbiology Zhengkai Wu
L29: 04/26 10 Next-Generation Sequencing Technologies Applications A comprehensive catalogue of somatic mutations from a human cancer genome, Nature Travis Roe
L30: 04/26 10.2 Next-Generation Sequencing Technologies Applications Population genetic inference from genomic sequence variation, Genome Research Stephen Fulwider


We are always looking for motivated students. If you are looking for research projects, please get in touch.