Tentative Course Schedule (This may be updated without prior notice)
Week
Day
Topics
Readings
HW Assignments & Project
1
1/9
Introduction and Overviews
Computer System Paper Evaluation, Common Bugs in Writing, Presentation Skills
1/11
Fundamentals of Computing: Past and Present
2
1/16
No Class (Martin Luther King Jr. Day)
1/18
Data-intensive Scalable Computing, Cloud, and HPC
As organizations of all types struggle to keep pace with the explosion of data deluging their daily operations, deciphering the complexities of data mining, analytics, and data processing will require advanced HPC technologies.
3
1/23
Quantitative Principles of Computing
Textbook: Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, Chapter 1
Reference textbook: Computer Architecture, Chapter 6
1/25
MapReduce
4
1/30
Updates on MapReduce
2/1
Other data processing models
#4: PrIter: A Distributed Framework for Prioritized Iterative Computations. SOCC 2011. (Bao)
#5: Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. EuroSys 2007. (Mendez)
5
2/6
Parallel I/O in Practice
#6: Hadoop (Jie Chen)
#7: POSIX-IO and MPI-IO (Junyao Zhang)
2/8
Data-intensive HPC Analytics
#9: MRAP: A Novel MapReduce-based Framework to Support HPC Analytics Applications with Access Patterns. ACM High Performance Distributed Computing, June 2010, Chicago, IL, USA. (Mendez)
#8: Parallel Virtual File System (Jie Chen)
6
2/13
Data-intensive File System
#10: DOT: A Matrix Model for Analyzing, Optimizing and Deploying Software for Big Data Analytics in Distributed Systems. SOCC 2011. (Carcheri)
#11: The Duality of PVFS and HDFS. SC11. (Xiao)
2/15
Data Placement
#13: Volley: Automated Data Placement for Geo-Distributed Cloud Services. S. Agarwal, J. Dunagan, et al. NSDI 2010. (Zittrower)
#14: RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems. ICDE 2011. (Carcheri)
Term Project Proposal Due
7
2/20
Term Project Proposal Presentation
2/22
8
2/27
Scientific Workflows
#15: Auto-Scaling to Minimize Cost and Meet Application Deadlines in Cloud Workflows. SC11. (Bai)
#16: Scientific Workflows and Clouds. ACM, Spring 2010. (Zittrower)
2/29
Beyond Cloud Computing
9
3/5-3/10
No Class (Spring Break)
10
3/12
Reserved for Guest Lectures
Guest talk by Prof. Shaojie Zhang
3/14
Virtual Machines
#19: Memory Resource Management in VMware ESX Server. Carl A. Waldspurger. OSDI 2002. (Luna)
#20: Differential Virtual Time (DVT): Rethinking I/O Service Differentiation for Virtual Machines. Mukil Kesavan, Ada Gavrilovska, Karsten Schwan. SOCC 2010. (Luna)
11
3/19
No Class
3/21
Cloud provider building blocks/Amazon AWS
#21: The eucalyptus open-source cloud-computing system. Daniel Nurmi, Rich Wolski, et al. CCGRID 2009. (Wertz)
#22: Beyond Virtual Data Centers: Toward an Open Resource Control Architecture. Jeff Chase, Laura Grit, et al. ICVCI 2007. (Muhmud)
12
3/26
Microsoft Windows Azure
#23: Amazon Web Services Links and Resources (textbook, p. 231) (Kwok)
#24: Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency. SOSP 2011. (Kwok)
3/28
Berkeley cluster scheduling papers
13
4/2
More on Cluster and Scalable OSs and Schedulers
4/4
No Class
14
4/9
VM image storage
4/11
Cloud storage
15
4/16
Final Project Presentations
4/18
16
4/23
Term Project Due
Hadoop acceleration through network levitated merge, SC11
In situ data processing for extreme scale computing, DOE SciDAC 2011
Cloud Versus In-house Cluster: Evaluating Amazon Cluster Compute Instances for Running MPI Applications, SC11
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats, Yi Wang, Wei Jiang and Gagan Agrawal. IEEE/ACM CCGrid (CCGrid'12), May 2012, Ottawa, Canada.