Tentative Course Schedule (This may be updated without prior notice)

Week

Day

Topics

Readings

HW Assignments & Project

1

1/9

Introduction and Overviews

 Computer System Paper Evaluation, Common Bugs in Writing, Presentation Skills

1

1/11

Fundamentals of Computing in the past and at present

 

2

1/16

No Class: Martin Luther King Jr. Day

2

1/18

Data-intensive Scalable Computing and Cloud, and HPC

  • Data-Intensive Supercomputing: The case for DISC. Randal E. Bryant. Carnegie Mellon University technical report CMU-CS-07-128 (pdf).

As organizations of all types struggle to keep pace with the explosion of data deluging their daily operations, handling the complexities of data mining, analytics, and data processing will require advanced HPC technologies.

3

1/23

Quantitative Principles of Computing

Textbook, Distributed and Cloud Computing, from parallel processing to the Internet of things, Chapter 1

Reference Textbook, Computer Architecture, Chapter 6

3

1/25

MapReduce

4

1/30

Updates on MapReduce

4

2/1

Other data processing models

#4. PrIter: A Distributed Framework for Prioritized Iterative Computations SOCC11. (Bao)

#5: Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Eurosys 2007. (Mendez)

5

2/6

Parallel I/O in Practice

#6 Hadoop (Jie Chen)

#7 POSIX-IO and MPI-IO (Junyao Zhang)

5

2/8

Data-intensive HPC Analytics

#9 MRAP: A Novel MapReduce-based Framework to Support HPC Analytics Applications with Access Patterns, ACM High Performance Distributed Computing. June 2010, Chicago, IL, USA. (Mendez)

#8 Parallel Virtual File System (Jie Chen)

6

2/13

Data-intensive File System

#10 DOT: a matrix model for analyzing, optimizing and deploying software for big data analytics in distributed systems. SOCC 2011 (Carcheri)

#11 The duality of PVFS and HDFS, SC11 (Xiao)

6

2/15

Data Placement

#13: Volley: Automated Data Placement for Geo-Distributed Cloud Services. S. Agarwal, J. Dunagan, et al. NSDI 2010. (Zittrower)

#14 RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems ICDE11. (Carcheri).

Term Project Proposal Due

7

2/20

Term Project Proposal Presentation

7

2/22

Term Project Proposal Presentation

 

8

2/27

Scientific Workflows

#15. Auto-Scaling to Minimize Cost and Meet Application Deadlines in Cloud Workflows, SC11 (Bai)

#16. Scientific workflows and clouds, ACM Spring 2010 (Zittrower)

 

8

2/29

Beyond Cloud Computing

9

3/5-3/10

No Class: Spring Break

 

 

10

3/12

Reserved for Guest Lectures

Prof. Shaojie Zhang’s guest talk

 

10

3/14

Virtual Machines

#19: Memory Resource Management in VMware ESX Server. Carl A. Waldspurger. OSDI 2002. (Luna)

#20: Differential Virtual Time (DVT): Rethinking I/O Service Differentiation for Virtual Machines. Mukil Kesavan, Ada Gavrilovska, Karsten Schwan. SOCC 2010. (Luna)

 

11

3/19

Class does not meet

 

11

3/21

Cloud provider building blocks/Amazon AWS

#21: The eucalyptus open-source cloud-computing system. Daniel Nurmi, Rich Wolski, et al. CCGRID 2009. (Wertz)

#22: Beyond Virtual Data Centers: Toward an Open Resource Control Architecture. Jeff Chase, Laura Grit, et al. ICVCI 2007. (Muhmud)

 

12

3/26

Microsoft Windows Azure

#23 Amazon Web Services Links and Resources (textbook p. 231, links) (Kwok)

#24 Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency, SOSP 2011 (pdf, ppt) (Kwok)

 

12

3/28

Berkeley cluster scheduling papers

 

13

4/2

More cluster/scalable OSs/schedulers

 

13

4/4

Class does not meet

 

14

4/9

VM image storage

 

14

4/11

Cloud storage

 

15

4/16

Final project presentation

 

15

4/18

Final project presentation

 

 

16

4/23

 

 

Term Project Due

Hadoop acceleration through network levitated merge, SC11

In situ data processing for extreme scale computing, DOE SCIDAC 2011

Cloud Versus In-house Cluster: Evaluating Amazon Cluster Compute Instances for Running MPI Applications, SC11

SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats. Yi Wang, Wei Jiang, and Gagan Agrawal. IEEE/ACM CCGrid (CCGrid'12), May 2012, Ottawa, Canada.