Distributed Systems
CSCI-B 534/ENGR-E 510 (Spring 2021)

Course Description

Distributed computing systems are complex, difficult to understand, and everywhere.

This course will cover the necessary principles, techniques, and tools for understanding, analyzing, and building distributed applications and systems. We will look at distributed computing fundamentals and study the design of popular distributed systems. We will also examine blockchains from an academic distributed systems perspective.

We will look at how systems communicate and coordinate through message passing, and study classical distributed algorithms involving logical and vector clocks, leader election, fault tolerance, and consensus. Students will also learn about the design of large-scale distributed systems, and will be expected to implement many of the ideas studied in class as part of homework assignments and projects.
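As a small taste of the classical algorithms mentioned above, here is a minimal sketch of a Lamport logical clock in Python. The `Process` class and its method names are illustrative assumptions, not course material or starter code for any assignment. The sketch only shows the two standard rules: increment the clock on every local event or send, and on receipt advance past both the local clock and the timestamp carried by the message.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """A process with a Lamport logical clock (hypothetical helper class)."""
    name: str
    clock: int = 0

    def local_event(self) -> int:
        # Rule 1: increment the clock on every local event.
        self.clock += 1
        return self.clock

    def send(self) -> int:
        # Sending is also an event; the message carries the new timestamp.
        self.clock += 1
        return self.clock

    def receive(self, msg_timestamp: int) -> int:
        # Rule 2: move past both the local clock and the message timestamp.
        self.clock = max(self.clock, msg_timestamp) + 1
        return self.clock


p, q = Process("p"), Process("q")
p.local_event()          # p's clock becomes 1
ts = p.send()            # p's clock becomes 2; the message carries 2
q.local_event()          # q's clock becomes 1
recv_ts = q.receive(ts)  # q's clock becomes max(1, 2) + 1 = 3
```

With these rules, the send event always has a smaller timestamp than the matching receive event, which is the property that lets logical clocks respect the happened-before order.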

Prerequisites

Distributed systems build upon and extend many classical areas in Computer Science. Strong fundamentals in Operating Systems, Computer Networks, and Algorithms are a must.

Textbooks

We will use a combination of books and research papers.

  • Required: Distributed Systems: Principles and Paradigms, 3rd Edition (Maarten Van Steen and Andrew Tanenbaum) Online version
  • Recommended: Elements of Distributed Computing (Vijay Garg)

Learning Objectives

  • A fundamental shift in how you think about computing: from serial programs to loosely coupled, asynchronous distributed systems.
  • Design and implement moderately complex distributed systems of your own.
  • Understand classic distributed algorithms for synchronization, consistency, fault tolerance, etc.
  • Reason about the correctness of distributed algorithms, and derive your own algorithms for special cases.
  • Understand how modern distributed systems are designed and engineered.

Format

This course is designed and optimized for in-person Socratic teaching. A typical in-class lecture starts with a simplistic solution, which we then collaboratively iterate on to develop the final, correct solution.

Unfortunately, this semester (Spring 2021) is fully virtual. Videos will be posted ahead of time on YouTube (links will be posted on Canvas). During class hours (on Zoom), there will be quizzes and discussion about the key learning objectives from the previous lecture videos.

Syllabus

Lecture | Topic | Reading | Notes
1 | Introduction to Distributed Computing | Chapter 1 | Lec1-slides
2 | Building blocks: OS processes and threads | Chapter 3 | Lec2-slides
3 | Computer Networks | Chapter 4 | Lec3-slides
4 | Remote Procedure Calls | Birrell and Nelson | Lec4-slides
5 | MapReduce | MapReduce paper | Lec5-slides
6 | High-level communication and publish-subscribe | ZeroMQ, Kafka | Lec6-slides
7 | Event ordering and logical clocks | Lamport Clocks, Chapter 6 | Lec7-slides
8 | Vector clocks and applications | Garg Chapter 4 | Lec8-slides, Vector clock proof
9 | Vector clock applications and Causal Orders | Garg Chapter 4, 6 | Excluded from mid-term
10 | Mutual exclusion and leader election | Chapter 6 | Lec10-slides
11 | Distributed Snapshots | Chapter 10 from Garg | Lec11-slides
12 | Load balancing | | Lec12-notes
13 | Consistency Models: Sequential Consistency | Chapter 7 | Lec13-slides
14 | Causal Consistency models | Chapter 7 | Lec14-slides
15 | CAP Theorem, Eventual Consistency | | Lec15-slides
16 | CRDT | | Lec16-slides
17 | Failures | Chapter 8 | Lec17-slides
18–19 | Consensus: Paxos | Chapter 8 | Lec18-slides

Overflow:

20 | Raft and Zookeeper | | raft, Zookeeper
21 | Byzantine fault tolerance | Chapter 8 | Lec21-slides
22 | Spark Fault Tolerance | | Spark
23 | No class, Project hacking | |
24 | Mid-term 2 | |
25 | Mid-project presentations | |
26 | Distributed Filesystems | NFS, Ceph | Lec22-slides
27 | Distributed Machine Learning | TensorFlow | Lec23-slides
28 | Distributed Resource Management | Mesos, DRF, Sparrow | Lec24-slides
29 | Course Review and Recap | |
24–25 | Cloud Computing | |
28 | High-performance key-value stores | Redis, ScyllaDB |

Important Dates

Date | Event
Around Lecture #12 | Mid-term 1

Evaluation Criteria

The rough breakdown is as follows:

   
Mid-term | 20%
Final | 30%
Assignments and Homework | 40%
Class participation and Quizzes | 10%

Exams

The exams will test how well students have understood various distributed algorithms, correctness proofs, edge cases, trade-offs, and real-life implementation considerations.

Assignments

The assignments will be a mix of theory and distributed system design. Students will implement various classic distributed algorithms, such as MapReduce, totally ordered multicast, logical clocks, and various consistency models in a distributed key-value store.

The design-oriented assignments will involve a large degree of programming and debugging. In most cases, the programming assignments are language-agnostic (you can pick any reasonable programming language).

A key learning objective of this course is to design, architect, and implement a distributed system from scratch, and to design useful test cases for evaluating the implementation. Therefore, no starter code or templates will be provided, to give students maximum flexibility and freedom to explore the unconstrained design space. Points will be awarded for correct and faithful designs, complete implementation, adequate testing, and reports and documentation.

Active learning/In-person class participation

Students will learn about distributed algorithms through group activities in class. Typically, small groups of students will "emulate" a message-passing-based distributed algorithm by passing messages on post-it notes.

Michelin Star Grading

The grading in this course will favor students who turn in exceptional programs, reviews, and exam answers. To this end, we will use a "Michelin Star" system where points are awarded for high-quality course products. Going above and beyond the standard evaluation criteria will fetch multiple stars. Students are eligible for an A (or A+) grade only if they have at least one "star" across the course. Thus, it is not enough to turn in work that is merely correct. Students with a few stars are automatically eligible for A/A+ grades irrespective of their performance in the rest of the course.

Examples of high-quality work

  • Programs that are well documented, have a clean design, and implement something non-trivial in a clever way.
  • Proofs that are correct and concise.
  • Insightful and thoughtful paper reviews.
  • Exam answers that are crisp, insightful, and show a deep or unique understanding of the subject matter.
  • A great question or answer during class discussions or office hours.

Late submission policy

Students may use a total of four late submission days however they wish.

Administrative Information

Class Information

Session | When
Main Class | Tuesdays and Thursdays, 9:25–10:40 am
Lab 1 | Friday, 9:25–10:40 am
Lab 2 | Friday, 11:30 am–12:45 pm

Labs serve as office hours and assignment help for all students. Grading will also take place during these times, and students will be asked to explain and justify their work.

Office Hours

Who | Email | Office Location | Office Hours
Prateek Sharma | prateeks @iu | Luddy 4126 | Wed 9–10 am, or by appointment
Sahil Tyagi | styagi @iu | | Lab 1 time
Tingyi Wanyan | tiwanyan @iu | | Lab 2 time

Author: prateek s

Created: 2021-05-02 Sun 18:54
