Distributed Systems
CSCI-B 534/ENGR-E 510 (Spring 2022)

Course Description

Distributed computing systems are complex, difficult to understand, and everywhere.

This course will cover the principles, techniques, and tools needed to understand, analyze, and build distributed applications and systems. We will look at both distributed computing fundamentals and the design of popular distributed systems.

We will look at how systems communicate and coordinate through message passing, and study classical distributed algorithms involving logical and vector clocks, leader election, fault tolerance, data consistency, and consensus. Students will also learn about the design of large-scale distributed systems and will be expected to implement many of the ideas studied in class as part of homework assignments and projects.

Prerequisites

"To be able to use a second computer, you must know how to use the first one".

Distributed systems build upon and extend many classical areas in Computer Science. Strong fundamentals in Operating Systems, Computer Networks, and Algorithms are a must.

Textbooks

We will use a combination of books and research papers.

  • Required: Distributed Systems: Principles and Paradigms, 3rd Edition (Maarten van Steen and Andrew Tanenbaum), available online
  • Recommended: Elements of Distributed Computing (Vijay Garg)

Learning Objectives

  • A fundamental shift in how you think about computing: from serial programs to loosely coupled, asynchronous distributed systems.
  • Design and implement moderately complex distributed systems of your own.
  • Understand classic distributed algorithms for synchronization, consistency, fault tolerance, etc.
  • Reason about the correctness of distributed algorithms, and derive your own algorithms for special cases.
  • Understand how modern distributed systems are designed and engineered.

Format

This course is designed and optimized for in-person Socratic teaching. A typical in-class lecture starts with a simplistic solution and collaboratively iterates on it to develop the final, correct solution.

Unfortunately, this semester (Spring 2021) is fully virtual. Lecture videos will be posted ahead of time on YouTube (links will be posted on Canvas). During class hours (on Zoom), there will be quizzes and discussions of the key learning objectives from the previous lecture videos.

Syllabus

Lecture | Topic | Reading | Notes
Module A: Overview and whirlwind intro
1 | Introduction to Distributed Computing | Chapter 1 | Lec1-slides
2 | Event ordering and logical clocks | Lamport Clocks, Chapter 6 | Lec7-slides
3 | Total Order Multicast | | [See previous]
4 | Vector Clocks | | Lec8-slides, Vector clock proof
Module B: Networking
5, 6 | Computer Networks | Chapter 4 | Lec3-slides
7 | Remote Procedure Calls | Birrell and Nelson | Lec4-slides
8 | High-level communication and publish-subscribe | ZeroMQ, Kafka | Lec6-slides
9, 10 | MapReduce | MapReduce paper | Lec5-slides
Module C: Classic Distributed Algorithms
10 | Vector clock applications and causal orders | Garg Chapters 4, 6 | Excluded from mid-term
11 | Mutual exclusion and leader election | Chapter 6 | Lec10-slides
12 | Distributed Snapshots | Garg Chapter 10 | Lec11-slides
Module D: Distributed Data Storage
13 | Load balancing | | Lec12-notes
14 | Consistency Models: Sequential Consistency | Chapter 7 | Lec13-slides
15 | Causal Consistency Models | Chapter 7 | Lec14-slides
16 | CAP Theorem, Eventual Consistency | | Lec15-slides
17 | CRDT | | Lec16-slides
18 | Failures | Chapter 8 | Lec17-slides
19–20 | Consensus: Paxos | Chapter 8 | Lec18-slides
Overflow
20 | Raft and ZooKeeper | | Raft, ZooKeeper
21 | Byzantine fault tolerance | Chapter 8 | Lec21-slides
22 | Spark Fault Tolerance | | Spark
26 | Distributed Filesystems | NFS, Ceph | Lec22-slides
27 | Distributed Machine Learning | TensorFlow | Lec23-slides
28 | Distributed Resource Management | Mesos, DRF, Sparrow | Lec24-slides

Important Dates

Date | Event
Around Lecture 12 | Mid-term 1

Evaluation Criteria

The rough breakdown is as follows:

Component | Weight
Mid-term | 20%
Final | 30%
Assignments and Homework | 40%
Class participation and Quizzes | 10%

Exams

The exams will test how well students have understood various distributed algorithms, correctness proofs, edge-cases, tradeoffs, and real-life implementation considerations.

Programming Assignments

The assignments will be a mix of theory and distributed system design. Students will implement various classic distributed algorithms, such as MapReduce, totally ordered multicast, logical clocks, and various consistency models in a distributed key-value store.

The design-oriented assignments will involve a large degree of programming and debugging. In most cases, the programming assignments are language-agnostic (you can pick any reasonable programming language).

A key learning objective of this course is to design, architect, and implement a distributed system from scratch, and to design useful test cases for evaluating the implementation. Therefore, no starter code or templates will be provided, giving students maximum flexibility and freedom to explore the unconstrained design space. Points will be awarded for correct and faithful designs, complete implementation, adequate testing, and reports and documentation.

Most programming assignments will take significantly longer than you anticipate. Start early. Please see the assignment descriptions below (from last year) to get a sense of what they will look like. In general, all programming assignments in this course specify only the "end goal", and you must figure out how to get there: what to implement and how, which libraries to use, etc. There will be no starter code, no templates, no training wheels. You are on your own.

  • Distributed Map-Reduce
  • Total Order Multicast
  • Project: Distributed KV Store

Homework

Classic distributed systems papers will be assigned for reading and review.

Active learning/In-person class participation

Students will learn about distributed algorithms using group activities in class. Typically, small groups of students will "emulate" a message-passing-based distributed algorithm, by passing messages (on post-it notes).

Late submission policy

Students may use a total of four late submission days, distributed across assignments as they wish.

Administrative Information

Class Information

Session | When | Where
Main Class | Tuesdays and Thursdays, 09:45A-11:00A | Luddy AI 1001
Lab 1 | Fridays, 09:45A-11:00A | Ballantine Hall 308
Lab 2 | Fridays, 11:30A-12:45P | Ballantine Hall 308

Labs serve as office hours and assignment help for all students. Grading will also be performed during these times, where students will be asked to explain and justify their work.

Office Hours

Who | Email | Office Location | Office Hours
Prateek Sharma | prateeks @iu | Luddy 4126 | Wed 9–10 am, or by appointment
Sahil Tyagi | styagi @iu | | Lab 1 time
Shubham Mohapatra | shmoha @iu | | Lab 2 time
Alex Fuerst | alfuerst @iu | |

Author: prateek s

Created: 2022-01-18 Tue 08:29
