CS 525, Spring 2015: Course Schedule

Date	Presenters, Slides, and Reviews	Topic	Main Papers	More Papers (optional) *Must-see Papers if your Project overlaps with the area.*
1/20	Indy [ppt] [pdf]	Introduction	The mathematical theory of infectious diseases and its applications, N.T.J. Bailey, 1975 (out of print)	See topic "Spreading the Rumor" below
1/22	Indy [ppt] [pdf]	Before, There Were Clouds	Historical reflections: The rise, fall, and resurrection of software as a service, M. Campbell-Kelly, CACM, May 2009. Above the clouds (see the latest version of the paper on the site), M. Armbrust et al, Berkeley RADLAB, 2009. Larry Ellison's Rant on Cloud Computing (Youtube video) You can join the Googlegroups on Cloud Computing	Open Cirrus™ Cloud Computing Testbed: Federated Data Centers for Open Source Systems and Services Research, R. Campbell, I. Gupta, et al, HotCloud 2009 [HotCloud Version] Datatecture: Data Center Overload, Tom Vanderbilt, New York Times Magazine, June 2009 Amazon EC2 and S3 Google AppEngine Others: IBM Blue Cloud, SUN network.com, others (Joyent, Flexiscale, GoGrid) - see the GoogleGroups Cost of a Cloud: Research Problems in Data Center Networks, A. Greenberg et al, ACM SIGCOMM CCR, 2009 A BluePrint for Introducing Disruptive Technology into the Internet, L. Peterson et al Economic Perspectives on the History of the Computer Timesharing Industry, M. Campbell-Kelly and D. Garcia-Swartz PlanetLab website Emulab Website ModelNet website OpenCirrus
1/27	Indy [ppt] [pdf]	Cloud Computing Continued	MapReduce: Simplified Data Processing on Large Clusters, J. Dean et al, OSDI 2004 (Google) Grid: a new infrastructure for 21st century science, I. Foster, Physics Today, 2002 (Argonne)	Parallel Computing on the Berkeley NOW, D. E. Culler et al, JSPP 1997 Cloudera's Video Tutorials on Hadoop and HDFS List of Cloud Computing Providers Hadoop Tutorial (and website) Some open source Cloud Computing Projects Hadoop Summit and Data-Intensive Symposium at CMU/Yahoo in March 2008 Tashi Project (CMU) Deter Testbed (UC Berkeley) Hadoop-on-demand The anatomy of the Grid: enabling scalable virtual organizations, I. Foster et al, Intnl Journal High Perf. Computing Appl. 2001
1/29	Indy [ppt] [pdf]	P2P Systems	The Gnutella protocol specification v 0.4	See topic "Overlays and DHTs" below BitTorrent Protocol Specification and BitTorrent Economics Paper Freenet: a distributed anonymous information storage and retrieval system, I. Clarke et al, 2000
1/29	Last day to sign up for a presentation slot (2/12/15 through 3/20/15). Two presenters per slot only. To sign up, you must come to Indy's office hours. Instructions for Presentations and Reviews
2/3	Indy [ppt] [pdf]	P2P Systems (contd.)	Chord: a scalable peer-to-peer lookup service for Internet applications, I. Stoica et al, SIGCOMM 2001 Pastry: scalable, distributed object location and routing for large-scale peer-to-peer systems, A. Rowstron et al, Middleware 2001. Kelips, I. Gupta et al, IPTPS 2003	Tutorial on CSP (Communicating Sequential Processes), Tony Hoare [Free Book] [CACM 1978 paper] State Machine Approach: A Tutorial, F. Schneider, ACM CSUR 1990. Resilient overlay networks, D. Andersen et al, SOSP 2001 A scalable content addressable network, S. Ratnasamy et al, SIGCOMM 2001 Kelips, I. Gupta et al, IPTPS 2003 A routing underlay for overlay networks, A. Nakao et al, SIGCOMM 2003 Viceroy: a scalable and dynamic emulation of the butterfly, D. Malkhi et al, PODC 2002
2/5	Indy [ppt] [pdf]	Key-value Stores and NoSQL	Cassandra NoSQL Presentation Cassandra 1.0 documentation at datastax.com Cassandra Apache wiki HBase Others: MongoDB	Dynamo: Amazon's highly-available key-value store, DeCandia et al, SOSP 2007 Project Voldemort, Linkedin Comet: An Active Distributed Key-Value Store, R. Geambasu et al, OSDI 2010
2/10	Indy [ppt] [pdf]	Basic Distributed Algorithms Fundamentals and Sensor Networks	Time, clocks and the ordering of events in a distributed system, L. Lamport, Communications ACM 1978 Distributed snapshots: determining global states of distributed systems, Chandy and Lamport, ACM TOCS 1985 Impossibility of distributed consensus with one faulty process, Fischer, Lynch and Paterson, Journal ACM 1985 Smart Dust TinyOS	Research challenges in wireless networks of biomedical sensors, L. Schwiebert, ACM Sigmobile 2001 Research challenges in environmental observation and forecasting systems, D.C. Steere et al, Mobicom 2000 Design considerations for distributed microsensor systems, A. Chandrakasan et al, CICC 1999
2/12	Ashutosh Dhekne & Sayedhadi Hashemi Slides 1: [ppt] [pdf] Slides 2: [ppt] [pdf]	Paxos and Committing	Please don't review the first paper (Indy will briefly present this paper) Paxos Made Simple, L. Lamport. Indy's slides: [ppt] [pdf] Student Presentations and Reviews Start - See Instructions (Review these) - Student Presenters will present these Paxos Quorum Leases: Fast Reads Without Sacrificing Writes, Iulian Moraru, David G. Andersen, Michael Kaminsky, SoCC 2014 Low-latency multi-datacenter databases using replicated commit, H. Mahmoud et al, VLDB 2013.	There is more consensus in egalitarian parliaments, I. Moraru, D. G. Andersen, M. Kaminsky, SOSP 2013. Paxos made live - an engineering perspective, T. Chandra et al, PODC 2007 Using Paxos to build a scalable, consistent, and highly available datastore, J. Rao, E. Shekita, S. Tata, VLDB 2011 Chubby lock service, M. Burrows, OSDI 2006 (Google) Paxos replicated state machines as the basis of a high-performance data store, W. Bolosky et al, NSDI 2011
2/17	Shyam Rajendran & Saikat Roychoudhury Slides 1: [ppt] [pdf] Slides 2: [pps] [pdf] [Demo]	Cloud Programming	Hive - a warehousing solution over a map-reduce framework A. Thusoo et al, VLDB 2009 Storm (use the wiki or other web resources)	Naiad: a timely dataflow system, D. G. Murray, et al, SOSP 2013 Pig latin: a not-so-foreign language for data processing, C. Olston et al, SIGMOD 2008 (Yahoo!) MegaPipe: A New Programming Interface for Scalable I/O, S. Han, S. Marshall, B.-G. Chun, S. Ratnasamy, OSDI 2012 DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, Yuan Yu et al, OSDI 2008 Large-scale Incremental Processing Using Distributed Transactions and Notifications, D. Peng et al, OSDI 2010 Map-reduce-merge: simplified relational data processing on large clusters, H.-C. Yang et al, SIGMOD 2007 MapReduce Online, T. Condie et al, NSDI 2010 Wave Computing in the Cloud, B. He et al, HotOS 2009 Hadoop Streaming HBase Hive Data challenges at Yahoo!, R. Baeza-Yates and Ramakrishnan, EDBT 08 Zookeeper (Yahoo!) Zookeeper: wait-free coordination for Internet-scale systems, P. Hunt et al (Yahoo!), Usenix 2010
2/19	Sanjana Chandrasekhar & Pranav Moktali Slides 1 [ppt] Slides 2: [pdf]	Stream Processing	Adaptive Stream Processing using Dynamic Batch Sizing, Tathagata Das, Yuan Zhong, Ion Stoica, Scott Shenker, SoCC 2014 Stream: The Stanford data stream management system, A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar, K. Ito, R. Motwani, U. Srivastava, J. Widom, Technical Report, Stanford University, 2004.	The design of the Borealis stream processing engine, D. J. Abadi et al, CIDR, 2005 Discretized Streams: Fault-tolerant Streaming Computation at Scale, M. Zaharia et al, SOSP 2013 L. Aniello, R. Baldoni, L. Querzoni, “Adaptive online scheduling in Storm,” Proc. ACM International Conference on Distributed Event-Based systems (DEBS), pp. 207-218, 2013. M. Duller, J. S. Rellermeyer, G. Alonso, N. Tatbul, “Virtualizing stream processing,” Proc. ACM/IFIP/Usenix Middleware, Springer Lecture Notes in Computer Science, vol. 7049, pp. 269-288, 2011. V. M. Gulisano, “StreamCloud: an elastic parallel-distributed stream processing engine,” PhD Thesis, Universidad Politécnica de Madrid, 2012. S. Loesing, M. Hentschel, T. Kraska, D. Kossmann, “Stormy: an elastic and highly available streaming service in the cloud,” Proc. Joint EDBT/ICDT Workshops (EDBT-ICDT), pp. 55-60, 2012. S. Schneider, H. Andrade, B. Gedik, A. Biem, “Elastic scaling of data parallel operators in stream processing,” Proc. International Parallel and Distributed Processing Symposium (IPDPS), pp. 1-12, 2009. M. Yuan, K. L. Wu, G. Jacques-Silva, Y. Lu, “Efficient processing of streaming graphs for evolution-aware clustering,” Proc. ACM Conference of Information & Knowledge Management (CIKM), 2013.
2/24	Aditya Rastogi & Chinmay Kulkarni Slides 1: [ppt] [pdf] Slides 2: [ppt] [pdf]	Somewhat Consistent	GentleRain: Cheap and Scalable Causal Consistency with Physical Clocks, Jiaqing Du, Calin Iorgulescu, Amitabha Roy, Willy Zwaenepoel, SoCC 2014 A Self-Configurable Geo-Replicated Cloud Storage System, Masoud Saeida Ardekani and Douglas B. Terry, OSDI 2014	Stronger semantics for low-latency geo-replicated storage, W. Lloyd, M. Freedman, M. Kaminsky, D. Andersen, NSDI 2013 Transaction chains: Achieving Serializability with Low Latency in Geo-Distributed Storage Systems, Y. Zhang et al, SOSP 2013 Tango: distributed data structures over a shared log, M. Balakrishnan, et al, SOSP 2013 Perspectives on the CAP theorem, S. Gilbert, N. Lynch, Feb 2012 Don't settle for eventual: scalable causal consistency for wide-area storage with COPS, W. Lloyd, M. Freedman, M. Kaminsky, SOSP 2011 Making geo-replicated systems fast as possible, consistency when necessary, C. Li et al, OSDI 2012 Leveraging Sharding in the Design of Scalable Replication Protocols, H. Abu-Libdeh, R. van Renesse, Y. Vigfusson, SOCC 2013 Transactional storage for geo-replicated systems, Y. Sovran, R. Power, M. K. Aguilera, J. Li, SOSP 2011 Scalable consistency in Scatter, L. Glendenning, I. Beschastnikh, A. Krishnamurthy, T. Anderson, SOSP 2011 Towards robust distributed systems, Eric A. Brewer, Keynote, ACM PODC 2000 Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, S. Gilbert and N. Lynch, ACM SIGACT News, June 2002 Cumulus: Filesystem Backup to the Cloud, M. Vrable et al, FAST 2009 SPORC: Group Collaboration using Untrusted Cloud Resources, A. J. Feldman et al, OSDI 2010
2/26	Sharanya Bathey & Darshan Valia Slides 1: [ppt] [pdf] Slides 2: [pdf]	Litmus Tests	Salt: Combining ACID and BASE in a Distributed Database, Chao Xie, Chunzhi Su, Manos Kapritsos, Yang Wang, Navid Yaghmazadeh, Lorenzo Alvisi, and Prince Mahajan, OSDI 2014 Extracting More Concurrency from Distributed Transactions, Shuai Mu, Yang Cui, Yang Zhang, Wyatt Lloyd, Jinyang Li, OSDI 2014
3/1	Project Survey Report due, 11.59 pm [12pt font, single-sided, 3 pages for main material + 1 page Business plan if applicable + any number of pages for references] (In groups of 2-3) Instructions for Survey and its Submission
3/3	Nirupam Roy & Vijetha Vijayendran Slides 1: [ppt] [pdf] Slides 2: [pdf]	Adaptivity	Starfish: a self-tuning system for big data analytics, H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F. B. Cetin, S. Babu, CIDR 2011. Distributed Autonomous Virtual Resource Management in Datacenters Using Finite-Markov Decision Process, Liuhua Chen, Haiying Shen, Karan Sapra, SoCC 2014	CloudScale: elastic resource-scaling for multi-tenant cloud systems, Z. Shen, S. Subbaiah, X. Gu, J. Wilkes, SOCC 2011 Albatross: lightweight elasticity in shared storage databases for the cloud using live migration, S. Das, et al, VLDB 2010 PRESS: PRedictive Elastic ReSource Scaling for cloud systems, Z. Gong, X. Gu, J. Wilkes, CNSM, 2010. EventWave: Programming Model and Runtime Support for Tightly-Coupled Elastic Cloud Applications, W.-C. Chuang et al, SOCC 2013 H. Herodotou, F. Dong, and S. Babu, “No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics,” Proc. ACM Symposium on Cloud Computing (SoCC), page 18, 2011. AutoScale: dynamic, robust capacity management for multi-tier data centers, A. Gandhi, M. Harchol-Balter, R. Raghunathan, M. A. Kozuch, ACM Transactions on Computer Systems, vol. 30, no. 4, article 14 Nov. 2012. The little engine(s) that could: scaling online social networks, J. M. Pujol et al, SIGCOMM CCR 2010. S. Das, D. Agrawal, A. El Abbadi, “ElasTraS: An elastic transactional data store in the cloud,” Proc. ACM Workshop on Hot topics in cloud computing (HotCloud), 2009. H. C. Lim, S. Babu, and J. S. Chase, “Automated Control for Elastic Storage,” Proc. International Conference on Autonomic Computing (ICAC), pp. 1–10, 2010. P. Riteau, K. Keahey, C. Morin, “Bringing elastic Mapreduce to scientific clouds,” Proc. Annual Workshop on Cloud Computing and its Applications (Poster), 2011.
3/5	Vaijayanth Raghavan & Balachander Ramachandran Slides 1: [ppt] [pdf] Slides 2: [ppt]	Blowing Hot and Cold: Storage	f4: Facebook’s Warm BLOB Storage System, Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang, Sanjeev Kumar, OSDI 2014 Pelican: A Building Block for Exascale Cold Data Storage, Shobana Balakrishnan, Richard Black, Austin Donnelly, Paul England, Adam Glass, Dave Harper, Sergey Legtchenko, Aaron Ogus, Eric Peterson and Antony Rowstron, OSDI 2014.	IOFlow: A Software-Defined Storage Architecture, E. Thereska, H. Ballani, et al. SOSP 2013 Rhea: automatic filtering for unstructured storage, C. Gkantsidis et al, NSDI 2013 Bigtable: A Distributed Storage System for Structured Data, Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, OSDI 2006 (Google) Robustness in the Salus Scalable Block Store, Y. Wang et al, NSDI 2013 Scaling Memcache at Facebook, R. Nishtala, et al, NSDI 2013 MemC3: compact and concurrent MemCache with dumber caching and smarter hashing, B. Fan et al, NSDI 2013 Spanner: Google's Globally-Distributed Database, J. C. Corbett, J. Dean, et al, OSDI 2012 Flat Datacenter Storage, E. B. Nightingale, J. Elson, J. Fan, O. Hofmann, J. Howell, Y. Suzue, OSDI 2012 Finding a Needle in Haystack: Facebook's Photo Storage, D. Beaver et al, OSDI 2010 [Link 1] [Link 2] A Case for Redundant Arrays of Inexpensive Disks (RAID), D. Patterson et al, SIGMOD 1988 Ch. 1 from "The Innovator's Dilemma", C. M. Christensen (handout given in class) HydraFS: A High-Throughput File System for the HYDRAstor Content-Addressable Storage System, C. Ungureanu et al, FAST 2010 Block-level RAID Is Dead, R. Appuswamy et al, HotStorage 2010 Mean Time to Meaningless: MTTDL, Markov Models, and Storage System Reliability, K. M. Greenan, HotStorage 2010 The Google File System, S. Ghemawat et al, SOSP 2003. Megastore: Providing Scalable, Highly Available Storage for Interactive Services, J. Baker et al, CIDR 2010 Cassandra - a decentralized structured storage system, A. Lakshman and P. Malik (Facebook) PNUTS: Yahoo!’s Hosted Data Serving Platform, Brian F. Cooper et al, VLDB 08 Hypertable (Yahoo!)
3/10	Ronald Wright & Uttam Thakore Slides 1: [ppt] [pdf] Slides 2: [pdf]	Reliability	Limplock: Understanding the Impact of Limpware on Scale-Out Cloud Systems, T. Do et al, SOCC 2013 Heading Off Correlated Failures through Independence-as-a-Service, Ennan Zhai, Ruichuan Chen, David Isaac Wolinsky, Bryan Ford, OSDI 2014.	Pico Replication: A High Availability Framework for Middleboxes, S. Rajagopalan, D. Williams, H. Jamjoom, SOCC 2013 On Fault Resilience of OpenStack, X. Ju, L. Soares, K. G. Shin, K. D. Ryu, D. Da Silva, SOCC 2013
3/12	Kajori Banerjee & Syeda Persia Slides 1: [ppt] Slides 2: [ppt]	A Touch of Sensor Nets	Directed diffusion: A scalable and robust communication paradigm for sensor networks, C. Intanagonwiwat et al, Mobicom 2000 A review of current routing protocols for ad hoc mobile wireless networks, E.M. Royer et al, IEEE Personal Communications 1999	Experiences from a decade of TinyOS development, P. Levis, OSDI 2012 Learn on the Fly: Data-driven Link Estimation and Routing in Sensor Network Backbones, Hongwei Zhang et al, Infocom 2006 Rumor Routing Algorithm For Sensor Networks, Braginsky et al Energy-Efficient Communication Protocol for Wireless Microsensor Networks, Heinzelman et al Adaptive Protocols for Information Dissemination in Wireless Sensor Networks, Kulik et al Energy efficient routing in ad hoc disaster recovery Networks, G. Zussman et al, Infocom 2003 Locating and bypassing routing holes in sensor networks, Q. Fang et al, Infocom 2004.
3/17	Ajay Nair & Chaitanya Datye Slides 1: [ppt] Slides 2: [ppt] [pdf]	Graph Processing	LFGraph: Simple and Fast Distributed Graph Analytics, I. Hoque, I. Gupta, TRIOS 2013 GraphX: Graph Processing in a Distributed Dataflow Framework, Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, Ion Stoica, OSDI 2014.	A Distributed Graph Engine for Web Scale RDF Data, Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang, VLDB 2013 X-Stream: edge-centric graph processing using streaming partitions, A. Roy, I. Mihailovic, W. Zwaenepoel, SOSP 2013 PowerGraph: distributed graph-parallel computation on natural graphs, J. Gonzalez, et al, OSDI 2012 Mining and indexing graphs for supergraph search, D. Yuan, P. Mitra, C. L. Giles, VLDB 2013 NeMa: Fast Graph Search with Label Similarity, Arijit Khan, Yinghui Wu, Charu Aggarwal, Xifeng Yan, VLDB 2013 Top-K Nearest Keyword Search on Large Graphs, Miao Qiao, Lu Qin, Hong Cheng, Jeffrey Yu, VLDB 2013 Efficient SimRank-based Similarity Join Over Large Graphs, Weiguo Zheng, Lei Zou, Yansong Feng, Lei Chen, Dongyan Zhao, VLDB 2013 Distributed GraphLab: a framework for machine learning in the cloud, Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, J. M. Hellerstein, VLDB 2012. GraphChi: large-Scale graph computation on just a PC, A. Kyrola, G. Blelloch, C. Guestrin, OSDI 2012. Pregel: a system for large-scale graph processing, G. Malewicz et al, SIGMOD 2010 Towards energy-efficient database cluster design, W. Lang, et al, VLDB 2012 CrowdDB: query processing with the VLDB crowd, A. Feng et al, VLDB 2011 Fate and Destini: a framework for cloud recovery testing, H. Gunawi et al, NSDI 2011 Model checking a networked system without the network, R. Guerraoui, M. Yabandeh, NSDI 2011 SILT: a memory-efficient, high performance key-value store, H. Lim, B. Fan, D. Andersen, M. Kaminsky, SOSP 2011 Distributed GraphLab: a framework for machine-learning and datamining in the cloud, Y. Low et al, VLDB 2012 Densest subgraph in streaming and MapReduce, B. Bahmani, R. Kumar, S. Vassilvitskii, VLDB 2012 PrIter: a distributed framework for prioritized iterative computations, Y. Zhang et al, Socc 2011 R. Chen, M. Yang, X. Weng, B. Choi, B. He, X. Li, “Improving large graph processing on partitioned graphs in the cloud,” Proc. ACM Symposium on Cloud Computing (SoCC), 2012. J. Shun, G. E. Blelloch, “Ligra: a lightweight graph processing framework for shared memory,” SIGPLAN Notices, vol. 48, no. pp. 135-146, Feb. 2013. L. M. Vaquero, F. Cuadrado, D. Logothetis, C. Martella, “xDGP: A dynamic graph processing system with adaptive partitioning,” Technical report, Cornell University: arXiv:1309.1049 2013.
3/19	Tejala Thippeswamy & Yosub Shin Slides 1: [ppt] Slides 2: [pdf]	Latency is King	Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency, Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports, Steven D. Gribble, SoCC 2014 PriorityMeister: Tail Latency QoS for Shared Networked Storage, Timothy Zhu, Alexey Tumanov, Michael A. Kozuch, Mor Harchol-Balter, Gregory R. Ganger, SoCC 2014	Bobtail: avoiding long tails in the cloud, Y. Xu, Z. Musgrave, B. Noble, M. Bailey, NSDI 2013 Small is Better: Avoiding Latency Traps in Virtualized Data Centers, Y. Xu, M. Bailey, B. Noble, F. Jahanian, SOCC 2013 The Tail at Scale, J. Dean, CACM 2013.
3/24	Spring Vacation - No Class.
3/26	Spring Vacation - No Class.
3/31	Baskar Retinasabapathi & Ashish Bijlani Slides 1: [ppt] Slides 2: [ppt] [pdf]	There's a P2P App for That	Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility, A. Rowstron et al, SOSP 2001 Ivy: A Read/Write Peer-to-Peer File System, Athicha Muthitacharoen, Robert Morris, Thomer M. Gil, and Benjie Chen, OSDI 2002	Corona: a high performance publish-subscribe system for the World Wide Web, V. Ramasubramanian, R. Peterson, E. G. Sirer, NSDI 2006 Reliable client accounting for p2p-infrastructure hybrids, P. Aditya, et al, NSDI 2012 UsenetDHT: A Low-Overhead Design for Usenet, Emil Sit et al, NSDI 2008 Colyseus: A distributed architecture for interactive multiplayer games, A.R. Bharambe, Usenix NSDI 2006. Peer-to-peer support for massively multiplayer games, B. Knutsson et al, Infocom 2004. Operating system support for planetary-scale network services, A. Bavier et al, NSDI 2004. CoDNS: Improving DNS Performance and Reliability via Cooperative Lookups, KyoungSoo Park et al, OSDI 2004 [ppt] Wide-area cooperative storage with CFS, F. Dabek et al, SOSP 2001 Scalability of reliable group communication using overlays, F. Baccelli et al, Infocom 2004. OceanStore: An Architecture for Global-Scale Persistent Storage, J. Kubiatowicz, ASPLOS 2000
4/2	Xander Masotto & Akash Kapoor Slides 1: [ppt] Slides 2: [ppt] [pdf]	Process it In-network	TAG: A Tiny Aggregation service for ad-hoc sensor networks, S. Madden, et al, OSDI 2002 Synopsis diffusion for robust aggregation in sensor networks, S. Nath et al, ACM TOSN, 2008.	Camdoop: exploiting in-network aggregation for big-data applications, P. Costa, A. Donnelly, A. Rowstron, G. O'Shea, NSDI 2012 Sailfish: a framework for large scale data processing, S. Rao et al, SOCC 2012 Making cloud intermediate data fault-tolerant, S. Ko et al, SOCC 2010 Trickle: a self-regulating algorithm for code propagation and maintenance in wireless sensor networks, P. Levis et al, NSDI 2004. A framework for time indexing in sensor networks, He et al, ACM TOSN 2005. Multi-resolution state retrieval in sensor networks, B. Deb et al, SNPA 2003 Robust location detection in emergency sensor networks, S. Ray et al, Infocom 2003 DIFS: A distributed index for features in sensor networks, B. Greenstein et al, SNPA 2003 Localized edge detection in sensor fields, K.K.Chintalapudi et al, SNPA 2003 Optimal energy balanced algorithm for selection in single hop sensor network, M. Singh et al, SNPA 2003 Sensor deployment and target localization based on virtual forces, Y. Zou et al, Infocom 2003 Localized algorithms in wireless ad-hoc networks: location discovery and sensor exposure, S. Meguerdichian et al, Mobihoc 2001 Amorphous Computing, H. Abelson et al, CACM 2000. Probabilistic counting for database systems, Flajolet and Martin, JCSS, 1985
4/5	Project Midterm Report due, 11.59 pm [12pt font, single-sided, 8 pages for main material + 1 page Business plan if applicable + any number of pages for references] (In groups of 2-3) Instructions for Midterm and its Submission
4/7	No class due to midterm reviews (No reviews required, unless you're making up the count)	How does it Really Behave?	Measurement, modeling, and analysis of a peer-to-peer file-sharing workload, Krishna P. Gummadi et al, SOSP 2003 Understanding availability, R. Bhagwan et al, IPTPS 2003 Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming, L. Vu, I. Gupta, J. Liang, K. Nahrstedt, QShine 2007 An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS, Simson Garfinkel, Harvard TechRep., 2007 What do Real-Life Hadoop Workloads Look Like, Cloudera Blog	Hadoop's Adolescence: An analysis of Hadoop usage in scientific workloads, K. Ren, Y. Kwon, M. Balazinska, B. Howe, VLDB 2013 Availability and Locality Measurements of Peer-to-Peer File Systems, J. Chu et al, SPIE 2002. A measurement study of peer-to-peer file sharing systems, S. Saroiu et al, MMCN 2002 Free riding on Gnutella, Adar and Huberman, First Monday, 2000 Small-world file sharing communities, A. Iamnitchi et al, Infocom 2004.
4/9	No class due to midterm reviews (No reviews required, unless you're making up the count)	Low Fees Required - Probabilistic Membership	A gossip-based failure detection service, R. van Renesse et al, Middleware 1998 SWIM: Scalable Weakly-consistent Infection-style process group Membership protocol, A. Das et al, DSN 2002 On scalable and efficient distributed failure detectors, I. Gupta et al, PODC 2001	Improving availability in distributed systems with failure informers, J. B. Leners et al, NSDI 2013 Peer-to-peer membership management for gossip-based protocols, A.J. Ganesh et al, IEEE TOC, Feb 2003. CONGRESS: CONnection-oriented Group address Resolution Service, A. Tal et al, 1997 Using random subsets to build scalable network services, D. Kostic et al, USITS 2003 T-Man: Fast Gossip-based Construction of Large-Scale Overlay Topologies, M. Jelasity et al, U. Bologna Tech Report. CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays, S. Voulgaris et al, Journal Network Systems and Management, June 2005
4/12	Midterm Reviews due, 11.59 PM
4/14	Qi Wang (qiwang11) & Dhruve Ashar Slides 1: [ppt] Slides 2: [ppt]	Cluster Scheduling	The Power of Choice in Data-Aware Cluster Scheduling, Shivaram Venkataraman, Aurojit Panda, Ganesh Ananthanarayanan, Michael J. Franklin, Ion Stoica, OSDI 2014. Reservation-based scheduling: if you're late don't blame us! Carlo Curino, Djellel Difallah, Chris Douglas, Subru Krishnan, Raghu Ramakrishnan, Sriram Rao, SoCC 2014	Omega: flexible, scalable schedulers for large compute clusters, Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek and John Wilkes, Eurosys 2013 Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing, Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, OSDI 2014 Sparrow: Distributed, Low Latency Scheduling, K. Ousterhout, P. Wendell, M. Zaharia, I. Stoica, SOSP 2013 ClouDiA: a Deployment advisor for Public Clouds, T. Zou et al, VLDB 2013 Distribution-based query scheduling, Y. Chi et al, VLDB 2013 Cake: enabling high-level SLOs on shared storage systems, A. Wang, et al, SOCC 2012 Dominant resource fairness: a fair allocation of multiple resource types, A. Ghodsi et al, NSDI 2011 Fast crash recovery in RAMCloud, D. Ongaro et al, SOSP 2011 PACMan: Coordinated memory caching for parallel jobs, G. Ananthanarayanan, A. Ghodsi, A. Wang, et al, NSDI 2012 Resilient Distributed Datasets: A fault-tolerant abstraction for in-memory cluster computing, M. Zaharia, M. Chowdhury, et al, NSDI 2012 Lightweight Locking For Main Memory Database Systems, K. Ren, A. Thomson, D. Abadi, VLDB 2013 Memcached: [Short article] [Main wiki] [Using memcached at Facebook] (read first two, look closely at third) Redis Zookeeper (Yahoo!) Interactive Analysis of Webscale data, C. Olston et al, CIDR 2009 (Yahoo) Mesos: a platform for fine-grained resource sharing in the data center, B. Hindman et al, NSDI 2011 Untangling cluster management with Helix, K. Gopalakrishna et al (LinkedIn), Socc 2012 Windows Azure Storage: a highly available cloud storage service with strong consistency, B. Calder et al, SOSP 2011 Don't lose sleep over availability: the GreenUp decentralized wakeup service, S. Sen, J. Lorch, et al, NSDI 2012 Centrifuge: Integrated Lease Management and Partitioning for Cloud Services, A. Adya et al, NSDI 2010 ACMS: The Akamai Configuration Management System, Sherman et al, NSDI 2005. Orchestrating the deployment of computations in the cloud with Conductor, A. Wieder et al, NSDI 2012 DOT: a matrix model for analyzing, optimizing and deploying software for big data analytics in distributed systems, Y. Huai et al, Socc 2011 Declarative automated cloud resource orchestration, C. Liu et al, SOCC 2011
4/16	Alex Zahdeh & Sanchit Gupta Slides 1: [ppt] Slides 2: [ppt] [pdf]	Distributed Machine Learning	Project Adam: Building an Efficient and Scalable Deep Learning Training System, Trishul Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman, OSDI 2014 Scaling Distributed Machine Learning with the Parameter Server, Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, Bor-Yiing Su, OSDI 2014	Exploiting iterative-ness for parallel ML computations, Henggang Cui, Alexey Tumanov, Jinliang Wei, Lianghong Xu, Wei Dai, Jesse Haber-Kucharsky, Qirong Ho, Gregory R. Ganger, Phillip B. Gibbons, Garth A. Gibson, Eric P. Xing, SoCC 2014
4/21	Hongwei Wang & Guangxiang Du Slides 1: [pdf] Slides 2: [ppt] [pdf]	Now Emerging	Apache Hadoop YARN: Yet Another Resource Negotiator, V. K. Vavilapalli, A. C. Murthy et al, SOCC 2013 C-Hint: An Effective and Reliable Cache Management for RDMA-Accelerated Key-Value Stores, Yandong Wang, Xiaoqiao Meng, Li Zhang, Jian Tan, SoCC 2014	Effective Straggler Mitigation: attack of the clones, G. Ananthanarayanan, A. Ghodsi, S. Shenker, I. Stoica, NSDI 2013 Scale-up vs Scale-out for Hadoop: Time to rethink? R. Appuswamy et al, SOCC 2013 Natjam: Design and Evaluation of Eviction Policies For Supporting Priorities and Deadlines in Mapreduce Clusters, Brian Cho et al, SOCC 2013 ReStore: Reusing results of MapReduce jobs, I. Elghandour, A. Aboulnaga, VLDB 2012 Building wavelet histograms on large data in MapReduce, J. Jestes, K. Yi, F. Li, VLDB 2012. Only aggressive elephants are fast elephants, J. Dittrich et al, VLDB 2012 Themis: an I/O efficient MapReduce, A. Rasmussen et al, Socc 2012 Social Content Matching in MapReduce, G. Morales, A. Gionis, M. Sozio, VLDB 2011 Improving MapReduce Performance in Heterogeneous Environments, M. Zaharia et al, OSDI 2008 Reining in the Outliers in Map-Reduce Clusters using Mantri, G. Ananthanarayanan, OSDI 2010 Oozie
4/23	Boyan Li & Richeng Huang Slides 1: [pdf] Slides 2: [ppt]	So Much Data!	What Bugs Live in the Cloud? A Study of 3000+ Issues in Cloud Systems, Haryadi S. Gunawi, Mingzhe Hao, Tanakorn Leesatapornwongsa, Tiratat Patana-anake, Thanh Do, Jeffry Adityatama, Kurnia J. Eliazar, Agung Laksono, Jeffrey F. Lukman, Vincentius Martin, Anang D. Satria, SoCC 2014 Heterogeneity and dynamicity of clouds at scale: Google trace analysis, C. Reiss et al, SoCC 2012	An analysis of Facebook caching, Q. Huang, K. Birman, R. van Renesse, W. Lloyd, S. Kumar, H. C. Li, SOSP 2013 Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads, Y. Chen, S. Alspaugh, R. Katz, VLDB 2012 An empirical study on configuration errors in commercial and open-source systems, Z. Yin, et al, SOSP 2011 Projecting disk usage based on historical trends in a cloud environment, M. Stokely et al, SCC 2012 (Google). Design implications for enterprise storage systems via multi-dimensional trace analysis, Y. Chen et al, SOSP 2011 Structured comparative analysis of systems logs to diagnose performance problems, K. Nagaraj, C. Killian, J. Neville, NSDI 2012 The unified logging infrastructure for data analytics at Twitter, G. Lee, et al, VLDB 2012 Modeling and synthesizing task placement constraints in Google compute clusters, B. Sharma et al, Socc 2011.
4/28	Harshit Dokania & Anirudh Jayakumar Slides 1: [ppt][pdf] Slides 2: [pdf]	Spreading the Rumor	Bimodal multicast, K Birman et al, ACM TOCS 1999 Epidemic algorithms for replicated database maintenance, A. Demers et al, PODC 1987.	Randomized Rumor Spreading, Karp and Shenker, FOCS 2000 Immunology as information processing, S. Forrest et al, 2000. Adaptive and Efficient Epidemic-style Protocols for Reliable and Scalable Multicast, Gupta et al, IEEE TPDS, 2006. Gossip-based ad hoc routing, Z. Haas et al, Infocom 2002 Spatial gossip and resource location protocols, Kempe, Kleinberg and Demers, STOC 2001
4/30	Indy [ppt]	How do Networks Look?	Exploring complex networks, Steven Strogatz, Nature 2001 Scaling properties of the Internet graph, A. Akella et al, PODC 2003 Mapping the Gnutella network, M. Ripeanu et al, IEEE Computing Journal 2002	Implicit structure and the nature of blogspace, Adar et al. The anatomy of a large-scale hypertextual web search engine, Brin and Page DHT routing using social links, S. Marti et al, IPTPS 2004. Duncan Watts's Small World Project, Columbia Jon Kleinberg's Structure of Information Networks course Advice on Research and Writing (take with a grain of salt)
5/5	Indy [ppt]	Completing the Circle	(No reviews required for the following papers. Paper copies for offline papers were handed out during previous lecture.) World Brain, H. G. Wells, 1937 The tragedy of the commons, G. Hardin, 1968 How (and how not) to write a good SOSP paper, R. Levin and D. D. Redell, 1983	R. Hoffmann, "Why buy that theory?", 2003 R. P. Feynman, "Metaplast Corp."
END OF CLASSES
May 10 (Sunday)	Project Final Report due, 11:59 pm [12pt font, single-sided, 12 pages for main material + 1 page Business plan if applicable + any number of pages for references] (In groups of 2-3) (Deadline is hard and final, no extensions!) Instructions for Final Report and its Submission


The following sessions are a rich source of ideas and directions for your projects - mine them! We could not include them in the course schedule due to the limited time.
Leftover		Potpourri	Distributed time-aware provenance, W. Zhou et al, VLDB 2013 Lightweight privacy-preserving peer-to-peer data integration, Y. Zhang et al, VLDB 2013 Understanding and Mitigating the Impact of Load Imbalance in the Memory Caching Tier, Yu-Ju Hong, M. Thottethodi, SOCC 2013 More for your money: exploiting performance heterogeneity in public clouds, B. Farley et al, SOCC 2012 How to price shared optimizations in the cloud, P. Upadhyay, M. Balazinska, D. Suciu, VLDB 2012 Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs, H. Herodotou, S. Babu, VLDB 2011 Distributed Systems meets Economics: pricing in the cloud, H. Wang et al, HotCloud 2010 Optimizing Cost and Performance in Online Service Provider Networks, Z. Zhang et al, NSDI 2010 Reducing costs of spot instances via checkpointing in the Amazon Elastic Compute Cloud, S. Yi et al, IEEE Intl. Conf. on Cloud Computing, 2010 Market-oriented Grids and Utility Computing: The State-of-the-art and Future Directions, J. Broberg et al, Journ. Grid Computing, 2007
Leftover		Energy-Efficient Design	Towards energy-efficient database cluster design, W. Lang, et al, VLDB 2012 Using batteries to reduce the power costs of Internet-scale distributed networks, D. Palasamudram et al, SoCC 2012 FAWN: A Fast Array of Wimpy Nodes, D. G. Andersen et al, SOSP 2009
Leftover	(You can't sign up yet for this slot)	Securitae!	DJoin: differentially private join queries over distributed databases, A. Narayan, A. Haeberlen, OSDI 2012 Towards statistical queries over distributed private user data, R. Chen, et al, NSDI 2012 CryptDB: protecting confidentiality with encrypted query processing, R. Popa, et al, SOSP 2011
Leftover		In Byzantium	The Byzantine Generals problem, L. Lamport et al, TOPLAS 1982 PeerReview: practical accountability for distributed systems, A. Haeberlen, P. Kouznetsov, P. Druschel, SOSP 2007 Airavat: Security and Privacy for MapReduce, I. Roy et al, OSDI 2010	UpRight Cluster Services, A. Clement et al, SOSP 2009 Zyzzyva: Speculative Byzantine Fault Tolerance (Awarded a best paper award.) Ramakrishna Kotla et al, SOSP 2007 Practical Byzantine Fault-Tolerance, Castro et al, OSDI 1999. Preserving peer replicas by rate-limited sampled voting, P. Maniatis et al, SOSP 2003 BAR Fault Tolerance for Cooperative Services. A. Aiyer et al, SOSP 2005. BAR Gossip. Harry Li et al Usenix OSDI 2006 Scaling Byzantine Fault-Tolerant Replication to Wide Area Networks. Yair Amir et al, IEEE DSN 2006 BFT Protocols under Fire, Atul Singh et al, NSDI 2008
Leftover		Geo-Distribution	RACS: a case for cloud storage diversity, H. Abu-Libdeh et al, SOCC 2010	Smoke and Mirrors: Reflecting Files at a Geographically Remote Location Without Loss of Performance, H. Weatherspoon et al, FAST 2009 Volley: Automated Data Placement for Geo-Distributed Cloud Services, S. Agarwal et al, NSDI 2010 Availability in Globally Distributed Storage Systems, D. Ford et al, OSDI 2010
Leftover		Old Wine: Stale or Vintage?	A comparison of approaches to large-scale data analysis, A. Pavlo et al, ACM SIGMOD 2009 On death, taxes and the convergence of peer-to-peer and grid computing, I. Foster et al, IPTPS 2003	2 P2P or Not 2 P2P, M. Roussopoulos et al, IPTPS 2004. Scooped, again, J. Ledlie et al, IPTPS 2003. A Note on Distributed Computing, A. Wollrath et al, MSR Techreport, 1994 Cloud computing is a trap, warns GNU founder Richard Stallman, Guardian (UK), Sep 29, 2008 (paper deleted - see their updated version in their SIGMOD 2009 paper) MapReduce - a major step backwards, D. DeWitt and M. Stonebraker
Leftover		Publish-Subscribe/CDNs	Splitstream, M. Castro et al, SOSP 2003. Anysee: p2p live streaming, X. Liao et al, Infocom 2006. Corona: a high-performance publish-subscribe for the World Wide Web, V. Ramasubramanian et al, Usenix NSDI 2006.	Gryphon Home An efficient multicast protocol for content-based publish-subscribe systems, G. Banavar et al, ICDCS 1999 A reliable multicast framework for light-weight sessions and application level framing, S. Floyd et al, 1997 SCRIBE: the design of a large-scale event notification infrastructure, A. Rowstron et al, NGC 2001. A shared control plane for overlay multicast, A. Nandi et al, NSDI 2007 FeedTree: Sharing Web micronews with peer-to-peer event notification, Sandler et al, IPTPS 2005. Efficient Probabilistic Subsumption Checking for Content-based Publish/Subscribe Systems. A. Ouksel et al, Middleware 2006 Matching events in a content-based subscription system. M. K. Aguilera et al., PODC, 1999 Amazon's CloudFront service
Leftover		Distributed Monitoring and Management	Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining, van Renesse et al, ACM TOCS 2003. MON: ON-demand overlays for distributed system management, J. Liang et al, SIGOPS OSR 2007. Moara: Flexible and Scalable Group-Based Querying System, S. Ko, Middleware 2008.	Chukwa: A large-scale monitoring system, J. Boulon et al, CCA 2008 PlanetLab website Emulab Website WAIL website Chukwa system (Hadoop monitoring) Distributed system management: PlanetLab incidents and management tools, R. Adams, PlanetLab Techreport PlanetLab management using Plush, J. Albrecht et al, ACM SIGOPS OSR, Jan 2006 A Scalable Distributed Information Management System. Praveen Yalagandula and Mike Dahlin. In Proceedings of ACM SIGCOMM, August, 2004. Network imprecision: a new consistency metric for scalable monitoring, N. Jain, OSDI 2008 Field studies of computer system administrators: analysis of system management tools and practices, Barrett et al, IBM Almaden Reducing the cost of IT operations: is automation always the answer? Brown and Hellerstein, IBM TJ Watson
Leftover		Green Clouds	Managing Energy and Server Resources in Hosting Centers, J. Chase et al, SOSP 2001 On the Energy Inefficiency of Hadoop Clusters, J. Leverich et al, HotPower 2009 [Alternative Paper Link] Cost- and Energy-Aware Load Distribution Across Data Centers, Kien Le et al, HotPower 2009 ElasticTree: Saving Energy in Data Center Networks, B. Heller et al, NSDI 2010	HotPower 2009 L. Chiaraviglio et al, A Green Distributed Cooperation for Network and Content Management
Leftover		Distributed Debugging	D³S: Debugging Deployed Distributed Systems Xuezheng Liu et al, OSDI 2008 (Microsoft Research) WiDS Checker: Combating Bugs in Distributed Systems, X. Liu et al, NSDI 07 X-Trace: A Pervasive Network Tracing Framework, R. Fonseca et al, NSDI 07	Friday: Global Comprehension for Distributed Replay, Dennis Geels et al, NSDI 07 Pip: Detecting the Unexpected in Distributed Systems, P. Reynolds, NSDI 06 Performance Debugging for Distributed Systems of Black Boxes, A. Muthitacharoen et al, SOSP 03 Using Magpie for request extraction and workload modeling, P. Barham et al, OSDI 04 Pinpoint: Problem Determination in Large, Dynamic Internet Services, M. Chen et al, DSN 02 Life, Death, and the Critical Transition: Finding Liveness Bugs in Systems Code, NSDI 07 Using Queries for Distributed Monitoring and Forensics, A. Singh et al, EuroSys 06
Leftover		Flash!	Characterizing Flash Memory: Anomalies, Observations, and Applications, L. Grupp et al, MICRO 2009 Extending SSD Lifetimes with Disk-Based Write Caches, G. Soundararajan et al, FAST 2010 DFS: A File System for Virtualized Flash Storage, W. Josephson et al, FAST 2010
Leftover		The Middle or the End?	(review any one of the following 3 papers) End to end arguments in system design, Saltzer, Reed and Clark, 1984 Middleboxes: taxonomy and issues, RFC 3234 An End to the Middle, C. Dixon et al, Usenix HotOS 2009.	Rethinking the design of the Internet: the end-to-end arguments vs. the brave new world, Blumenthal and Clark, ACM Trans. Internet Technology, 2001 Middleboxes no longer considered harmful, M. Walfish et al, OSDI 2004. Scalable, Commodity Data Center Network Architecture, Al-Fares et al, SIGCOMM 2008 Internet-Scale Service Efficiency, J. H. Hamilton, LADIS 2008 Stable and Accurate Network Coordinates, Jonathan Ledlie, Peter Pietzuch, and Margo Seltzer, ICDCS 2006 On transport layer support for peer to peer networks, H-Y. Hsieh et al, IPTPS 2004. A comparison of overlay routing and multihoming route control, A. Akella et al, SIGCOMM 2004. Consensus Routing: The Internet as a Distributed System, John P. John et al, OSDI 2008 Overview of CAIDA Tools (give overview, and discuss at least *five* tools from different categories)
Leftover		Availability-Aware Systems	(read the papers, but no reviews required for this session) Understanding availability, R. Bhagwan et al, IPTPS 2003 AVCast: new approaches for implementing availability-dependent reliability for multicast receivers, T. Pongthawornkamol et al, IEEE SRDS 2006. AVMON: Optimal and scalable discovery of consistent availability monitoring overlays for distributed systems, R. Morales et al, IEEE TPDS 2008.
Leftover		Design Methodologies, Handling Stress	(No class today, but if you submitted a review on time, you can skip one of the remaining review sessions) The design of novel distributed protocols from differential equations, Distributed Computing, August 2007 Implementing Declarative Overlays. Boon Thau Loo et al, SOSP 2005. Sinfonia: A New Paradigm for Building Scalable Distributed Systems, Marcos K. Aguilera et al, SOSP 2007	Comparing the performance of DHTs under churn, J. Li et al, IPTPS 2004. Routing design in operational networks: a look from the inside, D. A. Maltz et al, SIGCOMM 2004 (short paper) Tools for the code generation, J. Ambrosio 2003. A protocol family approach to survivable storage infrastructures, J. Wylie et al, Fudico 2004. Randomized ID selection for peer-to-peer networks, G. S. Manku, PODC 2004 Peer-to-Peer Approach to Resource Location in Grid Environments, A. Iamnitchi et al, 2003. OSPF monitoring: architecture, design and deployment experience, A Shaikh et al, NSDI 2004 Metarouting, Griffin et al, SIGCOMM 2005. Automatic Discovery of Mutual Exclusion Algorithms, Bar David et al, PODC 2003.
Leftover		Sources of unreliability in networks	Internet routing instability, C. Labovitz et al, SIGCOMM 1997 Characterization of failures in an IP backbone, A. Markopoulou et al, Infocom 2004. The Changing Usage of a Mature Campuswide Wireless Network, Tristan Henderson et al, ACM Mobicom 2004	Characterising the use of a campus wireless network, D. Schwab et al, Infocom 2004. Origins of Internet Routing Instability, C. Labovitz et al, INFOCOM 1999 Firefly-inspired Heartbeat Synchronization in Overlay Networks, O. Babaoglu, SASO 2007 Gossip-Based Clock Synchronization for Large Decentralized Systems, K. Iwanicki et al, SelfMan 2006: 28-42 On the scalability of cooperative time synchronization in pulse-connected networks, Hu and Servetto, IEEE TON 2006. Locating Internet routing instabilities, A. Feldmann et al, SIGCOMM 2004. A longitudinal survey of Internet host reliability, D. Long et al, SRDS 1995 End-to-end Internet packet dynamics, V. Paxson, SIGCOMM 1997 Measurement and modeling of the temporal dependence in packet loss, M. Yajnik et al, Infocom 1999 Route flap damping exacerbates Internet routing convergence, Z. M. Mao et al, SIGCOMM 2002 Route oscillations in I-BGP with route reflection, A. Basu et al, SIGCOMM 2002 Stability issues in OSPF routing, A. Basu et al, SIGCOMM 2001 On the effect of traffic self-similarity on network performance, K. Park et al, WSC 1997 Measurement and analysis of the error characteristics of an in-building wireless network, SIGCOMM 1996 Modeling the performance of wireless sensor networks, C-F. Chiasserini et al, Infocom 2004. The synchronization of periodic routing messages, S. Floyd et al, IEEE/ACM TON, 1994. Characterizing User Behavior and Network Performance in a Public Wireless LAN, Anand Balachandran et al, ACM SIGMETRICS 2002
Leftover		A Step Back	A modular network layer for SensorNets, C.T. Ee et al, Usenix OSDI 2006. Evaluating the running time of a communication round over the Internet, O. Bakr et al, PODC 2002	Service capacity of peer-to-peer networks, X. Yang et al, Infocom 2004. The capacity of wireless networks, P. Gupta et al, IEEE Transactions on Information Theory, vol. IT-46, no. 2, pp. 388-404, March 2000
Leftover		Distributed Management (2)	Globus: a metacomputing infrastructure toolkit, I. Foster et al, Intnl. Journal Supercomputer Applications and High Performance Computing Condor and the Grid, D. Thain et al, Wiley Journals Globus and PlanetLab Resource Management Solutions Compared, M. Ripeanu et al, HPDC 2004
Leftover		Handling Stress	Understanding availability, R. Bhagwan et al, IPTPS 2003 Minimizing churn in distributed systems, P. Godfrey, S. Shenker, and I. Stoica, SIGCOMM 2006 AVCast: new approaches for implementing availability-dependent reliability for multicast receivers, T. Pongthawornkamol et al, IEEE SRDS 2006.	Handling Churn in a DHT, S. Rhea et al, Usenix 2004. High-reliability architectures for networks under stress, G. E. Weichenberg et al, Infocom 2004. Comparing the performance of DHTs under churn, J. Li et al, IPTPS 2004.
Leftover		Selfish algorithms	The tragedy of the commons, G. Hardin, 1968 How bad is selfish routing, T. Roughgarden et al, FOCS 2000 Characterizing selfishly constructed overlay networks, B-G. Chun et al, Infocom 2004.	On Selfish Routing in Internet-Like Environments, L. Qiu, SIGCOMM 2003
Leftover		Security	Scalability, Fidelity and Containment in the Potemkin Virtual Honeyfarm, Michael Vrable et al, SOSP 2005. Vigilante: End-to-End Containment of Internet Worms, Manuel Costa, SOSP 2005. TinySec: A Link Layer Security Architecture for Wireless Sensor Networks, Chris Karlof et al, Sensys 2004.	The Sybil Attack, J. R. Douceur, IPTPS 2002 Secure routing in wireless sensor networks: attacks and countermeasures, C. Karlof et al, SNPA 2003 Secure routing for structured peer-to-peer overlay networks, M. Castro et al, OSDI 2002 Peer-to-Peer File Sharing and Copyright Law: A Primer for Developers, F. von Lohmann, IPTPS 2003
Leftover		Economic Theory	Rationality and self-interest in peer to peer networks, J. Shneidman et al, IPTPS 2003 Distributed algorithmic mechanism design: recent results and future directions, J. Feigenbaum et al, DIALM 2002. To share or not to share: an analysis of incentives to contribute in collaborative file sharing environments, K. Ranganathan, Wshop Economics of P2P systems 2003	Incentives for Cooperation in Peer-to-Peer Networks, K. Lai, Wshop Economics of P2P systems 2003 (short paper) The social cost of sharing, H. R. Varian, Wshop Economics of P2P systems 2003
Leftover		The future of sensor nets?	Research challenges in wireless networks of biomedical sensors, L. Schwiebert, ACM Sigmobile 2001 Research challenges in environmental observation and forecasting systems, D.C. Steere et al, Mobicom 2000 Design considerations for distributed microsensor systems, A. Chandrakasan et al, CICC 1999	Next century challenges: mobile networking for smart dust, J.M. Kahn et al, Mobicom 1999 System architecture directions for networked sensors, J. Hill et al, ASPLOS 2000 Next century challenges: scalable coordination in sensor networks, D. Estrin et al, Mobicom 1999
Leftover		P2P - Etc.	Starfish: highly-available block storage, E. Gabber et al, Usenix 2003. Turning the postal system into a generic digital communication mechanism, R. Y. Wang et al, SIGCOMM 2004.
Leftover		The End-to-End Approach	End to end arguments in system design, Saltzer, Reed and Clark, 1984 ESRT: Event-to-Sink Reliable Transport in wireless sensor networks, Y. Sankarasubramaniam et al, Mobihoc 2003 Middleboxes: taxonomy and issues, RFC 3234, zvon.org (read entire article by following "Next" links)	Rethinking the design of the Internet: the end-to-end arguments vs. the brave new world, Blumenthal and Clark, ACM Trans. Internet Technology, 2001 Untangling the Web from DNS, M. Walfish et al, NSDI 2004.
Leftover		Automatic Computing and Inference	Model checking large protocol implementations, M. Musuvathi et al, NSDI 2004. Overview of CAIDA Tools (give overview, and discuss at least five tools from different categories) Total Recall: system support for automated availability management, R. Bhagwan et al, NSDI 2004.	Inferring TCP Connection Characteristics Through Passive Measurements, S. Jaiswal et al, Infocom 2004. Multiple source, multiple destination network tomography, M. Rabbat et al, Infocom 2004.
Leftover		Modular Systems	The Click modular router, E. Kohler et al, ACM TOCS 2000. A composable service model with loss and a scheduling algorithm, S. Ayyorgun et al, Infocom 2004. Composition and behaviors of probabilistic I/O automata, Wu et al, TCS 1997.
Leftover		Practical theory perspectives	Graph-theoretic analysis of structured peer-to-peer systems: routing distances and fault resilience, D. Loguinov et al, SIGCOMM 2003 Computation in Networks of Passively Mobile Finite-State Sensors, Dana Angluin, James Aspnes, Zoe Diamadi, Michael Fischer, Rene Peralta, PODC 2004.
Leftover		Topology and Naming	Algorithmic aspects of topology control problems for ad hoc networks, E. Lloyd et al, Mobihoc 2002 Unreliable sensor grids: coverage, connectivity and diameter, S. Shakkottai et al, Infocom 2003 An address-free architecture for dynamic sensor networks, J. Elson et al, 2000 Prophet address allocation for large scale MANETs, H. Zhou et al, Infocom 2003	Biologically Inspired Topology Control Mechanism for Multi-hop Wireless Network, Z. Huang et al, Mobihoc 2003
Leftover		Classical Algorithms	A sqrt-N algorithm for mutual exclusion in decentralized systems, M. Maekawa, ACM TOCS, Apr. 1985. Replication strategies in unstructured peer-to-peer networks, E. Cohen et al, SIGCOMM 2002 Reliable communication in the presence of failures, K.P. Birman et al, ACM TOCS, Feb 1987.	Exploiting network proximity in peer-to-peer overlay networks, M. Castro et al, MSR TechReport 2002 Geometric ad-hoc routing: of theory and practice, F. Kuhn et al, PODC 2003 On the curvature of the Internet and its usage for overlay construction and distance estimation, Y. Shavitt et al, Infocom 2004. A practical distributed mutual exclusion protocol in dynamic peer-to-peer systems, S-D. Lin et al, IPTPS 2004. Scalable and dynamic quorum systems, Naor and Wieder, PODC 2003
Leftover		Caching	On the scale and performance of cooperative web proxy caching, A. Wolman et al, SOSP 1999 Squirrel: a decentralized, peer-to-peer web cache, S. Iyer et al, PODC 2002. Caching technologies for web applications, C. Mohan (IBM), VLDB 2001	A churn-resistant peer-to-peer web caching system, P. Linga et al, Wshop on Survivable & Self-Regenerative Systems 2003. The case for cooperative networking, V. N. Padmanabhan et al, IPTPS 2002. Approximate caches for packet classification, F. Chang et al, Infocom 2004. Comparing strength of locality of reference - popularity, majorization and some folk theorems, S. Vanichpun, Infocom 2004. Botz-4-Sale: Surviving Organized DDoS Attacks That Mimic Flash Crowds, Srikanth Kandula et al, NSDI 2005.

Note: The Spring 2015 schedule has roughly 55% new papers for the student sessions (20/36) compared to the last version of CS525 (SP14).

Presentation Schedule // CS 525: Advanced Distributed Systems // Spring 2015