Date
|
Presenters, Slides, and Reviews
|
Topic
|
Main Papers
|
More Papers (optional)
Must-see Papers if your Project overlaps with the area.
|
1/20 |
Indy
[ppt] [pdf]
|
Introduction |
|
See topic "Spreading the Rumor" below |
1/22 |
Indy
[ppt] [pdf]
|
Before, There Were Clouds |
You can join the Googlegroups on Cloud Computing
|
-
Open Cirrus™ Cloud Computing Testbed: Federated Data Centers for Open Source Systems and Services Research, R. Campbell, I. Gupta, et al, HotCloud 2009 [HotCloud Version]
-
Datatecture: Data Center Overload, Tom Vanderbilt, New York Times Magazine, June 2009
-
Amazon EC2 and S3
-
Google AppEngine
-
Others: IBM Blue Cloud, SUN network.com, others (Joyent, Flexiscale, GoGrid) - see the GoogleGroups
-
Cost of a Cloud: Research Problems in Data Center Networks, A. Greenberg et al, ACM SIGCOMM CCR, 2009
-
A BluePrint for Introducing Disruptive Technology into the Internet, L. Peterson et al
-
Economic Perspectives on the History of the Computer Timesharing Industry, M. Campbell-Kelly and D. Garcia-Swartz
-
PlanetLab website
-
Emulab Website
-
ModelNet website
-
OpenCirrus
|
1/27 |
Indy
[ppt] [pdf]
|
Cloud Computing Continued |
|
|
1/29 |
Indy
[ppt] [pdf]
|
P2P Systems
|
|
|
1/29 |
Last day to sign up for a presentation slot (2/12/15 through 3/20/15). Two presenters per slot only. To sign up, you must come to Indy's office hours.
Instructions for Presentations and Reviews
|
2/3 |
Indy
[ppt] [pdf]
|
P2P Systems (contd.)
|
|
|
2/5 |
Indy
[ppt] [pdf]
|
Key-value Stores and NoSQL |
Others: MongoDB
|
|
2/10 |
Indy
[ppt] [pdf] |
Basic Distributed Algorithms Fundamentals and Sensor Networks |
|
|
2/12 |
Ashutosh Dhekne & Sayedhadi Hashemi
Slides 1: [ppt] [pdf]
Slides 2: [ppt] [pdf] |
Paxos and Committing |
Please don't review the first paper
Student Presentations and Reviews Start - See Instructions
(Review these) - Student Presenters will present these
|
-
There is more consensus in egalitarian parliaments, I. Moraru, D. G. Andersen, M. Kaminsky, SOSP 2013.
-
Paxos made live - an engineering perspective, T. Chandra et al, PODC 2007
-
Using Paxos to build a scalable, consistent, and highly available datastore, J. Rao, E. Shekita, S. Tata, VLDB 2011
-
Chubby lock service, M. Burrows, OSDI 2006 (Google)
-
Paxos replicated state machines as the basis of a high-performance data store, W. Bolosky et al, NSDI 2011
|
2/17 |
Shyam Rajendran & Saikat Roychoudhury
Slides 1: [ppt] [pdf]
Slides 2: [pps] [pdf] [Demo] |
Cloud Programming |
|
-
Naiad: a timely dataflow system, D. G. Murray, et al, SOSP 2013
-
Pig latin: a not-so-foreign language for data processing, C. Olston et al, SIGMOD 2008 (Yahoo!)
-
MegaPipe: A New Programming Interface for Scalable I/O, S. Han, S. Marshall, B.-G. Chun, S. Ratnasamy, OSDI 2012
-
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, Yuan Yu et al, OSDI 2008
-
Large-scale Incremental Processing Using Distributed Transactions and Notifications, D. Peng et al, OSDI 2010
-
Map-reduce-merge: simplified relational data processing on large clusters, H.-C. Yang et al, SIGMOD 2007
-
MapReduce Online, T. Condie et al, NSDI 2010
-
Wave Computing in the Cloud, B. He et al, HotOS 2009
-
Hadoop Streaming
-
HBase
-
Hive
-
Data challenges at Yahoo!, R. Baeza-Yates and Ramakrishnan, EDBT 08
-
Zookeeper (Yahoo!)
-
Zookeeper: wait-free coordination for Internet-scale systems, P. Hunt et al (Yahoo!), Usenix 2010
|
2/19 |
Sanjana Chandrasekhar & Pranav Moktali
Slides 1: [ppt]
Slides 2: [pdf] |
Stream Processing |
-
Adaptive Stream Processing using Dynamic Batch Sizing, Tathagata Das, Yuan Zhong, Ion Stoica, Scott Shenker, SoCC 2014
-
Stream: The Stanford data stream management system, A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar, K. Ito, R. Motwani, U. Srivastava, J. Widom, Technical Report, Stanford University, 2004.
|
-
The design of the Borealis stream processing engine, D. J. Abadi et al, CIDR, 2005
-
Discretized Streams: Fault-tolerant Streaming Computation at Scale, M. Zaharia et al, SOSP 2013
-
L. Aniello, R. Baldoni, L. Querzoni, “Adaptive online scheduling in Storm,” Proc. ACM International Conference on Distributed Event-Based systems (DEBS), pp. 207-218, 2013.
-
M. Duller, J. S. Rellermeyer, G. Alonso, N. Tatbul, “Virtualizing stream processing,” Proc. ACM/IFIP/Usenix Middleware, Springer Lecture Notes in Computer Science, vol. 7049, pp. 269-288, 2011.
-
V. M. Gulisano, “StreamCloud: an elastic parallel-distributed stream processing engine,” PhD Thesis, Universidad Politécnica de Madrid, 2012.
-
S. Loesing, M. Hentschel, T. Kraska, D. Kossmann, “Stormy: an elastic and highly available streaming service in the cloud,” Proc. Joint EDBT/ICDT Workshops (EDBT-ICDT), pp. 55-60, 2012.
-
S. Schneider, H. Andrade, B. Gedik, A. Biem, “Elastic scaling of data parallel operators in stream processing,” Proc. International Parallel and Distributed Processing Symposium (IPDPS), pp. 1-12, 2009.
-
M. Yuan, K. L. Wu, G. Jacques-Silva, Y. Lu, “Efficient processing of streaming graphs for evolution-aware clustering,” Proc. ACM Conference of Information & Knowledge Management (CIKM), 2013.
|
2/24 |
Aditya Rastogi & Chinmay Kulkarni
Slides 1: [ppt] [pdf]
Slides 2: [ppt] [pdf] |
Somewhat Consistent |
|
-
Stronger semantics for low-latency geo-replicated storage, W. Lloyd, M. Freedman, M. Kaminsky, D. Andersen, NSDI 2013
-
Transaction chains: Achieving Serializability with Low Latency in Geo-Distributed Storage Systems, Y. Zhang et al, SOSP 2013
-
Tango: distributed data structures over a shared log, M. Balakrishnan, et al, SOSP 2013
-
Perspectives on the CAP theorem, S. Gilbert, N. Lynch, Feb 2012
-
Don't settle for eventual: scalable causal consistency for wide-area storage with COPS, W. Lloyd, M. Freedman, M. Kaminsky, SOSP 2011
-
Making geo-replicated systems fast as possible, consistent when necessary, C. Li et al, OSDI 2012
-
Stronger semantics for low-latency geo-replicated storage, W. Lloyd et al, NSDI 2013
-
Leveraging Sharding in the Design of Scalable Replication Protocols, H. Abu-Libdeh, R. van Renesse, Y. Vigfusson, SOCC 2013
-
Transactional storage for geo-replicated systems, Y. Sovran, R. Power, M. K. Aguilera, J. Li, SOSP 2011
-
Scalable consistency in Scatter, L. Glendenning, I. Beschastnikh, A. Krishnamurthy, T. Anderson, SOSP 2011
-
Towards robust distributed systems, Eric A. Brewer, Keynote, ACM PODC 2000
-
Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, S. Gilbert and N. Lynch, ACM SIGACT News, June 2002
-
Cumulus: Filesystem Backup to the Cloud, M. Vrable et al, FAST 2009
-
SPORC: Group Collaboration using Untrusted Cloud Resources, A. J. Feldman et al, OSDI 2010
|
2/26 |
Sharanya Bathey & Darshan Valia
Slides 1: [ppt] [pdf]
Slides 2: [pdf] |
Litmus Tests |
-
Salt: Combining ACID and BASE in a Distributed Database, Chao Xie, Chunzhi Su, Manos Kapritsos, Yang Wang, Navid Yaghmazadeh, Lorenzo Alvisi, and Prince Mahajan, OSDI 2014
-
Extracting More Concurrency from Distributed Transactions, Shuai Mu, Yang Cui, Yang Zhang, Wyatt Lloyd, Jinyang Li, OSDI 2014
|
|
3/1 |
Project
Survey Report due, 11.59 pm [12pt font, single-sided, 3 pages for main
material + 1 page Business plan if applicable + any number of pages for
references] (In groups of 2-3)
Instructions for Survey and its Submission
|
3/3 |
Nirupam Roy & Vijetha Vijayendran
Slides 1: [ppt] [pdf]
Slides 2: [pdf] |
Adaptivity |
-
Starfish: a self-tuning system for big data analytics, H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F. B. Cetin, S. Babu, CIDR 2011.
-
Distributed Autonomous Virtual Resource Management in Datacenters Using Finite-Markov Decision Process, Liuhua Chen, Haiying Shen, Karan Sapra, SoCC 2014
|
-
CloudScale: elastic resource-scaling for multi-tenant cloud systems, Z. Shen, S. Subbaiah, X. Gu, J. Wilkes, SOCC 2011
-
Albatross: lightweight elasticity in shared storage databases for the cloud using live migration, S. Das, et al, VLDB 2010
-
PRESS: PRedictive Elastic ReSource Scaling for cloud systems, Z. Gong, X. Gu, J. Wilkes, CNSM, 2010.
-
EventWave: Programming Model and Runtime Support for Tightly-Coupled Elastic Cloud Applications, W.-C. Chuang et al, SOCC 2013
-
H. Herodotou, F. Dong, and S. Babu, “No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics,” Proc. ACM Symposium on Cloud Computing (SoCC), page 18, 2011.
-
AutoScale: dynamic, robust capacity management for multi-tier data centers,
A. Gandhi, M. Harchol-Balter, R. Raghunathan, M. A. Kozuch, ACM
Transactions on Computer Systems, vol. 30, no. 4, article 14, Nov. 2012.
-
The little engine(s) that could: scaling online social networks, J. M. Pujol et al, SIGCOMM CCR 2010.
-
S. Das, D. Agrawal, A. El Abbadi, “ElasTraS: An elastic transactional data store in the cloud,” Proc. ACM Workshop on Hot topics in cloud computing (HotCloud), 2009.
-
H. C. Lim, S. Babu, and J. S. Chase, “Automated Control for Elastic Storage,” Proc. International Conference on Autonomic Computing (ICAC), pp. 1–10, 2010.
-
P. Riteau, K. Keahey, C. Morin, “Bringing elastic Mapreduce to scientific clouds,” Proc. Annual Workshop on Cloud Computing and its Applications (Poster), 2011.
|
3/5 |
Vaijayanth Raghavan & Balachander Ramachandran
Slides 1: [ppt] [pdf]
Slides 2: [ppt] |
Blowing Hot and Cold: Storage |
-
f4: Facebook’s Warm BLOB Storage System,
Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest
Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Viswanath Sivakumar,
Linpeng Tang, Sanjeev Kumar, OSDI 2014
-
Pelican: A Building Block for Exascale Cold Data Storage,
Shobana Balakrishnan, Richard Black, Austin Donnelly, Paul England,
Adam Glass, Dave Harper, and Sergey Legtchenko, Aaron Ogus, Eric
Peterson and Antony Rowstron, OSDI 2014.
|
-
IOFlow: A Software-Defined Storage Architecture, E. Thereska, H. Ballani, et al. SOSP 2013
-
Rhea: automatic filtering for unstructured storage, C. Gkantsidis et al, NSDI 2013
-
Bigtable: A Distributed Storage System for Structured Data, Fay
Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.
Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E.
Gruber, OSDI 2006 (Google)
-
Robustness in the Salus Scalable Block Store, Y. Wang et al, NSDI 2013
-
Scaling Memcache at Facebook, R. Nishtala, et al, NSDI 2013
-
MemC3: compact and concurrent MemCache with dumber caching and smarter hashing, B. Fan et al, NSDI 2013
-
Spanner: Google's Globally-Distributed Database, J. C. Corbett, J. Dean, et al, OSDI 2012
-
Flat Datacenter Storage, E. B. Nightingale, J. Elson, J. Fan, O. Hofmann, J. Howell, Y. Suzue, OSDI 2012
-
Finding a Needle in Haystack: Facebook's Photo Storage, D. Beaver et al, OSDI 2010 [Link 1] [Link 2]
-
A Case for Redundant Arrays of Inexpensive Disks (RAID), D. Patterson et al, SIGMOD 1988
-
Ch. 1 from "The Innovator's Dilemma", C. M. Christensen (handout given in class)
-
HydraFS: A High-Throughput File System for the HYDRAstor Content-Addressable Storage System, C. Ungureanu et al, FAST 2010
-
Block-level RAID Is Dead, R. Appuswamy et al, HotStorage 2010
-
Mean Time to Meaningless: MTTDL, Markov Models, and Storage System Reliability, K. M. Greenan, HotStorage 2010
-
The Google File System, S. Ghemawat et al, SOSP 2003.
-
Megastore: Providing Scalable, Highly Available Storage for Interactive Services, J. Baker et al, CIDR 2010
-
Cassandra - a decentralized structured storage system, A. Lakshman and P. Malik (Facebook)
-
PNUTS: Yahoo!’s Hosted Data Serving Platform, Brian F. Cooper et al, VLDB 08
-
Hypertable (Yahoo!)
|
3/10 |
Ronald Wright & Uttam Thakore
Slides 1: [ppt] [pdf]
Slides 2: [pdf] |
Reliability |
|
-
Pico Replication: A High Availability Framework for Middleboxes, S. Rajagopalan, D. Williams, H. Jamjoom, SOCC 2013
-
On Fault Resilience of OpenStack, X. Ju, L. Soares, K. G. Shin, K. D. Ryu, D. Da Silva, SOCC 2013
|
3/12 |
Kajori Banerjee & Syeda Persia
Slides 1: [ppt]
Slides 2: [ppt] |
A Touch of Sensor Nets |
|
-
Experiences from a decade of TinyOS development, P. Levis, OSDI 2012
-
Learn on the Fly: Data-driven Link Estimation
and Routing in Sensor Network Backbones,
Hongwei Zhang et al, Infocom 2006
-
Rumor Routing Algorithm For Sensor Networks, Braginsky et al
-
Energy-Efficient Communication Protocol for Wireless Microsensor Networks, Heinzelman et al
-
Adaptive Protocols for Information Dissemination in Wireless Sensor Networks, Kulik et al
-
Energy efficient routing in ad hoc disaster recovery Networks, G. Zussman et al, Infocom 2003
-
Locating and bypassing routing holes in sensor networks, Q. Fang et al, Infocom 2004.
|
3/17 |
Ajay Nair & Chaitanya Datye
Slides 1: [ppt]
Slides 2: [ppt] [pdf] |
Graph Processing |
-
LFGraph: Simple and Fast Distributed Graph Analytics, I. Hoque, I. Gupta, TRIOS 2013
-
GraphX: Graph Processing in a Distributed Dataflow Framework, Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, Ion Stoica, OSDI 2014.
|
-
A Distributed Graph Engine for Web Scale RDF Data, Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang, VLDB 2013
-
X-Stream: edge-centric graph processing using streaming partitions, A. Roy, I. Mihailovic, W. Zwaenepoel, SOSP 2013
-
PowerGraph: distributed graph-parallel computation on natural graphs, J. Gonzalez, et al, OSDI 2012
-
Mining and indexing graphs for supergraph search, D. Yuan, P. Mitra, C. L. Giles, VLDB 2013
-
NeMa: Fast Graph Search with Label Similarity, Arijit Khan, Yinghui Wu, Charu Aggarwal, Xifeng Yan, VLDB 2013
-
Top-K Nearest Keyword Search on Large Graphs, Miao Qiao, Lu Qin, Hong Cheng, Jeffrey Yu, VLDB 2013
-
Efficient SimRank-based Similarity Join Over Large Graphs, Weiguo
Zheng, Lei Zou, Yansong Feng, Lei Chen, Dongyan Zhao, VLDB 2013
-
Distributed GraphLab: a framework for machine learning in the cloud, Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, J. M. Hellerstein, VLDB 2012.
-
GraphChi: large-Scale graph computation on just a PC, A. Kyrola, G. Blelloch, C. Guestrin, OSDI 2012.
-
Pregel: a system for large-scale graph processing, G. Malewicz et al, SIGMOD 2010
-
Towards energy-efficient database cluster design, W. Lang, et al, VLDB 2012
-
CrowdDB: query processing with the VLDB crowd, A. Feng et al, VLDB 2011
-
Fate and Destini: a framework for cloud recovery testing, H. Gunawi et al, NSDI 2011
-
Model checking a networked system without the network, R. Guerraoui, M. Yabandeh, NSDI 2011
-
SILT: a memory-efficient, high performance key-value store, H. Lim, B. Fan, D. Andersen, M. Kaminsky, SOSP 2011
-
Distributed GraphLab: a framework for machine-learning and datamining in the cloud, Y. Low et al, VLDB 2012
-
Densest subgraph in streaming and MapReduce, B. Bahmani, R. Kumar, S. Vassilvitskii, VLDB 2012
-
PrIter: a distributed framework for prioritized iterative computations, Y. Zhang et al, Socc 2011
-
R. Chen, M. Yang, X. Weng, B. Choi, B. He, X. Li, “Improving large graph processing on partitioned graphs in the cloud,” Proc. ACM Symposium on Cloud Computing (SoCC), 2012.
-
J. Shun, G. E. Blelloch, “Ligra: a lightweight graph processing framework for shared memory,” SIGPLAN Notices, vol. 48, no. pp. 135-146, Feb. 2013.
-
L. M. Vaquero, F. Cuadrado, D. Logothetis, C. Martella, “xDGP: A dynamic graph processing system with adaptive partitioning,” Technical report, Cornell University: arXiv:1309.1049 2013.
|
3/19 |
Tejala Thippeswamy & Yosub Shin
Slides 1: [ppt]
Slides 2: [pdf] |
Latency is King |
-
Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency, Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports, Steven D. Gribble, SoCC 2014
-
PriorityMeister: Tail Latency QoS for Shared Networked Storage, Timothy Zhu, Alexey Tumanov, Michael A. Kozuch, Mor Harchol-Balter, Gregory R. Ganger, SoCC 2014
|
-
Bobtail: avoiding long tails in the cloud, Y. Xu, Z. Musgrave, B. Noble, M. Bailey, NSDI 2013
-
Small is Better: Avoiding Latency Traps in Virtualized Data Centers, Y. Xu, M. Bailey, B. Noble, F. Jahanian, SOCC 2013
-
The Tail at Scale, J. Dean, CACM 2013.
|
3/24 |
Spring Vacation - No Class.
|
3/26 |
Spring Vacation - No Class.
|
3/31 |
Baskar Retinasabapathi & Ashish Bijlani
Slides 1: [ppt]
Slides 2: [ppt] [pdf]
|
There's a P2P App for That |
|
-
Corona: a high performance publish-subscribe system for the World Wide Web, V. Ramasubramanian, R. Peterson, E. G. Sirer, NSDI 2006
-
Reliable client accounting for p2p-infrastructure hybrids, P. Aditya, et al, NSDI 2012
-
UsenetDHT: A Low-Overhead Design for Usenet
Emil Sit et al, NSDI 2008
-
Colyseus: A distributed architecture for interactive multiplayer games, A.R. Bharambe, Usenix NSDI 2006.
-
Peer-to-peer support for massively multiplayer games, B. Knutsson et al, Infocom 2004.
-
Operating system support for planetary-scale network services, A. Bavier et al, NSDI 2004.
-
CoDNS: Improving DNS Performance and Reliability via Cooperative Lookups, KyoungSoo Park et al, OSDI 2004 [ppt]
-
Wide-area cooperative storage with CFS, F. Dabek et al, SOSP 2001
-
Scalability of reliable group communication using overlays, F. Baccelli et al, Infocom 2004.
-
OceanStore: An Architecture for Global-Scale Persistent Storage, J. Kubiatowicz, ASPLOS 2000
|
4/2 |
Xander Masotto & Akash Kapoor
Slides 1: [ppt]
Slides 2: [ppt] [pdf]
|
Process it In-network |
|
-
Camdoop: exploiting in-network aggregation for big-data applications, P. Costa, A. Donnelly, A. Rowstron, G. O'Shea, NSDI 2012
-
Sailfish: a framework for large scale data processing, S. Rao et al, SOCC 2012
-
Making cloud intermediate data fault-tolerant, S. Ko et al, SOCC 2010
-
Trickle: a self-regulating algorithm for code propagation and maintenance in wireless sensor networks, P. Levis et al, NSDI 2004.
-
A framework for time indexing in sensor networks, He et al, ACM TOSN 2005.
-
Multi-resolution state retrieval in sensor networks, B. Deb et al, SNPA 2003
-
Robust location detection in emergency sensor networks, S. Ray et al, Infocom 2003
-
DIFS: A distributed index for features in sensor networks, B. Greenstein et al, SNPA 2003
-
Localized edge detection in sensor fields, K.K.Chintalapudi et al, SNPA 2003
-
Optimal energy balanced algorithm for selection in single hop sensor network, M. Singh et al, SNPA 2003
-
Sensor deployment and target localization based on virtual forces, Y. Zou et al, Infocom 2003
-
Localized algorithms in wireless ad-hoc networks: location discovery and sensor exposure, S. Meguerdichian et al, Mobihoc 2001
-
Amorphous Computing, H. Abelson et al, CACM 2000.
-
Probabilistic counting for database systems, Flajolet and Martin, JCSS, 1985
|
4/5 |
Project Midterm Report due, 11.59 pm [12pt font, single-sided, 8 pages for main material + 1 page Business plan if applicable + any number of pages for references] (In groups of 2-3)
Instructions for Midterm and its Submission
|
4/7 |
No class due to midterm reviews
(No reviews required, unless you're making up the count) |
How does it Really Behave? |
-
Measurement, modeling, and analysis of a peer-to-peer file-sharing workload
Krishna P. Gummadi et al, SOSP 2003
-
Understanding availability, R. Bhagwan et al, IPTPS 2003
-
Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming, L. Vu, I. Gupta, J. Liang, K. Nahrstedt, QShine 2007
-
An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS, Simson Garfinkel, Harvard TechRep., 2007
-
What do Real-Life Hadoop Workloads Look Like, Cloudera Blog
|
-
Hadoop's Adolescence: An analysis of Hadoop usage in scientific workloads, K. Ren, Y. Kwon, M. Balazinska, B. Howe, VLDB 2013
-
Availability and Locality Measurements of Peer-to-Peer File Systems, J. Chu et al, SPIE 2002.
-
A measurement study of peer-to-peer file sharing systems, S. Saroiu et al, MMCN 2002
-
Free riding on Gnutella
Adar and Huberman, First Monday, 2000
-
Small-world file sharing communities, A. Iamnitchi et al, Infocom 2004.
|
4/9 |
No class due to midterm reviews
(No reviews required, unless you're making up the count) |
Low Fees Required - Probabilistic Membership |
|
-
Improving availability in distributed systems with failure informers, J. B. Leners et al, NSDI 2013
-
Peer-to-peer membership management for gossip-based protocols, A.J. Ganesh et al, IEEE TOC, Feb 2003.
-
CONGRESS: CONnection-oriented Group address Resolution Service, A. Tal et al, 1997
-
Using random subsets to build scalable network services, D. Kostic et al, USITS 2003
-
T-Man: Fast Gossip-based Construction of
Large-Scale Overlay Topologies, M. Jelasity et al, U. Bologna Tech Report.
-
CYCLON: Inexpensive Membership Management
for Unstructured P2P Overlays, S. Voulgaris et al, Journal Network Systems and Management, June 2005
|
4/12 |
Midterm Reviews due, 11.59 PM
|
4/14 |
Qi Wang (qiwang11) & Dhruve Ashar
Slides 1: [ppt]
Slides 2: [ppt]
|
Cluster Scheduling |
-
The Power of Choice in Data-Aware Cluster Scheduling, Shivaram Venkataraman, Aurojit Panda, Ganesh Ananthanarayanan, Michael J. Franklin, Ion Stoica, OSDI 2014.
-
Reservation-based scheduling: if you're late don't blame us! Carlo Curino, Djellel Difallah, Chris Douglas, Subru Krishnan, Raghu Ramakrishnan, Sriram Rao, SoCC 2014
|
-
Omega: flexible, scalable schedulers for large compute clusters, Malte Schwarzkopf and Andy Konwinski and Michael Abd-El-Malek and John Wilkes, Eurosys 2013
-
Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing, Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, OSDI 2014
-
Sparrow: Distributed, Low Latency Scheduling, K. Ousterhout, P. Wendell, M. Zaharia, I. Stoica, SOSP 2013
-
ClouDiA: a Deployment advisor for Public Clouds, T. Zou et al, VLDB 2013
-
Distribution-based query scheduling, Y. Chi et al, VLDB 2013
-
Cake: enabling high-level SLOs on shared storage systems, A. Wang, et al, SOCC 2012
-
Dominant resource fairness: a fair allocation of multiple resource types, A. Ghodsi et al, NSDI 2011
-
Fast crash recovery in RAMCloud, D. Ongaro et al, SOSP 2011
-
PACMan: Coordinated memory caching for parallel jobs, G. Ananthanarayanan, A. Ghodsi, A. Wang, et al, NSDI 2012
-
Resilient Distributed Datasets: A fault-tolerant abstraction for in-memory cluster computing, M. Zaharia, M. Chowdhury, et al, NSDI 2012
-
Lightweight Locking For Main Memory Database Systems, K. Ren, A. Thomson, D. Abadi, VLDB 2013
-
Memcached: [Short article] [Main wiki] [Using memcached at Facebook] (read first two, look closely at third)
-
Redis
-
Zookeeper (Yahoo!)
-
Interactive Analysis of Webscale data, C. Olston et al, CIDR 2009 (Yahoo)
-
Mesos: a platform for fine-grained resource sharing in the data center, B. Hindman et al, NSDI 2011
-
Untangling cluster management with Helix, K. Gopalakrishna et al (LinkedIn), Socc 2012
-
Windows Azure Storage: a highly available cloud storage service with strong consistency, B. Calder et al, SOSP 2011
-
Don't lose sleep over availability: the GreenUp decentralized wakeup service, S. Sen, J. Lorch, et al, NSDI 2012
-
Centrifuge: Integrated Lease Management and Partitioning for Cloud Services, A. Adya et al, NSDI 2010
-
ACMS: The Akamai Configuration Management System
Sherman et al, NSDI 2005.
-
Orchestrating the deployment of computations in the cloud with Conductor, A. Wieder et al, NSDI 2012
-
DOT: a matrix model for analyzing, optimizing and deploying software for big data analytics in distributed systems, Y. Huai et al, Socc 2011
-
Declarative automated cloud resource orchestration, C. Liu et al, SOCC 2011
|
4/16 |
Alex Zahdeh & Sanchit Gupta
Slides 1: [ppt]
Slides 2: [ppt] [pdf]
|
Distributed Machine Learning |
-
Project Adam: Building an Efficient and Scalable Deep Learning Training System, Trishul Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman, OSDI 2014
-
Scaling Distributed Machine Learning with the Parameter Server,
Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed,
Vanja Josifovski, James Long, Eugene J. Shekita, Bor-Yiing Su, OSDI 2014
|
-
Exploiting iterative-ness for parallel ML computations,
Henggang Cui, Alexey Tumanov, Jinliang Wei, Lianghong Xu, Wei Dai,
Jesse Haber-Kucharsky, Qirong Ho, Gregory R. Ganger, Phillip B.
Gibbons*, Garth A. Gibson, Eric P. Xing, SoCC 2014
|
4/21 |
Hongwei Wang & Guangxiang Du
Slides 1: [pdf]
Slides 2: [ppt] [pdf]
|
Now Emerging |
|
-
Effective Straggler Mitigation: attack of the clones, G. Ananthanarayanan, A. Ghodsi, S. Shenker, I. Stoica, NSDI 2013
-
Scale-up vs Scale-out for Hadoop: Time to rethink?
R. Appuswamy et al, SOCC 2013
-
Natjam: Design and Evaluation of Eviction Policies For Supporting Priorities and Deadlines in Mapreduce Clusters, Brian Cho et al, SOCC 2013
-
ReStore: Reusing results of MapReduce jobs, I. Elghandour, A. Aboulnaga, VLDB 2012
-
Building wavelet histograms on large data in MapReduce, J. Jestes, K. Yi, F. Li, VLDB 2012.
-
Only aggressive elephants are fast elephants, J. Dittrich et al, VLDB 2012
-
Themis: an I/O efficient MapReduce, A. Rasmussen et al, Socc 2012
-
Social Content Matching in MapReduce, G. Morales, A. Gionis, M. Sozio, VLDB 2011
-
Improving MapReduce Performance in Heterogeneous Environments, M. Zaharia et al, OSDI 2008
-
Reining in the Outliers in Map-Reduce Clusters using Mantri, G. Ananthanarayanan, OSDI 2010
-
Oozie
|
4/23 |
Boyan Li & Richeng Huang
Slides 1: [pdf]
Slides 2: [ppt]
|
So Much Data! |
-
What Bugs Live in the Cloud? A Study of 3000+ Issues in Cloud Systems,
Haryadi S. Gunawi, Mingzhe Hao, Tanakorn Leesatapornwongsa, Tiratat
Patana-anake, Thanh Do, Jeffry Adityatama, Kurnia J. Eliazar, Agung
Laksono, Jeffrey F. Lukman, Vincentius Martin, Anang D. Satria, SoCC
2014
-
Heterogeneity and dynamicity of clouds at scale: Google trace analysis, C. Reiss et al, SoCC 2012
|
-
An analysis of Facebook caching, Q. Huang, K. Birman, R. van Renesse, W. Lloyd, S. Kumar, H. C. Li, SOSP 2013
-
Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads, Y. Chen, S. Alspaugh, R. Katz, VLDB 2012
-
An empirical study on configuration errors in commercial and open-source systems, Z. Yin, et al, SOSP 2011
-
Projecting disk usage based on historical trends in a cloud environment, M. Stokely et al, SCC 2012 (Google).
-
Design implications for enterprise storage systems via multi-dimensional trace analysis, Y. Chen et al, SOSP 2011
-
Structured comparative analysis of systems logs to diagnose performance problems, K. Nagaraj, C. Killian, J. Neville, NSDI 2012
-
The unified logging infrastructure for data analytics at Twitter, G. Lee, et al, VLDB 2012
-
Modeling and synthesizing task placement constraints in Google compute clusters, B. Sharma et al, Socc 2011.
|
4/28 |
Harshit Dokania & Anirudh Jayakumar
Slides 1: [ppt][pdf]
Slides 2: [pdf]
|
Spreading the Rumor
|
|
-
Randomized Rumor Spreading, Karp and Shenker, FOCS 2000
-
Immunology as information processing, S. Forrest et al, 2000.
-
Adaptive and Efficient Epidemic-style Protocols for Reliable and Scalable Multicast, Gupta et al, IEEE TPDS, 2006.
-
Gossip-based ad hoc routing, Z. Haas et al, Infocom 2002
-
Spatial gossip and resource location protocols, Kempe, Kleinberg and Demers, STOC 2001
|
4/30 |
Indy
[ppt]
|
How do Networks Look? |
|
|
5/5 |
Indy
[ppt]
|
Completing the Circle |
(No reviews required for the following papers. Paper copies for offline papers were handed out during previous lecture.)
|
-
R. Hoffmann, "Why buy that theory?", 2003
-
R. P. Feynman, "Metaplast Corp."
|
END OF CLASSES
|
May 10 (Sunday) |
Project Final Report due, 11.59 pm [12pt font, single-sided, 12 pages for main material + 1 page Business plan if applicable + any number of pages for references] (In groups of 2-3)
(Deadline is Hard and final, no extensions!)
Instructions for Final Report and its Submission
|
|
|
The
following sessions are a rich source of ideas and directions for your
projects - mine them! We could not include them in the course schedule
due to the limited time.
|
Leftover |
|
Potpourri |
-
Distributed time-aware provenance, W. Zhou et al, VLDB 2013
-
Lightweight privacy-preserving peer-to-peer data integration, Y. Zhang et al, VLDB 2013
-
Understanding and Mitigating the Impact of Load Imbalance in the Memory Caching Tier,
Yu-Ju Hong, M. Thottethodi, SOCC 2013
-
More for your money: exploiting performance heterogeneity in public clouds, B. Farley et al, SOCC 2012
-
How to price shared optimizations in the cloud, P. Upadhyay, M. Balazinska, D. Suciu, VLDB 2012
-
Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs, H. Herodotou, S. Babu, VLDB 2011
-
Distributed Systems meets Economics: pricing in the cloud, H. Wang et al, HotCloud 2010
-
Optimizing Cost and Performance in Online Service Provider Networks, Z. Zhang et al, NSDI 2010
-
Reducing costs of spot instances via checkpointing in the Amazon Elastic Compute Cloud, S. Yi et al, IEEE Intl. Conf. on Cloud Computing, 2010
-
Market-oriented Grids and Utility Computing: The State-of-the-art and Future Directions, J. Broberg et al, Journ. Grid Computing, 2007
|
|
Leftover |
|
Energy-Efficient Design |
|
|
Leftover |
(You can't sign up yet for this slot)
|
Securitae! |
|
|
Leftover |
|
In Byzantium |
|
-
UpRight Cluster Services, A. Clement et al, SOSP 2009
-
Zyzzyva: Speculative Byzantine Fault Tolerance (Awarded a best paper award.)
Ramakrishna Kotla et al, SOSP 2007
-
Practical Byzantine Fault-Tolerance, Castro et al, OSDI 1999.
-
Preserving peer replicas by rate-limited sampled voting, P. Maniatis et al, SOSP 2003
-
BAR Fault Tolerance for Cooperative Services. A. Aiyer et al, SOSP 2005.
-
BAR Gossip. Harry Li et al, Usenix OSDI 2006
-
Scaling Byzantine Fault-Tolerant Replication to Wide Area Networks. Yair Amir et al, IEEE DSN 2006
-
BFT Protocols under Fire, Atul Singh et al, NSDI 2008
|
Leftover |
|
Geo-Distribution |
|
|
Leftover |
|
Old Wine: Stale or Vintage? |
|
-
2 P2P or Not 2 P2P, M. Roussopoulos et al, IPTPS 2004.
-
Scooped, again, J. Ledlie et al, IPTPS 2003.
-
A Note on Distributed Computing, A. Wollrath et al, Sun Microsystems Laboratories Techreport, 1994
-
Cloud computing is a trap, warns GNU founder Richard Stallman, Guardian (UK), Sep 29, 2008
-
(paper deleted - see their updated version in their SIGMOD 2009 paper) MapReduce - a major step backwards, D. DeWitt and M. Stonebraker
|
Leftover |
|
Publish-Subscribe/CDNs |
|
-
Gryphon Home
-
An efficient multicast protocol for content-based publish-subscribe systems, G. Banavar et al, ICDCS 1999
-
A reliable multicast framework for light-weight sessions and application level framing, S. Floyd et al, 1997
-
SCRIBE: the design of a large-scale event notification infrastructure, A. Rowstron et al, NGC 2001.
-
A shared control plane for overlay multicast, A. Nandi et al, NSDI 2007
-
FeedTree: Sharing Web micronews with peer-to-peer event notification, Sandler et al, IPTPS 2005.
-
Efficient Probabilistic Subsumption Checking for Content-based Publish/Subscribe Systems. A. Ouksel et al, Middleware 2006
-
Matching events in a content-based subscription system. M. K. Aguilera et al., PODC, 1999
-
Amazon's CloudFront service
|
Leftover |
|
Distributed Monitoring and Management |
|
-
Chukwa: A large-scale monitoring system, J. Boulon et al, CCA 2008
-
PlanetLab website
-
Emulab Website
-
WAIL website
-
Chukwa system (Hadoop monitoring)
-
Distributed system management: PlanetLab incidents and management tools, R. Adams, PlanetLab Techreport
-
PlanetLab management using Plush, J. Albrecht et al, ACM SIGOPS OSR, Jan 2006
-
A Scalable Distributed Information Management System. Praveen Yalagandula and Mike Dahlin. In Proceedings of ACM SIGCOMM, August, 2004.
-
Network imprecision: a new consistency metric for scalable monitoring, N. Jain, OSDI 2008
-
Field studies of computer system administrators: analysis of system management tools and practices, Barrett et al, IBM Almaden
-
Reducing the cost of IT operations: is automation always the answer? Brown and Hellerstein, IBM TJ Watson
|
Leftover |
|
Green Clouds |
|
|
Leftover |
|
Distributed Debugging |
|
-
Friday: Global Comprehension for Distributed Replay, Dennis Geels et al, NSDI 07
-
Pip: Detecting the Unexpected in Distributed Systems, P. Reynolds, NSDI 06
-
Performance Debugging for Distributed Systems of Black Boxes, M. K. Aguilera et al, SOSP 03
-
Using Magpie for request extraction and workload modeling, P. Barham et al, OSDI 04
-
Pinpoint: Problem Determination in Large, Dynamic Internet Services, M. Chen et al, DSN 02
-
Life, Death, and the Critical Transition: Finding Liveness Bugs in Systems Code, C. Killian et al, NSDI 07
-
Using Queries for Distributed Monitoring and Forensics, A. Singh et al, EuroSys 06
|
Leftover |
|
Flash! |
|
|
Leftover |
|
The Middle or the End? |
(review any one of the following 3 papers)
|
-
Rethinking the design of the Internet: the end-to-end arguments vs. the brave new world, Blumenthal and Clark, ACM Trans. Internet Technology, 2001
-
Middleboxes no longer considered harmful, M. Walfish et al, OSDI 2004.
-
A Scalable, Commodity Data Center Network Architecture, Al-Fares et al, SIGCOMM 2008
-
Internet-Scale Service Efficiency, J. H. Hamilton, LADIS 2008
-
Stable and Accurate Network Coordinates, Jonathan Ledlie, Peter Pietzuch, and Margo Seltzer, ICDCS 2006
-
On transport layer support for peer to peer networks, H-Y. Hsieh et al, IPTPS 2004.
-
A comparison of overlay routing and multihoming route control, A. Akella et al, SIGCOMM 2004.
-
Consensus Routing: The Internet as a Distributed System, John P. John et al, OSDI 2008
-
Overview of CAIDA Tools (give overview, and discuss at least five tools from different categories)
|
Leftover |
|
Availability-Aware Systems |
(read the papers, but no reviews required for this session)
|
|
Leftover |
|
Design Methodologies, Handling Stress |
(No class today, but if you submitted a review on time, you can skip one of the remaining review sessions)
|
-
Comparing the performance of DHTs under churn, J. Li et al, IPTPS 2004.
-
Routing design in operational networks: a look from the inside, D. A. Maltz et al, SIGCOMM 2004
-
(short paper) Tools for the code generation, J. Ambrosio 2003.
-
A protocol family approach to survivable storage infrastructures, J. Wylie et al, FuDiCo 2004.
-
Randomized ID selection for peer-to-peer networks, G. S. Manku, PODC 2004
-
Peer-to-Peer Approach to Resource Location in Grid Environments, A. Iamnitchi et al, 2003.
-
OSPF monitoring: architecture, design and deployment experience, A Shaikh et al, NSDI 2004
-
Metarouting, Griffin et al, SIGCOMM 2005.
-
Automatic Discovery of Mutual Exclusion Algorithms, Bar David et al, PODC 2003.
|
Leftover |
|
Sources of unreliability in networks |
|
-
Characterising the use of a campus wireless network, D. Schwab et al, Infocom 2004.
-
Origins of Internet Routing Instability, C. Labovitz et al, INFOCOM 1999
-
Firefly-inspired Heartbeat Synchronization in Overlay Networks, O. Babaoglu, SASO 2007
-
Gossip-Based Clock Synchronization for Large Decentralized Systems, K. Iwanicki et al, SelfMan 2006: 28-42
-
On the scalability of cooperative time synchronization in pulse-connected networks, Hu and Servetto, IEEE TON 2006.
-
Locating Internet routing instabilities, A. Feldmann et al, SIGCOMM 2004.
-
A longitudinal survey of Internet host reliability, D. Long et al, SRDS 1995
-
End-to-end Internet packet dynamics, V. Paxson, SIGCOMM 1997
-
Measurement and modeling of the temporal dependence in packet loss, M. Yajnik et al, Infocom 1999
-
Route flap damping exacerbates Internet routing convergence, Z. M. Mao et al, SIGCOMM 2002
-
Route oscillations in I-BGP with route reflection, A. Basu et al, SIGCOMM 2002
-
Stability issues in OSPF routing, A. Basu et al, SIGCOMM 2001
-
On the effect of traffic self-similarity on network performance, K. Park et al, WSC 1997
-
Measurement and analysis of the error characteristics of an in-building wireless network, SIGCOMM 1996
-
Modeling the performance of wireless sensor networks, C-F. Chiasserini et al, Infocom 2004.
-
The synchronization of periodic routing messages, S. Floyd et al, IEEE/ACM TON, 1994.
-
Characterizing User Behavior and Network Performance in a Public Wireless LAN, Anand Balachandran et al, ACM SIGMETRICS 2002
|
Leftover |
|
A Step Back |
|
|
Leftover |
|
Distributed Management (2) |
|
|
Leftover |
|
Handling Stress |
|
|
Leftover |
|
Selfish algorithms |
|
|
Leftover |
|
Security |
|
|
Leftover |
|
Economic Theory |
|
|
Leftover |
|
The future of sensor nets? |
|
|
Leftover |
|
P2P - Etc. |
|
|
Leftover |
|
The End-to-End Approach |
|
|
Leftover |
|
Automatic Computing and Inference |
|
|
Leftover |
|
Modular Systems |
|
|
Leftover |
|
Practical theory perspectives |
|
|
Leftover |
|
Topology and Naming |
|
|
Leftover |
|
Classical Algorithms |
|
-
Exploiting network proximity in peer-to-peer overlay networks, M. Castro et al, MSR TechReport 2002
-
Geometric ad-hoc routing: of theory and practice, F. Kuhn et al, PODC 2003
-
On the curvature of the Internet and its usage for overlay construction and distance estimation, Y. Shavitt et al, Infocom 2004.
-
A practical distributed mutual exclusion protocol in dynamic peer-to-peer systems, S-D. Lin et al, IPTPS 2004.
-
Scalable and dynamic quorum systems, Naor and Wieder, PODC 2003
|
Leftover |
|
Caching |
|
-
A churn-resistant peer-to-peer web caching system, P. Linga et al, Workshop on Survivable & Self-Regenerative Systems 2003.
-
The case for cooperative networking, V. N. Padmanabhan et al, IPTPS 2002.
-
Approximate caches for packet classification, F. Chang et al, Infocom 2004.
-
Comparing strength of locality of reference - popularity, majorization and some folk theorems, S. Vanichpun, Infocom 2004.
-
Botz-4-Sale: Surviving Organized DDoS Attacks That Mimic Flash Crowds, Srikanth Kandula et al, NSDI 2005.
|