Interesting Papers from Cloud & Big Data
One of the courses I took at Columbia, and one with really good content, was Cloud and Big Data. I took the course in Fall '14 and TA'ed it in Spring '15. Every week, two papers were assigned, and students were expected to read them and submit a short summary.
These are some really good papers, and if you're interested in distributed systems, you'll love reading them. I'm sharing the list here (in no particular order) and hope you enjoy reading them!
Cloud & Big Data Paper Reading List
-
The Google File System
Ghemawat, Sanjay; Gobioff, Howard; and Leung, Shun-Tak. ACM SIGOPS Operating Systems Review, 37(5). 29-43.
GFS: Google File System
-
CloudCmp: Comparing Public Cloud Providers
Ang Li, Xiaowei Yang (Duke University), Srikanth Kandula, Ming Zhang (Microsoft Research)
CloudCmp: Comparing Public Cloud Providers
-
Xen and the Art of Virtualization
Barham, Paul; Dragovic, Boris; Fraser, Keir; Hand, Steven; Harris, Tim; Ho, Alex; Neugebauer, Rolf; Pratt, Ian; and Warfield, Andrew. ACM SIGOPS Operating Systems Review, 37(5). 164-177.
Xen and the Art of Virtualization
-
Bigtable: A Distributed Storage System for Structured Data
Chang, Fay; Dean, Jeffrey; Ghemawat, Sanjay; Hsieh, Wilson C; Wallach, Deborah A; Burrows, Mike; Chandra, Tushar; Fikes, Andrew; and Gruber, Robert E. ACM Transactions on Computer Systems (TOCS), 26(2) 2008.
Bigtable: A Distributed Storage System for Structured Data
-
Live Migration of Virtual Machines
Clark, Christopher; Fraser, Keir; Hand, Steven; Hansen, Jacob Gorm; Jul, Eric; Limpach, Christian; Pratt, Ian; and Warfield, Andrew. Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation-Volume 2. 273-286.
Live Migration of Virtual Machines
-
Cassandra—A Decentralized Structured Storage System
Lakshman, Avinash and Malik, Prashant. ACM SIGOPS Operating Systems Review, 44(2) 2010. 35.
Cassandra: A Decentralized Structured Storage System
-
Dynamo: Amazon's Highly Available Key-Value Store
DeCandia, Giuseppe; Hastorun, Deniz; Jampani, Madan; Kakulapati, Gunavardhan; Lakshman, Avinash; Pilchin, Alex; Sivasubramanian, Swaminathan; Vosshall, Peter; and Vogels, Werner. ACM SIGOPS Operating Systems Review, 41(6). 205-220.
Dynamo: Amazon's Highly Available Key-Value Store
-
Serving Large-scale Batch Computed Data with Project Voldemort
Roshan Sumbaly, Jay Kreps, Lei Gao, Alex Feinberg, Chinmay Soman, Sam Shah (LinkedIn). USENIX.
Serving Large-scale Batch Computed Data with Project Voldemort
-
PNUTS: Yahoo!’s Hosted Data Serving Platform
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, and Ramana Yerneni
Yahoo! Research.
PNUTS: Yahoo!’s Hosted Data Serving Platform
-
Hive: A Warehousing Solution Over a Map-Reduce Framework
Thusoo, Ashish; Sarma, Joydeep Sen; Jain, Namit; Shao, Zheng; Chakka, Prasad; Anthony, Suresh; Liu, Hao; Wyckoff, Pete; and Murthy, Raghotham. Proceedings of the VLDB Endowment, 2(2) 2009. 1626-1629.
Hive: A Warehousing Solution Over a Map-Reduce Framework
-
MapReduce: Simplified Data Processing on Large Clusters
Dean, Jeffrey and Ghemawat, Sanjay. Communications of the ACM, 51(1) 2008. 107-113.
MapReduce: Simplified Data Processing on Large Clusters
-
An Analysis of Facebook Photo Caching
Qi Huang, Ken Birman, Robbert van Renesse (Cornell University), Wyatt Lloyd (Princeton University), Sanjeev Kumar, Harry C. Li (Facebook Inc.)
An Analysis of Facebook Photo Caching
-
Scaling Memcache at Facebook
Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani, Facebook Inc. NSDI 2013
Scaling Memcache at Facebook
-
Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds
Thomas Ristenpart, Eran Tromer, Hovav Shacham, Stefan Savage
Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds
-
Apache Kafka: a Distributed Messaging System for Log Processing
Jay Kreps, Neha Narkhede, Jun Rao
Apache Kafka: a Distributed Messaging System for Log Processing
-
Spark: Cluster Computing with Working Sets
Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica, University of California, Berkeley
Spark: Cluster Computing with Working Sets
-
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica, University of California, Berkeley
RDD: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
-
Discretized Streams: Fault-Tolerant Streaming Computation at Scale
Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion Stoica, University of California, Berkeley
Discretized Streams: Fault-Tolerant Streaming Computation at Scale
-
Shark: SQL and Rich Analytics at Scale
Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica
Shark: SQL and Rich Analytics at Scale