Web Systems and Algorithms
Google

Chris Brooks
Department of Computer Science
University of San Francisco

Cloud Computing at Google
  Google has developed a layered system to handle web-scale applications.
    Google File System
    Bigtable
    MapReduce

Google File System
  What are the primary design issues surrounding GFS?

Design Issues
  Commodity hardware: failures are the rule
  Huge files
  Files tend to be written, then either appended or streamed
  Random writes are rare
  What sorts of applications would have this behavior?
  Multiple users may simultaneously write to a file
  API and application design should happen in tandem
  Sustained bandwidth is more important than latency

Architecture
  Single master
  Many chunkservers
  Many clients
  Files are divided into 64 MB chunks.
  Each chunk is stored redundantly on multiple chunkservers (three replicas by default).

Master
  What is the master's role?

Master
  Maintain metadata: namespace, access control, mapping of files to chunks, chunk locations.
  Refer clients to chunkservers.
  Control lease management, garbage collection, and chunk migration.

Chunkserver
  What is the chunkserver's role?
    Serve up chunks to clients

Control flow
  What is the typical order of operations for a client that wants to read a file?

Control flow
  Client sends filename and offset (translated to a chunk index) to master.
  Master returns chunk handle and replica locations.
  Client chooses a replica and requests a byte range within the chunk.
  Master is not needed for further data exchange.

Advantages
  What are some advantages of this approach?

Advantages
  Simplicity of master.
  Metadata held in memory.
  Master does not need to handle chunk access.
  Easy to handle failure; the client just requests the chunk from another replica.

Persistence
  Is a single master a potential point of failure?
  How can a master recover from a crash?

Persistence
  Master keeps all data structures in memory.
  Each file action is logged.
  Master also periodically checkpoints.
  On failure, reload from the latest checkpoint and play back the log.

Chunk info
  How does the master know what chunks are stored at each chunkserver?

Chunk info
  Master periodically sends a heartbeat to each chunkserver.
  Chunkserver responds with a list of all stored chunks and their status.
  Occasionally, the master may have stale information.
  This simplifies the master and reduces overhead.

Consistency
  What does consistency mean?
    All clients see the same data, whichever replica they read.
  What does "defined" mean?
    The region is consistent and all clients see the complete results of a mutation.
  If a mutation succeeds without interference from concurrent writers, the region is consistent and defined.
  Concurrent successful writes leave the region consistent but possibly undefined.
  Appends are handled more efficiently than random writes.
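
The read control flow described in the GFS slides above can be sketched in a few lines. This is a toy, in-memory illustration and not Google's implementation: the names `Master`, `Chunkserver`, and `gfs_read`, the dictionaries used as storage, and the `chunk_size` parameter are all assumptions made for the example. It shows the client turning a byte offset into a chunk index, asking the master once for the chunk handle and replica list, and then fetching the bytes from a chunkserver directly.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as stated in the slides


class Master:
    """Hypothetical master: holds metadata only, never file contents."""

    def __init__(self):
        # (filename, chunk index) -> (chunk handle, list of replica names)
        self.chunks = {}

    def lookup(self, filename, chunk_index):
        return self.chunks[(filename, chunk_index)]


class Chunkserver:
    """Hypothetical chunkserver: stores chunk bytes keyed by chunk handle."""

    def __init__(self):
        self.store = {}

    def read(self, handle, start, length):
        return self.store[handle][start:start + length]


def gfs_read(master, servers, filename, offset, length, chunk_size=CHUNK_SIZE):
    """Read up to `length` bytes at `offset` (assumed to stay within one chunk)."""
    chunk_index, start = divmod(offset, chunk_size)
    # One round trip to the master for metadata.
    handle, replicas = master.lookup(filename, chunk_index)
    # Data comes from a chunkserver; on failure, try the next replica.
    for name in replicas:
        server = servers.get(name)
        if server is not None and handle in server.store:
            return server.read(handle, start, length)
    raise OSError("no live replica holds chunk %r" % (handle,))


# Usage with a tiny 4-byte chunk size so the arithmetic is easy to follow.
master = Master()
master.chunks[("/logs/a", 0)] = ("h0", ["cs1", "cs2"])
master.chunks[("/logs/a", 1)] = ("h1", ["cs2", "cs3"])
servers = {name: Chunkserver() for name in ("cs1", "cs2", "cs3")}
servers["cs1"].store["h0"] = servers["cs2"].store["h0"] = b"abcd"
servers["cs2"].store["h1"] = servers["cs3"].store["h1"] = b"efgh"

data = gfs_read(master, servers, "/logs/a", 5, 2, chunk_size=4)  # b"fg"
```

Offset 5 with 4-byte chunks falls in chunk 1 at position 1, so the read returns `b"fg"`. Removing `cs2` from `servers` gives the same result via `cs3`, which is the "easy to handle failure" point from the Advantages slide.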

Implications for applications
  What implications does this model have for an application?
    Applications should append when possible.
    Applications need to keep track of the defined region of the file.
    Applications will need to tolerate or filter occasional duplicate records.

Leases
  What is a lease?
  How is it used?
    A lease is a time-limited grant used to order mutations to a chunk.
    The master grants it to one chunkserver (the primary), which then coordinates writes with the other replicas.

Writing replicated data
  What is the order of operations for writing replicated data?

Writing replicated data
  Client asks the master which chunkserver holds the lease (the primary) and where the other replicas are.
  Client sends data to all replicas; these cache it.
  Client sends the write request to the primary.
  Primary assigns an order and forwards the write request to all other replicas.
  All replicas process writes to that chunk in the same order.
  What if a replica fails during this operation?

Data flow
  Data is pushed between replicas in a linear fashion.
  This is an interesting choice; they could have used multicast, or a tree.
  Why is this?

Bigtable
  Bigtable is implemented on top of GFS.
  What are the goals of Bigtable?
    High availability, scalability, high performance
  What does it not provide?
    Complex relational queries, datatypes

Data model
  What is Bigtable's data model?
    Multidimensional map: row name, column name, and timestamp map to a data cell (string).

Rows
  Rows are kept in lexicographic order by row key and broken into ranges called tablets.
  What is the thinking behind this?

Column families
  Column keys are grouped into column families.
  What is the thinking behind this?

Data storage
  GFS is used to store data.
  Bigtable can coexist with other applications.
  Data files are written out using the SSTable file format.
  Chubby is used to provide locking and synchronization.

Architecture
  Master
  Tablet servers
  Clients
  Chubby

Tablet servers
  What do tablet servers do?
    Handle interactions with clients; read and write data.
    Tablets are not replicated: each tablet is served by one tablet server (the underlying GFS files are replicated).

How does a client find a tablet?
  The location of the root tablet is read from Chubby.
  This contains a map of tablets to tablet servers.
  This info is then cached by the client.
  Client communicates directly with the tablet server.

Master
  What is the role of the master?

Master
  Keep track of tablet servers.
  Place unassigned tablets.
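
The data model and the row-range idea from the Bigtable slides above can be illustrated with a small sketch. Everything here is an assumption made for teaching purposes: the class names `Table` and `TabletMap`, the use of a Python dictionary for cells, and the representation of tablets by their last row key. Real Bigtable stores tablets as SSTables in GFS and finds them through metadata tablets; this only shows the shape of the map and why sorted row keys make lookup cheap.

```python
import bisect


class Table:
    """Toy Bigtable-style map: (row, column, timestamp) -> string."""

    def __init__(self):
        self.cells = {}

    def put(self, row, column, timestamp, value):
        self.cells[(row, column, timestamp)] = value

    def get(self, row, column):
        """Return the most recent value for (row, column), or None."""
        versions = [(ts, v) for (r, c, ts), v in self.cells.items()
                    if r == row and c == column]
        return max(versions)[1] if versions else None


class TabletMap:
    """Maps sorted row ranges (tablets) to tablet server names.

    end_keys[i] is the last row key (inclusive) served by servers[i];
    end_keys must be sorted.
    """

    def __init__(self, end_keys, servers):
        self.end_keys = end_keys
        self.servers = servers

    def locate(self, row):
        # Binary search for the first tablet whose end key is >= row.
        i = bisect.bisect_left(self.end_keys, row)
        if i == len(self.end_keys):
            raise KeyError(row)
        return self.servers[i]


table = Table()
# Column keys are written as "family:qualifier"; the keys are illustrative.
table.put("com.cnn.www", "contents:", 1, "<html>v1</html>")
table.put("com.cnn.www", "contents:", 2, "<html>v2</html>")
latest = table.get("com.cnn.www", "contents:")  # "<html>v2</html>"

tablets = TabletMap(["g", "p", "~"], ["ts1", "ts2", "ts3"])
server = tablets.locate("com.cnn.www")  # "ts1"
```

Because rows are sorted, keys that share a prefix (for example, pages from the same reversed domain) fall into the same or adjacent tablets, so a range scan touches few tablet servers.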

Master
  How can the master tell that a tablet server has died?
    When a tablet server starts, it acquires a lock on a uniquely named file in Chubby.
    Master queries each server for the status of its lock.
    If the server does not reply, the master attempts to acquire the lock.
    If successful, it redistributes that server's tablets.

Discussion
  How does Bigtable's architecture compare to GFS?
  What advantages does this structure have?
  How does this compare to architectures such as CAN or Chord that you might've learned about in 682?

MapReduce
  What is the basic paradigm of MapReduce?

MapReduce
  Define a map operation that is applied to each record in an input to generate key/value pairs.
  Define a reduce operation that is applied to all elements with the same key to aggregate results.

Example
  The classic example, counting words:

    def map_fn(document, text):
        # Emit (word, 1) for every word in the document's text.
        for word in text.split():
            yield word, 1

    def reduce_fn(key, counts):
        # Sum all the 1s emitted for this word.
        yield key, sum(counts)

Parallelizing
  Structuring your problem in this way allows the map function to run simultaneously on many different machines on subsets of your data.
  Reduce can then run in parallel for each key.

Implementation
  Input data is split into a number of sets.
  Keyspace is subdivided.
  A master is used to assign tasks to workers.
  Each mapping task is performed independently.
  Results are buffered and eventually written to local disk; their locations are returned to the master.
  The master then forwards these locations to reduce workers.
  Reduce workers collect all data associated with their keys, perform reduce, and write data to file.

Failure
  How is worker failure handled?
    Workers are pinged.
    Active tasks belonging to non-responsive workers are reassigned.
    Completed map tasks must be redone, because their output lives on the failed worker's local disk.

Failure
  How is master failure handled?
    Checkpointing the master's state periodically
    Restarting a new master from the last checkpoint

Refinements
  The authors describe a number of refinements to MapReduce.
  What are they and why are they useful?

Refinements
  User-defined partitioning
  User-defined combining
  Specialized readers
  Skipping bad records

MR vs DBMS
  Stonebraker et al. identify the sorts of tasks that MapReduce (Hadoop) excels at, and that RDBMSs excel at.
  MapReduce:
    Extract-Transform-Load
    Complex analytics that require multiple passes
    Semi-structured data (key-value pairs)
    Quick-and-dirty problems
    Limited budget

MR vs DBMS
  Parallel DBMS:
    Grep
    Log mining with GROUP BY
    Join (combine user visits to URLs with PageRank table)

MR vs DBMS
  Stonebraker et al. suggest some reasons why a DBMS might do better even on tasks that seem to be in MapReduce's area of expertise:
    Repeated parsing of records in MapReduce
    Tuned compression in DBMS
    Intermediate data streamed, rather than written to disk
    Scheduling: DBMSs construct a query plan

Takeaway
  Hadoop could incorporate streaming and more job-aware scheduling.
  SQL is arguably easier to write than MapReduce code.
  DBMSs need to be more plug-and-play.
  DBMSs should work with filesystem data.
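
To see the word-count example from the MapReduce slides run end to end, the following is a minimal single-process sketch. It is not how MapReduce or Hadoop is implemented: there are no workers, no master, and no files. The driver `run_mapreduce` and the names `map_fn` and `reduce_fn` are assumptions for this example (the functions are named this way to avoid shadowing Python's built-in `map`). It shows only the three logical phases: map each input, group values by key, reduce each group.

```python
from collections import defaultdict


def map_fn(document, text):
    """Emit (word, 1) for every whitespace-separated word in the text."""
    for word in text.split():
        yield word, 1


def reduce_fn(key, counts):
    """Sum the counts emitted for one word."""
    yield key, sum(counts)


def run_mapreduce(inputs, mapper, reducer):
    """Run map, group by key (the "shuffle"), then reduce, all in-process.

    `inputs` maps a document name to its contents.
    """
    groups = defaultdict(list)
    for name, contents in inputs.items():
        for key, value in mapper(name, contents):
            groups[key].append(value)

    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


counts = run_mapreduce(
    {"a.txt": "the cat the dog", "b.txt": "the end"},
    map_fn,
    reduce_fn,
)
# counts == {"cat": 1, "dog": 1, "end": 1, "the": 3}
```

In a real deployment the outer loop over `inputs` is what gets spread across map workers, and each key group in the second loop can go to a different reduce worker. Because addition is associative, a combiner could also pre-sum the counts on each map worker before the shuffle.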