Describing the General Architecture
- The Hadoop Distributed File System is abbreviated as HDFS
- An HDFS instance runs on a cluster
- A cluster consists of a collection of nodes
- These HDFS nodes can either be:
  - A `NameNode`
  - A `DataNode`
- A cluster consists of:
  - A single `NameNode`
  - Some `DataNodes`
- Typically, each HDFS node runs on its own computer (or server)
- Each node uses the storage of its computer
- A server on which a `NameNode` lives is called a master server
- A server on which a `DataNode` lives is called a worker (or slave) server
Motivating NameNodes and DataNodes
- Huge files are split into small chunks known as data blocks
- Files larger than the block size (128 MB by default in Hadoop 2 and later) are split into blocks of that size
- Those blocks are stored across `DataNodes`
- A `NameNode` stores metadata including:
  - Which `DataNodes` contain which blocks
  - Where those blocks are located
  - etc.
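
The splitting step can be sketched in a few lines. This is only an illustration: the 128 MB block size is assumed, `split_into_blocks` and `assign_blocks` are hypothetical helpers, and the round-robin placement is a stand-in rather than the placement policy HDFS actually uses.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (128 MB)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of `file_size` bytes would occupy."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds the leftover bytes
    return sizes


def assign_blocks(block_sizes, datanodes):
    """Hypothetical round-robin placement: block index -> DataNode name."""
    return {i: datanodes[i % len(datanodes)] for i in range(len(block_sizes))}


# A 300 MB file becomes two full blocks plus one partial block.
sizes = split_into_blocks(300 * 1024 * 1024)
placement = assign_blocks(sizes, ["dn1", "dn2"])
```

The `placement` dictionary plays the role of the metadata a `NameNode` would keep: it records which node holds which block, while the nodes themselves hold the bytes.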
Examples of Metadata from NameNodes
- Owners of files
- Permissions
- Block locations
- Block sizes
- File names
- File paths
- Number of data blocks
- Block IDs
- Number of replicas
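
One way to picture these fields is as a pair of small records, one per file and one per block. The class and field names below are hypothetical and chosen only to mirror the list above; they are not the structures HDFS uses internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockInfo:
    block_id: int
    size_bytes: int
    locations: List[str]  # names of the DataNodes holding a replica

    @property
    def num_replicas(self) -> int:
        return len(self.locations)


@dataclass
class FileMetadata:
    path: str
    owner: str
    permissions: str
    blocks: List[BlockInfo] = field(default_factory=list)

    @property
    def name(self) -> str:
        # The file name is the last component of the path
        return self.path.rsplit("/", 1)[-1]

    @property
    def num_blocks(self) -> int:
        return len(self.blocks)


meta = FileMetadata(
    path="/logs/app.log",
    owner="alice",
    permissions="rw-r--r--",
    blocks=[
        BlockInfo(block_id=1, size_bytes=128, locations=["dn1", "dn2", "dn3"]),
        BlockInfo(block_id=2, size_bytes=40, locations=["dn2", "dn3", "dn4"]),
    ],
)
```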
Describing the Architecture of a NameNode
- A `NameNode` consists of:
  - A namespace
  - A block management service
- A namespace consists of:
  - A file-directory tree
  - Metadata for all files and directories within the tree
  - Mappings of blocks to files within directories
- A block management service is used for:
  - Monitoring `DataNodes` by processing the heartbeats they send
  - Handling registration of `DataNodes`
  - Maintaining location of blocks
  - Processing block reports
  - Managing replica placement
  - Performing block-related operations:
    - Create
    - Delete
    - Modify
    - Get block location
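
A toy version of the block management service might look like the following. It assumes that heartbeats and block reports arrive from the `DataNodes`, and that a node counts as dead once no heartbeat has been seen for a timeout; the `BlockManager` class, its method names, and the 30-second timeout are all made up for illustration.

```python
class BlockManager:
    def __init__(self, heartbeat_timeout=30.0):
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = {}   # DataNode name -> time of last heartbeat
        self.block_locations = {}  # block ID -> set of DataNode names

    def register(self, datanode, now):
        """Handle registration of a DataNode."""
        self.last_heartbeat[datanode] = now

    def heartbeat(self, datanode, now):
        """Record a heartbeat received from a registered DataNode."""
        if datanode not in self.last_heartbeat:
            raise KeyError(f"{datanode} is not registered")
        self.last_heartbeat[datanode] = now

    def process_block_report(self, datanode, block_ids):
        """Replace what we know about a DataNode with what it just reported."""
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode)

    def live_datanodes(self, now):
        return sorted(
            dn for dn, seen in self.last_heartbeat.items()
            if now - seen <= self.heartbeat_timeout
        )

    def get_block_locations(self, block_id, now):
        """Return only the live DataNodes that hold the block."""
        live = set(self.live_datanodes(now))
        return sorted(self.block_locations.get(block_id, set()) & live)


bm = BlockManager(heartbeat_timeout=30.0)
bm.register("dn1", now=0.0)
bm.register("dn2", now=0.0)
bm.process_block_report("dn1", [1, 2])
bm.process_block_report("dn2", [2])
bm.heartbeat("dn1", now=25.0)  # dn2 stays silent after registering
```

At time 40, only `dn1` is still within the timeout, so a lookup for block 2 returns `dn1` alone even though `dn2` also reported it.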
Defining a NameNode
- Every HDFS cluster has a single `NameNode`
- This `NameNode` runs on an individual machine
- A `NameNode` is a master server
- It is responsible for the following:
  - Regulating any client-requested access to files
  - Managing the namespace of the file system
  - Storing metadata of data blocks within `DataNodes` across its cluster
  - Keeping metadata in memory for fast retrieval
  - Sending block operations (creation, deletion, and replication) to `DataNodes` to fulfill
  - Executing operations performed on the namespace
- Namespace operations include the following:
  - Opening files
  - Closing files
  - Renaming files
  - Renaming directories
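
Namespace operations only touch metadata, which a small sketch can make concrete. Here a flat dictionary of paths stands in for the file-directory tree; the `Namespace` class and its methods are hypothetical and much simpler than the real thing.

```python
class Namespace:
    def __init__(self):
        self.files = {}  # absolute file path -> list of block IDs

    def create(self, path, block_ids):
        self.files[path] = list(block_ids)

    def rename(self, src, dst):
        """Rename a file, or a directory along with everything beneath it."""
        if src in self.files:  # src is a file
            self.files[dst] = self.files.pop(src)
            return
        prefix = src.rstrip("/") + "/"
        moved = [p for p in self.files if p.startswith(prefix)]
        if not moved:
            raise FileNotFoundError(src)
        for path in moved:
            new_path = dst.rstrip("/") + "/" + path[len(prefix):]
            self.files[new_path] = self.files.pop(path)


ns = Namespace()
ns.create("/data/a.txt", [1, 2])
ns.create("/data/sub/b.txt", [3])
ns.rename("/data", "/archive")                   # directory rename
ns.rename("/archive/a.txt", "/archive/c.txt")    # file rename
```

Note that the block IDs are carried over unchanged: in this sketch a rename rewrites paths in the namespace and never moves any block data.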
Defining a DataNode
- Every HDFS cluster has at least one `DataNode`
- A `DataNode` manages the file storage on its machine
- Specifically, a file is split into one or more blocks
- Then, these blocks are stored in `DataNodes`
- A `DataNode` serves read and write requests from clients
- On instruction from the `NameNode`, the `DataNode` also performs any necessary block creation, deletion, or replication
- Remember, `DataNodes` aren't capable of performing any transformations on the data
- Only something like MapReduce is capable of this
Summarizing the Steps of HDFS
- A client sends a request to the `NameNode` on a cluster
- The `NameNode` resolves that request to the appropriate `DataNodes` and returns their locations to the client:
  - It does this by analyzing the filesystem tree
  - And it refers to the metadata
- The `DataNodes` fulfill the request:
  - They do this by performing the appropriate read and write instructions for the client
- Essentially, the `NameNode` manages the client's requests
- Then, the `DataNodes` process those requests
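
A read request can be sketched end to end under one assumption: the client asks the `NameNode` only for block locations and then fetches each block from a `DataNode` itself. The classes below are toy stand-ins with hypothetical method names, not the HDFS client API.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block ID -> bytes stored on this node

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    def __init__(self):
        self.file_blocks = {}      # path -> ordered list of block IDs
        self.block_locations = {}  # block ID -> list of DataNode names

    def add_file(self, path, placements):
        """placements: ordered list of (block_id, [DataNode names])."""
        self.file_blocks[path] = [block_id for block_id, _ in placements]
        for block_id, nodes in placements:
            self.block_locations[block_id] = list(nodes)

    def get_block_locations(self, path):
        """Metadata-only lookup: no file data passes through the NameNode."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


def read_file(path, namenode, datanodes):
    """Client-side read: resolve locations, then fetch each block in order."""
    chunks = []
    for block_id, holders in namenode.get_block_locations(path):
        chunks.append(datanodes[holders[0]].read_block(block_id))
    return b"".join(chunks)


dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.write_block(1, b"hello ")
dn2.write_block(2, b"world")

nn = NameNode()
nn.add_file("/greeting.txt", [(1, ["dn1"]), (2, ["dn2"])])

content = read_file("/greeting.txt", nn, {"dn1": dn1, "dn2": dn2})
```

The sketch always reads from the first listed holder; a real client would also have to cope with replicas, failures, and node proximity.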