Describing the General Architecture
- Hadoop's Distributed File System is abbreviated as HDFS
- It consists of a collection of clusters
- A cluster consists of a collection of nodes
These HDFS nodes can either be:
- A `NameNode`
- A `DataNode`
- A
A cluster consists of:
- A single `NameNode`
- Some `DataNodes`
- A single
- Typically, each HDFS node runs on its own computer (or server)
- Each node uses the storage of its computer
- A server on which a `NameNode` lives is called a master server
- A server on which a `DataNode` lives is called a worker (or slave) server
Motivating NameNodes and DataNodes
- Huge files are split into smaller chunks known as data blocks
- Files larger than the configured block size are separated into blocks of that size
- Those blocks are stored across `DataNodes`
A `NameNode` stores metadata including:
- Which `DataNodes` contain which blocks
- Where those blocks are located
- etc.
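The splitting and placement described above can be sketched in a few lines of Python. This is a toy model rather than HDFS code: `split_into_blocks`, `place_blocks`, the round-robin placement, and the 4-byte block size are all assumptions chosen for illustration (real HDFS uses a configurable block size and its own placement policy).

```python
from collections import defaultdict


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, datanodes: list[str]) -> dict[int, str]:
    """Assign each block index to a DataNode (hypothetical round-robin policy)."""
    return {i: datanodes[i % len(datanodes)] for i in range(num_blocks)}


# Toy example: a 10-byte "file" with a 4-byte block size.
blocks = split_into_blocks(b"0123456789", block_size=4)
placement = place_blocks(len(blocks), ["dn1", "dn2"])

# NameNode-style metadata: which DataNode holds which blocks.
blocks_by_datanode = defaultdict(list)
for block_id, node in placement.items():
    blocks_by_datanode[node].append(block_id)
```

Only the last block may be shorter than the block size; the mapping in `blocks_by_datanode` is the kind of information a `NameNode` keeps, while the bytes themselves stay on the `DataNodes`.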
Examples of Metadata from NameNodes
- Owners of files
- Permissions
- Block locations
- Block sizes
- File names
- File paths
- Number of data blocks
- Block IDs
- Number of replicas
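To make the list concrete, here is a minimal sketch of how such metadata might be modeled for a single file. The class names `BlockInfo` and `FileMetadata`, their fields, and the sample values are hypothetical and do not mirror the real `NameNode` data structures.

```python
from dataclasses import dataclass, field


@dataclass
class BlockInfo:
    block_id: int
    size_bytes: int
    locations: list[str] = field(default_factory=list)  # names of DataNodes holding a replica


@dataclass
class FileMetadata:
    path: str
    owner: str
    permissions: str
    blocks: list[BlockInfo] = field(default_factory=list)

    @property
    def name(self) -> str:
        """File name is the last component of the path."""
        return self.path.rsplit("/", 1)[-1]

    @property
    def num_blocks(self) -> int:
        return len(self.blocks)

    def replicas_of(self, block_id: int) -> int:
        """Number of replicas is the number of DataNodes holding the block."""
        for block in self.blocks:
            if block.block_id == block_id:
                return len(block.locations)
        raise KeyError(block_id)


# Hypothetical sample record.
meta = FileMetadata(
    path="/logs/app.log",
    owner="alice",
    permissions="rw-r--r--",
    blocks=[
        BlockInfo(block_id=1, size_bytes=4, locations=["dn1", "dn2"]),
        BlockInfo(block_id=2, size_bytes=2, locations=["dn2"]),
    ],
)
```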
Describing the Architecture of a NameNode
A `NameNode` consists of:
- A namespace
- A block management service
A namespace consists of:
- A file-directory tree
- Metadata for all files and directories within the tree
- Mappings of blocks to files within directories
A block management service is used for:
- Monitoring `DataNodes` through the heartbeats they periodically send
- Handling registration of `DataNodes`
- Maintaining location of blocks
- Processing block reports
- Managing replica placement
- Performing block-related operations: create, delete, modify, and get block location
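Registration and heartbeat-based monitoring can be sketched as follows. This is a simplified model under stated assumptions: the `HeartbeatMonitor` class, its method names, and the 10-second timeout are invented for illustration, and timestamps are passed in explicitly so the example stays deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time reported by each registered DataNode."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self.last_seen: dict[str, float] = {}

    def register(self, datanode: str, now: float) -> None:
        # Registration makes the DataNode known to the monitor.
        self.last_seen[datanode] = now

    def heartbeat(self, datanode: str, now: float) -> None:
        if datanode not in self.last_seen:
            raise KeyError(f"{datanode} is not registered")
        self.last_seen[datanode] = now

    def dead_nodes(self, now: float) -> list[str]:
        # A node is considered dead once it has been silent longer than the timeout.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=10.0)
monitor.register("dn1", now=0.0)
monitor.register("dn2", now=0.0)
monitor.heartbeat("dn1", now=8.0)   # dn1 keeps reporting in
dead = monitor.dead_nodes(now=12.0)  # dn2 has been silent for 12 seconds
```

Blocks held by a node that shows up in `dead_nodes` would be candidates for re-replication elsewhere, which is where replica management comes in.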
Defining a NameNode
- Every HDFS cluster has a single `NameNode`
- This `NameNode` runs on an individual machine
- A `NameNode` is a master server
It achieves the following:
- Regulating any client-requested access to files
- Managing the namespace of the file system
- Storing metadata of data blocks within `DataNodes` across its cluster
- Keeping metadata in memory for fast retrieval
- Sending requested block operations to `DataNodes` to fulfill
- Executing operations performed on the namespace
Namespace operations include the following:
- Opening files
- Closing files
- Renaming files
- Renaming directories
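A rough sketch of why these operations are cheap: they touch only metadata, never the blocks themselves. The `Namespace` class below is a hypothetical toy that maps file paths to block IDs; it is not how the real `NameNode` stores its tree.

```python
class Namespace:
    """Toy namespace: maps absolute file paths to their block IDs."""

    def __init__(self):
        self.files: dict[str, list[int]] = {}

    def create(self, path: str, block_ids: list[int]) -> None:
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = list(block_ids)

    def rename_file(self, old: str, new: str) -> None:
        if new in self.files:
            raise FileExistsError(new)
        # Only the path changes; the block IDs are carried over untouched.
        self.files[new] = self.files.pop(old)

    def rename_directory(self, old_dir: str, new_dir: str) -> None:
        old_prefix = old_dir.rstrip("/") + "/"
        new_prefix = new_dir.rstrip("/") + "/"
        renamed = {}
        for path, blocks in self.files.items():
            if path.startswith(old_prefix):
                path = new_prefix + path[len(old_prefix):]
            renamed[path] = blocks
        self.files = renamed


ns = Namespace()
ns.create("/logs/app.log", [1, 2])
ns.create("/logs/db.log", [3])
ns.rename_file("/logs/app.log", "/logs/web.log")
ns.rename_directory("/logs", "/archive")
```

After both renames, the same block IDs are reachable under the new paths, and no `DataNode` had to move any data.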
Defining a DataNode
- Every HDFS cluster has at least one `DataNode`
- A `DataNode` manages any file storage on its machine
- Specifically, a file is split into one or more blocks
- Then, these blocks are stored in `DataNodes`
- A `DataNode` serves read/write requests for the blocks it stores
- It also performs block creation, deletion, and replication when instructed by the `NameNode`
- Remember, `DataNodes` aren't capable of performing any transformations
- Only something like MapReduce is capable of this
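The responsibilities above can be sketched as a small class. This `DataNode` is a hypothetical in-memory stand-in (the method names are assumptions, and real `DataNodes` persist blocks to local disk): it stores, returns, deletes, and copies blocks, but never transforms their contents.

```python
class DataNode:
    """Toy DataNode: stores block bytes keyed by block ID."""

    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[int, bytes] = {}

    def write_block(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data

    def read_block(self, block_id: int) -> bytes:
        return self.blocks[block_id]

    def delete_block(self, block_id: int) -> None:
        # Deleting a block that is already gone is treated as a no-op.
        self.blocks.pop(block_id, None)

    def replicate_block(self, block_id: int, target: "DataNode") -> None:
        # Replication copies the bytes as-is to another DataNode.
        target.write_block(block_id, self.read_block(block_id))


dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.write_block(7, b"hello")
dn1.replicate_block(7, target=dn2)
dn1.delete_block(7)
```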
Summarizing the Steps of HDFS
- A client sends a request to the `NameNode` on a cluster
- The `NameNode` determines which `DataNodes` should handle that request
- It does this by analyzing the filesystem tree
- And it refers to the metadata
- The `DataNodes` fulfill the request
- They do this by performing the appropriate read and write instructions
- Essentially, the `NameNode` manages the client's requests
- Then, the `DataNodes` process those requests
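The steps above can be sketched end to end for a read. Everything here is a simplified assumption: the `NameNode` and `DataNode` classes, the `read_file` helper, and the sample path are invented for illustration. The sketch has the client ask the `NameNode` only for block locations and then fetch the bytes from the named `DataNodes`, which is how HDFS reads are served in practice.

```python
class DataNode:
    """Holds the actual block bytes, keyed by block ID."""

    def __init__(self):
        self.blocks: dict[int, bytes] = {}


class NameNode:
    """Holds only metadata: file path -> ordered (block_id, DataNode name) pairs."""

    def __init__(self):
        self.block_map: dict[str, list[tuple[int, str]]] = {}

    def locate(self, path: str) -> list[tuple[int, str]]:
        return self.block_map[path]


def read_file(path: str, namenode: NameNode, datanodes: dict[str, DataNode]) -> bytes:
    """Client-side read: look up block locations, then fetch each block in order."""
    data = b""
    for block_id, node_name in namenode.locate(path):
        data += datanodes[node_name].blocks[block_id]
    return data


datanodes = {"dn1": DataNode(), "dn2": DataNode()}
datanodes["dn1"].blocks[1] = b"hello "
datanodes["dn2"].blocks[2] = b"world"

namenode = NameNode()
namenode.block_map["/greeting.txt"] = [(1, "dn1"), (2, "dn2")]

content = read_file("/greeting.txt", namenode, datanodes)
```

Keeping file bytes out of the `NameNode` is the design choice that lets a single metadata server coordinate many storage nodes.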