Describing Hadoop YARN
- YARN is a component of the Hadoop ecosystem
- YARN is used for:
  - Managing computing resources in a cluster
  - Monitoring computing resources in a cluster
  - Scheduling jobs involving processing
- It manages and monitors resources via NodeManagers
- A job refers to a requested transformation
- An example of a job is a MapReduce job
- An application consists of one or many jobs
Describing the YARN Architecture
- YARN consists of:
  - Many different nodes in a cluster
  - Separate daemons living on those nodes
- A node represents a single computer or server
- A cluster represents a collection of nodes
- These nodes are all interconnected with each other
- The YARN daemons are:
  - ResourceManager
  - NodeManagers
  - ApplicationMasters
  - Containers
- Typically, containers host MapReduce jobs
- These jobs involve transforming blocks on DataNodes
- Each NodeManager oversees the containers on its node
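The relationship between a cluster, its nodes, and the containers on each node can be sketched as a small data model. This is only an illustration of the structure described above, not YARN's actual classes: the names `Container`, `Node`, `Cluster`, and `containers_on` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A slice of one node's resources that hosts part of a job (hypothetical)."""
    container_id: str
    memory_mb: int
    vcores: int


@dataclass
class Node:
    """A single computer or server; its NodeManager oversees these containers."""
    hostname: str
    containers: list = field(default_factory=list)


@dataclass
class Cluster:
    """A collection of interconnected nodes."""
    nodes: list = field(default_factory=list)

    def containers_on(self, hostname: str) -> list:
        """Return the ids of the containers hosted on the named node."""
        for node in self.nodes:
            if node.hostname == hostname:
                return [c.container_id for c in node.containers]
        return []  # unknown node: nothing to report


cluster = Cluster(nodes=[
    Node("node-1", [Container("c1", 1024, 1), Container("c2", 2048, 2)]),
    Node("node-2", [Container("c3", 1024, 1)]),
])
print(cluster.containers_on("node-1"))  # ['c1', 'c2']
```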
How YARN Handles Resource Management
- Resource management in YARN is mostly handled by:
  - A ResourceManager
  - Some NodeManagers
- A ResourceManager is used for:
  - Initializing an ApplicationMaster
  - Initializing containers
  - Allocating requested resources to an ApplicationMaster
  - Recording information about:
    - Available resources
    - Resources allocated to applications in the cluster
- A NodeManager is used for:
  - Monitoring containers on its node
  - Restoring failed containers on its node
  - Reporting usage of resources to the ResourceManager:
    - CPU resources
    - Memory resources
    - Disk resources
    - Network resources
  - Initializing containers on its node
- Typically, there is a single ResourceManager in a cluster
- Typically, there is a single NodeManager per node
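The bookkeeping described above (NodeManagers report on their nodes, and the ResourceManager records what is available and what is allocated) might look roughly like the sketch below. It is a simplified model, not the real YARN API: `NodeReport`, `receive_report`, `available_mb`, and `allocate` are hypothetical names, only memory is tracked, and availability is assumed to be capacity minus granted allocations.

```python
from dataclasses import dataclass, field


@dataclass
class NodeReport:
    """What a NodeManager reports about its node (hypothetical shape)."""
    hostname: str
    capacity_mb: int  # total memory the node offers to the cluster
    used_mb: int      # memory its containers currently use


@dataclass
class ResourceManager:
    """Records available resources and resources allocated to applications."""
    latest_reports: dict = field(default_factory=dict)  # hostname -> NodeReport
    allocated_mb: dict = field(default_factory=dict)    # app id -> granted MB

    def receive_report(self, report: NodeReport) -> None:
        # Only the most recent report from each node is kept.
        self.latest_reports[report.hostname] = report

    def available_mb(self) -> int:
        # Assumption: availability is total capacity minus granted allocations.
        capacity = sum(r.capacity_mb for r in self.latest_reports.values())
        return capacity - sum(self.allocated_mb.values())

    def allocate(self, app_id: str, memory_mb: int) -> bool:
        """Grant a request only if enough unallocated memory remains."""
        if memory_mb > self.available_mb():
            return False
        self.allocated_mb[app_id] = self.allocated_mb.get(app_id, 0) + memory_mb
        return True


rm = ResourceManager()
rm.receive_report(NodeReport("node-1", capacity_mb=4096, used_mb=0))
rm.receive_report(NodeReport("node-2", capacity_mb=4096, used_mb=0))
first = rm.allocate("app-1", 6144)   # fits within the 8192 MB reported
second = rm.allocate("app-2", 4096)  # exceeds the 2048 MB that remain
print(first, second, rm.available_mb())  # True False 2048
```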
How YARN Handles Job Scheduling
- Job scheduling in YARN is mostly handled by:
  - Some ApplicationMasters
  - Some containers
- An ApplicationMaster is used for:
  - Requesting more or fewer resources from the ResourceManager
  - Allocating these resources to its containers
  - Monitoring its application
- Containers are used for:
  - Running an assigned application
  - Reporting the application status to the ApplicationMaster
- Typically, there is a single ApplicationMaster per application
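As a rough illustration of the division of labour above, the sketch below models one ApplicationMaster that decides how many containers to request or release, launches them, and collects the status each container reports back. All names (`ApplicationMaster`, `resource_delta`, `launch`, `report`, `is_complete`) and the status strings are hypothetical and do not mirror the real YARN client libraries.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationMaster:
    """One per application: sizes its requests and monitors its containers."""
    app_id: str
    container_mb: int                             # memory per container
    statuses: dict = field(default_factory=dict)  # container id -> status

    def resource_delta(self, pending_tasks: int) -> int:
        """Containers to request (positive) or release (negative)."""
        return pending_tasks - len(self.statuses)

    def launch(self, count: int) -> None:
        # Give the granted resources to newly started containers.
        start = len(self.statuses)
        for i in range(start, start + count):
            self.statuses[f"{self.app_id}-c{i}"] = "RUNNING"

    def report(self, container_id: str, status: str) -> None:
        # Containers report their status back to the ApplicationMaster.
        self.statuses[container_id] = status

    def is_complete(self) -> bool:
        return bool(self.statuses) and all(
            s == "DONE" for s in self.statuses.values()
        )


am = ApplicationMaster("app-1", container_mb=1024)
delta = am.resource_delta(pending_tasks=3)  # no containers yet, so request 3
requested_mb = delta * am.container_mb      # memory to ask the ResourceManager for
am.launch(delta)
for container_id in list(am.statuses):
    am.report(container_id, "DONE")
print(delta, requested_mb, am.is_complete())  # 3 3072 True
```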
Illustrating the YARN Workflow
Defining the YARN Workflow
- Client submits an application
- The ResourceManager initializes a container
- The ResourceManager initializes an ApplicationMaster
  - There is an ApplicationMaster for each application
- An ApplicationMaster requests resources from the ResourceManager
  - It uses these resources for itself and its containers
- The ApplicationMaster receives resources
  - It uses these resources for itself and its containers
- The ApplicationMaster (AM) notifies the NodeManager (NM) to launch containers
  - These containers run the application (MapReduce jobs)
  - Containers running map tasks are run on the same node as the relevant blocks
  - Containers running reduce tasks sometimes run on different nodes
  - Containers running reduce tasks start after map tasks
- The applications request metadata from the NameNode
  - Only metadata of relevant blocks in DataNodes is returned
  - These applications are executed in the containers
- The applications receive metadata from the NameNode
  - Only metadata of relevant blocks in DataNodes is received
  - These applications are executed in the containers
- Each daemon monitors resources
  - The ResourceManager monitors the cluster's status
  - The ApplicationMaster monitors its application's status
  - The NodeManager monitors its node's status
- The application is complete
- The ApplicationMaster unregisters itself from the ResourceManager
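Two steps of this workflow, fetching metadata only for the relevant blocks and running map tasks on the nodes that hold those blocks, can be sketched as follows. This is a toy model under stated assumptions: the `BLOCK_LOCATIONS` table, the node names, and the functions `relevant_metadata` and `place_map_tasks` are hypothetical, and the fallback to a non-local node is a design choice of this sketch, not something the notes above specify.

```python
# Hypothetical NameNode metadata: block id -> DataNodes holding a replica.
BLOCK_LOCATIONS = {
    "blk_1": ["node-1", "node-2"],
    "blk_2": ["node-2", "node-3"],
    "blk_3": ["node-3", "node-1"],
}


def relevant_metadata(block_ids):
    """Return locations only for the blocks the application reads."""
    return {block: BLOCK_LOCATIONS[block] for block in block_ids}


def place_map_tasks(metadata, free_slots):
    """Assign each block's map task to a node that stores the block.

    free_slots maps node name -> containers that node can still launch.
    Falls back to any node with a free slot when no local node has room.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    placement = {}
    for block, holders in metadata.items():
        local = [node for node in holders if slots.get(node, 0) > 0]
        candidates = local or [node for node, free in slots.items() if free > 0]
        if not candidates:
            raise RuntimeError(f"no free container slot for {block}")
        chosen = candidates[0]
        slots[chosen] -= 1
        placement[block] = chosen
    return placement


metadata = relevant_metadata(["blk_1", "blk_2"])
placement = place_map_tasks(metadata, {"node-1": 1, "node-2": 1, "node-3": 1})
print(placement)  # {'blk_1': 'node-1', 'blk_2': 'node-2'}
```

Each map task here lands on a node that already stores its block, so no block data has to move before the task starts.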