When you say digging, the first thought for most would be planting a tree. How about digging in DATA?
1st Hadoop is a framework for processing large amounts of data, consisting of 2 modules:
- HDFS: Hadoop Distributed File System "for storing and managing files".
- Map-Reduce: Hadoop's methodology for processing data, where a big chunk of data is divided into smaller chunks; each chunk is directed to the map function, which extracts the needed data from it, then to the reduce function, where the actual processing we need takes place.
Hadoop works on the whole dataset at once, so it is considered batch processing.
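To make that concrete, here is a minimal sketch of the classic word-count job written against Hadoop's standard Java Map-Reduce API (org.apache.hadoop.mapreduce); the input and output HDFS paths come from the command line and are otherwise hypothetical:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: extract the data we need -- each word becomes a (word, 1) pair.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        word.set(token);
        context.write(word, ONE); // emit (word, 1)
      }
    }
  }

  // Reduce: the actual processing -- sum the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Notice that the map function only extracts (word, 1) pairs; all the counting happens in reduce, and Hadoop handles splitting the input and grouping the pairs by word in between.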
2nd Hadoop eco-system
It would be annoying if, every time you wished to do a task, you had to write Java code for the map function and the reduce function, compile the code, etc. The Hadoop eco-system provides us with tools that do this for us:
- PIG: a scripting language "that is translated in the background to a Map-Reduce job".
- Hive: a SQL-like query language "also translated to a Map-Reduce job".
- Impala: a SQL-like query language "that runs on its own engine, without being translated to a Map-Reduce job".
- SQOOP: for transferring bulk data between Apache Hadoop and structured datastores "RDBMS".
- HUE "Hadoop User Experience": a web interface with editors and browsers for SQL, Hive, etc.
- OOZIE: a workflow scheduler for chaining Hadoop jobs together.
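As a taste of how much work these tools save, the whole word-count job above collapses into one SQL-like query in Hive. One way to run it from Java is over Hive's standard JDBC driver (org.apache.hive.jdbc.HiveDriver); this is just a sketch, and the host, port, credentials, and the words table are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveWordCount {
  public static void main(String[] args) throws Exception {
    // Hypothetical HiveServer2 endpoint and database.
    String url = "jdbc:hive2://localhost:10000/default";
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(url, "user", "");
         Statement stmt = conn.createStatement();
         // Hive translates this SQL-like query into a Map-Reduce job for us.
         ResultSet rs = stmt.executeQuery(
             "SELECT word, COUNT(*) AS cnt FROM words GROUP BY word")) {
      while (rs.next()) {
        System.out.println(rs.getString("word") + "\t" + rs.getLong("cnt"));
      }
    }
  }
}
```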
3rd YARN "Yet Another Resource Negotiator"
YARN was introduced in the Hadoop 2 release for better management of resources "containers: memory + processor". A container is the context where our map and reduce functions run.
- The Resource Manager acts as the CEO over all resources; it knows who is occupied and who is free across the whole cluster.
- A Node Manager acts as a CXO under the CEO; it knows who is occupied and who is free on its own node only.
- A Container is our worker, the context where the map and reduce functions reside.
i.e.:
A Hadoop cluster consists of many nodes "PCs"; on each node there is one Node Manager that controls the resources on that specific node, and all the Node Managers are managed by a single Resource Manager.
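As a rough sketch of how containers relate to a job, you can ask YARN for specific container sizes when configuring a Map-Reduce job. The property keys below are the standard mapreduce.* configuration names; the sizes are just illustrative values:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ContainerSizedJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Each map task runs inside a container with this much memory (MB)...
    conf.set("mapreduce.map.memory.mb", "1024");
    // ...and this many virtual cores "memory + processor".
    conf.set("mapreduce.map.cpu.vcores", "1");
    // Reduce tasks get their own container size.
    conf.set("mapreduce.reduce.memory.mb", "2048");
    conf.set("mapreduce.reduce.cpu.vcores", "1");

    Job job = Job.getInstance(conf, "job with explicit container sizes");
    // ... set the mapper/reducer classes and paths as in the word-count example ...
  }
}
```

The Resource Manager then finds Node Managers with enough free memory and cores, and each task is launched inside a container of the requested size.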