thecloud

Hadoop Starting

 

Hadoop employs a master/slave architecture for both distributed storage and distributed computation. The distributed storage system is called the Hadoop Distributed File System, or HDFS. The NameNode is the master of HDFS; it directs the slave DataNode daemons to perform the low-level I/O tasks. The NameNode is also the bookkeeper of HDFS: it keeps track of how your files are broken down into file blocks, which nodes store those blocks, and the overall health of the distributed filesystem.
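To make the bookkeeping role concrete, here is a minimal Python sketch of the two tables described above: file to blocks, and block to nodes. It is a toy model, not Hadoop code. The class name `ToyNameNode`, the block ID format, and the round-robin placement are all assumptions made for illustration; real HDFS placement is more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical in-memory model of NameNode bookkeeping (not Hadoop code)."""
    block_size: int
    file_blocks: dict = field(default_factory=dict)      # file path -> list of block ids
    block_locations: dict = field(default_factory=dict)  # block id -> list of DataNode names

    def add_file(self, path, size_bytes, datanodes, replication):
        # Ceiling division: number of blocks needed to hold the file.
        num_blocks = max(1, -(-size_bytes // self.block_size))
        block_ids = [f"{path}#blk_{i}" for i in range(num_blocks)]
        self.file_blocks[path] = block_ids
        copies = min(replication, len(datanodes))
        for i, block_id in enumerate(block_ids):
            # Round-robin placement, purely illustrative.
            self.block_locations[block_id] = [
                datanodes[(i + r) % len(datanodes)] for r in range(copies)
            ]
        return block_ids


if __name__ == "__main__":
    nn = ToyNameNode(block_size=64)
    for block in nn.add_file("/logs/a.txt", 150, ["dn1", "dn2", "dn3", "dn4"], replication=2):
        print(block, "->", nn.block_locations[block])
```

Note that the model holds only metadata. No file contents pass through it, which matches the NameNode's role as bookkeeper rather than data store.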

The function of the NameNode is memory and I/O intensive. As such, the server hosting the NameNode typically doesn’t store any user data or perform any computations for a MapReduce program, which keeps the workload on the machine low. This means that the NameNode server doesn’t double as a DataNode or a TaskTracker.

Unfortunately, there is a negative aspect to the importance of the NameNode: it’s a single point of failure for your Hadoop cluster. For any of the other daemons, if their host nodes fail for software or hardware reasons, the Hadoop cluster will likely continue to function smoothly, or you can quickly restart it. Not so for the NameNode.

Each slave machine in your cluster hosts a DataNode daemon to perform the grunt work of the distributed filesystem: reading and writing HDFS blocks to actual files on the local filesystem. When you want to read or write an HDFS file, the file is broken into blocks and the NameNode tells your client which DataNode each block resides on. Your client communicates directly with the DataNode daemons to process the local files corresponding to the blocks. Furthermore, a DataNode may communicate with other DataNodes to replicate its data blocks for redundancy.
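The read path described above can be sketched in a few lines of Python. This is a simplified illustration, not the HDFS client API. `ToyDataNode`, `read_file`, and the plain dictionaries standing in for the NameNode's tables are hypothetical names, and always picking the first replica is an assumption.

```python
class ToyDataNode:
    """Hypothetical stand-in for a DataNode: stores block bytes locally."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes held on this node

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


def read_file(path, file_blocks, block_locations, datanodes):
    """Client-side read: look up metadata first, then fetch bytes from DataNodes.

    file_blocks and block_locations play the role of the NameNode's tables;
    datanodes maps a node name to a ToyDataNode.
    """
    chunks = []
    for block_id in file_blocks[path]:
        # Assumption: use the first listed replica of each block.
        node_name = block_locations[block_id][0]
        chunks.append(datanodes[node_name].read_block(block_id))
    return b"".join(chunks)


if __name__ == "__main__":
    dn1, dn2 = ToyDataNode("dn1"), ToyDataNode("dn2")
    dn1.write_block("b0", b"hello ")
    dn2.write_block("b1", b"world")
    data = read_file(
        "/f",
        {"/f": ["b0", "b1"]},
        {"b0": ["dn1"], "b1": ["dn2"]},
        {"dn1": dn1, "dn2": dn2},
    )
    print(data)
```

The point of the sketch is the split of responsibilities: the metadata lookup and the byte transfer go to different parties, so the NameNode never sits in the data path.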

The Secondary NameNode (SNN) is an assistant daemon for monitoring the state of the cluster HDFS. Like the NameNode, each cluster has one SNN, and it typically resides on its own machine as well. No other DataNode or TaskTracker daemons run on the same server. The SNN differs from the NameNode in that this process doesn’t receive or record any real-time changes to HDFS. Instead, it communicates with the NameNode to take snapshots of the HDFS metadata at intervals defined by the cluster configuration. As mentioned earlier, the NameNode is a single point of failure for a Hadoop cluster, and the SNN snapshots help minimize the downtime and loss of data. Nevertheless, a NameNode failure requires human intervention to reconfigure the cluster to use the SNN as the primary NameNode.
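The interval-based snapshot behaviour can be illustrated with a small Python model. It is a sketch under stated assumptions and not how Hadoop implements checkpointing. `ToySecondaryNameNode`, `maybe_checkpoint`, and the one-hour interval in the usage example are hypothetical, and the metadata is reduced to a plain dictionary.

```python
import copy


class ToySecondaryNameNode:
    """Hypothetical model of periodic metadata snapshots (not Hadoop code)."""

    def __init__(self, checkpoint_interval_s):
        self.checkpoint_interval_s = checkpoint_interval_s
        self.last_checkpoint_time = None
        self.snapshot = None

    def maybe_checkpoint(self, namenode_metadata, now_s):
        """Copy the metadata if the interval has elapsed; return True if copied."""
        due = (
            self.last_checkpoint_time is None
            or now_s - self.last_checkpoint_time >= self.checkpoint_interval_s
        )
        if due:
            # Deep copy so later NameNode changes do not leak into the snapshot.
            self.snapshot = copy.deepcopy(namenode_metadata)
            self.last_checkpoint_time = now_s
        return due


if __name__ == "__main__":
    snn = ToySecondaryNameNode(checkpoint_interval_s=3600)
    metadata = {"/a": ["b0"]}
    snn.maybe_checkpoint(metadata, now_s=0)
    metadata["/b"] = ["b1"]                      # change made after the snapshot
    snn.maybe_checkpoint(metadata, now_s=1800)   # too early, nothing is copied
    print(snn.snapshot)
```

Changes made between two snapshots are missing from the SNN's copy, which is why the snapshots reduce data loss after a NameNode failure but do not eliminate it.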

The JobTracker daemon is the liaison between your application and Hadoop. Once you submit your code to your cluster, the JobTracker determines the execution plan by determining which files to process, assigns nodes to different tasks, and monitors all tasks as they’re running. Should a task fail, the JobTracker will automatically relaunch the task, possibly on a different node, up to a predefined limit of retries. There is only one JobTracker daemon per Hadoop cluster. It’s typically run on a server as a master node of the cluster.
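The relaunch-with-a-limit behaviour can be sketched as a plain retry loop in Python. This is only an illustration of the idea, not JobTracker code. The function `run_with_retries`, the node names, and the policy of cycling through nodes in order are assumptions.

```python
def run_with_retries(task, nodes, max_attempts):
    """Run `task` on successive nodes until it succeeds or attempts run out.

    task is any callable taking a node name; it signals failure by raising.
    Returns (node, result) for the first successful attempt.
    """
    errors = []
    for attempt in range(max_attempts):
        # Assumption: move to the next node on each retry, wrapping around.
        node = nodes[attempt % len(nodes)]
        try:
            return node, task(node)
        except Exception as exc:
            errors.append((node, repr(exc)))
    raise RuntimeError(f"task failed after {max_attempts} attempts: {errors}")


if __name__ == "__main__":
    def flaky_task(node):
        # Hypothetical task that always fails on one particular node.
        if node == "slave1":
            raise IOError("disk error")
        return f"done on {node}"

    print(run_with_retries(flaky_task, ["slave1", "slave2"], max_attempts=4))
```

Once the attempt limit is reached the loop gives up and raises, mirroring the predefined retry limit mentioned above.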

Each TaskTracker is responsible for executing the individual tasks that the JobTracker assigns. Although there is a single TaskTracker per slave node, each TaskTracker can spawn multiple JVMs to handle many map or reduce tasks in parallel. One responsibility of the TaskTracker is to constantly communicate with the JobTracker. If the JobTracker fails to receive a heartbeat from a TaskTracker within a specified amount of time, it will assume the TaskTracker has crashed and will resubmit the corresponding tasks to other nodes in the cluster.
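A heartbeat timeout of this kind can be modelled in Python as follows. This is a minimal sketch, not Hadoop's implementation. `ToyHeartbeatMonitor`, `reassign_tasks`, the tracker names, and the 600-second timeout in the usage example are hypothetical.

```python
class ToyHeartbeatMonitor:
    """Hypothetical tracker of the last heartbeat time per TaskTracker."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # tracker name -> time of last heartbeat (seconds)

    def record_heartbeat(self, tracker, now_s):
        self.last_seen[tracker] = now_s

    def expired_trackers(self, now_s):
        """Return trackers whose last heartbeat is older than the timeout."""
        return sorted(
            t for t, seen in self.last_seen.items() if now_s - seen > self.timeout_s
        )


def reassign_tasks(assignments, dead_trackers, live_trackers):
    """Move tasks from dead trackers to live ones, round-robin (illustrative)."""
    reassigned = {}
    next_live = 0
    for task, tracker in assignments.items():
        if tracker in dead_trackers:
            tracker = live_trackers[next_live % len(live_trackers)]
            next_live += 1
        reassigned[task] = tracker
    return reassigned


if __name__ == "__main__":
    monitor = ToyHeartbeatMonitor(timeout_s=600)
    monitor.record_heartbeat("tt1", now_s=0)
    monitor.record_heartbeat("tt2", now_s=0)
    monitor.record_heartbeat("tt2", now_s=500)   # tt1 has gone silent
    dead = monitor.expired_trackers(now_s=700)
    print(dead, reassign_tasks({"map_0": "tt1", "map_1": "tt2"}, dead, ["tt2"]))
```

The monitor never contacts the trackers itself. It only notices the absence of heartbeats, which is the same passive failure detection the paragraph describes.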


For small clusters, the SNN can reside on one of the slave nodes. For large clusters, on the other hand, separate the NameNode and JobTracker onto two machines. The slave machines each host a DataNode and a TaskTracker, so that tasks run on the same node where their data is stored.
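As a rough illustration of these two layouts, the following Python sketch assigns daemons to hosts. It rests on several assumptions: the function name `plan_layout` and the host names are hypothetical, the small-cluster case places the NameNode and JobTracker together on the first host, and the large-cluster case gives the SNN its own machine, as described earlier. The text does not say where the line between "small" and "large" falls, so the caller decides.

```python
def plan_layout(hosts, large_cluster):
    """Return a mapping of host name -> list of daemons, per the text's guidance."""
    if large_cluster:
        if len(hosts) < 4:
            raise ValueError("large layout needs 3 master hosts plus at least 1 slave")
        # NameNode and JobTracker on separate machines; SNN on its own machine.
        namenode, jobtracker, snn = hosts[0], hosts[1], hosts[2]
        slaves = hosts[3:]
    else:
        if len(hosts) < 2:
            raise ValueError("small layout needs 1 master host plus at least 1 slave")
        # Assumption: one master host runs both NameNode and JobTracker.
        namenode = jobtracker = hosts[0]
        slaves = hosts[1:]
        snn = slaves[0]  # the SNN shares a slave node in a small cluster

    layout = {host: [] for host in hosts}
    layout[namenode].append("NameNode")
    layout[jobtracker].append("JobTracker")
    layout[snn].append("SecondaryNameNode")
    for slave in slaves:
        # Co-locating the two daemons lets tasks run where the data is stored.
        layout[slave] += ["DataNode", "TaskTracker"]
    return layout


if __name__ == "__main__":
    for host, daemons in plan_layout(["master", "slave1", "slave2"], large_cluster=False).items():
        print(host, daemons)
```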

