Apache Hadoop HA Configuration
Date: 2013-04-29 18:50:31
Disclaimer: Cloudera no longer approves of the recommendations in this post. Please see the current documentation for configuration recommendations.

One of the things we get a lot of questions about is how to make Hadoop highly available. There is still a lot of work to be done on this front, but we wanted to take a moment and share the best practices from one of our customers. Check out what Paul George has to say about how they keep their NameNode up at ContextWeb. – Christophe

Here at ContextWeb, our Apache Hadoop infrastructure has become a critical part of our day-to-day business operations. As such, it was important for us to resolve the single point of failure that surrounds the master node processes, namely the NameNode and JobTracker. While it was easy for us to follow the best practice of offloading the secondary NameNode data to an NFS mount to protect metadata, ensuring that the processes were constantly available for job execution and data retrieval was of greater importance.

We've leveraged existing, well-tested components that are commonly used on Linux systems today. Our solution primarily makes use of DRBD from LINBIT and Heartbeat from the Linux-HA project. The natural combination of these two projects gives us a reliable, highly available setup that addresses the single point of failure described above. While one could conceivably expand the use of these two projects to much deeper levels of protection, the goal of this post is to provide a basic working configuration as a starting point for further experimentation and tuning. Details may vary depending on your distribution and on your organization's SLA and HA requirements. These instructions are most relevant to CentOS 5.3 combined with Cloudera's Distribution for Hadoop, since that's what we run in our production environment.

Hadoop Environment

Each of our master nodes has the following hardware specification: Dell PowerEdge 1950, 2x quad-core Intel Xeon E5405 CPUs @ 2.0GHz, 16GB RAM, 2x 300GB 15k RPM SAS disks in RAID 1. 100GB of the available drive space is reserved for the DRBD volume, which will contain Hadoop's data. We use RAID and the servers' redundant hardware capabilities wherever possible to provide additional protection for the master node processes. Each master node is connected to a different switch on our network using multiple network connections. The following diagram represents our production cluster configuration:

HA Setup and Configuration

Our planned hosts:
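A minimal sketch of the planned hosts in /etc/hosts form. The master node names match the shell prompts used throughout this post, and the virtual hostname matches the Hadoop configuration package installed in step 5; all IP addresses here are placeholders:

# physical master nodes (placeholder addresses)
10.0.0.11    master1.domain.com    master1
10.0.0.12    master2.domain.com    master2
# shared virtual hostname, bound to the VIP on the active node's bond0:0
10.0.0.10    hadoop.domain.com     hadoop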
Properties that will be defined as part of our hadoop-site.xml:
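A minimal sketch of the properties this setup hinges on: both master services are addressed by the shared virtual hostname rather than by a physical node, so clients follow the VIP across a failover. The port numbers and directory paths below are assumptions:

<!-- hadoop-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop.domain.com:8020</value>   <!-- NameNode behind the VIP -->
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>hadoop.domain.com:8021</value>          <!-- JobTracker behind the VIP -->
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/hadoop/dfs/name</value>                <!-- NameNode metadata on the DRBD mount -->
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>/mnt/hadoop/dfs/namesecondary</value>   <!-- secondary NameNode data on the NFS mount -->
</property>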
With this environment in mind, configuring the HA setup comes down to six parts:

1. Install Sun JDK 6
2. Configure networking
3. Install DRBD and Heartbeat packages
4. Configure DRBD
5. Install Hadoop RPMs from Cloudera
6. Configure Heartbeat

** Except where noted, all procedures should be followed on both nodes. **

1. Install Sun JDK

The only version of the Java JDK that should be used with Hadoop is Sun's own; this has been well documented. Grab a copy of the latest JDK from http://java.sun.com/javase/downloads/index.jsp and install it on both nodes. At the time of this writing, the latest version is JDK 6 update 14. We download the rpm.bin file and complete the installation:

[root@master1 ~]# chmod +x jdk-6u14-linux-x64-rpm.bin
[root@master1 ~]# ./jdk-6u14-linux-x64-rpm.bin

2. Configure Networking

Each of our servers has two embedded gigabit Ethernet ports, and we choose to bond them for HA and bandwidth purposes. We use LACP/802.3ad, which also requires changes to the switch configuration to support this mode. If you don't have LACP-enabled switches, or cannot modify their configurations, other bonding modes are available through the driver. You can read more about Linux network bonding in /usr/share/doc/kernel-doc-2.6.18/Documentation/networking/bonding.txt on your system (this requires the kernel-doc package). On our systems, four files are edited: /etc/modprobe.conf, /etc/sysconfig/network-scripts/ifcfg-eth0, ifcfg-eth1, and ifcfg-bond0; a sketch of their contents follows at the end of this step. Once they are in place, restart networking and add the NFS mount used to protect the NameNode metadata:

[root@master1 ~]# service network restart
[root@master1 ~]# echo "filer:/hadoop /mnt/hadoop nfs rsize=65536,wsize=65536,intr,soft,bg 0 0" >> /etc/fstab
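A minimal sketch of what these four files might look like for an 802.3ad bond on CentOS 5. The bonding options and ifcfg keys are standard, but the IP address and netmask are placeholders — substitute each node's own values:

#/etc/modprobe.conf (bonding-related lines only)
alias bond0 bonding
options bond0 mode=802.3ad miimon=100

#/etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes

#/etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes

#/etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.0.0.11        # placeholder; master2 would use its own address
NETMASK=255.255.255.0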
3. Install DRBD and Heartbeat Packages

DRBD (including its kernel module) and Heartbeat are part of the CentOS "extras" repository:

[root@master1 ~]# yum -y install drbd82 kmod-drbd82 heartbeat

4. Configure DRBD

Important Note: Before continuing with the DRBD configuration, I highly recommend reading through the documentation and reviewing examples to get a clear understanding of the architecture and intended goals: http://www.drbd.org/docs/about/.

In our kickstart configuration, we reserved 100GB of space in the RAID set; this will be used for our Hadoop data. The /etc/drbd.conf file is created on both nodes. It opens with a global section; the r0 resource definition is sketched at the end of this step:

global { usage-count no; }

The DRBD device is added to /etc/fstab so it can be mounted by name, but with "noauto" so that Heartbeat, not the boot process, controls when it is mounted:

[root@master1 ~]# echo "/dev/drbd0 /hadoop ext3 defaults,noauto 0 0" >> /etc/fstab

When creating the device metadata, DRBD will refuse to proceed if it finds existing data on the partition:

[root@master1 ~]# drbdadm create-md r0
Command 'drbdmeta /dev/drbd0 v08 /dev/sda4 internal create-md' terminated with exit code 40

Zeroing out the partition clears the old contents (this destroys anything on /dev/sda4); afterwards, re-run the create-md command and it will succeed:

[root@master1 ~]# dd if=/dev/zero of=/dev/sda4

Finally, on the node that should start out as primary ONLY, promote the resource, overwriting the peer's data, and watch the initial synchronization progress:

[root@master1 ~]# drbdadm -- --overwrite-data-of-peer primary r0
[root@master1 ~]# cat /proc/drbd
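A minimal sketch of the r0 resource section that would accompany the global line above. The device, backing partition, and mount point match those used elsewhere in this post; the node addresses and port are placeholders, and protocol C (fully synchronous replication) is the usual choice for a failover pair like this:

resource r0 {
  protocol C;                      # synchronous: a write completes on both nodes
  on master1.domain.com {
    device    /dev/drbd0;
    disk      /dev/sda4;           # the 100GB partition reserved in kickstart
    address   10.0.0.11:7788;      # placeholder address and port
    meta-disk internal;
  }
  on master2.domain.com {
    device    /dev/drbd0;
    disk      /dev/sda4;
    address   10.0.0.12:7788;      # placeholder address and port
    meta-disk internal;
  }
}

Once the initial sync completes, the primary node would typically create the filesystem (mkfs.ext3 /dev/drbd0) and mount it at /hadoop, which is what the fstab entry above and the chown in step 5 assume.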
5. Install Hadoop RPMs from Cloudera

We used the web-based configurator provided by Cloudera (https://my.cloudera.com/), which builds an RPM containing repos for your custom configuration and the rest of their distribution. The resulting RPM is then installed on BOTH master nodes.

Important Note: When we defined the hostname in the web-based configurator, we gave the shared virtual hostname that is bound to the VIP, as described earlier.

The Hadoop services are disabled in chkconfig because Heartbeat, not init, will start them on the active node:

[root@master1 ~]# rpm -ivh cloudera-repo-0.1.0-1.noarch.rpm
[root@master1 ~]# yum -y install hadoop hadoop-conf-pseudo hadoop-jobtracker \
[root@master1 ~]# yum -y install hadoop-conf-prod-hadoop.domain.com
[root@master1 ~]# chkconfig hadoop-namenode off
[root@master1 ~]# chown -R hadoop:hadoop /hadoop
[root@master1 ~]# umount /hadoop

6. Configure Heartbeat

There are many options available for the Heartbeat configuration; here we show only the basics. While this example should work in most cases, you may wish to extend the configuration to take advantage of other features the package provides. There are three key files to edit (sketches of all three appear at the end of this step):

· /etc/ha.d/ha.cf
· /etc/ha.d/haresources
· /etc/ha.d/authkeys

The first, ha.cf, defines the general settings of the cluster. The second, haresources, defines all cluster resources that fail over from one node to the other: the shared IP address of the cluster, the DRBD resource "r0" (from /etc/drbd.conf), the filesystem mount, and the three Hadoop master node init scripts, which are invoked with the "start" parameter upon failover. This file must be identical on both nodes. Note that the leading hostname defines the preferred node, and the IPaddr entry is the virtual IP chosen in our planning phase; it is brought up on the active node on the aliased interface bond0:0. The third, authkeys, holds the shared authentication key and must be readable only by root. Once the files are in place, lock down authkeys and register Heartbeat to start at boot:

[root@master1 ~]# chmod 0600 /etc/ha.d/authkeys
[root@master1 ~]# chkconfig --add heartbeat
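Minimal sketches of the three files, assuming the placeholder hostnames and virtual IP used earlier. The Hadoop service names in haresources (hadoop-namenode, hadoop-secondarynamenode, hadoop-jobtracker) are assumptions based on the init scripts Cloudera's packages install:

## start of ha.cf
logfile /var/log/ha.log            # the first place to look when failover misbehaves
keepalive 2                        # seconds between heartbeats
deadtime 30                        # declare the peer dead after 30 seconds of silence
bcast bond0                        # send heartbeats over the bonded interface
auto_failback on                   # move resources back when the preferred node returns
node master1.domain.com
node master2.domain.com

## start of haresources
# preferred node, virtual IP on bond0:0, DRBD resource, mount, then the Hadoop services
master1.domain.com IPaddr::10.0.0.10/24/bond0:0 drbddisk::r0 \
    Filesystem::/dev/drbd0::/hadoop::ext3 \
    hadoop-namenode hadoop-secondarynamenode hadoop-jobtracker

## start of authkeys
auth 1
1 sha1 ReplaceWithYourOwnSecret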
Setup Complete

At this point, your master node processes should be brought up automatically by Heartbeat. If there are any problems, start by checking /var/log/ha.log for hints about what went wrong. If everything worked, feel free to test failover by rebooting the master, pulling its power cord, or using whatever your favorite method of simulating a system failure may be. Based on the configuration values in this example, we find that the system takes about 10 seconds to bring everything back online.