Friday, January 03, 2014

Apache NameNode HA + DRBD + LACP NIC bonding setup guide

http://www.osedu.info/cloud-computing/hadoop/2013-04-29/83.html

 

Apache Hadoop HA Configuration

Posted: 2013-04-29 18:50:31

 

Disclaimer: Cloudera no longer approves of the recommendations in this post. Please see this documentation for configuration recommendations.

One of the things we get a lot of questions about is how to make Hadoop highly available. There is still a lot of work to be done on this front, but we wanted to take a moment and share the best practices from one of our customers. Check out what Paul George has to say about how they keep their NameNode up at ContextWeb. – Christophe

Here at ContextWeb, our Apache Hadoop infrastructure has become a critical part of our day-to-day business operations. As such, it was important for us to find a way to resolve the single point of failure that surrounds the master node processes, namely the NameNode and JobTracker. While it was easy for us to follow the best practice of offloading the secondary NameNode data to an NFS mount to protect metadata, ensuring that the processes were constantly available for job execution and data retrieval was of greater importance. We've leveraged some existing, well-tested components that are commonly used in Linux systems today. Our solution primarily makes use of DRBD from LINBIT and Heartbeat from the Linux-HA project. The natural combination of these two projects provides us with a reliable and highly available solution that addresses the limitations that currently exist.

While one could conceivably expand the use of these two projects to much deeper levels of protection, the goal of this post is to provide a basic working configuration as a starting point for further experimentation and tuning. There may be variations with regard to what works with your distribution or which requirements your organization has for SLAs and HA standards. These instructions are most relevant to CentOS 5.3 combined with Cloudera's Distribution for Hadoop, since that's what we run in our production environment.

Hadoop Environment

Each of our master nodes has the following hardware specifications:

Dell PowerEdge 1950, 2x Quad Core Intel Xeon E5405 CPUs @ 2.0GHz, 16GB RAM, 2x 300GB 15k RPM SAS disks, RAID 1. 100GB of the available drive space is reserved for the DRBD volume, which will contain Hadoop's data.

We use RAID and redundant hardware capabilities of the servers wherever possible to provide additional security for the master node processes. Each master node is connected to a different switch on our network using multiple network connections. The following diagram represents our production cluster configuration:

HA Setup and Configuration

Our planned hosts:

Node      Hostname             IP Address
1         master1.domain.com   192.168.4.116
2         master2.domain.com   192.168.4.117
Virtual   hadoop.domain.com    192.168.4.115

Properties that will be defined as part of our hadoop-site.xml:

Property             Value                                 Notes
dfs.data.dir         /hadoop/hdfs/data                     On DRBD replicated volume
dfs.name.dir         /hadoop/hdfs/namenode                 On DRBD replicated volume
fs.checkpoint.dir    /mnt/hadoop/secondarynamenode         On NFS
fs.default.name      hdfs://hadoop.contextweb.prod:8020    Shared virtual name
mapred.job.tracker   hadoop.contextweb.prod:8021           Shared virtual name
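For reference, here is a minimal sketch of how these properties might appear in hadoop-site.xml. The names and values come from the table above; any other site-specific settings generated by Cloudera's configurator are omitted:

<?xml version="1.0"?>
<configuration>
  <!-- Shared virtual hostname for the HDFS and MapReduce masters -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop.contextweb.prod:8020</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop.contextweb.prod:8021</value>
  </property>
  <!-- Metadata and data directories on the DRBD-replicated volume -->
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/hdfs/data</value>
  </property>
  <!-- Secondary NameNode checkpoints on NFS -->
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/mnt/hadoop/secondarynamenode</value>
  </property>
</configuration>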

With this environment in mind, configuring the HA setup comes down to six parts:

1.    Install Sun JDK 6

2.    Configure networking

3.    Install DRBD and Heartbeat packages

4.    Configure DRBD

5.    Install Hadoop RPMs from Cloudera

6.    Configure Heartbeat

** Except where noted, all procedures should be followed on both nodes. **

1. Install Sun JDK

The only JDK that should be used with Hadoop is Sun's own; this has been well documented. Grab a copy of the latest JDK from http://java.sun.com/javase/downloads/index.jsp and install it on both of the nodes. At the time of this writing, the latest version is jdk-6u14-linux-amd64.

We download the rpm.bin file and complete the installation:

[root@master1 ~]# chmod +x jdk-6u14-linux-x64-rpm.bin
[root@master1 ~]# ./jdk-6u14-linux-x64-rpm.bin
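To confirm the installation before moving on, you can check the version reported by the newly installed JDK. The Sun RPM typically installs under /usr/java and creates a "default" symlink; adjust the path if your layout differs:

[root@master1 ~]# /usr/java/default/bin/java -version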

2. Configure Networking

Each of our servers has two embedded gigabit ethernet ports, and we choose to bond them for HA and bandwidth purposes. We use LACP/802.3ad, which also requires changes to the switch configuration to support this mode. If you don't have LACP-enabled switches, or cannot modify their configurations, there are other bonding options available through the driver.
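For example, if LACP is not an option, active-backup bonding (mode 1) provides NIC failover without any switch-side configuration. A sketch of the equivalent /etc/modprobe.conf options, assuming the same bnx2 NICs used below:

#/etc/modprobe.conf (active-backup alternative, no switch support required)
alias eth0 bnx2
alias eth1 bnx2
alias bond0 bonding
options bond0 mode=1 miimon=100

The ifcfg-* files that follow stay the same, apart from dropping lacp_rate from the bonding options.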

You can read more about Linux network bonding from /usr/share/doc/kernel-doc-2.6.18/Documentation/networking/bonding.txt on your system (requires installation of the package kernel-doc).

The following is an example from our systems.

Edit the file /etc/modprobe.conf:

#/etc/modprobe.conf
alias eth0 bnx2
alias eth1 bnx2
alias bond0 bonding
options bond0 mode=4 miimon=100 lacp_rate=1

Edit the file /etc/sysconfig/network-scripts/ifcfg-eth0:

#/etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=static
DHCPCLASS=
ONBOOT=yes

Edit the file /etc/sysconfig/network-scripts/ifcfg-eth1:

#/etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
MASTER=bond0
SLAVE=yes
BOOTPROTO=static
DHCPCLASS=
ONBOOT=yes

Edit the file /etc/sysconfig/network-scripts/ifcfg-bond0:

#/etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=static
IPADDR=192.168.4.116
NETMASK=255.255.255.0
GATEWAY=192.168.4.1
BONDING_MODULE_OPTS="mode=4 miimon=100 lacp_rate=1"


Finally, reboot the system or restart networking:

[root@master1 ~]# service network restart
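Once networking is back up, the bonding driver exposes its state under /proc; inspecting it is a quick way to confirm that both slaves joined the bond and that 802.3ad was negotiated with the switch:

[root@master1 ~]# cat /proc/net/bonding/bond0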

We also mount the NFS export at this time. It will be used as the base directory for the secondary namenode:

[root@master1 ~]# echo "filer:/hadoop /mnt/hadoop nfs rsize=65536,wsize=65536,intr,soft,bg 0 0" >> /etc/fstab
[root@master1 ~]# mkdir /mnt/hadoop
[root@master1 ~]# mount /mnt/hadoop
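A quick sanity check that the export is actually mounted:

[root@master1 ~]# df -h /mnt/hadoop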

3. Install DRBD and Heartbeat Packages

DRBD (including its kernel module) and Heartbeat are part of the "extras" repository:

[root@master1 ~]# yum -y install drbd82 kmod-drbd82 heartbeat
[root@master1 ~]# chkconfig --add heartbeat

Tip: Sometimes the installation of the Heartbeat package fails on the first try. Just try again; it may work for you the second time.

4. Configure DRBD

Important Note: Before continuing with the DRBD configuration, I highly recommend reading through the documentation and reviewing examples to get a clear understanding of the architecture and intended goals: http://www.drbd.org/docs/about/.

In our kickstart configuration, we reserved 100GB of space in the RAID set; this will be used for our Hadoop data.

The following /etc/drbd.conf file is created on both nodes:


#
# drbd.conf example
#

global { usage-count no; }

resource r0 {
  protocol C;
  syncer  { rate 100M; }
  startup { wfc-timeout 0; degr-wfc-timeout 120; }

  on master1.domain.com {
    device    /dev/drbd0;
    disk      /dev/sda4;
    address   192.168.4.116:7788;
    meta-disk internal;
  }

  on master2.domain.com {
    device    /dev/drbd0;
    disk      /dev/sda4;
    address   192.168.4.117:7788;
    meta-disk internal;
  }
}

The following tasks are performed on BOTH nodes to set up the DRBD resource and start the DRBD service. (You should be prepared to perform these tasks at the same time, so that the peers locate each other when you start the DRBD service.)

[root@master1 ~]# echo "/dev/drbd0 /hadoop ext3 defaults,noauto 0 0" >> /etc/fstab
[root@master1 ~]# mkdir /hadoop
[root@master1 ~]# yes | drbdadm create-md r0
[root@master1 ~]# service drbd start

Tip: If the destination partition or disk that you are using for your DRBD volume previously had a file system created on it, you may receive a warning like the following one:

Command 'drbdmeta /dev/drbd0 v08 /dev/sda4 internal create-md' terminated with exit code 40
drbdadm aborting

If this is the case, try zeroing the destination partition disk with dd:

[root@master1 ~]# dd if=/dev/zero of=/dev/sda4
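Zeroing the whole partition can take a long time on a 100GB volume. Clearing just the first few megabytes is usually enough to remove an old file system signature; a quicker sketch (the block count here is arbitrary):

[root@master1 ~]# dd if=/dev/zero of=/dev/sda4 bs=1M count=128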

After the process is started, the following commands should be run on the PRIMARY server ONLY:

[root@master1 ~]# drbdadm -- --overwrite-data-of-peer primary r0
[root@master1 ~]# mkfs.ext3 /dev/drbd0
[root@master1 ~]# mount /hadoop

It will take some time to synchronize based on the size of the disk and the "syncer" rate defined in your drbd.conf. You can check the status of this process from /proc/drbd:

[root@master1 ~]# cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-10-03 11:30:17
0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
ns:18440304 nr:0 dw:27072452 dr:18511901 al:11746 bm:12767 lo:14 pe:12 ua:246 ap:1 oos:84438604
[==>.................] sync'ed: 18.0% (82459/100465)M
finish: 0:14:31 speed: 96,904 (77,472) K/sec


[root@master1 ~]# cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-10-03 11:30:17
0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
ns:196492540 nr:0 dw:195994668 dr:3687117 al:1355324 bm:68 lo:0 pe:0 ua:0 ap:0 oos:0

5. Install Hadoop RPMs from Cloudera

We used the web-based configurator provided by Cloudera (https://my.cloudera.com/), which builds an RPM containing repos for your custom configuration and the rest of their distribution. The resulting RPM is then installed on BOTH master nodes.

Important Note: When we defined the hostname in the web-based configurator, we gave the name of the shared virtual hostname that is bound to the VIP as described earlier.

[root@master1 ~]# rpm -ivh cloudera-repo-0.1.0-1.noarch.rpm

Since we're only installing the master nodes, we skip the datanode and tasktracker RPMs on these hosts:

[root@master1 ~]# yum -y install hadoop hadoop-conf-pseudo hadoop-jobtracker \
> hadoop-libhdfs.x86_64 hadoop-namenode hadoop-native.x86_64 hadoop-secondarynamenode

After this, we install our custom-generated Hadoop config from an RPM named hadoop-conf- plus the cluster name and hostname that we defined in the config generator:

[root@master1 ~]# yum -y install hadoop-conf-prod-hadoop.domain.com
[root@master1 ~]# alternatives --display hadoop
hadoop - status is auto.
link currently points to /etc/hadoop/conf.prod-hadoop.domain.com
/etc/hadoop/conf.empty - priority 10
/etc/hadoop/conf.pseudo - priority 30
/etc/hadoop/conf.prod-hadoop.domain.com - priority 60
Current `best' version is /etc/hadoop/conf.prod-hadoop.domain.com.

We also disable automatic startup of the Hadoop processes on the node since they will be managed by the Heartbeat package:

[root@master1 ~]# chkconfig hadoop-namenode off
[root@master1 ~]# chkconfig hadoop-secondarynamenode off
[root@master1 ~]# chkconfig hadoop-jobtracker off
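To double-check that none of the Hadoop daemons will start on their own at boot:

[root@master1 ~]# chkconfig --list | grep hadoop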

On the primary server only, we must start the namenode for the first time and format HDFS:

[root@master1 ~]# chown -R hadoop:hadoop /hadoop
[root@master1 ~]# /sbin/runuser -s /bin/bash - hadoop -c 'hadoop namenode -format'

Finally, unmount the DRBD volume mounted at /hadoop, since it will be brought online through Heartbeat:

[root@master1 ~]# umount /hadoop

6. Configure Heartbeat

There are many options available for the Heartbeat configuration. Here, we attempt to show only the basics. While this example should work in most cases, you may wish to extend the configurations to take advantage of other features that the package provides.

There are three key files that we edit to configure the Heartbeat package:

- /etc/ha.d/ha.cf
- /etc/ha.d/haresources
- /etc/ha.d/authkeys

The first, ha.cf, defines the general settings of the cluster. Our example:

## start of ha.cf
debugfile /var/log/ha.debug
logfile /var/log/ha.log
logfacility local0
keepalive 1
initdead 60
deadtime 5
mcast eth0 239.0.0.22 694 1 0
node master1.domain.com master2.domain.com
auto_failback off
## end of ha.cf

Additional parameters and their meanings can be found at http://linux-ha.org/ha.cf/.

The second file we'll edit, haresources, defines all cluster resources that will fail over from one node to the next. The resources include the shared IP address of the cluster, the DRBD resource "r0" (from /etc/drbd.conf), the file system mount, and the three Hadoop master node init scripts that are invoked with the "start" parameter upon failover.

This file must be the same on both nodes in the cluster. Note that the leading host name defines the preferred node. The IPaddr stated in this file will be the virtual IP that we chose in our planning phase. This IP will be brought up on the active node on the aliased interface bond0:0:

## start of haresources
master1.domain.com IPaddr::192.168.4.115
master1.domain.com drbddisk::r0
master1.domain.com Filesystem::/dev/drbd0::/hadoop::ext3::defaults
master1.domain.com hadoop-namenode
master1.domain.com hadoop-secondarynamenode
master1.domain.com hadoop-jobtracker
## end of haresources

The last file to be edited, authkeys, should also be the same on both servers. Our example:

## start of authkeys
auth 1
1 sha1 hadoop-master
## end of authkeys

Change permissions on the authkeys file to keep the shared secret protected:

[root@master1 ~]# chmod 0600 /etc/ha.d/authkeys

Enable the Heartbeat service to start on boot, and start the service on each node:

[root@master1 ~]# chkconfig --add heartbeat
[root@master1 ~]# chkconfig heartbeat on
[root@master1 ~]# service heartbeat start

Setup Complete

At this point, your master node processes should be brought up automatically by Heartbeat. If there are any problems, start by checking /var/log/ha.log for hints on what went wrong. If everything worked, feel free to test your failover by rebooting the master, pulling its power cord, or using whatever your favorite method of simulating a system failure may be. Based on the configuration values from this example, we find that the system takes about 10 seconds to bring everything back online.
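A less destructive way to exercise a failover is to stop Heartbeat on the active node and watch the standby take over. cl_status, which ships with the Heartbeat package, reports which resources a node currently holds; the sequence below is only a suggestion, and the exact output will vary:

[root@master1 ~]# cl_status rscstatus
[root@master1 ~]# service heartbeat stop
[root@master2 ~]# tail -f /var/log/ha.log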

 
