Friday, 24 May 2013

Hadoop Cluster Installation on 4 nodes (Using Cloudera Manager Suite 4.5)

Structure of Hadoop cluster

Prerequisites
1.       Operating system requirements:
a.        Red Hat-compatible systems: Red Hat Enterprise Linux 5.7, 64-bit

2.       Supported Browsers for Cloudera Manager Admin console
The Cloudera Manager Admin console, which you use to configure, manage, and monitor CDH, supports the following browsers:
·         Firefox 11 or later
·         Google Chrome
·         Internet Explorer 8
·         Internet Explorer 9
3.       Other Requirements
Cloudera Manager supports a variety of services and depends on resources being available.
Version Support
·         Cloudera Manager 4.5 supports CDH3 Update 1 (cdh3u1) or later and CDH4.0 or later. CDH3 Update 2 or later is strongly recommended.
·         If you want to use Cloudera Manager to manage Oozie, CDH3 Update 2 or later is required.
·         Cloudera Manager uses Python. Python is part of the default installation for all operating systems that Cloudera Manager supports, so there is no need to complete any installation tasks to make Python available. Cloudera Manager is tested with the default installation. Modifying the Python installation available on systems on which you install Cloudera Manager is not supported.
·         Cloudera Manager 4.5 supports Impala 0.3 or later.

Resources
Cloudera Manager requires sufficient:
·         Disk space. For example, /var should be allocated a minimum of 5 GB of space.
·         RAM. 4GB is appropriate for most cases, and is required when using Oracle databases. 2GB may be sufficient for non-Oracle deployments involving fewer than 100 hosts.

Networking and Security
·         Cluster hosts must have a working network name resolution system. Properly configuring DNS and reverse DNS meets this requirement. If you use /etc/hosts instead of DNS, all hosts files must contain consistent information about host names and addresses across all nodes. For example, /etc/hosts might contain something of the form:

XXX.XX.XX.XX master
XXX.XX.XX.XX slave
XXX.XX.XX.XX slave2
XXX.XX.XX.XX slave3

·         No blocking by iptables or firewalls; make sure port 7180 is open because it is the port used to access Cloudera Manager after installation. Cloudera Manager communicates using specific ports, which must be open. For additional port information, see Configuring Ports for Cloudera Manager Free Edition.
·         No blocking by Security-Enhanced Linux (SELinux).
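To confirm that a node's hosts file has an entry for every cluster host, a check like the one below can help. It is a minimal sketch: check_hosts is a hypothetical helper (not part of Cloudera Manager), and it assumes the master/slave/slave2/slave3 names used in this post. Run it on every node, since all the files must agree. `getent hosts master` is another way to confirm that a name resolves through DNS or /etc/hosts.

```shell
#!/bin/sh
# check_hosts FILE NAME...: print each NAME that has no entry in hosts file FILE.
# Returns 0 if every name is present, 1 otherwise. Hypothetical helper.
check_hosts() {
  file=$1
  shift
  rc=0
  for name in "$@"; do
    # Skip comment lines, then look for NAME in any field after the address.
    if ! grep -v '^[[:space:]]*#' "$file" |
         awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                           END { exit !found }'; then
      echo "missing: $name"
      rc=1
    fi
  done
  return $rc
}
```

Usage on each node: `check_hosts /etc/hosts master slave slave2 slave3` prints nothing and returns 0 when all four names are present.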


Automated Installation of Cloudera Manager and CDH

Step 1: Download and Run the Cloudera Manager Installer

Cloudera Manager accesses archive.cloudera.com by using yum on Red Hat systems, zypper on SUSE systems, or apt-get on Debian/Ubuntu systems. If your hosts access the Internet through an HTTP Proxy, you can configure yum, zypper, or apt-get, system-wide, to access archive.cloudera.com through a proxy. To do so, modify the system configuration on the Cloudera Manager Server host and on every cluster host where you want to install CDH. This step is only needed if your hosts reach the Internet through a proxy.
·         To configure your system to use a proxy:
On Red Hat systems, add the following property to the [main] section of /etc/yum.conf:


proxy=http://proxy_server:port_no/

                                                                                                                    
·         To download and run the Cloudera Manager Installer:
1.          Download cloudera-manager-installer.bin from the Cloudera Downloads page to the host where you want to install the Cloudera Manager Server. The host must be on your cluster or accessible to your cluster over your network. Install the Cloudera Manager Server on a single host.

2.       After downloading cloudera-manager-installer.bin, make it executable:

    $ chmod u+x cloudera-manager-installer.bin

3.       Run cloudera-manager-installer.bin:

    $ sudo ./cloudera-manager-installer.bin


4.       Read the Cloudera Manager Readme and then press Enter to choose Next.
5.       Read the Cloudera Manager License and then press Enter to choose Next. Use the arrow keys and press Enter to choose Yes to confirm you accept the license.
6.       Read the Oracle Binary Code License Agreement and then press Enter to choose Next. Use the arrow keys and press Enter to choose Yes to confirm you accept the Oracle Binary Code License Agreement. The Cloudera Manager installer begins installing the Oracle JDK and the Cloudera Manager repo files and then installs the packages. The installer also installs the embedded PostgreSQL database and the Cloudera Manager Server.
7.      Note the complete URL provided for the Cloudera Manager Admin Console, including the port number, which is 7180 by default. Click OK to continue.
8.       Click OK to exit the installer.


Step 2: Start the Cloudera Manager Admin Console

The Cloudera Manager Admin console enables you to use Cloudera Manager to configure, manage, and monitor Hadoop on your cluster. Before using the Cloudera Manager Admin console, gather information about the server's URL and port.
The server URL takes the following form:

http://<Server host>:<port>

<Server host> is the fully qualified domain name or IP address of the host where the Cloudera Manager Server is installed.
<port> is the port configured for the Cloudera Manager Server.

The default port is 7180. For example, we used the following URL:

    http://master:7180/


Cloudera Manager does not support changing the admin username for the installed account. You can change the password using Cloudera Manager after you run the wizard in the next section.

To start the Cloudera Manager Admin console:

1.       In a web browser, enter the URL, including the port, for the Cloudera Server. The login screen for Cloudera Manager appears.
2.  Log into Cloudera Manager. The default credentials are:


Username: admin

Password: admin


Note: The server can take several minutes to start listening on the port, so if the login page does not load, wait a while and try again.
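Instead of refreshing the browser by hand, you can poll the Admin Console URL from a shell until the server answers. This is a sketch that assumes curl is installed; wait_for_url and its retry defaults are our own hypothetical choices, not Cloudera settings.

```shell
#!/bin/sh
# wait_for_url URL [TRIES] [DELAY]: poll URL until the server answers, or give up.
# Hypothetical helper; relies only on curl being installed.
wait_for_url() {
  url=$1
  tries=${2:-30}   # default: 30 attempts
  delay=${3:-10}   # default: 10 seconds between attempts
  n=0
  while [ "$n" -lt "$tries" ]; do
    # -s: silent, -o /dev/null: discard the page body. Without -f, curl exits 0
    # for any HTTP response, so any status code counts as "server is up" here.
    if curl -s -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  echo "gave up on $url after $tries attempts" >&2
  return 1
}
```

For example, `wait_for_url http://master:7180/ 30 10` waits up to about five minutes for the login page.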

Step 3: Use Cloudera Manager for Automated CDH Installation and Configuration

The following instructions show you how to use the Cloudera Manager wizard to do an initial installation and configuration. The wizard helps you to install and set up Cloudera parcels or packages across your cluster and will:
·         Install and validate your Cloudera Manager License
·         Find the cluster hosts you specify via hostname and IP-address ranges
·         Connect to each host with SSH to install the Cloudera Manager Agent and CDH
·         Install the Oracle JDK on the cluster hosts (if not already installed)
·         Configure Hadoop automatically and start the Hadoop services

To use Cloudera Manager:

1.       The first time you start the Cloudera Manager Admin console, the install wizard starts up.
2.       To install the Free Edition, click Just Install the Latest Free Edition.

3.       Click Continue.

4.       To enable Cloudera Manager to automatically discover the cluster hosts where you want to install CDH, enter the cluster hostnames or IP addresses (you can also specify hostname and IP address ranges). We entered our four hosts:

XXX.XX.XX.XX
XXX.XX.XX.XX
XXX.XX.XX.XX
XXX.XX.XX.XX

5.       Click Search.

Cloudera Manager identifies the hosts in your cluster to allow you to configure them for CDH. If there are a large number of hosts on your cluster, wait a few moments to allow them to be discovered and shown in the wizard. If the search is taking too long, you can stop the scan by clicking Abort Scan. To find additional hosts, add their host name or IP address and click Search again.


6.       Verify that the number of hosts shown matches the number of hosts where you want to install CDH. Deselect host entries that do not exist and deselect the hosts where you do not want to install CDH.

Click Continue.



7.       Select the repository type you want to use for the installation.


To install using Parcels, select Parcels and follow the parcel instructions in the Cloudera Manager documentation. To install using Packages, select Packages and follow the directions in Installation using Packages below.

We used Packages.
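Step 4 above also accepts hostname and IP address ranges; the Cloudera Manager documentation describes bracketed patterns such as 10.1.1.[1-4] and host[1-3].company.com. To preview which names a simple pattern covers before clicking Search, the hypothetical helper below expands one [start-end] range of plain decimal numbers (no leading zeros). It is only an approximation for checking your input, not the wizard's own parser.

```shell
#!/bin/sh
# expand_range PATTERN: expand a single "prefix[start-end]suffix" range.
# Hypothetical helper; patterns without a bracketed range are echoed unchanged.
expand_range() {
  pat=$1
  case $pat in
    *\[*-*\]*) ;;                  # has a [start-end] range: expand it below
    *) echo "$pat"; return 0 ;;    # plain hostname or IP address
  esac
  prefix=${pat%%\[*}               # text before the first "["
  rest=${pat#*\[}                  # text after the first "["
  range=${rest%%\]*}               # "start-end"
  suffix=${rest#*\]}               # text after the first "]"
  lo=${range%-*}
  hi=${range#*-}
  i=$lo
  while [ "$i" -le "$hi" ]; do
    echo "$prefix$i$suffix"
    i=$((i + 1))
  done
}
```

For example, `expand_range 'slave[2-3]'` prints slave2 and slave3 on separate lines.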


Installation using Packages

1.       Choose the CDH version to install.
2.       Select the major release of CDH to install. This is often CDH4.
3.       Select the specific release of CDH to install from within the major version you selected. You may choose a custom repository, if desired.
4.       If available, select the specific release of Impala to install on your hosts. You may choose either the latest version or use a custom repository. If you do not want to install Impala, select None.
5.       Select the specific release of Cloudera Manager to install on your hosts. You may choose either the version that matches with the Cloudera Manager Server you are currently using or you can specify an installation from a custom repository.
6.       If you opted to use custom repositories for installation files, and your hosts do not have internet access, you must provide a GPG key URL. It will apply for all repositories.
7.       Click Continue.


Provide credentials for authenticating with hosts

1.       Select root or enter the user name for an account that has password-less sudo permissions (we used hadoopoc).
2.       Select an authentication method.
a.       If you choose to use password authentication, enter and confirm the password.
b.      If you choose public-key authentication, provide a passphrase and the path to the required key files.
c.       You can choose to specify an alternate ssh port. The default value is 22.
d.      You can specify the maximum number of host installations to run at once. The default value is 10.
3.       Click Continue to begin installing the Cloudera Manager Agent and Daemons on the cluster hosts. If you are installing from packages, the process also installs CDH on your hosts.



Install Cloudera Manager and CDH components

The status of installation on each host is displayed in the following screen.


1.       The Cloudera Manager wizard uses SSH to access the cluster hosts and follows a sequence of steps to download and install the Oracle JDK, Cloudera Manager Agents and Daemons, and the CDH packages, if you are installing from packages rather than parcels.

2.       If installation fails on a host, you can click the Uninstall link next to the failed host. This gives you the choice of uninstalling the failed hosts or retrying installation on them. To uninstall, click Uninstall Failed Hosts. To retry installation on all failed hosts, click Retry Failed Hosts.

3.       To avoid excessive network load, the wizard runs a limited number of installations in parallel, based on the value you indicated on the page where you provided your authentication credentials. The default is 10 simultaneous installations.
a.       If you are installing from packages, the wizard configures package repositories, installs the Oracle JDK, CDH, and the Cloudera Manager Agent, and then starts the Cloudera Manager Agent. The status of installation on each host is displayed. You can also click the Details link for individual hosts to view detailed information about the installation and error messages if installation fails on any hosts.

4.       When the Continue button appears at the bottom of the screen, the installation process is complete.
5.       When you continue, the Host Inspector runs to validate the installation, and provides a summary of what it finds, including all the versions of the installed components.


Choose the services you want to start on your cluster


1.       Choose the combination of services to install: Core Hadoop, HBase Services, All Services, or Custom Services.

2.       Click Inspect Role Assignments to see how the wizard will assign roles for the services you have chosen, and change them if you need to. The wizard evaluates the hardware configurations of the cluster hosts to determine the best machines for each role. For example, the wizard assigns the NameNode role to the machine that best meets the NameNode requirements. The wizard also configures other options, such as the number of map and reduce slots for TaskTracker, on the basis of the size of the cluster and the physical characteristics of each machine, such as the number of CPUs, amount of RAM, and disk space. These assignments are typically acceptable, but you can reassign services to nodes of your choosing, if desired. Click Continue when you are satisfied with the assignments.

3.       Review Configuration Changes to be applied.

4.       Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed. For example, you might confirm the NameNode Data Directory and the DataNode Data Directory for HDFS or confirm the TaskTracker Local Data Directory List or JobTracker Local Data Directory for MapReduce.

5.       Click Continue. The wizard starts the services on your cluster.

6.       When all of the services are started, click Continue.

7.       Click Continue.


Step 4: Change the Default Administrator Password


As soon as possible after running the wizard and beginning to use Cloudera Manager, you should change the default administrator password.
To change the administrator password:

1.       On the main navigation bar, pull down the admin user menu and select the Change Password option.
2.       Enter a new password twice and then click Submit.

Step 5: Test the Installation

Now that you have finished with the CDH and Cloudera Manager installation, you are ready to test the installation. For testing instructions, see Testing the Installation.



Cloudera Manager View All Services



HDFS view on CMS




NameNode

 


Job Tracker



Beeswax View




Issues faced and resolution:

Ø  Cannot access the Internet from the command line or with yum.
o   For general access

Set the http_proxy shell variable.

Type the following command to set the proxy server:

              $ export http_proxy=http://proxy_server:port/

How do I set up the proxy variable for all users?

To set the proxy environment variable globally, open /etc/profile as root:

               $ sudo vi /etc/profile

Add the following line:

               export http_proxy=http://proxy_server:port/

o   For Yum update

To enable all yum operations to use a proxy server, specify the proxy server details in /etc/yum.conf. The proxy setting must specify the proxy server as a complete URL, including the TCP port number. If your proxy server requires a username and password, specify these by adding proxy_username and proxy_password settings.

Our case:
# The proxy server - proxy server:port number
proxy=http://proxy_server:port/
# The account details for yum connections (not required in our case)
#proxy_username=
#proxy_password=
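Both proxy fixes above are one-line file edits, and repeating them by hand on every node makes it easy to add duplicate lines. The helpers below sketch idempotent versions. add_profile_proxy and set_yum_proxy are hypothetical names, and they assume the file layouts shown above (an export line in /etc/profile, a [main] section header on a line by itself in /etc/yum.conf). Try them on copies of the files before running them as root against the real ones.

```shell
#!/bin/sh
# Hypothetical helpers for the two proxy fixes above. Each rewrites the file
# given as its first argument.

# add_profile_proxy FILE URL: append "export http_proxy=URL" unless already present.
add_profile_proxy() {
  line="export http_proxy=$2"
  # -q quiet, -x whole-line match, -F fixed string (the URL contains dots and slashes)
  grep -qxF "$line" "$1" 2>/dev/null || echo "$line" >> "$1"
}

# set_yum_proxy FILE URL: make [main] contain exactly one proxy=URL line.
set_yum_proxy() {
  conf=$1
  url=$2
  tmp=$(mktemp) || return 1
  awk -v url="$url" '
    /^\[/ { in_main = ($0 == "[main]") }   # track which section we are in
    in_main && /^proxy=/ { next }          # drop any old proxy= line in [main]
    { print }
    $0 == "[main]" { print "proxy=" url }  # re-add it right after the header
  ' "$conf" > "$tmp" && cat "$tmp" > "$conf"   # cat keeps the file permissions
  rm -f "$tmp"
}
```

For example, in a root shell: `add_profile_proxy /etc/profile http://proxy_server:port/` and `set_yum_proxy /etc/yum.conf http://proxy_server:port/`. The proxy_username and proxy_password lines are left untouched.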


Ø  Python not working: install all available updates
Update all components in RHEL with:
       $ sudo yum clean all
       $ sudo yum update


Ø  Firewall is blocking: turned off the firewall (iptables)
Task: Disable / Turn off Linux Firewall (Red Hat/CentOS/Fedora Core)

Type the following two commands (you must be logged in as the root user):
# /etc/init.d/iptables save
# /etc/init.d/iptables stop

Turn off the firewall on boot:
# chkconfig iptables off
Ø  Installation failed with "Failed to receive heartbeat from agent" (hostname issue): configured the hostname

On each node, we changed HOSTNAME in /etc/sysconfig/network, commenting out the old value:

#HOSTNAME=SENxxxxxxxxx028
HOSTNAME=master

#HOSTNAME=SENxxxxxxxxx238
HOSTNAME=slave

#HOSTNAME=SENxxxxxxxxx239
HOSTNAME=slave2

#HOSTNAME=SENxxxxxxxxx009
HOSTNAME=slave3
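The same edit can be scripted per node. This is a sketch: set_network_hostname is a hypothetical helper that comments out any existing HOSTNAME= line and appends the new one, matching the layout above. /etc/sysconfig/network is read at boot, so either reboot or, as root, run `hostname master` (with that node's own name) to apply the change to the running system. The new name should also match the node's /etc/hosts entry.

```shell
#!/bin/sh
# set_network_hostname FILE NAME: comment out old HOSTNAME= lines, append the new one.
# Hypothetical helper for /etc/sysconfig/network-style files.
set_network_hostname() {
  file=$1
  name=$2
  tmp=$(mktemp) || return 1
  awk -v name="$name" '
    /^HOSTNAME=/ { print "#" $0; next }   # keep the old value as a comment
    { print }
    END { print "HOSTNAME=" name }        # new value goes at the end
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, on the master node: `set_network_hostname /etc/sysconfig/network master`.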

 





Ø  User has no sudo access: make the user a password-less sudo user.
Update /etc/sudoers (using the visudo command) and add:
hadoop_user      ALL = (ALL) NOPASSWD: ALL
Then save the file.


Ø  Error in SELinux
To disable SELinux permanently, edit /etc/selinux/config and change

SELINUX=enforcing
to
SELINUX=disabled
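A scripted version of this edit is sketched below. disable_selinux_config is a hypothetical helper that rewrites only the SELINUX= line and leaves SELINUXTYPE= untouched. The file is read at boot, so the change takes effect after a reboot; until then, `setenforce 0` (as root) switches the running system to permissive mode, and `getenforce` shows the current mode.

```shell
#!/bin/sh
# disable_selinux_config FILE: set SELINUX=disabled in an /etc/selinux/config-style file.
# Hypothetical helper. The ^SELINUX= anchor does not match SELINUXTYPE=.
disable_selinux_config() {
  file=$1
  tmp=$(mktemp) || return 1
  sed 's/^SELINUX=.*/SELINUX=disabled/' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, as root on each node: `disable_selinux_config /etc/selinux/config`.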
 



                                                                                                                                                       

Ø  Component fails to install: download the package and install it manually with yum or rpm

For example:

$ sudo rpm -ivh cloudera-manager-daemons-4.1.4-1.cm414.p0.461.x86_64.rpm

Or

$ sudo rpm -ivh hadoop-2.0.0%2B556-1.cdh4.1.3.p0.23.el5.x86_64.rpm


Ø  TaskTracker not working: installed Java not found

Modifying CMF_AGENT_JAVA_HOME
In many cases, modifying the CMF_AGENT_JAVA_HOME environment variable is an effective way to make the configuration use a custom JAVA_HOME, because it lets all services on the host find the JDK. To modify the CMF_AGENT_JAVA_HOME environment variable:

1. Open /etc/default/cloudera-scm-agent.
2. Set the CMF_AGENT_JAVA_HOME environment variable to the Java home in your environment. For example, you might add the following line:
       export CMF_AGENT_JAVA_HOME=/usr/custom_java
3. Save and close the cloudera-scm-agent file.
4. Restart the Cloudera Manager Agent:
       $ sudo service cloudera-scm-agent restart
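Finding the right value for step 2 and editing the file can be sketched as follows. Both functions are hypothetical helpers: java_home_from_bin assumes the usual layout where the java binary sits in bin/ or jre/bin/ under the Java home, and set_agent_java_home drops every line that sets CMF_AGENT_JAVA_HOME (including commented-out ones) and appends a single export line.

```shell
#!/bin/sh
# java_home_from_bin PATH: derive a Java home from the resolved path of the java binary.
# Hypothetical helper; returns 1 if PATH does not end in bin/java.
java_home_from_bin() {
  case $1 in
    */jre/bin/java) echo "${1%/jre/bin/java}" ;;   # JDK: strip jre/bin/java
    */bin/java)     echo "${1%/bin/java}" ;;       # JRE or JDK: strip bin/java
    *)              return 1 ;;
  esac
}

# set_agent_java_home FILE DIR: leave exactly one "export CMF_AGENT_JAVA_HOME=DIR" line.
set_agent_java_home() {
  file=$1
  home=$2
  tmp=$(mktemp) || return 1
  { grep -v 'CMF_AGENT_JAVA_HOME=' "$file" 2>/dev/null
    echo "export CMF_AGENT_JAVA_HOME=$home"; } > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, as root: `set_agent_java_home /etc/default/cloudera-scm-agent "$(java_home_from_bin "$(readlink -f "$(command -v java)")")"`, then restart the agent as in step 4. `readlink -f` resolves the /usr/bin/java symlink to the real binary first.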

Ø  Not able to access HDFS with hadoop fs -ls: user access issue.

Create an HDFS home directory for your user, running the commands as the hdfs superuser:
sudo -u hdfs hadoop fs -mkdir /user/$USER
sudo -u hdfs hadoop fs -chown $USER /user/$USER

Ø  Hive error:
[saurav@localhost ~]$ hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/saurav/hive_job_log_saurav_201301301109_1273106907.txt
hive> show tables
    > ;
FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Failed to create database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
NestedThrowables:
java.sql.SQLException: Failed to create database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask



Solution

The error above suggests you don't have the MySQL JDBC driver installed. Unfortunately, because of MySQL's licensing, Cloudera Manager can't install it for you. See https://ccp.cloudera.com/display/CDH4DOC/Hive+Installation#HiveInstallation-ConfiguringaremoteMySQLdatabaseasHiveMetastore for instructions.



Ø  Hue/Beeswax first user: the first user you create should have the same name as your Linux user on the machine.
Example: the first user should be hadoop_user in our case.