Hadoop Cluster Installation on 4 Nodes (Using Cloudera Manager Suite 4.5)
Structure of Hadoop cluster
Prerequisites
1. Operating system requirement:
1. Red Hat-compatible systems
a. Red Hat Enterprise Linux 5.7, 64-bit
2. Supported Browsers for Cloudera Manager Admin console
The Cloudera Manager Admin console, which you use to configure, manage,
and monitor CDH, supports the following browsers:
· Firefox 11 or later
· Google Chrome
· Internet Explorer 8
· Internet Explorer 9
3. Other Requirements
Cloudera Manager supports a variety of services and depends on
resources being available.
Version Support
· Cloudera Manager 4.5 supports CDH3 Update 1 (cdh3u1) or later and CDH4.0 or
later. CDH3 Update 2 or later is strongly recommended.
· If you want to use Cloudera Manager to manage Oozie, CDH3 Update 2 or later is
required.
· Cloudera Manager uses Python. Python is part of the default installation for
all operating systems that Cloudera Manager supports, so there is no need to
complete any installation tasks to make Python available. Cloudera Manager is
tested with the default installation. Modifying the Python installation
available on systems on which you install Cloudera Manager is not supported.
· Impala 0.3 or later.
Resources
Cloudera Manager requires sufficient:
· Disk space. For example, /var should be allocated a minimum of 5 GB of space.
· RAM. 4 GB is appropriate for most cases, and is required when using Oracle
databases. 2 GB may be sufficient for non-Oracle deployments involving fewer
than 100 hosts.
Networking and Security
· Cluster hosts must have a working network name resolution system. Properly
configuring DNS and reverse DNS meets this requirement. If you use /etc/hosts
instead of DNS, all hosts files must contain consistent information about host
names and addresses across all nodes. For example, /etc/hosts might contain
something of the form:
XXX.XX.XX.XX master
XXX.XX.XX.XX slave
XXX.XX.XX.XX slave2
XXX.XX.XX.XX slave3
· No blocking by iptables or firewalls; make sure port 7180 is open because it
is the port used to access Cloudera Manager after installation. Cloudera
Manager communicates using specific ports, which must be open. For additional
port information, see Configuring Ports for Cloudera Manager Free Edition.
· No blocking by Security-Enhanced Linux (SELinux).
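The consistency requirement for /etc/hosts can be checked on each node with a small script. The following is a minimal sketch, not part of the Cloudera tooling: the function name check_hosts_file is made up for illustration, and it only verifies that each expected name appears in the file, not that the addresses agree between nodes.

```shell
#!/bin/sh
# Sketch: verify that a hosts file maps every expected cluster host name.
# check_hosts_file is a hypothetical helper, not a Cloudera command.
# Usage: check_hosts_file FILE NAME...
# Prints each missing name and returns non-zero if any name is missing.
check_hosts_file() {
    file=$1
    shift
    missing=0
    for name in "$@"; do
        # Skip comment lines, then look for the name in any column after the address.
        if ! awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }' "$file"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example, using the host names from the sample above:
# check_hosts_file /etc/hosts master slave slave2 slave3
```

Running the same check on every node before starting the installer gives an early hint of the host name problems described in the issues section at the end of this document.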
Automated Installation of Cloudera Manager and CDH
Step 1: Download and Run the Cloudera Manager Installer
Cloudera Manager accesses archive.cloudera.com by using yum on Red Hat systems, zypper on SUSE systems,
or apt-get on Debian/Ubuntu systems. If your hosts access the Internet through
an HTTP Proxy, you can configure yum, zypper, or apt-get, system-wide, to
access archive.cloudera.com through a proxy. To do so, modify the system
configuration on the Cloudera Manager Server host and on every cluster host
where you want to install CDH. This is not required in all cases.
· To configure your system to use a proxy:
On Red Hat systems, add the following property to /etc/yum.conf:
proxy=http://proxy_server:port_no/
· To download and run the Cloudera Manager Installer:
1. Download cloudera-manager-installer.bin from the Cloudera Downloads page to
the host where you want to install the Cloudera Manager Server. The host must
be on your cluster or accessible to your cluster over your network. Install the
Cloudera Manager Server on a single host.
2. After downloading cloudera-manager-installer.bin, give it executable
permission.
$ chmod u+x cloudera-manager-installer.bin
3. Run cloudera-manager-installer.bin.
$ sudo ./cloudera-manager-installer.bin
4. Read the Cloudera Manager Readme and then press Enter to choose Next.
5. Read the Cloudera Manager License and then press Enter to choose Next. Use
the arrow keys and press Enter to choose Yes to confirm you accept the license.
6. Read the Oracle Binary Code License Agreement and then press Enter to choose
Next. Use the arrow keys and press Enter to choose Yes to confirm you accept
the Oracle Binary Code License Agreement. The Cloudera Manager installer begins
installing the Oracle JDK and the Cloudera Manager repo files and then installs
the packages. The installer also installs the embedded PostgreSQL database and
the Cloudera Manager Server.
7. Note the complete URL provided for the Cloudera Manager Admin Console,
including the port number, which is 7180 by default. Click OK to continue.
8. Click OK to exit the installer.
Step 2: Start the Cloudera Manager Admin Console
The Cloudera Manager Admin console enables you to use
Cloudera Manager to configure, manage, and monitor Hadoop on your cluster.
Before using the Cloudera Manager Admin console, gather information about the
server's URL and port.
The server URL takes the following form:
http://<Server host>:<port>
<Server host> is the fully-qualified domain name or IP address of the host
machine where the Cloudera Manager Server is installed.
<port> is the port configured for the Cloudera Manager Server. The default port
is 7180. For example, we used the following URL:
http://master:7180/
Cloudera Manager does not support changing the admin username for the installed account. You can change the
password using Cloudera Manager after you run the wizard in the next section.
To start the Cloudera Manager Admin console:
1. In a web browser, enter the URL, including the port, for the Cloudera Manager
Server. The login screen for Cloudera Manager appears.
2. Log into Cloudera Manager. The default credentials are:
Username: admin
Password: admin
Note: The server takes some time to start listening on the port, so wait a
short while before trying to connect.
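Instead of retrying by hand in the browser, the wait can be scripted. The sketch below polls the Admin console URL until it answers. It assumes curl is installed (curl is not mentioned elsewhere in this guide), and the function names probe_url and wait_for_console, as well as the default of 30 attempts 10 seconds apart, are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: poll the Cloudera Manager Admin console URL until it answers.
# probe_url and wait_for_console are hypothetical helper names.

# Succeeds as soon as the server returns any HTTP response (assumes curl).
probe_url() {
    curl --silent --output /dev/null --max-time 5 "$1"
}

# Usage: wait_for_console URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_console() {
    url=$1
    attempts=${2:-30}
    delay=${3:-10}
    i=1
    while :; do
        if probe_url "$url"; then
            echo "console reachable after $i attempt(s)"
            return 0
        fi
        # Give up once the attempt budget is used; do not sleep after the last try.
        if [ "$i" -ge "$attempts" ]; then
            break
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "console not reachable after $attempts attempt(s)" >&2
    return 1
}

# Example, with the URL used in this guide:
# wait_for_console http://master:7180/ 30 10
```

If the function gives up, the firewall and host name issues listed at the end of this document are the first things to check.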
Step 3: Use Cloudera Manager for Automated CDH Installation and Configuration
The following instructions show you how to use the
Cloudera Manager wizard to do an initial installation and configuration. The
wizard helps you to install and set up Cloudera parcels or packages across your
cluster and will:
· Install and validate your Cloudera Manager License
· Find the cluster hosts you specify via hostname and IP-address ranges
· Connect to each host with SSH to install the Cloudera Manager Agent and CDH
· Install the Oracle JDK on the cluster hosts (if not already installed)
· Configure Hadoop automatically and start the Hadoop services
To use Cloudera Manager:
1. The first time you start the Cloudera Manager Admin console, the install
wizard starts up.
2. To install the Free Edition, click Just Install the Latest Free Edition.
3. Click Continue.
4. To enable Cloudera Manager to automatically discover the cluster hosts where
you want to install CDH, enter the cluster hostnames or IP addresses. You can
also specify hostname and IP address ranges:
XXX.XX.XX.XX
XXX.XX.XX.XX
XXX.XX.XX.XX
XXX.XX.XX.XX
5. Click Search.
Cloudera Manager identifies the hosts in your cluster to allow you to configure
them for CDH. If there are a large number of hosts on your cluster, wait a few
moments to allow them to be discovered and shown in the wizard. If the search
is taking too long, you can stop the scan by clicking Abort Scan. To find
additional hosts, add their host name or IP address and click Search again.
6. Verify that the number of hosts shown matches the number of hosts where you
want to install CDH. Deselect host entries that do not exist and deselect the
hosts where you do not want to install CDH. Click Continue.
7. Select the repository type you want to use for the installation.
To install using Parcels, select Parcels, and follow the directions at
Installation using Parcels. To install using Packages, select Packages, and
follow the directions at Installation using Packages below.
We used Packages.
Installation using Packages
1. Choose the CDH version to install.
2. Select the major release of CDH to install. This is often CDH4.
3. Select the specific release of CDH to install from within the major
version you selected. You may choose a custom repository, if desired.
4. If available, select the specific release of Impala to install on your
hosts. You may choose either the latest version or use a custom repository. If
you do not want to install Impala, select None.
5. Select the specific release of Cloudera Manager to install on your
hosts. You may choose either the version that matches with the Cloudera Manager
Server you are currently using or you can specify an installation from a custom
repository.
6. If you opted to use custom repositories for installation files, and
your hosts do not have internet access, you must provide a GPG key URL. It will
apply for all repositories.
7. Click Continue.
Provide credentials for authenticating with hosts
1. Select root or enter the user name for an account that has password-less
sudo permissions (hadoopoc).
2. Select an authentication method.
a. If you choose to use password authentication, enter and confirm the
password.
b. If you choose to use public-key authentication, provide a passphrase and
the path to the required key files.
c. You can choose to specify an alternate SSH port. The default value is 22.
d. You can specify the maximum number of host installations to run at once.
The default value is 10.
3. Click Continue to begin installing the Cloudera Manager Agent and Daemons on
the cluster hosts. If you are installing from packages, the process also
installs CDH on your hosts.
Install Cloudera Manager and CDH components
The status of installation on each host is displayed in the following
screen.
1. The Cloudera Manager wizard uses SSH to access the cluster hosts and follows
a sequence of steps to download and install the Oracle JDK, Cloudera Manager
Agents and Daemons, and the CDH packages, if you are installing from packages
rather than parcels.
2. If installation fails on a host, you can click the Uninstall link next to
the failed host. This gives you the choice of uninstalling the failed hosts or
retrying the installation on that host. To uninstall, click Uninstall Failed
Hosts. To retry installation on all failed hosts, click Retry Failed Hosts.
3. To avoid excessive network load, the wizard runs a limited number of
installations in parallel, based on the value indicated on the page where you
provided your authentication credentials. The default is 10 simultaneous
installations.
a. If you are installing from packages, the wizard configures package
repositories, installs the Oracle JDK, CDH, and the Cloudera Manager Agent, and
then starts the Cloudera Manager Agent. The status of installation on each host
is displayed. You can also click the Details link for individual hosts to view
detailed information about the installation and error messages if installation
fails on any hosts.
4. When the Continue button appears at the bottom of the screen, the
installation process is completed.
5. When you continue, the Host Inspector runs to validate the installation, and
provides a summary of what it finds, including all the versions of the
installed components.
Choose the services you want to start on your cluster
1. Choose the combination of services to install: Core Hadoop, HBase Services,
All Services, or Custom Services.
2. Click Inspect Role Assignments to see how the wizard will assign roles for
the services you have chosen, and change them if you need to. The wizard
evaluates the hardware configurations of the cluster hosts to determine the
best machines for each role. For example, the wizard assigns the NameNode role
to the machine that best meets the NameNode requirements. The wizard also
configures other options, such as the number of map and reduce slots for
TaskTracker, on the basis of the size of the cluster and the physical
characteristics of each machine, such as the number of CPUs, amount of RAM, and
disk space. These assignments are typically acceptable, but you can reassign
services to nodes of your choosing, if desired. Click Continue when you are
satisfied with the assignments.
3. Review the configuration changes to be applied.
4. Confirm the settings entered for file system paths. The file paths required
vary based on the services to be installed. For example, you might confirm the
NameNode Data Directory and the DataNode Data Directory for HDFS or confirm the
TaskTracker Local Data Directory List or JobTracker Local Data Directory for
MapReduce.
5. Click Continue. The wizard starts the services on your cluster.
6. When all of the services are started, click Continue.
7. Click Continue.
Step 4: Change the Default Administrator Password
As soon as possible after running the wizard and beginning to use Cloudera
Manager, you should change the default administrator password.
To change the administrator password:
1. On the main navigation bar, pull down the admin user menu and select the
Change Password option.
2. Enter a new password twice and then click Submit.
Step 5: Test the Installation
Now that you have finished with the CDH and Cloudera Manager installation, you
are ready to test the installation. For testing instructions, see Testing the
Installation.
Cloudera Manager View All Services
HDFS view on CMS
Job Tracker
Beeswax View
Issues faced and resolution:
Ø Not able to access the internet from the command line interface and the yum
command.
o For general access
Set the http_proxy shell variable. Type the following command to set the proxy
server:
$ export http_proxy=http://proxy_server:port/
How do I set up the proxy variable for all users?
To set up the proxy environment variable as a global variable, open the
/etc/profile file:
$ vi /etc/profile
Add the following information:
export http_proxy=http://proxy_server:port/
o For Yum update
To enable all yum operations to use a proxy server, specify the proxy server
details in /etc/yum.conf. The proxy setting must specify the proxy server as a
complete URL, including the TCP port number. If your proxy server requires a
username and password, specify these by adding proxy_username and
proxy_password settings.
Our case:
# The proxy server - proxy server:port number
proxy=http://proxy_server:port/
# The account details for yum connections (not required in our case)
# proxy_username=
# proxy_password=
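Because the same proxy line has to be present on the Cloudera Manager Server host and on every cluster host, it can help to apply it with a script that is safe to run more than once. This is a minimal sketch under two assumptions: the function name set_yum_proxy is made up for illustration, and the file has [main] as its only (or last) section, since the new line is appended at the end of the file.

```shell
#!/bin/sh
# Sketch: set (or replace) the proxy= line in a yum configuration file.
# set_yum_proxy is a hypothetical helper; it assumes [main] is the last
# section in the file, because the new line is appended at the end.
# Usage: set_yum_proxy FILE PROXY_URL
set_yum_proxy() {
    conf=$1
    proxy=$2
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Drop any existing proxy= line and keep everything else unchanged.
    # grep exits non-zero when nothing is left, which is not an error here.
    grep -v '^proxy=' "$conf" > "$tmp" || true
    echo "proxy=$proxy" >> "$tmp"
    # Write back through the original file to preserve its owner and mode.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Example (run as root; replace the placeholders with your proxy details):
# set_yum_proxy /etc/yum.conf http://proxy_server:port/
```

Running it twice with different URLs leaves a single proxy= line holding the most recent value.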
Ø Python not working: Getting all updates
Update all the components in RHEL using the commands:
$ sudo yum clean all
$ sudo yum update
Ø Firewall is blocking: Turned off firewall & iptables
Task: Disable / Turn off Linux Firewall (Red Hat/CentOS/Fedora Core)
Type the following two commands (you must log in as the root user):
$ /etc/init.d/iptables save
$ /etc/init.d/iptables stop
Turn off firewall on boot:
# chkconfig iptables off
Ø Installation failed. Failed to receive heartbeat from agent/Host name issue:
Configured the hostname
Changed the hostname in /etc/sysconfig/network, one pair per node (the original
value is commented out):
#HOSTNAME=SENxxxxxxxxx028
HOSTNAME=master
#HOSTNAME=SENxxxxxxxxx238
HOSTNAME=slave
#HOSTNAME=SENxxxxxxxxx239
HOSTNAME=slave2
#HOSTNAME=SENxxxxxxxxx009
HOSTNAME=slave3
Ø User has no sudo access: Make the user a password-less sudo user.
Update /etc/sudoers (using the visudo command). Add the following line and
save:
Hadoop_user ALL = (ALL) NOPASSWD: ALL
Ø Error in SELinux
To disable it permanently, edit /etc/selinux/config and change
SELINUX=enforcing
to
SELINUX=disabled
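The same change can be made non-interactively, which is convenient when it has to be repeated on all four nodes. The following is a sketch only: set_selinux_mode is a hypothetical helper name, and it rewrites just the SELINUX= line, leaving SELINUXTYPE= and comments untouched. The setting in /etc/selinux/config takes effect at the next reboot.

```shell
#!/bin/sh
# Sketch: rewrite the SELINUX= line of an SELinux config file.
# set_selinux_mode is a hypothetical helper name used for illustration.
# Usage: set_selinux_mode FILE MODE   (MODE: enforcing, permissive or disabled)
set_selinux_mode() {
    conf=$1
    mode=$2
    tmp=$(mktemp) || return 1
    # Only lines that start with "SELINUX=" match; "SELINUXTYPE=" does not.
    if sed "s/^SELINUX=.*/SELINUX=$mode/" "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example (run as root):
# set_selinux_mode /etc/selinux/config disabled
```

Writing through a temporary file rather than relying on an in-place sed option keeps the sketch portable across sed versions.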
Ø Component fails to install: Download it and install it using the yum or rpm
command
E.g.:
$ sudo rpm -ivh cloudera-manager-daemons-4.1.4-1.cm414.p0.461.x86_64.rpm
Or
$ sudo rpm -ivh hadoop-2.0.0%2B556-1.cdh4.1.3.p0.23.el5.x86_64.rpm
Ø Task tracker not working: The installed Java was not found
Modifying CMF_AGENT_JAVA_HOME
In many cases, modifying the CMF_AGENT_JAVA_HOME environment variable is an
effective solution for updating the configuration to accommodate a custom
JAVA_HOME. Modifying the CMF_AGENT_JAVA_HOME environment variable enables all
services on the host to find the JDK. To modify the CMF_AGENT_JAVA_HOME
environment variable:
1. Open /etc/default/cloudera-scm-agent.
2. Set the CMF_AGENT_JAVA_HOME environment variable to the Java home in your
environment. For example, you might modify the file to include the following
line:
export CMF_AGENT_JAVA_HOME=/usr/custom_java
3. Save and close the cloudera-scm-agent file.
4. Restart the Cloudera Manager Agent using the following command:
sudo service cloudera-scm-agent restart
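Steps 1 to 3 can be scripted so that a mistyped path is caught before the agent is restarted. This is a sketch, not Cloudera tooling: set_agent_java_home is a hypothetical helper name, and the check that bin/java exists under the given directory is an added safeguard that the original steps do not include.

```shell
#!/bin/sh
# Sketch: set CMF_AGENT_JAVA_HOME in the agent defaults file.
# set_agent_java_home is a hypothetical helper name used for illustration.
# Usage: set_agent_java_home FILE JAVA_HOME
set_agent_java_home() {
    conf=$1
    java_home=$2
    # Refuse to write a path that does not contain an executable java binary.
    if [ ! -x "$java_home/bin/java" ]; then
        echo "no executable found at $java_home/bin/java" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # Remove any earlier export of the variable, keep all other lines.
    # grep exits non-zero when nothing is left, which is not an error here.
    grep -v '^export CMF_AGENT_JAVA_HOME=' "$conf" > "$tmp" || true
    echo "export CMF_AGENT_JAVA_HOME=$java_home" >> "$tmp"
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Example (run as root, then restart the agent as in step 4):
# set_agent_java_home /etc/default/cloudera-scm-agent /usr/custom_java
```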
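Ø Not able to access HDFS with hadoop fs -ls: user access issue.
Create a home directory in HDFS for your user by running the following commands
as the hdfs user. Run them from your own login so that $USER expands to your
user name:
sudo -u hdfs hadoop fs -mkdir /user/$USER
sudo -u hdfs hadoop fs -chown $USER /user/$USER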
Ø Hive error:
[saurav@localhost ~]$ hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/saurav/hive_job_log_saurav_201301301109_1273106907.txt
hive> show tables
> ;
FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Failed to create database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
NestedThrowables:
java.sql.SQLException: Failed to create database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Solution
The error above suggests you don't have the MySQL JDBC driver installed.
Unfortunately, because of MySQL's licensing, Cloudera Manager can't install it
for you. See https://ccp.cloudera.com/display/CDH4DOC/Hive+Installation#HiveInstallation-ConfiguringaremoteMySQLdatabaseasHiveMetastore for some instructions.
Ø Hue/Beeswax first user: The first user created in Hue should have the same
user name as your user account on the machine.
Example: The first user should be hadoop_user in our case.





