            Getting Started With Hadoop On Demand (HOD)
            ===========================================

1. Pre-requisites:
==================

Hardware:
HOD requires a minimum of 3 nodes configured through a resource manager.

Software:
The following components are assumed to be installed before using HOD:

* Torque:
  (http://www.clusterresources.com/pages/products/torque-resource-manager.php)
  Currently HOD supports Torque out of the box. We assume that you are
  familiar with configuring Torque. You can get information about this
  from the following link:
  http://www.clusterresources.com/wiki/doku.php?id=torque:torque_wiki

* Python (http://www.python.org/)
  We require version 2.5.1 of Python.

The following components can be optionally installed for getting better
functionality from HOD:

* Twisted Python: This can be used for improving the scalability of HOD.
  (http://twistedmatrix.com/trac/)

* Hadoop: HOD can automatically distribute Hadoop to all nodes in the
  cluster. However, it can also use a pre-installed version of Hadoop,
  if it is available on all nodes in the cluster.
  (http://hadoop.apache.org/core)
  HOD currently supports Hadoop 0.15 and above.

NOTE: HOD requires these components to be installed at the same location
on all nodes in the cluster. Configuration is also simpler if they are at
the same location on the submit nodes.

2. Resource Manager Configuration Pre-requisites:
=================================================

For using HOD with Torque:

* Install Torque components: pbs_server on a head node, pbs_mom on all
  compute nodes, and PBS client tools on all compute nodes and submit
  nodes.

* Create a queue for submitting jobs on the pbs_server.

* Specify a name for all nodes in the cluster by setting a 'node
  property' on each of them. This can be done by using the 'qmgr'
  command. For example (here the second 'node' stands for the node's
  name):

  qmgr -c "set node node properties=cluster-name"

* Ensure that jobs can be submitted to the nodes.
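The queue-creation and node-property steps above can be sketched with the
'qmgr' command. This is only a hedged sketch, not a complete Torque setup:
it requires a working Torque install and must be run on the pbs_server
head node, and the queue name 'batch' and node name 'node1' are
illustrative.

```shell
# Requires a working Torque install; run as root on the pbs_server head
# node. Queue name 'batch' and node name 'node1' are illustrative.
qmgr -c "create queue batch queue_type=execution"
qmgr -c "set queue batch enabled=true"
qmgr -c "set queue batch started=true"
qmgr -c "set server scheduling=true"
# Tag each compute node with the cluster's node property:
qmgr -c "set node node1 properties=cluster-name"
```

The queue name chosen here is the value you would later give to the
${RM_QUEUE} variable in the HOD configuration.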
  Job submission can be verified with the 'qsub' command. For example:

  echo "sleep 30" | qsub -l nodes=3

* More information about setting up Torque can be found in the
  documentation at:
  http://www.clusterresources.com/pages/products/torque-resource-manager.php

3. Setting up HOD:
==================

* HOD is available under the 'contrib' section of Hadoop under the root
  directory 'hod'.

* Distribute the files under this directory to all the nodes in the
  cluster. Note that the location where the files are copied should be
  the same on all the nodes.

* On the node from where you want to run hod, edit the file hodrc,
  which can be found in the <install dir>/conf directory. This file
  contains the minimal set of values required for running hod.

* Specify values suitable to your environment for the following
  variables defined in the configuration file. Note that some of these
  variables are defined at more than one place in the file.

  * ${JAVA_HOME}: Location of Java for Hadoop. Hadoop supports Sun JDK
    1.5.x.
  * ${CLUSTER_NAME}: Name of the cluster, as specified in the
    'node property' in the resource manager configuration.
  * ${HADOOP_HOME}: Location of the Hadoop installation on the compute
    and submit nodes.
  * ${RM_QUEUE}: Queue configured for submitting jobs in the resource
    manager configuration.
  * ${RM_HOME}: Location of the resource manager installation on the
    compute and submit nodes.

* The following environment variables *may* need to be set depending on
  your environment. These variables must be defined where you run the
  HOD client, and also be specified in the HOD configuration file as the
  value of the key resource_manager.env-vars. Multiple variables can be
  specified as a comma-separated list of key=value pairs.
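As a hedged illustration of the resource_manager.env-vars key mentioned
above, such a definition in the hodrc file might look like the following
(the python path is made up for the example; the section/option split
follows the --section_name.option_name convention):

```ini
[resource_manager]
# Comma-separated key=value pairs; here, a non-default python location:
env-vars = HOD_PYTHON_HOME=/opt/python-2.5.1/bin/python
```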
  * HOD_PYTHON_HOME: If you install python to a non-default location
    on the compute nodes or submit nodes, then this variable must be
    defined to point to the python executable in the non-standard
    location.

NOTE: You can also review other configuration options in the file and
modify them to suit your needs. Refer to the file config.txt for
information about the HOD configuration.

4. Running HOD:
===============

4.1 Overview:
-------------

A typical session of HOD will involve at least three steps: allocate,
run hadoop jobs, deallocate.

4.1.1 Operation allocate
------------------------

The allocate operation is used to allocate a set of nodes and install and
provision Hadoop on them. It has the following syntax:

  hod -c config_file -t hadoop_tarball_location -o "allocate \
                                                cluster_dir number_of_nodes"

The hadoop_tarball_location must be a location on a shared file system
accessible from all nodes in the cluster. Note that the cluster_dir must
exist before running the command. If the command completes successfully,
cluster_dir/hadoop-site.xml will be generated and will contain information
about the allocated cluster's JobTracker and NameNode.

For example, the following command uses a hodrc file in ~/hod-config/hodrc
and allocates Hadoop (provided by the tarball ~/share/hadoop.tar.gz) on 10
nodes, storing the generated Hadoop configuration in a directory named
~/hadoop-cluster:

  $ hod -c ~/hod-config/hodrc -t ~/share/hadoop.tar.gz -o "allocate \
                                                        ~/hadoop-cluster 10"

HOD also supports an environment variable called HOD_CONF_DIR.
If this is defined, HOD will look for a default hodrc file at
$HOD_CONF_DIR/hodrc. Defining this allows the above command to also be
run as follows:

  $ export HOD_CONF_DIR=~/hod-config
  $ hod -t ~/share/hadoop.tar.gz -o "allocate ~/hadoop-cluster 10"

4.1.2 Running Hadoop jobs using the allocated cluster
-----------------------------------------------------

Now, one can run Hadoop jobs using the allocated cluster in the usual
manner:

  hadoop --config cluster_dir hadoop_command hadoop_command_args

Continuing our example, the following command will run a wordcount example
on the allocated cluster:

  $ hadoop --config ~/hadoop-cluster jar \
      /path/to/hadoop/hadoop-examples.jar wordcount /path/to/input /path/to/output

4.1.3 Operation deallocate
--------------------------

The deallocate operation is used to release an allocated cluster. When
finished with a cluster, deallocate must be run so that the nodes become
free for others to use. The deallocate operation has the following syntax:

  hod -o "deallocate cluster_dir"

Continuing our example, the following command will deallocate the cluster:

  $ hod -o "deallocate ~/hadoop-cluster"

4.2 Command Line Options
------------------------

This section covers the major command line options available via the hod
command:

--help
Prints out the help message to see the basic options.

--verbose-help
All configuration options provided in the hodrc file can be passed on the
command line, using the syntax --section_name.option_name[=value]. When
provided this way, the value provided on the command line overrides the
option provided in hodrc. The verbose-help command lists all the available
options in the hodrc file. This is also a nice way to see the meaning of
the configuration options.

-c config_file
Provides the configuration file to use. Can be used with all other options
of HOD.
Alternatively, the HOD_CONF_DIR environment variable can be defined to
specify a directory that contains a file named hodrc, alleviating the need
to specify the configuration file in each HOD command.

-b 1|2|3|4
Enables the given debug level. Can be used with all other options of HOD.
4 is most verbose.

-o "help"
Lists the operations available in the operation mode.

-o "allocate cluster_dir number_of_nodes"
Allocates a cluster on the given number of cluster nodes, and stores the
allocation information in cluster_dir for use with subsequent hadoop
commands. Note that the cluster_dir must exist before running the command.

-o "list"
Lists the clusters allocated by this user. Information provided includes
the Torque job id corresponding to the cluster, the cluster directory
where the allocation information is stored, and whether the Map/Reduce
daemon is still active or not.

-o "info cluster_dir"
Lists information about the cluster whose allocation information is stored
in the specified cluster directory.

-o "deallocate cluster_dir"
Deallocates the cluster whose allocation information is stored in the
specified cluster directory.

-t hadoop_tarball
Provisions Hadoop from the given tar.gz file. This option is only
applicable to the allocate operation. For better distribution performance
it is recommended that the Hadoop tarball contain only the libraries and
binaries, and not the source or documentation.

-Mkey1=value1 -Mkey2=value2
Provides configuration parameters for the provisioned Map/Reduce daemons
(JobTracker and TaskTrackers). A hadoop-site.xml is generated with these
values on the cluster nodes.

-Hkey1=value1 -Hkey2=value2
Provides configuration parameters for the provisioned HDFS daemons
(NameNode and DataNodes). A hadoop-site.xml is generated with these values
on the cluster nodes.

-Ckey1=value1 -Ckey2=value2
Provides configuration parameters for the client from where jobs can be
submitted. A hadoop-site.xml is generated with these values on the submit
node.
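The slim-tarball recommendation under the -t option above can be followed
with standard tar excludes. A minimal sketch, assuming a stock
hadoop-0.18.0 directory layout (the mkdir only creates a mock layout so
the example is self-contained; with a real install, skip it and point tar
at the actual directory):

```shell
# Create a mock Hadoop layout so the example runs anywhere; with a real
# install, skip this mkdir and use the actual hadoop directory instead.
mkdir -p hadoop-0.18.0/bin hadoop-0.18.0/lib hadoop-0.18.0/conf \
         hadoop-0.18.0/src hadoop-0.18.0/docs
# Pack everything except the source and documentation trees.
tar czf hadoop.tar.gz \
    --exclude='hadoop-0.18.0/src' \
    --exclude='hadoop-0.18.0/docs' \
    hadoop-0.18.0
# List the archive contents to confirm what was packed.
tar tzf hadoop.tar.gz
```

The resulting hadoop.tar.gz is what you would pass to hod via -t; placing
it on a shared file system keeps it reachable from all nodes.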
