testmapred.java

From "hadoop: Nutch cluster platform" · Java code · 456 lines · page 1 of 2
/**
 * Copyright 2006 The Apache Software Foundation
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.record.test;

import org.apache.hadoop.mapred.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.conf.*;
import junit.framework.TestCase;
import java.io.*;
import java.util.*;

/**********************************************************
 * MapredLoadTest generates a bunch of work that exercises
 * a Hadoop Map-Reduce system (and DFS, too).  It goes through
 * the following steps:
 *
 * 1) Take inputs 'range' and 'counts'.
 * 2) Generate 'counts' random integers between 0 and range-1.
 * 3) Create a file that lists each integer between 0 and range-1,
 *    and lists the number of times that integer was generated.
 * 4) Emit a (very large) file that contains all the integers
 *    in the order generated.
 * 5) After the file has been generated, read it back and count
 *    how many times each int was generated.
 * 6) Compare this big count-map against the original one.  If
 *    they match, then SUCCESS!  Otherwise, FAILURE!
 *
 * OK, that's how we can think about it.  What are the map-reduce
 * steps that get the job done?
 *
 * 1) In a non-mapred thread, take the inputs 'range' and 'counts'.
 * 2) In a non-mapred thread, generate the answer-key and write to disk.
 * 3) In a mapred job, divide the answer key into K jobs.
 * 4) A mapred 'generator' task consists of K map jobs.  Each reads
 *    an individual "sub-key", and generates integers according
 *    to it (though with a random ordering).
 * 5) The generator's reduce task agglomerates all of those files
 *    into a single one.
 * 6) A mapred 'reader' task consists of M map jobs.  The output
 *    file is cut into M pieces. Each of the M jobs counts the
 *    individual ints in its chunk and creates a map of all seen ints.
 * 7) A mapred job integrates all the count files into a single one.
 *
 **********************************************************/
public class TestMapRed extends TestCase {
    /**
     * Modified to make it a junit test.
     * The RandomGen Job does the actual work of creating
     * a huge file of assorted numbers.  It receives instructions
     * as to how many times each number should be counted.  Then
     * it emits those numbers in a crazy order.
     *
     * The map() function takes a key/val pair that describes
     * a value-to-be-emitted (the key) and how many times it
     * should be emitted (the value), aka "numtimes".  map() then
     * emits a series of intermediate key/val pairs.  It emits
     * 'numtimes' of these.  The key is a random number and the
     * value is the 'value-to-be-emitted'.
     *
     * The system collates and merges these pairs according to
     * the random number.  reduce() function takes in a key/value
     * pair that consists of a crazy random number and a series
     * of values that should be emitted.  The random number key
     * is now dropped, and reduce() emits a pair for every intermediate value.
     * The emitted key is an intermediate value.  The emitted value
     * is just a blank string.  Thus, we've created a huge file
     * of numbers in random order, but where each number appears
     * as many times as we were instructed.
     */
    static public class RandomGenMapper implements Mapper {
        Random r = new Random();

        public void configure(JobConf job) {
        }

        public void map(WritableComparable key, Writable val, OutputCollector out, Reporter reporter) throws IOException {
            int randomVal = ((RecInt) key).getData();
            int randomCount = ((RecInt) val).getData();

            for (int i = 0; i < randomCount; i++) {
                out.collect(new RecInt(Math.abs(r.nextInt())),
                        new RecString(new Text(Integer.toString(randomVal))));
            }
        }

        public void close() {
        }
    }

    /**
     */
    static public class RandomGenReducer implements Reducer {
        public void configure(JobConf job) {
        }

        public void reduce(WritableComparable key,
                Iterator it,
                OutputCollector out,
                Reporter reporter)
                throws IOException {
            int keyint = ((RecInt) key).getData();
            while (it.hasNext()) {
                Text val = ((RecString) it.next()).getData();
                out.collect(new RecInt(Integer.parseInt(val.toString())),
                        new RecString(new Text("")));
            }
        }

        public void close() {
        }
    }

    /**
     * The RandomCheck Job does a lot of our work.  It takes
     * in a num/string keyspace, and transforms it into a
     * key/count(int) keyspace.
     *
     * The map() function just emits a num/1 pair for every
     * num/string input pair.
     *
     * The reduce() function sums up all the 1s that were
     * emitted for a single key.  It then emits the key/total
     * pair.
     *
     * This is used to regenerate the random number "answer key".
     * Each key here is a random number, and the count is the
     * number of times the number was emitted.
     */
    static public class RandomCheckMapper implements Mapper {
        public void configure(JobConf job) {
        }

        public void map(WritableComparable key, Writable val, OutputCollector out, Reporter reporter) throws IOException {
            int pos = ((RecInt) key).getData();
            Text str = ((RecString) val).getData();
            out.collect(new RecInt(pos), new RecString(new Text("1")));
        }

        public void close() {
        }
    }

    /**
     */
    static public class RandomCheckReducer implements Reducer {
        public void configure(JobConf job) {
        }

        public void reduce(WritableComparable key, Iterator it, OutputCollector out, Reporter reporter) throws IOException {
            int keyint = ((RecInt) key).getData();
            int count = 0;
            while (it.hasNext()) {
                it.next();
                count++;
            }
            out.collect(new RecInt(keyint), new RecString(new Text(Integer.toString(count))));
        }

        public void close() {
        }
    }

    /**
     * The Merge Job is a really simple one.  It takes in
     * an int/int key-value set, and emits the same set.
     * But it merges identical keys by adding their values.
     *
     * Thus, the map() function is just the identity function
     * and reduce() just sums.  Nothing to see here!
     */
    static public class MergeMapper implements Mapper {
        public void configure(JobConf job) {
        }

        public void map(WritableComparable key, Writable val, OutputCollector out, Reporter reporter) throws IOException {
            int keyint = ((RecInt) key).getData();
            Text valstr = ((RecString) val).getData();
            out.collect(new RecInt(keyint), new RecInt(Integer.parseInt(valstr.toString())));
        }

        public void close() {
        }
    }

    static public class MergeReducer implements Reducer {
        public void configure(JobConf job) {
        }

        public void reduce(WritableComparable key, Iterator it, OutputCollector out, Reporter reporter) throws IOException {
            int keyint = ((RecInt) key).getData();
            int total = 0;
            while (it.hasNext()) {
                total += ((RecInt) it.next()).getData();
            }
            out.collect(new RecInt(keyint), new RecInt(total));
        }

        public void close() {
        }
    }

    private static int range = 10;
    private static int counts = 100;
    private static Random r = new Random();
    private static Configuration conf = new Configuration();

    /**
       public TestMapRed(int range, int counts, Configuration conf) throws IOException {
       this.range = range;
       this.counts = counts;
       this.conf = conf;
       }
    **/

    public void testMapred() throws Exception {
        launch();
    }

    /**
     *
     */
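
The generate / check / merge steps described in the class comment can be followed without a cluster. The sketch below is a plain in-memory walk-through written for illustration only: the class LoadTestSketch and all of its method names are hypothetical and are not part of Hadoop or of this test, and a list shuffle stands in for the sort on random keys that the real generator job relies on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; none of these names exist in Hadoop.
class LoadTestSketch {

    // Answer key: dist[v] is the number of times value v was drawn.
    static int[] makeAnswerKey(int range, int counts, Random r) {
        int[] dist = new int[range];
        for (int i = 0; i < counts; i++) {
            dist[r.nextInt(range)]++;
        }
        return dist;
    }

    // Generator stage: emit value v exactly dist[v] times, in random order.
    static List<Integer> generate(int[] dist, Random r) {
        List<Integer> out = new ArrayList<>();
        for (int v = 0; v < dist.length; v++) {
            for (int i = 0; i < dist[v]; i++) {
                out.add(v);
            }
        }
        Collections.shuffle(out, r); // stands in for sorting by a random key
        return out;
    }

    // Checker stage: count how often each value occurs in one chunk.
    static int[] countChunk(List<Integer> chunk, int range) {
        int[] partial = new int[range];
        for (int v : chunk) {
            partial[v]++;
        }
        return partial;
    }

    // Merge stage: add the partial counts for identical keys.
    static int[] merge(List<int[]> partials, int range) {
        int[] total = new int[range];
        for (int[] p : partials) {
            for (int v = 0; v < range; v++) {
                total[v] += p[v];
            }
        }
        return total;
    }

    // Runs all stages and compares the merged counts with the answer key.
    // 'chunks' must be positive; it plays the role of M in the class comment.
    static boolean roundTrip(int range, int counts, int chunks, long seed) {
        Random r = new Random(seed);
        int[] answerKey = makeAnswerKey(range, counts, r);
        List<Integer> big = generate(answerKey, r);

        List<int[]> partials = new ArrayList<>();
        int size = (big.size() + chunks - 1) / chunks; // ceiling division
        for (int start = 0; start < big.size(); start += size) {
            int end = Math.min(start + size, big.size());
            partials.add(countChunk(big.subList(start, end), range));
        }
        return Arrays.equals(answerKey, merge(partials, range));
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(10, 100, 4, 42L) ? "SUCCESS" : "FAILURE");
    }
}
```

Calling roundTrip(10, 100, 4, 42L) uses the same range and counts defaults as the test's static fields. Counting per chunk and then summing gives the same totals as counting the whole list at once, which is the property the RandomCheck and Merge jobs depend on.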