{"id":347,"date":"2012-01-25T22:50:36","date_gmt":"2012-01-26T06:50:36","guid":{"rendered":"http:\/\/h2plus.biz\/hiromitsu\/?p=347"},"modified":"2018-08-06T13:48:23","modified_gmt":"2018-08-06T20:48:23","slug":"hadoop%e3%81%a7%e5%a7%8b%e3%82%81%e3%82%8b%e4%b8%a6%e5%88%97%e3%83%87%e3%83%bc%e3%82%bf%e8%a7%a3%e6%9e%90%ef%bc%8f%e5%be%8c%e7%b7%a8","status":"publish","type":"post","link":"https:\/\/h2plus.biz\/hiromitsu\/entry\/347","title":{"rendered":"Getting Started with Parallel Data Analysis on Hadoop \/ Part 2"},"content":{"rendered":"<p>On Friday, January 13, I attended the <a href=\"http:\/\/www.jtpa.org\/category\/event\/geeksalon\">JTPA Geek Salon<\/a> in Palo Alto. This time it was run as a hackathon: participants brought their own laptops and wrote code on the spot, so everyone had to set up a working Hadoop environment before arriving at the venue.<\/p>\n<p>In <a href=\"http:\/\/h2plus.biz\/hiromitsu\/entry\/267\">Part 1<\/a>, as groundwork for running computations with Hadoop, I showed how to build a Hadoop cluster on EC2. In Part 2, I will focus on what I actually did hands-on at the Geek Salon.<\/p>\n<p><!--more--><\/p>\n<h3>WordCount.java &#8211; The First Sample Code<\/h3>\n<p><a 
href=\"https:\/\/developer.yahoo.com\/hadoop\/tutorial\/module4.html\" class=\"broken_link\">Hadoop Tutorial \/ Module4: MapReduce<\/a> introduces <code>WordCount.java<\/code>; let's use it to run a job on the Hadoop cluster on EC2. It is the Hadoop equivalent of a <code>\"Hello, world!\"<\/code> program.<\/p>\n<p>First, log in over SSH to the Ubuntu instance on EC2, launch the Hadoop cluster, and then log in to the cluster's master node.<\/p>\n<pre>\nlocal$ ec2hadoop\nubuntu$ hadoop-ec2 launch-cluster hadoop-cluster 2\nubuntu$ hadoop-ec2 login hadoop-cluster\ncluster#\n<\/pre>\n<p>Create a working directory under your home directory and copy the <code>WordCount.java<\/code> source code into it.<\/p>\n<pre>\ncluster# mkdir ~\/wordcount\ncluster# cp $HADOOP_HOME\/src\/examples\/org\/apache\/hadoop\/examples\/WordCount.java ~\/wordcount\n<\/pre>\n<p><code>WordCount.java<\/code> takes multiple text files as input, counts every word in them, and tallies how many times each word occurs. A simplified version of the source code is shown below.<\/p>\n<pre class=\"brush: java; title: ; notranslate\" title=\"\">\r\n\/**\r\n * Licensed to the Apache Software Foundation (ASF) under one\r\n * or more contributor license 
agreements.  See the NOTICE file\r\n * distributed with this work for additional information\r\n * regarding copyright ownership.  The ASF licenses this file\r\n * to you under the Apache License, Version 2.0 (the\r\n * &quot;License&quot;); you may not use this file except in compliance\r\n * with the License.  You may obtain a copy of the License at\r\n *\r\n *     http:\/\/www.apache.org\/licenses\/LICENSE-2.0\r\n *\r\n * Unless required by applicable law or agreed to in writing, software\r\n * distributed under the License is distributed on an &quot;AS IS&quot; BASIS,\r\n * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\r\n * See the License for the specific language governing permissions and\r\n * limitations under the License.\r\n *\/\r\n\r\npackage org.apache.hadoop.examples;\r\n\r\nimport java.io.IOException;\r\nimport java.util.*;\r\n\r\nimport org.apache.hadoop.fs.Path;\r\nimport org.apache.hadoop.conf.*;\r\nimport org.apache.hadoop.io.*;\r\nimport org.apache.hadoop.mapred.*;\r\nimport org.apache.hadoop.util.*;\r\n\r\npublic class WordCount extends Configured implements Tool {\r\n  \/**\r\n   * Counts the words in each line.\r\n   * For each line of input, break the line into words and emit them.\r\n   *\/\r\n  public static class MapClass extends MapReduceBase\r\n    implements Mapper&lt;LongWritable, Text, Text, IntWritable&gt; {\r\n    \r\n    private final static IntWritable one = new IntWritable(1);\r\n    private Text word = new Text();\r\n    \r\n    public void map(LongWritable key, Text value, \r\n                    OutputCollector&lt;Text, IntWritable&gt; output, \r\n                    Reporter reporter) throws IOException {\r\n      String line = value.toString();\r\n      StringTokenizer itr = new StringTokenizer(line);\r\n      while (itr.hasMoreTokens()) {\r\n        word.set(itr.nextToken());\r\n        
output.collect(word, one);\r\n      }\r\n    }\r\n  }\r\n  \r\n  \/\/ A reducer class that just emits the sum of the input values.\r\n  public static class Reduce extends MapReduceBase\r\n    implements Reducer&lt;Text, IntWritable, Text, IntWritable&gt; {\r\n    \r\n    public void reduce(Text key, Iterator&lt;IntWritable&gt; values,\r\n                       OutputCollector&lt;Text, IntWritable&gt; output, \r\n                       Reporter reporter) throws IOException {\r\n      int sum = 0;\r\n      while (values.hasNext()) {\r\n        sum += values.next().get();\r\n      }\r\n      output.collect(key, new IntWritable(sum));\r\n    }\r\n  }\r\n\r\n  \/\/ The driver for the word count map\/reduce program (Tool.run()).\r\n  \/\/ getConf() is an instance method, so the job setup lives here rather than in static main().\r\n  public int run(String[] args) throws Exception {\r\n    JobConf conf = new JobConf(getConf(), WordCount.class);\r\n    conf.setJobName(&quot;wordcount&quot;);\r\n \r\n    \/\/ the keys are words (strings)\r\n    conf.setOutputKeyClass(Text.class);\r\n    \/\/ the values are counts (ints)\r\n    conf.setOutputValueClass(IntWritable.class);\r\n    \r\n    conf.setMapperClass(MapClass.class);        \r\n    conf.setCombinerClass(Reduce.class);\r\n    conf.setReducerClass(Reduce.class);\r\n    \r\n    FileInputFormat.setInputPaths(conf, new Path(args[0]));\r\n    FileOutputFormat.setOutputPath(conf, new Path(args[1]));\r\n        \r\n    JobClient.runJob(conf);\r\n    return 0;\r\n  }\r\n\r\n  \/\/ Entry point: ToolRunner parses the generic Hadoop options, then calls run().\r\n  public static void main(String[] args) throws Exception {\r\n    int res = ToolRunner.run(new Configuration(), new WordCount(), args);\r\n    System.exit(res);\r\n  
}\r\n}\r\n<\/pre>\n<p>I am not very familiar with Java, but the program is short, so it is not hard to follow. The <code>map()<\/code> method of the <code>MapClass<\/code> class splits each line of the input files into words with <code>StringTokenizer<\/code> and emits a &lt;word, 1&gt; pair for each word (the Map phase). The <code>reduce()<\/code> method of the <code>Reduce<\/code> class then adds up the values of all pairs that <code>MapClass<\/code> emitted for the same word (the Reduce phase). <code>Reduce<\/code> is also registered as the combiner, so partial sums are already computed on the map side before the data is sent to the reducers. The driver code sets up a <code>JobConf<\/code> and uses the command-line arguments as the input and output directories.<\/p>\n<p>The <a href=\"http:\/\/oss.infoscience.co.jp\/hadoop\/common\/docs\/current\/mapred_tutorial.html#Walk-through\" class=\"broken_link\">Walk-through section<\/a> of the Hadoop Map\/Reduce tutorial explains this program in detail (in Japanese) and is worth consulting.<\/p>\n<p>Compile <code>WordCount.java<\/code> and package the class files into a jar file.<\/p>\n<pre>\ncluster# cd ~\/wordcount\ncluster# mkdir classes\ncluster# javac -classpath $HADOOP_HOME\/hadoop-0.19.0-core.jar -d 
classes WordCount.java\ncluster# jar -cf wordcount.jar -C classes .\ncluster# tree\n.\n|-- WordCount.java\n|-- classes\n|   `-- org\n|       `-- apache\n|           `-- hadoop\n|               `-- examples\n|                   |-- WordCount$MapClass.class\n|                   |-- WordCount$Reduce.class\n|                   `-- WordCount.class\n`-- wordcount.jar\n5 directories, 5 files\n<\/pre>\n<h3>Setting Up the Distributed File System (HDFS)<\/h3>\n<p>Hadoop's input and output files must be placed on HDFS (Hadoop Distributed File System), a distributed file system dedicated to Hadoop that is separate from the local file system. HDFS is already mounted on the Hadoop cluster by default, so the file system itself needs no setup. To work with files and directories in HDFS, use the <code>hadoop dfs<\/code> command (see <a href=\"http:\/\/www.mwsoft.jp\/programming\/hadoop\/hadoop_hdfs_command.html\" class=\"broken_link\">Hadoop HDFS command notes<\/a>, in Japanese).<\/p>\n<p>First, create the input directory for WordCount in HDFS.<\/p>\n<pre>\ncluster# hadoop dfs -mkdir input\ncluster# hadoop dfs -ls\nFound 1 items\ndrwxr-xr-x   - root supergroup          0 2012-01-24 18:33 
\/user\/root\/input\n<\/pre>\n<p>Because we are logged in to the Hadoop master node as root, <code>\/user\/root<\/code> is our home directory in HDFS, and relative paths such as <code>input<\/code> resolve against it. As a test, create a small file on the local file system and copy it into the <code>input<\/code> directory in HDFS.<\/p>\n<pre>\ncluster# echo Hello > hello.txt\ncluster# hadoop dfs -put hello.txt input\/\ncluster# hadoop dfs -lsr\ndrwxr-xr-x   - root supergroup          0 2012-01-24 18:38 \/user\/root\/input\n-rw-r--r--   3 root supergroup          6 2012-01-24 18:38 \/user\/root\/input\/hello.txt\ncluster# hadoop dfs -cat input\/hello.txt\nHello\ncluster# hadoop dfs -rm input\/hello.txt\ncluster# rm -f hello.txt\n<\/pre>\n<p>Now let's prepare the input files for the WordCount application to process. A reasonably large input is preferable, so I wrote a script that downloads RFC documents one after another onto the local file system.<\/p>\n<pre>\ncluster# cd ~\/wordcount\ncluster# vi wgetrfc\n<\/pre>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\n#!\/bin\/bash\r\n# Usage: .\/wgetrfc N  (downloads the first N RFC documents that exist into .\/input)\r\n\r\nINDEX=1\r\nCOUNTER=0\r\nBASEURL=http:\/\/www.ietf.org\/rfc\r\nOUTDIR=$PWD\/input\r\n\r\nrm -rf $OUTDIR\r\nmkdir -p $OUTDIR\r\n\r\nwhile test $COUNTER -lt 
$1; do\r\n  FILENAME=rfc$INDEX.txt\r\n  if wget -q -O $OUTDIR\/$FILENAME $BASEURL\/$FILENAME; then\r\n    COUNTER=`expr $COUNTER + 1`\r\n    echo &quot;rfc$INDEX OK&quot;\r\n  else\r\n    # download failed (e.g. a missing RFC number): discard the file wget created\r\n    rm -f $OUTDIR\/$FILENAME\r\n  fi\r\n  INDEX=`expr $INDEX + 1`\r\ndone\r\n<\/pre>\n<p>For now, download 100 RFC documents.<\/p>\n<pre>\ncluster# chmod +x wgetrfc\ncluster# .\/wgetrfc 100\nrfc1 OK\nrfc2 OK\n...\nrfc103 OK\ncluster# cat input\/* | wc\n  32907  157746 1162188\n<\/pre>\n<p>Three RFC numbers appear to be missing (collecting 100 files took until rfc103), but we now have about 1 MB of input, so copy it to HDFS.<\/p>\n<pre>\ncluster# hadoop dfs -put input\/* input\ncluster# hadoop dfs -ls input\nFound 100 items\n-rw-r--r--   3 root supergroup      21088 2012-01-24 19:22 \/user\/root\/input\/rfc1.txt\n-rw-r--r--   3 root supergroup       4510 2012-01-24 19:22 \/user\/root\/input\/rfc10.txt\n...\n-rw-r--r--   3 root supergroup      24529 2012-01-24 19:22 \/user\/root\/input\/rfc98.txt\n-rw-r--r--   3 root supergroup       1010 2012-01-24 19:22 
\/user\/root\/input\/rfc99.txt\n<\/pre>\n<h3>Running WordCount<\/h3>\n<p>We are now ready to run WordCount. Let's have the Hadoop cluster process the data with the command below. The first argument to WordCount is the HDFS directory containing the input files (<code>\/user\/root\/input<\/code>), and the second is the HDFS output directory (<code>\/user\/root\/output<\/code>). Do not create the output directory yourself: if it already exists, the job fails with a <code>FileAlreadyExistsException<\/code> (&#8220;Output directory already exists&#8221;). Hadoop creates it for you, so no preparation is needed.<\/p>\n<pre>\ncluster# cd ~\/wordcount\ncluster# hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount input output\n12\/01\/24 19:36:54 INFO mapred.FileInputFormat: Total input paths to process : 100\n12\/01\/24 19:36:54 INFO mapred.JobClient: Running job: job_201201241700_0001\n12\/01\/24 19:36:55 INFO mapred.JobClient:  map 0% reduce 0%\n12\/01\/24 19:37:03 INFO mapred.JobClient:  map 1% reduce 0%\n...\n12\/01\/24 19:40:51 INFO mapred.JobClient:     Map output records=157747\n12\/01\/24 19:40:51 INFO mapred.JobClient:     Reduce input records=56339\n<\/pre>\n<p>The result of the Map\/Reduce job is written to a file named <code>output\/part-00000.deflate<\/code>.<\/p>\n<pre>\ncluster# hadoop dfs -ls 
output\nFound 2 items\ndrwxr-xr-x   - root supergroup          0 2012-01-24 19:36 \/user\/root\/output\/_logs\n-rw-r--r--   3 root supergroup      67154 2012-01-24 19:40 \/user\/root\/output\/part-00000.deflate\n<\/pre>\n<p>If you dump the output file to standard output with <code>hadoop dfs -cat<\/code>, you will see that its contents are binary. As the <code>.deflate<\/code> extension suggests, the file is compressed with the Deflate algorithm. Output compression appears to be enabled by default on this cluster, presumably to save network bandwidth when data is transferred between nodes.<\/p>\n<p>I tried decompressing the Deflate data with tools such as <code>unzip<\/code>, but it did not work (if anyone knows how, please let me know), so instead I changed the Hadoop configuration so that output is not compressed. Open <code>$HADOOP_HOME\/conf\/hadoop-site.xml<\/code> in an editor and change the value of <code>mapred.output.compress<\/code> from <code>true<\/code> to <code>false<\/code>.<\/p>\n<pre>\ncluster# vi 
$HADOOP_HOME\/conf\/hadoop-site.xml\n<\/pre>\n<pre class=\"brush: xml; first-line: 36; title: ; notranslate\" title=\"\">\r\n&lt;property&gt;\r\n  &lt;name&gt;mapred.output.compress&lt;\/name&gt;\r\n  &lt;value&gt;false&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n<\/pre>\n<p>Delete the output directory and run WordCount again.<\/p>\n<pre>\ncluster# cd ~\/wordcount\ncluster# hadoop dfs -rmr output\ncluster# hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount input output\ncluster# hadoop dfs -ls output\nFound 2 items\ndrwxr-xr-x   - root supergroup          0 2012-01-24 20:01 \/user\/root\/output\/_logs\n-rw-r--r--   3 root supergroup     186137 2012-01-24 20:04 \/user\/root\/output\/part-00000\n<\/pre>\n<p>This time the file has no <code>.deflate<\/code> extension, which shows that it is not compressed. Let's print the result to standard output.<\/p>\n<pre>\ncluster# hadoop dfs -cat output\/part-00000\n!   
13\n!=, 1\n\"   22\n\"#33,   1\n\"--\"    1\n...\n||=======>| 1\n||=>-   1\n}   6\n~   4\n\ufffdsec    1\n<\/pre>\n<p>The excerpt consists mostly of symbols because the keys are sorted by byte value, which puts punctuation such as <code>!<\/code> and <code>\"<\/code> at the start and <code>}<\/code>, <code>~<\/code>, and non-ASCII characters at the end; still, the output is now readable as text. To copy a file from HDFS to the local file system, run<\/p>\n<pre>\ncluster# hadoop dfs -get output\/part-00000 .\n<\/pre>\n<p>and then look through the file in an editor.<\/p>\n<p>WordCount is a dull program that only counts how often each word appears, but it is a good enough sample for learning the Map\/Reduce architecture and how to work with HDFS.<\/p>\n<p>Other samples live under <code>$HADOOP_HOME\/src\/examples\/org\/apache\/hadoop\/examples\/<\/code>, and they are good references for how to structure Map\/Reduce jobs. If you only want to read the sample sources, you do not even need to log in to the cluster's master node: just <a 
href=\"http:\/\/www.gtlib.gatech.edu\/pub\/apache\/hadoop\/core\/\">download the Hadoop source<\/a> and study it locally.<\/p>\n<p>Below is a comment from Mr. Yamanaka, who hosted the Geek Salon.<\/p>\n<blockquote><p>\nI have long been developing PC clusters for image processing and various other scientific and engineering computations. Building a distributed PC cluster from scratch requires a great many features: for example, distributing executables and input data to each workstation, retrieving intermediate results and delivering them to other clusters, and remotely managing process execution and handling any errors that occur. Hadoop provides all of these, and it also lets you flexibly build a computation framework from Java class files.\n<\/p><\/blockquote>\n<h3>Amazon 
S3 Storage as HDFS<\/h3>\n<p>So far we have used the default HDFS mounted locally on the master node. Next, let's mount HDFS on <a href=\"https:\/\/aws.amazon.com\/s3\/\">Amazon S3<\/a> instead; strictly speaking, Hadoop's <code>s3:\/\/<\/code> file system takes the place of HDFS here. Using S3 as the storage lets multiple Hadoop clusters share input and output locations. For example, one Hadoop cluster can save its output on S3, and another Hadoop cluster can then use that data as its input. Hadoop fits very well with cloud environments.<\/p>\n<p>Locally mounted HDFS has a drawback: its input and output data can be retrieved only while the Hadoop cluster is running on EC2. Hadoop clusters started with the <code>hadoop-ec2 
launch-cluster<\/code> command (Hadoop's EC2 scripts, which drive the EC2 API Tools) use instance store (not EBS) as their root device, so terminating the cluster with <code>hadoop-ec2 terminate-cluster<\/code> also destroys the data stored in HDFS. Moreover, a Hadoop cluster on EC2 is billed for as long as its instances are running, even when it is not computing anything, so carelessness can quickly lead to a ruinous cloud bill. If Hadoop's output is saved on S3, you can terminate the instances as soon as the cluster is no longer needed, which is another major advantage of using S3 as storage.<\/p>\n<p>First, set up Amazon S3.<\/p>\n<p>Log in to the AWS Management Console, open the [Amazon S3] tab, and click the [Create 
Bucket] button at the top left.<br \/>\n<img loading=\"lazy\" width=\"464\" height=\"211\" src=\"http:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/CreateBucket.jpg\" alt=\"\" title=\"CreateBucket\" class=\"aligncenter size-full wp-image-359\" srcset=\"https:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/CreateBucket.jpg 464w, https:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/CreateBucket-300x136.jpg 300w\" sizes=\"(max-width: 464px) 100vw, 464px\" \/><\/p>\n<p>When the dialog appears, enter the S3 bucket name in [Bucket Name].<br \/>\n<img loading=\"lazy\" width=\"527\" height=\"339\" src=\"http:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/CreateBucketDialog.jpg\" alt=\"\" title=\"CreateBucketDialog\" class=\"aligncenter size-full wp-image-360\" srcset=\"https:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/CreateBucketDialog.jpg 527w, https:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/CreateBucketDialog-300x192.jpg 300w\" sizes=\"(max-width: 527px) 100vw, 527px\" \/><\/p>\n<p>Note that the bucket name must be unique across all Amazon 
S3 users. The name used here, <code>h2plus-hadoop-hdfs<\/code>, is already taken, so choose a name of your own. Clicking [Create] in the dialog creates the S3 bucket.<\/p>\n<p><a href=\"http:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/S3Console.jpg\"><img loading=\"lazy\" width=\"968\" height=\"568\" src=\"http:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/S3Console.jpg\" alt=\"\" title=\"S3Console\" class=\"aligncenter size-full wp-image-361\" srcset=\"https:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/S3Console.jpg 968w, https:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/S3Console-300x176.jpg 300w\" sizes=\"(max-width: 968px) 100vw, 968px\" \/><\/a><\/p>\n<p>Next, edit <code>$HADOOP_HOME\/conf\/hadoop-site.xml<\/code> on the Hadoop master node.<\/p>\n<pre class=\"brush: xml; first-line: 11; title: ; notranslate\" title=\"\">\r\n&lt;property&gt;\r\n  &lt;name&gt;fs.default.name&lt;\/name&gt;\r\n  &lt;value&gt;s3:\/\/h2plus-hadoop-hdfs&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n\r\n&lt;property&gt;\r\n  &lt;name&gt;fs.s3.awsAccessKeyId&lt;\/name&gt;\r\n  &lt;value&gt;Your Access key ID&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n\r\n&lt;property&gt;\r\n  
&lt;name&gt;fs.s3.awsSecretAccessKey&lt;\/name&gt;\r\n  &lt;value&gt;Your Secret Access Key&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n<\/pre>\n<p>Change <code>fs.default.name<\/code> to the URL of the S3 bucket you just created, and add the <code>fs.s3.awsAccessKeyId<\/code> and <code>fs.s3.awsSecretAccessKey<\/code> properties after it. The values of <code>fs.s3.awsAccessKeyId<\/code> and <code>fs.s3.awsSecretAccessKey<\/code> are the same ones you set in <code>hadoop-ec2-env.sh<\/code> in <a href=\"http:\/\/h2plus.biz\/hiromitsu\/entry\/267\">Part 1<\/a>: the keys shown under Access Credentials on the AWS Security Credentials page.<br \/>\n<a href=\"http:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/AccessKey.jpg\"><img alt=\"\" src=\"http:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/AccessKey.jpg\" title=\"AccessKey\" class=\"aligncenter\" width=\"610\" \/><\/a><\/p>\n<p>HDFS is now mounted on S3.<\/p>\n<p>First, transfer the input data for WordCount to HDFS. The AWS Management Console can upload files to an S3 bucket, but the <code>s3:\/\/<\/code> file system stores data in its own block format, so files uploaded through the console are not visible to HDFS; you need to use the <code>hadoop dfs<\/code> command instead.<\/p>\n<pre>\ncluster# cd ~\/wordcount\ncluster# hadoop dfs -mkdir input\ncluster# hadoop dfs -put input\/* 
input\n<\/pre>\n<p>If you open the S3 bucket in the AWS Management Console, you will see that a large number of files have been created. These are the file blocks and metadata objects that Hadoop has written.<\/p>\n<p><a href=\"http:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/HDFSBlocks.jpg\"><img loading=\"lazy\" width=\"1096\" height=\"568\" src=\"http:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/HDFSBlocks.jpg\" alt=\"\" title=\"HDFSBlocks\" class=\"aligncenter size-full wp-image-365\" srcset=\"https:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/HDFSBlocks.jpg 1096w, https:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/HDFSBlocks-300x155.jpg 300w, https:\/\/h2plus.biz\/hiromitsu\/wp-content\/uploads\/2012\/01\/HDFSBlocks-1024x530.jpg 1024w\" sizes=\"(max-width: 1096px) 100vw, 1096px\" \/><\/a><\/p>\n<p>Now let's have the Hadoop cluster run WordCount, just as before.<\/p>\n<pre>\ncluster# hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount input output\n12\/01\/24 20:34:08 INFO mapred.FileInputFormat: Total input paths to process : 100\n12\/01\/24 20:34:10 INFO mapred.JobClient: Running job: job_201201241846_0001\n12\/01\/24 20:34:11 INFO mapred.JobClient:  map 0% reduce 0%\n...\n12\/01\/24 20:37:27 INFO mapred.JobClient:     Combine input records=157747\n12\/01\/24 20:37:27 INFO mapred.JobClient:     Map output records=157747\n12\/01\/24 20:37:27 INFO mapred.JobClient:     Reduce input records=56339\ncluster# hadoop dfs -ls output\nFound 2 items\ndrwxrwxrwx   -          0 1969-12-31 19:00 \/user\/root\/output\/_logs\n-rwxrwxrwx   1     186137 1969-12-31 19:00 
\/user\/root\/output\/part-00000\n<\/pre>\n<p>Changing a single configuration file (<code>hadoop-site.xml<\/code>) was all it took to build the distributed file system on top of S3 cloud storage.<\/p>\n<p>Because the data is stored in S3, you can retrieve the cluster's output later. This works even after you log out of the master node and terminate all of the cluster's EC2 instances, as long as Hadoop is installed in your local environment and its <code>hadoop-site.xml<\/code> has the same S3 settings.<\/p>\n<pre>\ncluster# exit\nubuntu$ hadoop-ec2 terminate-cluster hadoop-cluster\nubuntu$ exit\nlocal$ hadoop dfs -ls output\nFound 2 items\ndrwxrwxrwx   -          0 1969-12-31 19:00 \/user\/root\/output\/_logs\n-rwxrwxrwx   1     186137 1969-12-31 19:00 \/user\/root\/output\/part-00000\nlocal$ hadoop dfs -get 
output\/part-00000\n<\/pre>\n<h3>Impressions<\/h3>\n<p>That is as far as we got with hands-on Hadoop at the Geek Salon. People started arriving around 6 p.m. Until nearly 10 p.m., every participant sat quietly at their laptop working with Hadoop, muttering things like &#8220;Ahh, I get it!&#8221; and &#8220;It won't run, orz.&#8221;<\/p>\n<p>Mr. Yamanaka wrote the preparatory material for this event with participants who had &#8220;heard of Hadoop but never touched it&#8221; in mind. Thanks to that, we could set up a Hadoop environment step by step, wizard-style, and experience Hadoop's distributed parallel processing first-hand at the Geek Salon.<\/p>\n<p>In fact, the Geek Salon had one more step planned: a more practical hackathon analyzing about 1 GB of Facebook user data. Mr. Yamanaka's material, which is also the source for this blog entry, covers that part too. Since this entry is meant as an introduction to Hadoop, I will stop here for now (if I have time, I may write a separate entry about it).<\/p>\n<p>I would like to take this opportunity to thank Mr. Yamanaka (\u5c71\u4e2d\u4ec1), who kindly agreed to let me republish his material on this blog.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>On Friday, January 13, I attended the JTPA Geek Salon held in Palo Alto. This time it was a hackathon in which participants brought their own laptops and wrote code, so before arriving at the venue we had to set up a working Hadoop environment of our own <a href='https:\/\/h2plus.biz\/hiromitsu\/entry\/347' 
class='excerpt-more'>[&#8230;]<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[159,84,26],"tags":[143,141,144,138,163,162,161,139,142,160,140],"_links":{"self":[{"href":"https:\/\/h2plus.biz\/hiromitsu\/wp-json\/wp\/v2\/posts\/347"}],"collection":[{"href":"https:\/\/h2plus.biz\/hiromitsu\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/h2plus.biz\/hiromitsu\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/h2plus.biz\/hiromitsu\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/h2plus.biz\/hiromitsu\/wp-json\/wp\/v2\/comments?post=347"}],"version-history":[{"count":4,"href":"https:\/\/h2plus.biz\/hiromitsu\/wp-json\/wp\/v2\/posts\/347\/revisions"}],"predecessor-version":[{"id":956,"href":"https:\/\/h2plus.biz\/hiromitsu\/wp-json\/wp\/v2\/posts\/347\/revisions\/956"}],"wp:attachment":[{"href":"https:\/\/h2plus.biz\/hiromitsu\/wp-json\/wp\/v2\/media?parent=347"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/h2plus.biz\/hiromitsu\/wp-json\/wp\/v2\/categories?post=347"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/h2plus.biz\/hiromitsu\/wp-json\/wp\/v2\/tags?post=347"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}