This article is a detailed walkthrough of installing Hadoop 2.7.7 in a virtual-machine environment.
Preparation
Downloading the packages
To save time, it is recommended to download the packages in advance.
- Hadoop version: hadoop-2.7.7.tar.gz
- JDK version: jdk-8u191-linux-x64.tar.gz
Cluster planning
The cluster in this article runs on DiDi's R&D cloud; readers can set up their own virtual machines instead.
Node overview
role | ip | host | system |
---|---|---|---|
master | 10.96.81.166 | jms-master-01 | centos7.2 [ CPU: 4, RAM: 12G, disk: 100G ] |
node | 10.96.113.243 | jms-master-02 | centos7.2 [ CPU: 4, RAM: 12G, disk: 100G ] |
node | 10.96.85.231 | jms-master-03 | centos7.2 [ CPU: 4, RAM: 12G, disk: 100G ] |
User conventions
Use hadoop as the unified login group and user.
```
[root@jms-master-01 ~]# groupadd hadoop
```
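The section above calls for a unified hadoop group and user but shows only the groupadd command; creating the matching user could look like this (an assumed follow-up step, run as root):

```
# Create the hadoop user with hadoop as its primary group (assumed step)
[root@jms-master-01 ~]# useradd -m -g hadoop hadoop
# Set the login password interactively
[root@jms-master-01 ~]# passwd hadoop
```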
Directory conventions
Software installation directory: /home/hadoop/tools
Package directory: /home/hadoop/tools/package
```
[hadoop@jms-master-01 ~]$ mkdir -p /home/hadoop/tools/package
```
System configuration
hosts configuration
Note that the 127.0.0.1 host mapping must be commented out. Every node needs this configuration.
```
[root@jms-master-01 ~]# vim /etc/hosts
```
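Based on the node table above, each node's /etc/hosts would look roughly like this (the exact form of the commented-out loopback line is illustrative):

```
# 127.0.0.1   <hostname mapping commented out, as noted above>
10.96.81.166 jms-master-01
10.96.113.243 jms-master-02
10.96.85.231 jms-master-03
```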
Passwordless SSH
Note that the local machine itself also needs passwordless SSH. Also, passwordless SSH is per-user: even if it is configured for root, the hadoop user must configure it again. Every node needs this configuration.
- Check whether a key already exists
```
[hadoop@jms-master-01 ~]$ cat ~/.ssh/id_rsa.pub
```
- (If none exists) generate a key pair (just press Enter at each prompt)
```
[hadoop@jms-master-01 ~]$ ssh-keygen -t rsa
```
- Copy the key to each node
```
[hadoop@jms-master-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.81.166
[hadoop@jms-master-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.113.243
[hadoop@jms-master-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.85.231
[hadoop@jms-master-02 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.81.166
[hadoop@jms-master-02 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.113.243
[hadoop@jms-master-02 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.85.231
[hadoop@jms-master-03 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.81.166
[hadoop@jms-master-03 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.113.243
[hadoop@jms-master-03 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.85.231
```
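A quick way to confirm the passwordless setup is to run a remote command from each machine; it should execute without a password prompt (a verification sketch, not part of the original steps):

```
# Each command should print the remote hostname with no password prompt
[hadoop@jms-master-01 ~]$ ssh jms-master-02 hostname
[hadoop@jms-master-01 ~]$ ssh jms-master-03 hostname
```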
JDK installation
JDK installation directory:
```
[hadoop@jms-master-01 ~]$ mkdir -p /home/hadoop/tools/java
```
Upload the package jdk-8u191-linux-x64.tar.gz to /home/hadoop/tools/package
```
scp jdk-8u191-linux-x64.tar.gz hadoop@10.96.81.166:~/tools/package/
```
Extract it
```
tar -xzvf jdk-8u191-linux-x64.tar.gz -C /home/hadoop/tools/java/
```
Configure the environment variables
```
[root@jms-master-01 ~]# vim /etc/profile
```
Add JAVA_HOME:
```
export JAVA_HOME=/home/hadoop/tools/java/jdk1.8.0_191
```
A more elegant alternative is a dedicated file under /etc/profile.d/:
```
sudo vi /etc/profile.d/jdk-1.8.sh
```
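A possible content for /etc/profile.d/jdk-1.8.sh, reusing the JAVA_HOME path above; prepending $JAVA_HOME/bin to PATH is the usual convention and is assumed here, since the original file contents are not shown:

```shell
# /etc/profile.d/jdk-1.8.sh -- picked up automatically by login shells
export JAVA_HOME=/home/hadoop/tools/java/jdk1.8.0_191
# Putting the JDK's bin directory first on PATH (assumed, conventional)
export PATH=$JAVA_HOME/bin:$PATH
```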
Refresh the configuration (source it as both root and the hadoop user so it takes effect immediately), then verify:
```
[root@jms-master-01 ~]# source /etc/profile
```
We only install the JDK on the master node; the other nodes just receive a copy via scp.
Copy the JDK (repeat for the third node, 10.96.85.231)
```
[hadoop@jms-master-01 ~]$ scp -r /home/hadoop/tools/java/jdk1.8.0_191 hadoop@10.96.113.243:/home/hadoop/tools/java/jdk1.8.0_191
```
Copy the configuration file
```
[hadoop@jms-master-01 ~]$ scp /etc/profile root@10.96.113.243:/etc/profile
```
Finally, don't forget to verify the JDK on the other two nodes.
```
[hadoop@jms-master-02 ~]$ java -version
```
At this point the JDK is installed on all nodes.
Disabling the firewall (can be skipped in the R&D cloud environment)
Check firewall status
```
firewall-cmd --state
```
Stop the firewall
```
systemctl stop firewalld.service
```
Start the firewall
```
systemctl start firewalld.service
```
Disable the firewall at boot
```
systemctl disable firewalld.service
```
Hadoop installation
Installation directory plan
item | planned directory | notes |
---|---|---|
Hadoop installation directory | /home/hadoop/tools/hadoop-2.7.7 | A unified symlink is recommended for a standard layout |
Hadoop data root directory | /home/hadoop/tools/hadoop_data | Recommended: /data |
hdfs-site.xml dfs.namenode.name.dir | /home/hadoop/tools/hadoop_data/hadoop/dfs/name | Recommended: /data/hadoop/dfs/name |
hdfs-site.xml dfs.datanode.data.dir | /home/hadoop/tools/hadoop_data/hadoop/dfs/data | Recommended: /data/hadoop/dfs/data |
core-site.xml hadoop.tmp.dir | /home/hadoop/tools/hadoop_temp | Recommended: /temp |
Ideally the data directories should live under the root filesystem, which avoids coupling them to the user's home directory. This setup is only temporary, so there is no strict requirement here.
Upload and extract
Upload the package and extract it to the planned directory on master.
```
scp hadoop-2.7.7.tar.gz hadoop@10.96.81.166:~/tools/package/
```
Environment variables
Configure HADOOP_HOME in /etc/profile:
```
export HADOOP_HOME=/home/hadoop/tools/hadoop-2.7.7
```
If startup errors appear after installation, the following variable enables Hadoop's debug mode so you can see detailed logs.
```
export HADOOP_ROOT_LOGGER=DEBUG,console
```
As before, a more elegant option is a file under /etc/profile.d/.
```
sudo vi /etc/profile.d/hadoop-2.7.7.sh
```
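A possible content for /etc/profile.d/hadoop-2.7.7.sh; the HADOOP_HOME path comes from the plan above, and adding bin and sbin to PATH (assumed, conventional) makes the hadoop/hdfs/yarn commands and start scripts available everywhere:

```shell
# /etc/profile.d/hadoop-2.7.7.sh -- picked up automatically by login shells
export HADOOP_HOME=/home/hadoop/tools/hadoop-2.7.7
# bin holds hadoop/hdfs/yarn, sbin holds the start/stop scripts (assumed layout per the tarball)
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```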
Editing the Hadoop configuration files
The following configuration files need to be modified: hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
hadoop-env.sh, yarn-env.sh
Just add the JDK environment variable.
```
# The java implementation to use.
export JAVA_HOME=/home/hadoop/tools/java/jdk1.8.0_191
```
slaves
Configure the slave nodes here. (Note that from Hadoop 3.0 onward, the slaves file has been renamed to workers.)
```
[hadoop@jms-master-01 package]$ cat /home/hadoop/tools/hadoop-2.7.7/etc/hadoop/slaves
```
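Following the node table above, the two node machines would go into slaves (a sketch; whether the master also runs a DataNode is a design choice):

```
jms-master-02
jms-master-03
```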
core-site.xml
```
<configuration>
```
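A minimal core-site.xml consistent with the directory plan above might look like the following; the NameNode address is an assumption (9000 is a commonly used port), while hadoop.tmp.dir comes from the planning table:

```xml
<configuration>
  <!-- Default file system; host is the master node, port 9000 is assumed -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://jms-master-01:9000</value>
  </property>
  <!-- Temp directory from the directory plan above -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tools/hadoop_temp</value>
  </property>
</configuration>
```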
hdfs-site.xml
```
<configuration>
```
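For hdfs-site.xml, a minimal sketch using the name/data directories from the planning table (the replication factor of 3 matches the file listings in the wordcount run, which show factor 3):

```xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/tools/hadoop_data/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/tools/hadoop_data/hadoop/dfs/data</value>
  </property>
  <!-- One replica per node in this three-node cluster -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```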
mapred-site.xml
mapred-site.xml must be created by copying mapred-site.xml.template.
```
<configuration>
```
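For mapred-site.xml, the one setting the later wordcount run depends on is the framework name (a minimal sketch):

```xml
<configuration>
  <!-- Run MapReduce jobs on YARN rather than locally -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```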
yarn-site.xml
```
<configuration>
```
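A minimal yarn-site.xml sketch; the ResourceManager host matches the job log (jms-master-01:8032), and mapreduce_shuffle is the auxiliary service that MapReduce's shuffle phase requires:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>jms-master-01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```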
That completes the initial Hadoop installation on the master node. Next, copy the extracted Hadoop directory and the configuration files from master to the other two nodes (repeat for 10.96.85.231).
```
scp -r /home/hadoop/tools/hadoop-2.7.7 hadoop@10.96.113.243:/home/hadoop/tools/
```
Don't forget to source /etc/profile so the configuration takes effect, then check the installation:
```
[hadoop@jms-master-01 ~]$ hadoop version
```
Starting Hadoop
Before the first start, format HDFS on the master node.
```
/home/hadoop/tools/hadoop-2.7.7/bin/hdfs namenode -format testCluster
```
Start
```
/home/hadoop/tools/hadoop-2.7.7/sbin/start-dfs.sh
```
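start-dfs.sh brings up HDFS only; since the wordcount job later runs on YARN, the YARN daemons also need to be started with the standard script shipped in sbin:

```
# Start ResourceManager on this node and NodeManagers on the slaves
/home/hadoop/tools/hadoop-2.7.7/sbin/start-yarn.sh
```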
Check the daemons with jps
master node
```
[hadoop@jms-master-01 ~]$ jps
```
node
```
[hadoop@jms-master-02 ~]$ jps
```
Web UIs
HDFS web UI: http://10.96.81.166:50070/dfshealth.html#tab-overview
MR (YARN) web UI: http://10.96.81.166:8088/cluster/apps/RUNNING
Open the links to inspect the cluster.
Running a wordcount
Create two test files
```
vim test1
aaa bbb ccc ddd
eee fff ggg hhh
vim test2
aaa bbb ccc ddd 111
eee fff ggg hhh 111
```
Create an HDFS input directory and upload the test files into it
```
hadoop fs -mkdir -p /user/hadoop/input
hadoop fs -put test* /user/hadoop/input/
```
First, take the cluster out of safe mode
```
hdfs dfsadmin -safemode leave
```
Submit the job to YARN
```
yarn jar /home/hadoop/tools/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /user/hadoop/input /user/hadoop/output
```
Running…
```
[hadoop@jms-master-02 ~]$ yarn jar /home/hadoop/tools/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /user/hadoop/input /user/hadoop/output
19/03/15 20:08:39 INFO client.RMProxy: Connecting to ResourceManager at jms-master-01/10.96.81.166:8032
19/03/15 20:08:40 INFO input.FileInputFormat: Total input paths to process : 2
19/03/15 20:08:40 INFO mapreduce.JobSubmitter: number of splits:2
19/03/15 20:08:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552651623473_0002
19/03/15 20:08:41 INFO impl.YarnClientImpl: Submitted application application_1552651623473_0002
19/03/15 20:08:41 INFO mapreduce.Job: The url to track the job: http://jms-master-01:8088/proxy/application_1552651623473_0002/
19/03/15 20:08:41 INFO mapreduce.Job: Running job: job_1552651623473_0002
19/03/15 20:08:48 INFO mapreduce.Job: Job job_1552651623473_0002 running in uber mode : false
19/03/15 20:08:48 INFO mapreduce.Job:  map 0% reduce 0%
19/03/15 20:08:57 INFO mapreduce.Job:  map 100% reduce 0%
19/03/15 20:09:06 INFO mapreduce.Job:  map 100% reduce 100%
19/03/15 20:09:07 INFO mapreduce.Job: Job job_1552651623473_0002 completed successfully
19/03/15 20:09:07 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=75
        FILE: Number of bytes written=368945
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=274
        HDFS: Number of bytes written=31
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=12594
        Total time spent by all reduces in occupied slots (ms)=6686
        Total time spent by all map tasks (ms)=12594
        Total time spent by all reduce tasks (ms)=6686
        Total vcore-milliseconds taken by all map tasks=12594
        Total vcore-milliseconds taken by all reduce tasks=6686
        Total megabyte-milliseconds taken by all map tasks=12896256
        Total megabyte-milliseconds taken by all reduce tasks=6846464
    Map-Reduce Framework
        Map input records=4
        Map output records=8
        Map output bytes=78
        Map output materialized bytes=81
        Input split bytes=228
        Combine input records=8
        Combine output records=6
        Reduce input groups=4
        Reduce shuffle bytes=81
        Reduce input records=6
        Reduce output records=4
        Spilled Records=12
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=306
        CPU time spent (ms)=2490
        Physical memory (bytes) snapshot=698810368
        Virtual memory (bytes) snapshot=6430330880
        Total committed heap usage (bytes)=556793856
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=46
    File Output Format Counters
        Bytes Written=31
```
Check the results
```
[hadoop@jms-master-01 xiepengjie]$ hadoop fs -ls /user/hadoop/output
19/03/15 20:16:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2019-03-15 20:15 /user/hadoop/output/_SUCCESS
-rw-r--r--   3 hadoop supergroup         54 2019-03-15 20:15 /user/hadoop/output/part-r-00000
[hadoop@jms-master-01 xiepengjie]$ hadoop fs -cat /user/hadoop/output/part-r-00000
19/03/15 20:16:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
111	2
aaa	2
bbb	2
ccc	2
ddd	2
eee	2
fff	2
ggg	2
hhh	2
```
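As a sanity check, the counts shown in part-r-00000 can be reproduced locally from the two test files with a plain shell pipeline (not part of the original walkthrough):

```shell
# Recreate the two test files from the wordcount section
printf 'aaa bbb ccc ddd\neee fff ggg hhh\n' > test1
printf 'aaa bbb ccc ddd 111\neee fff ggg hhh 111\n' > test2
# One word per line, then count duplicates -- the same map/shuffle/reduce idea
cat test1 test2 | tr -s ' ' '\n' | sort | uniq -c | awk '{print $2"\t"$1}'
```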
With that, the Hadoop cluster setup is complete.
#note/env
Permalink: https://stefanxiepj.github.io/archives/72e2ab6b.html
Copyright notice: this work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please credit the source when reposting!
![Creative Commons License](https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png)