Hadoop Cluster Setup
Hadoop Overview
Hadoop is an open-source framework for distributed storage and computation, designed to process large-scale datasets with high reliability and high performance. Its main components are:

- Hadoop Distributed File System (HDFS): HDFS is Hadoop's distributed file storage system. It stores large-scale data and provides high reliability and fault tolerance through data replication and automatic failure recovery.
- YARN (Yet Another Resource Negotiator): YARN is Hadoop's resource management platform. It schedules and manages the cluster's compute resources and supports parallel computation by multiple processing frameworks, such as MapReduce and Spark.
- MapReduce: MapReduce is the distributed programming model Hadoop originally introduced for processing large datasets in parallel. It splits a computation into a Map phase and a Reduce phase, making full use of the cluster's compute resources.
- Hadoop ecosystem: beyond these core components, Hadoop is surrounded by related projects and tools such as HBase (a distributed column-oriented database), Hive (data warehouse infrastructure), and Spark (a fast, general-purpose cluster computing system), which together form a complete big-data processing ecosystem.

In short, Hadoop provides a powerful set of tools and frameworks for storing, processing, and analyzing large-scale data efficiently in a distributed environment, and it is one of the foundational pieces of big-data infrastructure.
Base Environment
Planning
| | hadoop001 | hadoop002 | hadoop003 |
|---|---|---|---|
| HDFS | NameNode, DataNode | SecondaryNameNode, DataNode | DataNode |
| YARN | ResourceManager, NodeManager | NodeManager | NodeManager |
Disable the Firewall and SELinux
systemctl disable --now firewalld
# stop the firewall and prevent it from starting on boot
setenforce 0
# set SELinux to permissive for the current session
sed -i 's/SELINUX=enforcing/SELINUX=permissive/g' /etc/selinux/config
# make the SELinux change persistent across reboots
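A quick sanity check that both changes took effect:
systemctl is-enabled firewalld   # expect: disabled
getenforce                       # expect: Permissive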
Add Host Mappings
cat >> /etc/hosts << lxf
192.168.200.41 hadoop001
192.168.200.42 hadoop002
192.168.200.43 hadoop003
lxf
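To confirm the mappings resolve, a one-shot ping from each node is enough:
for h in hadoop001 hadoop002 hadoop003; do ping -c 1 $h; done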
Upload the Software Packages
# it is best to create dedicated directories for the packages
[root@hadoop001 ~]# mkdir -p /export/{data,servers,software}
[root@hadoop001 ~]# tree /export/
/export/
├── data      # data files
├── servers   # installed services
└── software  # software packages for those services

3 directories, 0 files
[root@hadoop001 ~]#
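The two archives can then be uploaded into /export/software, for example with scp from your workstation (run from wherever you downloaded the archives; the file names match the listing shown later):
scp jdk-8u221-linux-x64.tar.gz root@hadoop001:/export/software/
scp hadoop-3.3.6-aarch64.tar.gz root@hadoop001:/export/software/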
Configure Passwordless SSH
Passwordless login must be configured on the master and both slave nodes.
# on the hadoop001 node
[root@hadoop001 ~]#
[root@hadoop001 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:2AY6p3kFWziXFmXH7e/PHgrAVZ9H7hkCWAtCc19UwZw root@hadoop001
The key's randomart image is:
+---[RSA 2048]----+
| .+.+=+o=+o+|
| .+=o.* oE.|
| = = + o +o|
| . @. . o.+|
| o + So o.|
| = o . .|
| o . . o |
| . . ..o|
| . .=|
+----[SHA256]-----+
[root@hadoop001 ~]#
[root@hadoop001 ~]# hosts=("hadoop001" "hadoop002" "hadoop003"); for host in "${hosts[@]}"; do echo "Distributing public key to $host"; ssh-copy-id $host; done
Distributing public key to hadoop001
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@hadoop001's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop001'"
and check to make sure that only the key(s) you wanted were added.

Distributing public key to hadoop002
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'hadoop002 (192.168.200.42)' can't be established.
ECDSA key fingerprint is SHA256:ADFjDGD2MxgCqL5fQuWhn+0T5drPiTXERvlMiu/QXjA.
ECDSA key fingerprint is MD5:d2:2b:06:cb:13:48:e0:87:d7:f3:87:8b:2c:56:e4:da.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@hadoop002's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop002'"
and check to make sure that only the key(s) you wanted were added.

Distributing public key to hadoop003
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@hadoop003's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop003'"
and check to make sure that only the key(s) you wanted were added.
[root@hadoop001 ~]#
[root@hadoop001 ~]#
# repeat the same steps on hadoop002 and hadoop003 so all three nodes can log in to each other without a password
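A quick way to confirm the trust works (run it on each node in turn):
for host in hadoop001 hadoop002 hadoop003; do ssh $host hostname; done
# should print all three hostnames without any password prompt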
Install the Services
Install Java and Hadoop
[root@hadoop001]# tree -L 2 /export/
/export/
├── data      # empty for now, since no services have started yet
├── servers
│   ├── hadoop-3.3.6
│   └── jdk1.8.0_221
└── software
    ├── hadoop-3.3.6-aarch64.tar.gz
    └── jdk-8u221-linux-x64.tar.gz

5 directories, 2 files
[root@hadoop001 export]#
# extract the archives into the same directories on the other two nodes as well
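For reference, a minimal extraction sketch matching the layout above:
cd /export/software
tar -zxvf jdk-8u221-linux-x64.tar.gz -C /export/servers/
tar -zxvf hadoop-3.3.6-aarch64.tar.gz -C /export/servers/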
Configure Environment Variables
# Rather than editing /etc/profile directly, we create a dedicated file
# /etc/profile.d/my_env.sh for the Hadoop variables; everything under
# /etc/profile.d/ is loaded automatically at system startup / user login.
# (the heredoc delimiter is quoted so $PATH and $JAVA_HOME are written
# literally instead of being expanded by the current shell)
[root@hadoop001 ~]# cat >> /etc/profile.d/my_env.sh << 'lxf'
> #JAVA_HOME
> export JAVA_HOME=/export/servers/jdk1.8.0_221
> export PATH=$PATH:$JAVA_HOME/bin
> #HADOOP_HOME
> export HADOOP_HOME=/export/servers/hadoop-3.3.6
> export PATH=$PATH:$HADOOP_HOME/bin
> export PATH=$PATH:$HADOOP_HOME/sbin
> lxf
[root@hadoop001 ~]# source /etc/profile
[root@hadoop001 ~]# echo $HADOOP_HOME
/export/servers/hadoop-3.3.6
[root@hadoop001 ~]# echo $JAVA_HOME
/export/servers/jdk1.8.0_221
[root@hadoop001 ~]# # test the environment variables
[root@hadoop001 ~]# hadoop version
Hadoop 3.3.6
Source code repository https://github.com/apache/hadoop.git -r 1be78238728da9266a4f88195058f08fd012bf9c
Compiled by ubuntu on 2023-06-18T23:15Z
Compiled on platform linux-aarch_64
Compiled with protoc 3.7.1
From source with checksum 5652179ad55f76cb287d9c633bb53bbd
This command was run using /export/servers/hadoop-3.3.6/share/hadoop/common/hadoop-common-3.3.6.jar
[root@hadoop001 ~]# java -version
java version "1.8.0_221"
Java(TM) SE Runtime Environment (build 1.8.0_221-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.221-b11, mixed mode)
[root@hadoop001 ~]#
# distribute the environment file to the other nodes
scp /etc/profile.d/my_env.sh hadoop002:/etc/profile.d/my_env.sh
scp /etc/profile.d/my_env.sh hadoop003:/etc/profile.d/my_env.sh
# after distributing, verify the environment variables are picked up on each node
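A quick remote check (sourcing /etc/profile explicitly, since a non-interactive ssh shell does not load it):
for host in hadoop002 hadoop003; do ssh $host 'source /etc/profile; hadoop version | head -1'; done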
Edit the Hadoop Configuration Files
[root@hadoop001 hadoop]# pwd
/export/servers/hadoop-3.3.6/etc/hadoop
[root@hadoop001 hadoop]# grep '^export' hadoop-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_221
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)}
[root@hadoop001 hadoop]#
Edit core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop001:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/export/data/hadoop/tmp</value>
  </property>
</configuration>
Edit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/export/data/hadoop/name</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop002:50090</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/export/data/hadoop/data</value>
  </property>
</configuration>
Edit mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop001</value>
  </property>
</configuration>
Edit the workers file
# overwrite rather than append, so the default localhost entry does not remain
[root@hadoop001 hadoop]# cat > workers << lxf
> hadoop001
> hadoop002
> hadoop003
> lxf
[root@hadoop001 hadoop]#
向集群分發(fā)配置文件
#開始分發(fā)
#向hadoop002
scp /export/servers/hadoop-3.3.6/etc/hadoop hadoop002:/export/servers/hadoop-3.3.6/etc/hadoop
#向hadoop003
scp /export/servers/hadoop-3.3.6/etc/hadoop hadoop002:/export/servers/hadoop-3.3.6/etc/hadoop
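To confirm the files arrived intact, comparing a checksum across the nodes is a quick check:
md5sum /export/servers/hadoop-3.3.6/etc/hadoop/core-site.xml
for host in hadoop002 hadoop003; do ssh $host 'md5sum /export/servers/hadoop-3.3.6/etc/hadoop/core-site.xml'; done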
The Hadoop Cluster
Initialize the Cluster File System
Initializing the Hadoop cluster is important: it ensures the components start from a correct state and can coordinate with one another. Before the cluster is started for the first time, initialization is required, for the following reasons:
- File system initialization: HDFS must be initialized before first use; this creates the initial directory structure, sets permissions, and prepares the necessary metadata.
- Metadata preparation: components such as the NameNode need their metadata prepared, including creating the metadata store and clearing stale log files.
- Configuration check: initialization also validates the configuration on each node, ensuring it is correct and consistent to avoid problems later.
- Enabling the core services: only after initialization can the essential services (NameNode, DataNode, ResourceManager, NodeManager, and so on) be started and run normally.
# before starting the cluster, format the NameNode on the master node hadoop001
# (the older "hadoop namenode -format" still works but is deprecated)
[root@hadoop001 ~]# hdfs namenode -format
.......
......
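If the format succeeded, the metadata directory set by dfs.namenode.name.dir should now be populated; a quick check (the exact file names are typical, not guaranteed):
[root@hadoop001 ~]# ls /export/data/hadoop/name/current/
# expect files such as VERSION and an initial fsimage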
Startup
Start the services separately
Start HDFS
[root@hadoop001 ~]# start-dfs.sh
Start YARN
[root@hadoop001 ~]# start-yarn.sh
One-command cluster start
[root@hadoop001 ~]# start-all.sh
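After startup, a jps sweep over the three nodes shows which daemons ended up where (jps ships with the JDK; /etc/profile is sourced because a non-interactive ssh shell does not load it):
[root@hadoop001 ~]# for host in hadoop001 hadoop002 hadoop003; do echo "== $host =="; ssh $host 'source /etc/profile; jps'; done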
Shutdown
Stop the services separately
Stop HDFS
[root@hadoop001 ~]# stop-dfs.sh
Stop YARN
[root@hadoop001 ~]# stop-yarn.sh
One-command cluster stop
[root@hadoop001 ~]# stop-all.sh
Testing
Check the Hadoop Web UI
(Screenshot: the web UI is reachable.)
On a Windows host you can add the same host mappings to C:\Windows\System32\drivers\etc\hosts; the web UIs are then reachable from the host's browser as hostname:port.
Without that mapping, you have to use ip:port instead.
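With Hadoop 3.x defaults, the NameNode web UI listens on port 9870 and the ResourceManager UI on 8088, so the addresses here would be:
http://hadoop001:9870   # HDFS NameNode web UI
http://hadoop001:8088   # YARN ResourceManager web UI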