My work requires Hive on Spark for a data warehouse, but the development environment's server resources are tight and we can't deploy CDH there for now: a full CDH install would eat up most of the 32 GB of memory on its own. So I decided to install native Hadoop, Hive and Spark instead; it has been quite a while since I set up a native environment by hand.
Here is the installation process.
Dev server specs: 16 CPU cores, 32 GB of RAM.
mkdir -p /home/module/java/
Download jdk-8u202-linux-x64.tar.gz into this directory.
tar -zxvf jdk-8u202-linux-x64.tar.gz
The JDK path is /home/module/java/jdk1.8.0_202.
Edit /etc/profile and add:
export JAVA_HOME=/home/module/java/jdk1.8.0_202
export PATH=$PATH:$JAVA_HOME/bin
Verify with java -version.
MySQL installation is skipped here; there are plenty of tutorials online, and it can also be installed with Docker.
See any Docker MySQL tutorial for the details; I won't cover it here.
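For reference, a minimal Docker sketch (the container name, root password and MySQL version below are placeholders, not taken from the original setup):
docker run -d --name mysql-hive \
  -e MYSQL_ROOT_PASSWORD=123456 \
  -p 3306:3306 \
  mysql:5.7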
Set up passwordless SSH to localhost:
ssh-keygen -t rsa
cd ~/.ssh/
cat id_rsa.pub >> authorized_keys
chmod 600 ./authorized_keys
Verify with ssh localhost.
Put the Hadoop package in /home/module and extract it:
tar -zxvf hadoop-3.3.1.tar.gz
cd /home/module/hadoop-3.3.1/etc/hadoop
Make sure the hostname resolves correctly in /etc/hosts:
[root@data-dev-server hadoop]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.10 data-dev-server
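The post does not reproduce the Hadoop configuration files edited in this directory. A minimal single-node sketch, assuming the NameNode listens on port 9000 to match the HDFS URLs used in spark-defaults.conf later (paths and values below are assumptions, adjust as needed):
hadoop-env.sh:
export JAVA_HOME=/home/module/java/jdk1.8.0_202
core-site.xml (inside <configuration>):
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://data-dev-server:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/module/hadoop-3.3.1/data</value>
</property>
hdfs-site.xml (inside <configuration>):
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
mapred-site.xml (inside <configuration>):
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
yarn-site.xml (inside <configuration>):
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>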
Append the following environment variables to /etc/profile (the Hive and Spark paths are used in later steps):
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/home/module/hadoop-3.3.1
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HIVE_HOME=/home/module/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin
export SPARK_HOME=/home/module/spark-2.3.0-bin-without-hive
export PATH=$SPARK_HOME/bin:$PATH
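After saving, reload the profile so the new variables take effect:
source /etc/profile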
Format the NameNode and start the services:
hdfs namenode -format
start-dfs.sh   (starts HDFS)
start-yarn.sh  (starts YARN)
Check with jps:
NameNode, SecondaryNameNode and DataNode are the HDFS processes;
NodeManager and ResourceManager are the YARN processes.
Verify the Hadoop installation with:
hadoop fs -ls /
tar -zxvf apache-hive-3.1.2-bin.tar.gz
cd /home/module/apache-hive-3.1.2-bin/conf
Create hive-site.xml; it begins with the standard header:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
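The post does not list the rest of hive-site.xml. A minimal sketch, assuming the MySQL database, user and password created in the next step and the Spark jar path used later (all values below are assumptions based on those steps; the driver class assumes a 5.x MySQL connector):
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.execution.engine</name>
    <value>spark</value>
  </property>
  <property>
    <name>spark.yarn.jars</name>
    <value>hdfs://data-dev-server:9000/spark/jars/spark2.3.0-without-hive-libs.jar</value>
  </property>
</configuration>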
Create the metastore database and user in MySQL:
CREATE DATABASE `hive` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */;
CREATE USER 'hive'@'%' IDENTIFIED BY '123456';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;
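Note: schematool needs the MySQL JDBC driver on Hive's classpath. Assuming a mysql-connector-java jar downloaded separately (the original post does not mention this step), copy it into Hive's lib directory:
cp mysql-connector-java-*.jar /home/module/apache-hive-3.1.2-bin/lib/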
cd /home/module/apache-hive-3.1.2-bin/bin
Initialize the Hive metastore schema: ./schematool -dbType mysql -initSchema
Start the metastore service:
nohup hive --service metastore &
Run the hive command to enter the Hive CLI.
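A quick sanity check inside the CLI, for example:
show databases;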
Download spark-2.3.0-bin-without-hive.tgz.
tar -zxvf spark-2.3.0-bin-without-hive.tgz
cd /home/module/spark-2.3.0-bin-without-hive/conf
cp spark-defaults.conf.template spark-defaults.conf
Edit spark-defaults.conf so it contains:
spark.master yarn
spark.home /home/module/spark-2.3.0-bin-without-hive
spark.eventLog.enabled true
spark.eventLog.dir hdfs://data-dev-server:9000/tmp/spark
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.executor.memory 1g
spark.driver.memory 1g
spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.yarn.archive hdfs://data-dev-server:9000/spark/jars/spark2.3.0-without-hive-libs.jar
spark.yarn.jars hdfs://data-dev-server:9000/spark/jars/spark2.3.0-without-hive-libs.jar
Create the HDFS directories referenced above and upload the Spark jars:
hadoop fs -mkdir -p /tmp/spark
hadoop fs -mkdir -p /spark/jars
cd /home/module/spark-2.3.0-bin-without-hive/jars
hadoop fs -put ./* /spark/jars/
cd /home/module/spark-2.3.0-bin-without-hive
jar cv0f spark2.3.0-without-hive-libs.jar -C ./jars/ .
hadoop fs -put spark2.3.0-without-hive-libs.jar /spark/jars/
cd /home/module/spark-2.3.0-bin-without-hive/conf
Edit spark-env.sh (copy it from spark-env.sh.template if it does not exist yet) and add:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop/
Copy the Spark jars into Hive's lib directory so Hive can submit Spark jobs:
cp /home/module/spark-2.3.0-bin-without-hive/jars/* /home/module/apache-hive-3.1.2-bin/lib/
Test Spark on YARN with the SparkPi example:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--driver-memory 1G \
--num-executors 3 \
--executor-memory 1G \
--executor-cores 1 \
/home/module/spark-2.3.0-bin-without-hive/examples/jars/spark-examples_*.jar 10
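If the job finishes successfully on YARN, the driver output should contain a line like "Pi is roughly 3.14...", which confirms that Spark itself works before wiring it into Hive.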
Verify Hive on Spark.
Create a table and run a query that forces a job, e.g. select count(*) from dws_user; (a sketch follows).
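A possible test sequence (the table definition here is hypothetical):
create table dws_user (id int, name string);
insert into dws_user values (1, 'test');
select count(*) from dws_user;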
The query output shows the job running on the Spark engine, so the integration is working.