Setting Up a Pseudo-Distributed Hive on Spark Development Environment

Preface

Work calls for the Hive on Spark mode to build a data warehouse, but server resources in the development environment are tight, so CDH cannot be deployed there for now; a full CDH installation would eat up most of the 32 GB of RAM by itself. The plan is therefore to install vanilla Hadoop, Hive, and Spark. It has been quite a while since I last set up a native environment by hand.

Here is the installation process:

Development server specs: 16 CPU cores, 32 GB RAM.


Hadoop Prerequisites

JDK Installation

mkdir -p /home/module/java/

Download jdk-8u202-linux-x64.tar.gz into this directory

tar -zxvf jdk-8u202-linux-x64.tar.gz

The JDK path is /home/module/java/jdk1.8.0_202


Edit /etc/profile:

export JAVA_HOME=/home/module/java/jdk1.8.0_202

export PATH=$PATH:$JAVA_HOME/bin


Run source /etc/profile, then verify with java -version


MySQL Installation

The MySQL installation itself is not covered in detail here; there are plenty of tutorials online, and it can also be installed with Docker (a minimal sketch follows).
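A minimal sketch of the Docker route; the image tag, container name, data directory, and root password below are assumptions, and the host port 33061 only matches the JDBC URL used later in hive-site.xml:

docker run -d --name hive-metastore-mysql \
  -p 33061:3306 \
  -e MYSQL_ROOT_PASSWORD=123456 \
  -v /home/mysql_data:/var/lib/mysql \
  mysql:5.7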




Passwordless SSH

ssh-keygen -t rsa

cd ~/.ssh/

cat id_rsa.pub >> authorized_keys

chmod 600 ./authorized_keys

Verify with ssh localhost


Installing hadoop-3.3.1.tar.gz

Place the tarball in /home/module

tar -zxvf hadoop-3.3.1.tar.gz

cd /home/module/hadoop-3.3.1/etc/hadoop


Edit /etc/hosts:

[root@data-dev-server hadoop]# cat /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.1.10 data-dev-server


Key contents of core-site.xml:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop_data/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://data-dev-server:9000</value>
    </property>
</configuration>


Key contents of hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>data-dev-server:50070</value>
    </property>
</configuration>


mapred-site.xml


<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>data-dev-server:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>data-dev-server:19888</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>



yarn-site.xml



<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>data-dev-server</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://data-dev-server:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>104800</value>
    </property>
</configuration>




/etc/profile


export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/home/module/hadoop-3.3.1


export HADOOP_MAPRED_HOME=${HADOOP_HOME}

export HADOOP_COMMON_HOME=${HADOOP_HOME}

export HADOOP_HDFS_HOME=${HADOOP_HOME}

export HADOOP_YARN_HOME=${HADOOP_HOME}

export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop



export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export HIVE_HOME=/home/module/apache-hive-3.1.2-bin

export PATH=$PATH:$HIVE_HOME/bin


export SPARK_HOME=/home/module/spark-2.3.0-bin-without-hive

export PATH=$SPARK_HOME/bin:$PATH


Initializing and Starting Hadoop

hdfs namenode -format


start-dfs.sh to start HDFS

start-yarn.sh to start YARN

Check with jps:

NameNode, SecondaryNameNode, and DataNode are the HDFS processes

NodeManager and ResourceManager are the YARN processes


Verify that Hadoop is working with hadoop fs -ls /



Installing apache-hive-3.1.2-bin.tar.gz

tar -zxvf apache-hive-3.1.2-bin.tar.gz

cd /home/module/apache-hive-3.1.2-bin/conf


Create hive-site.xml with the following content:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>Default warehouse directory in HDFS</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://10.20.29.52:33061/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
        <description>JDBC connection for the metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>JDBC driver; the driver jar must be copied to ${HIVE_HOME}/lib</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>Metastore database user</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
        <description>Metastore database password</description>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>system:user.name</name>
        <value>root</value>
        <description>user name</description>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>data-dev-server</value>
        <description>Bind host on which to run the HiveServer2 Thrift service.</description>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>11000</value>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://data-dev-server:9083</value>
    </property>
    <property>
        <name>spark.yarn.jars</name>
        <value>hdfs://data-dev-server:9000/spark/jars/*.jar</value>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>spark</value>
    </property>
    <property>
        <name>hive.spark.client.connect.timeout</name>
        <value>10000ms</value>
    </property>
</configuration>
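Two supporting steps implied by this configuration are not shown elsewhere in the walkthrough: putting the MySQL JDBC driver on Hive's classpath and pre-creating the warehouse directory on HDFS. A minimal sketch, assuming the connector jar has been downloaded into the current directory (the jar filename is a placeholder):

cp mysql-connector-java-5.1.49.jar /home/module/apache-hive-3.1.2-bin/lib/
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /user/hive/warehouse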



Create the hive database in MySQL:

CREATE DATABASE `hive` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */;

CREATE USER 'hive'@'%' IDENTIFIED BY '123456';

GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';

FLUSH PRIVILEGES;



Starting Hive

cd /home/module/apache-hive-3.1.2-bin/bin

Initialize the Hive metastore schema: ./schematool -dbType mysql -initSchema

Start the metastore service:

nohup hive --service metastore &


Use the hive command to enter the Hive CLI.
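Optionally, since hive-site.xml binds HiveServer2 to port 11000 on data-dev-server, you can also start HiveServer2 and connect with Beeline. A minimal sketch (the log file name and the root user are assumptions):

nohup hive --service hiveserver2 > hiveserver2.log 2>&1 &
beeline -u jdbc:hive2://data-dev-server:11000 -n root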


Installing Spark and Switching the Execution Engine from MapReduce to Spark

Download spark-2.3.0-bin-without-hive.tgz

tar -zxvf spark-2.3.0-bin-without-hive.tgz


cd /home/module/spark-2.3.0-bin-without-hive/conf

cp spark-defaults.conf.template spark-defaults.conf

Edit spark-defaults.conf with the following content:

spark.master yarn

spark.home /home/module/spark-2.3.0-bin-without-hive

spark.eventLog.enabled true

spark.eventLog.dir hdfs://data-dev-server:9000/tmp/spark

spark.serializer org.apache.spark.serializer.KryoSerializer

spark.executor.memory 1g

spark.driver.memory 1g

spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

spark.yarn.archive hdfs://data-dev-server:9000/spark/jars/spark2.3.0-without-hive-libs.jar

spark.yarn.jars hdfs://data-dev-server:9000/spark/jars/spark2.3.0-without-hive-libs.jar


Upload the Spark jars to HDFS:

hadoop fs -mkdir -p /tmp/spark

hadoop fs -mkdir -p /spark/jars

cd /home/module/spark-2.3.0-bin-without-hive

hadoop fs -put ./jars/* /spark/jars/


cd /home/module/spark-2.3.0-bin-without-hive

jar cv0f spark2.3.0-without-hive-libs.jar -C ./jars/ .

hadoop fs -put spark2.3.0-without-hive-libs.jar /spark/jars/


cd /home/module/spark-2.3.0-bin-without-hive/conf

cp spark-env.sh.template spark-env.sh

Edit spark-env.sh:

export SPARK_DIST_CLASSPATH=$(hadoop classpath)

export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop/


Copy the Spark jars into Hive's lib directory:

cp /home/module/spark-2.3.0-bin-without-hive/jars/* /home/module/apache-hive-3.1.2-bin/lib/



Verifying Spark

spark-submit \

--class org.apache.spark.examples.SparkPi \

--master yarn \

--deploy-mode client \

--driver-memory 1G \

--num-executors 3 \

--executor-memory 1G \

--executor-cores 1 \

/home/module/spark-2.3.0-bin-without-hive/examples/jars/spark-examples_*.jar 10


Verifying Hive on Spark

Create a table and run select count(*) from dws_user; as a quick check (see the sketch below).
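A minimal smoke test along those lines; the dws_user schema and sample row here are assumptions for illustration:

hive -e "
CREATE TABLE IF NOT EXISTS dws_user (user_id BIGINT, user_name STRING);
INSERT INTO dws_user VALUES (1, 'test');
SELECT COUNT(*) FROM dws_user;
"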

The query runs on the Spark engine, which confirms the integration succeeded.

