Spark on Hive: pitfall notes

Setting up the Hadoop environment

Skipped.

Setting up the Hive environment

Skipped.

Setting up Spark

  1. Mind version compatibility

Check the pom file of the Hive source to see which Spark version it targets.
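
For example, a minimal sketch that reads the pinned Spark version out of the root pom.xml; the source path is hypothetical, adjust to wherever the Hive source tree is unpacked:

    # Assumed location of the Hive 2.3.6 source tree.
    grep -m1 '<spark.version>' apache-hive-2.3.6-src/pom.xml
    # For Hive 2.3.6 this should print <spark.version>2.0.0</spark.version>,
    # matching the Spark build used below.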

  2. Download the spark-2.0.0-bin-hadoop2.4-without-hive build; a without-hive build is mandatory

My versions: Hive 2.3.6, Spark spark-2.0.0-bin-hadoop2.4-without-hive, Hadoop 2.7

  3. Copy the jars

    • cp scala-library-***.jar /hive_home/lib/

    • cp spark-core_***.jar /hive_home/lib/

    • cp spark-network-common_***.jar /hive_home/lib/

    • Plus these jars as well: chill-java, chill, jackson-module-paranamer, jackson-module-scala,
      jersey-container-servlet-core, jersey-server, json4s-ast, kryo-shaded, minlog, scala-xml,
      spark-launcher, spark-network-shuffle, spark-unsafe, xbean-asm5-shaded

Copy them over from Spark's jars folder.
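
A minimal copy-loop sketch, assuming SPARK_HOME and HIVE_HOME are set to the two install directories (the jar names come from the list above):

    # SPARK_HOME and HIVE_HOME are assumed environment variables.
    # Loose prefix globs match versioned names such as spark-core_2.11-2.0.0.jar;
    # the chill glob also re-matches chill-java, which is a harmless overwrite.
    for j in scala-library spark-core spark-network-common chill-java chill \
             jackson-module-paranamer jackson-module-scala jersey-container-servlet-core \
             jersey-server json4s-ast kryo-shaded minlog scala-xml spark-launcher \
             spark-network-shuffle spark-unsafe xbean-asm5-shaded; do
      cp "$SPARK_HOME"/jars/${j}*.jar "$HIVE_HOME"/lib/
    done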

  4. Configure hive-site.xml

    <property>
      <name>hive.enable.spark.execution.engine</name>
      <value>true</value>
    </property>
    <property>
      <name>spark.master</name>
      <!-- <value>spark://localhost:7077</value> -->
      <value>local</value>
      <description/>
    </property>
    <property>
      <name>hive.execution.engine</name>
      <value>spark</value>
      <description>
        Expects one of [mr, tez, spark].
        Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
        remains the default engine for historical reasons, it is itself a historical engine
        and is deprecated in Hive 2 line. It may be removed without further warning.
      </description>
    </property>
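
To confirm the setting took effect, a quick smoke test from the shell (the table name is hypothetical):

    # Print the effective engine, then run a query that actually launches a job.
    hive -e 'set hive.execution.engine;'          # should print hive.execution.engine=spark
    hive -e 'select count(*) from test_table;'    # stages should run on Spark, not MR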

A Hadoop pitfall

If no DataNode is alive (live DataNode count below 1), jobs cannot be submitted. The cause is an HDFS filesystem fault, possibly from formatting the filesystem more than once, an abrupt interruption, a network change, and so on.

Fix:

Find the DataNode data directory configured in hdfs-site.xml:

<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/data/dfs/datanode</value>
</property>

Clear out everything under /hadoop/data, then re-run hadoop namenode -format to re-initialize the filesystem.
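
Put together, a reset sketch; it is destructive and wipes all HDFS data. It assumes the dfs.datanode.data.dir above, NameNode state under the same /hadoop/data root, and Hadoop's sbin scripts on the PATH:

    # Stop HDFS so no daemon holds the data directories open.
    stop-dfs.sh
    # Destroys all HDFS data, NameNode and DataNode state alike.
    rm -rf /hadoop/data/*
    # Re-initialize the filesystem, then bring HDFS back up.
    hadoop namenode -format
    start-dfs.sh
    # Verify at least one live DataNode before submitting jobs again.
    hdfs dfsadmin -report | grep 'Live datanodes'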