Problems Encountered Using Spark

Published: 2023-10-08 17:24:42  Author: INnoVation-V2

Environment

IDEA version: Build #IU-232.8660.185, built on July 26, 2023

OS version: macOS 14.0

Docker version:

(screenshot: Docker version)

Part 1: Running a Spark Cluster in Docker

This setup uses the Spark image published by Bitnami.

(screenshot: bitnami/spark image)

GitHub documentation: https://github.com/bitnami/containers/tree/main/bitnami/spark#configuration

After the image has been pulled:

  1. Create a folder anywhere on disk.

  2. In it, create a new file; the file name must be docker-compose.yml.

  3. The file content is as follows:

    # Copyright VMware, Inc.
    # SPDX-License-Identifier: APACHE-2.0
    
    version: '2'
    
    services:
      spark:
        image: docker.io/bitnami/spark:3.5
        environment:
          - SPARK_MODE=master
          - SPARK_RPC_AUTHENTICATION_ENABLED=no
          - SPARK_RPC_ENCRYPTION_ENABLED=no
          - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
          - SPARK_SSL_ENABLED=no
          - SPARK_USER=spark
        ports:
          - '8080:8080'
      spark-worker:
        image: docker.io/bitnami/spark:3.5
        environment:
          - SPARK_MODE=worker
          - SPARK_MASTER_URL=spark://spark:7077
          - SPARK_WORKER_MEMORY=1G
          - SPARK_WORKER_CORES=1
          - SPARK_RPC_AUTHENTICATION_ENABLED=no
          - SPARK_RPC_ENCRYPTION_ENABLED=no
          - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
          - SPARK_SSL_ENABLED=no
          - SPARK_USER=spark
    

    The image tag must match the one you actually pulled. For example, my image tag is:

    (screenshot: locally installed image tag)

    so the line is changed to:

    image: docker.io/bitnami/spark:3.5.0-debian-11-r9
    
  4. Start Spark

    Open a terminal, change into the directory containing the file, and start with a single worker:

    docker-compose up
    

    To start multiple workers:

    docker-compose up --scale spark-worker=3
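
    To sanity-check the cluster before involving IDEA at all, one option is to open a spark-shell inside the master container, for example with docker exec -it <master-container> spark-shell --master spark://spark:7077 (the container name is whatever docker-compose assigned; this is just a quick check, not part of the Bitnami instructions), and run a one-line job:

    // Sums 1..100 across the cluster; a result of 5050 confirms the workers execute jobs
    sc.parallelize(1 to 100).reduce(_ + _)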
    

Part 2: Creating a Scala Maven Project in IDEA 2023

  1. Create a new project and choose the IntelliJ build system.

    (screenshot: New Project dialog)
  2. Use the search in the top-right corner to find Add Framework Support.

    (screenshot: Add Framework Support search)

  3. Then add Maven and you are done.

    (screenshot: adding the Maven framework)
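
    To confirm the module is wired up correctly, a trivial object like the one below (the name Hello is arbitrary) should compile and run from IDEA:

    // If this prints, the Scala SDK and Maven setup are working
    object Hello {
      def main(args: Array[String]): Unit =
        println("Scala Maven project works")
    }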

Part 3: Connecting IDEA to the Spark Cluster in Docker

  1. Create a Maven Scala project in IDEA.

  2. Add the following dependencies (a version sanity check follows the XML):

        <dependencies>
            <dependency>
                <groupId>org.scala-lang</groupId>
                <artifactId>scala3-library_3</artifactId>
                <version>3.3.1</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.13</artifactId>
                <version>3.5.0</version>
            </dependency>
        </dependencies>
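
    The POM mixes the Scala 3 library with a Spark artifact built for Scala 2.13; this works because Scala 3 projects can consume 2.13 artifacts and share the 2.13 standard library. An optional sanity check (the object name VersionCheck is arbitrary) prints the versions actually found on the classpath:

        // Prints the standard-library and Spark versions at runtime
        object VersionCheck {
          def main(args: Array[String]): Unit = {
            // Scala 3 reuses the 2.13 standard library, so this prints a 2.13.x version
            println(s"Scala library: ${scala.util.Properties.versionNumberString}")
            // Should match the spark-core version declared above (3.5.0)
            println(s"Spark: ${org.apache.spark.SPARK_VERSION}")
          }
        }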
    
  3. Check the Spark startup log to find the port the master is listening on; in my case it is 7077.

    (screenshot: Spark master startup log)

  4. You can also check in the Spark Web UI, which defaults to http://localhost:8080/; the port at the end of the spark:// URL shown there is the master port.

    (screenshot: Spark Web UI)

  5. Write the Spark code:

      import org.apache.spark.{SparkConf, SparkContext}

      object Main {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf()
          conf.setAppName("myapp")
          // Use the master port found in steps 3/4
          conf.setMaster("spark://localhost:7077")

          val sc = new SparkContext(conf)
          val rdd = sc.parallelize(List(1, 2, 3, 4, 5, 6))
          print(rdd)
        }
      }
    

    Note the master address: replace 7077 with your own master port, but do not change localhost. (If the connection is refused, keep in mind that the compose file above only publishes port 8080; you may also need to publish 7077 on the spark service.) A fuller smoke test is sketched below.
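
    The code above only constructs the RDD; nothing runs on the cluster until an action is triggered. The sketch below is one way to verify that jobs really execute on the Docker workers and to shut the context down cleanly; the spark.driver.host / spark.driver.bindAddress settings and the host.docker.internal address are assumptions for Docker Desktop setups where executors in containers must connect back to a driver running on the host, and may not be needed in every environment:

      import org.apache.spark.{SparkConf, SparkContext}

      object SmokeTest {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf()
            .setAppName("smoke-test")
            .setMaster("spark://localhost:7077") // same master URL as above
            // Advertise an address the containerized executors can reach the driver on
            .set("spark.driver.host", "host.docker.internal")
            .set("spark.driver.bindAddress", "0.0.0.0")
          val sc = new SparkContext(conf)
          try {
            // reduce is an action, so this job actually runs on the workers
            val sum = sc.parallelize(1 to 100).reduce(_ + _)
            println(s"sum = $sum") // expected: 5050
          } finally {
            sc.stop() // release the executors
          }
        }
      }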

  6. Running it may fail with the following error:

    Exception in thread "main" java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x55b0dcab) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x55b0dcab
    	at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala:213)
    	at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:121)
    	at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:358)
    	at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:295)
    	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:344)
    	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:196)
    	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:284)
    	at org.apache.spark.SparkContext.<init>(SparkContext.scala:483)
    	at Main$.main(Main.scala:9)
    	at Main.main(Main.scala)
    

    You need to modify the run configuration. The error occurs because Spark touches the JDK-internal class sun.nio.ch.DirectBuffer, which newer JDKs no longer export to application code by default, so the export has to be added as a JVM option.

    (screenshot: Run/Debug configuration)

    Modify Options --> Add VM Options --> add the following line:

    --add-exports java.base/sun.nio.ch=ALL-UNNAMED
    

    After that, it runs normally.