Spark教程——(2)编写spark-submit测试Demo
2024-09-03 02:00:02
创建Maven项目:
填写Maven的pom文件如下:
<?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>org.world.chenfei</groupId> <artifactId>JavaSparkPi</artifactId> <version>1.0-SNAPSHOT</version> <properties> <maven.compiler.source>1.8</maven.compiler.source> <maven.compiler.target>1.8</maven.compiler.target> <encoding>UTF-8</encoding> <spark.version>2.1.0</spark.version> </properties> <dependencies> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2.10</artifactId> <version>${spark.version}</version> </dependency> </dependencies> <build> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-compiler-plugin</artifactId> <configuration> <source>1.8</source> <target>1.8</target> </configuration> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <version>2.4.3</version> <executions> <execution> <phase>package</phase> <goals> <goal>shade</goal> </goals> <configuration> <filters> <filter> <artifact>*:*</artifact> <excludes> <exclude>META-INF/*.SF</exclude> <exclude>META-INF/*.DSA</exclude> <exclude>META-INF/*.RSA</exclude> </excludes> </filter> </filters> <transformers> <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> <mainClass>org.world.chenfei.JavaSparkPi</mainClass> </transformer> </transformers> </configuration> </execution> </executions> </plugin> </plugins> </build> </project>
编写一个蒙特卡罗求PI的代码:
package org.world.chenfei; import java.util.ArrayList; import java.util.List; import org.apache.spark.SparkConf; import org.apache.spark.api.java.JavaRDD; import org.apache.spark.api.java.JavaSparkContext; import org.apache.spark.api.java.function.Function; import org.apache.spark.api.java.function.Function2; public class JavaSparkPi { public static void main(String[] args) throws Exception { SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi")/*.setMaster("local[2]")*/; JavaSparkContext jsc = new JavaSparkContext(sparkConf); int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2; int n = 100000 * slices; List<Integer> l = new ArrayList<Integer>(n); for (int i = 0; i < n; i++) { l.add(i); } JavaRDD<Integer> dataSet = jsc.parallelize(l, slices); int count = dataSet.map(new Function<Integer, Integer>() { @Override public Integer call(Integer integer) { double x = Math.random() * 2 - 1; double y = Math.random() * 2 - 1; return (x * x + y * y ) ? 1 : 0; } }).reduce(new Function2<Integer, Integer, Integer>() { @Override public Integer call(Integer integer, Integer integer2) { return integer + integer2; } }); System.out.println("Pi is roughly " + 4.0 * count / n); jsc.stop(); } }
将本项目打包为Jar文件:
此时在target目录下,就会生成这个项目的Jar包
尝试将该jar包在本地执行:
C:\Users\Administrator\Desktop\swap>java -jar JavaSparkPi-1.0-SNAPSHOT.jar
执行失败,并返回如下信息:
C:\Users\Administrator\Desktop\swap>java -jar JavaSparkPi-1.0-SNAPSHOT.jar Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 19/04/28 16:24:30 INFO SparkContext: Running Spark version 2.1.0 19/04/28 16:24:30 WARN SparkContext: Support for Scala 2.10 is deprecated as of Spark 2.1.0 19/04/28 16:24:30 WARN NativeCodeLoader: Unable to load native-hadoop library fo r your platform... using builtin-java classes where applicable 19/04/28 16:24:30 ERROR SparkContext: Error initializing SparkContext. org.apache.spark.SparkException: A master URL must be set in your configuration at org.apache.spark.SparkContext.<init>(SparkContext.scala:379) at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.sc ala:58) at org.world.chenfei.JavaSparkPi.main(JavaSparkPi.java:15) 19/04/28 16:24:30 INFO SparkContext: Successfully stopped SparkContext Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration at org.apache.spark.SparkContext.<init>(SparkContext.scala:379) at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.sc ala:58) at org.world.chenfei.JavaSparkPi.main(JavaSparkPi.java:15)
将Jar包上传到服务器上,并执行以下命令:
spark-submit --class org.world.chenfei.JavaSparkPi --executor-memory 500m --total-executor-cores /home/cf/JavaSparkPi-1.0-SNAPSHOT.jar
执行成功,并返回如下信息:
[root@node1 ~]# spark-submit --class org.world.chenfei.JavaSparkPi --executor-memory 500m --total-executor-cores /home/cf/JavaSparkPi-1.0-SNAPSHOT.jar SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding -.cdh5./jars/slf4j-log4j12-.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding -.cdh5./jars/avro-tools--cdh5.14.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] …… // :: INFO util.Utils: Successfully started service . // :: INFO spark.SparkEnv: Registering MapOutputTracker // :: INFO spark.SparkEnv: Registering BlockManagerMaster // :: INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-97788ddb-d5eb-48ce-aa9b-e030102dd06c …… // :: INFO util.Utils: Fetching spark://10.200.101.131:41504/jars/JavaSparkPi-1.0-SNAPSHOT.jar to /tmp/spark-e96463c3-1979-4247-957c-b381f65ddc88/userFiles-666197fa-738d-41e1-a670-a758af1ef9e1/fetchFileTemp2787870198743975902.tmp // :: INFO executor.Executor: Adding --957c-b381f65ddc88/userFiles-666197fa-738d-41e1-a670-a758af1ef9e1/JavaSparkPi-1.0-SNAPSHOT.jar to class loader // :: INFO executor.Executor: Finished task ). bytes result sent to driver // :: INFO executor.Executor: Finished task ). bytes result sent to driver // :: INFO scheduler.TaskSetManager: Finished task ) ms on localhost (executor driver) (/) // :: INFO scheduler.TaskSetManager: Finished task ) ms on localhost (executor driver) (/) // :: INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool // :: INFO scheduler.DAGScheduler: ResultStage (reduce at JavaSparkPi.java:) finished in 0.682 s // :: INFO scheduler.DAGScheduler: Job finished: reduce at JavaSparkPi.java:, took 1.102582 s …… Pi is roughly 3.14016 …… // :: INFO ui.SparkUI: Stopped Spark web UI at http://10.200.101.131:4040 // :: INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! // :: INFO storage.MemoryStore: MemoryStore cleared // :: INFO storage.BlockManager: BlockManager stopped // :: INFO storage.BlockManagerMaster: BlockManagerMaster stopped // :: INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! // :: INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. // :: INFO spark.SparkContext: Successfully stopped SparkContext // :: INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. // :: INFO util.ShutdownHookManager: Shutdown hook called // :: INFO util.ShutdownHookManager: Deleting directory /tmp/spark-e96463c3---957c-b381f65ddc88
计算结果为:
Pi is roughly 3.14016
最新文章
- 激活Windows 8.1 RTM原来如此简单
- poj3294 出现次数大于n/2 的公共子串
- 我的PhoneGap安装配置经历
- codevs 3008 加工生产调度[贪心]
- jquery validate使用
- AsyncTask来源分析(一)
- win7 下安装 ubuntu 16.04双系统
- 用 Python 编写网络爬虫 笔记
- Redux源码分析之createStore
- 程序员50题(JS版本)(七)
- vue项目两级全选(多级原理也一样),感觉有点意思,随手一记
- Js--动态生成表格
- HDU 6375(双端队列 ~)
- 实现私有化(Pimpl) --- QT常见的设计模式
- Ubuntu中基于QT的系统网线连接状态的实时监视
- jsp动作之 forward
- python之数据类型1
- Javac语法糖之增强for循环
- redis 新开端口号
- 一 创建github账号以及上传工程到github