Flink WordCount入门

下面通过一个单词统计的案例，快速上手应用 Flink，进行流处理（Streaming）和批处理（Batch）

单词统计（批处理）

引入依赖

<!--flink核心包-->

  <dependency>

      <groupId>org.apache.flink</groupId>

      <artifactId>flink-java</artifactId>

      <version>1.7.2</version>

  </dependency>

  <!--flink流处理包-->

  <dependency>

      <groupId>org.apache.flink</groupId>

      <artifactId>flink-streaming-java_2.12</artifactId>

      <version>1.7.2</version>

  </dependency>

代码实现

public class WordCountBatch {

    public static void main(String[] args) throws Exception {

        String inputFile= "E:\\data\\word.txt";

        String outPutFile= "E:\\data\\wordResult.txt";

        ExecutionEnvironment executionEnvironment = ExecutionEnvironment.getExecutionEnvironment();

        //1. 读取数据

        DataSource<String> dataSource = executionEnvironment.readTextFile(inputFile);

        //2. 对数据进行处理，转成word,1的格式

        FlatMapOperator<String, Tuple2<String, Integer>> flatMapOperator = dataSource.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {

            @Override

            public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {

                String[] words = s.split(" ");

                for (String word : words) {

                    collector.collect(new Tuple2<>(word, 1));

                }

            }

        });

        //3. 对数据分组，相同word的一个组

        UnsortedGrouping<Tuple2<String, Integer>> tuple2UnsortedGrouping = flatMapOperator.groupBy(0);

        //4. 对分组后的数据求和

        AggregateOperator<Tuple2<String, Integer>> sum = tuple2UnsortedGrouping.sum(1);

        //5. 写出数据

        sum.writeAsCsv(outPutFile).setParallelism(1);

        //执行

        executionEnvironment.execute("wordcount batch process");

    }

}

执行 main 方法，得出结果。我测试的 word.txt 内容如下：

ni hao hi

wang mei mei

liu mei

ni hao

wo hen hao

this is a good idea

Apache Flink

输出的文件结果：

a,1

mei,3

Apache,1

Flink,1

good,1

hen,1

hi,1

idea,1

ni,2

is,1

liu,1

this,1

wo,1

hao,3

wang,1

单词统计（流数据）

需求：Socket 模拟实时发送单词，使用 Flink 实时接收数据，对指定时间窗口内（如 5s）的数据进行聚合统计，每隔 1s 汇总计算一次，并且把时间窗口内计算结果打印出来

public class WordCountStream {

    public static void main(String[] args) throws Exception {

        int port = 7000;

        StreamExecutionEnvironment executionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<String> textStream = executionEnvironment.socketTextStream("192.168.56.103", port, "\n");

        SingleOutputStreamOperator<Tuple2<String, Integer>> tuple2SingleOutputStreamOperator = textStream.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {

            @Override

            public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {

                String[] split = s.split("\\s");

                for (String word : split) {

                    collector.collect(Tuple2.of(word, 1));

                }

            }

        });

        SingleOutputStreamOperator<Tuple2<String, Integer>> word = tuple2SingleOutputStreamOperator.keyBy(0)

                .timeWindow(Time.seconds(5),Time.seconds(1)).sum(1);

        word.print();

        executionEnvironment.execute("wordcount stream process");

    }

}

运行起来之后，我们就可以开始发送 socket 请求过去。我们测试可以使用 netcat 工具。

在 linux 上安装好后，使用下面的命令：

nc -lk 7000

然后发送数据即可。

巴特西

Flink WordCount入门

单词统计（批处理）

单词统计（流数据）

最新文章

热门文章