Watermartks是通过additional的时间戳来控制窗口激活的时间,allowedLateness来控制窗口的销毁时间。

  注: 因为此特性包括官方文档在1.3~1.5版本均未做改变,所以此处使用1.5版的文档    
 
在EventTime的情况下,       
 
1. 一条记录的事件时间来控制此条记录属于哪一个窗口,Watermarks来控制这个窗口什么时候激活。
   
2. 假如一个窗口时间为00:00:00~00:00:05,Watermarks为5秒,那么当flink收到事件事件为00:00:10秒的数据时,即Watermarks到达00:00:05,激活这个窗口。
 
 
3. Watermarks激活窗口的方式,官方文档推荐为复写AssignerWithPeriodicWatermarks,与我们当前项目实现方式一致[https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/event_timestamps_watermarks.html]
 
4. 或者也可以使用我们项目中已经用到的在env级别下的config中设置watermark的方式
 
     env.getConfig().setAutoWatermarkInterval(applConfig.getWatermarkInterval());
 
 
当窗口被激活且运行完毕以后,此时这个窗口不一定被销毁,窗口状态有可能会被继续保持,这一点取决于allowedLateness
 
 
 
 
In a nutshell, a window is created as soon as the first element that should belong to this window arrives, and the window is completely removed when the time (event or processing time) passes its end timestamp plus the user-specified allowed lateness (see Allowed Lateness). Flink guarantees removal only for time-based windows and not for other types, e.g. global windows (see Window Assigners). For example, with an event-time-based windowing strategy that creates non-overlapping (or tumbling) windows every 5 minutes and has an allowed lateness of 1 min, Flink will create a new window for the interval between 12:00 and 12:05 when the first element with a timestamp that falls into this interval arrives, and it will remove it when the watermark passes the 12:06 timestamp.
 
In addition, each window will have a Trigger (see Triggers) and a function (ProcessWindowFunction, ReduceFunction, AggregateFunction or FoldFunction) (see Window Functions) attached to it. The function will contain the computation to be applied to the contents of the window, while the Trigger specifies the conditions under which the window is considered ready for the function to be applied. A triggering policy might be something like “when the number of elements in the window is more than 4”, or “when the watermark passes the end of the window”. A trigger can also decide to purge a window’s contents any time between its creation and removal. Purging in this case only refers to the elements in the window, and not the window metadata. This means that new data can still be added to that window.
 
Apart from the above, you can specify an Evictor (see Evictors) which will be able to remove elements from the window after the trigger fires and before and/or after the function is applied.[https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/windows.html#window-lifecycle]
 
       
1. 假如设置allowedLateness为60秒,那么窗口的状态会一直保持到事件时间为00:01:05的数据到达,或者如果最后一条数据早于00:01:05秒,则等到最后一条数据到达后再等待此数据于00:01:05的差值时间。    
 
2. 那么在窗口被销毁前,可以通过一些方式再次激活。注意,allowedLateness只能控制窗口销毁行为,并不能控制窗口再次激活的行为,这是独立的两部分行为。
 
3. 官方文档推荐的方式为Getting late data as a side output,可以单独获得再次被激活的窗口流https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/windows.html#getting-late-data-as-a-side-output
目前不确定原始流内是否也包含了再次被激活的窗口数据,待测试,从代码上看应该也包含在内。    
   已确认,原始流内窗口也会被重新激活一次
 
final OutputTag<T> lateOutputTag = new OutputTag<T>("late-data"){};
DataStream<T> input = ...;
SingleOutputStreamOperator<T> result = input
    .keyBy(<key selector>)
    .window(<window assigner>)
    .allowedLateness(<time>)
    .sideOutputLateData(lateOutputTag)
    .<windowed transformation>(<window function>);
DataStream<T> lateStream = result.getSideOutput(lateOutputTag);
 
 
4. 或者复写Triggers[https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/windows.html#triggers]

最新文章

  1. angular view之间的数据传递
  2. jquery 建议编辑器
  3. [算法导论]红黑树实现(插入和删除) @ Python
  4. 19SpringMvc_在业务控制方法中收集List集合中包含JavaBean参数
  5. ios添加方法快捷方式
  6. 三 GPU 并行编程的运算架构
  7. [Hive - LanguageManual ] Windowing and Analytics Functions (待)
  8. Python异常处理体系
  9. MYsql数据库ERROR总结
  10. java split函数应该注意的问题
  11. Spark sql ---JSON
  12. Jquery判断Checkbox是否选中三种方法
  13. macOS上实现Qt应用程序做文件关联打开
  14. blade 已开源
  15. HDU 1088(文本处理 **)
  16. dispatchers 设置
  17. linux代码笔记
  18. android_serialport_api hacking
  19. .NET Core MemoryCache缓存获取全部缓存键
  20. CentOS7下 Python2.7.5升级为Python2.7.13

热门文章

  1. 自己动手写HashMap
  2. 并发模型之Master-Worker设计模式
  3. python中字典,没键加键,有键操作其键对应的值,的思想
  4. WinForm实现Rabbitmq官网6个案例-Work Queues
  5. LeetCode 533----Lonely Pixel II
  6. Ubuntu 批量修改图片大小
  7. 对View的onMeasure()方法的进一步研究
  8. vue2.0中的计算属性
  9. Ubuntu更换硬盘
  10. Ddos 反射性防护 simple