Writing UDTFs

GenericUDTF Interface

A custom UDTF can be created by extending the GenericUDTF abstract class and then implementing the initialize, process, and possibly close methods. Hive calls initialize() to tell the UDTF which argument types to expect. The UDTF must then return an object inspector corresponding to the row objects that it will generate. Once initialize() has been called, Hive passes rows to the UDTF through the process() method. While in process(), the UDTF can produce and forward rows to other operators by calling forward(). Lastly, Hive calls the close() method once all the rows have been passed to the UDTF.

UDTF Example:

 
 
 
 
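import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

/**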
 * GenericUDTFCount2 outputs the number of rows seen, twice. It's output twice
 * to test outputting of rows on close with lateral view.
 *
 */
public class GenericUDTFCount2 extends GenericUDTF {
 
  Integer count = Integer.valueOf(0);
  Object forwardObj[] = new Object[1];
 
  @Override
  public void close() throws HiveException {
    forwardObj[0] = count;
    forward(forwardObj);
    forward(forwardObj);
  }
 
  @Override
  public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
    ArrayList<String> fieldNames = new ArrayList<String>();
    ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
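    fieldNames.add("col1");
    // The single output column holds the row count as a Java Integer.
    fieldOIs.add(PrimitiveObjectInspectorFactory.javaIntObjectInspector);
    return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,
        fieldOIs);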
  }
 
  @Override
  public void process(Object[] args) throws HiveException {
    count = Integer.valueOf(count.intValue() + 1);
  }
 
}
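
To make the call order concrete, here is a minimal, dependency-free sketch of the same counting logic. It does not use Hive: RowSink, MiniUDTF, MiniCount2 and LifecycleDemo are hypothetical stand-ins invented for illustration, and the object inspector step is left out entirely. The driver only mimics the sequence described above (set the collector, call process() once per input row, then call close()); Hive's real operator code is more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hive's Collector: receives every forwarded row.
interface RowSink {
  void collect(Object row);
}

// Hypothetical stand-in for GenericUDTF, reduced to the row lifecycle hooks.
abstract class MiniUDTF {
  private RowSink sink;

  final void setCollector(RowSink sink) {
    this.sink = sink;
  }

  protected final void forward(Object row) {
    sink.collect(row);
  }

  abstract void process(Object[] args);

  abstract void close();
}

// Same idea as GenericUDTFCount2: count input rows, emit the count twice on close.
class MiniCount2 extends MiniUDTF {
  private int count = 0;

  @Override
  void process(Object[] args) {
    count++; // nothing is forwarded while rows are still arriving
  }

  @Override
  void close() {
    forward(new Object[] {count});
    forward(new Object[] {count});
  }
}

class LifecycleDemo {
  // Feeds every input row to process(), then calls close() and returns the output rows.
  static List<Object[]> run(MiniUDTF udtf, Object[][] inputRows) {
    List<Object[]> out = new ArrayList<>();
    udtf.setCollector(row -> out.add((Object[]) row));
    for (Object[] row : inputRows) {
      udtf.process(row);
    }
    udtf.close();
    return out;
  }

  public static void main(String[] args) {
    Object[][] input = { {"a"}, {"b"}, {"c"} };
    for (Object[] row : run(new MiniCount2(), input)) {
      System.out.println(row[0]); // prints 3 on each of two lines
    }
  }
}
```

Three input rows produce two output rows, each carrying the count 3, because all forwarding happens in close().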

For reference, here is the abstract class:

 
 
/**
 * A Generic User-defined Table Generating Function (UDTF)
 *
 * Generates a variable number of output rows for a single input row. Useful for
 * explode(array)...
 */
 
public abstract class GenericUDTF {
  Collector collector = null;
 
  /**
   * Initialize this GenericUDTF. This will be called only once per instance.
   *
   * @param argOIs
   *          An array of ObjectInspectors for the arguments
   * @return A StructObjectInspector for output. The output struct represents a
   *         row of the table where the fields of the struct are the columns. The
   *         field names are unimportant as they will be overridden by user
   *         supplied column aliases.
   */
  public abstract StructObjectInspector initialize(ObjectInspector[] argOIs)
      throws UDFArgumentException;
 
  /**
   * Give a set of arguments for the UDTF to process.
   *
   * @param args
   *          object array of arguments
   */
  public abstract void process(Object[] args) throws HiveException;
 
  /**
   * Called to notify the UDTF that there are no more rows to process.
   * Clean up code or additional forward() calls can be made here.
   */
  public abstract void close() throws HiveException;
 
  /**
   * Associates a collector with this UDTF. Can't be specified in the
   * constructor as the UDTF may be initialized before the collector has been
   * constructed.
   *
   * @param collector
   */
  public final void setCollector(Collector collector) {
    this.collector = collector;
  }
 
  /**
   * Passes an output row to the collector.
   *
   * @param o
   * @throws HiveException
   */
  protected final void forward(Object o) throws HiveException {
    collector.collect(o);
  }
 
}
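
The class comment mentions explode(array) as a typical use, where rows are forwarded from process() rather than from close(). The sketch below illustrates that pattern without any Hive dependency. SketchUDTF, ExplodeSketch and ExplodeDemo are hypothetical names made up for this example, the input is assumed to be a plain java.util.List instead of a value read through an ObjectInspector, and forwarded rows are kept in a list instead of being handed to a Collector. It is not how Hive's built-in explode is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for GenericUDTF: stores forwarded rows in a list
// instead of passing them to a Hive Collector.
abstract class SketchUDTF {
  private final List<Object[]> forwarded = new ArrayList<>();

  protected final void forward(Object[] row) {
    forwarded.add(row);
  }

  abstract void process(Object[] args);

  // Nothing is buffered in this sketch, so close() defaults to a no-op.
  void close() {}

  List<Object[]> forwardedRows() {
    return forwarded;
  }
}

// Forwards one single-column row per element of the list in args[0].
class ExplodeSketch extends SketchUDTF {
  @Override
  void process(Object[] args) {
    List<?> items = (List<?>) args[0];
    if (items == null) {
      return; // a null list produces no output rows
    }
    for (Object item : items) {
      forward(new Object[] {item});
    }
  }
}

class ExplodeDemo {
  // Calls process() once per input row, then close(), and returns all forwarded rows.
  static List<Object[]> run(SketchUDTF udtf, Object[][] inputRows) {
    for (Object[] row : inputRows) {
      udtf.process(row);
    }
    udtf.close();
    return udtf.forwardedRows();
  }

  public static void main(String[] args) {
    Object[][] input = {
      {Arrays.asList("a", "b")},
      {Arrays.asList()},
      {Arrays.asList("c")}
    };
    for (Object[] row : run(new ExplodeSketch(), input)) {
      System.out.println(row[0]); // prints a, b, c on separate lines
    }
  }
}
```

A single input row can yield zero, one, or many output rows here, which is the "variable number of output rows" behavior the class comment describes.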
 
