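In Hadoop, a single task may be executed on several nodes at the same time, so when those nodes finish, the task can end up with more than one result. Hadoop uses OutputCommitter to prevent this: its job is to make sure output is committed correctly when a job or task completes. Specifically, OutputCommitter is responsible for:

1. Setting up the job during initialization. For example, creating the job's temporary output directory when the job is initialized.

2. Cleaning up after the job completes. For example, deleting the temporary output directory once the job has finished.

3. Setting up the task's temporary output: creating a side-effect file under the job's temporary directory.

4. Checking whether a task needs to be committed. If the task's output has already been committed, this prevents it from being committed a second time.

5. Committing the task's output.

6. Aborting the task commit.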
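The six responsibilities above can be tried out without any Hadoop dependency. The following is only a minimal sketch, assuming a local directory in place of HDFS; `SketchCommitter`, its method signatures, the `_temporary` directory name, and the attempt IDs are all hypothetical choices for illustration and are not Hadoop's implementation. Two attempts of the same task both write output, but only the first one to commit is promoted to the final location.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical, simplified committer on a local directory; not the Hadoop class. */
class SketchCommitter {
    private final Path outputDir; // final output location
    private final Path tempDir;   // job-level temporary directory (name is an assumption)

    SketchCommitter(Path outputDir) {
        this.outputDir = outputDir;
        this.tempDir = outputDir.resolve("_temporary");
    }

    // 1. Job setup: create the temporary output directory.
    void setupJob() throws IOException {
        Files.createDirectories(tempDir);
    }

    // 2. Job cleanup: delete the temporary output directory.
    void cleanupJob() throws IOException {
        deleteRecursively(tempDir);
    }

    // 3. Task setup: every attempt gets its own working directory under the temp dir.
    Path setupTask(String attemptId) throws IOException {
        return Files.createDirectories(tempDir.resolve(attemptId));
    }

    // 4. A commit is needed only if this attempt wrote output and the task
    //    has not already been promoted by another attempt.
    boolean needsTaskCommit(String taskId, String attemptId) {
        return Files.exists(tempDir.resolve(attemptId).resolve(taskId))
                && !Files.exists(outputDir.resolve(taskId));
    }

    // 5. Commit: move the attempt's file to the final output location.
    void commitTask(String taskId, String attemptId) throws IOException {
        Files.move(tempDir.resolve(attemptId).resolve(taskId), outputDir.resolve(taskId));
    }

    // 6. Abort: discard everything the attempt wrote.
    void abortTask(String attemptId) throws IOException {
        deleteRecursively(tempDir.resolve(attemptId));
    }

    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order lists children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** Runs two attempts of one task and returns the content that was committed. */
    static String demo(Path outputDir) throws IOException {
        SketchCommitter committer = new SketchCommitter(outputDir);
        String taskId = "part-00000";
        List<String> attempts = Arrays.asList("attempt_0", "attempt_1");

        committer.setupJob();
        for (String attempt : attempts) {
            Path workDir = committer.setupTask(attempt);
            byte[] data = ("written by " + attempt).getBytes(StandardCharsets.UTF_8);
            Files.write(workDir.resolve(taskId), data);
        }
        for (String attempt : attempts) {
            if (committer.needsTaskCommit(taskId, attempt)) {
                committer.commitTask(taskId, attempt);
            } else {
                committer.abortTask(attempt);
            }
        }
        committer.cleanupJob();
        return new String(Files.readAllBytes(outputDir.resolve(taskId)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("committer-sketch");
        System.out.println(demo(out)); // prints "written by attempt_0"
    }
}
```

In this sketch the "first to commit wins" decision is made by checking the final location on disk. That is a simplification chosen to keep the example self-contained.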

Here is the OutputCommitter source code:

    package org.apache.hadoop.mapreduce;

    import java.io.IOException;

    public abstract class OutputCommitter {
      /**
       * For the framework to setup the job output during initialization. This is
       * called from the application master process for the entire job. This will be
       * called multiple times, once per job attempt.
       * Note: sets up the job's output at initialization; invoked by the job's
       * application master, once per job attempt.
       * @param jobContext Context of the job whose output is being written.
       * @throws IOException if temporary output could not be created
       */
      public abstract void setupJob(JobContext jobContext) throws IOException;

      /**
       * For cleaning up the job's output after job completion. This is called
       * from the application master process for the entire job. This may be called
       * multiple times.
       * Note: cleans up the job's output after the job finishes; invoked by the
       * job's application master, possibly more than once. This method is no
       * longer used; commitJob and abortJob replace it.
       * @param jobContext Context of the job whose output is being written.
       * @throws IOException
       * @deprecated Use {@link #commitJob(JobContext)} and
       * {@link #abortJob(JobContext, JobStatus.State)} instead.
       */
      @Deprecated
      public void cleanupJob(JobContext jobContext) throws IOException { }

      /**
       * For committing job's output after successful job completion. Note that this
       * is invoked for jobs with final runstate as SUCCESSFUL. This is called
       * from the application master process for the entire job. This is guaranteed
       * to only be called once. If it throws an exception the entire job will
       * fail.
       * Note: commits the job's output once the job has completed successfully.
       * Invoked only for jobs whose final run state is SUCCESSFUL, by the job's
       * application master, and only once.
       * @param jobContext Context of the job whose output is being written.
       * @throws IOException
       */
      public void commitJob(JobContext jobContext) throws IOException {
        cleanupJob(jobContext);
      }

      /**
       * For aborting an unsuccessful job's output. Note that this is invoked for
       * jobs with final runstate as {@link JobStatus.State#FAILED} or
       * {@link JobStatus.State#KILLED}. This is called from the application
       * master process for the entire job. This may be called multiple times.
       * Note: aborts the output of an unsuccessful job. Invoked for jobs whose
       * final run state is FAILED or KILLED; the application master may call it
       * more than once.
       * @param jobContext Context of the job whose output is being written.
       * @param state final runstate of the job
       * @throws IOException
       */
      public void abortJob(JobContext jobContext, JobStatus.State state)
          throws IOException {
        cleanupJob(jobContext);
      }

      /**
       * Sets up output for the task. This is called from each individual task's
       * process that will output to HDFS, and it is called just for that task. This
       * may be called multiple times for the same task, but for different task
       * attempts.
       * Note: sets up the task's output. Called from each individual task's process
       * that writes to HDFS; may be called several times for the same task, once
       * per attempt.
       * @param taskContext Context of the task whose output is being written.
       * @throws IOException
       */
      public abstract void setupTask(TaskAttemptContext taskContext)
          throws IOException;

      /**
       * Check whether task needs a commit. This is called from each individual
       * task's process that will output to HDFS, and it is called just for that
       * task.
       * Note: checks whether the task needs to be committed.
       * @param taskContext
       * @return true/false
       * @throws IOException
       */
      public abstract boolean needsTaskCommit(TaskAttemptContext taskContext)
          throws IOException;

      /**
       * To promote the task's temporary output to final output location.
       * If {@link #needsTaskCommit(TaskAttemptContext)} returns true and this
       * task is the task that the AM determines finished first, this method
       * is called to commit an individual task's output. This is to mark
       * that tasks output as complete, as {@link #commitJob(JobContext)} will
       * also be called later on if the entire job finished successfully. This
       * is called from a task's process. This may be called multiple times for the
       * same task, but different task attempts. It should be very rare for this to
       * be called multiple times and requires odd networking failures to make this
       * happen. In the future the Hadoop framework may eliminate this race.
       *
       * @param taskContext Context of the task whose output is being written.
       * @throws IOException if commit is not successful.
       */
      public abstract void commitTask(TaskAttemptContext taskContext)
          throws IOException;

      /**
       * Discard the task output. This is called from a task's process to clean
       * up a single task's output that can not yet been committed. This may be
       * called multiple times for the same task, but for different task attempts.
       * Note: discards the task's output.
       * @param taskContext
       * @throws IOException
       */
      public abstract void abortTask(TaskAttemptContext taskContext)
          throws IOException;

      /**
       * Is task output recovery supported for restarting jobs?
       *
       * If task output recovery is supported, job restart can be done more
       * efficiently.
       *
       * @return <code>true</code> if task output recovery is supported,
       * <code>false</code> otherwise
       * @see #recoverTask(TaskAttemptContext)
       */
      public boolean isRecoverySupported() {
        return false;
      }

      /**
       * Recover the task output.
       *
       * The retry-count for the job will be passed via the
       * {@link MRJobConfig#APPLICATION_ATTEMPT_ID} key in
       * {@link TaskAttemptContext#getConfiguration()} for the
       * <code>OutputCommitter</code>. This is called from the application master
       * process, but it is called individually for each task.
       *
       * If an exception is thrown the task will be attempted again.
       *
       * This may be called multiple times for the same task. But from different
       * application attempts.
       *
       * @param taskContext Context of the task whose output is being recovered
       * @throws IOException
       */
      public void recoverTask(TaskAttemptContext taskContext)
          throws IOException
      {}
    }
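To make the call order described in the Javadoc above easier to follow, here is a small sketch. `MiniCommitter`, `RecordingCommitter`, and the `run` driver are hypothetical stand-ins rather than Hadoop classes, and the driver's ordering is a simplified assumption for a job with a single task attempt. It records which hooks fire, and shows that the default `commitJob` and `abortJob` both fall through to `cleanupJob`, as in the listing.

```java
import java.util.ArrayList;
import java.util.List;

class CommitOrderSketch {
    /** Hypothetical stand-in that mirrors the shape of the hooks in the listing. */
    abstract static class MiniCommitter {
        abstract void setupJob();
        void cleanupJob() { }
        // As in the listing, both job-level endings delegate to cleanupJob() by default.
        void commitJob() { cleanupJob(); }
        void abortJob() { cleanupJob(); }
        abstract void setupTask(String attemptId);
        abstract boolean needsTaskCommit(String attemptId);
        abstract void commitTask(String attemptId);
        abstract void abortTask(String attemptId);
    }

    /** Records the name of every hook that is invoked. */
    static class RecordingCommitter extends MiniCommitter {
        final List<String> calls = new ArrayList<>();

        @Override void setupJob() { calls.add("setupJob"); }
        @Override void cleanupJob() { calls.add("cleanupJob"); }
        @Override void setupTask(String attemptId) { calls.add("setupTask"); }
        @Override boolean needsTaskCommit(String attemptId) {
            calls.add("needsTaskCommit");
            return true; // this sketch always has output to commit
        }
        @Override void commitTask(String attemptId) { calls.add("commitTask"); }
        @Override void abortTask(String attemptId) { calls.add("abortTask"); }
    }

    /** Simplified driver (an assumption): one job with a single task attempt. */
    static List<String> run(boolean taskSucceeds) {
        RecordingCommitter committer = new RecordingCommitter();
        String attemptId = "attempt_0";

        committer.setupJob();
        committer.setupTask(attemptId);
        if (taskSucceeds) {
            if (committer.needsTaskCommit(attemptId)) {
                committer.commitTask(attemptId);
            }
            committer.commitJob();
        } else {
            committer.abortTask(attemptId);
            committer.abortJob();
        }
        return committer.calls;
    }

    public static void main(String[] args) {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}
```

Running `main` prints `[setupJob, setupTask, needsTaskCommit, commitTask, cleanupJob]` for the successful path and `[setupJob, setupTask, abortTask, cleanupJob]` for the failed one. Because `RecordingCommitter` does not override `commitJob` or `abortJob`, the trailing `cleanupJob` entry comes from the default delegation.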
