以前也编译过,但是每次编译都忘记怎么做,然后都得重新找需要下载的文件。

编译文件:buildall.sh

如果想只编译前端可以这样运行:

buildall.sh -fe_only

编译时会去S3下载一些文件,由于在国外下载很慢,所以可以在本地开ss去下载好再上传到编译服务器上

那么会下载哪些东西呢?

编辑bin/bootstrap_toolchain.py

找到下面这几句话

def wget_and_unpack_package(download_path, file_name, destination, wget_no_clobber):
print "URL {0}".format(download_path)
print "Downloading {0} to {1}".format(file_name, destination)
# --no-clobber avoids downloading the file if a file with the name already exists sh.wget(download_path, directory_prefix=destination, no_clobber=wget_no_clobber)
print "Extracting {0}".format(file_name)
sh.tar(z=True, x=True, f=os.path.join(destination, file_name), directory=destination)
sh.rm(os.path.join(destination, file_name))

把后面4行注释掉,就不会去真正下载了:

def wget_and_unpack_package(download_path, file_name, destination, wget_no_clobber):
print "URL {0}".format(download_path)
print "Downloading {0} to {1}".format(file_name, destination)
# --no-clobber avoids downloading the file if a file with the name already exists
"""
sh.wget(download_path, directory_prefix=destination, no_clobber=wget_no_clobber)
print "Extracting {0}".format(file_name)
sh.tar(z=True, x=True, f=os.path.join(destination, file_name), directory=destination)
sh.rm(os.path.join(destination, file_name))
"""

然后找到这段话:

def bootstrap(toolchain_root, packages):
"""Downloads and unpacks each package in the list `packages` into `toolchain_root` if it
doesn't exist already.
"""
if not try_get_platform_release_label():
check_custom_toolchain(toolchain_root, packages)
return # Detect the compiler
compiler = "gcc-{0}".format(os.environ["IMPALA_GCC_VERSION"]) for p in packages:
pkg_name, pkg_version = unpack_name_and_version(p)
if check_for_existing_package(toolchain_root, pkg_name, pkg_version, compiler):
continue
if pkg_name != "kudu" or os.environ["KUDU_IS_SUPPORTED"] == "true":
download_package(toolchain_root, pkg_name, pkg_version, compiler)
else:
build_kudu_stub(toolchain_root, pkg_version, compiler)
write_version_file(toolchain_root, pkg_name, pkg_version, compiler,
get_platform_release_label())

把最后一句话注释掉:

def bootstrap(toolchain_root, packages):
"""Downloads and unpacks each package in the list `packages` into `toolchain_root` if it
doesn't exist already.
"""
if not try_get_platform_release_label():
check_custom_toolchain(toolchain_root, packages)
return # Detect the compiler
compiler = "gcc-{0}".format(os.environ["IMPALA_GCC_VERSION"]) for p in packages:
pkg_name, pkg_version = unpack_name_and_version(p)
if check_for_existing_package(toolchain_root, pkg_name, pkg_version, compiler):
continue
if pkg_name != "kudu" or os.environ["KUDU_IS_SUPPORTED"] == "true":
download_package(toolchain_root, pkg_name, pkg_version, compiler)
else:
build_kudu_stub(toolchain_root, pkg_version, compiler)
"""
write_version_file(toolchain_root, pkg_name, pkg_version, compiler,
get_platform_release_label())
"""

运行buildall.sh后屏幕就会打印出需要下载的东西,上传到toolchain文件夹就行。上传结束后再把刚才注释的代码恢复就好

加速编译前端

运行下面的命令,去掉测试,只编译前端代码

./buildall.sh -skiptests -fe_only

上面命令运行成功后,找到infra/python/deps/pip_download.py的下段代码

def download_package(pkg_name, pkg_version):
'''Download the required package. Sometimes the download can be flaky, so we use the
retry decorator.'''
pkg_type = 'sdist' # Don't download wheel archives for now
# This JSON endpoint is not provided by PyPI mirrors so we always need to get this
# from pypi.python.org.
pkg_info = json.loads(urlopen('https://pypi.python.org/pypi/%s/json' % pkg_name).read()) downloader = URLopener()
for pkg in pkg_info['releases'][pkg_version]:
if pkg['packagetype'] == pkg_type:
filename = pkg['filename']
expected_md5 = pkg['md5_digest']
if os.path.isfile(filename) and check_md5sum(filename, expected_md5):
print "File with matching md5sum already exists, skipping %s" % filename
return True
pkg_url = "{0}/packages/{1}".format(PYPI_MIRROR, pkg['path'])
print "Downloading %s from %s" % (filename, pkg_url)
downloader.retrieve(pkg_url, filename)
actual_md5 = md5(open(filename).read()).hexdigest()
if check_md5sum(filename, expected_md5):
return True
else:
print "MD5 mismatch in file %s." % filename
return False
print "Could not find archive to download for %s %s %s" % (
pkg_name, pkg_version, pkg_type)
sys.exit(1)

这段代码会打开url链接下载第三方软件,然后检查md5值,非常慢,所以注释掉整个代码,返回True:

def download_package(pkg_name, pkg_version):
return True
'''Download the required package. Sometimes the download can be flaky, so we use the
retry decorator.'''
"""
pkg_type = 'sdist' # Don't download wheel archives for now
# This JSON endpoint is not provided by PyPI mirrors so we always need to get this
# from pypi.python.org.
pkg_info = json.loads(urlopen('https://pypi.python.org/pypi/%s/json' % pkg_name).read()) downloader = URLopener()
for pkg in pkg_info['releases'][pkg_version]:
if pkg['packagetype'] == pkg_type:
filename = pkg['filename']
expected_md5 = pkg['md5_digest']
if os.path.isfile(filename) and check_md5sum(filename, expected_md5):
print "File with matching md5sum already exists, skipping %s" % filename
return True
pkg_url = "{0}/packages/{1}".format(PYPI_MIRROR, pkg['path'])
print "Downloading %s from %s" % (filename, pkg_url)
downloader.retrieve(pkg_url, filename)
actual_md5 = md5(open(filename).read()).hexdigest()
if check_md5sum(filename, expected_md5):
return True
else:
print "MD5 mismatch in file %s." % filename
return False
print "Could not find archive to download for %s %s %s" % (
pkg_name, pkg_version, pkg_type)
sys.exit(1)
"""

修改前端词法、语法解析源码

impala使用了jflex做词法解析,java_cup去做语法解析。

java_cup有个java_cup.runtime.Symbol类用来表示解析到的每个词,其中left属性代表词的行号,right属性代表词的列号,但是用行列来代表词在sql中的位置很不方便,我想要修改成可以获取当前词的开始、末尾在字符串的下标。

因此编辑fe/src/main/jflex/sql-scanner.flex文件,增加一个%char让jflex记录字符偏移变量到yychar

然后把newToken改成这样:

private ExtendSymbol newToken(int id, Object value) {
String text = yytext();
return new ExtendSymbol(id, yyline+1, yycolumn+1, value,
this.yychar, this.yychar + text.length(), text);
}

这样就通过SqlScanner拿到当前词在Reader中的位置了,另外我们希望Symbol还能提供位置信息,所以增加一个子类:

package java_cup.runtime;

import java_cup.runtime.Symbol;

public class ExtendSymbol extends Symbol {
public int start = -1;
public int end = -1;
public String text; public ExtendSymbol(int id, int left, int right, Object value,
int start, int end, String text) {
super(id, left, right, value);
this.start = start;
this.end = end;
this.text = text;
} public ExtendSymbol(int id, ExtendSymbol left, ExtendSymbol right, Object value) {
this(id, left.left, right.right, value, left.start, right.end, null);
} public ExtendSymbol(int id, ExtendSymbol left, ExtendSymbol right) {
this(id, left, right, null);
} public ExtendSymbol(int id, int left, int right, Object value) {
this(id, left, right, value, -1, -1, null);
} public ExtendSymbol(int id, Object o) {
this(id, -1, -1, o);
} public ExtendSymbol(int id, int left, int right) {
this(id, left, right, (Object)null);
} public ExtendSymbol(int sym_num) {
super(sym_num, -1);
} ExtendSymbol(int sym_num, int state) {
super(sym_num, state);
}
}

增加一个符号工厂类:

package java_cup.runtime;

import java_cup.runtime.Symbol;
import java_cup.runtime.SymbolFactory; public class ExtendSymbolFactory implements SymbolFactory {
@Override
public ExtendSymbol newSymbol(String name, int id, Symbol left, Symbol right, Object value) {
return new ExtendSymbol(id, (ExtendSymbol) left, (ExtendSymbol) right, value);
} @Override
public ExtendSymbol newSymbol(String name, int id, Symbol left, Symbol right) {
return new ExtendSymbol(id, (ExtendSymbol) left, (ExtendSymbol) right);
} @Override
public ExtendSymbol newSymbol(String name, int id, Object o) {
return new ExtendSymbol(id, o);
} @Override
public ExtendSymbol newSymbol(String name, int id) {
return new ExtendSymbol(id);
} @Override
public ExtendSymbol startSymbol(String name, int id, int state) {
return new ExtendSymbol(id, state);
}
}

为语法块增加位置信息就比较复杂了,主要给org/apache/impala/analysis中的类增加一个带位置信息和子语法块的类:

package org.apache.impala.analysis;

import java.util.List;

public class SyntaxBlock {
public int startPosition = -1;
public int endPosition = -1;
public List<SyntaxBlock> subBlocks; public SyntaxBlock() {
} public SyntaxBlock(int startPosition, int endPosition) {
this.startPosition = startPosition;
this.endPosition = endPosition;
} public SyntaxBlock(int startPosition, int endPosition, List<SyntaxBlock> subBlocks) {
this.startPosition = startPosition;
this.endPosition = endPosition;
this.subBlocks = subBlocks;
}
}

然后加上一个子类ObjectSyntaxBlock,用来存放String、Object、HashMap、ArrayList、Pair这些类型的语法块

package org.apache.impala.analysis;

import java.util.List;

public class ObjectSyntaxBlock<T> extends SyntaxBlock {
public T objectValue; public ObjectSyntaxBlock() {
} public ObjectSyntaxBlock(T objectValue) {
this.objectValue = objectValue;
} public ObjectSyntaxBlock(int startPosition, int endPosition, T objectValue) {
super(startPosition, endPosition);
this.objectValue = objectValue;
} public ObjectSyntaxBlock(int startPosition, int endPosition, List<SyntaxBlock> subBlocks, T objectValue) {
super(startPosition, endPosition, subBlocks);
this.objectValue = objectValue;
} public T getObjectValue() {
return objectValue;
}
}

最难的一步是修改sql-parse.cup文件,需要把所有语法信息都修改了,非常耗时间

和原版的区别是非终结符的类型如果是String、Object、HashMap、ArrayList、Pair、enum和非org.apache.impala.analysis包的类的话,需要用ObjectSyntaxBlock包装一层

比如

nonterminal List<UnionOperand> values_operand_list;
nonterminal TDescribeOutputStyle describe_output_style;

需要修改成

nonterminal ObjectSyntaxBlock<List<UnionOperand>> values_operand_list;
nonterminal ObjectSyntaxBlock<TDescribeOutputStyle> describe_output_style;

语法块的定义比如

table_ref ::=
dotted_path:path
{: RESULT = new TableRef(path, null); :}
| dotted_path:path alias_clause:alias
{: RESULT = new TableRef(path, alias); :}
| LPAREN query_stmt:query RPAREN alias_clause:alias
{: RESULT = new InlineViewRef(alias, query); :}
;

需要修改为:

table_ref ::=
dotted_path:path
{:
ExtendSymbol _0_symbol = (ExtendSymbol) CUP$SqlParser$stack.peek(); RESULT = new TableRef(path.objectValue, null);
RESULT.startPosition = _0_symbol.start;
RESULT.endPosition = _0_symbol.end;
RESULT.subBlocks = Lists.newArrayList(
(SyntaxBlock) _0_symbol.value
);
:}
| dotted_path:path alias_clause:alias
{:
ExtendSymbol _1_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 1);
ExtendSymbol _0_symbol = (ExtendSymbol) CUP$SqlParser$stack.peek(); RESULT = new TableRef(path.objectValue, alias.objectValue);
RESULT.startPosition = _1_symbol.start;
RESULT.endPosition = _0_symbol.end;
RESULT.subBlocks = Lists.newArrayList(
(SyntaxBlock) _1_symbol.value,
(SyntaxBlock) _0_symbol.value
);
:}
| LPAREN query_stmt:query RPAREN alias_clause:alias
{:
ExtendSymbol _3_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 3);
ExtendSymbol _2_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 2);
ExtendSymbol _1_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 1);
ExtendSymbol _0_symbol = (ExtendSymbol) CUP$SqlParser$stack.peek(); RESULT = new InlineViewRef(alias.objectValue, query);
RESULT.startPosition = _3_symbol.start;
RESULT.endPosition = _0_symbol.end;
RESULT.subBlocks = Lists.newArrayList(
(SyntaxBlock) _3_symbol.value,
(SyntaxBlock) _2_symbol.value,
(SyntaxBlock) _1_symbol.value,
(SyntaxBlock) _0_symbol.value
);
:}
;

像SelectStmt这种类,我们让他直接或间接继承SyntaxBlock,所以可以直接设置位置信息,不用包装

select_stmt ::=
select_clause:selectList
{:
RESULT = new SelectStmt(selectList, null, null, null, null, null, null);
:}
|
select_clause:selectList
from_clause:fromClause
where_clause:wherePredicate
group_by_clause:groupingExprs
having_clause:havingPredicate
opt_order_by_clause:orderByClause
opt_limit_offset_clause:limitOffsetClause
{:
RESULT = new SelectStmt(selectList, fromClause, wherePredicate, groupingExprs,
havingPredicate, orderByClause, limitOffsetClause);
:}
;

改成:

select_stmt ::=
select_clause:selectList
{:
ExtendSymbol _0_symbol = (ExtendSymbol) CUP$SqlParser$stack.peek(); RESULT = new SelectStmt(selectList, null, null, null, null, null, null);
RESULT.startPosition = _0_symbol.start;
RESULT.endPosition = _0_symbol.end;
RESULT.subBlocks = Lists.newArrayList(
(SyntaxBlock) _0_symbol.value
);
:}
|
select_clause:selectList
from_clause:fromClause
where_clause:wherePredicate
group_by_clause:groupingExprs
having_clause:havingPredicate
opt_order_by_clause:orderByClause
opt_limit_offset_clause:limitOffsetClause
{:
ExtendSymbol _6_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 6);
ExtendSymbol _5_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 5);
ExtendSymbol _4_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 4);
ExtendSymbol _3_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 3);
ExtendSymbol _2_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 2);
ExtendSymbol _1_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 1);
ExtendSymbol _0_symbol = (ExtendSymbol) CUP$SqlParser$stack.peek(); RESULT = new SelectStmt(selectList, fromClause, wherePredicate, groupingExprs.objectValue,
havingPredicate, orderByClause.objectValue, limitOffsetClause);
RESULT.startPosition = _6_symbol.start;
RESULT.endPosition = _0_symbol.end;
RESULT.subBlocks = Lists.newArrayList(
(SyntaxBlock) _6_symbol.value,
(SyntaxBlock) _5_symbol.value,
(SyntaxBlock) _4_symbol.value,
(SyntaxBlock) _3_symbol.value,
(SyntaxBlock) _2_symbol.value,
(SyntaxBlock) _1_symbol.value,
(SyntaxBlock) _0_symbol.value
);
:}
;

其中groupingExprs.objectValue是因为groupingExprsopt_order_by_clause语法块的对象,opt_order_by_clause是ObjectSyntaxBlock类型的,所以本来可以直接引用,现在需要加上objectValue才能访问到包装里面的对象

github地址:impala增加语法块位置库

最新文章

  1. 深入学习 celery
  2. 64位电脑上配置mysql-connector-odbc的方法
  3. Android蓝牙实例(和单片机蓝牙模块通信)
  4. My97DatePicker控件
  5. 优化LibreOffice如此简单
  6. expect语法
  7. iOS 网络编程:JSON解析
  8. 在C#里实现各种窗口切换特效,多达13种特效
  9. 从汇编来看c语言之变量
  10. LeeCode-Rotate Array
  11. Extract Datasets
  12. 数据结构录 之 单调队列&amp;单调栈。
  13. winform 适配high dpi
  14. noip2012
  15. hdoj:2031
  16. [ML学习笔记] XGBoost算法
  17. VS编程,编辑WPF过程中,点击设计器中界面某一控件,在XAML中高亮突出显示相应的控件代码的设置方法。
  18. Python学习笔记之逻辑回归
  19. tornado 路由、模板语言、session
  20. PLSQL Developer 配置Oralce11g连接 转

热门文章

  1. Elasticsearch的Groovy Script自定义评分检索
  2. js邮箱,汉字,数字 表单验证
  3. Github上值得关注的前端项目-转自好友trigkit4
  4. 洛谷 P2633 Count on a tree 主席树
  5. python3安装xadmin失败
  6. 15条JavaScript最佳实践【转】
  7. mybatis中 #跟$的区别
  8. redhat下搭建jdk+tomcat环境
  9. CodeForces 52B Right Triangles 矩阵上的计数
  10. 迷宫求解_数据结构c语言版