Problems using the IK analyzer after upgrading Lucene to 4.6.0 or later
2024-09-29 23:41:10
After upgrading lucene-core from 4.5.1 to 4.7.0, the following code using the IK analyzer throws an exception:
IKAnalyzer analyzer = new IKAnalyzer(true);
StringReader reader=new StringReader(line);
TokenStream ts=analyzer.tokenStream("", reader);
CharTermAttribute term=ts.getAttribute(CharTermAttribute.class);
while(ts.incrementToken()){
...
}
Exception message:
java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
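This turned out to be caused by a change in how TokenStream must be used starting with Lucene 4.6.0: reset() must be called before incrementToken(). For details, see the API docs: http://lucene.apache.org/core/4_6_0/core/index.html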
The workflow of the new TokenStream API is as follows:
- Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
- The consumer calls reset().
- The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
- The consumer calls incrementToken() until it returns false, consuming the attributes after each call.
- The consumer calls end() so that any end-of-stream operations can be performed.
- The consumer calls close() to release any resource when finished using the TokenStream.
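After adding a reset() call before the loop, the code runs correctly:
IKAnalyzer analyzer = new IKAnalyzer(true);
StringReader reader = new StringReader(line);
TokenStream ts = analyzer.tokenStream("", reader);
CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
ts.reset(); // required before the first incrementToken() call
while (ts.incrementToken()) {
...
}
ts.end();   // per the documented workflow: perform end-of-stream operations
ts.close(); // per the documented workflow: release resources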
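To see why the call order matters, the consuming workflow quoted from the Javadoc can be modelled with a small self-contained sketch. It does not use Lucene or IK at all. `ToyTokenStream` and `consume` are hypothetical names invented for this illustration, and the check inside `incrementToken()` is only an assumption about the general idea, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TokenStreamContractDemo {

    /** Hypothetical toy stream: it only mimics the reset()-before-incrementToken() check. */
    public static class ToyTokenStream {
        private final List<String> tokens;
        private Iterator<String> it; // stays null until reset() is called
        private String term;

        public ToyTokenStream(String text) {
            // Naive whitespace split; a real analyzer does far more than this.
            this.tokens = Arrays.asList(text.split(" "));
        }

        public void reset() {
            it = tokens.iterator();
        }

        public boolean incrementToken() {
            if (it == null) {
                // Same exception type as in the post, raised for the same kind of mistake.
                throw new IllegalStateException("contract violation: reset() call missing");
            }
            if (!it.hasNext()) {
                return false;
            }
            term = it.next();
            return true;
        }

        public String term() {
            return term;
        }

        public void end() {
            // Nothing to do in the toy model.
        }

        public void close() {
            it = null; // the stream must be reset() again before reuse
        }
    }

    /** Consumes a stream in the documented order: reset, incrementToken loop, end, close. */
    public static List<String> consume(ToyTokenStream ts) {
        List<String> out = new ArrayList<>();
        ts.reset();
        try {
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consume(new ToyTokenStream("lucene ik analyzer")));
        try {
            new ToyTokenStream("oops").incrementToken(); // reset() skipped on purpose
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

Putting `close()` in a `finally` block is a design choice of this sketch: the stream is released even if the loop body throws.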