Reposted from: http://www.lai18.com/content/7084969.html

Facet overview

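When browsing websites we often narrow results by one category of criteria. This is most common on e-commerce sites. On Tmall, for example, selecting a brand makes the site list the products of that brand.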

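On such a page, the dimensions we care about are brand, popular picks and the like. A feature like this can of course be built with Lucene term queries. With very large data volumes, however, plain queries become inefficient for two reasons. First, they involve expensive operations such as FSDirectory.open. Second, gathering the statistics with ordinary queries means going over the documents one by one.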

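Lucene provides facet queries for grouping documents of the same kind. At query time you first focus on one aspect (one dimension), which narrows the search scope and therefore improves query efficiency.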
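To make the idea concrete, here is a minimal, library-free sketch of what faceting computes. For one dimension (brand), it counts how many documents in the current result set carry each value. It then narrows ("drills down") to one value. The Product and FacetSketch classes and the sample data are hypothetical. This is not Lucene's API; Lucene stores and counts facet values far more efficiently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical document model: each product has a brand and a category.
class Product {
    final String name;
    final String brand;
    final String category;

    Product(String name, String brand, String category) {
        this.name = name;
        this.brand = brand;
        this.category = category;
    }
}

class FacetSketch {
    // Count how many products in the result set carry each brand value.
    static Map<String, Integer> countByBrand(List<Product> results) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (Product p : results) {
            counts.merge(p.brand, 1, Integer::sum);
        }
        return counts;
    }

    // "Drill down": keep only the products whose brand matches the selected value.
    static List<Product> drillDown(List<Product> results, String brand) {
        List<Product> narrowed = new ArrayList<>();
        for (Product p : results) {
            if (p.brand.equals(brand)) {
                narrowed.add(p);
            }
        }
        return narrowed;
    }

    public static void main(String[] args) {
        List<Product> results = List.of(
            new Product("phone A", "BrandX", "phone"),
            new Product("phone B", "BrandY", "phone"),
            new Product("tablet C", "BrandX", "tablet"));
        System.out.println(countByBrand(results));                // {BrandX=2, BrandY=1}
        System.out.println(drillDown(results, "BrandX").size());  // 2
    }
}
```

The counts are what a facet panel displays next to each brand. The drill-down is what happens when the user clicks one of them.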

The facet module provides a number of classes and methods for counting facets and aggregating facet values.

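To implement facets we first need to understand FacetField. A FacetField defines a dimension (dim) and the path under that dimension. Note in particular that a document containing FacetFields must first be passed through FacetsConfig.build(...) before it is indexed. For taxonomy-based facets this is the overload that also takes the TaxonomyWriter. You then add the Document that build returns.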
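In Lucene a facet field is created as new FacetField(dim, path...). Examples are new FacetField("Brand", "BrandX") or a hierarchical new FacetField("Publish Date", "2010", "10", "15"); hierarchical dimensions are also declared with FacetsConfig.setHierarchical. The sketch below is a hypothetical stand-in that only mirrors the (dim, path...) shape. It expands a field into the chain of categories from the dimension down to the leaf, which is the hierarchy the taxonomy has to record. The "/" separator is for display only and is an assumption, not Lucene's internal encoding.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a facet field: a dimension plus a path under it.
// It mirrors the shape of Lucene's FacetField but is not that class.
class SimpleFacetField {
    final String dim;
    final String[] path;

    SimpleFacetField(String dim, String... path) {
        if (path.length == 0) {
            throw new IllegalArgumentException("path must have at least one component");
        }
        this.dim = dim;
        this.path = path;
    }

    // Expand dim + path into every prefix label, from the dimension down to the leaf.
    List<String> prefixLabels() {
        List<String> labels = new ArrayList<>();
        StringBuilder current = new StringBuilder(dim);
        labels.add(current.toString());
        for (String component : path) {
            current.append('/').append(component);
            labels.add(current.toString());
        }
        return labels;
    }

    public static void main(String[] args) {
        SimpleFacetField f = new SimpleFacetField("Publish Date", "2010", "10", "15");
        System.out.println(f.prefixLabels());
        // [Publish Date, Publish Date/2010, Publish Date/2010/10, Publish Date/2010/10/15]
    }
}
```

A flat dimension such as Brand yields just two labels (the dimension and the value). A hierarchical one yields one label per level, which is what allows counting at any level (for example per year or per month).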

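FacetField's indexOptions are set to DOCS_AND_FREQS_AND_POSITIONS. The field is therefore indexed together with the frequency and the positions of each occurrence, mainly to make querying and counting convenient.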
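DOCS_AND_FREQS_AND_POSITIONS means that, for every term, the inverted index records three things: which documents contain it, how often, and at which token positions. The toy inverted index below shows those three pieces of information for whitespace-tokenized text. It is a hypothetical illustration of the index option only; Lucene encodes postings on disk very differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// One posting: a document id plus the positions where the term occurs in it.
// The frequency is simply the number of recorded positions.
class Posting {
    final int docId;
    final List<Integer> positions = new ArrayList<>();

    Posting(int docId) {
        this.docId = docId;
    }

    int freq() {
        return positions.size();
    }
}

class ToyInvertedIndex {
    // term -> postings list; assumes each docId is added once, in increasing order
    final Map<String, List<Posting>> postings = new TreeMap<>();

    void addDocument(int docId, String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return; // nothing to index
        }
        String[] tokens = trimmed.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            List<Posting> list = postings.computeIfAbsent(tokens[pos], t -> new ArrayList<>());
            // Reuse the posting if this document already produced one for the term.
            Posting last = list.isEmpty() ? null : list.get(list.size() - 1);
            if (last == null || last.docId != docId) {
                last = new Posting(docId);
                list.add(last);
            }
            last.positions.add(pos);
        }
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument(0, "red phone red case");
        index.addDocument(1, "blue phone");
        for (Posting p : index.postings.get("red")) {
            // DOCS -> docId, FREQS -> freq(), POSITIONS -> positions
            System.out.println(p.docId + " freq=" + p.freq() + " positions=" + p.positions);
        }
        // prints: 0 freq=2 positions=[0, 2]
    }
}
```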

Correspondingly, at indexing time we need FacetsConfig and DirectoryTaxonomyWriter.

DirectoryTaxonomyWriter writes the taxonomy information to disk through a Directory.
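A taxonomy is essentially a dictionary from category paths to dense integer ordinals, plus the parent of each category. The sketch below is a hypothetical in-memory stand-in for the idea, not Lucene's implementation. It mimics two behaviors of DirectoryTaxonomyWriter: the root category always exists with ordinal 0, and adding a path also adds any missing ancestors. Persisting to a Directory, caching and epochs are deliberately left out. In real code you would open a DirectoryTaxonomyWriter on its own Directory, next to the IndexWriter for the main index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory taxonomy: category path -> ordinal, plus parent ordinals.
// Path components are joined with '/' and no escaping is done (toy simplification).
class ToyTaxonomy {
    private final Map<String, Integer> ordinals = new HashMap<>();
    private final List<Integer> parents = new ArrayList<>();

    ToyTaxonomy() {
        // The root category always exists and has ordinal 0.
        ordinals.put("", 0);
        parents.add(-1); // the root has no parent
    }

    // Returns the ordinal of the category, adding it and any missing ancestors first.
    int addCategory(String... components) {
        int parent = 0; // start from the root
        StringBuilder key = new StringBuilder();
        for (String c : components) {
            key.append('/').append(c);
            Integer ord = ordinals.get(key.toString());
            if (ord == null) {
                ord = parents.size(); // next dense ordinal
                ordinals.put(key.toString(), ord);
                parents.add(parent);
            }
            parent = ord;
        }
        return parent;
    }

    int parentOf(int ordinal) {
        return parents.get(ordinal);
    }

    int size() {
        return parents.size();
    }

    public static void main(String[] args) {
        ToyTaxonomy taxo = new ToyTaxonomy();
        int leaf = taxo.addCategory("Publish Date", "2010", "10");  // adds 3 new categories
        int again = taxo.addCategory("Publish Date", "2010", "10"); // already present
        System.out.println(leaf + " " + again + " " + taxo.size()); // 3 3 4
        System.out.println(taxo.parentOf(leaf));                    // 2
    }
}
```

Because ordinals are dense and never reused, a document can store small integers instead of full category strings, and counting can use a plain int array indexed by ordinal.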

The DirectoryTaxonomyWriter constructor looks like this:

    public DirectoryTaxonomyWriter(Directory directory, OpenMode openMode,
        TaxonomyWriterCache cache) throws IOException {

      dir = directory;
      IndexWriterConfig config = createIndexWriterConfig(openMode);
      indexWriter = openIndexWriter(dir, config);

      // verify (to some extent) that merge policy in effect would preserve category docids
      assert !(indexWriter.getConfig().getMergePolicy() instanceof TieredMergePolicy) :
        "for preserving category docids, merging none-adjacent segments is not allowed";

      // after we opened the writer, and the index is locked, it's safe to check
      // the commit data and read the index epoch
      openMode = config.getOpenMode();
      if (!DirectoryReader.indexExists(directory)) {
        indexEpoch = 1;
      } else {
        String epochStr = null;
        Map<String, String> commitData = readCommitData(directory);
        if (commitData != null) {
          epochStr = commitData.get(INDEX_EPOCH);
        }
        // no commit data, or no epoch in it means an old taxonomy, so set its epoch to 1, for lack
        // of a better value.
        indexEpoch = epochStr == null ? 1 : Long.parseLong(epochStr, 16);
      }

      if (openMode == OpenMode.CREATE) {
        ++indexEpoch;
      }

      FieldType ft = new FieldType(TextField.TYPE_NOT_STORED);
      ft.setOmitNorms(true);
      parentStreamField = new Field(Consts.FIELD_PAYLOADS, parentStream, ft);
      fullPathField = new StringField(Consts.FULL, "", Field.Store.YES);

      nextID = indexWriter.maxDoc();

      if (cache == null) {
        cache = defaultTaxonomyWriterCache();
      }
      this.cache = cache;

      if (nextID == 0) {
        cacheIsComplete = true;
        // Make sure that the taxonomy always contain the root category
        // with category id 0.
        addCategory(new FacetLabel());
      } else {
        // There are some categories on the disk, which we have not yet
        // read into the cache, and therefore the cache is incomplete.
        // We choose not to read all the categories into the cache now,
        // to avoid terrible performance when a taxonomy index is opened
        // to add just a single category. We will do it later, after we
        // notice a few cache misses.
        cacheIsComplete = false;
      }
    }

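As the code shows, DirectoryTaxonomyWriter first opens an IndexWriter on the directory. Once the writer is open and the index is locked, it determines the index epoch. If no index exists yet, the epoch is 1. Otherwise it reads the commit data of the last commit. If there is no commit data, or the commit data holds no epoch (an old taxonomy), the epoch is also set to 1; otherwise the stored hexadecimal value is parsed. If the writer is opened with OpenMode.CREATE, the epoch is then incremented. Next, the constructor prepares the fields used to store categories and sets nextID to the number of documents already in the index (indexWriter.maxDoc()). It then installs the cache, falling back to the default TaxonomyWriterCache when none is passed in. Finally, it checks whether the directory already contains category documents. If it does not (nextID == 0), cacheIsComplete is set to true and the root category is added with category id 0. Otherwise cacheIsComplete is false, because the categories on disk have not been loaded into the cache yet.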
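The two decisions described above can be isolated into small pure functions: the index epoch, and whether the cache starts out complete. The sketch below is a hypothetical restatement for illustration only. The class, method and parameter names are our own assumptions. So is the local OpenMode enum, which mirrors the constants of Lucene's IndexWriterConfig.OpenMode. The commit-data key is passed in as a parameter rather than taken from Lucene's INDEX_EPOCH constant. The inputs stand in for what the real constructor reads from the Directory and the IndexWriter.

```java
import java.util.Map;

// Hypothetical restatement of two branches of the DirectoryTaxonomyWriter constructor.
class TaxonomyOpenLogic {
    // Local stand-in mirroring the constants of IndexWriterConfig.OpenMode.
    enum OpenMode { CREATE, APPEND, CREATE_OR_APPEND }

    // indexExists: whether the Directory already holds an index.
    // commitData: user data of the last commit (may be null); the epoch is stored in hex.
    static long indexEpoch(boolean indexExists, Map<String, String> commitData,
                           String epochKey, OpenMode openMode) {
        long epoch;
        if (!indexExists) {
            epoch = 1;
        } else {
            String epochStr = commitData == null ? null : commitData.get(epochKey);
            // No commit data, or no epoch in it: an old taxonomy, so fall back to 1.
            epoch = epochStr == null ? 1 : Long.parseLong(epochStr, 16);
        }
        if (openMode == OpenMode.CREATE) {
            ++epoch; // recreating the taxonomy starts a new epoch
        }
        return epoch;
    }

    // maxDoc: category documents already on disk (nextID in the constructor).
    // Only an empty taxonomy starts with a complete cache (it then adds the root).
    static boolean cacheIsComplete(int maxDoc) {
        return maxDoc == 0;
    }

    public static void main(String[] args) {
        System.out.println(indexEpoch(false, null, "epoch", OpenMode.CREATE_OR_APPEND)); // 1
        System.out.println(indexEpoch(true, Map.of("epoch", "a"), "epoch", OpenMode.CREATE)); // 11
        System.out.println(cacheIsComplete(0)); // true
    }
}
```

The second call parses the hexadecimal "a" as 10 and then adds one because the mode is CREATE, which matches the order of the two branches in the original constructor.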
