public final class Lucene54DocValuesFormat
extends DocValuesFormat
Lucene 5.4 DocValues format.

Encodes the five per-document value types (Numeric, Binary, Sorted, SortedSet, SortedNumeric) with these strategies:

NUMERIC:

  • Delta-compressed: per-document integers written as deltas from the minimum value, compressed with bitpacking. For more information, see DirectWriter.
  • Table-compressed: when the number of unique values is very small (< 256), and when there are unused "gaps" in the range of values used (such as SmallFloat), a lookup table is written instead. Each per-document entry is instead the ordinal to this table, and those ordinals are compressed with bitpacking (DirectWriter).
  • GCD-compressed: when all numbers share a common divisor, such as dates, the greatest common divisor (GCD) is computed, and quotients are stored using Delta-compressed Numerics.
  • Monotonic-compressed: when all numbers are monotonically increasing offsets, they are written as blocks of bitpacked integers, encoding the deviation from the expected delta.
  • Const-compressed: when there is only one possible non-missing value, only the missing bitset is encoded.
  • Sparse-compressed: only documents with a value are stored, and lookups are performed using binary search.
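
To make the Delta-compressed and GCD-compressed strategies concrete, here is a minimal sketch of the arithmetic involved. It is not Lucene's implementation: the class and method names are hypothetical, the actual bitpacking (DirectWriter) is left out, and overflow handling is ignored. It only shows how subtracting the minimum and dividing by a common divisor reduce the number of bits needed per value.

```java
public class NumericEncodingSketch {

    /** Greatest common divisor of two non-negative longs (Euclid's algorithm). */
    public static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /** Bits needed to bitpack values in the range [0, maxValue]; at least 1. */
    public static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Delta-compressed: each value becomes its offset from the minimum. */
    public static long[] deltaEncode(long[] values, long min) {
        long[] out = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] - min;
        }
        return out;
    }

    /** GCD of all deltas; returns 0 if every delta is 0. */
    public static long commonDivisor(long[] deltas) {
        long g = 0;
        for (long d : deltas) {
            g = gcd(g, d);
        }
        return g;
    }

    /** GCD-compressed: each delta is divided by the common divisor (must be non-zero). */
    public static long[] gcdEncode(long[] deltas, long divisor) {
        long[] out = new long[deltas.length];
        for (int i = 0; i < deltas.length; i++) {
            out[i] = deltas[i] / divisor;
        }
        return out;
    }

    /** Reverses both steps: quotient -> delta -> original value. */
    public static long decode(long quotient, long min, long divisor) {
        return min + quotient * divisor;
    }

    public static void main(String[] args) {
        long day = 86_400_000L; // milliseconds per day, as in date-only timestamps
        long[] values = {2 * day, 3 * day, 5 * day};
        long min = 2 * day;

        long[] deltas = deltaEncode(values, min);
        long divisor = commonDivisor(deltas);
        long[] quotients = gcdEncode(deltas, divisor);

        System.out.println("bits per value, delta only: " + bitsRequired(deltas[2]));
        System.out.println("bits per value, with GCD:   " + bitsRequired(quotients[2]));
    }
}
```

With day-granularity timestamps, the quotients are tiny integers, which is why dates are the example the format description gives for GCD compression.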

BINARY:

  • Fixed-width Binary: one large concatenated byte[] is written, along with the fixed length. Each document's value can be addressed directly with multiplication (docID * length).
  • Variable-width Binary: one large concatenated byte[] is written, along with end addresses for each document. The addresses are written as Monotonic-compressed numerics.
  • Prefix-compressed Binary: values are written in chunks of 16, with the first value written completely and other values sharing prefixes. Chunk addresses are written as Monotonic-compressed numerics. A reverse lookup index is written from a portion of every 1024th term.
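
The two addressing schemes for uncompressed binary values can be sketched as follows. This is an illustration only, with hypothetical class and method names; it keeps everything in memory as plain arrays, whereas the real format reads from the .dvd file and stores the addresses as Monotonic-compressed numerics.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryAddressingSketch {

    /** Fixed-width: the value for docID starts at docID * length. */
    public static byte[] readFixed(byte[] data, int length, int docID) {
        int start = docID * length;
        return Arrays.copyOfRange(data, start, start + length);
    }

    /**
     * Variable-width: endAddresses[docID] is the exclusive end offset of the
     * value for docID, so its start is the previous document's end (or 0).
     */
    public static byte[] readVariable(byte[] data, long[] endAddresses, int docID) {
        int start = docID == 0 ? 0 : (int) endAddresses[docID - 1];
        int end = (int) endAddresses[docID];
        return Arrays.copyOfRange(data, start, end);
    }

    public static void main(String[] args) {
        byte[] fixed = "aaabbbccc".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(readFixed(fixed, 3, 1), StandardCharsets.UTF_8));

        byte[] variable = "abbccc".getBytes(StandardCharsets.UTF_8);
        long[] ends = {1, 3, 6};
        System.out.println(new String(readVariable(variable, ends, 2), StandardCharsets.UTF_8));
    }
}
```

Because end addresses never decrease from one document to the next, they are a natural fit for the Monotonic-compressed strategy described under NUMERIC.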

SORTED:

  • Sorted: a mapping of ordinals to deduplicated terms is written as Binary, along with the per-document ordinals written using one of the numeric strategies above.
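
The ordinal mapping behind SORTED can be illustrated with a small sketch. The names are hypothetical, and it uses Java String ordering for brevity, which is an assumption: Lucene compares terms as bytes, which gives the same order for ASCII input like the example here.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class SortedOrdinalsSketch {

    /** Deduplicates and sorts the terms; a term's index in the result is its ordinal. */
    public static String[] buildDictionary(String[] docValues) {
        return new TreeSet<>(Arrays.asList(docValues)).toArray(new String[0]);
    }

    /** Replaces each document's term with its ordinal in the sorted dictionary. */
    public static int[] buildOrdinals(String[] docValues, String[] dictionary) {
        int[] ords = new int[docValues.length];
        for (int doc = 0; doc < docValues.length; doc++) {
            ords[doc] = Arrays.binarySearch(dictionary, docValues[doc]);
        }
        return ords;
    }

    public static void main(String[] args) {
        String[] docValues = {"red", "blue", "red", "green"};
        String[] dictionary = buildDictionary(docValues);
        int[] ords = buildOrdinals(docValues, dictionary);
        System.out.println(Arrays.toString(dictionary));
        System.out.println(Arrays.toString(ords));
    }
}
```

The dictionary is what gets written as Binary, and the small per-document ordinals are what the numeric strategies then compress.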

SORTED_SET:

  • Single: if all documents have 0 or 1 value, then data are written like SORTED.
  • SortedSet table: when there are few unique sets of values (< 256) then each set is assigned an id, a lookup table is written and the mapping from document to set id is written using the numeric strategies above.
  • SortedSet: a mapping of ordinals to deduplicated terms is written as Binary, an ordinal list and per-document index into this list are written using the numeric strategies above.
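
The "SortedSet table" idea can be sketched as below: each distinct set of ordinals gets an id, and each document stores only that id. This is a simplified illustration with hypothetical names; ids are assigned in order of first appearance, which is an assumption of the sketch, and the check that fewer than 256 unique sets exist is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SortedSetTableSketch {

    /**
     * Assigns an id to each distinct ordinal set, in order of first appearance.
     * Appends each new set to tableOut and returns the per-document set ids.
     */
    public static int[] assignSetIds(List<List<Integer>> perDocOrds, List<List<Integer>> tableOut) {
        Map<List<Integer>, Integer> ids = new LinkedHashMap<>();
        int[] setIds = new int[perDocOrds.size()];
        for (int doc = 0; doc < perDocOrds.size(); doc++) {
            List<Integer> set = perDocOrds.get(doc);
            Integer id = ids.get(set);
            if (id == null) {
                id = ids.size();
                ids.put(set, id);
                tableOut.add(new ArrayList<>(set));
            }
            setIds[doc] = id;
        }
        return setIds;
    }

    public static void main(String[] args) {
        List<List<Integer>> perDocOrds = Arrays.asList(
                Arrays.asList(0, 1),
                Arrays.asList(2),
                Arrays.asList(0, 1),
                new ArrayList<Integer>());
        List<List<Integer>> table = new ArrayList<>();
        int[] setIds = assignSetIds(perDocOrds, table);
        System.out.println("table:   " + table);
        System.out.println("set ids: " + Arrays.toString(setIds));
    }
}
```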

SORTED_NUMERIC:

  • Single: if all documents have 0 or 1 value, then data are written like NUMERIC.
  • SortedSet table: when there are few unique sets of values (< 256) then each set is assigned an id, a lookup table is written and the mapping from document to set id is written using the numeric strategies above.
  • SortedNumeric: a value list and per-document index into this list are written using the numeric strategies above.
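
The plain SortedNumeric layout, a flat value list plus a per-document index into it, can be sketched like this. The names are hypothetical and everything is held in plain arrays; in the real format both the value list and the index are written with the numeric strategies above.

```java
import java.util.Arrays;

public class SortedNumericSketch {

    /** Concatenates every document's values, sorted within each document, into one flat list. */
    public static long[] flatten(long[][] perDoc) {
        int total = 0;
        for (long[] values : perDoc) {
            total += values.length;
        }
        long[] out = new long[total];
        int pos = 0;
        for (long[] values : perDoc) {
            long[] sorted = values.clone();
            Arrays.sort(sorted);
            System.arraycopy(sorted, 0, out, pos, sorted.length);
            pos += sorted.length;
        }
        return out;
    }

    /** offsets[doc] is where the document's values start; offsets[doc + 1] is where they end. */
    public static int[] offsets(long[][] perDoc) {
        int[] out = new int[perDoc.length + 1];
        for (int doc = 0; doc < perDoc.length; doc++) {
            out[doc + 1] = out[doc] + perDoc[doc].length;
        }
        return out;
    }

    /** Looks up all values of one document through the index. */
    public static long[] valuesFor(long[] flat, int[] offsets, int docID) {
        return Arrays.copyOfRange(flat, offsets[docID], offsets[docID + 1]);
    }

    public static void main(String[] args) {
        long[][] perDoc = {{5, 3}, {}, {7, 7, 1}};
        long[] flat = flatten(perDoc);
        int[] index = offsets(perDoc);
        System.out.println(Arrays.toString(valuesFor(flat, index, 2)));
    }
}
```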

Files:

  1. .dvd: DocValues data
  2. .dvm: DocValues metadata

Reposted from: http://lucene.apache.org/core/6_4_2/core/org/apache/lucene/codecs/lucene54/Lucene54DocValuesFormat.html

As you can see, the DocValues files (.dvd and .dvm) take up very little space (du -sm reports sizes in MB):

du -sm elasticsearch/nodes/0/indices/hec_test2/0/index/*
299 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fdt
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fdx
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fnm
148 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.doc
130 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.tim
5 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.tip
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene54_0.dvd
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene54_0.dvm
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.si
1 elasticsearch/nodes/0/indices/hec_test2/0/index/segments_7
0 elasticsearch/nodes/0/indices/hec_test2/0/index/write.lock
