目的

研究聚合查询的BUCKETS桶·到底是如何计算?

PS:es版本为7.8.1

Bucket概念

关于es聚合查询,官方介绍,可以参考 es聚合查询-bucket

有道翻译:

桶聚合不像指标聚合那样计算字段的指标,相反,它们创建文档的桶。每个桶都与一个标准相关联(取决于聚合类型),该标准确定当前上下文中的文档是否“属于”它。换句话说,存储桶有效地定义了文档集。除了存储桶本身,存储桶聚合还计算并返回“落入”每个存储桶的文档数量。

与度量聚合相反,桶聚合可以容纳子聚合。这些子聚合将为它们的“父”桶聚合所创建的桶聚合。

有不同的桶聚合器,每个都有不同的“桶”策略。有的定义单个桶,有的定义固定数量的多个桶,还有的在聚合过程中动态创建桶。

备注:单个响应中允许的最大桶数受名为search.max_buckets的动态集群设置限制。它默认为10,000,尝试返回超过限制的请求将失败并出现异常。

search.max_buckets

官网看下search.max_buckets这个参数:

有道翻译:

search.max_buckets
(Dynamic, integer)单个响应中允许的最大聚合桶数。默认值为10000。 Requests that attempt to return more than this limit will return an error.
试图返回超过此限制的请求将返回错误。

缘起

在一次排查问题中,遇到如下报错日志:

trying to create too many buckets. must be less than or equal to: [10000] but was [10001].

关于以上问题的分析以及原因可参看我的这篇实战分析博文进行了解:trying to create too many buckets,本篇博文,我主要是要来验证一下search.max_buckets这个配置项的计算桶的个数究竟是如何进行统计算桶数的。

数据准备

1、创建测试索引库(PUT请求)

注意:此处建库有一定数据倾向性,多数字段mapping我设置了字段存储类型为keyword类型,是为了后面方便测试聚合操作,原因是keyword类型的数据可以满足类似名称、类别、状态码、邮政编码和标签等数据的要求,不进行分词,常常被用来过滤、排序和聚合。

如下:我构建一个用于测试聚合分桶查询的手机信息索引库,用于演示我下面的操作使用。

localhost:9200/phones_test_bucket
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"price": {
"type": "long"
},
"color": {
"type": "keyword"
},
"size": {
"type": "long"
},
"category": {
"type": "keyword"
},
"label": {
"type": "keyword"
},
"release_date": {
"type": "date"
}
}
}
} ===返回===
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "phones_test_bucket"
}

2、添加模拟数据

下面我将根据手机名称,颜色,类别进行聚合分桶查询,然后通过更改search.max_buckets的配置参数来验证分桶参数的取值关系。

localhost:9200/phones_test_bucket/_bulk
{"index":{}}
{"name":"小米","price":3400,"color":"白色","size":6.21,"category":"标准版","label":"性价比1","release_date":"2023-02-06"}
{"index":{}}
{"name":"小米","price":3400,"color":"白色","size":6.21,"category":"升级版","label":"性价比2","release_date":"2023-03-06"}
{"index":{}}
{"name":"小米","price":2400,"color":"黑色","size":6.21,"category":"升级版","label":"性价比3","release_date":"2023-02-06"}
{"index":{}}
{"name":"小米","price":3400,"color":"黑色","size":6.21,"category":"标准版","label":"性价比4","release_date":"2023-03-06"}
{"index":{}}
{"name":"苹果","price":2400,"color":"远峰蓝色","size":6.21,"category":"标准版","label":"流畅","release_date":"2023-02-06"}
{"index":{}}
{"name":"华为","price":5200,"color":"白色","size":6.21,"category":"标准版","label":"高端1","release_date":"2023-03-06"}
{"index":{}}
{"name":"华为","price":5200,"color":"黑色","size":6.21,"category":"标准版","label":"高端2","release_date":"2023-04-06"}
{"index":{}}
{"name":"华为","price":5900,"color":"黑色","size":6.21,"category":"升级版","label":"高端3","release_date":"2023-05-06"}
{"index":{}}
{"name":"华为","price":5900,"color":"白色","size":6.21,"category":"升级版","label":"高端4","release_date":"2023-05-06"}

3、分桶参数设置

在开始测试之前,我们需要关注下search.max_buckets这个参数的设置API,在一开始我就截图了,官网对这个参数说明的默认值是10000(我的es版本是7.8.1),截至我写这篇博文时,es最新版本已经更新到8.6,感兴趣可以去官网看看,8.6版本分桶参数说明了,此参数的默认值也变更了,变更为65536。

修改es分桶最大配置(PUT请求)
http://127.0.0.1:9200/_cluster/settings
{
"persistent": {
"search.max_buckets": 2
}
} ===返回===
{
"acknowledged": true,
"persistent": {
"search": {
"max_buckets": "2"
}
},
"transient": {}
}
修改查看分桶最大配置(GET请求)
http://127.0.0.1:9200/_cluster/settings
//无请求参数
====返回====
{
"persistent": {
"search": {
"max_buckets": "2"
}
},
"transient": {}
}

4、测试

1、第一组测试

单字段分组-最大分桶2-结果失败

http://127.0.0.1:9200/phones_test_bucket/_search
// 第一组数据,"max_buckets": "2"的情况下,分组失败
{
"size":0,
"aggs":{
"group_by_name":{
"terms":{
"field":"name" } }
}
}
===返回====
{
"error": {
"root_cause": [
{
"type": "too_many_buckets_exception",
"reason": "Trying to create too many buckets. Must be less than or equal to: [2] but was [3]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
"max_buckets": 2
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "phones_test_bucket",
"node": "UuMcBk37TNWHjY4hVtzyVA",
"reason": {
"type": "too_many_buckets_exception",
"reason": "Trying to create too many buckets. Must be less than or equal to: [2] but was [3]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
"max_buckets": 2
}
}
]
},
"status": 503
}

单字段分组-最大分桶数3-结果成功

http://127.0.0.1:9200/phones_test_bucket/_search
// 第一组数据,"max_buckets": "3"的情况下,分组成功
{
"size":0,
"aggs":{
"group_by_name":{
"terms":{
"field":"name" } }
}
}
===返回结果===
{
"took": 32,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 9,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"group_by_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "华为",
"doc_count": 4
},
{
"key": "小米",
"doc_count": 4
},
{
"key": "苹果",
"doc_count": 1
}
]
}
}
}

2、第二组测试

多字段分组-最大分桶7-结果失败

http://127.0.0.1:9200/phones_test_bucket/_search
// 多字段分组查询,name+color,第二组,"max_buckets": "7"的情况下,分组失败
{
"aggs": {
"group_by_name": {
"terms": {
"field": "name"
},
"aggs": {
"group_by_color": {
"terms": {
"field": "color"
}
}
}
}
}
}
===返回===
{
"error": {
"root_cause": [
{
"type": "too_many_buckets_exception",
"reason": "Trying to create too many buckets. Must be less than or equal to: [7] but was [8]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
"max_buckets": 7
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "phones_test_bucket",
"node": "UuMcBk37TNWHjY4hVtzyVA",
"reason": {
"type": "too_many_buckets_exception",
"reason": "Trying to create too many buckets. Must be less than or equal to: [7] but was [8]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
"max_buckets": 7
}
}
]
},
"status": 503
}

多字段分组-最大分桶8-结果成功

http://127.0.0.1:9200/phones_test_bucket/_search
// 多字段分组查询,name+color,第二组,"max_buckets": "8"的情况下,分组成功
{
"size":0,
"aggs": {
"group_by_name": {
"terms": {
"field": "name"
},
"aggs": {
"group_by_color": {
"terms": {
"field": "color"
}
}
}
}
}
}
===返回===
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 9,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"group_by_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "华为",
"doc_count": 4,
"group_by_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "白色",
"doc_count": 2
},
{
"key": "黑色",
"doc_count": 2
}
]
}
},
{
"key": "小米",
"doc_count": 4,
"group_by_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "白色",
"doc_count": 2
},
{
"key": "黑色",
"doc_count": 2
}
]
}
},
{
"key": "苹果",
"doc_count": 1,
"group_by_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "远峰蓝色",
"doc_count": 1
}
]
}
}
]
}
}
}

3、第三组测试

多字段分组-最大分桶16-结果失败

http://127.0.0.1:9200/phones_test_bucket/_search
// 多字段分组查询,name+color,第三组,"max_buckets": "17"的情况下,分组成功
{
"size":0,
"aggs": {
"group_by_name": {
"terms": {
"field": "name"
},
"aggs": {
"group_by_color": {
"terms": {
"field": "color"
},
"aggs": {
"group_by_category": {
"terms": {
"field": "category"
}
}
}
}
}
}
}
}
===返回===
{
"error": {
"root_cause": [],
"type": "search_phase_execution_exception",
"reason": "",
"phase": "fetch",
"grouped": true,
"failed_shards": [],
"caused_by": {
"type": "too_many_buckets_exception",
"reason": "Trying to create too many buckets. Must be less than or equal to: [16] but was [17]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
"max_buckets": 16
}
},
"status": 503
}

多字段分组-最大分桶17-结果成功

http://127.0.0.1:9200/phones_test_bucket/_search
// 多字段分组查询,name+color,第三组,"max_buckets": "17"的情况下,分组成功
{
"size":0,
"aggs": {
"group_by_name": {
"terms": {
"field": "name"
},
"aggs": {
"group_by_color": {
"terms": {
"field": "color"
},
"aggs": {
"group_by_category": {
"terms": {
"field": "category"
}
}
}
}
}
}
}
}
===返回===
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 9,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"group_by_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "华为",
"doc_count": 4,
"group_by_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "白色",
"doc_count": 2,
"group_by_category": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "升级版",
"doc_count": 1
},
{
"key": "标准版",
"doc_count": 1
}
]
}
},
{
"key": "黑色",
"doc_count": 2,
"group_by_category": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "升级版",
"doc_count": 1
},
{
"key": "标准版",
"doc_count": 1
}
]
}
}
]
}
},
{
"key": "小米",
"doc_count": 4,
"group_by_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "白色",
"doc_count": 2,
"group_by_category": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "升级版",
"doc_count": 1
},
{
"key": "标准版",
"doc_count": 1
}
]
}
},
{
"key": "黑色",
"doc_count": 2,
"group_by_category": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "升级版",
"doc_count": 1
},
{
"key": "标准版",
"doc_count": 1
}
]
}
}
]
}
},
{
"key": "苹果",
"doc_count": 1,
"group_by_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "远峰蓝色",
"doc_count": 1,
"group_by_category": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "标准版",
"doc_count": 1
}
]
}
}
]
}
}
]
}
}
}

测试结论

以第三组测试数据为例子,按名称+颜色+类别进行聚合分组,最终发现临界值出现在max_buckets为17,那这个17是怎么算的,为甚设置17就可以查出来,设置max_buckets为16就报Trying to create too many buckets错呢。来,我画了一张分桶模拟图,动动你的小手,我们来数一数。



结合我上面测试的返回值,看一下,结果是不是正好对应了17个桶,想必到这里你应该也知道分桶的个数到底是怎么计算的了吧。

测试结论:es聚合分组的桶数计算规则,具体分了多少桶,是和数据相关的,数据异同越大,
分桶数目越多,对于多字段嵌套查询,嵌套的层数越深,分桶数越大。所以不建议大量字
段嵌套进行聚合查询,容易引发分桶爆炸,触发熔断查询。

最新文章

  1. Ansible Tower
  2. JS 内置对象
  3. Make something people want
  4. Java 生成验证码
  5. 实验一 Java开发环境的熟悉-刘蔚然
  6. Android布局— — —表格布局
  7. R cannot be resolved to a variable
  8. VIPServer VS LVS
  9. 【HDOJ】5154 Harry and Magical Computer
  10. 设置HTTP代理
  11. Echarts Jqplot嵌extjs4 windows 装配方法
  12. 用jQuery的ajax的功能实现输入自动提示的功能
  13. Java文件流之练习
  14. Arduino Pro or Pro Mini, ATmega328 (5V, 16 MHz)成功烧录方法
  15. 在VMware安装Centos7
  16. Fisher–Yates shuffle 洗牌算法(zz)
  17. [转]session和cookie的区别和联系,session的生命周期,多个服务部署时session管理
  18. 一个远程启动windows c++程序引发的技术决策现象
  19. iOS UI-UIScrollView控件实现图片轮播 (UIPageControl-分页指示器)
  20. 自然语言处理--中文文本向量化counterVectorizer()

热门文章

  1. Quartz.NET 任务调度框架的demo实例
  2. STM32F4寄存器初始化:PWM输出
  3. Windows环境下FTP Server在局域网内的搭建
  4. HGAME 2023 WP week1
  5. jenkins简单安装及配置(Windows环境
  6. RocketMQ - 消费者启动机制
  7. js中的for循环,循环次数会多出一次。当循环到最后一个的时候,循环还会继续,并且此时i就变成remove?
  8. CF873E - Awards For Contestants
  9. LRU 居然翻译成最近最少使用?真相原来是这样!(附力扣题解)
  10. LeetCode算法训练-回溯 491.递增子序列 46.全排列 47.全排列 II