1.如何高效使用Solr查询功能 ?
2.单个字段分组统计如何实现?
3.IN条件查询有几种方式?
4.多个字段分组统计是否只支持count?

Cloudera公司已经推出了基于Hadoop平台的查询统计分析工具Impala,只要熟悉SQL,就可以熟练地使用Impala来执行查询与分析的功能。不过Impala的SQL和关系数据库的SQL还是有一点微妙地不同的。
下面,我们设计一个表,通过该表中的数据,来将SQL查询与统计的语句,使用Solr查询的方式来与SQL查询对应。这个翻译的过程,是非常有趣的,你可以看到Solr一些很不错的功能。
用来示例的表结构设计,如图所示:

下面,我们通过给出一些SQL查询统计语句,然后对应翻译成Solr查询语句,然后对比结果

查询对比
条件组合查询
SQL查询语句:
  1. SELECT log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type
  2. FROM v_i_event
  3. WHERE prov_id = 1 AND net_type = 1 AND area_id = 10304 AND time_type = 1 AND time_id >= 20130801 AND time_id <= 20130815
  4. ORDER BY log_id LIMIT 10;

复制代码

查询结果,如图所示:

Solr查询URL:
  1. http://slave1:8888/solr-cloud/i_event/select?q=*:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=prov_id:1 AND net_type:1 AND area_id:10304 AND time_type:1 AND time_id:[20130801 TO 20130815]&sort=log_id asc&start=0&rows=10

复制代码

查询结果,如下所示:

  1. <response>
  2. <lst name="responseHeader">
  3. <int name="status">0</int>
  4. <int name="QTime">4</int>
  5. </lst>
  6. <result name="response" numFound="77" start="0">
  7. <doc>
  8. <int name="log_id">6827</int>
  9. <long name="start_time">1375072117</long>
  10. <long name="end_time">1375081683</long>
  11. <int name="prov_id">1</int>
  12. <int name="city_id">103</int>
  13. <int name="area_id">10304</int>
  14. <int name="idt_id">11002</int>
  15. <int name="cnt">0</int>
  16. <int name="net_type">1</int>
  17. </doc>
  18. <doc>
  19. <int name="log_id">6827</int>
  20. <long name="start_time">1375072117</long>
  21. <long name="end_time">1375081683</long>
  22. <int name="prov_id">1</int>
  23. <int name="city_id">103</int>
  24. <int name="area_id">10304</int>
  25. <int name="idt_id">11000</int>
  26. <int name="cnt">0</int>
  27. <int name="net_type">1</int>
  28. </doc>
  29. <doc>
  30. <int name="log_id">6851</int>
  31. <long name="start_time">1375142158</long>
  32. <long name="end_time">1375146391</long>
  33. <int name="prov_id">1</int>
  34. <int name="city_id">103</int>
  35. <int name="area_id">10304</int>
  36. <int name="idt_id">14001</int>
  37. <int name="cnt">5</int>
  38. <int name="net_type">1</int>
  39. </doc>
  40. <doc>
  41. <int name="log_id">6851</int>
  42. <long name="start_time">1375142158</long>
  43. <long name="end_time">1375146391</long>
  44. <int name="prov_id">1</int>
  45. <int name="city_id">103</int>
  46. <int name="area_id">10304</int>
  47. <int name="idt_id">11002</int>
  48. <int name="cnt">23</int>
  49. <int name="net_type">1</int>
  50. </doc>
  51. <doc>
  52. <int name="log_id">6851</int>
  53. <long name="start_time">1375142158</long>
  54. <long name="end_time">1375146391</long>
  55. <int name="prov_id">1</int>
  56. <int name="city_id">103</int>
  57. <int name="area_id">10304</int>
  58. <int name="idt_id">10200</int>
  59. <int name="cnt">55</int>
  60. <int name="net_type">1</int>
  61. </doc>
  62. <doc>
  63. <int name="log_id">6851</int>
  64. <long name="start_time">1375142158</long>
  65. <long name="end_time">1375146391</long>
  66. <int name="prov_id">1</int>
  67. <int name="city_id">103</int>
  68. <int name="area_id">10304</int>
  69. <int name="idt_id">14000</int>
  70. <int name="cnt">4</int>

复制代码

对比上面结果,除了根据idt_id排序方式不同以外(Impala是升序,Solr是降序),其他是相同的。
单个字段分组统计
SQL查询语句:
  1. SELECT prov_id, SUM(cnt) AS sum_cnt, AVG(cnt) AS avg_cnt, MAX(cnt) AS max_cnt, MIN(cnt) AS min_cnt, COUNT(cnt) AS count_cnt
  2. FROM v_i_event
  3. GROUP BY prov_id;

复制代码

查询结果,如图所示:

Solr查询URL:
  1. http://slave1:8888/solr-cloud/i_event/select?q=*:*&stats=true&stats.field=cnt&rows=0&indent=true

复制代码

查询结果,如下所示:

  1. <response>
  2. <lst name="responseHeader">
  3. <int name="status">0</int>
  4. <int name="QTime">2</int>
  5. </lst>
  6. <result name="response" numFound="4088" start="0"></result>
  7. <lst name="stats">
  8. <lst name="stats_fields">
  9. <lst name="cnt">
  10. <double name="min">0.0</double>
  11. <double name="max">1258.0</double>
  12. <long name="count">4088</long>
  13. <long name="missing">0</long>
  14. <double name="sum">32587.0</double>
  15. <double name="sumOfSquares">9170559.0</double>
  16. <double name="mean">7.971379647749511</double>
  17. <double name="stddev">46.69344567709268</double>
  18. <lst name="facets" />
  19. </lst>
  20. </lst>
  21. </lst>
  22. </response>

复制代码

对比查询结果,Solr提供了更多的统计项,如标准差(stddev)等,与SQL查询结果是一致的。
IN条件查询
SQL查询语句:
[cde lang="sql"]
SELECT log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_typ
FROM v_i_event
WHERE prov_id = 1 AND net_type = 1 ANDcity_id IN(106,103) AND idt_id IN(12011,5004,6051,6056,8002) AND time_type = 1AND time_id >= 20130801 AND time_id <= 20130815
ORDER BY log_id, start_time DESC LIMIT 10;
[/code]
查询结果,如图所示:
 

Solr查询URL:

  1. http://slave1:8888/solr-cloud/i_event/select?q=*:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id, cnt,net_type&fq=prov_id:1 AND net_type:1 AND (city_id:106 OR city_id:103) AND (idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002)
    AND time_type:1 AND time_id:[20130801 TO 20130815]&sort=log_id asc ,start_time desc&start=0&rows=10

复制代码

  1. http://slave1:8888/solr-cloud/i_event/select?q=*:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id, cnt ,net_type&fq=prov_id:1&fq=net_type:1&fq=(city_id:106 OR city_id:103)&fq=(idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002)&fq=time_type:1&fq=time_id:[20130801
    TO 20130815]&sort=log_id asc,start_time desc&start=0&rows=10

复制代码

查询结果,如下所示:

  1. <response>
  2. <lst name="responseHeader">
  3. <int name="status">0</int>
  4. <int name="QTime">6</int>
  5. </lst>
  6. <result name="response" numFound="63" start="0">
  7. <doc>
  8. <int name="log_id">6553</int>
  9. <long name="start_time">1374054184</long>
  10. <long name="end_time">1374054254</long>
  11. <int name="prov_id">1</int>
  12. <int name="city_id">103</int>
  13. <int name="area_id">10307</int>
  14. <int name="idt_id">12011</int>
  15. <int name="cnt">0</int>
  16. <int name="net_type">1</int>
  17. </doc>
  18. <doc>
  19. <int name="log_id">6553</int>
  20. <long name="start_time">1374054184</long>
  21. <long name="end_time">1374054254</long>
  22. <int name="prov_id">1</int>
  23. <int name="city_id">103</int>
  24. <int name="area_id">10307</int>
  25. <int name="idt_id">5004</int>
  26. <int name="cnt">2</int>
  27. <int name="net_type">1</int>
  28. </doc>
  29. <doc>
  30. <int name="log_id">6555</int>
  31. <long name="start_time">1374055060</long>
  32. <long name="end_time">1374055158</long>
  33. <int name="prov_id">1</int>
  34. <int name="city_id">103</int>
  35. <int name="area_id">70104</int>
  36. <int name="idt_id">5004</int>
  37. <int name="cnt">3</int>
  38. <int name="net_type">1</int>

复制代码

对比查询结果,是一致的。
开区间范围条件查询
SQL查询语句:
SELECTlog_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type
FROM v_i_event
WHERE net_type = 1 AND idt_idIN(12011,5004,6051,6056,8002) AND time_type = 1 AND start_time >= 1373598465AND end_time < 1374055254
ORDER BY log_id, start_time, idt_id DESCLIMIT 30;
查询结果,如图所示:
 
Solr查询URL:
  1. http://slave1:8888/solr-cloud/i_event/select?q=*:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=net_type:1 AND (idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002) AND time_type:1 AND start_time:[1373598465
    TO 1374055254]&fq =-start_time:1374055254&sort=log_id asc,start_time asc,idt_id desc&start=0&rows=30

复制代码

  1. http://slave1:8888/solr-cloud/i_event/select?q=*:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=net_type:1 AND (idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002) AND time_type:1 AND start_time:[1373598465
    TO 1374055254] AND -start_time:1374055254&sort=log_id asc,start_time asc,idt_id desc&start=0&rows=30

复制代码

  1. http://slave1:8888/solr-cloud/i_event/select?q=*:*&fl=log_id,start_time,end_time,prov_id,city_id,area_id,idt_id,cnt,net_type&fq=net_type:1&fq=idt_id:12011 OR idt_id:5004 OR idt_id:6051 OR idt_id:6056 OR idt_id:8002&fq =time_type:1&fq=start_time:[1373598465
    TO 1374055254]&fq =-start_time:1374055254&sort=log_id asc,start_time asc,idt_id desc&start=0&rows=30

复制代码

查询结果,如下所示:

  1. <response>
  2. <lst name="responseHeader">
  3. <int name="status">0</int>
  4. <int name="QTime">5</int>
  5. </lst>
  6. <result name="response" numFound="4" start="0">
  7. <doc>
  8. <int name="log_id">6553</int>
  9. <long name="start_time">1374054184</long>
  10. <long name="end_time">1374054254</long>
  11. <int name="prov_id">1</int>
  12. <int name="city_id">103</int>
  13. <int name="area_id">10307</int>
  14. <int name="idt_id">12011</int>
  15. <int name="cnt">0</int>
  16. <int name="net_type">1</int>
  17. </doc>
  18. <doc>
  19. <int name="log_id">6553</int>
  20. <long name="start_time">1374054184</long>
  21. <long name="end_time">1374054254</long>
  22. <int name="prov_id">1</int>
  23. <int name="city_id">103</int>
  24. <int name="area_id">10307</int>
  25. <int name="cnt">2</int>
  26. <int name="net_type">1</int>
  27. </doc>
  28. <doc>
  29. <int name="log_id">6555</int>
  30. <long name="start_time">1374055060</long>
  31. <long name="end_time">1374055158</long>
  32. <int name="prov_id">1</int>
  33. <int name="city_id">103</int>
  34. <int name="area_id">70104</int>
  35. <int name="idt_id">12011</int>
  36. <int name="cnt">0</int>
  37. <int name="net_type">1</int>
  38. </doc>
  39. <doc>
  40. <int name="log_id">6555</int>
  41. <long name="start_time">1374055060</long>
  42. <long name="end_time">1374055158</long>
  43. <int name="prov_id">1</int>
  44. <int name="city_id">103</int>
  45. <int name="area_id">70104</int>
  46. <int name="idt_id">5004</int>
  47. <int name="cnt">3</int>
  48. <int name="net_type">1</int>
  49. </doc>
  50. </result>
  51. </response>

复制代码

多个字段分组统计(只支持count函数)
SQL查询语句:
SELECT city_id, area_id, COUNT(cnt) AScount_cnt
FROM v_i_event
WHERE prov_id = 1 AND net_type = 1
GROUP BY city_id, area_id;
查询结果,如图所示:
 
Solr查询URL:
  1. http://slave1:8888/solr-cloud/i_event/select?q=*:*&facet=true&facet.pivot=city_id,area_id&fq=prov_id:1 AND net_type:1&rows=0&indent=true

复制代码

对比上面结果,Solr查询结果,需要从上面的各组中进行合并,得到最终的统计结果,结果和SQL结果是一致的。
多个字段分组统计(支持count、sum、max、min等函数)
一次对多个字段进行独立分组统计,Solr可以很好的支持。这相当于执行两个带有GROUP BY子句的SQL,这两个GROUP BY分别只对一个字段进行汇总统计。
SQL查询语句:
  1. SELECT city_id, area_id, COUNT(cnt) AS count_cnt
  2. FROM v_i_event
  3. WHERE prov_id = 1 AND net_type = 1
  4. GROUP BY city_id;
  5. SELECT city_id, area_id, COUNT(cnt) AS count_cnt
  6. FROM v_i_event
  7. WHERE prov_id = 1 AND net_type = 1
  8. GROUP BY area_id;

复制代码

查询结果,不再显示。

Solr查询URL:
  1. >http://slave1:8888/solr-cloud/i_event/select?q=*:*&stats=true&stats.field=cnt&f.cnt.stats.facet=city_id&&f.cnt.stats.facet=area_id&fq=prov_id:1 AND net_type:1&rows=0&indent=true

复制代码

查询结果,如下所示:
  1. <response>
  2. <lst name="responseHeader">
  3. <int name="status">0</int>
  4. <int name="QTime">72</int>
  5. </lst>
  6. <result name="response" numFound="1171" start="0"></result>
  7. <lst name="facet_counts">
  8. <lst name="facet_queries" />
  9. <lst name="facet_fields" />
  10. <lst name="facet_dates" />
  11. <lst name="facet_ranges" />
  12. <lst name="facet_pivot">
  13. <arr name="city_id,area_id">
  14. <lst>
  15. <str name="field">city_id</str>
  16. <int name="value">103</int>
  17. <int name="count">678</int>
  18. <arr name="pivot">
  19. <lst>
  20. <str name="field">area_id</str>
  21. <int name="value">10307</int>
  22. <int name="count">298</int>
  23. </lst>
  24. <lst>
  25. <str name="field">area_id</str>
  26. <int name="value">10315</int>
  27. <int name="count">120</int>
  28. </lst>
  29. <lst>
  30. <str name="field">area_id</str>
  31. <int name="value">10317</int>
  32. <int name="count">86</int>
  33. </lst>
  34. <lst>
  35. <str name="field">area_id</str>
  36. <int name="value">10304</int>
  37. <int name="count">67</int>
  38. </lst>
  39. <lst>
  40. <str name="field">area_id</str>
  41. <int name="value">10310</int>
  42. <int name="count">49</int>
  43. </lst>
  44. <lst>
  45. <str name="field">area_id</str>
  46. <int name="value">70104</int>
  47. <int name="count">48</int>
  48. </lst>
  49. <lst>
  50. <str name="field">area_id</str>
  51. <int name="value">10308</int>
  52. <int name="count">6</int>
  53. </lst>
  54. <lst>
  55. <str name="field">area_id</str>
  56. <int name="value">0</int>
  57. <int name="count">2</int>
  58. </lst>
  59. <lst>
  60. <str name="field">area_id</str>
  61. <int name="value">10311</int>
  62. <int name="count">2</int>
  63. </lst>
  64. </arr>
  65. </lst>
  66. <lst>
  67. <str name="field">city_id</str>
  68. <int name="value">0</int>
  69. <int name="count">463</int>
  70. <arr name="pivot">
  71. <lst>
  72. <str name="field">area_id</str>
  73. <int name="value">0</int>
  74. <int name="count">395</int>
  75. </lst>
  76. <lst>
  77. <str name="field">area_id</str>
  78. <int name="value">10307</int>
  79. <int name="count">68</int>

复制代码

对比上面结果,Solr查询结果,需要从上面的各组中进行合并,得到最终的统计结果,结果和SQL结果是一致的。
多个字段联合分组统计(支持count、sum、max、min等函数)
SQL查询语句:
SELECT city_id, area_id, SUM(cnt) ASsum_cnt, AVG(cnt) AS avg_cnt, MAX(cnt) AS max_cnt, MIN(cnt) AS min_cnt,COUNT(cnt) AS count_cnt
FROM v_i_event
WHERE prov_id = 1 AND net_type = 1
GROUP BY city_id, area_id;
查询结果,如图所示:
 
Solr目前不能简单的支持这种查询,如果想要满足这种查询统计,需要在schema的设计上,将一个字段设置为多值,然后通过多个值进行分组统计。如果应用中查询统计分析的模式比较固定,预先知道哪些字段会用于联合分组统计,完全可以在设计的时候,考虑设置多值字段来满足这种需求。

---------------------

本文来自 Eric-L90 的CSDN 博客 ,全文地址请点击:https://blog.csdn.net/brotherdong90/article/details/52690163?utm_source=copy

最新文章

  1. 使用ab对nginx进行压力测试
  2. 转载《android:scaleType属性》
  3. 用css画图标
  4. mybatis xml 大于 小于
  5. HDU 3586 Information Disturbing 树形DP+二分
  6. php 获取链接参数
  7. C#-datagridview设置列宽
  8. Websocket协议数据帧传输和关闭连接
  9. 20160127 linux 学习笔记
  10. python中对文件、文件夹的操作需要涉及到os模块和shutil模块。
  11. Java------------运算符优先级速记口诀
  12. HDU 1234 开门人和关门人
  13. NGUI_Input
  14. 2018-2019-2 网络对抗技术 20165336 Exp3 免杀原理与实践
  15. CF D.Mister B and Astronomers
  16. MySQL查询日志总结
  17. android屏蔽系统锁屏的办法
  18. 【持续更新中&#183;&#183;&#183;】Linux下的小技巧
  19. Java08-java语法基础(七)构造方法
  20. .NET-DataTable或DataSet转JSON

热门文章

  1. windows下尝试编写node模块
  2. 从后端接口下载文件的2种方式:get方式、post方式
  3. python selenium-webdriver 标签页切换(十五)
  4. Git安装及密钥的生成(转)
  5. HanLP代码与词典分离方案与流程
  6. C#3.0:新特性
  7. 【设计模式】JDK源码中用到的设计模式
  8. Google program AB程序的基本理解
  9. SPI初识
  10. Debian 7.0(Wheezy) 安装配置笔记