SQL编写一般思路:

1)复杂的查询,先划分为小任务,以降低难度。分别实现各个小任务后,再进行汇总;
2)涉及多表时,先进行联表查询;
3)简单分组,一般只需要group by即可;
4)组内TopN问题,使用row_number,rank,dense_rank;
5)熟练掌握常用函数;

1. 常用函数

1)字符串

split,分割字符串为数组,split(“a|b|c”, “\|”) => 返回数组 [a, b, c]

参数1:待分割到字符串;

参数2:分割字符,因为"|“在Java中是特殊字符,所以需要进行转义,转义使用两个”";

substr,取子字符串,substr(moviename, -5, 4)

参数1:原字符串;

参数2:截取的开始位置,如果是负数,则从右往左计数,如-1表示最后一个字符,-2表示倒数第二个字符;

参数3:截取长度;

示例:
0: jdbc:hive2://master:10000> select *, substr(moviename, -5, 4) as year from t_movie limit 5;
+------------------+-------------------------------------+-------------------------------+-------+
| t_movie.movieid | t_movie.moviename | t_movie.movietype | year |
+------------------+-------------------------------------+-------------------------------+-------+
| 1 | Toy Story (1995) | Animation|Children's|Comedy | 1995 |
| 2 | Jumanji (1995) | Adventure|Children's|Fantasy | 1995 |
| 3 | Grumpier Old Men (1995) | Comedy|Romance | 1995 |
| 4 | Waiting to Exhale (1995) | Comedy|Drama | 1995 |
| 5 | Father of the Bride Part II (1995) | Comedy | 1995 |
+------------------+-------------------------------------+-------------------------------+-------+

2)时间函数

year,获取时间的年份;

month,获取时间的月份;

from_unixtime,将时间戳转换为时间;

unix_timestamp():获取当前时间戳;

unix_timestamp(string date):时间转换为时间戳;

获取当前日期 & 时间:
当前日期:
0: jdbc:hive2://master:10000> select current_date();
+-------------+
| _c0 |
+-------------+
| 2019-09-21 |
+-------------+
当前时间:
0: jdbc:hive2://master:10000> select current_timestamp();
+--------------------------+
| _c0 |
+--------------------------+
| 2019-09-21 18:05:27.768 |
+--------------------------+
当前时间戳:
0: jdbc:hive2://master:10000> select unix_timestamp();
+-------------+
| _c0 |
+-------------+
| 1569060416 |
+-------------+ 从时间中获取年份:
0: jdbc:hive2://master:10000> select year("2019-09-21 18:05:27.768 ") as year;
+-------+
| year |
+-------+
| 2019 |
+-------+ 从时间戳中获取月份:
0: jdbc:hive2://master:10000> select month(from_unixtime(1569060416)) as month;
+--------+
| month |
+--------+
| 9 |
+--------+

3)聚合函数

sum、avg等;

4)explode

将数组等拆分为多行

0: jdbc:hive2://master:10000> select m.*, t.type from t_movie m lateral view explode(split(movietype, "\\|")) t as type limit 10;
+------------+---------------------------+-------------------------------+-------------+
| m.movieid | m.moviename | m.movietype | t.type |
+------------+---------------------------+-------------------------------+-------------+
| 1 | Toy Story (1995) | Animation|Children's|Comedy | Animation |
| 1 | Toy Story (1995) | Animation|Children's|Comedy | Children's |
| 1 | Toy Story (1995) | Animation|Children's|Comedy | Comedy |
| 2 | Jumanji (1995) | Adventure|Children's|Fantasy | Adventure |
| 2 | Jumanji (1995) | Adventure|Children's|Fantasy | Children's |
| 2 | Jumanji (1995) | Adventure|Children's|Fantasy | Fantasy |
| 3 | Grumpier Old Men (1995) | Comedy|Romance | Comedy |
| 3 | Grumpier Old Men (1995) | Comedy|Romance | Romance |
| 4 | Waiting to Exhale (1995) | Comedy|Drama | Comedy |
| 4 | Waiting to Exhale (1995) | Comedy|Drama | Drama |
+------------+---------------------------+-------------------------------+-------------+

5)collect_set,可以理解为该函数实现了explode相反到功能;

collect_list:可以包含重复数据;collect_set:去重;

将多行某字段到数据,合并为一个数组,需要结合group by进行分组,以确定合并到行到范围。

0: jdbc:hive2://master:10000> select moviename, collect_set(type) as types from (select m.*, t.type from t_movie m lateral view explode(split(movietype, "\\|")) t as type limit 10) t group by moviename;
+---------------------------+---------------------------------------+
| moviename | types |
+---------------------------+---------------------------------------+
| Jumanji (1995) | ["Children's","Adventure","Fantasy"] |
| Toy Story (1995) | ["Comedy","Children's","Animation"] |
| Grumpier Old Men (1995) | ["Comedy","Romance"] |
| Waiting to Exhale (1995) | ["Drama","Comedy"] |
+---------------------------+---------------------------------------+

2. 常见场景及方法

2. 常见场景及方法
1)简单条件过滤;
使用where,显示movieid为1到电影名:
0: jdbc:hive2://master:10000> select moviename from t_movie where movieid = 1;
+-------------------+
| moviename |
+-------------------+
| Toy Story (1995) |
+-------------------+ 2)联表条件过滤;
使用join on,获取评分为5的电影名:
select moviename, rate from t_rating r join t_movie m on r.movieid=m.movieid where rate=5 limit 5;
+-----------------------------------------+-------+
| moviename | rate |
+-----------------------------------------+-------+
| One Flew Over the Cuckoo's Nest (1975) | 5.0 |
| Bug's Life, A (1998) | 5.0 |
| Ben-Hur (1959) | 5.0 |
| Christmas Story, A (1983) | 5.0 |
| Beauty and the Beast (1991) | 5.0 |
+-----------------------------------------+-------+ 3)分组统计;
使用group by和聚合函数
获取电影的平均评分:
select movieid, avg(rate) avg_rate from t_rating group by movieid order by movieid limit 5;
+----------+---------------------+
| movieid | avg_rate |
+----------+---------------------+
| 1 | 4.146846413095811 |
| 2 | 3.20114122681883 |
| 3 | 3.01673640167364 |
| 4 | 2.7294117647058824 |
| 5 | 3.0067567567567566 |
+----------+---------------------+ 4)组内TopN;
使用row_number,rank,dense_rank;
获取各部门工资最高的三名员工:
select * from (select deptid, name, (salary+nvl(bonus, 0)) salary, dense_rank() over(partition by deptid order by salary desc) as rank from emp) t where rank<=3;
+-----------+---------+-----------+---------+
| t.deptid | t.name | t.salary | t.rank |
+-----------+---------+-----------+---------+
| 10 | KING | 5000.0 | 1 |
| 10 | CLARK | 2450.0 | 2 |
| 10 | MILLER | 1300.0 | 3 |
| 20 | FORD | 3000.0 | 1 |
| 20 | SCOTT | 3000.0 | 1 |
| 20 | JONES | 2975.0 | 2 |
| 20 | ADAMS | 1100.0 | 3 |
| 30 | BLAKE | 2850.0 | 1 |
| 30 | ALLEN | 1900.0 | 2 |
| 30 | TURNER | 1500.0 | 3 |
+-----------+---------+-----------+---------+ 5)累加;
使用sum() over(partition by order by)
获取每个人按月累计消费:
select name, dt, cost, sum(cost) over(partition by name, month(dt) order by cost) as sum from t_order order by name;
+-------+-------------+-------+------+
| name | dt | cost | sum |
+-------+-------------+-------+------+
| jack | 2015-01-01 | 10 | 10 |
| jack | 2015-01-05 | 46 | 56 |
| jack | 2015-01-08 | 55 | 111 |
| jack | 2015-02-03 | 23 | 23 |
| jack | 2015-04-06 | 42 | 42 |
| mart | 2015-04-08 | 62 | 62 |
| mart | 2015-04-09 | 68 | 130 |
| mart | 2015-04-11 | 75 | 205 |
| mart | 2015-04-13 | 94 | 299 | 注意,如果over语句中没有order by,则求取的是该月份的消费总金额,而不会出现累加。
select name, dt, cost, sum(cost) over(partition by name, month(dt)) as sum from t_order order by name;
+-------+-------------+-------+------+
| name | dt | cost | sum |
+-------+-------------+-------+------+
| jack | 2015-01-01 | 10 | 111 |
| jack | 2015-01-05 | 46 | 111 |
| jack | 2015-01-08 | 55 | 111 |
| jack | 2015-02-03 | 23 | 23 |
| jack | 2015-04-06 | 42 | 42 |
| mart | 2015-04-08 | 62 | 299 |
| mart | 2015-04-09 | 68 | 299 |
| mart | 2015-04-11 | 75 | 299 |
| mart | 2015-04-13 | 94 | 299 |

最新文章

  1. 中文分词工具探析(一):ICTCLAS (NLPIR)
  2. C#开发微信门户及应用(35)--微信支付之企业付款封装操作
  3. &amp;&amp;&amp;&amp;数组去重方法总结&amp;&amp;&amp;&amp;&amp;
  4. T4
  5. language model —— basic model 语言模型之基础模型
  6. socket reuse
  7. 配置redis外网可访问,并只允许指定的ip可访问redis
  8. volatile关键字
  9. 使用Xcode GPU Frame Caputre教程
  10. 利用HTML5开发Android(7)---HTML5本地存储之Database Storage
  11. storyBoard使用介绍
  12. linux之utime函数解析
  13. Android开发之adb无法连接
  14. SVG之颜色、渐变和笔刷的使用
  15. 【XSY2787】Mythological VII 贪心
  16. fibonacci数列的性质和实现方法
  17. java 获取指定日前的前一天
  18. 转:UFLDL_Tutorial 笔记(deep learning绝佳的入门资料 )
  19. Ubuntu 16.04 标题栏实时显示上下行网速、CPU及内存使用率
  20. Java对象的序列化与反序列化-Json篇

热门文章

  1. Java 8 CompletableFuture思考
  2. Centos7下tomcat关闭异常问题
  3. 用Visual Studio2019自定义项目模板
  4. C# 集合ArrayList :可以存储任何类型的数组,的基本用法
  5. 【Socket编程】【第一节】【Socket基本原理和套接字】
  6. 「雕爷学编程」Arduino动手做(27)——BMP280气压传感器
  7. node的http模块
  8. 前端ie7的兼容问题及解决方案(未完待续)
  9. Django操作session
  10. 虚拟机安装配置(VMware与CentOS安装)