Lateral Joins in KingbaseES

I. What Is a Lateral Join

According to the documentation, it works as follows:

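The LATERAL key word can precede a sub-SELECT FROM item. This allows the sub-SELECT to refer to columns of FROM items that appear before it in the FROM list. (Without LATERAL, each sub-SELECT is evaluated independently and so cannot cross-reference any other FROM item.)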
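A minimal sketch of the difference, using a hypothetical inline table t built with VALUES. It uses PostgreSQL-compatible syntax like the rest of this article; it has not been checked against a specific KingbaseES version.

```sql
-- Without LATERAL the sub-SELECT is evaluated independently and cannot
-- see t.x, so this form is rejected with an error:
--   SELECT t.x, s.y
--   FROM (VALUES (1), (2)) AS t(x),
--        (SELECT t.x * 10 AS y) AS s;

-- With LATERAL the sub-SELECT is evaluated once per row of t
-- and may reference t.x:
SELECT t.x, s.y
FROM (VALUES (1), (2)) AS t(x),
     LATERAL (SELECT t.x * 10 AS y) AS s
ORDER BY t.x;
-- expected rows: (1, 10), (2, 20)
```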

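A table function appearing in FROM can also be preceded by the key word LATERAL, but for functions LATERAL is optional: a function's arguments can refer to columns of preceding FROM items in any case.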
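For example, with the built-in set-returning function generate_series and a hypothetical inline table t, the function's argument can use t.n whether or not LATERAL is written (a sketch in PostgreSQL-compatible syntax):

```sql
-- generate_series(1, t.n) refers to t.n from the preceding FROM item.
-- For function calls LATERAL is implied, so both queries return the same rows.
SELECT t.n, g.i
FROM (VALUES (2), (3)) AS t(n),
     generate_series(1, t.n) AS g(i)
ORDER BY t.n, g.i;

SELECT t.n, g.i
FROM (VALUES (2), (3)) AS t(n),
     LATERAL generate_series(1, t.n) AS g(i)
ORDER BY t.n, g.i;
-- expected rows: (2,1), (2,2), (3,1), (3,2), (3,3)
```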

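In essence, for each row of the main "table", LATERAL evaluates the subquery with that row's values as parameters, much like a for loop that iterates over the rows returned by a query.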

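II. Uses of Lateral

1. Syntactic sugar

Syntactic sugar is syntax added to a programming language that does not change what the language can do but makes it more convenient to use. It usually makes programs more readable and thus reduces the chance of errors in the code.

A lateral join lets you reuse computed columns, which keeps queries tidy and readable. Let's see how it works by rewriting a messy query together: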

select
    (pledged / fx_rate) as pledged_usd,
    (pledged / fx_rate) / backers_count as avg_pledge_usd,
    (goal / fx_rate) - (pledged / fx_rate) as amt_from_goal,
    (deadline - launched_at) / 86400.00 as duration,
    ((goal / fx_rate) - (pledged / fx_rate)) / ((deadline - launched_at) / 86400.00) as usd_needed_daily
from kickstarter_data;

With lateral joins, each computed column is defined only once and can then be referenced in other parts of the query:

select
    pledged_usd,
    avg_pledge_usd,
    usd_from_goal as amt_from_goal,
    duration,
    (usd_from_goal / duration) as usd_needed_daily
from kickstarter_data,
    lateral (select pledged / fx_rate as pledged_usd) pu,
    lateral (select pledged_usd / backers_count as avg_pledge_usd) apu,
    lateral (select goal / fx_rate as goal_usd) gu,
    lateral (select goal_usd - pledged_usd as usd_from_goal) ufg,
    lateral (select (deadline - launched_at)/86400.00 as duration) dr;
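To try the queries above, a hypothetical kickstarter_data table can be set up as a temporary table. The column names come from the queries; the column types, and the assumption that deadline and launched_at are Unix epoch seconds (suggested by the division by 86400), are not stated in the original.

```sql
-- Hypothetical test table; types and epoch-second timestamps are assumptions.
CREATE TEMP TABLE kickstarter_data (
    pledged       numeric,
    goal          numeric,
    fx_rate       numeric,
    backers_count int,
    deadline      bigint,
    launched_at   bigint
);

-- 1000 pledged and 5000 goal in local currency at 2 units per USD,
-- 10 backers, and a campaign lasting 864000 seconds (10 days)
INSERT INTO kickstarter_data VALUES (1000, 5000, 2, 10, 864000, 0);
```

For this row the queries above should report pledged_usd 500, avg_pledge_usd 50, amt_from_goal 2000, duration 10 and usd_needed_daily 200, up to trailing decimal zeros in the numeric output.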

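2. An enhanced form of subquery

A LATERAL join is more like a correlated subquery than a plain subquery: the expression to the right of a LATERAL join is evaluated once for each row to its left, just like a correlated subquery, whereas a plain (uncorrelated) subquery is evaluated only once. (The query planner has ways to optimize the performance of either.)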

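Also keep in mind that the LATERAL counterpart of a correlated subquery is LEFT JOIN LATERAL ... ON true.

LATERAL is the same idea as APPLY in SQL Server: CROSS JOIN LATERAL corresponds to CROSS APPLY, and LEFT JOIN LATERAL ... ON true corresponds to OUTER APPLY.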
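A small sketch of why the LEFT form matters, using hypothetical inline tables a and b (PostgreSQL-compatible syntax). Key 3 has no child rows, so the LATERAL subquery returns no row for it:

```sql
-- Hypothetical inline data: a has keys 1..3, b has child rows for 1 and 2 only.
WITH a(pk) AS (VALUES (1), (2), (3)),
     b(fk1, val) AS (VALUES (1, 10), (1, 20), (2, 5))
SELECT a.pk, x.val
FROM a
LEFT JOIN LATERAL (
    -- largest val per a.pk; no row at all for pk = 3
    SELECT b.val
    FROM b
    WHERE b.fk1 = a.pk
    ORDER BY b.val DESC
    LIMIT 1
) x ON true
ORDER BY a.pk;
-- expected rows: (1, 20), (2, 5), (3, NULL)
```

With CROSS JOIN LATERAL (CROSS APPLY) instead, the row for pk = 3 disappears; the LEFT form keeps it with NULLs, just as a correlated scalar subquery in the SELECT list would.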

Consider the following query:

Select A.*
, (Select min(B.val) Column1 from B where B.Fk1 = A.PK  )
, (Select max(B.val) Column2 from B where B.Fk1 = A.PK  )
FROM A ;

In this case, LATERAL can be used instead:

Select A.*
     , x.Column1
     , x.Column2
FROM A LEFT JOIN LATERAL
   (Select min(B.val) Column1, max(B.val) Column2 from B where B.Fk1 = A.PK) x ON true;

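Because the subquery's WHERE clause references A.PK, it cannot be written as an ordinary join against a derived table; LATERAL (or CROSS APPLY / OUTER APPLY in SQL Server) is needed.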
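A minimal check that the two forms agree, using hypothetical inline data in place of tables A and B. Because aggregates without GROUP BY always return exactly one row, LEFT JOIN LATERAL and CROSS JOIN LATERAL behave the same here.

```sql
-- Hypothetical inline stand-ins for tables A and B.
WITH a(pk) AS (VALUES (1), (2), (3)),
     b(fk1, val) AS (VALUES (1, 10), (1, 20), (2, 5))
-- Form 1: two correlated scalar subqueries per row of A
SELECT a.pk,
       (SELECT min(b.val) FROM b WHERE b.fk1 = a.pk) AS column1,
       (SELECT max(b.val) FROM b WHERE b.fk1 = a.pk) AS column2
FROM a
EXCEPT
-- Form 2: one LATERAL subquery computes both aggregates per row of A
SELECT a.pk, x.column1, x.column2
FROM a
LEFT JOIN LATERAL (
    SELECT min(b.val) AS column1, max(b.val) AS column2
    FROM b
    WHERE b.fk1 = a.pk
) x ON true;
-- expected: no rows (pk 3 yields NULL, NULL in both forms)
```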

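3. Avoiding repeated execution of subqueries or functions

Goal: list each row of user-behavior table A (with its optdate) side by side with the maximum optdate of the same user in user-behavior table B.

(1) Data preparation

Three tables: the user dictionary table (users), user-behavior table A (optA) and user-behavior table B (optB).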

create table users (user_id int PRIMARY KEY, username text);
create table optA (opta_id int PRIMARY KEY, user_id int, optdate timestamp , note text) ;
create index optA_i1 on optA(user_id,optdate desc );
create table optB (optb_id int PRIMARY KEY, user_id int, optdate timestamp , note text) ;
create index optB_i1 on optB(user_id,optdate desc );
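The plans below come from tables without published data. A minimal sketch for reproducing the comparison, with hypothetical data volumes: it creates the three tables as TEMP tables with the same names (these shadow any permanent tables of those names for the rest of the session) and fills them with generated rows.

```sql
-- Hypothetical test data; TEMP tables shadow same-named permanent tables.
CREATE TEMP TABLE users (user_id int PRIMARY KEY, username text);
CREATE TEMP TABLE optA (opta_id int PRIMARY KEY, user_id int, optdate timestamp, note text);
CREATE INDEX optA_i1 ON optA (user_id, optdate DESC);
CREATE TEMP TABLE optB (optb_id int PRIMARY KEY, user_id int, optdate timestamp, note text);
CREATE INDEX optB_i1 ON optB (user_id, optdate DESC);

-- 1000 users; 100,000 rows in each behavior table, 100 per user
INSERT INTO users
SELECT i, 'user_' || i FROM generate_series(1, 1000) AS i;

INSERT INTO optA
SELECT i, 1 + i % 1000, now() - i * interval '1 minute', 'a' || i
FROM generate_series(1, 100000) AS i;

INSERT INTO optB
SELECT i, 1 + (i * 7) % 1000, now() - i * interval '30 seconds', 'b' || i
FROM generate_series(1, 100000) AS i;

-- refresh planner statistics so EXPLAIN reflects the data
ANALYZE users;
ANALYZE optA;
ANALYZE optB;
```

With data in place, EXPLAIN (ANALYZE) also reports how often each plan node ran (loops=). For the correlated forms below, the optb index scan should run roughly once per optA row of user 88 (100 times with this data). The plans will not match the ones shown here exactly, since they depend on statistics and version.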

(2) Plain statement

The query against table B is executed once for every row of A, which costs a lot of CPU time.

select optA.*,(select max(optdate) from optB where optB.user_id=optA.user_id)
from optA
where opta.user_id=88;

                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on opta  (cost=4.19..33.58 rows=5 width=56)
   Recheck Cond: (user_id = 88)
   ->  Bitmap Index Scan on opta_i1  (cost=0.00..4.19 rows=5 width=0)
         Index Cond: (user_id = 88)
   SubPlan 2
     ->  Result  (cost=4.17..4.18 rows=1 width=8)
           InitPlan 1 (returns $1)
             ->  Limit  (cost=0.15..4.17 rows=1 width=8)
                   ->  Index Only Scan using optb_i1 on optb  (cost=0.15..20.25 rows=5 width=8)
                         Index Cond: ((user_id = opta.user_id) AND (optdate IS NOT NULL))
(10 rows)

(3) Basic optimization

Join against a grouped derived table so that the query against table B is not executed repeatedly:

select optA.*, optB.*
from optA
         join (select user_id, max(optdate) from optB group by user_id) optB on optB.user_id = optA.user_id
where opta.user_id = 88;

                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Nested Loop  (cost=8.38..25.78 rows=25 width=60)
   ->  Bitmap Heap Scan on opta  (cost=4.19..12.66 rows=5 width=48)
         Recheck Cond: (user_id = 88)
         ->  Bitmap Index Scan on opta_i1  (cost=0.00..4.19 rows=5 width=0)
               Index Cond: (user_id = 88)
   ->  Materialize  (cost=4.19..12.81 rows=5 width=12)
         ->  GroupAggregate  (cost=4.19..12.74 rows=5 width=12)
               Group Key: optb.user_id
               ->  Bitmap Heap Scan on optb  (cost=4.19..12.66 rows=5 width=12)
                     Recheck Cond: (user_id = 88)
                     ->  Bitmap Index Scan on optb_i1  (cost=0.00..4.19 rows=5 width=0)
                           Index Cond: (user_id = 88)

(4) LATERAL join

The subquery is still executed once per row of A:

select optA.*, optB.*
from optA
         cross join lateral (select max(optdate)
                             from optB
                             where optB.user_id = optA.user_id) optB
where opta.user_id = 88;

                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=8.36..33.68 rows=5 width=56)
   ->  Bitmap Heap Scan on opta  (cost=4.19..12.66 rows=5 width=48)
         Recheck Cond: (user_id = 88)
         ->  Bitmap Index Scan on opta_i1  (cost=0.00..4.19 rows=5 width=0)
               Index Cond: (user_id = 88)
   ->  Result  (cost=4.17..4.18 rows=1 width=8)
         InitPlan 1 (returns $1)
           ->  Limit  (cost=0.15..4.17 rows=1 width=8)
                 ->  Index Only Scan using optb_i1 on optb  (cost=0.15..20.25 rows=5 width=8)
                       Index Cond: ((user_id = $0) AND (optdate IS NOT NULL))
(10 rows)

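(5) Using the dictionary table with a LATERAL join

Using the dictionary table as an intermediate step between the two fact tables avoids repeated execution of the subquery: because users holds one row per user_id, the LATERAL subquery runs once per user (once here) instead of once per optA row.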

select optA.*, optB.*
from optA
    join users on users.user_id=optA.user_id
         cross join lateral (select max(optdate)
                             from optB
                             where optB.user_id = users.user_id) optB
where opta.user_id = 88;

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Nested Loop  (cost=8.52..25.09 rows=5 width=56)
   ->  Nested Loop  (cost=4.33..12.37 rows=1 width=12)
         ->  Index Only Scan using users_pkey on users  (cost=0.15..8.17 rows=1 width=4)
               Index Cond: (user_id = 88)
         ->  Result  (cost=4.17..4.18 rows=1 width=8)
               InitPlan 1 (returns $1)
                 ->  Limit  (cost=0.15..4.17 rows=1 width=8)
                       ->  Index Only Scan using optb_i1 on optb  (cost=0.15..20.25 rows=5 width=8)
                             Index Cond: ((user_id = $0) AND (optdate IS NOT NULL))
   ->  Bitmap Heap Scan on opta  (cost=4.19..12.66 rows=5 width=48)
         Recheck Cond: (user_id = 88)
         ->  Bitmap Index Scan on opta_i1  (cost=0.00..4.19 rows=5 width=0)
               Index Cond: (user_id = 88)
(13 rows)

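4. CTEs or views containing GROUP BY clauses and aggregate functions

(1) The filter column is not the grouping column

In this case the index on the grouping column has no Index Cond to locate the data; the scan simply walks through the index entries. The number of rows scanned equals the matching rows plus all rows that precede them in the index, so the execution time depends on where the grouping value falls in the index.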

with optA as (select user_id, max(optdate) max_dt
              from optA
              group by user_id)
select users.*, optA.*
from users
         cross join lateral (select *
                             from optA
                             where optA.user_id = users.user_id) optA
where users.username = 'ABC';

Nested Loop  (cost=0.85..60792.44 rows=9 width=32)
  Join Filter: (optB.user_id = users.user_id)
  ->  Index Scan using users_username on users
        Index Cond: (opta.username = 'ABC')
  ->  GroupAggregate  (cost=0.42..58714.70 rows=91969 width=20)
        Group Key: optA.user_id
        ->  Index Scan using optA_user_id on optA  (cost=0.42..47795.01 rows=1000000 width=8)

(2) Join condition written with = ANY (subquery)

Here the index on the grouping column locates the data through an Index Cond, so only the matching rows are scanned.

with optA as (select user_id, max(optdate) max_dt
              from optA
              group by user_id)
select users.*, optA.*
from users
         cross join lateral (select *
                             from optA
                             where optA.user_id = any (select users.user_id)) optA
where users.username = 'ABC';

Nested Loop  (cost=0.85..57.61 rows=11 width=32)
  ->  Index Scan using users_username on users
        Index Cond: (opta.username = 'ABC')
  ->  GroupAggregate  (cost=0.42..48.83 rows=11 width=20)
        Group Key: optA.user_id
        ->  Index Scan using optA_user_id on optA  (cost=0.42..48.61 rows=11 width=8)
              Index Cond: (optB.user_id = users.user_id)
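Another option, when the grouped query does not have to live in a CTE or view, is to move the GROUP BY into the LATERAL subquery itself, in the style of section 3, so that the correlation predicate sits directly in the aggregate's WHERE clause, where the planner can use it before grouping. A minimal sketch with hypothetical inline data standing in for users and optA; against the real tables, drop the WITH clause and check with EXPLAIN that the optA index scan shows an Index Cond.

```sql
-- Hypothetical inline stand-ins for users and optA.
WITH users(user_id, username) AS (
         VALUES (1, 'ABC'), (2, 'XYZ')),
     optA(user_id, optdate) AS (
         VALUES (1, timestamp '2024-01-01'),
                (1, timestamp '2024-03-01'),
                (2, timestamp '2024-02-01'))
SELECT users.*, x.*
FROM users
         CROSS JOIN LATERAL (
    -- correlation predicate inside the aggregate's own WHERE clause
    SELECT optA.user_id, max(optA.optdate) AS max_dt
    FROM optA
    WHERE optA.user_id = users.user_id
    GROUP BY optA.user_id   -- keeps CTE semantics: no optA rows, no output row
    ) x
WHERE users.username = 'ABC';
-- expected row: user_id 1, username ABC, user_id 1, max_dt 2024-03-01 00:00:00
```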

5. Grouped queries: getting the latest time, or the latest row, for each user

(1) Data preparation

A user log table containing user_id and log_date:

create table log
(
    log_date timestamp,
    user_id  int,
    note     text
);

insert into log
select now() - ((100000 * random())::numeric(20, 3)::text)::interval log_date,
       (random() * 1000000)::int % 100                               user_id,
       md5(id::text)  -- md5() takes text, so cast the generated integer
from generate_series(1, 100000) id
order by random();

-- Keep the index's column order and sort direction consistent with the ORDER BY used in the queries
create index log_i1 on log (user_id, log_date DESC NULLS LAST);

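(2) Plain statements

Both statements read every row of log (or every row matching the condition), using the aggregate function max and the window function row_number respectively.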

explain analyse
select user_id, max(log_date)
from log
group by user_id;

explain analyse
select *
from (select *, row_number() over (partition by user_id order by log_date desc nulls last) sn from log) l
where sn = 1;

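(3) Recursive CTE

This form retrieves a single column or the whole row conveniently by using the table's row type. Because it reads only the latest record of each user, it touches far fewer data blocks and runs far faster than the plain statements, provided the number of distinct user_id values is small (100 in this data set).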

WITH RECURSIVE cte AS (
    ( -- parentheses required: this branch has its own ORDER BY and LIMIT
        SELECT l AS my_row -- whole row
        FROM log l
        ORDER BY user_id, log_date DESC NULLS LAST
        LIMIT 1
    )
    UNION ALL
    SELECT (SELECT l -- whole row
            FROM log l
            WHERE l.user_id > (c.my_row).user_id
            ORDER BY l.user_id, l.log_date DESC NULLS LAST
            LIMIT 1)
    FROM cte c
    WHERE (c.my_row).user_id IS NOT NULL
)
SELECT (my_row).* -- decompose the row into columns
FROM cte
WHERE (my_row).user_id IS NOT NULL
ORDER BY (my_row).user_id;

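(4) Recursive CTE with a LATERAL join

The recursive CTE above is logically complex and hard to follow, and every row has to be composed into a row value and decomposed again.

With a LATERAL join the statement is easier to read and, in the author's tests, saves about 10% of CPU time.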

WITH RECURSIVE cte AS (
    ( -- parentheses required: this branch has its own ORDER BY and LIMIT
        SELECT *
        FROM log
        WHERE 1 = 1
        ORDER BY user_id, log_date DESC NULLS LAST
        LIMIT 1
    )
    UNION ALL
    SELECT l.*
    FROM cte c
             CROSS JOIN LATERAL (
        SELECT l.*
        FROM log l
        WHERE l.user_id > c.user_id -- lateral reference to the previous row
        ORDER BY l.user_id, l.log_date DESC NULLS LAST
        LIMIT 1
        ) l
)
TABLE cte
ORDER BY user_id;
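Because the recursive "skip" query is easy to get subtly wrong, it is worth checking it against the plain GROUP BY. A minimal sketch on a hypothetical temporary copy, log_demo, with the same columns and index as log (generated the same way, with simplified timestamps):

```sql
-- Hypothetical copy of log with the same columns and index.
CREATE TEMP TABLE log_demo (log_date timestamp, user_id int, note text);

INSERT INTO log_demo
SELECT now() - random() * interval '100000 seconds',
       (random() * 1000000)::int % 100,
       md5(i::text)
FROM generate_series(1, 100000) AS i;

CREATE INDEX log_demo_i1 ON log_demo (user_id, log_date DESC NULLS LAST);
ANALYZE log_demo;

-- Rows where the LATERAL recursive CTE disagrees with a plain GROUP BY;
-- an empty result means both give the same latest log_date per user.
(WITH RECURSIVE cte AS (
     (SELECT *
      FROM log_demo
      ORDER BY user_id, log_date DESC NULLS LAST
      LIMIT 1)
     UNION ALL
     SELECT l.*
     FROM cte c
     CROSS JOIN LATERAL (
         SELECT l.*
         FROM log_demo l
         WHERE l.user_id > c.user_id
         ORDER BY l.user_id, l.log_date DESC NULLS LAST
         LIMIT 1) l
 )
 SELECT user_id, log_date FROM cte)
EXCEPT
SELECT user_id, max(log_date)
FROM log_demo
GROUP BY user_id;
```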

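(5) users dictionary table with a LATERAL join

As long as users holds exactly one row per relevant user_id, the layout of the table hardly matters; ideally, its physical order is kept in sync with the log table.

This query combines the dictionary table with a LATERAL join. Because its query tree is simpler, it runs about 10% faster than the recursive CTE in the author's tests.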

CREATE TABLE users (
   user_id  INT PRIMARY KEY
 , username text NOT NULL
);

-- one row for every user_id that occurs in log (0 to 99, given "% 100" above)
insert into users select id, md5(id::text) from generate_series(0, 99) id;

SELECT u.user_id, l.*
FROM users u
         cross join LATERAL (
    SELECT l.*
    FROM log l
    WHERE l.user_id = u.user_id -- lateral reference
    ORDER BY l.log_date DESC NULLS LAST
    LIMIT 1
    ) l;
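This query relies on users containing every user_id of interest (the primary key already rules out duplicates); ids missing from users simply drop out of the result. A minimal sketch of an anti-join check with NOT EXISTS, using hypothetical inline stand-ins; against the real tables, drop the WITH clause.

```sql
-- Hypothetical inline stand-ins for users and log.
WITH users(user_id) AS (VALUES (0), (1), (2)),
     log(user_id, log_date) AS (
         VALUES (0, timestamp '2024-01-01'),
                (1, timestamp '2024-01-02'),
                (3, timestamp '2024-01-03'))
-- user_id values that occur in log but have no row in users;
-- the users-driven LATERAL query would silently leave them out.
SELECT DISTINCT l.user_id
FROM log l
WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.user_id = l.user_id);
-- expected: one row, user_id 3
```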

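(6) Scalar subquery in the SELECT list, without a LATERAL join

With a users dictionary table, a query that reads no surplus rows can also be written without a LATERAL join.

Because the row value has to be decomposed into its columns, which costs CPU time, this takes about 10% longer than the LATERAL join, and the overhead grows with the number of columns.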

SELECT  (combo1).*
FROM (
   SELECT u.user_id
        , (SELECT (l.*)::log
           FROM   log l
           WHERE  l.user_id = u.user_id
           ORDER  BY l.log_date DESC NULLS LAST
           LIMIT  1) AS combo1
   FROM   users u
   ) sub;

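III. Limitations of Lateral

Type casts: CAST vs ::

Both notations are explicit type casts and are otherwise equivalent. However, in certain positions in SQL syntax only the function-style notation is allowed; a function item after LATERAL in FROM is one of them.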

-- Valid (1)
SELECT elem[1], elem[2]
FROM   ( VALUES ('1,2'::TEXT) ) AS q(arr),
       LATERAL CAST(String_To_Array(q.arr, ',') AS INT[]) AS elem
;

-- Valid (2)
SELECT elem[1], elem[2]
FROM   ( VALUES ('1,2'::TEXT) ) AS q(arr),
       LATERAL  (select String_To_Array(q.arr, ',')::INT[] AS elem) as t
;

-- Invalid
SELECT elem[1], elem[2]
FROM   ( VALUES ('1,2'::TEXT) ) AS q(arr),
       LATERAL String_To_Array(q.arr, ',')::int[] AS elem;

ERROR:  syntax error at or near "::"
LINE 3:        LATERAL String_To_Array(q.arr, ',')::int[] AS elem;

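The CREATE INDEX statement triggers the same error. Writing the expression with the CAST function or the type-name function form makes the CREATE INDEX statement valid, and so does wrapping the expression in an extra pair of parentheses.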
create index t02_i1 on t02 (id::int);

ERROR:  syntax error at or near "::"
LINE 1: create index t02_i1 on t02 (id::int);

-- Valid rewrite of the CREATE INDEX statement: wrap the expression in an extra pair of parentheses
create index t02_i1 on t02 ((id::int));
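A sketch of the function-call forms mentioned above, on a hypothetical temporary table t02_demo whose id column is assumed to be text (the article does not show t02's definition). int4 is PostgreSQL's internal name for int; that KingbaseES accepts the same spelling is an assumption.

```sql
-- Hypothetical table: id is assumed to be text.
CREATE TEMP TABLE t02_demo (id text);

-- An arbitrary expression needs its own pair of parentheses:
CREATE INDEX t02_demo_i1 ON t02_demo ((id::int));
-- CAST(...) has function-call form, so no extra parentheses are needed:
CREATE INDEX t02_demo_i2 ON t02_demo (CAST(id AS int));
-- Type-name function form:
CREATE INDEX t02_demo_i3 ON t02_demo (int4(id));
```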

巴特西
