转载自:https://yq.aliyun.com/articles/7593

函数作用:

gp_dist_random('gp_id')本质上就是在所有节点查询gp_id,
gp_dist_random('pg_authid')就是在所有节点查询pg_authid,

使用greenplum时,如果需要调用一个函数,这个函数很可能就在master执行,而不会跑到segment上去执行。
例如 random()函数。
通过select random()来调用的话,不需要将这条SQL发送到segment节点,所以执行计划如下,没有gather motion的过程。

postgres=# explain analyze select random();
QUERY PLAN
----------------------------------------------------------------------------------------
Result (cost=0.01..0.02 rows=1 width=0)
Rows out: 1 rows with 0.017 ms to end, start offset by 0.056 ms.
InitPlan
-> Result (cost=0.00..0.01 rows=1 width=0)
Rows out: 1 rows with 0.004 ms to end of 2 scans, start offset by 0.059 ms.
Slice statistics:
(slice0) Executor memory: 29K bytes.
(slice1) Executor memory: 29K bytes.
Statement statistics:
Memory used: 128000K bytes
Total runtime: 0.074 ms
(11 rows)

如果要让这个函数在segment执行,怎么办呢?
通过gp_dist_random('gp_id')来调用,gp_dist_random的参数是一个可查询的视图,或表。

postgres=# explain analyze select random() from gp_dist_random('gp_id');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 240:1 (slice1; segments: 240) (cost=0.00..4.00 rows=240 width=0)
Rows out: 240 rows at destination with 6.336 ms to first row, 59 ms to end, start offset by 4195 ms.
-> Seq Scan on gp_id (cost=0.00..4.00 rows=1 width=0)
Rows out: Avg 1.0 rows x 240 workers. Max 1 rows (seg0) with 0.073 ms to first row, 0.075 ms to end, start offset by 4207 ms.
Slice statistics:
(slice0) Executor memory: 471K bytes.
(slice1) Executor memory: 163K bytes avg x 240 workers, 163K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 4279.445 ms
(10 rows)

gp_id在每个segment中都有一条记录,所以以上SQL会在每个SEGMENT中调用一次random()并返回所有结果,例如我的测试环境中有240个segment, 那么以上SQL将返回240条记录。

在gp_id的定义中,介绍了gp_dist_random用它可以做一些管理的工作:
譬如查询数据库的大小,查询表的大小,其实都是这样统计的。
src/backend/catalog/postgres_bki_srcs

/*-------------------------------------------------------------------------
*
* gp_id.h
* definition of the system "database identifier" relation (gp_dbid)
* along with the relation's initial contents.
*
* Copyright (c) 2009-2010, Greenplum inc
*
* NOTES
* Historically this table was used to supply every segment with its
* identification information. However in the 4.0 release when the file
* replication feature was added it could no longer serve this purpose
* because it became a requirement for all tables to have the same physical
* contents on both the primary and mirror segments. To resolve this the
* information is now passed to each segment on startup based on the
* gp_segment_configuration (stored on the master only), and each segment
* has a file in its datadirectory (gp_dbid) that uniquely identifies the
* segment.
*
* The contents of the table are now irrelevant, with the exception that
* several tools began relying on this table for use as a method of remote
* function invocation via gp_dist_random('gp_id') due to the fact that this
* table was guaranteed of having exactly one row on every segment. The
* contents of the row have no defined meaning, but this property is still
* relied upon.
*/
#ifndef _GP_ID_H_
#define _GP_ID_H_ #include "catalog/genbki.h"
/*
* Defines for gp_id table
*/
#define GpIdRelationName "gp_id" /* TIDYCAT_BEGINFAKEDEF CREATE TABLE gp_id
with (shared=true, oid=false, relid=5001, content=SEGMENT_LOCAL)
(
gpname name ,
numsegments smallint ,
dbid smallint ,
content smallint
); TIDYCAT_ENDFAKEDEF
*/

查询数据库大小的GP函数

postgres=# \df+ pg_database_size
List of functions
Schema | Name | Result data type | Argument data types | Type | Data access | Volatility | Owner | Language | Source code | Description
------------+------------------+------------------+---------------------+--------+----------------+------------+----------+----------+-----------------------+-------------------------------------------------------------
pg_catalog | pg_database_size | bigint | name | normal | reads sql data | volatile | dege.zzz | internal | pg_database_size_name | Calculate total disk space usage for the specified database
pg_catalog | pg_database_size | bigint | oid | normal | reads sql data | volatile | dege.zzz | internal | pg_database_size_oid | Calculate total disk space usage for the specified database
(2 rows)

其中pg_database_size_name 的源码如下:
很明显,在统计数据库大小时也用到了select sum(pg_database_size('%s'))::int8 from gp_dist_random('gp_id');

Datum
pg_database_size_name(PG_FUNCTION_ARGS)
{
int64 size = 0;
Name dbName = PG_GETARG_NAME(0);
Oid dbOid = get_database_oid(NameStr(*dbName)); if (!OidIsValid(dbOid))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_DATABASE),
errmsg("database \"%s\" does not exist",
NameStr(*dbName)))); size = calculate_database_size(dbOid); if (Gp_role == GP_ROLE_DISPATCH)
{
StringInfoData buffer; initStringInfo(&buffer); appendStringInfo(&buffer, "select sum(pg_database_size('%s'))::int8 from gp_dist_random('gp_id');", NameStr(*dbName)); size += get_size_from_segDBs(buffer.data);
} PG_RETURN_INT64(size);
}

不信我们可以直接查询这个SQL,和使用pg_database_size函数得到的结果几乎是一样的,只差了calculate_database_size的部分。

postgres=# select sum(pg_database_size('postgres'))::int8 from gp_dist_random('gp_id');
sum
----------------
16006753522624
(1 row) postgres=# select pg_database_size('postgres');
pg_database_size
------------------
16006763924106
(1 row)

gp_dist_random('gp_id')本质上就是在所有节点查询gp_id,
gp_dist_random('pg_authid')就是在所有节点查询pg_authid,
例如:

postgres=# select * from gp_dist_random('gp_id');
gpname | numsegments | dbid | content
-----------+-------------+------+---------
Greenplum | -1 | -1 | -1
Greenplum | -1 | -1 | -1
Greenplum | -1 | -1 | -1
Greenplum | -1 | -1 | -1
Greenplum | -1 | -1 | -1
Greenplum | -1 | -1 | -1
Greenplum | -1 | -1 | -1
Greenplum | -1 | -1 | -1
Greenplum | -1 | -1 | -1
Greenplum | -1 | -1 | -1
。。。。。。

如果不想返回太多记录,可以使用limit 来过滤,但是执行还是会在所有的segment都执行,如下:

postgres=# explain analyze select random() from gp_dist_random('gp_id') limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.04 rows=1 width=0)
Rows out: 1 rows with 5.865 ms to first row, 5.884 ms to end, start offset by 4212 ms.
-> Gather Motion 240:1 (slice1; segments: 240) (cost=0.00..0.04 rows=1 width=0)
Rows out: 1 rows at destination with 5.857 ms to end, start offset by 4212 ms.
-> Limit (cost=0.00..0.02 rows=1 width=0)
Rows out: Avg 1.0 rows x 240 workers. Max 1 rows (seg0) with 0.062 ms to first row, 0.063 ms to end, start offset by 4228 ms.
-> Seq Scan on gp_id (cost=0.00..4.00 rows=1 width=0)
Rows out: Avg 1.0 rows x 240 workers. Max 1 rows (seg0) with 0.060 ms to end, start offset by 4228 ms.
Slice statistics:
(slice0) Executor memory: 463K bytes.
(slice1) Executor memory: 163K bytes avg x 240 workers, 163K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 4288.007 ms
(14 rows)

最新文章

  1. Mybatis XML配置
  2. Python for Infomatics 第14章 数据库和SQL应用二(译)
  3. goroutine
  4. 腾讯云环境配置之PHP5.6.3 + redis扩展 稳定版
  5. BackgroundWorker的使用
  6. 远控软件VNC攻击案例研究
  7. 日期字符串转换 and 两个日期相减
  8. 是否以某字符串结尾 是否以某字符串开始 是否是整数 裁减字符串空格 是否是浮点数 是否所有字符为数字类型 是否为空 是否是EMAIL 是否是电话号码 身份证号码验证-支持新的带x身份证 日期验证
  9. redundant 行记录格式
  10. 【数据库】Mean web开发 02-Windows下Mongodb安装配置及常用客户端管理工具
  11. 201521123057 《Java程序设计》第13周学习总结
  12. Float精度丢失
  13. 【Django】 gunicorn部署纪要
  14. newJob_newFell
  15. MySQL高级知识(七)——索引面试题分析
  16. 五大理由分析Springboot 2.0为什么选择HikariCP
  17. ActiveQt框架 禁止弹出ActiveX控件交互提示
  18. CRM 2013 批量更新two options的缺省值
  19. [IR] Open Source Search Engines
  20. Spring MVC 异步测试

热门文章

  1. css拓展
  2. 从其他数据库迁移到MySQL及MySQL特点
  3. Scala 面向对象编程之Trait
  4. PB数据窗口分页
  5. Centos 安装PHP-redis扩展
  6. Java版Kafka使用及配置解释
  7. Access-Control-Max-Age
  8. 线程三(Mutex)
  9. Python基本数据类型及实例详解
  10. SVM-支持向量机总结