无序uuid对数据库的影响

由于最近在做超大表的性能测试,在该过程中发现了无序uuid做主键对表插入性能有一定影响。结合实际情况发现当表的数据量越大,对表插入性能的影响也就越大。

测试环境

PostgreSQL创建插入脚本,测试各种情况的tps。

数据库版本:PostgreSQL 10.4 (ArteryBase 5.0.0, Thunisoft)

操作系统配置:CentOS Linux release 7 ,32GB内存,8 cpu

测试参数:pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_uuid_v4.sql -U sa pgbenchdb

空表,1000w数据,5000w数据,一亿数据的各种主键测试。

测试无序的uuid,有序的uuid,序列,有普通btree,有唯一索引和没有主键的情况

测试

1.创建表

--无序的uuid
pgbenchdb=# create table test_uuid_v4(id char(32) primary key);
CREATE TABLE
--有序的uuid
pgbenchdb=# create table test_time_nextval(id char(32) primary key);
CREATE TABLE
--递增序列
pgbenchdb=# create table test_seq_bigint(id int8 primary key);
CREATE TABLE
--创建序列
create sequence test_seq start with 1 ;

2.测试脚本

--测试无序uuid脚本
vi pgbench_uuid_v4.sql
insert into test_uuid_v4 (id) values (replace(uuid_generate_v4()::text,'-',''));
--测试有序uuid脚本
vi pgbench_time_nextval.sql
insert into test_time_nextval (id) values (replace(uuid_time_nextval()::text,'-',''));
--测试序列脚本
vi pgbench_seq_bigint.sql
insert into test_seq_bigint (id) values (nextval('test_seq'::regclass));

无序uuid,无数据情况

磁盘使用情况
avg-cpu: %user %nice %system %iowait %steal %idle
0.76 0.00 0.38 4.67 0.00 94.19 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 96.00 0.00 2048.00 42.67 1.02 10.67 0.00 10.67 10.33 99.20
dm-0 0.00 0.00 0.00 96.00 0.00 2048.00 42.67 1.02 10.66 0.00 10.66 10.32 99.10
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 tps:
[thunisoft@localhost thunisoft]$ pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_uuid_v4.sql -U sa pgbenchdb
transaction type: /opt/thunisoft/pgbench_uuid_v4.sql
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 8
duration: 60 s
number of transactions actually processed: 53494
latency average = 8.974 ms
tps = 891.495404 (including connections establishing)
tps = 891.588967 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
9.006 insert into test_uuid_v4 (id) values (replace(uuid_generate_v4()::text,'-',''));

无数据情况下,tps

       类别     |  第一次  | 第二次  | 第三次 | 平均值(tps) |%util |await
---------------+---------+---------+---------+---------+-------+-------
无序uuid | 919 | 907 | 891 | 906 | 99.2% | 10.66
有序uuid | 985 | 882 | 932 | 933 | 98.7% | 4.4
序列 | 1311 | 1277 | 1280 | 1289 | 97.5% | 3.4

向表里面初始化100w数据

pgbenchdb=# insert into test_uuid_v4 (id) select  replace(uuid_generate_v4()::text,'-','') from generate_series(1,1000000);
INSERT 0 1000000
Time: 43389.817 ms (00:43.390)
pgbenchdb=# insert into test_time_nextval (id) select replace(uuid_time_nextval()::text,'-','') from generate_series(1,1000000);
INSERT 0 1000000
Time: 30585.134 ms (00:30.585)
pgbenchdb=# insert into test_seq_bigint select generate_series (1,1000000);
INSERT 0 1000000
Time: 9818.639 ms (00:09.819)
无序uuid插入100w需要43s,有序需要30s,序列需要10s。

插入一百万数据后的tps

       类别     |  第一次  | 第二次  | 第三次 | 平均值(tps) |%util |await
---------------+---------+---------+---------+---------+-------+-------
无序uuid | 355 | 440 | 302 | 365 | 98.8% | 13
有序uuid | 948 | 964 | 870 | 927 | 97.2% | 4.0
序列 | 1159 | 1234 | 1115 | 1169 | 96.6% | 3.5

插入一千万数据后的tps

       类别     |  第一次  | 第二次  | 第三次 | 平均值(tps) |%util |await
---------------+---------+---------+---------+---------+-------+-------
无序uuid | 260 | 292 | 227 | 260 | 99.2% | 16.8
有序uuid | 817 | 960 | 883 | 870 | 97.7% | 3.9
序列 | 1305 | 1261 | 1270 | 1278 | 96.8% | 3.0

插入五千万数据后

向表中插入5kw数据,并且添加主键
pgbenchdb=# insert into test_time_nextval (id) select replace(uuid_time_nextval()::text,'-','') from generate_series(1,50000000);
INSERT 0 50000000
Time: 453985.318 ms (07:33.985)
pgbenchdb=# insert into test_seq_bigint select generate_series (1,50000000);
INSERT 0 50000000
Time: 352206.160 ms (05:52.206)
pgbenchdb=# insert into test_uuid_v4 (id) select replace(uuid_generate_v4()::text,'-','') from generate_series(1,50000000);
INSERT 0 50000000
Time: 1159689.338 ms (00:19:19.689) 在无主键情况下,插入五千万数据,有序uuid耗时7分钟,序列耗时6分钟,而无序uuid耗时接近20分钟。 pgbenchdb=# alter table test_uuid_v4 add primary key ("id");
ALTER TABLE
Time: 845199.296 ms (14:05.199)
pgbenchdb=# alter table test_time_nextval add primary key ("id");
ALTER TABLE
Time: 932151.103 ms (15:32.151)
pgbenchdb=# alter table test_seq_bigint add primary key ("id");
ALTER TABLE
Time: 148138.871 ms (02:28.139) pgbenchdb=# select pg_size_pretty(pg_total_relation_size('test_uuid_v4'));
pg_size_pretty
----------------
6072 MB
(1 row) Time: 0.861 ms
pgbenchdb=# select pg_size_pretty(pg_total_relation_size('test_time_nextval'));
pg_size_pretty
----------------
6072 MB
(1 row) Time: 0.942 ms
pgbenchdb=# select pg_size_pretty(pg_total_relation_size('test_seq_bigint'));
pg_size_pretty
----------------
2800 MB
(1 row) Time: 0.699 ms

插入5kw后

       类别     |  第一次  | 第二次  | 第三次 | 平均值(tps) |%util |await
---------------+---------+---------+---------+---------+-------+-------
无序uuid | 162 | 163 | 163 | 163 | 99.6% | 18.4
有序uuid | 738 | 933 | 979 | 883 | 97.7% | 3.9
序列 | 1132 | 1264 | 1265 | 1220 | 96.8% | 3.5

插入1亿条数据后

       类别     |  第一次  | 第二次  | 第三次 | 平均值(tps) |%util |await
---------------+---------+---------+---------+---------+-------+-------
无序uuid | 121 | 131 | 143 | 131 | 99.6% | 28.2
有序uuid | 819 | 795 | 888 | 834 | 99.2% | 28.7
序列 | 1193 | 1115 | 1109 | 1139 | 96.8% | 11.3

普通btree索引

上面测了无序uuid,1kw情况下,有主键的tps是260,无主键的tps是1234。尝试测试普通的索引,和唯一索引tps

--创建普通索引
pgbenchdb=# create index i_test_uuid_v4_id on test_uuid_v4(id);
CREATE INDEX
Time: 316367.010 ms (05:16.367)
--创建普通索引后
[thunisoft@localhost thunisoft]$ pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_uuid_v4.sql -U sa pgbenchdb
transaction type: /opt/thunisoft/pgbench_uuid_v4.sql
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 8
duration: 60 s
number of transactions actually processed: 13308
latency average = 36.080 ms
tps = 221.727391 (including connections establishing)
tps = 221.749660 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
38.512 insert into test_uuid_v4 (id) values (replace(uuid_generate_v4()::text,'-',''));
--创建唯一索引
pgbenchdb=# drop index i_test_uuid_v4_id;
DROP INDEX
Time: 267.451 ms
pgbenchdb=# create unique index i_test_uuid_v4_id on test_uuid_v4(id);
CREATE INDEX
Time: 153372.622 ms (02:33.373)
[thunisoft@localhost thunisoft]$ pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_uuid_v4.sql -U sa pgbenchdb
^[[3~transaction type: /opt/thunisoft/pgbench_uuid_v4.sql
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 8
duration: 60 s
number of transactions actually processed: 13847
latency average = 34.693 ms
tps = 230.593988 (including connections establishing)
tps = 230.620469 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
36.410 insert into test_uuid_v4 (id) values (replace(uuid_generate_v4()::text,'-',''));

无论是普通btree索引和唯一索引,都会影响插入的效率。

删除所有的主键索引

--删除所有主键
alter table test_uuid_v4 drop constraint "test_uuid_v4_pkey";
alter table test_time_nextval drop constraint "test_time_nextval_pkey" ;
alter table test_seq_bigint drop constraint "test_seq_bigint_pkey"; 1,--无序uuid:测试pgbench_uuid_v4.sql
[thunisoft@localhost thunisoft]$ pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_uuid_v4.sql -U sa pgbenchdb
transaction type: /opt/thunisoft/pgbench_uuid_v4.sql
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 8
duration: 60 s
number of transactions actually processed: 74109
latency average = 6.479 ms
tps = 1234.842229 (including connections establishing)
tps = 1235.042674 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
6.112 insert into test_uuid_v4 (id) values (replace(uuid_generate_v4()::text,'-','')); 2、--有序uuid,测试pgbench_time_nextval.sql
[thunisoft@localhost thunisoft]$ pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_time_nextval.sql -U sa pgbenchdb
transaction type: /opt/thunisoft/pgbench_time_nextval.sql
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 8
duration: 60 s
number of transactions actually processed: 74027
latency average = 6.486 ms
tps = 1233.364360 (including connections establishing)
tps = 1233.482292 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
6.186 insert into test_time_nextval (id) values (replace(uuid_time_nextval()::text,'-',''));
3、--序列,测试pgbench_seq_bigint.sql
[thunisoft@localhost thunisoft]$ pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_seq_bigint.sql -U sa pgbenchdb
transaction type: /opt/thunisoft/pgbench_seq_bigint.sql
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 8
duration: 60 s
number of transactions actually processed: 76312
latency average = 6.290 ms
tps = 1271.832907 (including connections establishing)
tps = 1272.124397 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
5.916 insert into test_seq_bigint (id) values (nextval('test_seq'::regclass));

删除主键约束后,三种情况下tps非常接近,都达到了1200+。

Btree索引,插入操作的平均tps对比

 类别/平均tps    |  无数据  | 一千万  | 五千万 | 一亿 		|
---------------+---------+---------+---------+---------+
无序uuid | 960 | 260 | 163 | 131 |
有序uuid | 933 | 870 | 883 | 834 |
序列 | 1289 | 1278 | 1220 | 1139 |

根据测试数据可以看出无序的uuid在数据到达1kw后插入数据的tps下降的非常厉害,而有序的uuid和递增序列下降的比较少。到一亿数据的tps有序uuid是无序的6倍,序列是无序uuid的9倍。

创建单独的表空间用来存储索引信息

如果有多快磁盘那么可以将索引和数据分开存储,以此来加快写入的速度。

创建单独的索引空间:

create tablespace indx_test owner sa location '/home/tablespace/index_test';

指定索引存储目录:

create index i_test_uuid_v4_id on test_uuid_v4 using btree(id) tablespace indx_test;

关于有序uuid

测试使用的sequential-uuids插件,生成的有序uuid。

有序uuid的结构为(block ID; random data),实际上就是把数据拆成两部分,一部分自增,一部分随机。

sequential-uuids

sequential-uuids-git

提供了两种算法:

1.uuid_sequence_nextval(sequence regclass, block_size int default 65536, block_count int default 65536)

前缀为自增序列,如果块ID使用2字节存储,一个索引BLOCK里面可以存储256条记录(假设8K的BLOCK,一条记录包括uuid VALUE(16字节)以及ctid(6字节),所以一个索引页约存储363条记录(8000 /(16 + 6)))

2.uuid_time_nextval(interval_length int default 60, interval_count int default 65536) RETURNS uuid

默认每60秒内的数据的前缀是一样的,前缀递增1,到65535后循环。

使用uuid_time_nextval生成的有序uuid
pgbenchdb=# select id from test_time_nextval;
id
----------------------------------
a18b7dd0ca92b0b5c1844a402f9c6999
a18b540b8bbe0ddb2b6d0189b2e393c6
a18b83eb7320b0a90e625185421e065e
a18bade4ff15e05dab81ecd3f4c2dee4
a18b79e41c3bc8d2d4ba4b70447e6b29
a18bdad18d9e0d2fa1d9d675bc7129f0
a18b13723ec7be9a2f1a3aec5345a88b
a18bd9d866047aec69a064d30e9493d2
a18bd76e8c787c7464479502f381e6d7
a18ba5c0c966f81cfdbeff866618da8d
......

有序uuid前四位有序,后面的随机生成。

结语

1.关于有序的uuid,前4位是有序的,后面都是随机生成的。

2.在该环境中发现,无序uuid随着数据量的不断增大,tps下滑比较厉害。

3.由于btree索引的存在,无序的uuid会导致大量的离散io。导致磁盘使用率高。进而影响插入效率。随着表数据量的增大更加明显。

4.该测试是在普通的磁盘上面测试,并未在ssd上面测试。

5.如果要使用有序uuid,有多种实现方式,还需要考虑分布式情况下生成全局有序uuid。

最新文章

  1. MySQL自增ID 起始值 修改方法
  2. javase基础复习攻略《六》
  3. oracle断电重启之ORA-00600[4194]
  4. Flip Game I && II
  5. 解决:未找到setenv命令
  6. 微软Hololens学院教程-Hologram 230-空间场景建模(Spatial mapping )【微软教程已经更新,本文是老版本】
  7. storm实战总结笔记
  8. Django中的F和Q函数
  9. CF368 D - Persistent Bookcase
  10. TLS详解
  11. Servlet(一):Hello Servlet
  12. Create and Embed an Application Manifest (UAC)
  13. Unity shader学习之屏幕后期处理效果之运动模糊
  14. 一个OpenGL小程序
  15. 跟我学SharePoint 2013视频培训课程——探索默认的列表和库(6)
  16. 不应直接存储或返回可变成员 Mutable members should not be stored or returned directly
  17. Rails 5 Test Prescriptions 第8章 Integration Testing with Capybara and Cucumber
  18. 03-SSH综合案例:商城表关系分析
  19. 关于Mybatis 反向生成后 查询结果全部为null 解决办法
  20. JAVA的静态代理与动态代理比较--转载

热门文章

  1. PAT 乙级 -- 1013 -- 数素数
  2. poj2987最大权闭包(输出最少建塔个数)
  3. WindowsPE 第七章 资源表
  4. Redis—简单动态字符串(SDS)
  5. HellowWorld详解
  6. Tomcat启动乱码解决
  7. promise用法解析
  8. 自定义WPF分页控件
  9. MSSQL·最长输出长度限制之解决方案
  10. cms菜单栏二级折叠与交互解决方案(js)(1)