Re: Sequence vs UUID - Mailing list pgsql-general

From: veem v
Subject: Re: Sequence vs UUID
Msg-id: CAB+=1TV9gpAA-cGt_AShCmMA9fiu9jAa4egZF+oMELApOz-JYQ@mail.gmail.com
In response to: Re: Sequence vs UUID (veem v <veema0000@gmail.com>)
Responses: Re: Sequence vs UUID
List: pgsql-general
I was trying to test simple read/write performance for bigint vs UUID keys. What we see is roughly 3x performance degradation when joining on UUID columns compared to bigint columns. Even just generating the values shows a similar gap: producing UUIDs is ~3 times slower than fetching from a sequence. Insert performance on the same table for ~10 million rows is ~1 min 39 sec for bigint vs ~3 min 11 sec for UUID. Is this extent of degradation expected for UUID?
CREATE TABLE test1_UUID   (id bigint, source_id varchar(36) PRIMARY KEY, Name varchar(20));
CREATE TABLE test2_UUID   (id bigint, source_id varchar(36) PRIMARY KEY, Name varchar(20));
CREATE TABLE test1_bigint (id bigint PRIMARY KEY, source_id varchar(36), Name varchar(20));
CREATE TABLE test2_bigint (id bigint PRIMARY KEY, source_id varchar(36), Name varchar(20));
Loaded the same 10 million rows into each table.
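(The exact load statements weren't included; roughly, something like the following works. This is just an illustrative sketch, assuming gen_random_uuid() cast to text for source_id and reusing the generated keys so the two tables of each pair join row for row:)

-- illustrative load, not necessarily the exact statements used
insert into test1_bigint (id, source_id, Name)
select g, gen_random_uuid()::text, 'name_' || g
from generate_series(1, 10000000) g;

-- reuse the same rows so the join keys match across tables
insert into test2_bigint select * from test1_bigint;
insert into test1_UUID   select * from test1_bigint;
insert into test2_UUID   select * from test1_bigint;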
explain analyze select * from test1_bigint a , test2_bigint b where a.id = b.id;
Merge Join (cost=12.31..875534.52 rows=10000021 width=100) (actual time=0.042..6974.575 rows=10000000 loops=1)
-> Index Scan using test1_bigint_pkey on test1_bigint a (cost=0.43..362780.75 rows=10000021 width=50) (actual time=0.020..2070.079 rows=10000000 loops=1)
-> Index Scan using test2_bigint_2_pkey on test2_bigint b (cost=0.43..362780.75 rows=10000021 width=50) (actual time=0.019..2131.086 rows=10000000 loops=1)
Planning Time: 0.207 ms
Execution Time: 7311.210 ms
set enable_seqscan=off;
explain Analyze select * from test1_UUID a , test2_UUID b where a.source_id = b.source_id;
Merge Join (cost=2.75..2022857.05 rows=10000021 width=100) (actual time=0.043..21954.213 rows=10000000 loops=1)
Merge Cond: ((a.source_id)::text = (b.source_id)::text)
-> Index Scan using test1_uuid_pkey on test1_UUID a (cost=0.56..936420.18 rows=10000021 width=50) (actual time=0.022..7854.143 rows=10000000 loops=1)
-> Index Scan using test2_uuid_2_pkey on test2_UUID b (cost=0.56..936437.90 rows=10000021 width=50) (actual time=0.017..7971.187 rows=10000000 loops=1)
Planning Time: 0.516 ms
Execution Time: 22292.801 ms
**********
create sequence myseq cache 32767;
select count(nextval('myseq')) from generate_series(1,10000000);
1 row retrieved starting from 1 in 4 s 521 ms (execution: 4 s 502 ms, fetching: 19 ms)
select count(gen_random_uuid()) from generate_series(1,10000000);
1 row retrieved starting from 1 in 11 s 145 ms (execution: 11 s 128 ms, fetching: 17 ms)
On Mon, 30 Jan, 2023, 4:59 pm veem v, <veema0000@gmail.com> wrote:
I have a question. As I understand it, usage-wise there are multiple benefits of UUIDs over sequences: for example in a distributed app where we may not be able to rely on a single point of generation like a sequence, in a multi-master architecture, or with sharding.

Looking purely at performance, the key advantage of a sequence is its smaller storage size, so more index rows fit in cache, which benefits read queries and should also benefit joins. Fetching a value from a sequence is also cheaper than computing a UUID. The downside is on the write side: under concurrent data load it can become a point of contention, as every incoming request tries to modify the same table/index page/block.

But as mentioned in this blog (https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/), UUIDs can be made sequential, e.g. by using a timestamp prefix. However, isn't making the UUID sequential actually a disadvantage again? Each incoming write would then contend for the same block/page of the unique index during concurrent data load (the same table block, or one side of the index branch/leaf blocks), whereas with random UUIDs the writes were spread across multiple blocks, so there was no contention on any specific block. Please correct me if my understanding is wrong.

On Sun, 29 Jan, 2023, 10:33 am Miles Elam, <miles.elam@productops.com> wrote:

On Sat, Jan 28, 2023 at 8:02 PM Ron <ronljohnsonjr@gmail.com> wrote:
>
> Then it's not a Type 4 UUID, which is perfectly fine; just not random.
Yep, which is why it really should be re-versioned to UUIDv8 to be
pedantic. In everyday use, though, it almost certainly doesn't matter.
> Also, should now() be replaced by clock_timestamp(), so that it can be
> called multiple times in the same transaction?
Not necessary. Instead of 122 bits of entropy, you get 106 bits of
entropy and a new incremented prefix every minute. now() vs
clock_timestamp() wouldn't make a substantive difference. Should still
be reasonably safe against the birthday paradox for more than a
century when creating more than a million UUIDs per second.
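(For concreteness, a timestamp-prefixed UUID of the kind discussed above can be sketched roughly as below. This is only an illustration, not the function from the blog post or Miles's code; it assumes a 2-byte "minutes since epoch" prefix overwriting the first 16 random bits, which matches the 122 -> 106 bits of entropy mentioned above:)

-- illustrative "sequential" uuid: 2-byte minute prefix (wraps about every 45 days),
-- remaining 14 bytes taken from a random v4 UUID
create or replace function gen_minute_prefixed_uuid() returns uuid as $$
  select encode(
           substring(int8send((extract(epoch from clock_timestamp()) / 60)::bigint) from 7)
           || substring(uuid_send(gen_random_uuid()) from 3),
           'hex')::uuid;
$$ language sql volatile;

(Note the version nibble still reads as v4 even though the value is no longer fully random, which is exactly the "should really be re-versioned to UUIDv8" point above.)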