Re: Sequence vs UUID - Mailing list pgsql-general

From veem v
Subject Re: Sequence vs UUID
Date
Msg-id CAB+=1TW-+gvMiTdb6fkx0LPE4c6+HPYn-MNWDTTc4sj8pdeyVg@mail.gmail.com
Whole thread Raw
In response to Re: Sequence vs UUID  (Benedict Holland <benedict.m.holland@gmail.com>)
Responses Re: Sequence vs UUID  (Miles Elam <miles.elam@productops.com>)
List pgsql-general
Tested the UUIDv7 generator for postgres as below. 

With regards to performance , It's still way behind the sequence. I was expecting the insert performance of UUID v7 to be closer to the sequence ,  but it doesn't seem so, as it's 500ms vs 3000ms. And the generation takes a lot longer time as compared to sequence too i.e. 59ms vs 1700ms. Read time or the index scan looks close i.e. 2.3ms vs 2.6ms.

*************** UUID7 Alternate way ***********************
create or replace function uuid_generate_v7()
returns uuid
as $$
declare
  unix_ts_ms bytea;
  uuid_bytes bytea;
begin
  unix_ts_ms = substring(int8send(floor(extract(epoch from clock_timestamp()) * 1000)::bigint) from 3);

  -- use random v4 uuid as starting point (which has the same variant we need)
  uuid_bytes = uuid_send(gen_random_uuid());

  -- overlay timestamp
  uuid_bytes = overlay(uuid_bytes placing unix_ts_ms from 1 for 6);

  -- set version 7
  uuid_bytes = set_byte(uuid_bytes, 6, (b'0111' || get_byte(uuid_bytes, 6)::bit(4))::bit(8)::int);

  return encode(uuid_bytes, 'hex')::uuid;
end
$$
language plpgsql
volatile;


*************** Postgress *****************
CREATE UNLOGGED TABLE test_bigint (  id bigint  PRIMARY KEY);
 CREATE UNLOGGED TABLE test_uuid (   id uuid  PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid7 (   id uuid  PRIMARY KEY);
create sequence myseq cache 32767;

************* sequence generation vs UUID generation Test**************
explain analyze  select count(nextval('myseq') ) from generate_series(1,100000);

QUERY PLAN
Aggregate  (cost=1500.00..1500.01 rows=1 width=8) (actual time=59.037..59.039 rows=1 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1000.00 rows=100000 width=0) (actual time=18.541..34.200 rows=100000 loops=1)
Planning Time: 0.099 ms
Execution Time: 59.687 ms

explain analyze  select count(gen_random_uuid()) from generate_series(1,100000);

QUERY PLAN
Aggregate  (cost=1500.00..1500.01 rows=1 width=8) (actual time=900.633..900.634 rows=1 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1000.00 rows=100000 width=0) (actual time=12.893..65.820 rows=100000 loops=1)
Planning Time: 0.051 ms
Execution Time: 904.868 ms

explain analyze  select count(uuid_generate_v7()) from generate_series(1,100000);

QUERY PLAN
Aggregate  (cost=26250.00..26250.01 rows=1 width=8) (actual time=1710.609..1710.610 rows=1 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1000.00 rows=100000 width=0) (actual time=21.807..69.168 rows=100000 loops=1)
Planning Time: 0.048 ms
Execution Time: 1711.187 ms

*************** Insert Test***************

explain analyze INSERT INTO test_bigint select nextval('myseq')  from generate_series(1,100000);

QUERY PLAN
Insert on test_bigint  (cost=0.00..2250.00 rows=0 width=0) (actual time=551.707..551.708 rows=0 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1250.00 rows=100000 width=8) (actual time=12.438..91.337 rows=100000 loops=1)
Planning Time: 0.053 ms
Execution Time: 552.240 ms

explain analyze  INSERT INTO test_uuid  select gen_random_uuid()  from generate_series(1,100000);

QUERY PLAN
Insert on test_uuid  (cost=0.00..2250.00 rows=0 width=0) (actual time=2040.743..2040.744 rows=0 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1250.00 rows=100000 width=16) (actual time=12.829..982.446 rows=100000 loops=1)
Planning Time: 0.051 ms
Execution Time: 2041.242 ms

explain analyze  INSERT INTO test_uuid7  select uuid_generate_v7()  from generate_series(1,100000);

QUERY PLAN
Insert on test_uuid7  (cost=0.00..27000.00 rows=0 width=0) (actual time=3234.456..3234.457 rows=0 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..26000.00 rows=100000 width=16) (actual time=12.453..1933.217 rows=100000 loops=1)
Planning Time: 0.051 ms
Execution Time: 3234.971 ms

************ Index performance********

Explain analyze select * from test_bigint where id in (select id from test_bigint limit 1000);

EXPLAIN
QUERY PLAN
Nested Loop  (cost=27.22..1322.70 rows=1000 width=8) (actual time=0.414..2.307 rows=1000 loops=1)
  ->  HashAggregate  (cost=26.92..28.92 rows=200 width=8) (actual time=0.370..0.520 rows=1000 loops=1)
        Group Key: test_bigint_1.id
        Batches: 1  Memory Usage: 145kB
        ->  Limit  (cost=0.00..14.42 rows=1000 width=8) (actual time=0.012..0.163 rows=1000 loops=1)
              ->  Seq Scan on test_bigint test_bigint_1  (cost=0.00..1444.18 rows=100118 width=8) (actual time=0.011..0.093 rows=1000 loops=1)
  ->  Index Only Scan using test_bigint_pkey on test_bigint  (cost=0.29..6.53 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1000)
        Index Cond: (id = test_bigint_1.id)
        Heap Fetches: 1000
Planning Time: 0.395 ms
Execution Time: 2.395 ms


Explain analyze  select * from test_uuid where  id in (select id from test_uuid limit 1000);

QUERY PLAN
Nested Loop  (cost=28.32..1459.58 rows=1000 width=16) (actual time=0.431..3.355 rows=1000 loops=1)
  ->  HashAggregate  (cost=27.91..29.91 rows=200 width=16) (actual time=0.399..0.556 rows=1000 loops=1)
        Group Key: test_uuid_1.id
        Batches: 1  Memory Usage: 145kB
        ->  Limit  (cost=0.00..15.41 rows=1000 width=16) (actual time=0.011..0.185 rows=1000 loops=1)
              ->  Seq Scan on test_uuid test_uuid_1  (cost=0.00..1541.85 rows=100085 width=16) (actual time=0.010..0.093 rows=1000 loops=1)
  ->  Index Only Scan using test_uuid_pkey on test_uuid  (cost=0.42..7.21 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=1000)
        Index Cond: (id = test_uuid_1.id)
        Heap Fetches: 1000
Planning Time: 0.234 ms
Execution Time: 3.419 ms

Explain analyze  select * from test_uuid7 where  id in (select id from test_uuid7 limit 1000);

QUERY PLAN
Nested Loop  (cost=28.32..1416.01 rows=1000 width=16) (actual time=0.403..2.586 rows=1000 loops=1)
  ->  HashAggregate  (cost=27.91..29.91 rows=200 width=16) (actual time=0.371..0.546 rows=1000 loops=1)
        Group Key: test_uuid7_1.id
        Batches: 1  Memory Usage: 145kB
        ->  Limit  (cost=0.00..15.41 rows=1000 width=16) (actual time=0.011..0.161 rows=1000 loops=1)
              ->  Seq Scan on test_uuid7 test_uuid7_1  (cost=0.00..1541.85 rows=100085 width=16) (actual time=0.010..0.091 rows=1000 loops=1)
  ->  Index Only Scan using test_uuid7_pkey on test_uuid7  (cost=0.42..6.99 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=1000)
        Index Cond: (id = test_uuid7_1.id)
        Heap Fetches: 1000
Planning Time: 0.101 ms
Execution Time: 2.661 ms

On Thu, 2 Feb 2023 at 20:52, Benedict Holland <benedict.m.holland@gmail.com> wrote:
Well... until two processes generate an identical UUID. That happened to me several times. It's rare but when that happens, oh boy that is a mess to figure out.

Thanks,
Ben

On Thu, Feb 2, 2023, 10:17 AM Miles Elam <miles.elam@productops.com> wrote:
On Wed, Feb 1, 2023 at 10:48 AM Kirk Wolak <wolakk@gmail.com> wrote:


On Wed, Feb 1, 2023 at 1:34 PM veem v <veema0000@gmail.com> wrote:

1) sequence generation vs UUID generation, execution time increased from ~291ms to 5655ms.
2) Insert performance of "sequence" vs "UUID"  execution time increased from ~2031ms to 10599ms.
3) Index performance for sequence vs UUID,  execution time increased from ~.3ms to .5ms.


Yes, assuming that UUIDs would be efficient as keys when they are randomly generated, versus sequences (which tend to expand in one direction, and have been relatively optimized for years).

This article explains in detail what is going on.  If I were doing this, I would strongly consider a ULID because of getting the best of both worlds.

Of course, YMMV...  And since ULIDs are not native to PG, there is overhead, but it is far more performant, IMO...

Biased comparison. ULIDs have a timestamp component. The closest UUID equivalent in Postgres is UUIDv1 from the uuid-ossp extension, not v4. Another difference not mentioned in the blog article is that UUID is versioned, meaning you can figure out what kind of data is in the UUID, whereas ULIDs are a "one size fits all" solution.

There is an implementation of sequential UUIDs for Postgres I posted earlier in this thread. In addition, here is an implementation of UUIDv7 for Postgres:

I would suggest running your tests against v1, v7, and sequential UUID before jumping on ULID, which has no native type/indexing in Postgres.

It should also be noted that apps cannot provide a bigint ID due to collisions, but an app can generate UUIDs and ULIDs without fear, essentially shifting the generation time metric in UUID/ULID's favor over a bigserial.

- Miles


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: From Clause Conditional
Next
From: "Joe Wildish"
Date:
Subject: Re: Logical Replication - "invalid ordering of speculative insertion changes"