Re: Sequence vs UUID - Mailing list pgsql-general

From Kirk Wolak
Subject Re: Sequence vs UUID
Msg-id CACLU5mSVke5W-gY4cvkL_XhgZx26fKkYbyZ_KLaNjxmK=gfEog@mail.gmail.com
In response to Re: Sequence vs UUID  (veem v <veema0000@gmail.com>)
Responses Re: Sequence vs UUID  (Miles Elam <miles.elam@productops.com>)
List pgsql-general


On Wed, Feb 1, 2023 at 1:34 PM veem v <veema0000@gmail.com> wrote:
I ran a quick test on dbfiddle with UUID as the data type, and in each of the cases below UUID performance is drastically worse than sequence performance. Let me know if anything is wrong with my testing here.

1) Sequence generation vs UUID generation: execution time increased from ~291 ms to ~5655 ms.
2) Insert performance of sequence vs UUID: execution time increased from ~2031 ms to ~10599 ms.
3) Index lookup performance of sequence vs UUID: execution time increased from ~0.3 ms to ~0.5 ms.


Yes. The mistake is assuming that randomly generated UUIDs are efficient as keys, versus sequences (which grow in one direction, and have been well optimized for years).
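The effect of random vs. sequential key order can be sketched outside the database. This is a toy model of b-tree insert locality, not PostgreSQL internals, and the `insert_positions` helper is purely illustrative:

```python
import bisect
import uuid

def insert_positions(keys):
    """Record where each new key lands in a sorted list --
    a rough proxy for which b-tree leaf page an insert touches."""
    sorted_keys, positions = [], []
    for k in keys:
        positions.append(bisect.bisect_left(sorted_keys, k))
        bisect.insort(sorted_keys, k)
    return positions

# Sequence-like keys: every insert appends at the rightmost position.
seq_pos = insert_positions(range(1000))
print(all(p == i for i, p in enumerate(seq_pos)))  # True

# Random UUIDs: inserts scatter across the whole keyspace.
uuid_pos = insert_positions(str(uuid.uuid4()) for _ in range(1000))
print(len(set(uuid_pos)))  # hundreds of distinct positions
```

Sequential inserts keep hitting the same rightmost page (cache-friendly, few page splits), while random inserts touch pages all over the index, which is a large part of why the UUID insert test runs so much slower.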

This article explains in detail what is going on.  If I were doing this, I would strongly consider a ULID, because it gets the best of both worlds: the global uniqueness of random bits with a time-ordered prefix.

Of course, YMMV...  And since ULIDs are not native to PG there is some overhead, but they are far more performant as keys, IMO...
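For context, a ULID is a 128-bit value: a 48-bit millisecond timestamp followed by 80 random bits, rendered as 26 characters of Crockford base32 so that lexicographic order tracks creation time. A minimal Python sketch of the format (illustrative only, not a production or PG-native implementation):

```python
import os
import time

# Crockford base32 alphabet used by the ULID spec (no I, L, O, U).
B32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid() -> str:
    ts = int(time.time() * 1000)                       # 48-bit ms timestamp
    value = (ts << 80) | int.from_bytes(os.urandom(10), "big")
    out = []
    for _ in range(26):                                # 26 * 5 = 130 bits
        out.append(B32[value & 0x1F])
        value >>= 5
    return "".join(reversed(out))

a = ulid()
time.sleep(0.002)
b = ulid()
print(len(a), a < b)  # 26 True -- later ULIDs sort after earlier ones
```

Because the timestamp occupies the high bits, new ULIDs always land at the right edge of the index like a sequence, while the 80 random bits keep them globally unique like a UUID.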

Also, I hold out hope that one of the Gods of PostgreSQL on this list might give us a native ULID implementation, fixing that last problem...

HTH

 

 PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit

CREATE UNLOGGED TABLE test_bigint (  id bigint  PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid (   id uuid  PRIMARY KEY);
create sequence myseq cache 32767;

************ sequence generation vs UUID generation Test**************
explain analyze  select count(nextval('myseq') ) from generate_series(1,100000);

QUERY PLAN
Aggregate  (cost=1500.00..1500.01 rows=1 width=8) (actual time=291.030..291.030 rows=1 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1000.00 rows=100000 width=0) (actual time=53.332..63.941 rows=100000 loops=1)
Planning Time: 0.155 ms
Execution Time: 291.719 ms

explain analyze  select count(gen_random_uuid()) from generate_series(1,100000);

QUERY PLAN
Aggregate  (cost=1500.00..1500.01 rows=1 width=8) (actual time=5654.453..5654.454 rows=1 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1000.00 rows=100000 width=0) (actual time=84.328..514.214 rows=100000 loops=1)
Planning Time: 0.082 ms
Execution Time: 5655.158 ms

*************** Insert Test***************

explain analyze INSERT INTO test_bigint select nextval('myseq')  from generate_series(1,100000);

QUERY PLAN
Insert on test_bigint  (cost=0.00..2250.00 rows=0 width=0) (actual time=2030.960..2030.961 rows=0 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1250.00 rows=100000 width=8) (actual time=48.102..636.311 rows=100000 loops=1)
Planning Time: 0.065 ms
Execution Time: 2031.469 ms

explain analyze  INSERT INTO test_uuid  select gen_random_uuid()  from generate_series(1,100000);


QUERY PLAN
Insert on test_uuid  (cost=0.00..2250.00 rows=0 width=0) (actual time=10599.230..10599.230 rows=0 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1250.00 rows=100000 width=16) (actual time=36.975..6289.811 rows=100000 loops=1)
Planning Time: 0.056 ms
Execution Time: 10599.805 ms

************ Index performance********

Explain analyze select * from test_bigint where id in (select id from test_bigint limit 100);

QUERY PLAN
Nested Loop  (cost=2.98..734.71 rows=100 width=8) (actual time=0.083..0.269 rows=100 loops=1)
  ->  HashAggregate  (cost=2.69..3.69 rows=100 width=8) (actual time=0.046..0.062 rows=100 loops=1)
        Group Key: test_bigint_1.id
        Batches: 1  Memory Usage: 24kB
        ->  Limit  (cost=0.00..1.44 rows=100 width=8) (actual time=0.011..0.025 rows=100 loops=1)
              ->  Seq Scan on test_bigint test_bigint_1  (cost=0.00..1444.18 rows=100118 width=8) (actual time=0.011..0.017 rows=100 loops=1)
  ->  Index Only Scan using test_bigint_pkey on test_bigint  (cost=0.29..7.31 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=100)
        Index Cond: (id = test_bigint_1.id)
        Heap Fetches: 100
Planning Time: 0.279 ms
Execution Time: 0.302 ms

Explain analyze  select * from test_uuid where  id in (select id from test_uuid limit 100);

QUERY PLAN
Nested Loop  (cost=3.21..783.31 rows=100 width=16) (actual time=0.080..0.474 rows=100 loops=1)
  ->  HashAggregate  (cost=2.79..3.79 rows=100 width=16) (actual time=0.046..0.066 rows=100 loops=1)
        Group Key: test_uuid_1.id
        Batches: 1  Memory Usage: 24kB
        ->  Limit  (cost=0.00..1.54 rows=100 width=16) (actual time=0.010..0.025 rows=100 loops=1)
              ->  Seq Scan on test_uuid test_uuid_1  (cost=0.00..1541.85 rows=100085 width=16) (actual time=0.009..0.016 rows=100 loops=1)
  ->  Index Only Scan using test_uuid_pkey on test_uuid  (cost=0.42..7.79 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=100)
        Index Cond: (id = test_uuid_1.id)
        Heap Fetches: 100
Planning Time: 0.180 ms
Execution Time: 0.510 ms

On Tue, 31 Jan 2023 at 03:28, Ron <ronljohnsonjr@gmail.com> wrote:

And populate that column with UUIDs generated by the gen_random_uuid() function.

(Requires v13.)

On 1/30/23 13:46, Adrian Klaver wrote:
> On 1/30/23 11:43, veem v wrote:
>> Thank You So much for the details. I am a bit new to postgres. And these
>> test results I picked were from a dev system. If I understand it
>> correctly, do you mean these settings(usage of C locale or "native"
>> 16-byte uuid) which you mentioned should be there in a production system
>>   and thus we should test the performance of the UUID vs sequence on a
>> similar setup? Or say if this sort of degradation of UUID performance is
>> not expected then , how to get these settings tweaked ON, so as to see
>> the best string type or UUID performance, can you please guide me here?
>
> No, what is being said is to change:
>
> source_id varchar(36)
>
> to
>
> source_id uuid
>
> as in:
>
> https://www.postgresql.org/docs/current/datatype-uuid.html
>
>>
>> On Mon, 30 Jan 2023 at 22:18, Tom Lane <tgl@sss.pgh.pa.us
>> <mailto:tgl@sss.pgh.pa.us>> wrote:
>>
>>     Dominique Devienne <ddevienne@gmail.com
>>     <mailto:ddevienne@gmail.com>> writes:
>>      > On Mon, Jan 30, 2023 at 5:11 PM veem v <veema0000@gmail.com
>>     <mailto:veema0000@gmail.com>> wrote:
>>      >> CREATE TABLE test1_UUID ( id bigint,source_id varchar(36)
>>     PRIMARY KEY, Name varchar(20) );
>>
>>      > Maybe if you used a "native" 16-byte uuid, instead of its textual
>>      > serialization with dashes (36 bytes + length overhead), the gap would
>>      > narrow.
>>
>>     Yeah, especially if your database is not using C locale. The
>>     strcoll or ICU-based comparisons done on string types can be
>>     enormously more expensive than the memcmp() used for binary
>>     types like native uuid.
>>
>>                              regards, tom lane
>>
>

--
Born in Arizona, moved to Babylonia.

