Test report on GENERATED/IDENTITY - Mailing list pgsql-hackers

From Zoltan Boszormenyi
Subject Test report on GENERATED/IDENTITY
Date
Msg-id 45EEE6B5.2020305@dunaweb.hu
Whole thread Raw
List pgsql-hackers
Hi,

I made some tests to prove that GENERATED can help
boost performance. I created a table like this:

create table t1 ( id serial, i1 integer, i2 integer, g1 integer generated always as (         case when i1 is null then
i2               when i2 is null then i1                else i1 + i2 end));
 

I inserted 1 million records into the table:
 for (i = 1; i <= 1000; i++)   for (j = 1; j <= 1000; j++)      INSERT INTO t1 (i1, i2) VALUES (i, j);

After VACUUM FULL ANALYZE,
I timed  SELECT id, i1, i2, g1 FROM t1
and SELECT id, i1, i2, <generation expression> FROM t1,
result redirected to /dev/null. Results of ten consecutive runs are:

SELECT id, i1, i2, g1 FROM t1
------------------------------------------

2.59user 0.18system 0:04.75elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33536minor)pagefaults 0swaps
2.57user 0.19system 0:04.59elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34068minor)pagefaults 0swaps
2.60user 0.16system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33826minor)pagefaults 0swaps
2.59user 0.17system 0:04.82elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34067minor)pagefaults 0swaps
2.59user 0.16system 0:04.61elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34068minor)pagefaults 0swaps
2.61user 0.17system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33574minor)pagefaults 0swaps
2.59user 0.18system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34067minor)pagefaults 0swaps
2.59user 0.18system 0:04.67elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32147minor)pagefaults 0swaps
2.58user 0.19system 0:04.63elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33584minor)pagefaults 0swaps
2.73user 0.16system 0:04.94elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34066minor)pagefaults 0swaps

Average is 4.68 seconds.

SELECT id, i1, i2, <generation expression> FROM t1
------------------------------------------

2.76user 0.18system 0:05.49elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33945minor)pagefaults 0swaps
2.69user 0.17system 0:05.23elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33917minor)pagefaults 0swaps
2.60user 0.18system 0:05.04elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32027minor)pagefaults 0swaps
2.64user 0.16system 0:05.03elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32626minor)pagefaults 0swaps
2.60user 0.15system 0:05.03elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32550minor)pagefaults 0swaps
2.77user 0.18system 0:05.21elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32609minor)pagefaults 0swaps
2.71user 0.17system 0:05.07elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33043minor)pagefaults 0swaps
2.66user 0.17system 0:05.12elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33042minor)pagefaults 0swaps
2.80user 0.16system 0:05.19elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33043minor)pagefaults 0swaps
2.71user 0.16system 0:05.14elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32052minor)pagefaults 0swaps

Average is 5.16 seconds.

Using a single GENERATED column gains a good 9.4% in runtime.

With the IDENTITY, the picture is not so bright.
I expected some cost but not that much. Why is
[simple_]heap_update() so expensive? I created a table and
times inserting 1 million rows into it:

create table t2 (id serial, i1 integer);
or
create table t2 (id serial generated always as identity, i1 integer);

Using a serial column gave me about 12 seconds
on the average of 5 runs. With an IDENTITY column,
I got 61 seconds once and 66 seconds twice.
So, the strictness of the identity column gave me 500-550%
performance penalty.

With a single unique index on i1, I got 24.4 seconds
with the serial column and 67 seconds for the identity
column. I run these only once so this last one isn't
representative.

I tried to use heap_inplace_update() to update the
newly updated or inserted tuple in place but it gave me

ERROR:  heap_inplace_update: wrong tuple length

even when I already filled the IDENTITY column with
a constant Datum with an Int64 value 0 converted to
the type of the column.

If I read it correctly, the HOT patch would give me
a speedup for this case?

Best regards,
Zoltán Böszörményi



pgsql-hackers by date:

Previous
From: Teodor Sigaev
Date:
Subject: Re: GIST and TOAST
Next
From: "Zeugswetter Andreas ADI SD"
Date:
Subject: Re: Auto creation of Partitions