Hi,
I made some tests to prove that GENERATED can help
boost performance. I created a table like this:
create table t1 ( id serial, i1 integer, i2 integer, g1 integer generated always as ( case when i1 is null then
i2 when i2 is null then i1 else i1 + i2 end));
I inserted 1 million records into the table:
for (i = 1; i <= 1000; i++) for (j = 1; j <= 1000; j++) INSERT INTO t1 (i1, i2) VALUES (i, j);
After VACUUM FULL ANALYZE,
I timed SELECT id, i1, i2, g1 FROM t1
and SELECT id, i1, i2, <generation expression> FROM t1,
result redirected to /dev/null. Results of ten consecutive runs are:
SELECT id, i1, i2, g1 FROM t1
------------------------------------------
2.59user 0.18system 0:04.75elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33536minor)pagefaults 0swaps
2.57user 0.19system 0:04.59elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34068minor)pagefaults 0swaps
2.60user 0.16system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33826minor)pagefaults 0swaps
2.59user 0.17system 0:04.82elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34067minor)pagefaults 0swaps
2.59user 0.16system 0:04.61elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34068minor)pagefaults 0swaps
2.61user 0.17system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33574minor)pagefaults 0swaps
2.59user 0.18system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34067minor)pagefaults 0swaps
2.59user 0.18system 0:04.67elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32147minor)pagefaults 0swaps
2.58user 0.19system 0:04.63elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33584minor)pagefaults 0swaps
2.73user 0.16system 0:04.94elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34066minor)pagefaults 0swaps
Average is 4.68 seconds.
SELECT id, i1, i2, <generation expression> FROM t1
------------------------------------------
2.76user 0.18system 0:05.49elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33945minor)pagefaults 0swaps
2.69user 0.17system 0:05.23elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33917minor)pagefaults 0swaps
2.60user 0.18system 0:05.04elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32027minor)pagefaults 0swaps
2.64user 0.16system 0:05.03elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32626minor)pagefaults 0swaps
2.60user 0.15system 0:05.03elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32550minor)pagefaults 0swaps
2.77user 0.18system 0:05.21elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32609minor)pagefaults 0swaps
2.71user 0.17system 0:05.07elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33043minor)pagefaults 0swaps
2.66user 0.17system 0:05.12elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33042minor)pagefaults 0swaps
2.80user 0.16system 0:05.19elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33043minor)pagefaults 0swaps
2.71user 0.16system 0:05.14elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32052minor)pagefaults 0swaps
Average is 5.16 seconds.
Using a single GENERATED column gains a good 9.4% in runtime.
With the IDENTITY, the picture is not so bright.
I expected some cost but not that much. Why is
[simple_]heap_update() so expensive? I created a table and
times inserting 1 million rows into it:
create table t2 (id serial, i1 integer);
or
create table t2 (id serial generated always as identity, i1 integer);
Using a serial column gave me about 12 seconds
on the average of 5 runs. With an IDENTITY column,
I got 61 seconds once and 66 seconds twice.
So, the strictness of the identity column gave me 500-550%
performance penalty.
With a single unique index on i1, I got 24.4 seconds
with the serial column and 67 seconds for the identity
column. I run these only once so this last one isn't
representative.
I tried to use heap_inplace_update() to update the
newly updated or inserted tuple in place but it gave me
ERROR: heap_inplace_update: wrong tuple length
even when I already filled the IDENTITY column with
a constant Datum with an Int64 value 0 converted to
the type of the column.
If I read it correctly, the HOT patch would give me
a speedup for this case?
Best regards,
Zoltán Böszörményi