Test report on GENERATED/IDENTITY - Mailing list pgsql-hackers
From | Zoltan Boszormenyi |
---|---|
Subject | Test report on GENERATED/IDENTITY |
Date | |
Msg-id | 45EEE6B5.2020305@dunaweb.hu Whole thread Raw |
List | pgsql-hackers |
Hi, I made some tests to prove that GENERATED can help boost performance. I created a table like this: create table t1 ( id serial, i1 integer, i2 integer, g1 integer generated always as ( case when i1 is null then i2 when i2 is null then i1 else i1 + i2 end)); I inserted 1 million records into the table: for (i = 1; i <= 1000; i++) for (j = 1; j <= 1000; j++) INSERT INTO t1 (i1, i2) VALUES (i, j); After VACUUM FULL ANALYZE, I timed SELECT id, i1, i2, g1 FROM t1 and SELECT id, i1, i2, <generation expression> FROM t1, result redirected to /dev/null. Results of ten consecutive runs are: SELECT id, i1, i2, g1 FROM t1 ------------------------------------------ 2.59user 0.18system 0:04.75elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33536minor)pagefaults 0swaps 2.57user 0.19system 0:04.59elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+34068minor)pagefaults 0swaps 2.60user 0.16system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33826minor)pagefaults 0swaps 2.59user 0.17system 0:04.82elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+34067minor)pagefaults 0swaps 2.59user 0.16system 0:04.61elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+34068minor)pagefaults 0swaps 2.61user 0.17system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33574minor)pagefaults 0swaps 2.59user 0.18system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+34067minor)pagefaults 0swaps 2.59user 0.18system 0:04.67elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32147minor)pagefaults 0swaps 2.58user 0.19system 0:04.63elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33584minor)pagefaults 0swaps 2.73user 0.16system 0:04.94elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+34066minor)pagefaults 0swaps Average is 4.68 seconds. SELECT id, i1, i2, <generation expression> FROM t1 ------------------------------------------ 2.76user 0.18system 0:05.49elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33945minor)pagefaults 0swaps 2.69user 0.17system 0:05.23elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33917minor)pagefaults 0swaps 2.60user 0.18system 0:05.04elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32027minor)pagefaults 0swaps 2.64user 0.16system 0:05.03elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32626minor)pagefaults 0swaps 2.60user 0.15system 0:05.03elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32550minor)pagefaults 0swaps 2.77user 0.18system 0:05.21elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32609minor)pagefaults 0swaps 2.71user 0.17system 0:05.07elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33043minor)pagefaults 0swaps 2.66user 0.17system 0:05.12elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33042minor)pagefaults 0swaps 2.80user 0.16system 0:05.19elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33043minor)pagefaults 0swaps 2.71user 0.16system 0:05.14elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32052minor)pagefaults 0swaps Average is 5.16 seconds. Using a single GENERATED column gains a good 9.4% in runtime. With the IDENTITY, the picture is not so bright. I expected some cost but not that much. Why is [simple_]heap_update() so expensive? I created a table and times inserting 1 million rows into it: create table t2 (id serial, i1 integer); or create table t2 (id serial generated always as identity, i1 integer); Using a serial column gave me about 12 seconds on the average of 5 runs. With an IDENTITY column, I got 61 seconds once and 66 seconds twice. So, the strictness of the identity column gave me 500-550% performance penalty. With a single unique index on i1, I got 24.4 seconds with the serial column and 67 seconds for the identity column. I run these only once so this last one isn't representative. I tried to use heap_inplace_update() to update the newly updated or inserted tuple in place but it gave me ERROR: heap_inplace_update: wrong tuple length even when I already filled the IDENTITY column with a constant Datum with an Int64 value 0 converted to the type of the column. If I read it correctly, the HOT patch would give me a speedup for this case? Best regards, Zoltán Böszörményi
pgsql-hackers by date: