Thread: Why does ExecComputeStoredGenerated() form a heap tuple
Hi, while rebasing the remaining tableam patches (luckily a pretty small set now!), I had a few conflicts with ExecComputeStoredGenerated(). While resolving I noticed: oldtuple = ExecFetchSlotHeapTuple(slot, true, &should_free); newtuple = heap_modify_tuple(oldtuple, tupdesc, values, nulls, replaces); ExecForceStoreHeapTuple(newtuple, slot); if (should_free) heap_freetuple(oldtuple); MemoryContextSwitchTo(oldContext); First off, I'm not convinced this is correct: ISTM you'd need at least an ExecMaterializeSlot() before the MemoryContextSwitchTo() in ExecComputeStoredGenerated(). But what actually brought me to reply was that it seems like it'll cause unnecessary slowdowns for !heap AMs. First, it'll form a heaptuple if the slot isn't in that form, and then it'll cause a conversion by storing a heap tuple even if the target doesn't use heap representation. ISTM the above would be much more efficiently - even more efficient if only heap is used - implemented as something roughly akin to: slot_getallattrs(slot); memcpy(values, slot->tts_values, ...); memcpy(nulls, slot->tts_isnull, ...); for (int i = 0; i < natts; i++) { if (TupleDescAttr(tupdesc, i)->attgenerated == ATTRIBUTE_GENERATED_STORED) { values[i] = ... } else values[i] = datumCopy(...); } ExecClearTuple(slot); memcpy(slot->tts_values, values, ...); memcpy(slot->tts_isnull, nulls, ...); ExecStoreVirtualTuple(slot); ExecMaterializeSlot(slot); that's not perfect, but more efficient than your version... Greetings, Andres Freund
Hi, On 2019-03-30 19:57:44 -0700, Andres Freund wrote: > while rebasing the remaining tableam patches (luckily a pretty small set > now!), I had a few conflicts with ExecComputeStoredGenerated(). While > resolving I noticed: > > oldtuple = ExecFetchSlotHeapTuple(slot, true, &should_free); > newtuple = heap_modify_tuple(oldtuple, tupdesc, values, nulls, replaces); > ExecForceStoreHeapTuple(newtuple, slot); > if (should_free) > heap_freetuple(oldtuple); > > MemoryContextSwitchTo(oldContext); > > First off, I'm not convinced this is correct: > > ISTM you'd need at least an ExecMaterializeSlot() before the > MemoryContextSwitchTo() in ExecComputeStoredGenerated(). > > But what actually brought me to reply was that it seems like it'll cause > unnecessary slowdowns for !heap AMs. First, it'll form a heaptuple if > the slot isn't in that form, and then it'll cause a conversion by > storing a heap tuple even if the target doesn't use heap representation. > > ISTM the above would be much more efficiently - even more efficient if > only heap is used - implemented as something roughly akin to: > > slot_getallattrs(slot); > memcpy(values, slot->tts_values, ...); > memcpy(nulls, slot->tts_isnull, ...); > > for (int i = 0; i < natts; i++) > { > if (TupleDescAttr(tupdesc, i)->attgenerated == ATTRIBUTE_GENERATED_STORED) > { > values[i] = ... > } > else > values[i] = datumCopy(...); > } > > ExecClearTuple(slot); > memcpy(slot->tts_values, values, ...); > memcpy(slot->tts_isnull, nulls, ...); > ExecStoreVirtualTuple(slot); > ExecMaterializeSlot(slot); > > that's not perfect, but more efficient than your version... Also, have you actually benchmarked this code? ISTM that adding a stored generated column would cause quite noticable slowdowns in the COPY path based on this code. Greetings, Andres Freund
On 2019-03-31 04:57, Andres Freund wrote: > while rebasing the remaining tableam patches (luckily a pretty small set > now!), I had a few conflicts with ExecComputeStoredGenerated(). While > resolving I noticed: The core of that code was written a long time ago and perhaps hasn't caught up with all the refactoring going on. I'll look through your proposal and update the code. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-03-31 05:00, Andres Freund wrote: > Also, have you actually benchmarked this code? ISTM that adding a > stored generated column would cause quite noticable slowdowns in the > COPY path based on this code. Yes, it'll be slower than not having it, but it's much faster than the equivalent trigger. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2019-04-01 11:25:46 +0200, Peter Eisentraut wrote: > On 2019-03-31 05:00, Andres Freund wrote: > > Also, have you actually benchmarked this code? ISTM that adding a > > stored generated column would cause quite noticable slowdowns in the > > COPY path based on this code. > > Yes, it'll be slower than not having it, but it's much faster than the > equivalent trigger. It at the moment is quite noticably slower than directly inserting the generated column. postgres[11993][1]=# CREATE TABLE foo_without_generated(id int, copy_of_int int); CREATE TABLE Time: 0.625 ms postgres[11993][1]=# CREATE TABLE foo_with_generated(id int, copy_of_int int generated always as (id) stored); CREATE TABLE Time: 0.771 ms postgres[11993][1]=# INSERT INTO foo_without_generated SELECT g.i, g.i FROM generate_series(1, 1000000) g(i); INSERT 0 1000000 Time: 691.533 ms postgres[11993][1]=# INSERT INTO foo_with_generated SELECT g.i FROM generate_series(1, 1000000) g(i); INSERT 0 1000000 Time: 825.471 ms postgres[11993][1]=# COPY foo_without_generated TO '/tmp/foo_without_generated'; COPY 1000000 Time: 194.051 ms postgres[11993][1]=# COPY foo_with_generated TO '/tmp/foo_with_generated'; COPY 1000000 Time: 153.146 ms postgres[11993][1]=# ;TRUNCATE foo_without_generated ;COPY foo_without_generated FROM '/tmp/foo_without_generated'; Time: 0.178 ms TRUNCATE TABLE Time: 8.456 ms COPY 1000000 Time: 394.990 ms postgres[11993][1]=# ;TRUNCATE foo_with_generated ;COPY foo_with_generated FROM '/tmp/foo_with_generated'; Time: 0.147 ms TRUNCATE TABLE Time: 8.043 ms COPY 1000000 Time: 508.918 ms From a quick profile that's indeed largely because ExecComputeStoredGenerated() is really inefficient - and it seems largely unnecessarily so. I think this should at least be roughly as efficient as getting the additional data from the client. Minor other point: I'm not a fan of defining more general infrastructure like ExecComputedStoredGenerated() in nodeModifyTable.c - it's already large and confusing, and it's not obvious that e.g. COPY would call into it. Greetings, Andres Freund
On 2019-04-01 11:23, Peter Eisentraut wrote: > On 2019-03-31 04:57, Andres Freund wrote: >> while rebasing the remaining tableam patches (luckily a pretty small set >> now!), I had a few conflicts with ExecComputeStoredGenerated(). While >> resolving I noticed: > > The core of that code was written a long time ago and perhaps hasn't > caught up with all the refactoring going on. I'll look through your > proposal and update the code. The attached patch is based on your sketch. It's clearly better in the long term not to rely on heap tuples here. But in testing this change seems to make it slightly slower, certainly not a speedup as you were apparently hoping for. Test setup: create table t0 (a int, b int); insert into t0 select generate_series (1, 10000000); -- 10 million \copy t0 (a) to 'test.dat'; -- for comparison, without generated column truncate t0; \copy t0 (a) from 'test.dat'; -- master create table t1 (a int, b int generated always as (a * 2) stored); truncate t1; \copy t1 (a) from 'test.dat'; -- patched truncate t1; \copy t1 (a) from 'test.dat'; -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, 23 Apr 2019 at 20:23, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > The attached patch is based on your sketch. It's clearly better in the > long term not to rely on heap tuples here. But in testing this change > seems to make it slightly slower, certainly not a speedup as you were > apparently hoping for. > > > Test setup: > > create table t0 (a int, b int); > insert into t0 select generate_series (1, 10000000); -- 10 million > \copy t0 (a) to 'test.dat'; > > -- for comparison, without generated column > truncate t0; > \copy t0 (a) from 'test.dat'; > > -- master > create table t1 (a int, b int generated always as (a * 2) stored); > truncate t1; > \copy t1 (a) from 'test.dat'; > > -- patched > truncate t1; > \copy t1 (a) from 'test.dat'; I didn't do the exact same test, but if I use COPY instead of \copy, then for me patched is faster. Normal table: postgres=# copy t0 (a) from '/home/drowley/test.dat'; COPY 10000000 Time: 5437.768 ms (00:05.438) postgres=# truncate t0; TRUNCATE TABLE Time: 20.775 ms postgres=# copy t0 (a) from '/home/drowley/test.dat'; COPY 10000000 Time: 5272.228 ms (00:05.272) Master: postgres=# copy t1 (a) from '/home/drowley/test.dat'; COPY 10000000 Time: 6570.031 ms (00:06.570) postgres=# truncate t1; TRUNCATE TABLE Time: 17.813 ms postgres=# copy t1 (a) from '/home/drowley/test.dat'; COPY 10000000 Time: 6486.253 ms (00:06.486) Patched: postgres=# copy t1 (a) from '/home/drowley/test.dat'; COPY 10000000 Time: 5359.338 ms (00:05.359) postgres=# truncate table t1; TRUNCATE TABLE Time: 25.551 ms postgres=# copy t1 (a) from '/home/drowley/test.dat'; COPY 10000000 Time: 5347.596 ms (00:05.348) For the patch, I wonder if you need this line: + memcpy(values, slot->tts_values, sizeof(*values) * natts); If you got rid of that and changed the datumCopy to use slot->tts_values[i] instead. Maybe it's also worth getting rid of the first memcpy for the null array and just assign the element in the else clause. It might also be cleaner to assign TupleDescAttr(tupdesc, i) to a variable instead of using the macro 3 times. It'd make that datumCopy line shorter too. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2019-04-24 00:26, David Rowley wrote: > I didn't do the exact same test, but if I use COPY instead of \copy, > then for me patched is faster. OK, confirmed that way, too. > For the patch, I wonder if you need this line: > > + memcpy(values, slot->tts_values, sizeof(*values) * natts); > > If you got rid of that and changed the datumCopy to use > slot->tts_values[i] instead. done > Maybe it's also worth getting rid of the first memcpy for the null > array and just assign the element in the else clause. Tried that, seems to be slower. So I left it as is. > It might also be cleaner to assign TupleDescAttr(tupdesc, i) to a > variable instead of using the macro 3 times. It'd make that datumCopy > line shorter too. Also done. Updated patch attached. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Thu, 16 May 2019 at 05:44, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > Updated patch attached. This patch looks okay to me. It's not for this patch, or probably for PG12, but it would be good if we could avoid the formation of the Tuple until right before the new tuple is inserted. I see heap_form_tuple() is called 3 times for a single INSERT with: create table t (a text, b text, c text generated always as (b || b) stored); create or replace function t_trigger() returns trigger as $$ begin NEW.b = UPPER(NEW.a); RETURN NEW; end; $$ language plpgsql; create trigger t_on_insert before insert on t for each row execute function t_trigger(); insert into t (a) values('one'); and heap_deform_tuple() is called once for each additional heap_form_tuple(). That's pretty wasteful :-( Maybe Andres can explain if this is really required, or if it's just something that's not well optimised yet. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi, On 2019-05-20 14:23:34 +1200, David Rowley wrote: > It's not for this patch, or probably for PG12, but it would be good if > we could avoid the formation of the Tuple until right before the new > tuple is inserted. > > I see heap_form_tuple() is called 3 times for a single INSERT with: > > create table t (a text, b text, c text generated always as (b || b) stored); > > create or replace function t_trigger() returns trigger as $$ > begin > NEW.b = UPPER(NEW.a); > RETURN NEW; > end; > $$ language plpgsql; > > create trigger t_on_insert before insert on t for each row execute > function t_trigger(); > > insert into t (a) values('one'); > > and heap_deform_tuple() is called once for each additional > heap_form_tuple(). That's pretty wasteful :-( > > Maybe Andres can explain if this is really required, or if it's just > something that's not well optimised yet. I think we can optimize this further, but it's not unexpected. I see: 1) ExecCopySlot() call in in ExecModifyTable(). For INSERT SELECT the input will be in a virtual slot. We might be able to have some trickery to avoid this one in some case. Not sure how much it'd help - I think we most of the time would just move the forming of the tuple around - ExecInsert() wants to materialize the slot. 2) plpgsql form/deform due to updating a field. I don't see how we could easily fix that. We'd have to invent a mechanism that allows plpgsql to pass slots around. I guess it's possible you could make that work somehow? But I think we'd also need to change the external trigger interface - which currently specifies that the return type is a HeapTuple 3) ExecComputeStoredGenerated(). I suspect it's not particularly useful to get rid of the heap_form_tuple (from with ExecMaterialize()) here. When actually inserting we'll have to actually form the tuple anyway. But what I do wonder is whether it would make sense to move the materialization outside of that function. If there's constraints, or partitioning, we'll have to deform (parts of) the tuple, to access the necessary columns. Currently materializing an unmaterialized slot (i.e. making it independent from anything but memory referenced by the slot) also means that later accesses will need to deform again. I'm fairly sure we can improve that for many cases (IIRC we don't need to that for virtual slots, but that's irrelevant here). I'm pretty sure we get rid of most of this, but it'll be some work. I'm also not sure how important it is - for INSERT/UPDATE, in how many cases is the bottleneck those copies, rather than other parts of query execution? I suspect you can measure it for some INSERT ... SELECT type cases - but probably the overhead of triggers and GENERATED is going to be higher. Greetings, Andres Freund
On 2019-05-20 04:23, David Rowley wrote: > On Thu, 16 May 2019 at 05:44, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> Updated patch attached. > > This patch looks okay to me. committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services