Re: Why does ExecComputeStoredGenerated() form a heap tuple - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Why does ExecComputeStoredGenerated() form a heap tuple
Date
Msg-id 20190520025018.nblm2ex3a62asanc@alap3.anarazel.de
Whole thread Raw
In response to Re: Why does ExecComputeStoredGenerated() form a heap tuple  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 2019-05-20 14:23:34 +1200, David Rowley wrote:
> It's not for this patch, or probably for PG12, but it would be good if
> we could avoid the formation of the Tuple until right before the new
> tuple is inserted.
>
> I see heap_form_tuple() is called 3 times for a single INSERT with:
>
> create table t (a text, b text, c text generated always as (b || b) stored);
>
> create or replace function t_trigger() returns trigger as $$
> begin
> NEW.b = UPPER(NEW.a);
> RETURN NEW;
> end;
> $$ language plpgsql;
>
> create trigger t_on_insert before insert on t for each row execute
> function t_trigger();
>
> insert into t (a) values('one');
>
> and heap_deform_tuple() is called once for each additional
> heap_form_tuple().  That's pretty wasteful :-(
>
> Maybe Andres can explain if this is really required, or if it's just
> something that's not well optimised yet.

I think we can optimize this further, but it's not unexpected.

I see:

1) ExecCopySlot() call in in ExecModifyTable(). For INSERT SELECT the
   input will be in a virtual slot. We might be able to have some
   trickery to avoid this one in some case. Not sure how much it'd help
   - I think we most of the time would just move the forming of the
   tuple around - ExecInsert() wants to materialize the slot.

2) plpgsql form/deform due to updating a field. I don't see how we could
   easily fix that. We'd have to invent a mechanism that allows plpgsql to pass
   slots around. I guess it's possible you could make that work somehow?
   But I think we'd also need to change the external trigger interface -
   which currently specifies that the return type is a HeapTuple

3) ExecComputeStoredGenerated(). I suspect it's not particularly useful
   to get rid of the heap_form_tuple (from with ExecMaterialize())
   here. When actually inserting we'll have to actually form the tuple
   anyway.  But what I do wonder is whether it would make sense to move
   the materialization outside of that function. If there's constraints,
   or partitioning, we'll have to deform (parts of) the tuple, to access
   the necessary columns.

Currently materializing an unmaterialized slot (i.e. making it
independent from anything but memory referenced by the slot) also means
that later accesses will need to deform again. I'm fairly sure we can
improve that for many cases (IIRC we don't need to that for virtual
slots, but that's irrelevant here).

I'm pretty sure we get rid of most of this, but it'll be some work. I'm
also not sure how important it is - for INSERT/UPDATE, in how many cases
is the bottleneck those copies, rather than other parts of query
execution? I suspect you can measure it for some INSERT ... SELECT type
cases - but probably the overhead of triggers and GENERATED is going to
be higher.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: VACUUM fails to parse 0 and 1 as boolean value
Next
From: Thomas Munro
Date:
Subject: Parallel Append subplan order instability on aye-aye