Thread: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

On Mon, 2024-08-26 at 11:09 +0530, Bharath Rupireddy wrote:
> On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Please find the v22 patches with the above changes.
>
> Please find the v23 patches after rebasing 0005 and adapting 0004 for
> 9758174e2e.


Thank you.

0001 API design:

* Remove TableModifyState.modify_end_callback.

* This patch means that we will either remove or deprecate
TableAmRoutine.multi_insert and finish_bulk_insert. Are there any
strong opinions about maintaining support for multi-insert, or should
we just remove it outright and force any new AMs to implement the new
APIs to maintain COPY performance?

* Why do we need a separate "modify_flags" and "options"? Can't we just
combine them into TABLE_MODIFY_* flags?


Alexander, you had some work in this area as well, such b1484a3f19. I
believe 0001 covers this use case in a different way: rather than
giving complete responsibility to the AM to insert into the indexes,
the caller provides a callback and the AM is responsible for calling it
at the time the tuples are flushed. Is that right?

The design has been out for a while, so unless others have suggestions,
I'm considering the major design points mostly settled and I will move
forward with something like 0001 (pending implementation issues).

Note: I believe this API will extend naturally to updates and deletes,
as well.


0001 implementation issues:

* We need default implementations for AMs that don't implement the new
APIs, so that the AM will still function even if it only defines the
single-tuple APIs. If we need to make use of the AM's multi_insert
method (I'm not sure we do), then the default methods would need to
handle that as well. (I thought a previous version had these default
implementations -- is there a reason they were removed?)

* I am confused about how the heap implementation manages state and
resets it. mistate->mem_cxt is initialized to a new memory context in
heap_modify_begin, and then re-initialized to another new memory
context in heap_modify_buffer_insert. Then the mistate->mem_cxt is also
used as a temp context for executing heap_multi_insert, and it gets
reset before calling the flush callback, which still needs the slots.

* Why materialize the slot at copyfrom.c:1308 if the slot is going to
be copied anyway (which also materializes it; see
tts_virtual_copyslot()) at heapam.c:2710?

* After correcting the memory issues, can you get updated performance
numbers for COPY?


Regards,
    Jeff Davis




On Mon, 26 Aug 2024 at 23:18, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Mon, 2024-08-26 at 11:09 +0530, Bharath Rupireddy wrote:
> > On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > Please find the v22 patches with the above changes.
> >
> > Please find the v23 patches after rebasing 0005 and adapting 0004 for
> > 9758174e2e.
>
>
> Thank you.
>
> 0001 API design:
>
> * Remove TableModifyState.modify_end_callback.
>
> * This patch means that we will either remove or deprecate
> TableAmRoutine.multi_insert and finish_bulk_insert. Are there any
> strong opinions about maintaining support for multi-insert, or should
> we just remove it outright and force any new AMs to implement the new
> APIs to maintain COPY performance?

I don't think there is a significant enough difference in the
capabilities and requirements between the two APIs as currently
designed that removal of the old API would mean a significant
difference in capabilities. Maybe we could supply an equivalent API
shim to help the transition, but I don't think we should keep the old
API around in the TableAM.

> * Why do we need a separate "modify_flags" and "options"? Can't we just
> combine them into TABLE_MODIFY_* flags?
>
>
> Alexander, you had some work in this area as well, such b1484a3f19. I
> believe 0001 covers this use case in a different way: rather than
> giving complete responsibility to the AM to insert into the indexes,
> the caller provides a callback and the AM is responsible for calling it
> at the time the tuples are flushed. Is that right?
>
> The design has been out for a while, so unless others have suggestions,
> I'm considering the major design points mostly settled and I will move
> forward with something like 0001 (pending implementation issues).

Sorry about this late feedback, but while I'm generally +1 on the idea
and primary design, I feel that it doesn't quite cover all the areas
I'd expected it to cover.

Specifically, I'm having trouble seeing how this could be used to
implement ```INSERT INTO ... SELECT ... RETURNING ctid``` as I see no
returning output path for the newly inserted tuples' data, which is
usually required for our execution nodes' output path. Is support for
RETURN-clauses planned for this API? In a previous iteration, the
flush operation was capable of returning a TTS, but that seems to have
been dropped, and I can't quite figure out why.

> Note: I believe this API will extend naturally to updates and deletes,
> as well.

I have the same concern about UPDATE ... RETURNING not fitting with
this callback-based design.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



On Mon, 2024-08-26 at 23:59 +0200, Matthias van de Meent wrote:
> Specifically, I'm having trouble seeing how this could be used to
> implement ```INSERT INTO ... SELECT ... RETURNING ctid``` as I see no
> returning output path for the newly inserted tuples' data, which is
> usually required for our execution nodes' output path. Is support for
> RETURN-clauses planned for this API? In a previous iteration, the
> flush operation was capable of returning a TTS, but that seems to
> have
> been dropped, and I can't quite figure out why.

I'm not sure where that was lost, but I suspect when we changed
flushing to use a callback. I didn't get to v23-0003 yet, but I think
you're right that the current flushing mechanism isn't right for
returning tuples. Thank you.

One solution: when the buffer is flushed, we can return an iterator
over the buffered tuples to the caller. The caller can then use the
iterator to insert into indexes, return a tuple to the executor, etc.,
and then release the iterator when done (freeing the buffer). That
control flow is less convenient for most callers, though, so perhaps
that should be optional?

Regards,
    Jeff Davis




On Tue, 27 Aug 2024 at 07:42, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Mon, 2024-08-26 at 23:59 +0200, Matthias van de Meent wrote:
> > Specifically, I'm having trouble seeing how this could be used to
> > implement ```INSERT INTO ... SELECT ... RETURNING ctid``` as I see no
> > returning output path for the newly inserted tuples' data, which is
> > usually required for our execution nodes' output path. Is support for
> > RETURN-clauses planned for this API? In a previous iteration, the
> > flush operation was capable of returning a TTS, but that seems to
> > have
> > been dropped, and I can't quite figure out why.
>
> I'm not sure where that was lost, but I suspect when we changed
> flushing to use a callback. I didn't get to v23-0003 yet, but I think
> you're right that the current flushing mechanism isn't right for
> returning tuples. Thank you.
>
> One solution: when the buffer is flushed, we can return an iterator
> over the buffered tuples to the caller. The caller can then use the
> iterator to insert into indexes, return a tuple to the executor, etc.,
> and then release the iterator when done (freeing the buffer).

I think that would work, but it'd need to be accomodated in the
table_modify_buffer_insert path too, not just the _flush path, as the
heap AM flushes the buffer when inserting tuples and its internal
buffer is full, so not only at the end of modifications.

> That control flow is less convenient for most callers, though, so
> perhaps that should be optional?

That would be OK with me.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



On Tue, 2024-08-27 at 15:44 +0200, Matthias van de Meent wrote:
> > One solution: when the buffer is flushed, we can return an iterator
> > over the buffered tuples to the caller. The caller can then use the
> > iterator to insert into indexes, return a tuple to the executor,
> > etc.,
> > and then release the iterator when done (freeing the buffer).
>
> I think that would work, but it'd need to be accomodated in the
> table_modify_buffer_insert path too, not just the _flush path, as the
> heap AM flushes the buffer when inserting tuples and its internal
> buffer is full, so not only at the end of modifications.

I gave this a little more thought and I don't think we need a change
here now. The callback could support RETURNING by copying the tuples
out into the caller's state somewhere, and then the caller can iterate
on its own and emit those tuples.

That's not ideal, because it involves an extra copy, but it's a much
simpler API.

Another thought is that there are already a number of cases where we
need to limit the use of batching similar to copyfrom.c:917-1006. For
instance, before-row triggers, instead-of-row triggers, and volatile
functions in the query. We could also just consider RETURNING another
restriction, which could be lifted later by implementing the logic in
the callback (as described above) without an API change.

Regards,
    Jeff Davis




On Mon, 2024-08-26 at 11:09 +0530, Bharath Rupireddy wrote:
> On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Please find the v22 patches with the above changes.
>
> Please find the v23 patches after rebasing 0005 and adapting 0004 for
> 9758174e2e.

In patches 0002-0004, they must avoid the multi insert path when there
are before-row triggers, instead-of-row triggers, or volatile functions
used (see copyfrom.c:917-1006).

Also, until we decide on the RETURNING clause, we should block the
multi-insert path for that, as well, or implement it by using the
callback to copy tuples into the caller's context.

In 0003, why do you need the global insert_modify_buffer_flush_context?

0004 is the only place that calls table_modify_buffer_flush(). Is that
really necessary, or is automatic flushing enough?

Regards,
    Jeff Davis




On Mon, 2024-08-26 at 14:18 -0700, Jeff Davis wrote:
> 0001 implementation issues:
>
> * We need default implementations for AMs that don't implement the
> new
> APIs, so that the AM will still function even if it only defines the
> single-tuple APIs. If we need to make use of the AM's multi_insert
> method (I'm not sure we do), then the default methods would need to
> handle that as well. (I thought a previous version had these default
> implementations -- is there a reason they were removed?)

On second thought, it would be easier to just have the caller check
whether the AM supports the multi-insert path; and if not, fall back to
the single-tuple path. The single-tuple path is needed anyway for cases
like before-row triggers.

Regards,
    Jeff Davis




On Wed, Aug 28, 2024 at 3:14 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Mon, 2024-08-26 at 14:18 -0700, Jeff Davis wrote:
> > 0001 implementation issues:
> >
> > * We need default implementations for AMs that don't implement the
> > new
> > APIs, so that the AM will still function even if it only defines the
> > single-tuple APIs. If we need to make use of the AM's multi_insert
> > method (I'm not sure we do), then the default methods would need to
> > handle that as well. (I thought a previous version had these default
> > implementations -- is there a reason they were removed?)
>
> On second thought, it would be easier to just have the caller check
> whether the AM supports the multi-insert path; and if not, fall back to
> the single-tuple path. The single-tuple path is needed anyway for cases
> like before-row triggers.

Up until v21, the default implementation existed, see
https://www.postgresql.org/message-id/CALj2ACX90L5Mb5Vv%3DjsvhOdZ8BVsfpZf-CdCGhtm2N%2BbGUCSjg%40mail.gmail.com.
I then removed it in v22 to keep the code simple.

IMO, every caller branching out in the code like if (rel->rd_tableam->
tuple_modify_buffer_insert != NULL) then multi insert; else single
insert; doesn't look good. IMO, the default implementation approach
keeps things simple which eventually can be removed in *near* future.
Thoughts?

One change in the default implementation I would do from that of v21
is to assign the default AMs in GetTableAmRoutine() itself to avoid if
.. else if .. else in the table_modify_XXX().

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, 2024-08-29 at 12:55 +0530, Bharath Rupireddy wrote:
> IMO, every caller branching out in the code like if (rel->rd_tableam-
> >
> tuple_modify_buffer_insert != NULL) then multi insert; else single
> insert; doesn't look good. IMO, the default implementation approach
> keeps things simple which eventually can be removed in *near* future.
> Thoughts?

I believe we need the branching in the caller anyway:

1. If there is a BEFORE row trigger with a volatile function, the
visibility rules[1] mean that the function should see changes from all
the rows inserted so far this command, which won't work if they are
still in the buffer.

2. Similarly, for an INSTEAD OF row trigger, the visibility rules say
that the function should see all previous rows inserted.

3. If there are volatile functions in the target list or WHERE clause,
the same visibility semantics apply.

4. If there's a "RETURNING ctid" clause, we need to either come up with
a way to return the tuples after flushing, or we need to use the
single-tuple path. (Similarly in the future when we support UPDATE ...
RETURNING, as Matthias pointed out.)

If we need two paths in each caller anyway, it seems cleaner to just
wrap the check for tuple_modify_buffer_insert in
table_modify_buffer_enabled().

We could perhaps use a one path and then force a batch size of one or
something, which is an alternative, but we have to be careful not to
introduce a regression (and it still requires a solution for #4).

Regards,
    Jeff Davis

[1]
https://www.postgresql.org/docs/devel/trigger-datachanges.html