Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date
Msg-id CAEze2WgmyTVyZv+u+D44f+3Xk5TyZm=+4Gp5qdEX14FwBJA1Hw@mail.gmail.com
Whole thread Raw
In response to Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
List pgsql-hackers
On Mon, 26 Aug 2024 at 23:18, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Mon, 2024-08-26 at 11:09 +0530, Bharath Rupireddy wrote:
> > On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > Please find the v22 patches with the above changes.
> >
> > Please find the v23 patches after rebasing 0005 and adapting 0004 for
> > 9758174e2e.
>
>
> Thank you.
>
> 0001 API design:
>
> * Remove TableModifyState.modify_end_callback.
>
> * This patch means that we will either remove or deprecate
> TableAmRoutine.multi_insert and finish_bulk_insert. Are there any
> strong opinions about maintaining support for multi-insert, or should
> we just remove it outright and force any new AMs to implement the new
> APIs to maintain COPY performance?

I don't think there is a significant enough difference in the
capabilities and requirements between the two APIs as currently
designed that removal of the old API would mean a significant
difference in capabilities. Maybe we could supply an equivalent API
shim to help the transition, but I don't think we should keep the old
API around in the TableAM.

> * Why do we need a separate "modify_flags" and "options"? Can't we just
> combine them into TABLE_MODIFY_* flags?
>
>
> Alexander, you had some work in this area as well, such b1484a3f19. I
> believe 0001 covers this use case in a different way: rather than
> giving complete responsibility to the AM to insert into the indexes,
> the caller provides a callback and the AM is responsible for calling it
> at the time the tuples are flushed. Is that right?
>
> The design has been out for a while, so unless others have suggestions,
> I'm considering the major design points mostly settled and I will move
> forward with something like 0001 (pending implementation issues).

Sorry about this late feedback, but while I'm generally +1 on the idea
and primary design, I feel that it doesn't quite cover all the areas
I'd expected it to cover.

Specifically, I'm having trouble seeing how this could be used to
implement ```INSERT INTO ... SELECT ... RETURNING ctid``` as I see no
returning output path for the newly inserted tuples' data, which is
usually required for our execution nodes' output path. Is support for
RETURN-clauses planned for this API? In a previous iteration, the
flush operation was capable of returning a TTS, but that seems to have
been dropped, and I can't quite figure out why.

> Note: I believe this API will extend naturally to updates and deletes,
> as well.

I have the same concern about UPDATE ... RETURNING not fitting with
this callback-based design.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: RFC: Additional Directory for Extensions
Next
From: Peter Smith
Date:
Subject: Re: Conflict detection and logging in logical replication