Re: New Table Access Methods for Multi and Single Inserts - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: New Table Access Methods for Multi and Single Inserts
Date
Msg-id CALj2ACU70HZm+0QRJdkGA5RdJUo4zPYnV2hzkiV-wH5QS2PAEQ@mail.gmail.com
Whole thread Raw
In response to Re: New Table Access Methods for Multi and Single Inserts  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: New Table Access Methods for Multi and Single Inserts  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Tue, Mar 26, 2024 at 9:07 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Tue, 2024-03-26 at 01:28 +0530, Bharath Rupireddy wrote:
> > I'm thinking
> > of dropping the COPY FROM patch using the new multi insert API for
> > the
> > following reasons: ...
>
> I agree with all of this. We do want COPY ... FROM support, but there
> are some details to work out and we don't want to make a big code
> change at this point in the cycle.

Right.

> > Please see the attached v14 patches.
>
> * No need for a 'kind' field in TableModifyState. The state should be
> aware of the kinds of changes that it has received and that may need to
> be flushed later -- for now, only inserts, but possibly updates/deletes
> in the future.

Removed 'kind' field with lazy initialization of required AM specific
modify (insert in this case) state. Since we don't have 'kind', I
chose the callback approach to cleanup the modify (insert in this
case) specific state at the end.

> * If the AM doesn't support the bulk methods, fall back to retail
> inserts instead of throwing an error.

For instance, CREATE MATERIALIZED VIEW foo_mv AS SELECT * FROM foo
USING bar_tam; doesn't work if bar_tam doesn't have the
table_tuple_insert implemented.

Similarly, with this new AM, the onus lies on the table AM
implementers to provide an implementation for these new AMs even if
they just do single inserts. But, I do agree that we must catch this
ahead during parse analysis itself, so I've added assertions in
GetTableAmRoutine().

> * It seems like this API will eventually replace table_multi_insert and
> table_finish_bulk_insert completely. Do those APIs have any advantage
> remaining over the new one proposed here?

table_multi_insert needs to be there for sure as COPY ... FROM uses
it. Not sure if we need to remove the optional callback
table_finish_bulk_insert though. Heap AM doesn't implement one, but
some other AM might. Having said that, with this new AM, whatever the
logic that used to be there in table_finish_bulk_insert previously,
table AM implementers will have to move them to table_modify_end.

FWIW, I can try writing a test table AM that uses this new AM but just
does single inserts, IOW, equivalent to table_tuple_insert().
Thoughts?

> * Right now I don't any important use of the flush method. It seems
> that could be accomplished in the finish method, and flush could just
> be an internal detail when the memory is exhausted. If we find a use
> for it later, we can always add it, but right now it seems unnecessary.

Firstly, we are not storing CommandId and options in TableModifyState,
because we expect CommandId to be changing (per Andres comment).
Secondly, we don't want to pass just the CommandId and options to
table_modify_end(). Thirdly, one just has to call the
table_modify_buffer_flush before the table_modify_end. Do you have any
other thoughts here?

> * We need to be careful about cases where the command can be successful
> but the writes are not flushed. I don't tihnk that's a problem with the
> current patch, but we will need to do something here when we expand to
> INSERT INTO ... SELECT.

You mean, writes are not flushed to the disk? Can you please elaborate
why it's different for INSERT INTO ... SELECT and not others? Can't
the new flush AM be helpful here to implement any flush related
things?

Please find the attached v15 patches with the above review comments addressed.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
Next
From: Tom Lane
Date:
Subject: Re: Remove some redundant set_cheapest() calls