Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM - Mailing list pgsql-hackers

From Jingtang Zhang
Subject Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date
Msg-id CAPsk3_BWX1xTa2jiWenq=AKs2QKFBg59jJ5EG8qNY4j6Kj8Cig@mail.gmail.com
Whole thread Raw
In response to Re: New Table Access Methods for Multi and Single Inserts  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
List pgsql-hackers
Hi! Glad to see update in this thread.

Little question about v24 0002 patch: would it be better to move the
implementation of TableModifyIsMultiInsertsSupported to somewhere for table AM
level? Seems it is a common function for future use, not a specific one for
matview.

---

Regards, Jingtang


Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> 于2024年10月26日周六 21:31写道:
Hi,

Thanks for looking into this.

On Thu, Aug 29, 2024 at 12:29 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> I believe we need the branching in the caller anyway:
>
> 1. If there is a BEFORE row trigger with a volatile function, the
> visibility rules[1] mean that the function should see changes from all
> the rows inserted so far this command, which won't work if they are
> still in the buffer.
>
> 2. Similarly, for an INSTEAD OF row trigger, the visibility rules say
> that the function should see all previous rows inserted.
>
> 3. If there are volatile functions in the target list or WHERE clause,
> the same visibility semantics apply.
>
> 4. If there's a "RETURNING ctid" clause, we need to either come up with
> a way to return the tuples after flushing, or we need to use the
> single-tuple path. (Similarly in the future when we support UPDATE ...
> RETURNING, as Matthias pointed out.)
>
> If we need two paths in each caller anyway, it seems cleaner to just
> wrap the check for tuple_modify_buffer_insert in
> table_modify_buffer_enabled().
>
> We could perhaps use a one path and then force a batch size of one or
> something, which is an alternative, but we have to be careful not to
> introduce a regression (and it still requires a solution for #4).

I chose to branch in the caller e.g. if there's a volatile function
SELECT query of REFRESH MATERIALIZED VIEW, the caller goes
table_tuple_insert() path, else multi-insert path.

I am posting the new v24 patch set organized as follows: 0001
introducing the new table AM, 0002 optimizing CTAS, CMV and RMV, 0003
using the new table AM for COPY ... FROM. I, for now, discarded the
INSERT INTO ... SELECT and Logical Replication Apply patches, the idea
is to take the basic stuff forward.

I reworked structure names, members and function names, reworded
comments, addressed review comments in the v24 patches. Please have a
look.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

pgsql-hackers by date:

Previous
From: 黄铎彦
Date:
Subject: Re: msvc directory missing in PostgreSQL 17.0
Next
From: Tender Wang
Date:
Subject: Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails