New Table Access Methods for Multi and Single Inserts - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject New Table Access Methods for Multi and Single Inserts
Date
Msg-id CALj2ACVi9eTRYR=gdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ@mail.gmail.com
Whole thread Raw
Responses Re: New Table Access Methods for Multi and Single Inserts
Re: New Table Access Methods for Multi and Single Inserts
List pgsql-hackers
Hi,

Currently, for any component (such as COPY, CTAS[1], CREATE/REFRESH
Mat View[1], INSERT INTO SELECTs[2]) multi insert logic such as buffer
slots allocation, maintenance, decision to flush and clean up, need to
be implemented outside the table_multi_insert() API. The main problem
is that it fails to take into consideration the underlying storage
engine capabilities, for more details of this point refer to a
discussion in multi inserts in CTAS thread[1]. This also creates a lot
of duplicate code which is more error prone and not maintainable.

More importantly, in another thread [3] @Andres Freund suggested to
have table insert APIs in such a way that they look more like 'scan'
APIs i.e. insert_begin, insert, insert_end. The main advantages doing
this are(quoting from his statement in [3]) - "more importantly it'd
allow an AM to optimize operations across multiple inserts, which is
important for column stores."

I propose to introduce new table access methods for both multi and
single inserts based on the prototype suggested by Andres in [3]. Main
design goal of these new APIs is to give flexibility to tableam
developers in implementing multi insert logic dependent on the
underlying storage engine.

Below are the APIs. I suggest to have a look at
v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch for details
of the new data structure and the API functionality. Note that
temporarily I used XX_v2, we can change it later.

TableInsertState* table_insert_begin(initial_args);
void table_insert_v2(TableInsertState *state, TupleTableSlot *slot);
void table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot);
void table_multi_insert_flush(TableInsertState *state);
void table_insert_end(TableInsertState *state);

I'm attaching a few patches(just to show that these APIs work, avoids
a lot of duplicate code and makes life easier). Better commenting can
be added later. If these APIs and patches look okay, we can even
consider replacing them in other places such as nodeModifyTable.c and
so on.

v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch --->
introduces new table access methods for multi and single inserts. Also
implements/rearranges the outside code for heap am into these new
APIs.
v1-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-Table-AM.patch
---> adds new multi insert table access methods to CREATE TABLE AS,
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW.
v1-0003-ATRewriteTable-With-New-Single-Insert-Table-AM.patch ---> adds
new single insert table access method to ALTER TABLE rewrite table
code.
v1-0004-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch ---> adds
new single and multi insert table access method to COPY code.

Thoughts?

Many thanks to Robert, Vignesh and Dilip for offlist discussion.

[1] - https://www.postgresql.org/message-id/4eee0730-f6ec-e72d-3477-561643f4b327%40swarm64.com
[2] - https://www.postgresql.org/message-id/20201124020020.GK24052%40telsasoft.com
[3] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: "Hou, Zhijie"
Date:
Subject: RE: Parallel Inserts in CREATE TABLE AS
Next
From: Bharath Rupireddy
Date:
Subject: Re: Parallel Inserts in CREATE TABLE AS