Re: Multi Inserts in CREATE TABLE AS - revived patch - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Multi Inserts in CREATE TABLE AS - revived patch
Date
Msg-id CALj2ACXr5d48+f=4P2Hrdox8sq+Jhz1_dU_TS-2V_4A3zb+zxg@mail.gmail.com
In response to Re: Multi Inserts in CREATE TABLE AS - revived patch  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Thu, Dec 3, 2020 at 1:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 30, 2020 at 10:49 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Currently, required logic for multi inserts (such as buffer slots
> > allocation, flushing, tuple size calculation to decide when to flush,
> > cleanup and so on) is being handled outside of the existing tableam
> > APIs. And there are a good number of cases where multi inserts can be
> > used, such as for existing COPY or for CTAS, CREATE/REFRESH
> > MATERIALIZED VIEW [proposed in this thread], and INSERT INTO SELECTs
> > [here] which are currently under discussion. Handling the same multi
> > inserts logic in many places is error prone and duplicates most of the
> > code. To avoid this, proposing here are generic tableam APIs, that can
> > be used in all the cases and which also gives the flexibility to
> > tableam developers in implementing multi inserts logic dependent on
> > the underlying storage engine[1].
> >
> > I would like to seek thoughts/opinions on the proposed new APIs. Once reviewed, I will start implementing them.
>
> IMHO, if we think that something really specific to the tableam then
> it makes sense to move it there.  But just to avoid duplicating the
> code it might not be the best idea.  Instead, you can write some
> common functions and we can call them from different places.  So if
> something is very much common and will not vary based on the storage
> type we can keep it outside the tableam interface however we can move
> them into some common functions to avoid duplication.
>

Thanks for the response. The main design goal of the new APIs is to
give tableam developers the flexibility to implement multi insert logic
that depends on the underlying storage engine. Currently, we follow the
same multi insert logic for all storage engines (when and how to flush
the buffered tuples, how to calculate tuple size), and this logic
doesn't take the capabilities of the underlying storage engine into
account. Please have a look at [1], where this point was brought up by
@Luc Vlaming. The subsequent discussion reached some level of
agreement on the proposed APIs.
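
To illustrate what "flush logic dependent on the storage engine" could
mean, here is a minimal, self-contained C sketch. It is not the API
proposed in [3] and it does not use any PostgreSQL headers: every name
(DemoTableAm, DemoMultiInsertState, demo_multi_insert_add and so on) is
hypothetical, and the thresholds are illustrative values only. The only
point it tries to show is that the caller buffers tuples through one
generic interface, while each AM decides when a flush is due.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoMultiInsertState DemoMultiInsertState;

/* Hypothetical per-AM callbacks: each storage engine supplies its own policy. */
typedef struct DemoTableAm
{
    const char *name;
    int  (*should_flush)(const DemoMultiInsertState *state);
    void (*flush)(DemoMultiInsertState *state);
} DemoTableAm;

/* Hypothetical buffering state; a real one would hold tuple slots, not counters. */
struct DemoMultiInsertState
{
    const DemoTableAm *am;
    int     ntuples;     /* tuples currently buffered */
    size_t  nbytes;      /* bytes currently buffered */
    int     nflushes;    /* flushes performed so far */
    long    ninserted;   /* tuples written so far */
};

/* Stand-in for the AM writing its buffered tuples to storage. */
void demo_flush(DemoMultiInsertState *state)
{
    state->ninserted += state->ntuples;
    state->nflushes++;
    state->ntuples = 0;
    state->nbytes = 0;
}

/* Row-store-like policy (assumed values): flush on a tuple count or a byte limit. */
int demo_heap_like_should_flush(const DemoMultiInsertState *state)
{
    return state->ntuples >= 1000 || state->nbytes >= 65535;
}

/* Column-store-like policy (assumed values): prefer larger batches, ignore bytes. */
int demo_columnar_like_should_flush(const DemoMultiInsertState *state)
{
    return state->ntuples >= 4096;
}

const DemoTableAm demo_heap_like_am =
    {"heap_like", demo_heap_like_should_flush, demo_flush};
const DemoTableAm demo_columnar_like_am =
    {"columnar_like", demo_columnar_like_should_flush, demo_flush};

void demo_multi_insert_begin(DemoMultiInsertState *state, const DemoTableAm *am)
{
    state->am = am;
    state->ntuples = 0;
    state->nbytes = 0;
    state->nflushes = 0;
    state->ninserted = 0;
}

/* Buffer one tuple, then let the AM decide whether to flush. */
void demo_multi_insert_add(DemoMultiInsertState *state, size_t tuple_size)
{
    state->ntuples++;
    state->nbytes += tuple_size;
    if (state->am->should_flush(state))
        state->am->flush(state);
}

/* Flush whatever is still buffered at the end of the command. */
void demo_multi_insert_end(DemoMultiInsertState *state)
{
    if (state->ntuples > 0)
        state->am->flush(state);
}

/* Caller-side code (think COPY or CTAS): identical for every AM. */
int demo_run(const DemoTableAm *am, long ntuples, size_t tuple_size)
{
    DemoMultiInsertState state;

    demo_multi_insert_begin(&state, am);
    for (long i = 0; i < ntuples; i++)
        demo_multi_insert_add(&state, tuple_size);
    demo_multi_insert_end(&state);

    assert(state.ninserted == ntuples);   /* nothing may be lost */
    return state.nflushes;
}
```

With these illustrative thresholds, demo_run(&demo_heap_like_am, 10000, 100)
flushes 16 times, while demo_run(&demo_columnar_like_am, 10000, 100)
flushes 3 times, and the caller code is the same in both cases.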

I want to clarify that avoiding duplicate multi insert code (for COPY,
CTAS, CREATE/REFRESH MAT VIEW and INSERT SELECTs) is a byproduct (not a
main design goal) of implementing the new APIs for heap AM. I apologize
for having presented avoiding duplicate code as the goal earlier.

I also want to mention that @Andres Freund envisioned similar APIs
in [2].

I tried to keep the API as generic as possible; please have a look at
the new structure and APIs [3].

Thoughts?

[1] - https://www.postgresql.org/message-id/ca3dd08f-4ce0-01df-ba30-e9981bb0d54e%40swarm64.com
[2] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de
[3] - https://www.postgresql.org/message-id/CALj2ACV8_O651C2zUqrVSRFDJkp8%3DTMwSdG9%2BmDGL%2BvF6CD%2BAQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


