Re: Multi Inserts in CREATE TABLE AS - revived patch - Mailing list pgsql-hackers

From	Bharath Rupireddy
Subject	Re: Multi Inserts in CREATE TABLE AS - revived patch
Date	December 3, 2020 08:57:22
Msg-id	CALj2ACXr5d48+f=4P2Hrdox8sq+Jhz1_dU_TS-2V_4A3zb+zxg@mail.gmail.com
In response to	Re: Multi Inserts in CREATE TABLE AS - revived patch (Dilip Kumar <dilipbalaut@gmail.com>)
List	pgsql-hackers

On Thu, Dec 3, 2020 at 1:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 30, 2020 at 10:49 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Currently, required logic for multi inserts (such as buffer slots
> > allocation, flushing, tuple size calculation to decide when to flush,
> > cleanup and so on) is being handled outside of the existing tableam
> > APIs. And there are a good number of cases where multi inserts can be
> > used, such as for existing COPY or for CTAS, CREATE/REFRESH
> > MATERIALIZED VIEW [proposed in this thread], and INSERT INTO SELECTs
> > [here] which are currently under discussion. Handling the same multi
> > inserts logic in many places is error prone and duplicates most of the
> > code. To avoid this, proposing here are generic tableam APIs, that can
> > be used in all the cases and which also gives the flexibility to
> > tableam developers in implementing multi inserts logic dependent on
> > the underlying storage engine[1].
> >
> > I would like to seek thoughts/opinions on the proposed new APIs. Once reviewed, I will start implementing them.
>
> IMHO, if we think that something really specific to the tableam then
> it makes sense to move it there.  But just to avoid duplicating the
> code it might not be the best idea.  Instead, you can write some
> common functions and we can call them from different places.  So if
> something is very much common and will not vary based on the storage
> type we can keep it outside the tableam interface however we can move
> them into some common functions to avoid duplication.
>

Thanks for the response. The main design goal of the new APIs is to
give tableam developers the flexibility to implement multi insert logic
that depends on the underlying storage engine. Currently, we follow the
same multi insert logic for all storage engines (when and how to flush
the buffered tuples, how to calculate tuple sizes, and so on), and this
logic doesn't take the capabilities of the underlying storage engine
into account. Please have a look at [1], where this point was brought
up by @Luc Vlaming. The subsequent discussion reached some level of
agreement on the proposed APIs.
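
To illustrate the point about letting each storage engine decide when to
flush, here is a rough, standalone sketch. It is not the structure and
APIs proposed in [3] and it does not use any PostgreSQL headers; every
name in it (DemoTuple, DemoAmRoutine, DemoInsertState, the demo_*
functions) and every threshold is hypothetical and chosen only for the
example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tuple: only its size matters here. */
typedef struct DemoTuple
{
	size_t		size;
} DemoTuple;

typedef struct DemoInsertState DemoInsertState;

/* Hypothetical per-AM callbacks: each AM decides when and how to flush. */
typedef struct DemoAmRoutine
{
	int			(*should_flush) (const DemoInsertState *state);
	void		(*flush) (DemoInsertState *state);
} DemoAmRoutine;

/* Buffering state owned by the AM-independent caller (COPY, CTAS, ...). */
struct DemoInsertState
{
	const DemoAmRoutine *am;
	size_t		buffered_tuples;
	size_t		buffered_bytes;
	size_t		flush_count;
	size_t		total_written;
};

/* Shared flush: "write" the buffered tuples and reset the buffer. */
void
demo_flush(DemoInsertState *state)
{
	state->total_written += state->buffered_tuples;
	state->buffered_tuples = 0;
	state->buffered_bytes = 0;
	state->flush_count++;
}

/* Row-store-like policy: flush on a tuple count or a byte limit. */
int
demo_row_should_flush(const DemoInsertState *state)
{
	return state->buffered_tuples >= 4 || state->buffered_bytes >= 64;
}

/* Batch-oriented policy: only the tuple count matters, and it is larger. */
int
demo_batch_should_flush(const DemoInsertState *state)
{
	return state->buffered_tuples >= 8;
}

const DemoAmRoutine demo_row_am = {demo_row_should_flush, demo_flush};
const DemoAmRoutine demo_batch_am = {demo_batch_should_flush, demo_flush};

void
demo_insert_begin(DemoInsertState *state, const DemoAmRoutine *am)
{
	assert(state != NULL && am != NULL);
	state->am = am;
	state->buffered_tuples = 0;
	state->buffered_bytes = 0;
	state->flush_count = 0;
	state->total_written = 0;
}

/* Buffer one tuple, then let the AM decide whether to flush now. */
void
demo_multi_insert(DemoInsertState *state, const DemoTuple *tuple)
{
	state->buffered_tuples++;
	state->buffered_bytes += tuple->size;
	if (state->am->should_flush(state))
		state->am->flush(state);
}

/* Flush whatever is still buffered at the end of the load. */
void
demo_insert_end(DemoInsertState *state)
{
	if (state->buffered_tuples > 0)
		state->am->flush(state);
}
```

The caller only ever goes through demo_insert_begin(),
demo_multi_insert() and demo_insert_end(); swapping demo_row_am for
demo_batch_am changes the flush behaviour without touching the caller.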

I want to clarify that avoiding duplicate multi insert code (for COPY,
CTAS, CREATE/REFRESH MAT VIEW and INSERT SELECTs) is a byproduct (not a
main design goal) of implementing the new APIs for heap AM. I apologize
for earlier presenting the avoidance of duplicate code as the goal.

I also want to mention that @Andres Freund visualized similar kinds of
APIs in [2].

I tried to keep the API as generic as possible, please have a look at
the new structure and APIs [3].

Thoughts?

[1] - https://www.postgresql.org/message-id/ca3dd08f-4ce0-01df-ba30-e9981bb0d54e%40swarm64.com
[2] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de
[3] - https://www.postgresql.org/message-id/CALj2ACV8_O651C2zUqrVSRFDJkp8%3DTMwSdG9%2BmDGL%2BvF6CD%2BAQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
