Re: Multi Inserts in CREATE TABLE AS - revived patch - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Multi Inserts in CREATE TABLE AS - revived patch |
Date | |
Msg-id | CALj2ACXr5d48+f=4P2Hrdox8sq+Jhz1_dU_TS-2V_4A3zb+zxg@mail.gmail.com |
In response to | Re: Multi Inserts in CREATE TABLE AS - revived patch (Dilip Kumar <dilipbalaut@gmail.com>) |
List | pgsql-hackers |
On Thu, Dec 3, 2020 at 1:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 30, 2020 at 10:49 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Currently, required logic for multi inserts (such as buffer slots
> > allocation, flushing, tuple size calculation to decide when to flush,
> > cleanup and so on) is being handled outside of the existing tableam
> > APIs. And there are a good number of cases where multi inserts can be
> > used, such as for existing COPY or for CTAS, CREATE/REFRESH
> > MATERIALIZED VIEW [proposed in this thread], and INSERT INTO SELECTs
> > [here] which are currently under discussion. Handling the same multi
> > inserts logic in many places is error prone and duplicates most of the
> > code. To avoid this, proposing here are generic tableam APIs, that can
> > be used in all the cases and which also gives the flexibility to
> > tableam developers in implementing multi inserts logic dependent on
> > the underlying storage engine[1].
> >
> > I would like to seek thoughts/opinions on the proposed new APIs. Once
> > reviewed, I will start implementing them.
>
> IMHO, if we think that something really specific to the tableam then
> it makes sense to move it there. But just to avoid duplicating the
> code it might not be the best idea. Instead, you can write some
> common functions and we can call them from different places. So if
> something is very much common and will not vary based on the storage
> type we can keep it outside the tableam interface however we can move
> them into some common functions to avoid duplication.
>

Thanks for the response.

The main design goal of the new APIs is to give tableam developers the flexibility to implement multi insert logic that depends on the underlying storage engine. Currently, we follow the same multi insert logic for all underlying storage engines (for example, when and how to flush the buffered tuples, and how tuple size is calculated), and this logic doesn't take the capabilities of the underlying storage engine into account.
Please have a look at [1], where this point was brought up by @Luc Vlaming. The subsequent discussion reached some level of agreement on the proposed APIs.

I want to clarify that avoiding duplicate multi insert code (for COPY, CTAS, CREATE/REFRESH MAT VIEW and INSERT SELECTs) is a byproduct (not a main design goal) of implementing the new APIs for heap AM. I apologize for having earlier presented the goal as avoiding duplicate code. I also want to mention that @Andres Freund envisioned similar kinds of APIs in [2].

I tried to keep the API as generic as possible; please have a look at the new structure and APIs in [3].

Thoughts?

[1] - https://www.postgresql.org/message-id/ca3dd08f-4ce0-01df-ba30-e9981bb0d54e%40swarm64.com
[2] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de
[3] - https://www.postgresql.org/message-id/CALj2ACV8_O651C2zUqrVSRFDJkp8%3DTMwSdG9%2BmDGL%2BvF6CD%2BAQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
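
Editorial illustration: the message argues that the decision of when and how to flush buffered tuples should belong to the storage engine rather than to each caller. The sketch below is one minimal way to picture that split, assuming a callback table per access method. It is not the structure or API proposed in [3], and it is not PostgreSQL code: every name (DemoMultiInsertOps, demo_multi_insert_add, and so on) and both limits are hypothetical, chosen only for the demonstration.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary demo limits; a real access method would choose its own. */
#define DEMO_MAX_TUPLES 4
#define DEMO_MAX_BYTES  64

typedef struct DemoMultiInsertState DemoMultiInsertState;

/* Hypothetical per-access-method callbacks (not a PostgreSQL API). */
typedef struct DemoMultiInsertOps
{
    /* Storage-specific policy: should the buffer be written out now? */
    int  (*should_flush) (const DemoMultiInsertState *state);
    /* Storage-specific action: write all buffered tuples as one batch. */
    void (*flush) (DemoMultiInsertState *state);
} DemoMultiInsertOps;

struct DemoMultiInsertState
{
    const DemoMultiInsertOps *ops;
    const void *tuples[DEMO_MAX_TUPLES];    /* buffered tuple pointers */
    size_t      ntuples;                    /* tuples currently buffered */
    size_t      nbytes;                     /* bytes currently buffered */
    size_t      flush_count;                /* batches written so far */
    size_t      tuples_written;             /* tuples written so far */
};

/* Generic code: hand the batch to the access method, then reset the buffer. */
void demo_multi_insert_flush(DemoMultiInsertState *state)
{
    if (state->ntuples == 0)
        return;
    state->ops->flush(state);
    state->flush_count++;
    state->ntuples = 0;
    state->nbytes = 0;
}

void demo_multi_insert_begin(DemoMultiInsertState *state,
                             const DemoMultiInsertOps *ops)
{
    state->ops = ops;
    state->ntuples = 0;
    state->nbytes = 0;
    state->flush_count = 0;
    state->tuples_written = 0;
}

/* Buffer one tuple; flush when the array is full or the policy says so. */
void demo_multi_insert_add(DemoMultiInsertState *state,
                           const void *tuple, size_t len)
{
    assert(state->ntuples < DEMO_MAX_TUPLES);
    state->tuples[state->ntuples++] = tuple;
    state->nbytes += len;
    if (state->ntuples == DEMO_MAX_TUPLES || state->ops->should_flush(state))
        demo_multi_insert_flush(state);
}

/* Write out whatever is still buffered at the end of the load. */
void demo_multi_insert_end(DemoMultiInsertState *state)
{
    demo_multi_insert_flush(state);
}

/* One possible policy: flush on a tuple-count or a byte-size threshold. */
int demo_heap_like_should_flush(const DemoMultiInsertState *state)
{
    return state->ntuples >= DEMO_MAX_TUPLES || state->nbytes >= DEMO_MAX_BYTES;
}

/* Stand-in for a real batched write: only counts the tuples. */
void demo_heap_like_flush(DemoMultiInsertState *state)
{
    state->tuples_written += state->ntuples;
}

const DemoMultiInsertOps demo_heap_like_ops = {
    demo_heap_like_should_flush,
    demo_heap_like_flush
};
```

A caller such as a bulk-load loop would call demo_multi_insert_begin once, demo_multi_insert_add per tuple, and demo_multi_insert_end at the end. A different storage engine could supply its own DemoMultiInsertOps with another flush policy without any change to the callers.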