Re: Multi Inserts in CREATE TABLE AS - revived patch - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Multi Inserts in CREATE TABLE AS - revived patch
Date
Msg-id CALj2ACVEXJhwdgrTwNrhxTdEfUH_CgyViJiHSUYD6y+k_OoUXg@mail.gmail.com
In response to Re: Multi Inserts in CREATE TABLE AS - revived patch  (Luc Vlaming <luc@swarm64.com>)
Responses Re: Multi Inserts in CREATE TABLE AS - revived patch  (Luc Vlaming <luc@swarm64.com>)
List pgsql-hackers
On Thu, Nov 26, 2020 at 5:34 PM Luc Vlaming <luc@swarm64.com> wrote:
>
> On 26-11-2020 12:36, Bharath Rupireddy wrote:
> > Few things:
> >
> > IIUC Andres mentioned similar kinds of APIs earlier in [1].
> >
> > [1] -
> > https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de
> > <https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de>
> >
> > I would like to add some more info to one of the API:
> >
> > typedef struct MultiInsertStateData
> > {
> >     MemoryContext        micontext; /* A temporary memory context for multi insert */
> >     BulkInsertStateData *bistate;   /* Bulk insert state */
> >     TupleTableSlot     **mislots;   /* Array of buffered slots */
> >     uint32               nslots;    /* Total number of buffered slots */
> >     int64                nbytes;    /* Flush buffers if the total tuple size >= nbytes */
> >     int32                nused;     /* Number of currently buffered slots in a multi insert batch */
> >     int64                nsize;     /* Total tuple size of a multi insert batch */
> > } MultiInsertStateData;
> >
> > /* Creates a temporary memory context, allocates the MultiInsertStateData
> >  * and BulkInsertStateData, and initializes the other members. */
> >     void        (*begin_multi_insert) (Relation rel, MultiInsertStateData **mistate,
> >                                        uint32 nslots, uint64 nbytes);
> >
> > /* Buffers the input slot into mistate's slots, computes the size of the
> >  * tuple, and adds it to the total buffered tuple size.  If this size
> >  * crosses mistate->nbytes, flushes the buffered tuples into the table.
> >  * For heapam, the existing heap_multi_insert can be used.  Once the
> >  * buffer is flushed, the micontext can be reset and the buffered slots
> >  * cleared.  If nbytes, i.e. the total tuple size of the batch, is not
> >  * given, tuple sizes are not calculated; tuples are buffered until all
> >  * nslots are filled, and then flushed. */
> >     void        (*do_multi_insert) (Relation rel, MultiInsertStateData *mistate,
> >                                     struct TupleTableSlot *slot, CommandId cid,
> >                                     int options);
> >
> > /* Flushes the buffered tuples, if any.  For heapam, the existing
> >  * heap_multi_insert can be used.  Deletes the temporary memory context
> >  * and deallocates mistate. */
> >     void        (*end_multi_insert) (Relation rel, MultiInsertStateData *mistate,
> >                                      CommandId cid, int options);
>
> Looks all good to me, except for the nbytes part.
> Could you explain to me what use case that supports? IMHO the tableam
> can best decide itself that it's time to flush, based on its
> implementation that e.g. considers how many pages to flush at a time,
> etc. This also means that most of the fields of MultiInsertStateData
> can be private, as each tableam would return a derivative of that
> struct (like with the dest receivers).
>

nbytes is basically to support the following case: say the number of
tuples to buffer is 1000, and all of the tuples are toasted, each a few
hundred MB or even a GB in size. Do we then want to wait until 1000
tuples are buffered, in which case a single query could occupy
1000 * toasted tuple size of memory? Having a memory limit gives us
flexibility here; whether to use it is up to the table AM
implementation. Also, the existing copy code (since it knows the tuple
size after parsing the input data) already uses this mechanism to
decide when to flush.

If a table AM does not use nbytes, then the multi insert can simply
wait until all the tuples are buffered, however much memory they
occupy.

IMO, we can retain nbytes for now to decide on when to flush. Thoughts?

I wonder how the do_multi_insert() API would decide when to flush based
on the number of pages. Would we need to pass in the maximum number of
pages the buffered tuples may occupy, and track how many pages the
currently buffered tuples occupy, to decide when to flush? Or is that
something the existing table AM infrastructure already supports? And if
we use the number of pages to decide when to flush, how well does that
work with parallel inserts?

>
> One thing I'm wondering is in which memory context the slots end up
> being allocated. I'd assume we would want to keep the slots around
> between flushes. If they are in the temporary context this might prove
> problematic however?
>

I should not have used the word temporary; it actually is not
temporary. This memory context will be created in begin_multi_insert(),
all the buffered tuples are copied into it, and it is reset at the end
of each flush and reused. It is destroyed at the end, in
end_multi_insert(). I think we should do the same in the new APIs'
implementation.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


