
From Bharath Rupireddy
Subject Re: Multi Inserts in CREATE TABLE AS - revived patch
Date
Msg-id CALj2ACUSpjR5vUNX=Z+Y8Vp3xxQ1LYVKwqAPOZNQi5_OUOkv0w@mail.gmail.com
In response to Re: Multi Inserts in CREATE TABLE AS - revived patch  (Luc Vlaming <luc@swarm64.com>)
Responses Re: Multi Inserts in CREATE TABLE AS - revived patch  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Fri, Nov 27, 2020 at 12:22 PM Luc Vlaming <luc@swarm64.com> wrote:
>
> >
> > I wonder how the do_multi_insert() API can decide when to flush - I
> > mean, based on the number of pages to flush? Do we need to pass the
> > maximum number of pages the buffered tuples can occupy and track the
> > pages the currently buffered tuples occupy to decide when to flush? Or
> > is it something that the existing table AM infrastructure already
> > supports? If we use the number of pages to decide when to flush, how
> > well does it work with parallel inserts?
>
> I was assuming each tableam to use its own logic, based on its needs and
> the tradeoffs a storage engine might want to provide. This does not mean
> it should not consider outside parameters, like the aforementioned
> memory usage.
> I think it would imply that each tableam implements its own tracking
> mechanism for how much has accumulated, how, and when to flush, because
> they might track different statistics. IMHO given that each tableam
> anyway would want to implement its own logic on how to store a slot into
> a page, the logic for tracking these statistics seemed minor to
> me. Maybe I missed some parts that should be extracted out to a generic
> interface however?
>

Agree with you that tracking page-level or any other such info is
dependent on the table AM implementation.
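To make sure we're imagining the same API shape, here's a rough sketch -
the signatures are my guesses for illustration, not what the patch will
necessarily end up with. The idea is that the AM keeps whatever
statistics it needs behind an opaque state pointer, and the flush
decision lives entirely inside do_multi_insert():

/* Illustrative only -- signatures are guesses, not the patch itself. */
#include "postgres.h"
#include "access/tableam.h"
#include "executor/tuptable.h"

/* AM-private state: buffered slots, memory context, and whatever
 * statistics (pages, bytes, tuple counts, ...) the AM wants to track. */
struct MultiInsertStateData;

typedef struct MultiInsertStateData *(*begin_multi_insert_function)
                (Relation rel, CommandId cid, int options);

/* Buffers one tuple; flushes whenever the AM's own statistics say so. */
typedef void (*do_multi_insert_function)
                (struct MultiInsertStateData *state, TupleTableSlot *slot);

/* Flushes any leftover tuples and releases the AM-private state. */
typedef void (*end_multi_insert_function)
                (struct MultiInsertStateData *state);

Whether these end up as table AM callbacks or standalone functions is of
course part of what still needs discussing.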

>
> Some examples of why a tableam could decide on its own on when to flush:
> - the current heap implementation could accumulate a few pages (say up
> to 64) and thereby limit the amount of calls to write() and limit the
> accompanying blocks/context switches. This would also then make the
> writes more sequential wrt the processes which can help with the
> flushing I presume, like how the sequential scan was optimized to
> process a consecutive set of blocks per worker (see
> table_block_parallelscan_nextpage).
> - something like zheap could accumulate data based on the amount of
> columns so that a page with column data is completely filled, thereby
> limiting the write amplification.
> - something that would implement an lsm storage might accumulate a full
> in-memory level before flushing it out.
>

Thanks for the details.
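For the heap case you describe, I imagine something along these lines
(the names and the 64-page threshold are invented just to make the idea
concrete, this is not from the patch):

#include "postgres.h"
#include "access/tableam.h"
#include "executor/tuptable.h"

#define HEAP_MULTI_INSERT_PAGES 64   /* illustrative threshold */
#define HEAP_MULTI_INSERT_BYTES (HEAP_MULTI_INSERT_PAGES * BLCKSZ)

typedef struct HeapMultiInsertState
{
    Relation         rel;
    CommandId        cid;
    int              options;
    TupleTableSlot **slots;          /* preallocated buffer slots */
    int              nused;          /* slots currently filled */
    int              maxslots;       /* buffer capacity */
    Size             bufferedBytes;  /* heap-specific statistic */
} HeapMultiInsertState;

static void
heap_multi_insert_flush(HeapMultiInsertState *state)
{
    if (state->nused == 0)
        return;
    table_multi_insert(state->rel, state->slots, state->nused,
                       state->cid, state->options,
                       NULL /* bistate omitted in this sketch */);
    state->nused = 0;
    state->bufferedBytes = 0;
}

static void
heap_do_multi_insert(HeapMultiInsertState *state, TupleTableSlot *slot)
{
    TupleTableSlot *bufslot = state->slots[state->nused++];
    HeapTuple       tuple;
    bool            shouldFree;

    ExecClearTuple(bufslot);
    ExecCopySlot(bufslot, slot);

    /* Heap-specific bookkeeping: materialize to learn the tuple size. */
    tuple = ExecFetchSlotHeapTuple(bufslot, true, &shouldFree);
    state->bufferedBytes += tuple->t_len;
    if (shouldFree)
        pfree(tuple);

    /* Flush once roughly 64 pages worth of data, or the whole slot
     * buffer, has accumulated. */
    if (state->nused == state->maxslots ||
        state->bufferedBytes >= HEAP_MULTI_INSERT_BYTES)
        heap_multi_insert_flush(state);
}

A zheap- or LSM-like AM would keep different statistics in its own state
struct and apply a different check here, which is exactly why the
decision should stay inside the AM.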

>
> >> One thing I'm wondering is in which memory context the slots end up
> >> being allocated. I'd assume we would want to keep the slots around
> >> between flushes. If they are in the temporary context this might prove
> >> problematic however?
> >>
> >
> > I should not have used the word temporary, it actually is not
> > temporary. This memory context will be created in begin_multi_insert(),
> > all the buffered tuples are copied using this context, it will be
> > reset at the end of each flush and reused. It can get destroyed at the
> > end in end_multi_insert(). I think we should do the same in the new
> > APIs' implementation.
> >
>
> Okay. But in which context are the slots themselves allocated then?
> Because if we allocate the slots themselves in the context and then
> reset with each flush we would have to also re-allocate the slots every
> flush, which seems wasteful to me?
>

The buffer slots are allocated in the memory context in which the new
APIs get called. We don't have to re-allocate the slots after every
flush; we only have to clear them with ExecClearTuple() and reuse them.
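In other words, something like the following sketch (helper names are
made up): the slots are created once, in whatever context is current
when the multi-insert machinery is set up, and only their contents are
cleared between batches:

#include "postgres.h"
#include "access/tableam.h"
#include "executor/tuptable.h"

/* Called once, e.g. from begin_multi_insert(): the slots live in the
 * caller's current memory context, not in the per-batch context that
 * gets reset after every flush. */
static void
create_buffer_slots(Relation rel, TupleTableSlot **slots, int nslots)
{
    for (int i = 0; i < nslots; i++)
        slots[i] = table_slot_create(rel, NULL);
}

/* Called for every buffered tuple: reuse an existing slot, no new
 * allocation needed. */
static void
store_buffered_tuple(TupleTableSlot *bufslot, TupleTableSlot *srcslot)
{
    ExecClearTuple(bufslot);
    ExecCopySlot(bufslot, srcslot);
}

/* Called once from end_multi_insert(): release the slots. */
static void
drop_buffer_slots(TupleTableSlot **slots, int nslots)
{
    for (int i = 0; i < nslots; i++)
        ExecDropSingleTupleTableSlot(slots[i]);
}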

And the memory context I specified in the MultiInsertStateData
structure is for calling table_multi_insert() inside the new
do_multi_insert() API once we decide to flush. There's a comment on the
existing table_multi_insert() usage in the copy code saying that
table_multi_insert() may leak memory; that's why we need this
short-lived memory context, which is switched to just before
table_multi_insert() and reset right after it, once per batch of
tuples. At the end, this context can be deleted in the
end_multi_insert() API.
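Concretely, the flush path would follow the same pattern as the copy
code, roughly like this sketch (the function and context names are made
up):

#include "postgres.h"
#include "access/tableam.h"
#include "utils/memutils.h"

/* In begin_multi_insert():
 *     batchcxt = AllocSetContextCreate(CurrentMemoryContext,
 *                                      "multi-insert flush context",
 *                                      ALLOCSET_DEFAULT_SIZES);
 * In end_multi_insert():
 *     MemoryContextDelete(batchcxt);
 */

static void
flush_buffered_tuples(Relation rel, TupleTableSlot **slots, int nused,
                      CommandId cid, int options, MemoryContext batchcxt)
{
    MemoryContext oldcxt;

    /* table_multi_insert() may leak memory, so run it under the
     * short-lived per-batch context ... */
    oldcxt = MemoryContextSwitchTo(batchcxt);
    table_multi_insert(rel, slots, nused, cid, options,
                       NULL /* bistate omitted in this sketch */);
    MemoryContextSwitchTo(oldcxt);

    /* ... and throw the leaked memory away, once per batch. */
    MemoryContextReset(batchcxt);
}

The bistate argument is left as NULL here only to keep the sketch short.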

Hope this helps.

I'm planning to summarize the new APIs and post their description here
again to gather further opinions.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


