Re: Batch insert in CTAS/MatView code - Mailing list pgsql-hackers

From Paul Guo
Subject Re: Batch insert in CTAS/MatView code
Date
Msg-id CAEET0ZH2o95dcRkB26a-hizFP4CXTmcLkPekxRD1kCno8nvYMg@mail.gmail.com
In response to Re: Batch insert in CTAS/MatView code  (Asim R P <apraveen@pivotal.io>)
List pgsql-hackers

Asim, thanks for the review.

On Wed, Sep 25, 2019 at 6:39 PM Asim R P <apraveen@pivotal.io> wrote:



> On Mon, Sep 9, 2019 at 4:02 PM Paul Guo <pguo@pivotal.io> wrote:
> >
> > So in theory we should not worry about additional tuple copy overhead now,
> > and then I tried the patch without setting multi-insert threshold as attached.
> >
>
> I reviewed your patch today.  It looks good overall.  My concern is that the ExecFetchSlotHeapTuple call does not seem appropriate.  In a generic place such as createas.c, we should be using generic tableam API only.  However, I can also see that there is no better alternative.  We need to compute the size of accumulated tuples so far, in order to decide whether to stop accumulating tuples.  There is no convenient way to obtain the length of the tuple, given a slot.  How about making that decision solely based on number of tuples, so that we can avoid ExecFetchSlotHeapTuple call altogether?

For heapam, ExecFetchSlotHeapTuple() will be called again in heap_multi_insert() to prepare the final multi-insert. If we look at ExecFetchSlotHeapTuple(), we can see that calling it multiple times adds very little overhead in the BufferHeapTuple case. Note that for a virtual tuple the second ExecFetchSlotHeapTuple() call would still copy the slot contents, but we have already called ExecCopySlot(batchslot, slot) to copy the tuple into a BufferHeapTuple slot, so a virtual tuple as the source is not a concern either.

Previously (long ago) I probably misunderstood the code, so I had the same concern. At that time I used sampling to estimate the tuple size (for variable-length tuples), but apparently we do not need that now.
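
To make the two stopping rules under discussion concrete (flush on tuple count only, versus flush on tuple count or accumulated bytes), here is a minimal standalone sketch. It is not code from the patch: BatchState, batch_init(), batch_flush(), batch_add() and both SKETCH_* limits are hypothetical names and values chosen for illustration, and the flush is only a stand-in for handing the batch to the table AM's multi-insert routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative limits only; these are assumptions, not values from the patch. */
#define SKETCH_MAX_BUFFERED_TUPLES 1000
#define SKETCH_MAX_BUFFERED_BYTES  65535

/* Hypothetical stand-in for the receiver's batch bookkeeping. */
typedef struct BatchState
{
    int     ntuples;    /* tuples buffered since the last flush */
    size_t  nbytes;     /* total length of the buffered tuples */
    int     nflushes;   /* number of flushes performed so far */
} BatchState;

static void
batch_init(BatchState *b)
{
    assert(b != NULL);
    b->ntuples = 0;
    b->nbytes = 0;
    b->nflushes = 0;
}

/* Stand-in for passing the buffered tuples to a multi-insert call. */
static void
batch_flush(BatchState *b)
{
    assert(b != NULL);
    if (b->ntuples == 0)
        return;             /* nothing buffered, nothing to do */
    b->ntuples = 0;
    b->nbytes = 0;
    b->nflushes++;
}

/*
 * Buffer one tuple of length tuple_len and flush if a limit is reached.
 * With use_byte_limit = false the decision depends on the tuple count
 * alone, so the caller never needs to know the tuple length.
 * Returns true if this call triggered a flush.
 */
static bool
batch_add(BatchState *b, size_t tuple_len, bool use_byte_limit)
{
    assert(b != NULL);
    b->ntuples++;
    b->nbytes += tuple_len;

    if (b->ntuples >= SKETCH_MAX_BUFFERED_TUPLES ||
        (use_byte_limit && b->nbytes >= SKETCH_MAX_BUFFERED_BYTES))
    {
        batch_flush(b);
        return true;
    }
    return false;
}
```

In this model, eight 8192-byte tuples trigger a flush when the byte limit is enabled, while with the count-only rule they stay buffered until the tuple limit is reached. That is the trade-off: the count-only rule avoids needing the tuple length, at the cost of a less predictable memory footprint for wide tuples.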

> The multi insert copy code deals with index tuples also, which I don't see in the patch.  Don't we need to consider populating indexes?

The CREATE TABLE AS / CREATE MATERIALIZED VIEW DDL does not involve index creation for the table/matview. The code also seems usable in RefreshMatView; for that case we would need to consider whether to use multi-insert in that code.
 
