Re: Multi Inserts in CREATE TABLE AS - revived patch - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Multi Inserts in CREATE TABLE AS - revived patch
Date
Msg-id CALj2ACX0b_WxmPHUELBY1ntW603h_GQ=yE0BX05QQ+VHHPPRDQ@mail.gmail.com
Whole thread Raw
In response to Re: Multi Inserts in CREATE TABLE AS - revived patch  (Luc Vlaming <luc@swarm64.com>)
Responses Re: Multi Inserts in CREATE TABLE AS - revived patch  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Wed, Nov 25, 2020 at 2:11 PM Luc Vlaming <luc@swarm64.com> wrote:
>
> Thanks for reviving the patch! I did unfortunately have to shift my
> priorities somewhat and did not find much time to work on open source
> things the last week(s).
>

Thanks for the comments.

>
> I'm wondering about the use of the GetTupleSize function. As far as I
> understand the idea is to limit the amount of buffered data, presumably
> to not write too much data at once for intorel_flush_multi_insert.
> If I understood correctly how it all works, the table slot can however
> be of different type than the source slot, which makes that the call to
> CopySlot() potentially stores a different amount of data than computed
> by GetTupleSize(). Not sure if this is a big problem as an estimation
> might be good enough?
>

Yeah. The tuple size may change after ExecCopySlot(). For instance, create table t2 as select a1 from t1; where t1 has two integer columns a1, b1. I'm creating t2 with single column a1 from t1 which makes the source slot virtual.

Source slot is virtual and the size calculated with GetTupleSize() is 8 bytes:
(gdb) p *slot
$18 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 1,
  tts_ops = 0x562c592652c0 <TTSOpsVirtual>,
  tts_tupleDescriptor = 0x562c5a0409f0, tts_values = 0x562c5a040b50,
  tts_isnull = 0x562c5a040b58, tts_mcxt = 0x562c5a040320, tts_tid = {
    ip_blkid = {bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, tts_tableOid = 0}
(gdb) call GetTupleSize(slot, 65535)
$24 = 8

After ExecCopySlot(batchslot, slot), destination slot changes to TTSOpsBufferHeapTuple and the GetTupleSize() gives 28 bytes now.
(gdb) p *batchslot
$19 = {type = T_TupleTableSlot, tts_flags = 20, tts_nvalid = 0,
  tts_ops = 0x562c592653e0 <TTSOpsBufferHeapTuple>,
  tts_tupleDescriptor = 0x7f063fbeecd0, tts_values = 0x562c5a05daa8,
  tts_isnull = 0x562c5a05dab0, tts_mcxt = 0x562c5a040320, tts_tid = {
    ip_blkid = {bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, tts_tableOid = 0}
(gdb) call GetTupleSize(batchslot, 65535)
$25 = 28

I think your suggestion to call GetTupleSize() on the destination slot after ExecCopySlot() is right. I changed it in the v4 patch.

>
> Some other solutions/implementations would be:
> - compute the size after doing CopySlot. Maybe the relation never wants
> a virtual tuple and then you can also simplify GetTupleSize?
>

I think we need to have TTSOpsVirtual code in GetTupleSize() because table_slot_create() which gets called before ExecCopySlot() may create virtual slots for cases such as views and partitioned tables. Though we can not insert into views or partitioned tables using CTAS, I want GetTupleSize() to be a generic function. Right now, I can not find other use cases where GetTupleSize() can be used.

>
> - after CopySlot ask for the memory consumed in the slot using
> MemoryContextMemAllocated.
>

MemoryContextMemAllocated of the slot's tts_mcxt will always have extra bytes and those extra bytes are way more compared to the actual tuple bytes. And most of the time, ExecCopySlot() will just point the src slot tts_mcxt to dest slot  tts_mcxt. For instance, for a single row with a single integer column of 8 bytes, the mem_allocated is 49232 bytes. This is the reason we can not rely on mem_allocated.

(gdb) p slot->tts_mcxt     -----> source slot
$22 = (MemoryContext) 0x562c5a040320
(gdb) p *slot->tts_mcxt
$20 = {type = T_AllocSetContext, isReset = false, allowInCritSection = false,
  mem_allocated = 49232, methods = 0x562c5926d560 <AllocSetMethods>,
  parent = 0x562c59f97820, firstchild = 0x562c5a042330, prevchild = 0x0,
  nextchild = 0x0, name = 0x562c590d3554 "ExecutorState", ident = 0x0,
  reset_cbs = 0x0}

(gdb) p batchslot->tts_mcxt     -----> destination slot after ExecCopySlot().
$23 = (MemoryContext) 0x562c5a040320
(gdb) p *batchslot->tts_mcxt
$21 = {type = T_AllocSetContext, isReset = false, allowInCritSection = false,
  mem_allocated = 49232, methods = 0x562c5926d560 <AllocSetMethods>,
  parent = 0x562c59f97820, firstchild = 0x562c5a042330, prevchild = 0x0,
  nextchild = 0x0, name = 0x562c590d3554 "ExecutorState", ident = 0x0,
  reset_cbs = 0x0}

>
> Some small things to maybe change are:
> ===========
> +               if (myState->mi_slots[myState->mi_slots_num] == NULL)
> +               {
> +                       batchslot = table_slot_create(myState->rel, NULL);
> +                       myState->mi_slots[myState->mi_slots_num] = batchslot;
> +               }
> +               else
> +                       batchslot = myState->mi_slots[myState->mi_slots_num];
>
> Alternative:
> +               if (myState->mi_slots[myState->mi_slots_num] == NULL)
> +                       myState->mi_slots[myState->mi_slots_num] =
> table_slot_create(myState->rel, NULL);
> +               batchslot = myState->mi_slots[myState->mi_slots_num];
>

Changed.

> ==============
>
> +                       sz = att_align_nominal(sz, att->attalign);
> This could be moved out of the if statement?
>
> ==============

I don't think we can change it. If we were to move it, then sz = att_addlength_datum(sz, att->attlen, val); which takes aligned sz may have problems like below:
Say att_align_nominal sets sz to 4 bytes, then att_addlength_datum takes this 4 bytes adds attlen to it. If we move  att_align_nominal(sz, att->attalign) out, then att_addlength_datum(sz, att->attlen, val) will not consider the aligned bytes. We might have to add up the aligned bytes separately for the else case. And also note that this code is derived from ts_virtual_materialize(), where we have the att_align_nominal inside both if and else blocks. I may be wrong here.

Attaching v4 patch. Consider it for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachment

pgsql-hackers by date:

Previous
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: POC: postgres_fdw insert batching
Next
From: "Hou, Zhijie"
Date:
Subject: RE: Parallel Inserts in CREATE TABLE AS