Re: New Table Access Methods for Multi and Single Inserts - Mailing list pgsql-hackers

From Luc Vlaming
Subject Re: New Table Access Methods for Multi and Single Inserts
Date
Msg-id c2ba9c52-8c5a-13f0-7d50-b569f7c93840@swarm64.com
Whole thread Raw
In response to Re: New Table Access Methods for Multi and Single Inserts  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On 06-01-2021 14:06, Bharath Rupireddy wrote:
> On Wed, Jan 6, 2021 at 12:56 PM Luc Vlaming <luc@swarm64.com> wrote:
>> The main reason for me for wanting a single API is that I would like the
>> decision of using single or multi inserts to move to inside the tableam.
>> For e.g. a heap insert we might want to put the threshold at e.g. 100
>> rows so that the overhead of buffering the tuples is actually
>> compensated. For other tableam this logic might also be quite different,
>> and I think therefore that it shouldn't be e.g. COPY or CTAS deciding
>> whether or not multi inserts should be used. Because otherwise the thing
>> we'll get is that there will be tableams that will ignore this flag and
>> do their own thing anyway. I'd rather have an API that gives all
>> necessary information to the tableam and then make the tableam do "the
>> right thing".
>>
>> Another reason I'm suggesting this API is that I would expect that the
>> begin is called in a different place in the code for the (multiple)
>> inserts than the actual insert statement.
>> To me conceptually the begin and end are like e.g. the executor begin
>> and end: you prepare the inserts with the knowledge you have at that
>> point. I assumed (wrongly?) that during the start of the statement one
>> knows best how many rows are coming; and then the actual insertion of
>> the row doesn't have to deal anymore with multi/single inserts, choosing
>> when to buffer or not, because that information has already been given
>> during the initial phase. One of the reasons this is appealing to me is
>> that e.g. in [1] there was discussion on when to switch to a multi
>> insert state, and imo this should be up to the tableam.
> 
> Agree that whether to go with the multi or single inserts should be
> completely left to tableam implementation, we, as callers of those API
> just need to inform whether we expect single or multiple rows, and it
> should be left to tableam implementation whether to actually go with
> buffering or single inserts. ISTM that it's an elegant way of making
> the API generic and abstracting everything from the callers. What I
> wonder is how can we know in advance the expected row count that we
> need to pass in to heap_insert_begin()? IIUC, we can not estimate the
> upcoming rows in COPY, Insert Into Select, or Refresh Mat View or some
> other insert queries?  Of course, we can look at the planner's
> estimated row count for the selects in COPY, Insert Into Select or
> Refresh Mat View after the planning, but to me that's not something we
> can depend on and pass in the row count to the insert APIs.
> 
> When we don't know the expected row count, why can't we(as callers of
> the APIs) tell the APIs something like, "I'm intending to perform
> multi inserts, so if possible and if you have a mechanism to buffer
> the slots, do it, otherwise insert the tuples one by one, or else do
> whatever you want to do with the tuples I give it you". So, in case of
> COPY we can ask the API for multi inserts and call heap_insert_begin()
> and heap_insert_v2().
> 

I thought that when it is available (because of planning) it would be 
nice to pass it in. If you don't know you could pass in a 1 for doing 
single inserts, and e.g. -1 or max-int for streaming. The reason I 
proposed it is so that tableam's have as much knowledge as posisble to 
do the right thing. is_multi does also work of course but is just 
somewhat less informative.

What to me seemed somewhat counterintuitive is that with the proposed 
API it is possible to say is_multi=true and then still call 
heap_insert_v2 to do a single insert.

> Given the above explanation, I still feel bool is_multi would suffice.
> 
> Thoughts?
> 
> On dynamically, switching from single to multi inserts, this can be
> done by heap_insert_v2 itself. The way I think it's possible is that,
> say we have some threshold row count 1000(can be a macro)  after
> inserting those many tuples, heap_insert_v2 can switch to buffering
> mode.

For that I thought it'd be good to use the expected row count, but yeah 
dynamically switching also works and might work better if the expected 
row counts are usually off.

> 
> Thoughts?
> 
>> Which would make the code something like:
>>
>> void
>> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
>> {
>>          TupleTableSlot  *batchslot;
>>          HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
>>          Size sz;
>>
>>          Assert(mistate && mistate->slots);
>>
>>          if (mistate->slots[mistate->cur_slots] == NULL)
>>                  mistate->slots[mistate->cur_slots] =
>>                                                                          table_slot_create(state->rel, NULL);
>>
>>          batchslot = mistate->slots[mistate->cur_slots];
>>
>>          ExecClearTuple(batchslot);
>>          ExecCopySlot(batchslot, slot);
>>
>>          /*
>>           * Calculate the tuple size after the original slot is copied, because the
>>           * copied slot type and the tuple size may change.
>>           */
>>          sz = GetTupleSize(batchslot, mistate->max_size);
>>
>>          Assert(sz > 0);
>>
>>          mistate->cur_slots++;
>>          mistate->cur_size += sz;
>>
>>          if (mistate->cur_slots >= mistate->max_slots ||
>>                  mistate->cur_size >= mistate->max_size)
>>                  heap_multi_insert_flush(state);
>> }
> 
> I think clearing tuples before copying the slot as you suggested may
> work without the need of clear_slots flag.

ok, cool :)

> 
>>
>>>   > Also, why do we want to do ExecClearTuple() anyway? Isn't
>>>   > it good enough that the next call to ExecCopySlot will effectively clear
>>>   > it out?
>>>
>>> For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the
>>> slot before copying. But, for buffer heap slots, the
>>> tts_buffer_heap_copyslot does not always clear the destination slot, see
>>> below. If we fall into else condition, we might get some issues. And
>>> also note that, once the slot is cleared in ExecClearTuple, it will not
>>> be cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be
>>> false. That is why, let's have ExecClearTuple as is.
>>>
>> I had no idea the buffer heap slot doesn't unconditionally clear out the
>> slot :( So yes lets call it unconditionally ourselves. See also
>> suggestion above.
> 
> Yeah, we will clear the tuple slot before copy to be on the safer side.
> 

ok

>>>       /*
>>>        * If the source slot is of a different kind, or is a buffer slot
>>> that has
>>>        * been materialized / is virtual, make a new copy of the tuple.
>>> Otherwise
>>>        * make a new reference to the in-buffer tuple.
>>>        */
>>>       if (dstslot->tts_ops != srcslot->tts_ops ||
>>>           TTS_SHOULDFREE(srcslot) ||
>>>           !bsrcslot->base.tuple)
>>>       {
>>>           MemoryContext oldContext;
>>>
>>>           ExecClearTuple(dstslot);
>>>       }
>>>       else
>>>       {
>>>           Assert(BufferIsValid(bsrcslot->buffer));
>>>
>>>           tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
>>>                                       bsrcslot->buffer, false);
>>>
>>>   > - flushed -> why is this a stored boolean? isn't this indirectly encoded
>>>   > by cur_slots/cur_size == 0?
>>>
>>> Note that cur_slots is in HeapMultiInsertState and outside of the new
>>> APIs i.e. in TableInsertState, mistate is a void pointer, and we can't
>>> really access the cur_slots. I mean, we can access but we need to be
>>> dereferencing using the tableam kind. Instead of  doing all of that, to
>>> keep the API cleaner, I chose to have a boolean in the TableInsertState
>>> which we can see and use outside of the new APIs. Hope that's fine.
>>>
>> So you mean the flushed variable is actually there to tell the user of
>> the API that they are supposed to call flush before end? Why can't the
>> end call flush itself then? I guess I completely misunderstood the
>> purpose of table_multi_insert_flush being public. I had assumed it is
>> there to from the usage site indicate that now would be a good time to
>> flush, e.g. because of a statement ending or something. I had not
>> understood this is a requirement that its always required to do
>> table_multi_insert_flush + table_insert_end.
>> IMHO I would hide this from the callee, given that you would only really
>> call flush yourself when you immediately after would call end, or are
>> there other cases where one would be required to explicitly call flush?
> 
> We need to know outside the multi_insert API whether the buffered
> slots in case of multi inserts are flushed. Reason is that if we have
> indexes or after row triggers, currently we call ExecInsertIndexTuples
> or ExecARInsertTriggers on the buffered slots outside the API in a
> loop after the flush.
> 
> If we agree on removing heap_multi_insert_v2 API and embed that logic
> inside heap_insert_v2, then we can do this - pass the required
> information and the functions ExecInsertIndexTuples and
> ExecARInsertTriggers as callbacks so that, whether or not
> heap_insert_v2 choses single or multi inserts, it can callback these
> functions with the required information passed after the flush. We can
> add the callback and required information into TableInsertState. But,
> I'm not quite sure, we would make ExecInsertIndexTuples and
> ExecARInsertTriggers. And in
> 
> If we don't want to go with callback way, then at least we need to
> know whether or not heap_insert_v2 has chosen multi inserts, if yes,
> the buffered slots array, and the number of current buffered slots,
> whether they are flushed or not in the TableInsertState. Then,
> eventually, we might need all the HeapMultiInsertState info in the
> TableInsertState.
> 

To me the callback API seems cleaner, that on heap_insert_begin we can 
pass in a callback that is called on every flushed slot, or only on 
multi-insert flushes. Is there a reason it would only be done for 
multi-insert flushes or can it be generic?

> Thoughts?
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

Hi,

Replied inline.

Kind regards,
Luc



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: O(n^2) system calls in RemoveOldXlogFiles()
Next
From: Amit Kapila
Date:
Subject: Re: adding partitioned tables to publications