Re: New Table Access Methods for Multi and Single Inserts - Mailing list pgsql-hackers

From Luc Vlaming
Subject Re: New Table Access Methods for Multi and Single Inserts
Date
Msg-id 508af801-6356-d36b-1867-011ac6df8f55@swarm64.com
Whole thread Raw
In response to Re: New Table Access Methods for Multi and Single Inserts  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: New Table Access Methods for Multi and Single Inserts  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On 05-01-2021 11:06, Bharath Rupireddy wrote:
> On Mon, Jan 4, 2021 at 1:29 PM Luc Vlaming <luc@swarm64.com 
> <mailto:luc@swarm64.com>> wrote:
>  >  > table AM patch [2] be reviewed further?
>  > As to the patches themselves:
>  >
>  > I think the API is a huge step forward! I assume that we want to have a
>  > single-insert API like heap_insert_v2 so that we can encode the
>  > knowledge that there will just be a single insert coming and likely a
>  > commit afterwards?
>  >
>  > Reason I'm asking is that I quite liked the heap_insert_begin parameter
>  > is_multi, which could even be turned into a "expected_rowcount" of the
>  > amount of rows expected to be commited in the transaction (e.g. single,
>  > several, thousands/stream).
>  > If we were to make the API based on expected rowcounts, the whole
>  > heap_insert_v2, heap_insert and heap_multi_insert could be turned into a
>  > single function heap_insert, as the knowledge about buffering of the
>  > slots is then already stored in the TableInsertState, creating an API 
> like:
>  >
>  > // expectedRows: -1 = streaming, otherwise expected rowcount.
>  > TableInsertState* heap_insert_begin(Relation rel, CommandId cid, int
>  > options, int expectedRows);
>  > heap_insert(TableInsertState *state, TupleTableSlot *slot);
>  >
>  > Do you think that's a good idea?
> 
> IIUC, your suggestion is to use expectedRows and move the multi insert 
> implementation heap_multi_insert_v2 to heap_insert_v2. If that's 
> correct, so heap_insert_v2 will look something like this:
> 
> heap_insert_v2()
> {
>      if (single_insert)
>        //do single insertion work, the code in existing heap_insert_v2 
> comes here
>     else
>        //do multi insertion work, the code in existing 
> heap_multi_insert_v2 comes here
> }
> 
> I don't see any problem in combining single and multi insert APIs into 
> one. Having said that, will the APIs be cleaner then? Isn't it going to 
> be confusing if a single heap_insert_v2 API does both the works? With 
> the existing separate APIs, for single insertion, the sequence of the 
> API can be like begin, insert_v2, end and for multi inserts it's like 
> begin, multi_insert_v2, flush, end. I prefer to have a separate multi 
> insert API so that it will make the code look readable.
> 
> Thoughts?

The main reason for me for wanting a single API is that I would like the 
decision of using single or multi inserts to move to inside the tableam.
For e.g. a heap insert we might want to put the threshold at e.g. 100 
rows so that the overhead of buffering the tuples is actually 
compensated. For other tableam this logic might also be quite different, 
and I think therefore that it shouldn't be e.g. COPY or CTAS deciding 
whether or not multi inserts should be used. Because otherwise the thing 
we'll get is that there will be tableams that will ignore this flag and 
do their own thing anyway. I'd rather have an API that gives all 
necessary information to the tableam and then make the tableam do "the 
right thing".

Another reason I'm suggesting this API is that I would expect that the 
begin is called in a different place in the code for the (multiple) 
inserts than the actual insert statement.
To me conceptually the begin and end are like e.g. the executor begin 
and end: you prepare the inserts with the knowledge you have at that 
point. I assumed (wrongly?) that during the start of the statement one 
knows best how many rows are coming; and then the actual insertion of 
the row doesn't have to deal anymore with multi/single inserts, choosing 
when to buffer or not, because that information has already been given 
during the initial phase. One of the reasons this is appealing to me is 
that e.g. in [1] there was discussion on when to switch to a multi 
insert state, and imo this should be up to the tableam.

> 
>  > Two smaller things I'm wondering:
>  > - the clear_mi_slots; why is this not in the HeapMultiInsertState? the
>  > slots themselves are declared there?
> 
> Firstly, we need to have the buffered slots sometimes(please have a look 
> at the comments in TableInsertState structure) outside the multi_insert 
> API. And we need to have cleared the previously flushed slots before we 
> start buffering in heap_multi_insert_v2(). I can remove the 
> clear_mi_slots flag altogether and do as follows: I will not set 
> mistate->cur_slots to 0 in heap_multi_insert_flush after the flush, I 
> will only set state->flushed to true. In heap_multi_insert_v2,
> 
> void
> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> {
>      TupleTableSlot  *batchslot;
>      HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
>      Size sz;
> 
>      Assert(mistate && mistate->slots);
> 
> *  /* if the slots are flushed previously then clear them off before 
> using them again. */
>      if (state->flushed)
>      {
>          int i;
> 
>          for (i = 0; i < mistate->cur_slots; i++)
>              ExecClearTuple(mistate->slots[i]);
> 
>          mistate->cur_slots = 0;
>          state->flushed = false
>      }*
> 
>      if (mistate->slots[mistate->cur_slots] == NULL)
>          mistate->slots[mistate->cur_slots] =
>                                      table_slot_create(state->rel, NULL);
> 
>      batchslot = mistate->slots[mistate->cur_slots];
> 
>      ExecCopySlot(batchslot, slot);
> 
> Thoughts?

 From what I can see you can just keep the v2-0001 patch and:
- remove the flushed variable alltogether. mistate->cur_slots == 0 
encodes this already and the variable is never actually checked on.
- call ExecClearTuple just before ExecCopySlot()

Which would make the code something like:

void
heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
{
    TupleTableSlot  *batchslot;
    HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
    Size sz;

    Assert(mistate && mistate->slots);

    if (mistate->slots[mistate->cur_slots] == NULL)
        mistate->slots[mistate->cur_slots] =
                                    table_slot_create(state->rel, NULL);

    batchslot = mistate->slots[mistate->cur_slots];

    ExecClearTuple(batchslot);
    ExecCopySlot(batchslot, slot);

    /*
     * Calculate the tuple size after the original slot is copied, because the
     * copied slot type and the tuple size may change.
     */
    sz = GetTupleSize(batchslot, mistate->max_size);

    Assert(sz > 0);

    mistate->cur_slots++;
    mistate->cur_size += sz;

    if (mistate->cur_slots >= mistate->max_slots ||
        mistate->cur_size >= mistate->max_size)
        heap_multi_insert_flush(state);
}

void
heap_multi_insert_flush(TableInsertState *state)
{
    HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
    MemoryContext oldcontext;

    Assert(mistate && mistate->slots && mistate->cur_slots >= 0 &&
           mistate->context);

    if (mistate->cur_slots == 0)
        return;

    oldcontext = MemoryContextSwitchTo(mistate->context);

    heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
                      state->cid, state->options, state->bistate);

    MemoryContextReset(mistate->context);
    MemoryContextSwitchTo(oldcontext);

    /*
     * Do not clear the slots always. Sometimes callers may want the slots for
     * index insertions or after row trigger executions in which case they have
     * to clear the tuples before using for the next insert batch.
     */
    if (state->clear_mi_slots)
    {
        int i;

        for (i = 0; i < mistate->cur_slots; i++)
            ExecClearTuple(mistate->slots[i]);
    }

    mistate->cur_slots = 0;
    mistate->cur_size = 0;
}


> 
>  > Also, why do we want to do ExecClearTuple() anyway? Isn't
>  > it good enough that the next call to ExecCopySlot will effectively clear
>  > it out?
> 
> For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the 
> slot before copying. But, for buffer heap slots, the 
> tts_buffer_heap_copyslot does not always clear the destination slot, see 
> below. If we fall into else condition, we might get some issues. And
> also note that, once the slot is cleared in ExecClearTuple, it will not 
> be cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be 
> false. That is why, let's have ExecClearTuple as is.
> 
I had no idea the buffer heap slot doesn't unconditionally clear out the 
slot :( So yes lets call it unconditionally ourselves. See also 
suggestion above.

>      /*
>       * If the source slot is of a different kind, or is a buffer slot 
> that has
>       * been materialized / is virtual, make a new copy of the tuple. 
> Otherwise
>       * make a new reference to the in-buffer tuple.
>       */
>      if (dstslot->tts_ops != srcslot->tts_ops ||
>          TTS_SHOULDFREE(srcslot) ||
>          !bsrcslot->base.tuple)
>      {
>          MemoryContext oldContext;
> 
>          ExecClearTuple(dstslot);
>      }
>      else
>      {
>          Assert(BufferIsValid(bsrcslot->buffer));
> 
>          tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
>                                      bsrcslot->buffer, false);
> 
>  > - flushed -> why is this a stored boolean? isn't this indirectly encoded
>  > by cur_slots/cur_size == 0?
> 
> Note that cur_slots is in HeapMultiInsertState and outside of the new 
> APIs i.e. in TableInsertState, mistate is a void pointer, and we can't 
> really access the cur_slots. I mean, we can access but we need to be 
> dereferencing using the tableam kind. Instead of  doing all of that, to 
> keep the API cleaner, I chose to have a boolean in the TableInsertState 
> which we can see and use outside of the new APIs. Hope that's fine.
> 
So you mean the flushed variable is actually there to tell the user of 
the API that they are supposed to call flush before end? Why can't the 
end call flush itself then? I guess I completely misunderstood the 
purpose of table_multi_insert_flush being public. I had assumed it is 
there to from the usage site indicate that now would be a good time to 
flush, e.g. because of a statement ending or something. I had not 
understood this is a requirement that its always required to do 
table_multi_insert_flush + table_insert_end.
IMHO I would hide this from the callee, given that you would only really 
call flush yourself when you immediately after would call end, or are 
there other cases where one would be required to explicitly call flush?

> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com>

Kind regards,
Luc



pgsql-hackers by date:

Previous
From: Luc Vlaming
Date:
Subject: Re: New Table Access Methods for Multi and Single Inserts
Next
From: Michael Banck
Date:
Subject: Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)