Re: New Table Access Methods for Multi and Single Inserts - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: New Table Access Methods for Multi and Single Inserts
Date
Msg-id CALj2ACUmL3+xLFtVbdcNpo_=ubdi=_nsp6MNq__xWwL=NGkdgA@mail.gmail.com
In response to Re: New Table Access Methods for Multi and Single Inserts  (Luc Vlaming <luc@swarm64.com>)
Responses Re: New Table Access Methods for Multi and Single Inserts  (Luc Vlaming <luc@swarm64.com>)
Re: New Table Access Methods for Multi and Single Inserts  (Jeff Davis <pgsql@j-davis.com>)
Re: New Table Access Methods for Multi and Single Inserts  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Wed, Jan 6, 2021 at 12:56 PM Luc Vlaming <luc@swarm64.com> wrote:
> The main reason for me for wanting a single API is that I would like the
> decision of using single or multi inserts to move to inside the tableam.
> For e.g. a heap insert we might want to put the threshold at e.g. 100
> rows so that the overhead of buffering the tuples is actually
> compensated. For other tableam this logic might also be quite different,
> and I think therefore that it shouldn't be e.g. COPY or CTAS deciding
> whether or not multi inserts should be used. Because otherwise the thing
> we'll get is that there will be tableams that will ignore this flag and
> do their own thing anyway. I'd rather have an API that gives all
> necessary information to the tableam and then make the tableam do "the
> right thing".
>
> Another reason I'm suggesting this API is that I would expect that the
> begin is called in a different place in the code for the (multiple)
> inserts than the actual insert statement.
> To me conceptually the begin and end are like e.g. the executor begin
> and end: you prepare the inserts with the knowledge you have at that
> point. I assumed (wrongly?) that during the start of the statement one
> knows best how many rows are coming; and then the actual insertion of
> the row doesn't have to deal anymore with multi/single inserts, choosing
> when to buffer or not, because that information has already been given
> during the initial phase. One of the reasons this is appealing to me is
> that e.g. in [1] there was discussion on when to switch to a multi
> insert state, and imo this should be up to the tableam.

Agreed that whether to go with multi or single inserts should be left
entirely to the tableam implementation. We, as callers of those APIs,
just need to say whether we expect a single row or multiple rows, and
it should be up to the tableam implementation whether to actually
buffer or insert one at a time. ISTM that's an elegant way of keeping
the API generic and abstracting everything from the callers. What I
wonder is how we can know in advance the expected row count that we
need to pass in to heap_insert_begin(). IIUC, we cannot estimate the
upcoming rows in COPY, INSERT INTO ... SELECT, REFRESH MATERIALIZED
VIEW, or some other insert queries. Of course, we could look at the
planner's estimated row count for the underlying query in COPY,
INSERT INTO ... SELECT, or REFRESH MATERIALIZED VIEW after planning,
but to me that's not something we can depend on to pass a row count
to the insert APIs.

When we don't know the expected row count, why can't we (as callers
of the APIs) tell the APIs something like: "I intend to perform multi
inserts, so if possible and if you have a mechanism to buffer the
slots, do it; otherwise insert the tuples one by one, or else do
whatever you want with the tuples I give you." So, in the case of
COPY, we can ask the API for multi inserts and call
heap_insert_begin() and heap_insert_v2().

Given the above explanation, I still feel a bool is_multi would suffice.

Thoughts?
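To make that concrete, here is a minimal standalone sketch of what I
have in mind (the struct and function names here are simplified
stand-ins for illustration, not the actual patch definitions; the
real TableInsertState carries much more, e.g. the relation, command
id, options, and mistate): the caller only states its intent via
is_multi, and whether buffering is actually used stays the tableam's
decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in for the patch's TableInsertState. */
typedef struct TableInsertStateSketch
{
    bool    is_multi;   /* caller's hint: multiple rows expected */
    int     cur_slots;  /* slots buffered so far (multi mode only) */
} TableInsertStateSketch;

/*
 * Hypothetical heap_insert_begin(): records the caller's intent; the
 * tableam may still choose single inserts even when is_multi is true.
 */
static TableInsertStateSketch *
heap_insert_begin_sketch(bool is_multi)
{
    TableInsertStateSketch *state = calloc(1, sizeof(*state));

    state->is_multi = is_multi;
    return state;
}
```

So COPY, for instance, would pass is_multi = true without having to
estimate an exact row count up front.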

On dynamically switching from single to multi inserts, this can be
done by heap_insert_v2 itself. The way I think it's possible is that,
say we have some threshold row count, e.g. 1000 (it can be a macro);
after inserting that many tuples, heap_insert_v2 can switch to
buffering mode.

Thoughts?
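A standalone sketch of that idea (the threshold value, struct, and
function names are illustrative only, not from the patch):
heap_insert_v2 would count tuples and flip itself into buffering mode
once the threshold is crossed, invisibly to the caller.

```c
#include <assert.h>
#include <stdbool.h>

#define MULTI_INSERT_SWITCH_THRESHOLD 1000  /* illustrative macro */

typedef struct InsertStateSketch
{
    bool    buffering;      /* switched to multi-insert mode yet? */
    long    tuples_seen;    /* tuples inserted so far */
} InsertStateSketch;

/* Models the decision heap_insert_v2 could make on every call. */
static void
heap_insert_v2_sketch(InsertStateSketch *state)
{
    state->tuples_seen++;

    /*
     * Past the threshold, the buffering overhead pays off, so the
     * tableam switches itself; the caller never sees the change.
     */
    if (!state->buffering &&
        state->tuples_seen >= MULTI_INSERT_SWITCH_THRESHOLD)
        state->buffering = true;
}
```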

> Which would make the code something like:
>
> void
> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> {
>         TupleTableSlot  *batchslot;
>         HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
>         Size sz;
>
>         Assert(mistate && mistate->slots);
>
>         if (mistate->slots[mistate->cur_slots] == NULL)
>                 mistate->slots[mistate->cur_slots] =
>                                                                         table_slot_create(state->rel, NULL);
>
>         batchslot = mistate->slots[mistate->cur_slots];
>
>         ExecClearTuple(batchslot);
>         ExecCopySlot(batchslot, slot);
>
>         /*
>          * Calculate the tuple size after the original slot is copied, because the
>          * copied slot type and the tuple size may change.
>          */
>         sz = GetTupleSize(batchslot, mistate->max_size);
>
>         Assert(sz > 0);
>
>         mistate->cur_slots++;
>         mistate->cur_size += sz;
>
>         if (mistate->cur_slots >= mistate->max_slots ||
>                 mistate->cur_size >= mistate->max_size)
>                 heap_multi_insert_flush(state);
> }

I think clearing the tuple before copying the slot, as you suggested,
may work without the need for a clear_slots flag.

>
> >  > Also, why do we want to do ExecClearTuple() anyway? Isn't
> >  > it good enough that the next call to ExecCopySlot will effectively clear
> >  > it out?
> >
> > For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the
> > slot before copying. But, for buffer heap slots, the
> > tts_buffer_heap_copyslot does not always clear the destination slot, see
> > below. If we fall into else condition, we might get some issues. And
> > also note that, once the slot is cleared in ExecClearTuple, it will not
> > be cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be
> > false. That is why, let's have ExecClearTuple as is.
> >
> I had no idea the buffer heap slot doesn't unconditionally clear out the
> slot :( So yes lets call it unconditionally ourselves. See also
> suggestion above.

Yeah, we will clear the tuple slot before copying, to be on the safe side.

> >      /*
> >       * If the source slot is of a different kind, or is a buffer slot
> > that has
> >       * been materialized / is virtual, make a new copy of the tuple.
> > Otherwise
> >       * make a new reference to the in-buffer tuple.
> >       */
> >      if (dstslot->tts_ops != srcslot->tts_ops ||
> >          TTS_SHOULDFREE(srcslot) ||
> >          !bsrcslot->base.tuple)
> >      {
> >          MemoryContext oldContext;
> >
> >          ExecClearTuple(dstslot);
> >      }
> >      else
> >      {
> >          Assert(BufferIsValid(bsrcslot->buffer));
> >
> >          tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
> >                                      bsrcslot->buffer, false);
> >
> >  > - flushed -> why is this a stored boolean? isn't this indirectly encoded
> >  > by cur_slots/cur_size == 0?
> >
> > Note that cur_slots is in HeapMultiInsertState and outside of the new
> > APIs i.e. in TableInsertState, mistate is a void pointer, and we can't
> > really access the cur_slots. I mean, we can access but we need to be
> > dereferencing using the tableam kind. Instead of  doing all of that, to
> > keep the API cleaner, I chose to have a boolean in the TableInsertState
> > which we can see and use outside of the new APIs. Hope that's fine.
> >
> So you mean the flushed variable is actually there to tell the user of
> the API that they are supposed to call flush before end? Why can't the
> end call flush itself then? I guess I completely misunderstood the
> purpose of table_multi_insert_flush being public. I had assumed it is
> there to from the usage site indicate that now would be a good time to
> flush, e.g. because of a statement ending or something. I had not
> understood this is a requirement that its always required to do
> table_multi_insert_flush + table_insert_end.
> IMHO I would hide this from the callee, given that you would only really
> call flush yourself when you immediately after would call end, or are
> there other cases where one would be required to explicitly call flush?

We need to know, outside the multi insert API, whether the buffered
slots have been flushed. The reason is that if we have indexes or
after-row triggers, we currently call ExecInsertIndexTuples or
ExecARInsertTriggers on the buffered slots in a loop outside the API
after the flush.

If we agree on removing the heap_multi_insert_v2 API and embedding
that logic inside heap_insert_v2, then we can do this: pass the
required information and the functions ExecInsertIndexTuples and
ExecARInsertTriggers as callbacks, so that whether heap_insert_v2
chooses single or multi inserts, it can call back these functions
with the required information after the flush. We can add the
callbacks and the required information to TableInsertState. But I'm
not quite sure we would want to turn ExecInsertIndexTuples and
ExecARInsertTriggers into callbacks this way.

If we don't want to go the callback way, then at the least we need to
know, via TableInsertState, whether heap_insert_v2 has chosen multi
inserts and, if so, the buffered slots array, the number of currently
buffered slots, and whether they have been flushed. Eventually, we
might need all of the HeapMultiInsertState info in TableInsertState.

Thoughts?
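To illustrate the callback alternative, here is a standalone sketch
(all names hypothetical; in the real case the callback would wrap
ExecInsertIndexTuples / ExecARInsertTriggers on proper
TupleTableSlots): the caller registers a post-flush callback in the
insert state, and the flush hands the buffered slots to it, so index
and trigger work happens inside the API regardless of whether the
tableam chose buffering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback invoked on the buffered slots after a flush. */
typedef void (*PostFlushCallback)(void **slots, int nslots, void *arg);

typedef struct CallbackInsertStateSketch
{
    void              **slots;          /* buffered slots (placeholder type) */
    int                 cur_slots;
    PostFlushCallback   post_flush;     /* caller-supplied */
    void               *post_flush_arg;
} CallbackInsertStateSketch;

/*
 * Models heap_multi_insert_flush(): write out the buffered tuples,
 * then hand them to the caller's callback before resetting the buffer.
 */
static void
multi_insert_flush_sketch(CallbackInsertStateSketch *state)
{
    /* ... heap_multi_insert() of state->slots would happen here ... */
    if (state->post_flush && state->cur_slots > 0)
        state->post_flush(state->slots, state->cur_slots,
                          state->post_flush_arg);
    state->cur_slots = 0;
}

/* Example callback: just counts how many slots it was handed. */
static void
count_flushed_cb(void **slots, int nslots, void *arg)
{
    (void) slots;
    *(int *) arg += nslots;
}
```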

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


