Re: New Table Access Methods for Multi and Single Inserts - Mailing list pgsql-hackers

From Andres Freund
Subject Re: New Table Access Methods for Multi and Single Inserts
Date
Msg-id 20230603223824.o7iyochli2dwwi7k@alap3.anarazel.de
Whole thread Raw
In response to Re: New Table Access Methods for Multi and Single Inserts  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: New Table Access Methods for Multi and Single Inserts  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Re: New Table Access Methods for Multi and Single Inserts  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Hi,

This patch was referenced in a discussion at pgcon, so I thought I'd give it a
look, even though Bharat said that he won't have time to drive it forward...


On 2021-04-19 10:21:36 +0530, Bharath Rupireddy wrote:
> diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
> index bd5faf0c1f..655de8e6b7 100644
> --- a/src/backend/access/heap/heapam_handler.c
> +++ b/src/backend/access/heap/heapam_handler.c
> @@ -2558,6 +2558,11 @@ static const TableAmRoutine heapam_methods = {
>      .tuple_insert_speculative = heapam_tuple_insert_speculative,
>      .tuple_complete_speculative = heapam_tuple_complete_speculative,
>      .multi_insert = heap_multi_insert,
> +    .tuple_insert_begin = heap_insert_begin,
> +    .tuple_insert_v2 = heap_insert_v2,
> +    .multi_insert_v2 = heap_multi_insert_v2,
> +    .multi_insert_flush = heap_multi_insert_flush,
> +    .tuple_insert_end = heap_insert_end,
>      .tuple_delete = heapam_tuple_delete,
>      .tuple_update = heapam_tuple_update,
>      .tuple_lock = heapam_tuple_lock,

I don't think we should have multiple callback for the insertion APIs in
tableam.h. I think it'd be good to continue supporting the old table_*()
functions, but supporting multiple insert APIs in each AM doesn't make much
sense to me.


> +/*
> + * GetTupleSize - Compute the tuple size given a table slot.
> + *
> + * For heap tuple, buffer tuple and minimal tuple slot types return the actual
> + * tuple size that exists. For virtual tuple, the size is calculated as the
> + * slot does not have the tuple size. If the computed size exceeds the given
> + * maxsize for the virtual tuple, this function exits, not investing time in
> + * further unnecessary calculation.
> + *
> + * Important Notes:
> + * 1) Size calculation code for virtual slots is being used from
> + *       tts_virtual_materialize(), hence ensure to have the same changes or fixes
> + *       here and also there.
> + * 2) Currently, GetTupleSize() handles the existing heap, buffer, minimal and
> + *       virtual slots. Ensure to add related code in case any new slot type is
> + *    introduced.
> + */
> +inline Size
> +GetTupleSize(TupleTableSlot *slot, Size maxsize)
> +{
> +    Size sz = 0;
> +    HeapTuple tuple = NULL;
> +
> +    if (TTS_IS_HEAPTUPLE(slot))
> +        tuple = ((HeapTupleTableSlot *) slot)->tuple;
> +    else if(TTS_IS_BUFFERTUPLE(slot))
> +        tuple = ((BufferHeapTupleTableSlot *) slot)->base.tuple;
> +    else if(TTS_IS_MINIMALTUPLE(slot))
> +        tuple = ((MinimalTupleTableSlot *) slot)->tuple;
> +    else if(TTS_IS_VIRTUAL(slot))

I think this embeds too much knowledge of the set of slot types in core
code. I don't see why it's needed either?


> diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
> index 414b6b4d57..2a1470a7b6 100644
> --- a/src/include/access/tableam.h
> +++ b/src/include/access/tableam.h
> @@ -229,6 +229,32 @@ typedef struct TM_IndexDeleteOp
>      TM_IndexStatus *status;
>  } TM_IndexDeleteOp;
>
> +/* Holds table insert state. */
> +typedef struct TableInsertState

I suspect we should design it to be usable for updates and deletes in the
future, and thus name it TableModifyState.



> +{
> +    Relation    rel;
> +    /* Bulk insert state if requested, otherwise NULL. */
> +    struct BulkInsertStateData    *bistate;
> +    CommandId    cid;

Hm - I'm not sure it's a good idea to force the cid to be the same for all
inserts done via one TableInsertState.



> +    int    options;
> +    /* Below members are only used for multi inserts. */
> +    /* Array of buffered slots. */
> +    TupleTableSlot    **mi_slots;
> +    /* Number of slots that are currently buffered. */
> +    int32    mi_cur_slots;

> +    /*
> +     * Access method specific information such as parameters that are needed
> +     * for buffering and flushing decisions can go here.
> +     */
> +    void    *mistate;

I think we should instead have a generic TableModifyState, which each AM then
embeds into an AM specific AM state. Forcing two very related structs to be
allocated separately doesn't seem wise in this case.



> @@ -1430,6 +1473,50 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
>                                    cid, options, bistate);
>  }
>
> +static inline TableInsertState*
> +table_insert_begin(Relation rel, CommandId cid, int options,
> +                   bool alloc_bistate, bool is_multi)

Why have alloc_bistate and options?


> +static inline void
> +table_insert_end(TableInsertState *state)
> +{
> +    /* Deallocate bulk insert state here, since it's AM independent. */
> +    if (state->bistate)
> +        FreeBulkInsertState(state->bistate);
> +
> +    state->rel->rd_tableam->tuple_insert_end(state);
> +}

Seems like the order in here should be swapped?


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [PATCH] Slight improvement of worker_spi.c example
Next
From: Michael Paquier
Date:
Subject: Re: Avoid unused value (src/fe_utils/print.c)