Re: Multi Inserts in CREATE TABLE AS - revived patch - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Multi Inserts in CREATE TABLE AS - revived patch
Msg-id CALj2ACV8_O651C2zUqrVSRFDJkp8=TMwSdG9+mDGL+vF6CD+AQ@mail.gmail.com
In response to Re: Multi Inserts in CREATE TABLE AS - revived patch  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Multi Inserts in CREATE TABLE AS - revived patch  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
Hi,

Currently, the logic required for multi inserts (buffer slot allocation, flushing, calculating tuple sizes to decide when to flush, cleanup, and so on) is handled outside of the existing tableam APIs. There are a good number of cases where multi inserts can be used: the existing COPY, as well as CTAS, CREATE/REFRESH MATERIALIZED VIEW [proposed in this thread] and INSERT INTO SELECT [here], which are currently under discussion. Handling the same multi-insert logic in each of these places is error prone and duplicates most of the code. To avoid this, I am proposing generic tableam APIs that can be used in all of these cases and that also give tableam developers the flexibility to implement multi-insert logic suited to the underlying storage engine[1].

I would like to seek thoughts/opinions on the proposed new APIs. Once they are reviewed, I will start implementing them.


Below are the proposed structures and APIs:

/* Holds the multi insert related information. */
typedef struct MultiInsertStateData
{
    /* A temporary memory context for multi insert. */
    MemoryContext         micontext;
    /* Bulk insert state. */
    BulkInsertStateData *bistate;
    /* Array of buffered slots. */
    TupleTableSlot      **mislots;
    /* Maximum number of slots that can be buffered. */
    int32              nslots;
    /* Number of slots that are currently buffered. */
    int32              nused;
    /*
     * Maximum total tuple size that can be buffered in
     * a single batch. Flush the buffered tuples if the
     * current total tuple size, nsize >= nbytes.
     */
    int64              nbytes;
    /*
     * Total tuple size in bytes of the slots that are
     * currently buffered.
     */
    int64              nsize;
    /*
     * Whether to clear the buffered slots' content
     * after the flush. If the relation has indexes
     * or AFTER ROW triggers, the buffered slots are
     * still required after do_multi_insert() returns,
     * so the caller clears them with ExecClearTuple()
     * outside of the do_multi_insert API. If true,
     * do_multi_insert() can clear the slots itself.
     */
    bool                clearslots;
    /*
     * If true, do_multi_insert will flush the buffered
     * slots, if any, bypassing the slot count and total
     * tuple size checks. This can be useful when one
     * partition cannot use multi inserts but others can
     * and have already buffered a few slots; those slots
     * need to be flushed for visibility before the
     * partition that doesn't support multi inserts can
     * proceed with single inserts.
     */
    bool                forceflush;
} MultiInsertStateData;
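To make the interplay of nslots, nbytes, nsize and forceflush concrete, here is a minimal sketch of the flush decision as I read the comments above. It is a standalone mock, not PostgreSQL code: MockMultiInsertState and mi_should_flush() are hypothetical names, the struct keeps only the counters, and treating nbytes == 0 as "no size limit given" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical mock holding only the counters of MultiInsertStateData;
 * the real struct also has the memory context, bistate and slot array.
 */
typedef struct MockMultiInsertState
{
    int32_t nslots;      /* maximum number of slots that can be buffered */
    int32_t nused;       /* number of slots currently buffered */
    int64_t nbytes;      /* size limit per batch; 0 = not given (assumed) */
    int64_t nsize;       /* total size of the buffered tuples */
    bool    forceflush;  /* flush regardless of the two limits */
} MockMultiInsertState;

/* Returns true when the buffered tuples should be flushed now. */
static bool
mi_should_flush(const MockMultiInsertState *st)
{
    if (st->nused == 0)
        return false;           /* nothing buffered, nothing to flush */
    if (st->forceflush)
        return true;            /* bypass slot count and size checks */
    if (st->nused >= st->nslots)
        return true;            /* every slot is in use */
    if (st->nbytes > 0 && st->nsize >= st->nbytes)
        return true;            /* batch size limit reached */
    return false;
}
```

With nbytes = 0 only the slot count (or forceflush) can trigger a flush, which matches the "tuples are buffered until all the nslots are filled" behaviour described for do_multi_insert.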

/*
 * Allocates and initializes the MultiInsertStateData. Creates a temporary
 * memory context for multi inserts and allocates the BulkInsertStateData.
 */
void (*begin_multi_insert) (Relation rel,
                            MultiInsertStateData **mistate,
                            uint32 nslots,
                            uint64 nbytes);
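A rough sketch of what begin_multi_insert might do, with calloc()/free() standing in for the temporary memory context. This does not use PostgreSQL's memory context API; the Relation argument is dropped, the BulkInsertState allocation is omitted, and mock_begin_multi_insert(), mock_free_multi_insert(), MockSlot and MockMultiInsertState are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for TupleTableSlot. */
typedef struct MockSlot
{
    int64_t len;                /* size of the stored tuple in bytes */
} MockSlot;

/* Hypothetical mock of MultiInsertStateData (no bistate, no context). */
typedef struct MockMultiInsertState
{
    MockSlot  **mislots;        /* array of buffered slots */
    int32_t     nslots;
    int32_t     nused;
    int64_t     nbytes;         /* 0 = no size limit given (assumed) */
    int64_t     nsize;
    bool        clearslots;     /* left to the caller to set */
    bool        forceflush;
} MockMultiInsertState;

/*
 * Allocates and zero-initializes *mistate; calloc() stands in for
 * allocating inside a temporary memory context. Returns false if
 * nslots is zero or memory could not be allocated.
 */
static bool
mock_begin_multi_insert(MockMultiInsertState **mistate,
                        uint32_t nslots, uint64_t nbytes)
{
    MockMultiInsertState *st;

    if (nslots == 0)
        return false;
    st = calloc(1, sizeof(*st));
    if (st == NULL)
        return false;
    st->mislots = calloc(nslots, sizeof(MockSlot *));
    if (st->mislots == NULL)
    {
        free(st);
        return false;
    }
    st->nslots = (int32_t) nslots;
    st->nbytes = (int64_t) nbytes;
    *mistate = st;
    return true;
}

/* Counterpart of deleting the memory context in end_multi_insert. */
static void
mock_free_multi_insert(MockMultiInsertState *st)
{
    if (st == NULL)
        return;
    free(st->mislots);
    free(st);
}
```

Allocating everything in one place is what lets end_multi_insert release it all at once; in the real proposal that is a single memory context deletion rather than individual free() calls.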

/*
 * Buffers the input slot into the mistate slots. Computes the size of the
 * tuple and adds it to the total size of the buffered tuples. If this total
 * reaches nbytes, flushes the buffered tuples into the table, clearing the
 * buffered slots' content if clearslots is true. If nbytes (the maximum total
 * size of the buffered tuples) is not given, tuple sizes are not calculated;
 * tuples are buffered until all nslots are filled and then flushed.
 *
 * For heapam, the existing heap_multi_insert() can be called via
 * rel->rd_tableam->multi_insert() for flushing.
 */
void (*do_multi_insert) (Relation rel,
                         struct MultiInsertStateData *mistate,
                         struct TupleTableSlot *slot,
                         CommandId cid,
                         int options);
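The buffering and flushing behaviour described for do_multi_insert could look roughly like the following standalone sketch. Assumptions: tuples are represented only by their sizes, a function-pointer callback (mock_flush_fn, hypothetical) stands in for rel->rd_tableam->multi_insert(), clearslots handling is reduced to a comment, and forceflush triggers a flush right after the incoming tuple is buffered. All Mock* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_MAX_SLOTS 64

/* Hypothetical callback standing in for rel->rd_tableam->multi_insert(). */
typedef void (*mock_flush_fn) (const int64_t *tuples, int32_t ntuples,
                               void *arg);

/* Hypothetical mock state; "tuples" are represented only by their sizes. */
typedef struct MockMultiInsertState
{
    int64_t       mislots[MOCK_MAX_SLOTS];
    int32_t       nslots;       /* must be between 1 and MOCK_MAX_SLOTS */
    int32_t       nused;
    int64_t       nbytes;       /* 0 = no size limit given (assumed) */
    int64_t       nsize;
    bool          forceflush;
    mock_flush_fn flush;
    void         *flush_arg;
} MockMultiInsertState;

/* Hands the buffered tuples to the callback and resets the counters. */
static void
mock_flush(MockMultiInsertState *st)
{
    if (st->nused == 0)
        return;
    st->flush(st->mislots, st->nused, st->flush_arg);
    /* With clearslots = true, the slots would be cleared here. */
    st->nused = 0;
    st->nsize = 0;
}

/* Buffers one tuple, then flushes if a limit is hit or forceflush is set. */
static void
mock_do_multi_insert(MockMultiInsertState *st, int64_t tuple_len)
{
    st->mislots[st->nused++] = tuple_len;

    /* Tuple sizes are only tracked when nbytes was given. */
    if (st->nbytes > 0)
        st->nsize += tuple_len;

    if (st->forceflush ||
        st->nused >= st->nslots ||
        (st->nbytes > 0 && st->nsize >= st->nbytes))
        mock_flush(st);
}

/* Example callback: counts flushes and rows handed to the "table". */
typedef struct MockTable
{
    int         nflushes;
    int64_t     nrows;
} MockTable;

static void
mock_count_rows(const int64_t *tuples, int32_t ntuples, void *arg)
{
    MockTable  *t = arg;

    (void) tuples;              /* a real AM would write these out */
    t->nflushes++;
    t->nrows += ntuples;
}
```

With nbytes = 0 and nslots = 4, ten calls produce two flushes of four tuples each and leave two tuples buffered for end_multi_insert to flush. Because the state is reset right after a flush, nused is always below nslots on entry, so the array write cannot overflow.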

/*
 * Flushes the buffered tuples, if any, clearing the buffered slots' content
 * if clearslots is true. Then deletes the temporary memory context and
 * deallocates mistate.
 */
void (*end_multi_insert) (Relation rel,
                          struct MultiInsertStateData *mistate,
                          CommandId cid,
                          int options);
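Putting the three callbacks together, a caller such as the CTAS code path would presumably call begin_multi_insert once, do_multi_insert once per row, and end_multi_insert once at the end. The sketch below mocks that lifecycle end to end under stated assumptions: every name is hypothetical, the "table" is just a row counter, tuples are represented by their sizes, and memory is managed with calloc()/free() instead of a memory context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical "table": only counts what the flushes write. */
typedef struct MockTable
{
    int64_t     nrows;
    int         nflushes;
} MockTable;

/* Hypothetical mock state; tuples are represented by their sizes. */
typedef struct MockMultiInsertState
{
    int64_t    *mislots;
    int32_t     nslots;
    int32_t     nused;
    int64_t     nbytes;         /* 0 = no size limit given (assumed) */
    int64_t     nsize;
} MockMultiInsertState;

static void
mock_flush(MockTable *rel, MockMultiInsertState *st)
{
    if (st->nused == 0)
        return;
    rel->nrows += st->nused;    /* a real AM would write the tuples */
    rel->nflushes++;
    st->nused = 0;
    st->nsize = 0;
}

static bool
mock_begin(MockTable *rel, MockMultiInsertState **mistate,
           uint32_t nslots, uint64_t nbytes)
{
    MockMultiInsertState *st;

    (void) rel;
    if (nslots == 0)
        return false;
    st = calloc(1, sizeof(*st));
    if (st == NULL)
        return false;
    st->mislots = calloc(nslots, sizeof(int64_t));
    if (st->mislots == NULL)
    {
        free(st);
        return false;
    }
    st->nslots = (int32_t) nslots;
    st->nbytes = (int64_t) nbytes;
    *mistate = st;
    return true;
}

static void
mock_do(MockTable *rel, MockMultiInsertState *st, int64_t tuple_len)
{
    st->mislots[st->nused++] = tuple_len;
    if (st->nbytes > 0)
        st->nsize += tuple_len;
    if (st->nused >= st->nslots ||
        (st->nbytes > 0 && st->nsize >= st->nbytes))
        mock_flush(rel, st);
}

/* Flushes whatever is still buffered, then releases the state. */
static void
mock_end(MockTable *rel, MockMultiInsertState *st)
{
    mock_flush(rel, st);
    free(st->mislots);
    free(st);
}

/* Caller side: one begin, one do per row, one end. */
static bool
mock_insert_rows(MockTable *rel, int nrows, int64_t tuple_len,
                 uint32_t nslots, uint64_t nbytes)
{
    MockMultiInsertState *st;

    if (!mock_begin(rel, &st, nslots, nbytes))
        return false;
    for (int i = 0; i < nrows; i++)
        mock_do(rel, st, tuple_len);
    mock_end(rel, st);
    return true;
}
```

For example, inserting 10 rows with nslots = 4 and no byte limit flushes after the 4th and 8th rows, and the end call flushes the remaining 2, for 3 flushes in total. The point of the final flush in end_multi_insert is that a caller never has to track leftover tuples itself.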

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
