From Luc Vlaming
Subject Re: Multi Inserts in CREATE TABLE AS - revived patch
Date
Msg-id ca3dd08f-4ce0-01df-ba30-e9981bb0d54e@swarm64.com
In response to Re: Multi Inserts in CREATE TABLE AS - revived patch  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On 26-11-2020 07:31, Bharath Rupireddy wrote:
> On Thu, Nov 26, 2020 at 9:55 AM Michael Paquier <michael@paquier.xyz> wrote:
>>
>> +inline Size
>> +GetTupleSize(TupleTableSlot *slot, Size maxsize)
>> +{
>> +   Size sz = 0;
>> +   HeapTuple tuple = NULL;
>> +
>> +   if (TTS_IS_HEAPTUPLE(slot))
>> +       tuple = ((HeapTupleTableSlot *) slot)->tuple;
>> +   else if (TTS_IS_BUFFERTUPLE(slot))
>> +       tuple = ((BufferHeapTupleTableSlot *) slot)->base.tuple;
>> +   else if (TTS_IS_MINIMALTUPLE(slot))
>> +       tuple = ((MinimalTupleTableSlot *) slot)->tuple;
>>
>> There have been various discussions about the methods we could use to
>> evaluate the threshold in bytes at which a flush should happen,
>> including the use of memory contexts, or even estimating the size from
>> the number of tuples.  This one looks promising because it seems
>> exact; however, for virtual slots I don't much like the fact that you
>> basically just extracted parts of tts_virtual_materialize() and stuck
>> them in this routine.  That's a recipe for future bugs if the
>> materialization logic changes.  In short, I am surprised that this
>> calculation is not directly part of TupleTableSlotOps.  What we'd want
>> is to get this information depending on the slot type being dealt
>> with; as it stands, the patch would fail to handle any newly
>> introduced slot type.
>>
> 
> Yes, for virtual slots I reused the code from
> tts_virtual_materialize() in GetTupleSize(). I can think of the
> following options:
> 
> 1) Make the size calculation code for virtual slots a macro or a
> static inline function and use it in both tts_virtual_materialize()
> and GetTupleSize().
> 2) Add comments in both places, such as "if any code is changed here,
> consider changing it in tts_virtual_materialize() / GetTupleSize()".
> 3) Add a size variable to the TupleTableSlotOps structure.
> 4) Add a new API to the TupleTableSlotOps structure, say
> get_slot_size().
> 5) For new slot types, maybe we can have comments in tuptable.h to
> consider making an equivalent change in GetTupleSize().
> 
> If we go with 3 or 4, would it be acceptable to add extra code to a
> generic structure that is used throughout the code base when the new
> code is needed only in a few places (multi inserts in CTAS and REFRESH
> MATERIALIZED VIEW)? I think we can go ahead with 2 and 5. Thoughts?
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 
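
As an aside on option 4: that could be as small as one extra callback.
A hypothetical sketch, not a worked-out patch (get_tuple_size is a
made-up name):

/* Hypothetical addition to TupleTableSlotOps: each slot type reports
 * how much memory its current tuple would occupy when materialized,
 * so callers no longer need to know about every slot implementation. */
struct TupleTableSlotOps
{
    /* ... existing callbacks (init, clear, materialize, ...) ... */

    Size        (*get_tuple_size) (TupleTableSlot *slot);
};

/* GetTupleSize() would then reduce to a simple dispatch: */
static inline Size
GetTupleSize(TupleTableSlot *slot)
{
    return slot->tts_ops->get_tuple_size(slot);
}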

What I'm wondering about is the reason for wanting a cap on data
volume. When I ran some local (highly concurrent) ingest speed tests a
few weeks ago, what seemed to matter most was how many pages were being
written and the resulting pressure on locks, etc., and not so much the
actual memory usage. I haven't collected proof of that yet, though.
There was, however, a clearly observable contention point at which
bigger buffers made performance not merely stagnate but actually drop.

So what I'm wondering is: should we worry more about the number of
pages that are going to be written, rather than about the memory usage?
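
To make that concrete, the flush condition could be expressed in pages
rather than in bytes. A minimal sketch, where every name below is made
up and nothing is tuned:

#include "postgres.h"

/* Hypothetical cap: flush once the buffered tuples would fill this
 * many heap pages, instead of comparing against a raw byte limit. */
#define MAX_BUFFERED_PAGES  64

static inline bool
MultiInsertNeedsFlush(Size bufferedBytes)
{
    /* BLCKSZ is the build-time heap page size */
    return bufferedBytes >= (Size) MAX_BUFFERED_PAGES * BLCKSZ;
}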

If that is indeed the case, then maybe we can consider improving the
current design, potentially in a follow-up patch. The problem I see is
that, generically, each tableam will have different choices to make on
how to buffer and flush multiple rows, given that a storage engine
might have more or less write amplification, a different way of
extending a relation, different FSM use, etc.

Assuming we indeed want a per-tableam implementation, we could either:
- make multi_insert buffer the tuples itself and add a
flush_multi_insert, or
- add a new function, say create_multi_insert, which returns something
like a MultiInsertState that, like a DestReceiver, has a set of
callbacks to start, shutdown and insert (sketched below).
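
A rough sketch of that second option (every name below is purely
illustrative, nothing is worked out):

#include "postgres.h"
#include "executor/tuptable.h"
#include "utils/rel.h"

/* Opaque per-AM state; the table AM owns the buffering strategy
 * behind it, much like a DestReceiver hides its internals. */
typedef struct MultiInsertState MultiInsertState;

typedef struct MultiInsertStateOps
{
    /* prepare for a stream of inserts */
    void        (*start) (MultiInsertState *state);
    /* buffer one tuple; the AM decides internally when to flush */
    void        (*insert) (MultiInsertState *state, TupleTableSlot *slot);
    /* flush whatever is still buffered and release resources */
    void        (*shutdown) (MultiInsertState *state);
} MultiInsertStateOps;

/* hypothetical entry point provided by the table AM */
extern MultiInsertState *create_multi_insert(Relation rel, int options);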

With both solutions, one part that seems appealing to me is that we
would buffer the data in something that quite closely resembles the
on-disk format. Thoughts?

Regards,
Luc
Swarm64
