Re: New Table Access Methods for Multi and Single Inserts - Mailing list pgsql-hackers

From Luc Vlaming
Subject Re: New Table Access Methods for Multi and Single Inserts
Date
Msg-id 96eaa813-4ad6-e80a-04a4-cc8082d356dd@swarm64.com
Whole thread Raw
In response to Re: New Table Access Methods for Multi and Single Inserts  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: New Table Access Methods for Multi and Single Inserts  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Re: New Table Access Methods for Multi and Single Inserts  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On 28-12-2020 13:48, Bharath Rupireddy wrote:
> On Fri, Dec 25, 2020 at 8:10 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>> On Thu, Dec 24, 2020 at 05:48:42AM +0530, Bharath Rupireddy wrote:
>>> I'm not posting the updated 0002 to 0004 patches, I plan to do so
>>> after a couple of reviews happen on the design of the APIs in 0001.
>>>
>>> Thoughts?
>>
>> Are you familiar with this work ?
>>
>> https://commitfest.postgresql.org/31/2717/
>> Reloptions for table access methods
>>
>> It seems like that can be relevant for your patch, and I think some of what
>> your patch needs might be provided by AM opts.
>>
>> It's difficult to generalize AMs when we have only one, but your use-case might
>> be a concrete example which would help to answer some questions on the other
>> thread.
>>
>> @Jeff: https://commitfest.postgresql.org/31/2871/
> 
> Note that I have not gone through the entire thread at [1]. On some
> initial study, that patch is proposing to allow different table AMs to
> have custom rel options.
> 
> In the v2 patch that I sent upthread [2] for new table AMs has heap AM
> multi insert code moved inside the new heap AM implementation and I
> don't see any need of having rel options. In case, any other AMs want
> to have the control for their multi insert API implementation via rel
> options, I think the proposal at [1] can be useful.
> 
> 
> Thoughts?
> 
> [1] - https://commitfest.postgresql.org/31/2717/
> [2] -
https://www.postgresql.org/message-id/CALj2ACWMnZZCu%3DG0PJkEeYYicKeuJ-X%3DSU19i6vQ1%2B%3DuXz8u0Q%40mail.gmail.com
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 
Hi,

 > IIUC, there's no dependency or anything as such for the new table AM
 > patch with the rel options thread [1]. If I'm right, can this new
 > table AM patch [2] be reviewed further?

To me this seems good enough. Reason is that I anticipate that there 
would not necessarily be per-table options for now but rather global 
options, if any. Moreover, if we want to make these kind of tradeoffs 
user-controllable I would argue this should be done in a different 
patch-set either way. Reason is that there are parameters in heap 
already that are computed / hardcoded as well (see e.g. 
RelationAddExtraBlocks).

===

As to the patches themselves:

I think the API is a huge step forward! I assume that we want to have a 
single-insert API like heap_insert_v2 so that we can encode the 
knowledge that there will just be a single insert coming and likely a 
commit afterwards?

Reason I'm asking is that I quite liked the heap_insert_begin parameter 
is_multi, which could even be turned into a "expected_rowcount" of the 
amount of rows expected to be commited in the transaction (e.g. single, 
several, thousands/stream).
If we were to make the API based on expected rowcounts, the whole 
heap_insert_v2, heap_insert and heap_multi_insert could be turned into a 
single function heap_insert, as the knowledge about buffering of the 
slots is then already stored in the TableInsertState, creating an API like:

// expectedRows: -1 = streaming, otherwise expected rowcount.
TableInsertState* heap_insert_begin(Relation rel, CommandId cid, int 
options, int expectedRows);
heap_insert(TableInsertState *state, TupleTableSlot *slot);

Do you think that's a good idea?

Two smaller things I'm wondering:
- the clear_mi_slots; why is this not in the HeapMultiInsertState? the 
slots themselves are declared there? also, the boolean themselves is 
somewhat problematic I think because it would only work if you specified 
is_multi=true which would depend on the actual tableam to implement this 
then in a way that copy/ctas/etc can also use the slot properly, which I 
think would severely limit their freedom to store the slots more 
efficiently? Also, why do we want to do ExecClearTuple() anyway? Isn't 
it good enough that the next call to ExecCopySlot will effectively clear 
it out?
- flushed -> why is this a stored boolean? isn't this indirectly encoded 
by cur_slots/cur_size == 0?

For patches 02-04 I quickly skimmed through them as I assume we first 
want the API agreed upon. Generally they look nice and like a big step 
forward. What I'm just wondering about is the usage of the 
implementation details like mistate->slots[X]. It makes a lot of sense 
to do so but also makes for a difficult compromise, because now the 
tableam has to guarantee a copy of the slot, and hopefully even one in a 
somewhat efficient form.

Kind regards,
Luc



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: zstd compression for pg_dump
Next
From: Luc Vlaming
Date:
Subject: Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW