Re: New Table Access Methods for Multi and Single Inserts - Mailing list pgsql-hackers
From | Luc Vlaming |
---|---|
Subject | Re: New Table Access Methods for Multi and Single Inserts |
Date | |
Msg-id | 96eaa813-4ad6-e80a-04a4-cc8082d356dd@swarm64.com Whole thread Raw |
In response to | Re: New Table Access Methods for Multi and Single Inserts (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: New Table Access Methods for Multi and Single Inserts
Re: New Table Access Methods for Multi and Single Inserts |
List | pgsql-hackers |
On 28-12-2020 13:48, Bharath Rupireddy wrote: > On Fri, Dec 25, 2020 at 8:10 AM Justin Pryzby <pryzby@telsasoft.com> wrote: >> On Thu, Dec 24, 2020 at 05:48:42AM +0530, Bharath Rupireddy wrote: >>> I'm not posting the updated 0002 to 0004 patches, I plan to do so >>> after a couple of reviews happen on the design of the APIs in 0001. >>> >>> Thoughts? >> >> Are you familiar with this work ? >> >> https://commitfest.postgresql.org/31/2717/ >> Reloptions for table access methods >> >> It seems like that can be relevant for your patch, and I think some of what >> your patch needs might be provided by AM opts. >> >> It's difficult to generalize AMs when we have only one, but your use-case might >> be a concrete example which would help to answer some questions on the other >> thread. >> >> @Jeff: https://commitfest.postgresql.org/31/2871/ > > Note that I have not gone through the entire thread at [1]. On some > initial study, that patch is proposing to allow different table AMs to > have custom rel options. > > In the v2 patch that I sent upthread [2] for new table AMs has heap AM > multi insert code moved inside the new heap AM implementation and I > don't see any need of having rel options. In case, any other AMs want > to have the control for their multi insert API implementation via rel > options, I think the proposal at [1] can be useful. > > > Thoughts? > > [1] - https://commitfest.postgresql.org/31/2717/ > [2] - https://www.postgresql.org/message-id/CALj2ACWMnZZCu%3DG0PJkEeYYicKeuJ-X%3DSU19i6vQ1%2B%3DuXz8u0Q%40mail.gmail.com > > With Regards, > Bharath Rupireddy. > EnterpriseDB: http://www.enterprisedb.com > Hi, > IIUC, there's no dependency or anything as such for the new table AM > patch with the rel options thread [1]. If I'm right, can this new > table AM patch [2] be reviewed further? To me this seems good enough. Reason is that I anticipate that there would not necessarily be per-table options for now but rather global options, if any. Moreover, if we want to make these kind of tradeoffs user-controllable I would argue this should be done in a different patch-set either way. Reason is that there are parameters in heap already that are computed / hardcoded as well (see e.g. RelationAddExtraBlocks). === As to the patches themselves: I think the API is a huge step forward! I assume that we want to have a single-insert API like heap_insert_v2 so that we can encode the knowledge that there will just be a single insert coming and likely a commit afterwards? Reason I'm asking is that I quite liked the heap_insert_begin parameter is_multi, which could even be turned into a "expected_rowcount" of the amount of rows expected to be commited in the transaction (e.g. single, several, thousands/stream). If we were to make the API based on expected rowcounts, the whole heap_insert_v2, heap_insert and heap_multi_insert could be turned into a single function heap_insert, as the knowledge about buffering of the slots is then already stored in the TableInsertState, creating an API like: // expectedRows: -1 = streaming, otherwise expected rowcount. TableInsertState* heap_insert_begin(Relation rel, CommandId cid, int options, int expectedRows); heap_insert(TableInsertState *state, TupleTableSlot *slot); Do you think that's a good idea? Two smaller things I'm wondering: - the clear_mi_slots; why is this not in the HeapMultiInsertState? the slots themselves are declared there? also, the boolean themselves is somewhat problematic I think because it would only work if you specified is_multi=true which would depend on the actual tableam to implement this then in a way that copy/ctas/etc can also use the slot properly, which I think would severely limit their freedom to store the slots more efficiently? Also, why do we want to do ExecClearTuple() anyway? Isn't it good enough that the next call to ExecCopySlot will effectively clear it out? - flushed -> why is this a stored boolean? isn't this indirectly encoded by cur_slots/cur_size == 0? For patches 02-04 I quickly skimmed through them as I assume we first want the API agreed upon. Generally they look nice and like a big step forward. What I'm just wondering about is the usage of the implementation details like mistate->slots[X]. It makes a lot of sense to do so but also makes for a difficult compromise, because now the tableam has to guarantee a copy of the slot, and hopefully even one in a somewhat efficient form. Kind regards, Luc
pgsql-hackers by date: