Re: POC: postgres_fdw insert batching - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: POC: postgres_fdw insert batching |
Date | |
Msg-id | CA+HiwqEbnhwVJMsukTP-S9Kv1ynC7Da3yuqSPZC0Y7oWWOwoHQ@mail.gmail.com Whole thread Raw |
In response to | Re: POC: postgres_fdw insert batching (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: POC: postgres_fdw insert batching
|
List | pgsql-hackers |
Hi, On Thu, Jan 14, 2021 at 2:41 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > On 1/13/21 3:43 PM, Tomas Vondra wrote: > > Thanks for the report. Yeah, I think there's a missing check in > > ExecInsert. Adding > > > > (!resultRelInfo->ri_TrigDesc->trig_insert_after_row) > > > > solves this. But now I'm wondering if this is the wrong place to make > > this decision. I mean, why should we make the decision here, when the > > decision whether to have a RETURNING clause is made in postgres_fdw in > > deparseReturningList? We don't really know what the other FDWs will do, > > for example. > > > > So I think we should just move all of this into GetModifyBatchSize. We > > can start with ri_BatchSize = 0. And then do > > > > if (resultRelInfo->ri_BatchSize == 0) > > resultRelInfo->ri_BatchSize = > > resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo); > > > > if (resultRelInfo->ri_BatchSize > 1) > > { > > ... do batching ... > > } > > > > The GetModifyBatchSize would always return value > 0, so either 1 (no > > batching) or >1 (batching). > > > > FWIW the attached v8 patch does this - most of the conditions are moved > to the GetModifyBatchSize() callback. Thanks. A few comments: * I agree with leaving it up to an FDW to look at the properties of the table and of the operation being performed to decide whether or not to use batching, although maybe BeginForeignModify() is a better place for putting that logic instead of GetModifyBatchSize()? So, in create_foreign_modify(), instead of PgFdwModifyState.batch_size simply being set to match the table's or the server's value for the batch_size option, make it also consider the things that prevent batching and set the execution state's batch_size based on that. GetModifyBatchSize() simply returns that value. * Regarding the timing of calling GetModifyBatchSize() to set ri_BatchSize, I wonder if it wouldn't be better to call it just once, say from ExecInitModifyTable(), right after BeginForeignModify() returns? I don't quite understand why it is being called from ExecInsert(). Can the batch size change once the execution starts? * Lastly, how about calling it GetForeignModifyBatchSize() to be consistent with other nearby callbacks? > I've removed the check for the > BatchInsert callback, though - the FDW knows whether it supports that, > and it seems a bit pointless at the moment as there are no other batch > callbacks. Maybe we should add an Assert somewhere, though? Hmm, not checking whether BatchInsert() exists may not be good idea, because if an FDW's GetModifyBatchSize() returns a value > 1 but there's no BatchInsert() function to call, ExecBatchInsert() would trip. I don't see the newly added documentation telling FDW authors to either define both or none. Regarding how this plays with partitions, I don't think we need ExecGetTouchedPartitions(), because you can get the routed-to partitions using es_tuple_routing_result_relations. Also, perhaps it's a good idea to put the "finishing" ExecBatchInsert() calls into a function ExecFinishBatchInsert(). Maybe the logic to choose the relations to perform the finishing calls on will get complicated in the future as batching is added for updates/deletes too and it seems better to encapsulate that in the separate function than have it out in the open in ExecModifyTable(). (Sorry about being so late reviewing this.) -- Amit Langote EDB: http://www.enterprisedb.com
pgsql-hackers by date: