Re: POC: postgres_fdw insert batching - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: POC: postgres_fdw insert batching |
Date | |
Msg-id | 6929d485-2d2a-da46-3681-4a400a3d794f@enterprisedb.com Whole thread Raw |
In response to | Re: POC: postgres_fdw insert batching (Amit Langote <amitlangote09@gmail.com>) |
Responses |
Re: POC: postgres_fdw insert batching
Re: POC: postgres_fdw insert batching |
List | pgsql-hackers |
On 2/17/21 9:51 AM, Amit Langote wrote: > On Wed, Feb 17, 2021 at 5:46 PM Amit Langote <amitlangote09@gmail.com> wrote: >> On Wed, Feb 17, 2021 at 12:04 AM Tomas Vondra >> <tomas.vondra@enterprisedb.com> wrote: >>> On 2/16/21 10:25 AM, Amit Langote wrote: >>>> On Tue, Feb 16, 2021 at 1:36 AM Tomas Vondra >>>>> Thanks for the patch, it seems fine to me. >>>> >>>> Thanks for checking. >>>> >>>>> I wonder it the commit >>>>> message needs some tweaks, though. At the moment it says: >>>>> >>>>> Prevent FDW insert batching during cross-partition updates >>>>> >>>>> but what the patch seems to be doing is simply initializing the info >>>>> only for CMD_INSERT operations. Which does the trick, but it affects >>>>> everything, i.e. all updates, no? Not just cross-partition updates. >>>> >>>> You're right. Please check the message in the updated patch. >>> >>> Thanks. I'm not sure I understand what "FDW may not be able to handle >>> both the original update operation and the batched insert operation >>> being performed at the same time" means. I mean, if we translate the >>> UPDATE into DELETE+INSERT, then we don't run both the update and insert >>> at the same time, right? What exactly is the problem with allowing >>> batching for inserts in cross-partition updates? >> >> Sorry, I hadn't shared enough details of my investigations when I >> originally ran into this. Such as that I had considered implementing >> the use of batching for these inserts too but had given up. >> >> Now that you mention it, I think I gave a less convincing reason for >> why we should avoid doing it at all. Maybe it would have been more >> right to say that it is the core code, not necessarily the FDWs, that >> currently fails to deal with the use of batching by the insert >> component of a cross-partition update. Those failures could be >> addressed as I'll describe below. >> >> For postgres_fdw, postgresGetForeignModifyBatchSize() could be taught >> to simply use the PgFdwModifyTable that is installed to handle the >> insert component of a cross-partition update (one can get that one via >> aux_fmstate field of the original PgFdwModifyState). However, even >> though that's fine for postgres_fdw to do, what worries (had worried) >> me is that it also results in scribbling on ri_BatchSize that the core >> code may see to determine what to do with a particular tuple, and I >> just have to hope that nodeModifyTable.c doesn't end up doing anything >> unwarranted with the original update based on seeing a non-zero >> ri_BatchSize. AFAICS, we are fine on that front. >> >> That said, there are some deficiencies in the code that have to be >> addressed before we can let postgres_fdw do as mentioned above. For >> example, the code in ExecModifyTable() that runs after breaking out of >> the loop to insert any remaining batched tuples appears to miss the >> tuples batched by such inserts. Apparently, that is because the >> ResultRelInfos used by those inserts are not present in >> es_tuple_routing_result_relations. Turns out I had forgotten that >> execPartition.c doesn't add the ResultRelInfos to that list if they >> are made by ExecInitModifyTable() for the original update operation >> and simply reused by ExecFindPartition() when tuples were routed to >> those partitions. It can be "fixed" by reverting to the original >> design in Tsunakawa-san's patch where the tuple routing result >> relations were obtained from the PartitionTupleRouting data structure, >> which fortunately stores all tuple routing result relations. (Sorry, >> I gave wrong advice in [1] in retrospect.) >> >>> On a closer look, it seems the problem actually lies in a small >>> inconsistency between create_foreign_modify and ExecInitRoutingInfo. The >>> former only set batch_size for CMD_INSERT while the latter called the >>> BatchSize() for all operations, expecting >= 1 result. So we may either >>> relax create_foreign_modify and set batch_size for all DML, or make >>> ExecInitRoutingInfo stricter (which is what the patches here do). >> >> I think we should be fine if we make >> postgresGetForeignModifyBatchSize() use the correct PgFdwModifyState >> as described above. We can be sure that we are not mixing the >> information used by the batched insert with that of the original >> unbatched update. >> >>> Is there a reason not to do the first thing, allowing batching of >>> inserts during cross-partition updates? I tried to do that, but it >>> dawned on me that we can't mix batched and un-batched operations, e.g. >>> DELETE + INSERT, because that'd break the order of execution, leading to >>> bogus results in case the same row is modified repeatedly, etc. >> >> Actually, postgres_fdw only supports moving a row into a partition (as >> part of a cross-partition update that is) if it has already finished >> performing any updates on it. So there is no worry of rows that are >> moved into a partition subsequently getting updated due to the >> original command. >> >> The attached patch implements the changes necessary to make these >> inserts use batching too. >> >> [1] https://www.postgresql.org/message-id/CA%2BHiwqEbnhwVJMsukTP-S9Kv1ynC7Da3yuqSPZC0Y7oWWOwoHQ%40mail.gmail.com > > Oops, I had mistakenly not hit "Reply All". Attaching the patch again. > Thanks. The patch seems reasonable, but it's a bit too large for a fix, so I'll go ahead and push one of the previous fixes restricting batching to plain INSERT commands. But this seems useful, so I suggest adding it to the next commitfest. One thing that surprised me is that we only move the rows *to* the foreign partition, not from it (even on pg13, i.e. before the batching etc.). I mean, using the example you posted earlier, with one foreign and one local partition, consider this: delete from p; insert into p values (2); test=# select * from p2; a --- 2 (1 row) test=# update p set a = 1; UPDATE 1 test=# select * from p1; a --- 1 (1 row) OK, so it was moved to the foreign partition, which is for rows with value in (1). So far so good. Let's do another update: test=# update p set a = 2; UPDATE 1 test=# select * from p1; a --- 2 (1 row) So now the foreign partition contains value (2), which is however wrong with respect to the partitioning rules - this should be in p2, the local partition. This however causes pretty annoying issue: test=# explain analyze select * from p where a = 2; QUERY PLAN --------------------------------------------------------------- Seq Scan on p2 p (cost=0.00..41.88 rows=13 width=4) (actual time=0.024..0.028 rows=0 loops=1) Filter: (a = 2) Planning Time: 0.355 ms Execution Time: 0.089 ms (4 rows) That is, we fail to find the row, because we eliminate the partition. Now, maybe this is expected, but it seems like a rather mind-boggling violation of POLA principle. I've checked if postgres_fdw mentions this somewhere, but all I see about row movement is this: Note also that postgres_fdw supports row movement invoked by UPDATE statements executed on partitioned tables, but it currently does not handle the case where a remote partition chosen to insert a moved row into is also an UPDATE target partition that will be updated later. and if I understand that correctly, that's about something different. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: