Re: POC: postgres_fdw insert batching - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: POC: postgres_fdw insert batching
Date
Msg-id 6929d485-2d2a-da46-3681-4a400a3d794f@enterprisedb.com
Whole thread Raw
In response to Re: POC: postgres_fdw insert batching  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: POC: postgres_fdw insert batching
Re: POC: postgres_fdw insert batching
List pgsql-hackers
On 2/17/21 9:51 AM, Amit Langote wrote:
> On Wed, Feb 17, 2021 at 5:46 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> On Wed, Feb 17, 2021 at 12:04 AM Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>>> On 2/16/21 10:25 AM, Amit Langote wrote:
>>>> On Tue, Feb 16, 2021 at 1:36 AM Tomas Vondra
>>>>> Thanks for the patch, it seems fine to me.
>>>>
>>>> Thanks for checking.
>>>>
>>>>> I wonder it the commit
>>>>> message needs some tweaks, though. At the moment it says:
>>>>>
>>>>>      Prevent FDW insert batching during cross-partition updates
>>>>>
>>>>> but what the patch seems to be doing is simply initializing the info
>>>>> only for CMD_INSERT operations. Which does the trick, but it affects
>>>>> everything, i.e. all updates, no? Not just cross-partition updates.
>>>>
>>>> You're right.  Please check the message in the updated patch.
>>>
>>> Thanks. I'm not sure I understand what "FDW may not be able to handle
>>> both the original update operation and the batched insert operation
>>> being performed at the same time" means. I mean, if we translate the
>>> UPDATE into DELETE+INSERT, then we don't run both the update and insert
>>> at the same time, right? What exactly is the problem with allowing
>>> batching for inserts in cross-partition updates?
>>
>> Sorry, I hadn't shared enough details of my investigations when I
>> originally ran into this.  Such as that I had considered implementing
>> the use of batching for these inserts too but had given up.
>>
>> Now that you mention it, I think I gave a less convincing reason for
>> why we should avoid doing it at all.  Maybe it would have been more
>> right to say that it is the core code, not necessarily the FDWs, that
>> currently fails to deal with the use of batching by the insert
>> component of a cross-partition update.  Those failures could be
>> addressed as I'll describe below.
>>
>> For postgres_fdw, postgresGetForeignModifyBatchSize() could be taught
>> to simply use the PgFdwModifyTable that is installed to handle the
>> insert component of a cross-partition update (one can get that one via
>> aux_fmstate field of the original PgFdwModifyState).  However, even
>> though that's fine for postgres_fdw to do, what worries (had worried)
>> me is that it also results in scribbling on ri_BatchSize that the core
>> code may see to determine what to do with a particular tuple, and I
>> just have to hope that nodeModifyTable.c doesn't end up doing anything
>> unwarranted with the original update based on seeing a non-zero
>> ri_BatchSize.  AFAICS, we are fine on that front.
>>
>> That said, there are some deficiencies in the code that have to be
>> addressed before we can let postgres_fdw do as mentioned above.  For
>> example, the code in ExecModifyTable() that runs after breaking out of
>> the loop to insert any remaining batched tuples appears to miss the
>> tuples batched by such inserts.   Apparently, that is because the
>> ResultRelInfos used by those inserts are not present in
>> es_tuple_routing_result_relations.  Turns out I had forgotten that
>> execPartition.c doesn't add the ResultRelInfos to that list if they
>> are made by ExecInitModifyTable() for the original update operation
>> and simply reused by ExecFindPartition() when tuples were routed to
>> those partitions.  It can be "fixed" by reverting to the original
>> design in Tsunakawa-san's patch where the tuple routing result
>> relations were obtained from the PartitionTupleRouting data structure,
>> which fortunately stores all tuple routing result relations.  (Sorry,
>> I gave wrong advice in [1] in retrospect.)
>>
>>> On a closer look, it seems the problem actually lies in a small
>>> inconsistency between create_foreign_modify and ExecInitRoutingInfo. The
>>> former only set batch_size for CMD_INSERT while the latter called the
>>> BatchSize() for all operations, expecting >= 1 result. So we may either
>>> relax create_foreign_modify and set batch_size for all DML, or make
>>> ExecInitRoutingInfo stricter (which is what the patches here do).
>>
>> I think we should be fine if we make
>> postgresGetForeignModifyBatchSize() use the correct PgFdwModifyState
>> as described above.  We can be sure that we are not mixing the
>> information used by the batched insert with that of the original
>> unbatched update.
>>
>>> Is there a reason not to do the first thing, allowing batching of
>>> inserts during cross-partition updates? I tried to do that, but it
>>> dawned on me that we can't mix batched and un-batched operations, e.g.
>>> DELETE + INSERT, because that'd break the order of execution, leading to
>>> bogus results in case the same row is modified repeatedly, etc.
>>
>> Actually, postgres_fdw only supports moving a row into a partition (as
>> part of a cross-partition update that is) if it has already finished
>> performing any updates on it.   So there is no worry of rows that are
>> moved into a partition subsequently getting updated due to the
>> original command.
>>
>> The attached patch implements the changes necessary to make these
>> inserts use batching too.
>>
>> [1] https://www.postgresql.org/message-id/CA%2BHiwqEbnhwVJMsukTP-S9Kv1ynC7Da3yuqSPZC0Y7oWWOwoHQ%40mail.gmail.com
> 
> Oops, I had mistakenly not hit "Reply All".  Attaching the patch again.
> 

Thanks. The patch seems reasonable, but it's a bit too large for a fix,
so I'll go ahead and push one of the previous fixes restricting batching
to plain INSERT commands. But this seems useful, so I suggest adding it
to the next commitfest.

One thing that surprised me is that we only move the rows *to* the
foreign partition, not from it (even on pg13, i.e. before the batching
etc.). I mean, using the example you posted earlier, with one foreign
and one local partition, consider this:

  delete from p;
  insert into p values (2);

  test=# select * from p2;
   a
  ---
   2
  (1 row)

  test=# update p set a = 1;
  UPDATE 1

  test=# select * from p1;
   a
  ---
   1
  (1 row)

OK, so it was moved to the foreign partition, which is for rows with
value in (1). So far so good. Let's do another update:

  test=# update p set a = 2;
  UPDATE 1
  test=# select * from p1;
   a
  ---
   2
  (1 row)

So now the foreign partition contains value (2), which is however wrong
with respect to the partitioning rules - this should be in p2, the local
partition. This however causes pretty annoying issue:

test=# explain analyze select * from p where a = 2;

                            QUERY PLAN
  ---------------------------------------------------------------
   Seq Scan on p2 p  (cost=0.00..41.88 rows=13 width=4)
                     (actual  time=0.024..0.028 rows=0 loops=1)
     Filter: (a = 2)
   Planning Time: 0.355 ms
   Execution Time: 0.089 ms
  (4 rows)

That is, we fail to find the row, because we eliminate the partition.

Now, maybe this is expected, but it seems like a rather mind-boggling
violation of POLA principle. I've checked if postgres_fdw mentions this
somewhere, but all I see about row movement is this:

   Note also that postgres_fdw supports row movement invoked by UPDATE
   statements executed on partitioned tables, but it currently does not
   handle the case where a remote partition chosen to insert a moved row
   into is also an UPDATE target partition that will be updated later.

and if I understand that correctly, that's about something different.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: documentation fix for SET ROLE
Next
From: Joe Conway
Date:
Subject: Re: documentation fix for SET ROLE