Re: POC: postgres_fdw insert batching - Mailing list pgsql-hackers

From Amit Langote
Subject Re: POC: postgres_fdw insert batching
Date
Msg-id CA+HiwqFwfha6vka51J5d2aoV=XbsV1S2GvmhgAfHpHto2-taYw@mail.gmail.com
Whole thread Raw
In response to Re: POC: postgres_fdw insert batching  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Thu, Feb 18, 2021 at 4:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> On 2/17/21 9:51 AM, Amit Langote wrote:
> > On Wed, Feb 17, 2021 at 5:46 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >> Sorry, I hadn't shared enough details of my investigations when I
> >> originally ran into this.  Such as that I had considered implementing
> >> the use of batching for these inserts too but had given up.
> >>
> >> Now that you mention it, I think I gave a less convincing reason for
> >> why we should avoid doing it at all.  Maybe it would have been more
> >> right to say that it is the core code, not necessarily the FDWs, that
> >> currently fails to deal with the use of batching by the insert
> >> component of a cross-partition update.  Those failures could be
> >> addressed as I'll describe below.
> >>
> >> For postgres_fdw, postgresGetForeignModifyBatchSize() could be taught
> >> to simply use the PgFdwModifyTable that is installed to handle the
> >> insert component of a cross-partition update (one can get that one via
> >> aux_fmstate field of the original PgFdwModifyState).  However, even
> >> though that's fine for postgres_fdw to do, what worries (had worried)
> >> me is that it also results in scribbling on ri_BatchSize that the core
> >> code may see to determine what to do with a particular tuple, and I
> >> just have to hope that nodeModifyTable.c doesn't end up doing anything
> >> unwarranted with the original update based on seeing a non-zero
> >> ri_BatchSize.  AFAICS, we are fine on that front.
> >>
> >> That said, there are some deficiencies in the code that have to be
> >> addressed before we can let postgres_fdw do as mentioned above.  For
> >> example, the code in ExecModifyTable() that runs after breaking out of
> >> the loop to insert any remaining batched tuples appears to miss the
> >> tuples batched by such inserts.   Apparently, that is because the
> >> ResultRelInfos used by those inserts are not present in
> >> es_tuple_routing_result_relations.  Turns out I had forgotten that
> >> execPartition.c doesn't add the ResultRelInfos to that list if they
> >> are made by ExecInitModifyTable() for the original update operation
> >> and simply reused by ExecFindPartition() when tuples were routed to
> >> those partitions.  It can be "fixed" by reverting to the original
> >> design in Tsunakawa-san's patch where the tuple routing result
> >> relations were obtained from the PartitionTupleRouting data structure,
> >> which fortunately stores all tuple routing result relations.  (Sorry,
> >> I gave wrong advice in [1] in retrospect.)
> >>
> >>> On a closer look, it seems the problem actually lies in a small
> >>> inconsistency between create_foreign_modify and ExecInitRoutingInfo. The
> >>> former only set batch_size for CMD_INSERT while the latter called the
> >>> BatchSize() for all operations, expecting >= 1 result. So we may either
> >>> relax create_foreign_modify and set batch_size for all DML, or make
> >>> ExecInitRoutingInfo stricter (which is what the patches here do).
> >>
> >> I think we should be fine if we make
> >> postgresGetForeignModifyBatchSize() use the correct PgFdwModifyState
> >> as described above.  We can be sure that we are not mixing the
> >> information used by the batched insert with that of the original
> >> unbatched update.
> >>
> >>> Is there a reason not to do the first thing, allowing batching of
> >>> inserts during cross-partition updates? I tried to do that, but it
> >>> dawned on me that we can't mix batched and un-batched operations, e.g.
> >>> DELETE + INSERT, because that'd break the order of execution, leading to
> >>> bogus results in case the same row is modified repeatedly, etc.
> >>
> >> Actually, postgres_fdw only supports moving a row into a partition (as
> >> part of a cross-partition update that is) if it has already finished
> >> performing any updates on it.   So there is no worry of rows that are
> >> moved into a partition subsequently getting updated due to the
> >> original command.
> >>
> >> The attached patch implements the changes necessary to make these
> >> inserts use batching too.
> >>
> >> [1] https://www.postgresql.org/message-id/CA%2BHiwqEbnhwVJMsukTP-S9Kv1ynC7Da3yuqSPZC0Y7oWWOwoHQ%40mail.gmail.com
> >
> > Oops, I had mistakenly not hit "Reply All".  Attaching the patch again.
> >
>
> Thanks. The patch seems reasonable, but it's a bit too large for a fix,
> so I'll go ahead and push one of the previous fixes restricting batching
> to plain INSERT commands. But this seems useful, so I suggest adding it
> to the next commitfest.

Okay will post the rebased patch to a new thread.

> One thing that surprised me is that we only move the rows *to* the
> foreign partition, not from it (even on pg13, i.e. before the batching
> etc.). I mean, using the example you posted earlier, with one foreign
> and one local partition, consider this:
>
>   delete from p;
>   insert into p values (2);
>
>   test=# select * from p2;
>    a
>   ---
>    2
>   (1 row)
>
>   test=# update p set a = 1;
>   UPDATE 1
>
>   test=# select * from p1;
>    a
>   ---
>    1
>   (1 row)
>
> OK, so it was moved to the foreign partition, which is for rows with
> value in (1). So far so good. Let's do another update:
>
>   test=# update p set a = 2;
>   UPDATE 1
>   test=# select * from p1;
>    a
>   ---
>    2
>   (1 row)
>
> So now the foreign partition contains value (2), which is however wrong
> with respect to the partitioning rules - this should be in p2, the local
> partition.
>
> This however causes pretty annoying issue:
>
> test=# explain analyze select * from p where a = 2;
>
>                             QUERY PLAN
>   ---------------------------------------------------------------
>    Seq Scan on p2 p  (cost=0.00..41.88 rows=13 width=4)
>                      (actual  time=0.024..0.028 rows=0 loops=1)
>      Filter: (a = 2)
>    Planning Time: 0.355 ms
>    Execution Time: 0.089 ms
>   (4 rows)
>
> That is, we fail to find the row, because we eliminate the partition.
>
> Now, maybe this is expected, but it seems like a rather mind-boggling
> violation of POLA principle.

Yeah, I think we knowingly allow this behavior.  The documentation
states that a foreign table's constraints are not enforced by the core
server nor by the FDW, but I suppose we still allow declaring them
mostly for the planner's consumption:

https://www.postgresql.org/docs/current/sql-createforeigntable.html

"Constraints on foreign tables (such as CHECK or NOT NULL clauses) are
not enforced by the core PostgreSQL system, and most foreign data
wrappers do not attempt to enforce them either; that is, the
constraint is simply assumed to hold true. There would be little point
in such enforcement since it would only apply to rows inserted or
updated via the foreign table, and not to rows modified by other
means, such as directly on the remote server. Instead, a constraint
attached to a foreign table should represent a constraint that is
being enforced by the remote server."

Partitioning constraints are not treated any differently for those
reasons.  It's a good idea to declare a CHECK constraint on the remote
table matching with the local partition constraint, though IIRC we
don't mention that advice anywhere in our documentation.

> I've checked if postgres_fdw mentions this
> somewhere, but all I see about row movement is this:
>
>    Note also that postgres_fdw supports row movement invoked by UPDATE
>    statements executed on partitioned tables, but it currently does not
>    handle the case where a remote partition chosen to insert a moved row
>    into is also an UPDATE target partition that will be updated later.
>
> and if I understand that correctly, that's about something different.

Yeah, that's a note saying that while we do support moving a row from
a local partition to a postgres_fdw foreign partition, it's only
allowed if the foreign partition won't subsequently be updated.  So to
reiterate, the cases we don't support:

* Moving a row from a foreign partition to a local one

* Moving a row from a local partition to a foreign one if the latter
will be updated subsequent to moving a row into it

postgres_fdw detects the second case with the following code in
postgresBeginForeignInsert():

    /*
     * If the foreign table we are about to insert routed rows into is also an
     * UPDATE subplan result rel that will be updated later, proceeding with
     * the INSERT will result in the later UPDATE incorrectly modifying those
     * routed rows, so prevent the INSERT --- it would be nice if we could
     * handle this case; but for now, throw an error for safety.
     */
    if (plan && plan->operation == CMD_UPDATE &&
        (resultRelInfo->ri_usesFdwDirectModify ||
         resultRelInfo->ri_FdwState) &&
        resultRelInfo > mtstate->resultRelInfo + mtstate->mt_whichplan)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot route tuples into foreign table to be
updated \"%s\"",
                        RelationGetRelationName(rel))));
--
Amit Langote
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: ERROR: "ft1" is of the wrong type.
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Is it worth accepting multiple CRLs?