Re: Optimization for updating foreign tables in Postgres FDW - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Optimization for updating foreign tables in Postgres FDW
Date
Msg-id CA+TgmoaeFjmFN6R+SKesyhKyz2tSVi7QyA53jvUtVjpLrH2XaQ@mail.gmail.com
Whole thread Raw
In response to Re: Optimization for updating foreign tables in Postgres FDW  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Responses Re: Optimization for updating foreign tables in Postgres FDW
List pgsql-hackers
On Tue, Feb 23, 2016 at 1:18 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Thanks again for updating the patch and fixing the issues!

Some comments on the latest version.  I haven't reviewed the
postgres_fdw changes in detail here, so this is just about the core
changes.

I see that show_plan_tlist checks whether the operation is any of
CMD_INSERT, CMD_UPDATE, or CMD_DELETE.  But practically every place
else where a similar test is needed instead tests whether the
operation is *not* CMD_SELECT.  I think this place should do it that
way, too.

+               resultRelInfo = mtstate->resultRelInfo;               for (i = 0; i < nplans; i++)               {
                ExecAuxRowMark *aerm;
 

+                       /*
+                        * ignore subplan if the FDW pushes down the
command to the remote
+                        * server; the ModifyTable won't have anything
to do except for
+                        * evaluation of RETURNING expressions
+                        */
+                       if (resultRelInfo->ri_FdwPushdown)
+                       {
+                               resultRelInfo++;
+                               continue;
+                       }
+                       subplan = mtstate->mt_plans[i]->plan;                       aerm = ExecBuildAuxRowMark(erm,
subplan->targetlist);                      mtstate->mt_arowmarks[i] =
 
lappend(mtstate->mt_arowmarks[i], aerm);
+                       resultRelInfo++;               }


This kind of thing creates a hazard for future people maintaining this
code.  If somebody adds some code to this loop that needs to execute
even when resultRelInfo->ri_FdwPushdown is true, they have to add two
copies of it.  It's much better to move the three lines of logic that
execute only in the non-pushdown case inside of if
(!resultRelInfo->ri_FdwPushdown).

This issue crops up elsewhere as well.  The changes to
ExecModifyTable() have the same problem -- in that case, it might be
wise to move the code that's going to have to be indented yet another
level into a separate function.   That code also says this:

+                       /* No need to provide scan tuple to
ExecProcessReturning. */
+                       slot = ExecProcessReturning(resultRelInfo,
NULL, planSlot);

...but, uh, why not?  The comment says what the code does, but what it
should do is explain why it does it.

On a broader level, I'm not very happy with the naming this patch
uses.  Here's an example:

+    <para>
+     If an FDW supports optimizing foreign table updates, it still needs to
+     provide <function>PlanDMLPushdown</>, <function>BeginDMLPushdown</>,
+     <function>IterateDMLPushdown</> and <function>EndDMLPushdown</>
+     described below.
+    </para>

"Optimizing foreign table updates" is both inaccurate (since it
doesn't only optimize updates) and so vague as to be meaningless
unless you already know what it means.  The actual patch uses
terminology like "fdwPushdowns" which is just as bad.  We might push a
lot of things to the foreign side -- sorts, joins, aggregates, limits
-- and this is just one of them.  Worse, "pushdown" is itself
something of a term of art - will people who haven't been following
all of the mammoth, multi-hundred-email threads on this topic know
what that means?  I think we need some better terminology here.

The best thing that I can come up with offhand is "bulk modify".  So
we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify,
EndBulkModify, ExplainBulkModify.  Other suggestions welcome.   The
ResultRelInfo flag could be ri_usesFDWBulkModify.  The documentation
could say something like this:

Some inserts, updates, and deletes to foreign tables can be optimized
by implementing an alternate set of interfaces.  The ordinary
interfaces for inserts, updates, and deletes fetch rows from the
remote server and then modify those rows one at a time.  In some
cases, this row-by-row approach is necessary, but it can be
inefficient.  If it is possible for the foreign server to determine
which rows should be modified without actually retrieving them, and if
there are no local triggers which would affect the operation, then it
is possible to arrange things so that the entire operation is
performed on the remote server.  The interfaces described below make
this possible.

+     Begin executing a foreign table update directly on the remote server.

I think this should say "Prepare to execute a bulk modification
directly on the remote server".  It shouldn't actually begin the
execution phase.

+     End the table update and release resources.  It is normally not important

And I think this one should say "Clean up following a bulk
modification on the remote server".  It's not actually ending the
update; the iterate method already did that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Corey Huinker
Date:
Subject: Re: Add generate_series(date,date) and generate_series(date,date,integer)
Next
From: Robert Haas
Date:
Subject: Re: Typo in comment