Re: Optimization for updating foreign tables in Postgres FDW - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Optimization for updating foreign tables in Postgres FDW |
Date | |
Msg-id | CA+TgmoaeFjmFN6R+SKesyhKyz2tSVi7QyA53jvUtVjpLrH2XaQ@mail.gmail.com Whole thread Raw |
In response to | Re: Optimization for updating foreign tables in Postgres FDW (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>) |
Responses |
Re: Optimization for updating foreign tables in Postgres
FDW
|
List | pgsql-hackers |
On Tue, Feb 23, 2016 at 1:18 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > Thanks again for updating the patch and fixing the issues! Some comments on the latest version. I haven't reviewed the postgres_fdw changes in detail here, so this is just about the core changes. I see that show_plan_tlist checks whether the operation is any of CMD_INSERT, CMD_UPDATE, or CMD_DELETE. But practically every place else where a similar test is needed instead tests whether the operation is *not* CMD_SELECT. I think this place should do it that way, too. + resultRelInfo = mtstate->resultRelInfo; for (i = 0; i < nplans; i++) { ExecAuxRowMark *aerm; + /* + * ignore subplan if the FDW pushes down the command to the remote + * server; the ModifyTable won't have anything to do except for + * evaluation of RETURNING expressions + */ + if (resultRelInfo->ri_FdwPushdown) + { + resultRelInfo++; + continue; + } + subplan = mtstate->mt_plans[i]->plan; aerm = ExecBuildAuxRowMark(erm, subplan->targetlist); mtstate->mt_arowmarks[i] = lappend(mtstate->mt_arowmarks[i], aerm); + resultRelInfo++; } This kind of thing creates a hazard for future people maintaining this code. If somebody adds some code to this loop that needs to execute even when resultRelInfo->ri_FdwPushdown is true, they have to add two copies of it. It's much better to move the three lines of logic that execute only in the non-pushdown case inside of if (!resultRelInfo->ri_FdwPushdown). This issue crops up elsewhere as well. The changes to ExecModifyTable() have the same problem -- in that case, it might be wise to move the code that's going to have to be indented yet another level into a separate function. That code also says this: + /* No need to provide scan tuple to ExecProcessReturning. */ + slot = ExecProcessReturning(resultRelInfo, NULL, planSlot); ...but, uh, why not? The comment says what the code does, but what it should do is explain why it does it. On a broader level, I'm not very happy with the naming this patch uses. Here's an example: + <para> + If an FDW supports optimizing foreign table updates, it still needs to + provide <function>PlanDMLPushdown</>, <function>BeginDMLPushdown</>, + <function>IterateDMLPushdown</> and <function>EndDMLPushdown</> + described below. + </para> "Optimizing foreign table updates" is both inaccurate (since it doesn't only optimize updates) and so vague as to be meaningless unless you already know what it means. The actual patch uses terminology like "fdwPushdowns" which is just as bad. We might push a lot of things to the foreign side -- sorts, joins, aggregates, limits -- and this is just one of them. Worse, "pushdown" is itself something of a term of art - will people who haven't been following all of the mammoth, multi-hundred-email threads on this topic know what that means? I think we need some better terminology here. The best thing that I can come up with offhand is "bulk modify". So we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify, EndBulkModify, ExplainBulkModify. Other suggestions welcome. The ResultRelInfo flag could be ri_usesFDWBulkModify. The documentation could say something like this: Some inserts, updates, and deletes to foreign tables can be optimized by implementing an alternate set of interfaces. The ordinary interfaces for inserts, updates, and deletes fetch rows from the remote server and then modify those rows one at a time. In some cases, this row-by-row approach is necessary, but it can be inefficient. If it is possible for the foreign server to determine which rows should be modified without actually retrieving them, and if there are no local triggers which would affect the operation, then it is possible to arrange things so that the entire operation is performed on the remote server. The interfaces described below make this possible. + Begin executing a foreign table update directly on the remote server. I think this should say "Prepare to execute a bulk modification directly on the remote server". It shouldn't actually begin the execution phase. + End the table update and release resources. It is normally not important And I think this one should say "Clean up following a bulk modification on the remote server". It's not actually ending the update; the iterate method already did that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: