Re: Optimization for updating foreign tables in Postgres FDW - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: Optimization for updating foreign tables in Postgres FDW |
Date | |
Msg-id | 56DFDF34.8040804@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Optimization for updating foreign tables in Postgres FDW (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Optimization for updating foreign tables in Postgres FDW
|
List | pgsql-hackers |
On 2016/03/08 2:35, Robert Haas wrote: > On Mon, Mar 7, 2016 at 7:53 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> Another option to avoid such a hazard would be to remove the two changes >> from ExecInitModifyTable and create ExecAuxRowMarks and junk filters even in >> the pushdown case. I made the changes because we won't use ExecAuxRowMarks >> in that case since we don't need to do EvalPlanQual rechecks and because we >> won't use junk filters in that case since we do UPDATE/DELETE in the >> subplan. But the creating cost is enough small, so simply removing the >> changes seems like a good idea. > Sure, that works. OK, I removed the changes. >>> This issue crops up elsewhere as well. The changes to >>> ExecModifyTable() have the same problem -- in that case, it might be >>> wise to move the code that's going to have to be indented yet another >>> level into a separate function. That code also says this: >>> >>> + /* No need to provide scan tuple to >>> ExecProcessReturning. */ >>> + slot = ExecProcessReturning(resultRelInfo, >>> NULL, planSlot); >>> >>> ...but, uh, why not? The comment says what the code does, but what it >>> should do is explain why it does it. >> >> As documented in IterateDMLPushdown in fdwhandler.sgml, the reason for that >> is that in the pushdown case it's the IterateDMLPushdown's responsiblity to >> get actually inserted/updated/deleted tuples and make those tuples available >> to the ExecProcessReturning. I'll add comments. > Comments are good things to have. :-) Yeah, I added comments. >>> On a broader level, I'm not very happy with the naming this patch >>> uses. Here's an example: >>> >>> + <para> >>> + If an FDW supports optimizing foreign table updates, it still needs >>> to >>> + provide <function>PlanDMLPushdown</>, <function>BeginDMLPushdown</>, >>> + <function>IterateDMLPushdown</> and <function>EndDMLPushdown</> >>> + described below. >>> + </para> >>> >>> "Optimizing foreign table updates" is both inaccurate (since it >>> doesn't only optimize updates) and so vague as to be meaningless >>> unless you already know what it means. The actual patch uses >>> terminology like "fdwPushdowns" which is just as bad. We might push a >>> lot of things to the foreign side -- sorts, joins, aggregates, limits >>> -- and this is just one of them. Worse, "pushdown" is itself >>> something of a term of art - will people who haven't been following >>> all of the mammoth, multi-hundred-email threads on this topic know >>> what that means? I think we need some better terminology here. >>> >>> The best thing that I can come up with offhand is "bulk modify". So >>> we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify, >>> EndBulkModify, ExplainBulkModify. Other suggestions welcome. The >>> ResultRelInfo flag could be ri_usesFDWBulkModify. >> >> I'm not sure that "bulk modify" is best. Yeah, this would improve the >> performance especially in the bulk-modification case, but would improve the >> performance even in the case where an UPDATE/DELETE modifies just a single >> row. Let me explain using an example. Without the patch, we have the >> following plan for an UPDATE on a foreign table that updates a single row: >> >> postgres=# explain verbose update foo set a = a + 1 where a = 1; >> QUERY PLAN >> ---------------------------------------------------------------------------------- >> Update on public.foo (cost=100.00..101.05 rows=1 width=14) >> Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1 >> -> Foreign Scan on public.foo (cost=100.00..101.05 rows=1 width=14) >> Output: (a + 1), b, ctid >> Remote SQL: SELECT a, b, ctid FROM public.foo WHERE ((a = 1)) FOR >> UPDATE >> (5 rows) >> >> The plan requires two queries, SELECT and UPDATE, to do the update. >> (Actually, the plan have additional overheads in creating a cursor for the >> SELECT and establishing a prepared statement for the UPDATE.) But with the >> patch, we have: >> >> postgres=# explain verbose update foo set a = a + 1 where a = 1; >> QUERY PLAN >> --------------------------------------------------------------------------- >> Update on public.foo (cost=100.00..101.05 rows=1 width=14) >> -> Foreign Update on public.foo (cost=100.00..101.05 rows=1 width=14) >> Remote SQL: UPDATE public.foo SET a = (a + 1) WHERE ((a = 1)) >> (3 rows) >> >> The optimized plan requires just a single UPDATE query to do that! So, even >> in the single-row-modification case the patch could improve the performance. >> >> How about "Direct Modify"; PlanDirectModify, BeginDirectModify, >> IterateDirectModify, EndDirectModify, ExplainDirectModify, and >> ri_usesFDWDirectModify. > Works for me! Great! I changed the naming. I also updated docs as proposed by you in a previous email, and rebased the patch to the latest HEAD. Please find attached an updated version of the patch. Best regards, Etsuro Fujita
Attachment
pgsql-hackers by date: