Home > mailing lists

Re: Optimization for updating foreign tables in Postgres FDW - Mailing list pgsql-hackers

From	Etsuro Fujita
Subject	Re: Optimization for updating foreign tables in Postgres FDW
Date	March 9, 2016 08:33:58
Msg-id	56DFDF34.8040804@lab.ntt.co.jp Whole thread Raw
In response to	Re: Optimization for updating foreign tables in Postgres FDW (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Optimization for updating foreign tables in Postgres FDW
List	pgsql-hackers

Tree view

On 2016/03/08 2:35, Robert Haas wrote:
> On Mon, Mar 7, 2016 at 7:53 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> Another option to avoid such a hazard would be to remove the two changes
>> from ExecInitModifyTable and create ExecAuxRowMarks and junk filters even in
>> the pushdown case.  I made the changes because we won't use ExecAuxRowMarks
>> in that case since we don't need to do EvalPlanQual rechecks and because we
>> won't use junk filters in that case since we do UPDATE/DELETE in the
>> subplan.  But the creating cost is enough small, so simply removing the
>> changes seems like a good idea.

> Sure, that works.

OK, I removed the changes.

>>> This issue crops up elsewhere as well.  The changes to
>>> ExecModifyTable() have the same problem -- in that case, it might be
>>> wise to move the code that's going to have to be indented yet another
>>> level into a separate function.   That code also says this:
>>>
>>> +                       /* No need to provide scan tuple to
>>> ExecProcessReturning. */
>>> +                       slot = ExecProcessReturning(resultRelInfo,
>>> NULL, planSlot);
>>>
>>> ...but, uh, why not?  The comment says what the code does, but what it
>>> should do is explain why it does it.
>>
>> As documented in IterateDMLPushdown in fdwhandler.sgml, the reason for that
>> is that in the pushdown case it's the IterateDMLPushdown's responsiblity to
>> get actually inserted/updated/deleted tuples and make those tuples available
>> to the ExecProcessReturning.  I'll add comments.

> Comments are good things to have.  :-)

Yeah, I added comments.

>>> On a broader level, I'm not very happy with the naming this patch
>>> uses.  Here's an example:
>>>
>>> +    <para>
>>> +     If an FDW supports optimizing foreign table updates, it still needs
>>> to
>>> +     provide <function>PlanDMLPushdown</>, <function>BeginDMLPushdown</>,
>>> +     <function>IterateDMLPushdown</> and <function>EndDMLPushdown</>
>>> +     described below.
>>> +    </para>
>>>
>>> "Optimizing foreign table updates" is both inaccurate (since it
>>> doesn't only optimize updates) and so vague as to be meaningless
>>> unless you already know what it means.  The actual patch uses
>>> terminology like "fdwPushdowns" which is just as bad.  We might push a
>>> lot of things to the foreign side -- sorts, joins, aggregates, limits
>>> -- and this is just one of them.  Worse, "pushdown" is itself
>>> something of a term of art - will people who haven't been following
>>> all of the mammoth, multi-hundred-email threads on this topic know
>>> what that means?  I think we need some better terminology here.
>>>
>>> The best thing that I can come up with offhand is "bulk modify".  So
>>> we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify,
>>> EndBulkModify, ExplainBulkModify.  Other suggestions welcome.   The
>>> ResultRelInfo flag could be ri_usesFDWBulkModify.
>>
>> I'm not sure that "bulk modify" is best.  Yeah, this would improve the
>> performance especially in the bulk-modification case, but would improve the
>> performance even in the case where an UPDATE/DELETE modifies just a single
>> row.  Let me explain using an example.  Without the patch, we have the
>> following plan for an UPDATE on a foreign table that updates a single row:
>>
>> postgres=# explain verbose update foo set a = a + 1 where a = 1;
>>                                      QUERY PLAN
>> ----------------------------------------------------------------------------------
>>   Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>>     Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1
>>     ->  Foreign Scan on public.foo  (cost=100.00..101.05 rows=1 width=14)
>>           Output: (a + 1), b, ctid
>>           Remote SQL: SELECT a, b, ctid FROM public.foo WHERE ((a = 1)) FOR
>> UPDATE
>> (5 rows)
>>
>> The plan requires two queries, SELECT and UPDATE, to do the update.
>> (Actually, the plan have additional overheads in creating a cursor for the
>> SELECT and establishing a prepared statement for the UPDATE.)  But with the
>> patch, we have:
>>
>> postgres=# explain verbose update foo set a = a + 1 where a = 1;
>>                                  QUERY PLAN
>> ---------------------------------------------------------------------------
>>   Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>>     ->  Foreign Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>>           Remote SQL: UPDATE public.foo SET a = (a + 1) WHERE ((a = 1))
>> (3 rows)
>>
>> The optimized plan requires just a single UPDATE query to do that!  So, even
>> in the single-row-modification case the patch could improve the performance.
>>
>> How about "Direct Modify"; PlanDirectModify, BeginDirectModify,
>> IterateDirectModify, EndDirectModify, ExplainDirectModify, and
>> ri_usesFDWDirectModify.

> Works for me!

Great!  I changed the naming.  I also updated docs as proposed by you in
a previous email, and rebased the patch to the latest HEAD.  Please find
attached an updated version of the patch.

Best regards,
Etsuro Fujita

Attachment

fdw-dml-pushdown-v10.patch

pgsql-hackers by date:

From: Kyotaro HORIGUCHI
Date: 09 March 2016, 08:30:22
Subject: Re: PATCH: index-only scans with partial indexes

From: Joel Jacobson
Date: 09 March 2016, 08:34:25
Subject: Re: Disabling an index temporarily

Re: Optimization for updating foreign tables in Postgres FDW - Mailing list pgsql-hackers

Attachment

Previous

Next