Thread: Optimization for updating foreign tables in Postgres FDW

Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
Attached is a WIP patch for the following:

/*
 * postgresPlanForeignModify
 *      Plan an insert/update/delete operation on a foreign table
 *
 * Note: currently, the plan tree generated for UPDATE/DELETE will always
 * include a ForeignScan that retrieves ctids (using SELECT FOR UPDATE)
 * and then the ModifyTable node will have to execute individual remote
 * UPDATE/DELETE commands.  If there are no local conditions or joins
 * needed, it'd be better to let the scan node do UPDATE/DELETE RETURNING
 * and then do nothing at ModifyTable.  Room for future optimization ...
 */

In the patch postgresPlanForeignModify has been modified so that if, in
addition to the above condition, the followings are satisfied, then the
ForeignScan and ModifyTable node will work that way.

 - There are no local BEFORE/AFTER triggers.
 - In UPDATE it's safe to evaluate expressions to assign to the target
columns on the remote server.

Here is a simple performance test.

On remote side:
postgres=# create table t (id serial primary key, inserted timestamp
default clock_timestamp(), data text);
CREATE TABLE
postgres=# insert into t(data) select random() from generate_series(0,
99999);
INSERT 0 100000
postgres=# vacuum t;
VACUUM

On local side:
postgres=# create foreign table ft (id integer, inserted timestamp, data
text) server myserver options (table_name 't');
CREATE FOREIGN TABLE

Unpatched:
postgres=# explain analyze verbose delete from ft where id < 10000;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Delete on public.ft  (cost=100.00..162.32 rows=910 width=6) (actual
time=1275.255..1275.255 rows=0 loops=1)
   Remote SQL: DELETE FROM public.t WHERE ctid = $1
   ->  Foreign Scan on public.ft  (cost=100.00..162.32 rows=910 width=6)
(actual time=1.180..52.095 rows=9999 loops=1)
         Output: ctid
         Remote SQL: SELECT ctid FROM public.t WHERE ((id < 10000)) FOR
UPDATE
 Planning time: 0.112 ms
 Execution time: 1275.733 ms
(7 rows)

Patched (Note that the DELETE command has been pushed down.):
postgres=# explain analyze verbose delete from ft where id < 10000;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Delete on public.ft  (cost=100.00..162.32 rows=910 width=6) (actual
time=0.006..0.006 rows=0 loops=1)
   ->  Foreign Scan on public.ft  (cost=100.00..162.32 rows=910 width=6)
(actual time=0.001..0.001 rows=0 loops=1)
         Output: ctid
         Remote SQL: DELETE FROM public.t WHERE ((id < 10000))
 Planning time: 0.101 ms
 Execution time: 8.808 ms
(6 rows)

I'll add this to the next CF.  Comments are welcome.

Thanks,

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Tue, Jul 8, 2014 at 3:07 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Attached is a WIP patch for the following:
>
> /*
>  * postgresPlanForeignModify
>  *      Plan an insert/update/delete operation on a foreign table
>  *
>  * Note: currently, the plan tree generated for UPDATE/DELETE will always
>  * include a ForeignScan that retrieves ctids (using SELECT FOR UPDATE)
>  * and then the ModifyTable node will have to execute individual remote
>  * UPDATE/DELETE commands.  If there are no local conditions or joins
>  * needed, it'd be better to let the scan node do UPDATE/DELETE RETURNING
>  * and then do nothing at ModifyTable.  Room for future optimization ...
>  */
>
> In the patch postgresPlanForeignModify has been modified so that if, in
> addition to the above condition, the followings are satisfied, then the
> ForeignScan and ModifyTable node will work that way.
>
>  - There are no local BEFORE/AFTER triggers.
>  - In UPDATE it's safe to evaluate expressions to assign to the target
> columns on the remote server.
>
> Here is a simple performance test.
>
> On remote side:
> postgres=# create table t (id serial primary key, inserted timestamp
> default clock_timestamp(), data text);
> CREATE TABLE
> postgres=# insert into t(data) select random() from generate_series(0,
> 99999);
> INSERT 0 100000
> postgres=# vacuum t;
> VACUUM
>
> On local side:
> postgres=# create foreign table ft (id integer, inserted timestamp, data
> text) server myserver options (table_name 't');
> CREATE FOREIGN TABLE
>
> Unpatched:
> postgres=# explain analyze verbose delete from ft where id < 10000;
>                                                       QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------
>  Delete on public.ft  (cost=100.00..162.32 rows=910 width=6) (actual
> time=1275.255..1275.255 rows=0 loops=1)
>    Remote SQL: DELETE FROM public.t WHERE ctid = $1
>    ->  Foreign Scan on public.ft  (cost=100.00..162.32 rows=910 width=6)
> (actual time=1.180..52.095 rows=9999 loops=1)
>          Output: ctid
>          Remote SQL: SELECT ctid FROM public.t WHERE ((id < 10000)) FOR
> UPDATE
>  Planning time: 0.112 ms
>  Execution time: 1275.733 ms
> (7 rows)
>
> Patched (Note that the DELETE command has been pushed down.):
> postgres=# explain analyze verbose delete from ft where id < 10000;
>                                                     QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------
>  Delete on public.ft  (cost=100.00..162.32 rows=910 width=6) (actual
> time=0.006..0.006 rows=0 loops=1)
>    ->  Foreign Scan on public.ft  (cost=100.00..162.32 rows=910 width=6)
> (actual time=0.001..0.001 rows=0 loops=1)
>          Output: ctid
>          Remote SQL: DELETE FROM public.t WHERE ((id < 10000))
>  Planning time: 0.101 ms
>  Execution time: 8.808 ms
> (6 rows)
>
> I'll add this to the next CF.  Comments are welcome.

I haven't looked at the code, but +1 for the general idea.  The
concept seems good to me, and that's a very large performance
improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Shigeru Hanada
Date:
Hi Fujita-san,

My coworker Takaaki EITOKU reviewed the patch, and here are some
comments from him.

2014-07-08 16:07 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
...
> In the patch postgresPlanForeignModify has been modified so that if, in
> addition to the above condition, the followings are satisfied, then the
> ForeignScan and ModifyTable node will work that way.
>
>  - There are no local BEFORE/AFTER triggers.

The reason the check ignores INSTEAD OF triggers here is that INSTEAD
OF trigger would prevent executing UPDATE/DELETE statement against a
foreign tables at all, right?

>  - In UPDATE it's safe to evaluate expressions to assign to the target
> columns on the remote server.

IIUC, in addition to target expressions, whole WHERE clause should be
safe to be pushed-down.  Though that is obviously required, but
mentioning that in documents and source comments would help users and
developers.

>
> Here is a simple performance test.
>
> On remote side:
> postgres=# create table t (id serial primary key, inserted timestamp
> default clock_timestamp(), data text);
> CREATE TABLE
> postgres=# insert into t(data) select random() from generate_series(0,
> 99999);
> INSERT 0 100000
> postgres=# vacuum t;
> VACUUM
>
> On local side:
> postgres=# create foreign table ft (id integer, inserted timestamp, data
> text) server myserver options (table_name 't');
> CREATE FOREIGN TABLE
>
> Unpatched:
> postgres=# explain analyze verbose delete from ft where id < 10000;
>                                                       QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------
>  Delete on public.ft  (cost=100.00..162.32 rows=910 width=6) (actual
> time=1275.255..1275.255 rows=0 loops=1)
>    Remote SQL: DELETE FROM public.t WHERE ctid = $1
>    ->  Foreign Scan on public.ft  (cost=100.00..162.32 rows=910 width=6)
> (actual time=1.180..52.095 rows=9999 loops=1)
>          Output: ctid
>          Remote SQL: SELECT ctid FROM public.t WHERE ((id < 10000)) FOR
> UPDATE
>  Planning time: 0.112 ms
>  Execution time: 1275.733 ms
> (7 rows)
>
> Patched (Note that the DELETE command has been pushed down.):
> postgres=# explain analyze verbose delete from ft where id < 10000;
>                                                     QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------
>  Delete on public.ft  (cost=100.00..162.32 rows=910 width=6) (actual
> time=0.006..0.006 rows=0 loops=1)
>    ->  Foreign Scan on public.ft  (cost=100.00..162.32 rows=910 width=6)
> (actual time=0.001..0.001 rows=0 loops=1)
>          Output: ctid
>          Remote SQL: DELETE FROM public.t WHERE ((id < 10000))
>  Planning time: 0.101 ms
>  Execution time: 8.808 ms
> (6 rows)

We found that this patch speeds up DELETE case remarkably, as you
describe above, but we saw only less than 2x speed on UPDATE cases.
Do you have any numbers of UPDATE cases?

Some more random thoughts:

* Naming of new behavior
You named this optimization "Direct Update", but I'm not sure that
this is intuitive enough to express this behavior.  I would like to
hear opinions of native speakers.

* Macros for # of fdw_private elements
In postgres_fdw.c you defined MaxFdwScanFdwPrivateLength and
MinFdwScanFdwPrivateLength for determining the mode, but number of
fdw_private elements is not a ranged value but an enum value (I mean
that fdw_private holds 3 or 7, not 4~6, so Max and Min seems
inappropriate for prefix.  IMO those macros should be named so that
the names represent the behavior, or define a macro to determine the
mode, say IS_SHIPPABLE(foo).

* Comparison of Macros
Comparison against MaxFdwScanFdwPrivateLength and
MinFdwScanFdwPrivateLength should be == instead of >= or <= to detect
unexpected value.  Adding assert macro seems good too.

-- 
Shigeru HANADA



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
(2014/07/24 18:30), Shigeru Hanada wrote:
> My coworker Takaaki EITOKU reviewed the patch, and here are some
> comments from him.

Thank you for the review, Eitoku-san!

> 2014-07-08 16:07 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
>> In the patch postgresPlanForeignModify has been modified so that if, in
>> addition to the above condition, the followings are satisfied, then the
>> ForeignScan and ModifyTable node will work that way.
>>
>>   - There are no local BEFORE/AFTER triggers.
>
> The reason the check ignores INSTEAD OF triggers here is that INSTEAD
> OF trigger would prevent executing UPDATE/DELETE statement against a
> foreign tables at all, right?

I'm not sure that I understand your question correctly, but the reason 
for that is because foreign tables cannot have INSTEAD OF triggers.

>>   - In UPDATE it's safe to evaluate expressions to assign to the target
>> columns on the remote server.
>
> IIUC, in addition to target expressions, whole WHERE clause should be
> safe to be pushed-down.  Though that is obviously required, but
> mentioning that in documents and source comments would help users and
> developers.

OK, I'll add the comments and documentation notes.

(I intentionaly didn't mention that because I think the comment for 
postgresPlanForeignModify has already said the equivalent condition, ie, 
"there are no local conditions", which means all conditions are executed 
remotely.)

> We found that this patch speeds up DELETE case remarkably, as you
> describe above, but we saw only less than 2x speed on UPDATE cases.
> Do you have any numbers of UPDATE cases?

Here is the result for an UPDATE case.
>> On remote side:>> postgres=# create table t (id serial primary key, inserted timestamp>> default clock_timestamp(),
datatext);>> CREATE TABLE>> postgres=# insert into t(data) select random() from generate_series(0,>> 99999);>> INSERT 0
100000>>postgres=# vacuum t;>> VACUUM>>>> On local side:>> postgres=# create foreign table ft (id integer, inserted
timestamp,data>> text) server myserver options (table_name 't');>> CREATE FOREIGN TABLE
 

Unpatched:
postgres=# explain analyze verbose update ft set data = 'notsolongtext' 
where id < 10000;                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
Updateon public.ft  (cost=100.00..147.38 rows=650 width=18) (actual 
 
time=1463.944..1463.944 rows=0 loops=1)   Remote SQL: UPDATE public.t SET data = $2 WHERE ctid = $1   ->  Foreign Scan
onpublic.ft  (cost=100.00..147.38 rows=650 
 
width=18) (actual time=2.069..83.220 rows=9999 loops=1)         Output: id, inserted, 'notsolongtext'::text, ctid
 Remote SQL: SELECT id, inserted, ctid FROM public.t WHERE ((id 
 
< 10000)) FOR UPDATE Planning time: 0.355 ms Execution time: 1470.032 ms
(7 rows)

Patched:
postgres=# explain analyze verbose update ft set data = 'notsolongtext' 
where id < 10000;                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Updateon public.ft  (cost=100.00..147.38 rows=650 width=18) (actual 
 
time=0.005..0.005 rows=0 loops=1)   ->  Foreign Scan on public.ft  (cost=100.00..147.38 rows=650 
width=18) (actual time=0.001..0.001 rows=0 loops=1)         Output: id, inserted, data, ctid         Remote SQL: UPDATE
public.tSET data = 'notsolongtext'::text 
 
WHERE ((id < 10000)) Planning time: 0.197 ms Execution time: 61.519 ms
(6 rows)

I think that the precise effect of this optimization for DELETE/UPDATE 
would depend on eg, data, queries (inc. w/ or w/o RETRUNING clauses) and 
server/network performance.  Could you tell me these information about 
the UPDATE evaluation?

> Some more random thoughts:
>
> * Naming of new behavior
> You named this optimization "Direct Update", but I'm not sure that
> this is intuitive enough to express this behavior.  I would like to
> hear opinions of native speakers.

+1

> * Macros for # of fdw_private elements
> In postgres_fdw.c you defined MaxFdwScanFdwPrivateLength and
> MinFdwScanFdwPrivateLength for determining the mode, but number of
> fdw_private elements is not a ranged value but an enum value (I mean
> that fdw_private holds 3 or 7, not 4~6, so Max and Min seems
> inappropriate for prefix.  IMO those macros should be named so that
> the names represent the behavior, or define a macro to determine the
> mode, say IS_SHIPPABLE(foo).

OK, Will fix.

> * Comparison of Macros
> Comparison against MaxFdwScanFdwPrivateLength and
> MinFdwScanFdwPrivateLength should be == instead of >= or <= to detect
> unexpected value.  Adding assert macro seems good too.

Will fix this too.

Thanks,

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Albe Laurenz
Date:
Shigeru Hanada wrote:
> * Naming of new behavior
> You named this optimization "Direct Update", but I'm not sure that
> this is intuitive enough to express this behavior.  I would like to
> hear opinions of native speakers.

How about "batch foreign update" or "batch foreign modification"?
(Disclaimer: I'm not a native speaker either.)

Yours,
Laurenz Albe

Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Fri, Jul 25, 2014 at 3:39 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> Shigeru Hanada wrote:
>> * Naming of new behavior
>> You named this optimization "Direct Update", but I'm not sure that
>> this is intuitive enough to express this behavior.  I would like to
>> hear opinions of native speakers.
>
> How about "batch foreign update" or "batch foreign modification"?
> (Disclaimer: I'm not a native speaker either.)

I think direct update sounds pretty good.  "Batch" does not sound as
good to me, since it doesn't clearly describe what makes this patch
special as opposed to some other grouping of updates that happens to
produce a speedup.

Another term that might be used is "update pushdown", since we are
pushing the whole update to the remote server instead of having the
local server participate.  Without looking at the patch, I don't have
a strong opinion on whether that's better than "direct update" in this
context.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
(2014/07/29 0:58), Robert Haas wrote:
> On Fri, Jul 25, 2014 at 3:39 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> Shigeru Hanada wrote:
>>> * Naming of new behavior
>>> You named this optimization "Direct Update", but I'm not sure that
>>> this is intuitive enough to express this behavior.  I would like to
>>> hear opinions of native speakers.
>>
>> How about "batch foreign update" or "batch foreign modification"?
>> (Disclaimer: I'm not a native speaker either.)
>
> I think direct update sounds pretty good.  "Batch" does not sound as
> good to me, since it doesn't clearly describe what makes this patch
> special as opposed to some other grouping of updates that happens to
> produce a speedup.

I agree with Robert on that point.

> Another term that might be used is "update pushdown", since we are
> pushing the whole update to the remote server instead of having the
> local server participate.  Without looking at the patch, I don't have
> a strong opinion on whether that's better than "direct update" in this
> context.

"Update Pushdown" is fine with me.

If there are no objections of others, I'll change the name from "Direct 
Update" to "Update Pushdown".

Thanks,

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Pavel Stehule
Date:
Hi


2014-07-30 10:22 GMT+02:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
(2014/07/29 0:58), Robert Haas wrote:
On Fri, Jul 25, 2014 at 3:39 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Shigeru Hanada wrote:
* Naming of new behavior
You named this optimization "Direct Update", but I'm not sure that
this is intuitive enough to express this behavior.  I would like to
hear opinions of native speakers.

How about "batch foreign update" or "batch foreign modification"?
(Disclaimer: I'm not a native speaker either.)

I think direct update sounds pretty good.  "Batch" does not sound as
good to me, since it doesn't clearly describe what makes this patch
special as opposed to some other grouping of updates that happens to
produce a speedup.

I agree with Robert on that point.

Another term that might be used is "update pushdown", since we are
pushing the whole update to the remote server instead of having the
local server participate.  Without looking at the patch, I don't have
a strong opinion on whether that's better than "direct update" in this
context.

"Update Pushdown" is fine with me.

If there are no objections of others, I'll change the name from "Direct Update" to "Update Pushdown".

I like "Update Pushdown" - it is simple without other semantic

Regards

Pavel
 

Thanks,

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Optimization for updating foreign tables in Postgres FDW

From
Shigeru Hanada
Date:
Hi Fujita-san,

Here is a new review result from Eitoku-san.

2014-07-25 16:30 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
> (2014/07/24 18:30), Shigeru Hanada wrote:
> I'm not sure that I understand your question correctly, but the reason for
> that is because foreign tables cannot have INSTEAD OF triggers.

Now I see the reason, but then I worry (though it unlikely happens) a
case that new trigger type might be added in future.  The code says
that "only BEFORE and AFTER triggers are unsafe for direct update",
but it would be more safe to code that "any trigger other than event
trigger is unsafe for direct update".

>> We found that this patch speeds up DELETE case remarkably, as you
>> describe above, but we saw only less than 2x speed on UPDATE cases.
>> Do you have any numbers of UPDATE cases?
>
>
> Here is the result for an UPDATE case.

Hmm, performance gain on UPDATE cases seems similar to our results,
except planning times.  In your environment the patch reduces planning
time too, but we got longer planning times with your patch (in only
once in six trial, we got shorter planning time than average of
patched version).  Could you try multiple times on your environment?

> I think that the precise effect of this optimization for DELETE/UPDATE would
> depend on eg, data, queries (inc. w/ or w/o RETRUNING clauses) and
> server/network performance.  Could you tell me these information about the
> UPDATE evaluation?

I tried on a CentOS 6.5 on VMware on a Note PC with Core i3 1.17GHz,
2.0GB memory and single HDD, so the performance is poor.

The SQLs used for performance test are quite simple, update 10
thousands rows at a time, and repeat it for different section of the
table for six times.  The definition of foreign table ft is same as
the one in your case.

EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >= 0
AND id < 10000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
10000 AND id < 20000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
20000 AND id < 30000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
30000 AND id < 40000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
40000 AND id < 50000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
50000 AND id < 60000;

>> Some more random thoughts:
>>
>> * Naming of new behavior
>> You named this optimization "Direct Update", but I'm not sure that
>> this is intuitive enough to express this behavior.  I would like to
>> hear opinions of native speakers.

Update push-down seems nice with according to others.

-- 
Shigeru HANADA



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
(2014/07/30 17:22), Etsuro Fujita wrote:
> (2014/07/29 0:58), Robert Haas wrote:
>> On Fri, Jul 25, 2014 at 3:39 AM, Albe Laurenz
>> <laurenz.albe@wien.gv.at> wrote:
>>> Shigeru Hanada wrote:
>>>> * Naming of new behavior
>>>> You named this optimization "Direct Update", but I'm not sure that
>>>> this is intuitive enough to express this behavior.  I would like to
>>>> hear opinions of native speakers.
>>>
>>> How about "batch foreign update" or "batch foreign modification"?
>>> (Disclaimer: I'm not a native speaker either.)
>>
>> I think direct update sounds pretty good.  "Batch" does not sound as
>> good to me, since it doesn't clearly describe what makes this patch
>> special as opposed to some other grouping of updates that happens to
>> produce a speedup.
>
> I agree with Robert on that point.
>
>> Another term that might be used is "update pushdown", since we are
>> pushing the whole update to the remote server instead of having the
>> local server participate.  Without looking at the patch, I don't have
>> a strong opinion on whether that's better than "direct update" in this
>> context.
>
> "Update Pushdown" is fine with me.
>
> If there are no objections of others, I'll change the name from "Direct
> Update" to "Update Pushdown".

Done.  (I've left deparseDirectUpdateSql/deparseDirectDeleteSql as-is,
though.)

Other changes:

* Address the comments from Eitoku-san.
* Add regression tests.
* Fix a bug, which fails to show the actual row counts in EXPLAIN
ANALYZE for UPDATE/DELETE without a RETURNING clause.
* Rebase to HEAD.

Please find attached an updated version of the patch.

Thanks,

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
Hi Hanada-san,

Thank you for the answer.

(2014/08/04 19:36), Shigeru Hanada wrote:
> 2014-07-25 16:30 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
>> (2014/07/24 18:30), Shigeru Hanada wrote:
>> I'm not sure that I understand your question correctly, but the reason for
>> that is because foreign tables cannot have INSTEAD OF triggers.

> Now I see the reason, but then I worry (though it unlikely happens) a
> case that new trigger type might be added in future.  The code says
> that "only BEFORE and AFTER triggers are unsafe for direct update",
> but it would be more safe to code that "any trigger other than event
> trigger is unsafe for direct update".

Yeah, I've revised the comment for that in the updated version of the 
patch I sent in just now.  Could you check it?

>>> We found that this patch speeds up DELETE case remarkably, as you
>>> describe above, but we saw only less than 2x speed on UPDATE cases.
>>> Do you have any numbers of UPDATE cases?

> Hmm, performance gain on UPDATE cases seems similar to our results,
> except planning times.  In your environment the patch reduces planning
> time too, but we got longer planning times with your patch (in only
> once in six trial, we got shorter planning time than average of
> patched version).  Could you try multiple times on your environment?

No.  Is the overhead so large that it cannot be ignored?

>> I think that the precise effect of this optimization for DELETE/UPDATE would
>> depend on eg, data, queries (inc. w/ or w/o RETRUNING clauses) and
>> server/network performance.  Could you tell me these information about the
>> UPDATE evaluation?

> I tried on a CentOS 6.5 on VMware on a Note PC with Core i3 1.17GHz,
> 2.0GB memory and single HDD, so the performance is poor.
>
> The SQLs used for performance test are quite simple, update 10
> thousands rows at a time, and repeat it for different section of the
> table for six times.  The definition of foreign table ft is same as
> the one in your case.
>
> EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >= 0
> AND id < 10000;
> EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
> 10000 AND id < 20000;
> EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
> 20000 AND id < 30000;
> EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
> 30000 AND id < 40000;
> EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
> 40000 AND id < 50000;
> EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
> 50000 AND id < 60000;

OK  I also will evaluate the performance under the same workloads.

>>> Some more random thoughts:
>>>
>>> * Naming of new behavior
>>> You named this optimization "Direct Update", but I'm not sure that
>>> this is intuitive enough to express this behavior.  I would like to
>>> hear opinions of native speakers.

> Update push-down seems nice with according to others.

The name has been changed in the updated version.

Thanks,

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Shigeru Hanada
Date:
Hi Fujita-san,

Issues addressed by Eitoku-san were fixed properly, but he found a bug
and a possible enhancement  in the v2 patch.

* push-down check misses delete triggers
update_is_pushdown_safe() seems to have a bug that it misses the
existence of row-level delete trigger.  DELETE statement executed
against a foreign table which has row-level delete trigger is pushed
down to remote, and consequently no row-level delete trigger is fired.

* further optimization
Is there any chance to consider further optimization by passing the
operation type (UPDATE|DELETE) of undergoing statement to
update_is_pushdown_safe()?  It seems safe to push down UPDATE
statement when the target foreign table has no update trigger even it
has a delete trigger (of course the opposite combination would be also
fine).

* Documentation
The requirement of pushing down UPDATE/DELETE statements would not be
easy to understand for non-expert users, so it seems that there is a
room to enhance documentation.  An idea is to define which expression
is safe to send to remote first (it might need to mention the
difference of semantics), and refer the definition from the place
describing the requirement of pushing-down for SELECT, UPDATE and
DELETE.

2014-08-04 20:30 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
> (2014/07/30 17:22), Etsuro Fujita wrote:
>>
>> (2014/07/29 0:58), Robert Haas wrote:
>>>
>>> On Fri, Jul 25, 2014 at 3:39 AM, Albe Laurenz
>>> <laurenz.albe@wien.gv.at> wrote:
>>>>
>>>> Shigeru Hanada wrote:
>>>>>
>>>>> * Naming of new behavior
>>>>> You named this optimization "Direct Update", but I'm not sure that
>>>>> this is intuitive enough to express this behavior.  I would like to
>>>>> hear opinions of native speakers.
>>>>
>>>>
>>>> How about "batch foreign update" or "batch foreign modification"?
>>>> (Disclaimer: I'm not a native speaker either.)
>>>
>>>
>>> I think direct update sounds pretty good.  "Batch" does not sound as
>>> good to me, since it doesn't clearly describe what makes this patch
>>> special as opposed to some other grouping of updates that happens to
>>> produce a speedup.
>>
>>
>> I agree with Robert on that point.
>>
>>> Another term that might be used is "update pushdown", since we are
>>> pushing the whole update to the remote server instead of having the
>>> local server participate.  Without looking at the patch, I don't have
>>> a strong opinion on whether that's better than "direct update" in this
>>> context.
>>
>>
>> "Update Pushdown" is fine with me.
>>
>> If there are no objections of others, I'll change the name from "Direct
>> Update" to "Update Pushdown".
>
>
> Done.  (I've left deparseDirectUpdateSql/deparseDirectDeleteSql as-is,
> though.)
>
> Other changes:
>
> * Address the comments from Eitoku-san.
> * Add regression tests.
> * Fix a bug, which fails to show the actual row counts in EXPLAIN ANALYZE
> for UPDATE/DELETE without a RETURNING clause.
> * Rebase to HEAD.
>
> Please find attached an updated version of the patch.
>
>
> Thanks,
>
> Best regards,
> Etsuro Fujita



-- 
Shigeru HANADA



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
(2014/08/12 18:34), Shigeru Hanada wrote:
> Issues addressed by Eitoku-san were fixed properly, but he found a bug
> and a possible enhancement  in the v2 patch.

Thank you for the review, Hanada-san and Eitoku-san!

> * push-down check misses delete triggers
> update_is_pushdown_safe() seems to have a bug that it misses the
> existence of row-level delete trigger.  DELETE statement executed
> against a foreign table which has row-level delete trigger is pushed
> down to remote, and consequently no row-level delete trigger is fired.

Ah, I noticed that the current code for that is not correct.  Will fix.

> * further optimization
> Is there any chance to consider further optimization by passing the
> operation type (UPDATE|DELETE) of undergoing statement to
> update_is_pushdown_safe()?  It seems safe to push down UPDATE
> statement when the target foreign table has no update trigger even it
> has a delete trigger (of course the opposite combination would be also
> fine).

Good idea!  Will improve that too.

> * Documentation
> The requirement of pushing down UPDATE/DELETE statements would not be
> easy to understand for non-expert users, so it seems that there is a
> room to enhance documentation.  An idea is to define which expression
> is safe to send to remote first (it might need to mention the
> difference of semantics), and refer the definition from the place
> describing the requirement of pushing-down for SELECT, UPDATE and
> DELETE.

Yeah, I also think that it would not necessarily easy for the users to 
understand which expression is safe to send.  So I agree with that 
enhancement, but ISTM that it would be better to do that as a separate 
patch.

Thanks,

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Albe Laurenz
Date:
Etsuro Fujita wrote:
> Done.  (I've left deparseDirectUpdateSql/deparseDirectDeleteSql as-is,
> though.)
> 
> Other changes:
> 
> * Address the comments from Eitoku-san.
> * Add regression tests.
> * Fix a bug, which fails to show the actual row counts in EXPLAIN
> ANALYZE for UPDATE/DELETE without a RETURNING clause.
> * Rebase to HEAD.
> 
> Please find attached an updated version of the patch.

Here is my review:

The patch Applies fine, Builds without warning and passes make Check,
so the ABC of patch reviewing is fine.

I played with it, and apart from Hanada's comments I have found the following:

test=> EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id > 3;
           QUERY PLAN
 

----------------------------------------------------------------------------------------------------------------------------------Update
onlaurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.005..0.005 rows=0 loops=1)  ->  Foreign
Scanon laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.002..0.002 rows=299997 loops=1)
Output: id, val, ctid        Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE ((id > 3))Planning time: 0.179
msExecutiontime: 3706.919 ms
 
(6 rows)

Time: 3708.272 ms

The "actual time" readings are surprising.
Shouldn't these similar to the actual execution time, since most of the time is spent
in the foreign scan node?

Reading the code, I noticed that the pushed down UPDATE or DELETE statement is executed
during postgresBeginForeignScan rather than during postgresIterateForeignScan.
It probably does not matter, but is there a reason to do it different from the normal scan?

It is not expected that postgresReScanForeignScan is called when the UPDATE/DELETE
is pushed down, right?  Maybe it would make sense to add an assertion for that.

I ran a simple performance test and found that performance is improved as expected;
updating 100000 rows took 1000 rather than 8000 ms, and DELETING the same amount
took 200 instead of 6500 ms.

Yours,
Laurenz Albe

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
(2014/08/25 21:58), Albe Laurenz wrote:
> Here is my review:

Thank you for the review!

> I played with it, and apart from Hanada's comments I have found the following:
>
> test=> EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id > 3;
>                                                              QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------
>   Update on laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.005..0.005 rows=0 loops=1)
>     ->  Foreign Scan on laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.002..0.002
rows=299997loops=1)
 
>           Output: id, val, ctid
>           Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE ((id > 3))
>   Planning time: 0.179 ms
>   Execution time: 3706.919 ms
> (6 rows)
>
> Time: 3708.272 ms
>
> The "actual time" readings are surprising.
> Shouldn't these similar to the actual execution time, since most of the time is spent
> in the foreign scan node?

I was also thinkng that this is confusing to the users.  I think this is 
because the patch executes the UPDATE/DELETE statement during 
postgresBeginForeignScan, not postgresIterateForeignScan, as you 
mentioned below:

> Reading the code, I noticed that the pushed down UPDATE or DELETE statement is executed
> during postgresBeginForeignScan rather than during postgresIterateForeignScan.
> It probably does not matter, but is there a reason to do it different from the normal scan?

I'll modify the patch so as to execute the statement during 
postgresIterateForeignScan.

> It is not expected that postgresReScanForeignScan is called when the UPDATE/DELETE
> is pushed down, right?  Maybe it would make sense to add an assertion for that.

IIUC, that is right.  As ModifyTable doesn't support rescan currently, 
postgresReScanForeignScan needn't to be called in the update pushdown 
case.  The assertion is a good idea.  I'll add it.

Thanks,

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Mon, Aug 25, 2014 at 8:58 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> Reading the code, I noticed that the pushed down UPDATE or DELETE statement is executed
> during postgresBeginForeignScan rather than during postgresIterateForeignScan.
> It probably does not matter, but is there a reason to do it different from the normal scan?

Hmm, I'm worried that may be an API contract violation.  ISTM that we
might initialize nodes that we never read from - they can show up in
the EXPLAIN-plan as (never executed) - and things that aren't executed
shouldn't do work, especially work that permanently modifies data.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Aug 25, 2014 at 8:58 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> Reading the code, I noticed that the pushed down UPDATE or DELETE statement is executed
>> during postgresBeginForeignScan rather than during postgresIterateForeignScan.
>> It probably does not matter, but is there a reason to do it different from the normal scan?

> Hmm, I'm worried that may be an API contract violation.

Indeed it is.  You could get away with it if you check the
EXEC_FLAG_EXPLAIN_ONLY flag before doing anything with visible
side-effects, but it's still pretty ugly.
        regards, tom lane



Re: Optimization for updating foreign tables in Postgres FDW

From
Tom Lane
Date:
I wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Hmm, I'm worried that may be an API contract violation.

> Indeed it is.  You could get away with it if you check the
> EXEC_FLAG_EXPLAIN_ONLY flag before doing anything with visible
> side-effects, but it's still pretty ugly.

Actually, there's another problem there.  What of UPDATE or DELETE with a
LIMIT clause, which is something that seems to be coming down the pike:
https://commitfest.postgresql.org/action/patch_view?id=1550
        regards, tom lane



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
(2014/08/27 22:56), Robert Haas wrote:
> On Mon, Aug 25, 2014 at 8:58 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> Reading the code, I noticed that the pushed down UPDATE or DELETE statement is executed
>> during postgresBeginForeignScan rather than during postgresIterateForeignScan.
>> It probably does not matter, but is there a reason to do it different from the normal scan?
>
> Hmm, I'm worried that may be an API contract violation.

Will fix.

Thanks,

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
(2014/08/27 23:05), Tom Lane wrote:
> I wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Hmm, I'm worried that may be an API contract violation.

> Actually, there's another problem there.  What of UPDATE or DELETE with a
> LIMIT clause, which is something that seems to be coming down the pike:
> https://commitfest.postgresql.org/action/patch_view?id=1550

I'd like to try to extend the functionality so as to push
UPDATE/DELETE-with-LIMIT down into the remote server if it's safe.  But
I don't yet know if it's possible, because I started looking into the
UPDATE/DELETE-with-LIMIT patch just now ...

Thanks,

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
(2014/08/13 12:40), Etsuro Fujita wrote:
> (2014/08/12 18:34), Shigeru Hanada wrote:
>> Issues addressed by Eitoku-san were fixed properly, but he found a bug
>> and a possible enhancement  in the v2 patch.

>> * push-down check misses delete triggers
>> update_is_pushdown_safe() seems to have a bug that it misses the
>> existence of row-level delete trigger.  DELETE statement executed
>> against a foreign table which has row-level delete trigger is pushed
>> down to remote, and consequently no row-level delete trigger is fired.
>
> Ah, I noticed that the current code for that is not correct.  Will fix.

Done.

>> * further optimization
>> Is there any chance to consider further optimization by passing the
>> operation type (UPDATE|DELETE) of undergoing statement to
>> update_is_pushdown_safe()?  It seems safe to push down UPDATE
>> statement when the target foreign table has no update trigger even it
>> has a delete trigger (of course the opposite combination would be also
>> fine).
>
> Good idea!  Will improve that too.

Done.

>> * Documentation
>> The requirement of pushing down UPDATE/DELETE statements would not be
>> easy to understand for non-expert users, so it seems that there is a
>> room to enhance documentation.  An idea is to define which expression
>> is safe to send to remote first (it might need to mention the
>> difference of semantics), and refer the definition from the place
>> describing the requirement of pushing-down for SELECT, UPDATE and
>> DELETE.
>
> Yeah, I also think that it would not necessarily easy for the users to
> understand which expression is safe to send.  So I agree with that
> enhancement, but ISTM that it would be better to do that as a separate
> patch.

As above, I'd like to leave this as another patch.

Please find attached the updated version of the patch.

Thanks,

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
(2014/08/26 12:20), Etsuro Fujita wrote:
> (2014/08/25 21:58), Albe Laurenz wrote:
>> I played with it, and apart from Hanada's comments I have found the
>> following:
>>
>> test=> EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id > 3;
>>                                                              QUERY PLAN
>>
----------------------------------------------------------------------------------------------------------------------------------
>>
>>   Update on laurenz.rtest  (cost=100.00..14134.40 rows=299970
>> width=10) (actual time=0.005..0.005 rows=0 loops=1)
>>     ->  Foreign Scan on laurenz.rtest  (cost=100.00..14134.40
>> rows=299970 width=10) (actual time=0.002..0.002 rows=299997 loops=1)
>>           Output: id, val, ctid
>>           Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE
>> ((id > 3))
>>   Planning time: 0.179 ms
>>   Execution time: 3706.919 ms
>> (6 rows)
>>
>> Time: 3708.272 ms
>>
>> The "actual time" readings are surprising.
>> Shouldn't these similar to the actual execution time, since most of
>> the time is spent
>> in the foreign scan node?
>
> I was also thinkng that this is confusing to the users.  I think this is
> because the patch executes the UPDATE/DELETE statement during
> postgresBeginForeignScan, not postgresIterateForeignScan, as you
> mentioned below:
>
>> Reading the code, I noticed that the pushed down UPDATE or DELETE
>> statement is executed
>> during postgresBeginForeignScan rather than during
>> postgresIterateForeignScan.

> I'll modify the patch so as to execute the statement during
> postgresIterateForeignScan.

Done.

>> It is not expected that postgresReScanForeignScan is called when the
>> UPDATE/DELETE
>> is pushed down, right?  Maybe it would make sense to add an assertion
>> for that.
>
> IIUC, that is right.  As ModifyTable doesn't support rescan currently,
> postgresReScanForeignScan needn't to be called in the update pushdown
> case.  The assertion is a good idea.  I'll add it.

Done.

You can find the updated version of the patch at

http://www.postgresql.org/message-id/53FFFA50.6020007@lab.ntt.co.jp

Thanks,

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Shigeru Hanada
Date:
I confirmed performance gain accomplished by this patch.

This patch makes update queries ~50x faster, and even hit-miss update
is 3x faster than original.  Of course benefit is only for queries
whose qualifiers are enough simple so that they can be pushied down
fully, but this improvement is remarkable.

This patch avoids 1) SELECT for determining target rows, and 2)
repeated per-row UPDATE/DELETE in particular situation, so I assumed
that the gain is larger for bulk update, and it's true indeed, but in
fact even hit-miss update (0 row affected) become faster enough.  This
would come from the omission of SELECT preceding repeated
UPDATE/DELETE.

I was little worried about overhead in planning phase, but fluctuation
was less than 1ms, so it's negligible.

Measurement Result
==================

Note: numbers below are "execution time" of EXPLAIN ANALYZE, and
average of five runs
--------------+-------------+-----------+--------
rows affected |    original |   patched |   gain
--------------+-------------+-----------+--------           0 |       4.841 |     1.548 |  3.13x           1 |
6.944|     1.793 |  3.87x         100 |     174.420 |     5.167 | 33.76x      10,000 |   8,215.551 |   163.832 | 50.15x
   100,000 |  78,135.905 | 1,595.739 | 48.97x     200,000 | 179,784.928 | 4,305.856 | 41.75x
 
--------------+-------------+-----------+--------

Measurement procedure
=====================

[Local side]
1) Create foreign table which refers pgbench_accounts on the remote side

[Remote side]
2) pgbench -i -s 100
3) Execute ANALYZE
4) Restart PostgreSQL to clear shared buffers

[Local side]
5) Execute ANALYZE against foreign table
6) Execute UPDATE SQL against foreign table once for warm the cache
7) Execute UPDATE SQL against foreign table five times

Test SQL for 10000-rows cas is below, only aid condition is changed
according to measurement variation.

EXPLAIN ANALYZE VERBOSE UPDATE ft_pgbench_accounts SET bid=bid+1,
abalance=abalance+1, filler='update test' WHERE aid<=10000;

2014-08-29 12:59 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
> (2014/08/26 12:20), Etsuro Fujita wrote:
>>
>> (2014/08/25 21:58), Albe Laurenz wrote:
>>>
>>> I played with it, and apart from Hanada's comments I have found the
>>> following:
>>>
>>> test=> EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id > 3;
>>>                                                              QUERY PLAN
>>>
>>>
----------------------------------------------------------------------------------------------------------------------------------
>>>
>>>   Update on laurenz.rtest  (cost=100.00..14134.40 rows=299970
>>> width=10) (actual time=0.005..0.005 rows=0 loops=1)
>>>     ->  Foreign Scan on laurenz.rtest  (cost=100.00..14134.40
>>> rows=299970 width=10) (actual time=0.002..0.002 rows=299997 loops=1)
>>>           Output: id, val, ctid
>>>           Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE
>>> ((id > 3))
>>>   Planning time: 0.179 ms
>>>   Execution time: 3706.919 ms
>>> (6 rows)
>>>
>>> Time: 3708.272 ms
>>>
>>> The "actual time" readings are surprising.
>>> Shouldn't these similar to the actual execution time, since most of
>>> the time is spent
>>> in the foreign scan node?
>>
>>
>> I was also thinkng that this is confusing to the users.  I think this is
>> because the patch executes the UPDATE/DELETE statement during
>> postgresBeginForeignScan, not postgresIterateForeignScan, as you
>> mentioned below:
>>
>>> Reading the code, I noticed that the pushed down UPDATE or DELETE
>>> statement is executed
>>> during postgresBeginForeignScan rather than during
>>> postgresIterateForeignScan.
>
>
>> I'll modify the patch so as to execute the statement during
>> postgresIterateForeignScan.
>
>
> Done.
>
>
>>> It is not expected that postgresReScanForeignScan is called when the
>>> UPDATE/DELETE
>>> is pushed down, right?  Maybe it would make sense to add an assertion
>>> for that.
>>
>>
>> IIUC, that is right.  As ModifyTable doesn't support rescan currently,
>> postgresReScanForeignScan needn't to be called in the update pushdown
>> case.  The assertion is a good idea.  I'll add it.
>
>
> Done.
>
> You can find the updated version of the patch at
>
> http://www.postgresql.org/message-id/53FFFA50.6020007@lab.ntt.co.jp
>
>
> Thanks,
>
> Best regards,
> Etsuro Fujita



-- 
Shigeru HANADA



Re: Optimization for updating foreign tables in Postgres FDW

From
Albe Laurenz
Date:
Etsuro Fujita wrote:
> Please find attached the updated version of the patch.

I gave it a spin and could not find any undesirable behaviour, and the
output of EXPLAIN ANALYZE looks like I'd expect.

I noticed that you use the list length of fdw_private to check if
the UPDATE or DELETE is pushed down to the remote server or not.

While this works fine, I wonder if it wouldn't be better to have some
explicit flag in fdw_private for that purpose.  Future modifications that
change the list length might easily overlook that it is used for this
purpose, thereby breaking the code.

Other than that it looks alright to me.

Yours,
Laurenz Albe 

Re: Optimization for updating foreign tables in Postgres FDW

From
Albe Laurenz
Date:
I wrote:
> I gave it a spin and could not find any undesirable behaviour, and the
> output of EXPLAIN ANALYZE looks like I'd expect.
> 
> I noticed that you use the list length of fdw_private to check if
> the UPDATE or DELETE is pushed down to the remote server or not.
> 
> While this works fine, I wonder if it wouldn't be better to have some
> explicit flag in fdw_private for that purpose.  Future modifications that
> change the list length might easily overlook that it is used for this
> purpose, thereby breaking the code.
> 
> Other than that it looks alright to me.

Maybe I should have mentioned that I have set the patch to "Waiting for Author"
because I'd like to hear your opinion on that, but I'm prepared to set it
to "Ready for Committer" soon.

Yours,
Laurenz Albe

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
(2014/09/08 16:18), Albe Laurenz wrote:
> I wrote:
>> I gave it a spin and could not find any undesirable behaviour, and the
>> output of EXPLAIN ANALYZE looks like I'd expect.

Thank you for the review!

>> I noticed that you use the list length of fdw_private to check if
>> the UPDATE or DELETE is pushed down to the remote server or not.
>>
>> While this works fine, I wonder if it wouldn't be better to have some
>> explicit flag in fdw_private for that purpose.  Future modifications that
>> change the list length might easily overlook that it is used for this
>> purpose, thereby breaking the code.

>> Other than that it looks alright to me.

> Maybe I should have mentioned that I have set the patch to "Waiting for Author"
> because I'd like to hear your opinion on that, but I'm prepared to set it
> to "Ready for Committer" soon.

I agree with you on that point.  So, I've updated the patch to have the
explicit flag, as you proposed.  Attached is the updated version of the
patch.  In this version, I've also revised code and its comments a bit.

Sorry for the delay.

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Albe Laurenz
Date:
Etsuro Fujita wrote:
> I agree with you on that point.  So, I've updated the patch to have the
> explicit flag, as you proposed.  Attached is the updated version of the
> patch.  In this version, I've also revised code and its comments a bit.

Thank you, I have set the patch to "Ready for Committer".

Yours,
Laurenz Albe

Re: Optimization for updating foreign tables in Postgres FDW

From
Stephen Frost
Date:
* Albe Laurenz (laurenz.albe@wien.gv.at) wrote:
> Etsuro Fujita wrote:
> > I agree with you on that point.  So, I've updated the patch to have the
> > explicit flag, as you proposed.  Attached is the updated version of the
> > patch.  In this version, I've also revised code and its comments a bit.
>
> Thank you, I have set the patch to "Ready for Committer".

I had a few minutes, so I started looking at this patch and I definitely
like where it's going.  One concern which was brought up that I didn't
see completely addressed was around UPDATE/DELETE with LIMIT, which
seems to be making progress towards getting in.  Presumably, we'd simply
disallow this optimization in that case, but we'll need a way to
identify that case..

I have to admit that, while I applaud the effort made to have this
change only be to postgres_fdw, I'm not sure that having the
update/delete happening during the Scan phase and then essentially
no-op'ing the ExecForeignUpdate/ExecForeignDelete is entirely in-line
with the defined API.

I would have thought we'd add a capability check function that says "can
this Modify be completely pushed down" and then have a way for that to
happen in ExecForeignUpdate/ExecForeignDelete.  That means changes to
the existing API, of course, and if people feel that's unnecessary then
I'd suggest that we need to at least document that we're willing to
support these bulk operations happening on the remote siude during the
pre-Modify Scan and not during the ExecForeignUpdate/ExecForeignDelete.
Thanks,
    Stephen

Re: Optimization for updating foreign tables in Postgres FDW

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> I have to admit that, while I applaud the effort made to have this
> change only be to postgres_fdw, I'm not sure that having the
> update/delete happening during the Scan phase and then essentially
> no-op'ing the ExecForeignUpdate/ExecForeignDelete is entirely in-line
> with the defined API.

Yeah, I've started looking at this patch and that seemed like not
necessarily such a wise choice.  I think it'd be better if the generated
plan tree had a different structure in this case.  The existing approach
is an impressive hack but it's hard to call it anything but a hack.

I'm not sure offhand what the new plan tree ought to look like.  We could
just generate a ForeignScan node, but that seems like rather a misnomer.
Is it worth inventing a new ForeignUpdate node type?  Or maybe it'd still
be ForeignScan but with a flag field saying "hey this is really an update
(or a delete)".  The main benefit I can think of right now is that the
EXPLAIN output would be less strange-looking --- but EXPLAIN is hardly
the only thing that ever looks at plan trees, so having an outright
misleading plan structure is likely to burn us down the line.

One advantage of getting the core code involved in the decision about
whether an update/delete can be pushed down is that FDW-independent checks
like whether there are relevant triggers could be implemented in the core
code, rather than having to duplicate them (and later maintain them) in
every FDW that wants to do this.  OTOH, maybe the trigger issue is really
the only thing that could be shared, not sure.  Stuff like "is there a
LIMIT" probably has to be in the FDW since some FDWs could support pushing
down LIMIT and others not.
        regards, tom lane



Re: Optimization for updating foreign tables in Postgres FDW

From
Stephen Frost
Date:
Tom, all,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Yeah, I've started looking at this patch and that seemed like not
> necessarily such a wise choice.  I think it'd be better if the generated
> plan tree had a different structure in this case.  The existing approach
> is an impressive hack but it's hard to call it anything but a hack.

Agreed.

> I'm not sure offhand what the new plan tree ought to look like.  We could
> just generate a ForeignScan node, but that seems like rather a misnomer.
> Is it worth inventing a new ForeignUpdate node type?  Or maybe it'd still
> be ForeignScan but with a flag field saying "hey this is really an update
> (or a delete)".  The main benefit I can think of right now is that the
> EXPLAIN output would be less strange-looking --- but EXPLAIN is hardly
> the only thing that ever looks at plan trees, so having an outright
> misleading plan structure is likely to burn us down the line.

My initial reaction is that a ForeignUpdate node makes the most sense.
That said, I've not invested any time trying to flesh that out, so take
my gut feeling for what it's worth.

> One advantage of getting the core code involved in the decision about
> whether an update/delete can be pushed down is that FDW-independent checks
> like whether there are relevant triggers could be implemented in the core
> code, rather than having to duplicate them (and later maintain them) in
> every FDW that wants to do this.  OTOH, maybe the trigger issue is really
> the only thing that could be shared, not sure.  Stuff like "is there a
> LIMIT" probably has to be in the FDW since some FDWs could support pushing
> down LIMIT and others not.

Yeah, agreed.  I feel like we'll definitely want to support pushing down
LIMIT and it'll depend on the FDW as to if it can handle that or not
(and how it handles it, if it does).  My thinking was actually that we'd
ask the FDW if it supports pushing down the UPDATE or DELETE statement,
as I suspect not all FDWs will want to answer that question the same way
for every case.
    Thanks,
        Stephen

Re: Optimization for updating foreign tables in Postgres FDW

From
Albe Laurenz
Date:
Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> I have to admit that, while I applaud the effort made to have this
>> change only be to postgres_fdw, I'm not sure that having the
>> update/delete happening during the Scan phase and then essentially
>> no-op'ing the ExecForeignUpdate/ExecForeignDelete is entirely in-line
>> with the defined API.
> 
> Yeah, I've started looking at this patch and that seemed like not
> necessarily such a wise choice.  I think it'd be better if the generated
> plan tree had a different structure in this case.  The existing approach
> is an impressive hack but it's hard to call it anything but a hack.

I guess that the idea is inspired by this comment on postgres_fdw.c:
* Note: currently, the plan tree generated for UPDATE/DELETE will always* include a ForeignScan that retrieves ctids
(usingSELECT FOR UPDATE)* and then the ModifyTable node will have to execute individual remote* UPDATE/DELETE commands.
If there are no local conditions or joins* needed, it'd be better to let the scan node do UPDATE/DELETE RETURNING* and
thendo nothing at ModifyTable.  Room for future optimization ...
 

> I'm not sure offhand what the new plan tree ought to look like.  We could
> just generate a ForeignScan node, but that seems like rather a misnomer.
> Is it worth inventing a new ForeignUpdate node type?  Or maybe it'd still
> be ForeignScan but with a flag field saying "hey this is really an update
> (or a delete)".  The main benefit I can think of right now is that the
> EXPLAIN output would be less strange-looking --- but EXPLAIN is hardly
> the only thing that ever looks at plan trees, so having an outright
> misleading plan structure is likely to burn us down the line.

I can understand these qualms.
I wonder if "ForeignUpdate" is such a good name though, since it would
surprise the uninitiate that in the regular (no push-down) case the
actual modification is *not* performed by ForeignUpdate.
So it should rather be a "ForeignModifyingScan", but I personally would
prefer a "has_side_effects" flag on ForeignScan.

> One advantage of getting the core code involved in the decision about
> whether an update/delete can be pushed down is that FDW-independent checks
> like whether there are relevant triggers could be implemented in the core
> code, rather than having to duplicate them (and later maintain them) in
> every FDW that wants to do this.  OTOH, maybe the trigger issue is really
> the only thing that could be shared, not sure.  Stuff like "is there a
> LIMIT" probably has to be in the FDW since some FDWs could support pushing
> down LIMIT and others not.

You are right, the gain would probably be limited.

Yours,
Laurenz Albe

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
(2014/09/12 16:03), Albe Laurenz wrote:
> Tom Lane wrote:
>> Stephen Frost <sfrost@snowman.net> writes:
>>> I have to admit that, while I applaud the effort made to have this
>>> change only be to postgres_fdw, I'm not sure that having the
>>> update/delete happening during the Scan phase and then essentially
>>> no-op'ing the ExecForeignUpdate/ExecForeignDelete is entirely in-line
>>> with the defined API.
>>
>> Yeah, I've started looking at this patch and that seemed like not
>> necessarily such a wise choice.  I think it'd be better if the generated
>> plan tree had a different structure in this case.  The existing approach
>> is an impressive hack but it's hard to call it anything but a hack.

Thank you for the review, Tom and Stephen!

> I guess that the idea is inspired by this comment on postgres_fdw.c:
>
>   * Note: currently, the plan tree generated for UPDATE/DELETE will always
>   * include a ForeignScan that retrieves ctids (using SELECT FOR UPDATE)
>   * and then the ModifyTable node will have to execute individual remote
>   * UPDATE/DELETE commands.  If there are no local conditions or joins
>   * needed, it'd be better to let the scan node do UPDATE/DELETE RETURNING
>   * and then do nothing at ModifyTable.  Room for future optimization ...

That's right.  Thanks, Laurenz!

And in addition to that, I've created the patch with the conscious aim 
of not getting the core code involved, because I was thinking this is 
just an optimization.  But I have to admit I was conscious of that too much.

>> I'm not sure offhand what the new plan tree ought to look like.  We could
>> just generate a ForeignScan node, but that seems like rather a misnomer.
>> Is it worth inventing a new ForeignUpdate node type?  Or maybe it'd still
>> be ForeignScan but with a flag field saying "hey this is really an update
>> (or a delete)".  The main benefit I can think of right now is that the
>> EXPLAIN output would be less strange-looking --- but EXPLAIN is hardly
>> the only thing that ever looks at plan trees, so having an outright
>> misleading plan structure is likely to burn us down the line.
>
> I can understand these qualms.
> I wonder if "ForeignUpdate" is such a good name though, since it would
> surprise the uninitiate that in the regular (no push-down) case the
> actual modification is *not* performed by ForeignUpdate.
> So it should rather be a "ForeignModifyingScan", but I personally would
> prefer a "has_side_effects" flag on ForeignScan.

+1 for improving the EXPLAIN output by inventing a plan tree with a 
different structure in this case, in general.

>> One advantage of getting the core code involved in the decision about
>> whether an update/delete can be pushed down is that FDW-independent checks
>> like whether there are relevant triggers could be implemented in the core
>> code, rather than having to duplicate them (and later maintain them) in
>> every FDW that wants to do this.

Good idea!

>> Stuff like "is there a
>> LIMIT" probably has to be in the FDW since some FDWs could support pushing
>> down LIMIT and others not.

That's what I have in mind.  I'll work on that at the next CF.

Thanks,

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Tom Lane
Date:
Albe Laurenz <laurenz.albe@wien.gv.at> writes:
> Tom Lane wrote:
>> I'm not sure offhand what the new plan tree ought to look like.  We could
>> just generate a ForeignScan node, but that seems like rather a misnomer.
>> Is it worth inventing a new ForeignUpdate node type?  Or maybe it'd still
>> be ForeignScan but with a flag field saying "hey this is really an update
>> (or a delete)".  The main benefit I can think of right now is that the
>> EXPLAIN output would be less strange-looking --- but EXPLAIN is hardly
>> the only thing that ever looks at plan trees, so having an outright
>> misleading plan structure is likely to burn us down the line.

> I can understand these qualms.
> I wonder if "ForeignUpdate" is such a good name though, since it would
> surprise the uninitiate that in the regular (no push-down) case the
> actual modification is *not* performed by ForeignUpdate.
> So it should rather be a "ForeignModifyingScan", but I personally would
> prefer a "has_side_effects" flag on ForeignScan.

I was envisioning that the EXPLAIN output would look like
       Foreign Scan on tab1         Remote SQL: SELECT ...

for the normal case, versus
       Foreign Update on tab1         Remote SQL: UPDATE ...

for the pushed-down-update case (and similarly for DELETE).  For a
non-optimized update it'd still be a ForeignScan underneath a ModifyTable.

As for the internal representation, I was thinking of adding a CmdType
field to struct ForeignScan, with currently only CMD_SELECT, CMD_UPDATE,
CMD_DELETE as allowed values, though possibly in future we'd think of a
reason to allow CMD_INSERT there.  This is more or less isomorphic to your
"has_side_effects" flag, but it allows distinguishing UPDATE from DELETE
which might be useful.

The only thing that's bothering me about this concept is that I'm not
seeing how to scale it up to handling a pushed-down update on a join,
ie, "UPDATE foo ... FROM bar ..." where both foo and bar are remote.
Maybe it's silly to worry about that until join push-down is done;
but in that case I'd vote for postponing this whole patch until we
have join push-down.
        regards, tom lane



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
(2014/09/13 0:13), Tom Lane wrote:
> Albe Laurenz <laurenz.albe@wien.gv.at> writes:
>> Tom Lane wrote:
>>> I'm not sure offhand what the new plan tree ought to look like.  We could
>>> just generate a ForeignScan node, but that seems like rather a misnomer.
>>> Is it worth inventing a new ForeignUpdate node type?  Or maybe it'd still
>>> be ForeignScan but with a flag field saying "hey this is really an update
>>> (or a delete)".  The main benefit I can think of right now is that the
>>> EXPLAIN output would be less strange-looking --- but EXPLAIN is hardly
>>> the only thing that ever looks at plan trees, so having an outright
>>> misleading plan structure is likely to burn us down the line.

> I was envisioning that the EXPLAIN output would look like
> 
>          Foreign Scan on tab1
>            Remote SQL: SELECT ...
> 
> for the normal case, versus
> 
>          Foreign Update on tab1
>            Remote SQL: UPDATE ...
> 
> for the pushed-down-update case (and similarly for DELETE).  For a
> non-optimized update it'd still be a ForeignScan underneath a ModifyTable.

> As for the internal representation, I was thinking of adding a CmdType
> field to struct ForeignScan, with currently only CMD_SELECT, CMD_UPDATE,
> CMD_DELETE as allowed values, though possibly in future we'd think of a
> reason to allow CMD_INSERT there.

+1

> The only thing that's bothering me about this concept is that I'm not
> seeing how to scale it up to handling a pushed-down update on a join,
> ie, "UPDATE foo ... FROM bar ..." where both foo and bar are remote.
> Maybe it's silly to worry about that until join push-down is done;
> but in that case I'd vote for postponing this whole patch until we
> have join push-down.

OK

Thanks,

PS: I'll help Hanada-san do the work if there is anything I can do.

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2014/09/13 0:13, Tom Lane wrote:
> Albe Laurenz <laurenz.albe@wien.gv.at> writes:
>> Tom Lane wrote:
>>> I'm not sure offhand what the new plan tree ought to look like.  We could
>>> just generate a ForeignScan node, but that seems like rather a misnomer.
>>> Is it worth inventing a new ForeignUpdate node type?  Or maybe it'd still
>>> be ForeignScan but with a flag field saying "hey this is really an update
>>> (or a delete)".  The main benefit I can think of right now is that the
>>> EXPLAIN output would be less strange-looking --- but EXPLAIN is hardly
>>> the only thing that ever looks at plan trees, so having an outright
>>> misleading plan structure is likely to burn us down the line.
> 
>> I can understand these qualms.
>> I wonder if "ForeignUpdate" is such a good name though, since it would
>> surprise the uninitiate that in the regular (no push-down) case the
>> actual modification is *not* performed by ForeignUpdate.
>> So it should rather be a "ForeignModifyingScan", but I personally would
>> prefer a "has_side_effects" flag on ForeignScan.
> 
> I was envisioning that the EXPLAIN output would look like
> 
>          Foreign Scan on tab1
>            Remote SQL: SELECT ...
> 
> for the normal case, versus
> 
>          Foreign Update on tab1
>            Remote SQL: UPDATE ...
> 
> for the pushed-down-update case (and similarly for DELETE).  For a
> non-optimized update it'd still be a ForeignScan underneath a ModifyTable.
> 
> As for the internal representation, I was thinking of adding a CmdType
> field to struct ForeignScan, with currently only CMD_SELECT, CMD_UPDATE,
> CMD_DELETE as allowed values, though possibly in future we'd think of a
> reason to allow CMD_INSERT there.  This is more or less isomorphic to your
> "has_side_effects" flag, but it allows distinguishing UPDATE from DELETE
> which might be useful.
> 
> The only thing that's bothering me about this concept is that I'm not
> seeing how to scale it up to handling a pushed-down update on a join,
> ie, "UPDATE foo ... FROM bar ..." where both foo and bar are remote.
> Maybe it's silly to worry about that until join push-down is done;
> but in that case I'd vote for postponing this whole patch until we
> have join push-down.

I'll re-add this to the final CF.  And I'll update the patch.

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2015/02/16 12:03, Etsuro Fujita wrote:
> I'll update the patch.

While updating the patch, I noticed that in the previous patch, there is
a bug in pushing down parameterized UPDATE/DELETE queries; generic plans
for such queries fail with a can't-happen error.  I fixed the bug and
tried to add the regression tests that execute the generic plans, but I
couldn't because I can't figure out how to force generic plans.  Is
there any way to do that?

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Albe Laurenz
Date:
Etsuro Fujita wrote:
> While updating the patch, I noticed that in the previous patch, there is
> a bug in pushing down parameterized UPDATE/DELETE queries; generic plans
> for such queries fail with a can't-happen error.  I fixed the bug and
> tried to add the regression tests that execute the generic plans, but I
> couldn't because I can't figure out how to force generic plans.  Is
> there any way to do that?

I don't know about a way to force it, but if you run the statement six
times, it will probably switch to a generic plan.

Yours,
Laurenz Albe

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2015/03/04 16:58, Albe Laurenz wrote:
> Etsuro Fujita wrote:
>> While updating the patch, I noticed that in the previous patch, there is
>> a bug in pushing down parameterized UPDATE/DELETE queries; generic plans
>> for such queries fail with a can't-happen error.  I fixed the bug and
>> tried to add the regression tests that execute the generic plans, but I
>> couldn't because I can't figure out how to force generic plans.  Is
>> there any way to do that?
>
> I don't know about a way to force it, but if you run the statement six
> times, it will probably switch to a generic plan.

Yeah, I was just thinking running the statement six times in the 
regression tests ...

Thanks!

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2015/03/04 17:07, Etsuro Fujita wrote:
> On 2015/03/04 16:58, Albe Laurenz wrote:
>> Etsuro Fujita wrote:
>>> While updating the patch, I noticed that in the previous patch, there is
>>> a bug in pushing down parameterized UPDATE/DELETE queries; generic plans
>>> for such queries fail with a can't-happen error.  I fixed the bug and
>>> tried to add the regression tests that execute the generic plans, but I
>>> couldn't because I can't figure out how to force generic plans.  Is
>>> there any way to do that?
>>
>> I don't know about a way to force it, but if you run the statement six
>> times, it will probably switch to a generic plan.
>
> Yeah, I was just thinking running the statement six times in the
> regression tests ...

Here is an updated version.  In this version, the bug has been fixed,
but any regression tests for that hasn't been added, because I'm not
sure that the above way is a good idea and don't have any other ideas.

The EXPLAIN output has also been improved as discussed in [1].

On top of this, I'll try to extend the join push-down patch to support a
pushed-down update on a join, though I'm still digesting the series of
patches.

Comments welcome.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/31942.1410534785@sss.pgh.pa.us

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2015/03/05 21:08, Etsuro Fujita wrote:
> Here is an updated version.

> The EXPLAIN output has also been improved as discussed in [1].

I noticed that the EXPLAIN for a pushed-down update (delete) on 
inheritance childs doubly displays "Foreign Update" ("Foreign Delete"), 
one for ForeignScan and the other for ModifyTable.  Here is an example:

postgres=# explain verbose update parent set c1 = c1;                                  QUERY PLAN
------------------------------------------------------------------------------ Update on public.parent
(cost=0.00..364.54rows=4819 width=10)   Update on public.parent   Foreign Update on public.ft1   Foreign Update on
public.ft2  ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)         Output: parent.c1, parent.ctid
-> Foreign Update on public.ft1  (cost=100.00..182.27 rows=2409 
 
width=10)         Remote SQL: UPDATE public.t1 SET c1 = c1   ->  Foreign Update on public.ft2  (cost=100.00..182.27
rows=2409
 
width=10)         Remote SQL: UPDATE public.t2 SET c1 = c1
(10 rows)

Should we do something?  Suggestions are welcome.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/31942.1410534785@sss.pgh.pa.us



Re: Optimization for updating foreign tables in Postgres FDW

From
Amit Langote
Date:
On 16-04-2015 PM 07:50, Etsuro Fujita wrote:
>> The EXPLAIN output has also been improved as discussed in [1].
> 
> I noticed that the EXPLAIN for a pushed-down update (delete) on inheritance
> childs doubly displays "Foreign Update" ("Foreign Delete"), one for
> ForeignScan and the other for ModifyTable.  Here is an example:
> 
> postgres=# explain verbose update parent set c1 = c1;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Update on public.parent  (cost=0.00..364.54 rows=4819 width=10)
>    Update on public.parent
>    Foreign Update on public.ft1
>    Foreign Update on public.ft2
>    ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)
>          Output: parent.c1, parent.ctid
>    ->  Foreign Update on public.ft1  (cost=100.00..182.27 rows=2409 width=10)
>          Remote SQL: UPDATE public.t1 SET c1 = c1
>    ->  Foreign Update on public.ft2  (cost=100.00..182.27 rows=2409 width=10)
>          Remote SQL: UPDATE public.t2 SET c1 = c1
> (10 rows)
> 
> Should we do something?  Suggestions are welcome.
> 

From what I see in Tom's commit message[0] for FTI patch, this shouldn't be,
right?

To be specific, there should be "Foreign Scan" there as per the commit. Am I
missing something?

Thanks,
Amit

[1]
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=cb1ca4d800621dcae67ca6c799006de99fa4f0a5




Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2015/04/16 19:57, Amit Langote wrote:
> On 16-04-2015 PM 07:50, Etsuro Fujita wrote:
>>> The EXPLAIN output has also been improved as discussed in [1].
>>
>> I noticed that the EXPLAIN for a pushed-down update (delete) on inheritance
>> childs doubly displays "Foreign Update" ("Foreign Delete"), one for
>> ForeignScan and the other for ModifyTable.  Here is an example:
>>
>> postgres=# explain verbose update parent set c1 = c1;
>>                                    QUERY PLAN
>> ------------------------------------------------------------------------------
>>   Update on public.parent  (cost=0.00..364.54 rows=4819 width=10)
>>     Update on public.parent
>>     Foreign Update on public.ft1
>>     Foreign Update on public.ft2
>>     ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)
>>           Output: parent.c1, parent.ctid
>>     ->  Foreign Update on public.ft1  (cost=100.00..182.27 rows=2409 width=10)
>>           Remote SQL: UPDATE public.t1 SET c1 = c1
>>     ->  Foreign Update on public.ft2  (cost=100.00..182.27 rows=2409 width=10)
>>           Remote SQL: UPDATE public.t2 SET c1 = c1
>> (10 rows)
>>
>> Should we do something?  Suggestions are welcome.

>>From what I see in Tom's commit message[0] for FTI patch, this shouldn't be,
> right?
> 
> To be specific, there should be "Foreign Scan" there as per the commit. Am I
> missing something?

As shown in the below example, this patch doesn't change the EXPLAIN
output for non-pushed-down update (delete) cases, but since we changed
the EXPLAIN output as discussed in [1], the patch doubly displays
"Foreign Update" ("Foreign Delete") for pushed-down update (delet) cases
like the above example.

postgres=# explain verbose update parent set c1 = trunc(random() * 9 +
1)::int;                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------Update on
public.parent (cost=0.00..452.06 rows=5461 width=6)  Update on public.parent  Foreign Update on public.ft1    Remote
SQL:UPDATE public.t1 SET c1 = $2 WHERE ctid = $1  Foreign Update on public.ft2    Remote SQL: UPDATE public.t2 SET c1 =
$2WHERE ctid = $1  ->  Seq Scan on public.parent  (cost=0.00..0.01 rows=1 width=6)        Output: (trunc(((random() *
'9'::doubleprecision) +
 
'1'::double precision)))::integer, parent.ctid  ->  Foreign Scan on public.ft1  (cost=100.00..226.03 rows=2730 width=6)
      Output: (trunc(((random() * '9'::double precision) +
 
'1'::double precision)))::integer, ft1.ctid        Remote SQL: SELECT ctid FROM public.t1 FOR UPDATE  ->  Foreign Scan
onpublic.ft2  (cost=100.00..226.03 rows=2730 width=6)        Output: (trunc(((random() * '9'::double precision) +
 
'1'::double precision)))::integer, ft2.ctid        Remote SQL: SELECT ctid FROM public.t2 FOR UPDATE
(14 rows)

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Amit Langote
Date:
Fujita-san,

On 16-04-2015 PM 08:40, Etsuro Fujita wrote:
>> From what I see in Tom's commit message[0] for FTI patch, this shouldn't be,
>> right?
>>
>> To be specific, there should be "Foreign Scan" there as per the commit. Am I
>> missing something?
> 
> As shown in the below example, this patch doesn't change the EXPLAIN
> output for non-pushed-down update (delete) cases, but since we changed
> the EXPLAIN output as discussed in [1], the patch doubly displays
> "Foreign Update" ("Foreign Delete") for pushed-down update (delet) cases
> like the above example.
> 
> postgres=# explain verbose update parent set c1 = trunc(random() * 9 +
> 1)::int;
>                                                  QUERY PLAN
> -------------------------------------------------------------------------------------------------------------
>  Update on public.parent  (cost=0.00..452.06 rows=5461 width=6)
>    Update on public.parent
>    Foreign Update on public.ft1
>      Remote SQL: UPDATE public.t1 SET c1 = $2 WHERE ctid = $1
>    Foreign Update on public.ft2
>      Remote SQL: UPDATE public.t2 SET c1 = $2 WHERE ctid = $1
>    ->  Seq Scan on public.parent  (cost=0.00..0.01 rows=1 width=6)
>          Output: (trunc(((random() * '9'::double precision) +
> '1'::double precision)))::integer, parent.ctid
>    ->  Foreign Scan on public.ft1  (cost=100.00..226.03 rows=2730 width=6)
>          Output: (trunc(((random() * '9'::double precision) +
> '1'::double precision)))::integer, ft1.ctid
>          Remote SQL: SELECT ctid FROM public.t1 FOR UPDATE
>    ->  Foreign Scan on public.ft2  (cost=100.00..226.03 rows=2730 width=6)
>          Output: (trunc(((random() * '9'::double precision) +
> '1'::double precision)))::integer, ft2.ctid
>          Remote SQL: SELECT ctid FROM public.t2 FOR UPDATE
> (14 rows)
> 

I think I missed the point that you are talking about the result after the
patch for foreign udpate pushdown (which is the topic of this thread) has been
applied. Sorry about the noise.

By the way, one suggestion may be to attach a "(pushed down)" to the
ModifyTable's "Foreign Update". And in that case, there would be no mention of
corresponding scan node in the list below exactly because there would be none.

postgres=# explain verbose update parent set c1 = c1;                                 QUERY PLAN
------------------------------------------------------------------------------Update on public.parent
(cost=0.00..364.54rows=4819 width=10)  Update on public.parent  Foreign Update (pushed down) on public.ft1  Foreign
Update(pushed down) on public.ft2  ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)        Output:
parent.c1,parent.ctid
 

Thoughts?

Thanks,
Amit




Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2015/04/17 10:23, Amit Langote wrote:
> By the way, one suggestion may be to attach a "(pushed down)" to the
> ModifyTable's "Foreign Update". And in that case, there would be no mention of
> corresponding scan node in the list below exactly because there would be none.
> 
> postgres=# explain verbose update parent set c1 = c1;
>                                    QUERY PLAN
> ------------------------------------------------------------------------------
>   Update on public.parent  (cost=0.00..364.54 rows=4819 width=10)
>     Update on public.parent
>     Foreign Update (pushed down) on public.ft1
>     Foreign Update (pushed down) on public.ft2
>     ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)
>           Output: parent.c1, parent.ctid

Thanks for the suggestion!

I'm not sure that that is a good idea because (1) that is contrary to
the reality (the update pushdown patch lets the ForeignScan nodes do
UPDATE/DELETE RETURNING and then do nothing at ModifyTable!) and because
(2) that might cause the problem of associating subplans' update
information with subplans' scan information, pointed out by Tom [1].

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/22505.1426986174@sss.pgh.pa.us



Re: Optimization for updating foreign tables in Postgres FDW

From
Amit Langote
Date:
On 17-04-2015 PM 12:35, Etsuro Fujita wrote:
> On 2015/04/17 10:23, Amit Langote wrote:
>> By the way, one suggestion may be to attach a "(pushed down)" to the
>> ModifyTable's "Foreign Update". And in that case, there would be no mention of
>> corresponding scan node in the list below exactly because there would be none.
>>
>> postgres=# explain verbose update parent set c1 = c1;
>>                                    QUERY PLAN
>> ------------------------------------------------------------------------------
>>   Update on public.parent  (cost=0.00..364.54 rows=4819 width=10)
>>     Update on public.parent
>>     Foreign Update (pushed down) on public.ft1
>>     Foreign Update (pushed down) on public.ft2
>>     ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)
>>           Output: parent.c1, parent.ctid
> 
> Thanks for the suggestion!
> 
> I'm not sure that that is a good idea because (1) that is contrary to
> the reality (the update pushdown patch lets the ForeignScan nodes do
> UPDATE/DELETE RETURNING and then do nothing at ModifyTable!) and because

Ah, the reality is exactly the reverse then. Thanks for your patience.

> (2) that might cause the problem of associating subplans' update
> information with subplans' scan information, pointed out by Tom [1].
> 

Having realized how it really works now, my +1 to "Foreign Modifying Scan" for
cases of pushed down update as suggested by Albe Laurenz. I guess it would be
signaled by the proposed ForeignScan.CmdType being CMD_UPDATE / CMP_UPDATE
(/CMD_INSERT).

Thanks,
Amit




Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2015/04/17 13:16, Amit Langote wrote:
> On 17-04-2015 PM 12:35, Etsuro Fujita wrote:
>> (2) that might cause the problem of associating subplans' update
>> information with subplans' scan information, pointed out by Tom [1].

> Having realized how it really works now, my +1 to "Foreign Modifying Scan" for
> cases of pushed down update as suggested by Albe Laurenz. I guess it would be
> signaled by the proposed ForeignScan.CmdType being CMD_UPDATE / CMP_UPDATE
> (/CMD_INSERT).

Thanks for the opinion!  I think that that is an idea.  However, I'd
like to propose to rename "Foreign Update" ("Foreign Delete") of
ModifyTable simply to "Update" ("Delete") not only because (1) that
solves the duplication problem but also because (2) ISTM that is
consistent with the non-inherited updates in both of the
non-pushed-down-update case and the pushed-down-update case.  Here are
examples for (2).

* Inherited and non-inherited updates for the non-pushed-down case:

postgres=# explain verbose update parent set c1 = trunc(random() * 9 +
1)::int;                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------Update on
public.parent (cost=0.00..452.06 rows=5461 width=6)  Update on public.parent  Update on public.ft1    Remote SQL:
UPDATEpublic.t1 SET c1 = $2 WHERE ctid = $1  Update on public.ft2    Remote SQL: UPDATE public.t2 SET c1 = $2 WHERE
ctid= $1  ->  Seq Scan on public.parent  (cost=0.00..0.01 rows=1 width=6)        Output: (trunc(((random() *
'9'::doubleprecision) +
 
'1'::double precision)))::integer, parent.ctid  ->  Foreign Scan on public.ft1  (cost=100.00..226.03 rows=2730 width=6)
      Output: (trunc(((random() * '9'::double precision) +
 
'1'::double precision)))::integer, ft1.ctid        Remote SQL: SELECT ctid FROM public.t1 FOR UPDATE  ->  Foreign Scan
onpublic.ft2  (cost=100.00..226.03 rows=2730 width=6)        Output: (trunc(((random() * '9'::double precision) +
 
'1'::double precision)))::integer, ft2.ctid        Remote SQL: SELECT ctid FROM public.t2 FOR UPDATE
(14 rows)

postgres=# explain verbose update ft1 set c1 = trunc(random() * 9 + 1)::int;
QUERY PLAN
 
------------------------------------------------------------------------------------------------------Update on
public.ft1 (cost=100.00..226.03 rows=2730 width=6)  Remote SQL: UPDATE public.t1 SET c1 = $2 WHERE ctid = $1  ->
ForeignScan on public.ft1  (cost=100.00..226.03 rows=2730 width=6)        Output: (trunc(((random() * '9'::double
precision)+
 
'1'::double precision)))::integer, ctid        Remote SQL: SELECT ctid FROM public.t1 FOR UPDATE
(5 rows)

* Inherited and non-inherited updates for the pushed-down case:

postgres=# explain verbose update parent set c1 = c1 + 1;                                 QUERY PLAN
------------------------------------------------------------------------------Update on public.parent
(cost=0.00..376.59rows=4819 width=10)  Update on public.parent  Update on public.ft1  Update on public.ft2  ->  Seq
Scanon public.parent  (cost=0.00..0.00 rows=1 width=10)        Output: (parent.c1 + 1), parent.ctid  ->  Foreign Update
onpublic.ft1  (cost=100.00..188.29 rows=2409
 
width=10)        Remote SQL: UPDATE public.t1 SET c1 = (c1 + 1)  ->  Foreign Update on public.ft2  (cost=100.00..188.29
rows=2409
width=10)        Remote SQL: UPDATE public.t2 SET c1 = (c1 + 1)
(10 rows)

postgres=# explain verbose update ft1 set c1 = c1 + 1;                                 QUERY PLAN
------------------------------------------------------------------------------Update on public.ft1
(cost=100.00..188.29rows=2409 width=10)  ->  Foreign Update on public.ft1  (cost=100.00..188.29 rows=2409
 
width=10)        Remote SQL: UPDATE public.t1 SET c1 = (c1 + 1)
(3 rows)

Comments are welcome.

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Kyotaro HORIGUCHI
Date:
Hi,

At Mon, 20 Apr 2015 16:40:52 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5534AD84.3020501@lab.ntt.co.jp>
> On 2015/04/17 13:16, Amit Langote wrote:
> > On 17-04-2015 PM 12:35, Etsuro Fujita wrote:
> >> (2) that might cause the problem of associating subplans' update
> >> information with subplans' scan information, pointed out by Tom [1].
> 
> > Having realized how it really works now, my +1 to "Foreign Modifying Scan" for
> > cases of pushed down update as suggested by Albe Laurenz. I guess it would be
> > signaled by the proposed ForeignScan.CmdType being CMD_UPDATE / CMP_UPDATE
> > (/CMD_INSERT).
> 
> Thanks for the opinion!  I think that that is an idea.  However, I'd
> like to propose to rename "Foreign Update" ("Foreign Delete") of
> ModifyTable simply to "Update" ("Delete") not only because (1) that
> solves the duplication problem but also because (2) ISTM that is
> consistent with the non-inherited updates in both of the
> non-pushed-down-update case and the pushed-down-update case.  Here are
> examples for (2).

Update node without "Foreign" that runs "Remote SQL" looks to me
somewhat unusual..

It seems to me that the problem is "Foreign Update"s for
ModifyTable that does nothing eventually. Even though I don't
understand this fully, especially what "Foreign Update" for
ModifyTable does when "Foreign Update" in place of "Foreign Scan"
finished substantial work, I think the ForeignUpdate nodes should
be removed during planning if it is really ineffective, or such
"Foreign Update"s shoud be renamed or provided with some
explaination in explain output if it does anything or unremovable
from some reason.

If removed it looks like,

| =# explain verbose update p set b = b + 1;
|                                   QUERY PLAN                                  
| ------------------------------------------------------------------------------
|  Update on public.p  (cost=0.00..360.08 rows=4311 width=14)
|    Update on public.p
|    ->  Seq Scan on public.p  (cost=0.00..0.00 rows=1 width=14)
|          Output: p.a, (p.b + 1), p.ctid
|    ->  Foreign Update on public.ft1  (cost=100.00..180.04 rows=2155 width=14)
|          Remote SQL: UPDATE public.t1 SET b = (b + 1)
|    ->  Foreign Update on public.ft2  (cost=100.00..180.04 rows=2155 width=14)
|          Remote SQL: UPDATE public.t2 SET b = (b + 1)

regards,

> * Inherited and non-inherited updates for the non-pushed-down case:
> 
> postgres=# explain verbose update parent set c1 = trunc(random() * 9 +
> 1)::int;
>                                                  QUERY PLAN
> -------------------------------------------------------------------------------------------------------------
>  Update on public.parent  (cost=0.00..452.06 rows=5461 width=6)
>    Update on public.parent
>    Update on public.ft1
>      Remote SQL: UPDATE public.t1 SET c1 = $2 WHERE ctid = $1
>    Update on public.ft2
>      Remote SQL: UPDATE public.t2 SET c1 = $2 WHERE ctid = $1
>    ->  Seq Scan on public.parent  (cost=0.00..0.01 rows=1 width=6)
>          Output: (trunc(((random() * '9'::double precision) +
> '1'::double precision)))::integer, parent.ctid
>    ->  Foreign Scan on public.ft1  (cost=100.00..226.03 rows=2730 width=6)
>          Output: (trunc(((random() * '9'::double precision) +
> '1'::double precision)))::integer, ft1.ctid
>          Remote SQL: SELECT ctid FROM public.t1 FOR UPDATE
>    ->  Foreign Scan on public.ft2  (cost=100.00..226.03 rows=2730 width=6)
>          Output: (trunc(((random() * '9'::double precision) +
> '1'::double precision)))::integer, ft2.ctid
>          Remote SQL: SELECT ctid FROM public.t2 FOR UPDATE
> (14 rows)
> 
> postgres=# explain verbose update ft1 set c1 = trunc(random() * 9 + 1)::int;
>                                               QUERY PLAN
> ------------------------------------------------------------------------------------------------------
>  Update on public.ft1  (cost=100.00..226.03 rows=2730 width=6)
>    Remote SQL: UPDATE public.t1 SET c1 = $2 WHERE ctid = $1
>    ->  Foreign Scan on public.ft1  (cost=100.00..226.03 rows=2730 width=6)
>          Output: (trunc(((random() * '9'::double precision) +
> '1'::double precision)))::integer, ctid
>          Remote SQL: SELECT ctid FROM public.t1 FOR UPDATE
> (5 rows)
> 
> * Inherited and non-inherited updates for the pushed-down case:
> 
> postgres=# explain verbose update parent set c1 = c1 + 1;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Update on public.parent  (cost=0.00..376.59 rows=4819 width=10)
>    Update on public.parent
>    Update on public.ft1
>    Update on public.ft2
>    ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)
>          Output: (parent.c1 + 1), parent.ctid
>    ->  Foreign Update on public.ft1  (cost=100.00..188.29 rows=2409
> width=10)
>          Remote SQL: UPDATE public.t1 SET c1 = (c1 + 1)
>    ->  Foreign Update on public.ft2  (cost=100.00..188.29 rows=2409
> width=10)
>          Remote SQL: UPDATE public.t2 SET c1 = (c1 + 1)
> (10 rows)
> 
> postgres=# explain verbose update ft1 set c1 = c1 + 1;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Update on public.ft1  (cost=100.00..188.29 rows=2409 width=10)
>    ->  Foreign Update on public.ft1  (cost=100.00..188.29 rows=2409
> width=10)
>          Remote SQL: UPDATE public.t1 SET c1 = (c1 + 1)
> (3 rows)
> 
> Comments are welcome.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2015/04/21 10:07, Kyotaro HORIGUCHI wrote:
> At Mon, 20 Apr 2015 16:40:52 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5534AD84.3020501@lab.ntt.co.jp>
>> However, I'd
>> like to propose to rename "Foreign Update" ("Foreign Delete") of
>> ModifyTable simply to "Update" ("Delete") not only because (1) that
>> solves the duplication problem but also because (2) ISTM that is
>> consistent with the non-inherited updates in both of the
>> non-pushed-down-update case and the pushed-down-update case.  Here are
>> examples for (2).

> Update node without "Foreign" that runs "Remote SQL" looks to me
> somewhat unusual..

I think that has a similarity with the existing EXPLAIN outputs for 
non-inherited non-pushed-down updates, as shown in the below exaple.
>> postgres=# explain verbose update ft1 set c1 = trunc(random() * 9 + 
1)::int;>>                                                QUERY PLAN>> 
------------------------------------------------------------------------------------------------------>>   Update on
public.ft1 (cost=100.00..226.03 rows=2730 width=6)>>     Remote SQL: UPDATE public.t1 SET c1 = $2 WHERE ctid = $1>>
-> Foreign Scan on public.ft1  (cost=100.00..226.03 rows=2730 
 
width=6)>>           Output: (trunc(((random() * '9'::double precision) +>> '1'::double precision)))::integer, ctid>>
       Remote SQL: SELECT ctid FROM public.t1 FOR UPDATE>> (5 rows)
 

> It seems to me that the problem is "Foreign Update"s for
> ModifyTable that does nothing eventually.

> I think the ForeignUpdate nodes should
> be removed during planning if it is really ineffective,

> If removed it looks like,
>
> | =# explain verbose update p set b = b + 1;
> |                                   QUERY PLAN
> | ------------------------------------------------------------------------------
> |  Update on public.p  (cost=0.00..360.08 rows=4311 width=14)
> |    Update on public.p
> |    ->  Seq Scan on public.p  (cost=0.00..0.00 rows=1 width=14)
> |          Output: p.a, (p.b + 1), p.ctid
> |    ->  Foreign Update on public.ft1  (cost=100.00..180.04 rows=2155 width=14)
> |          Remote SQL: UPDATE public.t1 SET b = (b + 1)
> |    ->  Foreign Update on public.ft2  (cost=100.00..180.04 rows=2155 width=14)
> |          Remote SQL: UPDATE public.t2 SET b = (b + 1)

On that point, I agree with Tom that that would cause the problem that 
the user has to guess at which of the child plans goes with which target 
relation of ModifyTable [1].

Thanks for the comments!

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/22505.1426986174@sss.pgh.pa.us



Re: Optimization for updating foreign tables in Postgres FDW

From
Kyotaro HORIGUCHI
Date:
Hi, thank you. My understanding became a bit clearer.

At Tue, 21 Apr 2015 15:35:41 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5535EFBD.8030006@lab.ntt.co.jp>
> On 2015/04/21 10:07, Kyotaro HORIGUCHI wrote:
> > At Mon, 20 Apr 2015 16:40:52 +0900, Etsuro Fujita
> > <fujita.etsuro@lab.ntt.co.jp> wrote in
> > <5534AD84.3020501@lab.ntt.co.jp>
> >> However, I'd
> >> like to propose to rename "Foreign Update" ("Foreign Delete") of
> >> ModifyTable simply to "Update" ("Delete") not only because (1) that
> >> solves the duplication problem but also because (2) ISTM that is
> >> consistent with the non-inherited updates in both of the
> >> non-pushed-down-update case and the pushed-down-update case.  Here are
> >> examples for (2).
> 
> > Update node without "Foreign" that runs "Remote SQL" looks to me
> > somewhat unusual..
> 
> I think that has a similarity with the existing EXPLAIN outputs for
> non-inherited non-pushed-down updates, as shown in the below exaple.
> 
> >> postgres=# explain verbose update ft1 set c1 = trunc(random() * 9 +
> >> 1)::int;
> >>                                                QUERY PLAN
> >>
> ------------------------------------------------------------------------------------------------------
> >>   Update on public.ft1  (cost=100.00..226.03 rows=2730 width=6)
> >>     Remote SQL: UPDATE public.t1 SET c1 = $2 WHERE ctid = $1
> >>     ->  Foreign Scan on public.ft1  (cost=100.00..226.03 rows=2730 width=6)
> >>           Output: (trunc(((random() * '9'::double precision) +
> >> '1'::double precision)))::integer, ctid
> >>           Remote SQL: SELECT ctid FROM public.t1 FOR UPDATE
> >> (5 rows)

Mmm.. It also looks confusing which needs to be fixed. Now
foreign tables are updated in two ways. One is ModifyTable on
foreign relation and the another is ForeignScan node of update
operation. Though I think that I understand the path to this
form, but I suppose they should confluent into one type of node,
perhaps ForegnScan node. Even if it is hardly archievable for
now, explain representation should be uniform.

Making ModifyTable on foreign relation have the representation
"Foreign Update", the explain results of the queries modifying
foreign tables are looks like,

Foreign Update on public.ft1 (...) Remote SQL: UPDATE public.t1 .... -> Foreign Scan on public.ft1...

Foreign Update on public.ft1 (... Foreign Update on public.ft1 (...   Remote SQL: ...

If Foreign Update has only one internal representation, the two
same Foreign Updates are (ideally) easily eliminated during
planning and explain would naturally shows the following result.

Foreign Update on public.ft1 (... Remote SQL: ...

But if not as is currently so, printing the result needs a bit
complicated calculation.


> > |  Update on public.p  (cost=0.00..360.08 rows=4311 width=14)
> > |    Update on public.p
> > |    ->  Seq Scan on public.p  (cost=0.00..0.00 rows=1 width=14)
> > |          Output: p.a, (p.b + 1), p.ctid
> > |    ->  Foreign Update on public.ft1 (cost=100.00..180.04 rows=2155
> > |    ->  width=14)
> > |          Remote SQL: UPDATE public.t1 SET b = (b + 1)
> > |    ->  Foreign Update on public.ft2 (cost=100.00..180.04 rows=2155
> > |    ->  width=14)
> > |          Remote SQL: UPDATE public.t2 SET b = (b + 1)
> 
> On that point, I agree with Tom that that would cause the problem that
> the user has to guess at which of the child plans goes with which
> target relation of ModifyTable [1].
> 
> [1]
> http://www.postgresql.org/message-id/22505.1426986174@sss.pgh.pa.us

Yeah, that seems to make the plan to be understood
clerer. Combining Tom's suggestion and my suggestion together
would result in the following output of explain.

Update on public.p  (cost=0.00..360.08 rows=4311 width=14) Update on public.p ->  Seq Scan on public.p
(cost=0.00..0.00rows=1 width=14)       Output: p.a, (p.b + 1), p.ctid Foreign Update on public.ft1 (cost=100.00..180.04
rows=2155width=14)    Remote SQL: UPDATE public.t1 SET b = (b + 1) Foreign Update on public.ft2 (cost=100.00..180.04
rows=2155 width=14)    Remote SQL: UPDATE public.t2 SET b = (b + 1)
 

And when not pushed down it would look like,

Update on public.p  (cost=0.00..360.08 rows=4311 width=14) Update on public.p   ->  Seq Scan on public.p
(cost=0.00..0.00rows=1 width=14)         Output: p.a, (p.b + 1), p.ctid Foreign Update on public.ft1
(cost=100.00..180.04rows=2155 width=14)    Remote SQL: UPDATE public.t1 SET b = $2 WHERE ctid = $1    -> Foreign Scan
onpublic.ft1 (cost=....)         Output: a, b, ctid         Remote SQL: SELECT a, ctid FROM public.t1 FOR UPDATE
ForeignUpdate on public.ft2 (cost=100.00..180.04 rows=2155  width=14)    Remote SQL: UPDATE public.t2 SET b = (b + 1)
-> Foreign Scan on public.ft2 (cost=....)         Output: a, b, ctid         Remote SQL: SELECT a, ctid FROM public.t2
FORUPDATE
 

These looks quite reasonable *for me* :)

Of course, the same discussion is applicable on Foreign Delete.

What do you think about this?

Any further thoughts?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Optimization for updating foreign tables in Postgres FDW

From
Stephen Frost
Date:
Etsuro,

* Etsuro Fujita (fujita.etsuro@lab.ntt.co.jp) wrote:
> Here is an updated version.  In this version, the bug has been
> fixed, but any regression tests for that hasn't been added, because
> I'm not sure that the above way is a good idea and don't have any
> other ideas.
>
> The EXPLAIN output has also been improved as discussed in [1].

While the EXPLAIN output changed, the structure hasn't really changed
from what was discussed previously and there's not been any real
involvment from the core code in what's happening here.

Clearly, the documentation around how to use the FDW API hasn't changed
at all and there's been no additions to it for handling bulk work.
Everything here continues to be done inside of postgres_fdw, which
essentially ignores the prescribed "Update/Delete one tuple" interface
for ExecForeignUpdate/ExecForeignDelete.

I've spent the better part of the past two days trying to reason my way
around that while reviewing this patch and I haven't come out the other
side any happier with this approach than I was back in
20140911153049.GC16422@tamriel.snowman.net.

There are other things that don't look right to me, such as what's going
on at the bottom of push_update_down(), but I don't think there's much
point going into it until we figure out what the core FDW API here
should look like.  It might not be all that far from what we have now,
but I don't think we can just ignore the existing, documented, API.
Thanks!
    Stephen

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2015/05/13 0:55, Stephen Frost wrote:
> Etsuro,
>
> * Etsuro Fujita (fujita.etsuro@lab.ntt.co.jp) wrote:
>> Here is an updated version.  In this version, the bug has been
>> fixed, but any regression tests for that hasn't been added, because
>> I'm not sure that the above way is a good idea and don't have any
>> other ideas.
>>
>> The EXPLAIN output has also been improved as discussed in [1].
>
> While the EXPLAIN output changed, the structure hasn't really changed
> from what was discussed previously and there's not been any real
> involvment from the core code in what's happening here.
>
> Clearly, the documentation around how to use the FDW API hasn't changed
> at all and there's been no additions to it for handling bulk work.
> Everything here continues to be done inside of postgres_fdw, which
> essentially ignores the prescribed "Update/Delete one tuple" interface
> for ExecForeignUpdate/ExecForeignDelete.
>
> I've spent the better part of the past two days trying to reason my way
> around that while reviewing this patch and I haven't come out the other
> side any happier with this approach than I was back in
> 20140911153049.GC16422@tamriel.snowman.net.
>
> There are other things that don't look right to me, such as what's going
> on at the bottom of push_update_down(), but I don't think there's much
> point going into it until we figure out what the core FDW API here
> should look like.  It might not be all that far from what we have now,
> but I don't think we can just ignore the existing, documented, API.

OK, I'll try to introduce the core FDW API for this (and make changes to 
the core code) to address your previous comments.

Thanks for taking the time to review the patch!

Best regards,
Etsuro Fujita



Re: Optimization for updating foreign tables in Postgres FDW

From
Thom Brown
Date:
On 13 May 2015 at 04:10, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2015/05/13 0:55, Stephen Frost wrote:
>>
>> Etsuro,
>>
>> * Etsuro Fujita (fujita.etsuro@lab.ntt.co.jp) wrote:
>>>
>>> Here is an updated version.  In this version, the bug has been
>>> fixed, but any regression tests for that hasn't been added, because
>>> I'm not sure that the above way is a good idea and don't have any
>>> other ideas.
>>>
>>> The EXPLAIN output has also been improved as discussed in [1].
>>
>>
>> While the EXPLAIN output changed, the structure hasn't really changed
>> from what was discussed previously and there's not been any real
>> involvment from the core code in what's happening here.
>>
>> Clearly, the documentation around how to use the FDW API hasn't changed
>> at all and there's been no additions to it for handling bulk work.
>> Everything here continues to be done inside of postgres_fdw, which
>> essentially ignores the prescribed "Update/Delete one tuple" interface
>> for ExecForeignUpdate/ExecForeignDelete.
>>
>> I've spent the better part of the past two days trying to reason my way
>> around that while reviewing this patch and I haven't come out the other
>> side any happier with this approach than I was back in
>> 20140911153049.GC16422@tamriel.snowman.net.
>>
>> There are other things that don't look right to me, such as what's going
>> on at the bottom of push_update_down(), but I don't think there's much
>> point going into it until we figure out what the core FDW API here
>> should look like.  It might not be all that far from what we have now,
>> but I don't think we can just ignore the existing, documented, API.
>
>
> OK, I'll try to introduce the core FDW API for this (and make changes to the
> core code) to address your previous comments.
>
> Thanks for taking the time to review the patch!

Fujita-san,

I'm a bit behind in reading up on this, so maybe it's been covered
since, but is there a discussion of this API on another thread, or a
newer patch available?

Thanks

Thom



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
Hi Thom,

Thank you for paying attention to this!

On 2015/11/25 20:36, Thom Brown wrote:
> On 13 May 2015 at 04:10, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2015/05/13 0:55, Stephen Frost wrote:
>>> While the EXPLAIN output changed, the structure hasn't really changed
>>> from what was discussed previously and there's not been any real
>>> involvment from the core code in what's happening here.
>>>
>>> Clearly, the documentation around how to use the FDW API hasn't changed
>>> at all and there's been no additions to it for handling bulk work.
>>> Everything here continues to be done inside of postgres_fdw, which
>>> essentially ignores the prescribed "Update/Delete one tuple" interface
>>> for ExecForeignUpdate/ExecForeignDelete.
>>>
>>> I've spent the better part of the past two days trying to reason my way
>>> around that while reviewing this patch and I haven't come out the other
>>> side any happier with this approach than I was back in
>>> 20140911153049.GC16422@tamriel.snowman.net.
>>>
>>> There are other things that don't look right to me, such as what's going
>>> on at the bottom of push_update_down(), but I don't think there's much
>>> point going into it until we figure out what the core FDW API here
>>> should look like.  It might not be all that far from what we have now,
>>> but I don't think we can just ignore the existing, documented, API.

>> OK, I'll try to introduce the core FDW API for this (and make changes to the
>> core code) to address your previous comments.

> I'm a bit behind in reading up on this, so maybe it's been covered
> since, but is there a discussion of this API on another thread, or a
> newer patch available?

Actually, I'm now working on this.  My basic idea is to add new FDW APIs 
for handling the bulk work, in order not to make messy the prescribed 
"Update/Delete one tuple" interface; 1) add to nodeModifyTable.c or 
nodeForeignscan.c the new FDW APIs BeginPushedDownForeignModify, 
ExecPushedDownForeignModify and EndPushedDownForeignModify for that, and 
2) call these FDW APIs, instead of BeginForeignModify, 
ExecForeignUpdate/ExecForeignDelete and EndForeignModify, when doing 
update pushdown.  I'd like to propose that in more detail as soon as 
possible, probably with an updated patch.

Best regards,
Etsuro Fujita




Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2015/11/26 18:00, Etsuro Fujita wrote:
> On 2015/11/25 20:36, Thom Brown wrote:
>> On 13 May 2015 at 04:10, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
>> wrote:
>>> On 2015/05/13 0:55, Stephen Frost wrote:
>>>> While the EXPLAIN output changed, the structure hasn't really changed
>>>> from what was discussed previously and there's not been any real
>>>> involvment from the core code in what's happening here.
>>>>
>>>> Clearly, the documentation around how to use the FDW API hasn't changed
>>>> at all and there's been no additions to it for handling bulk work.
>>>> Everything here continues to be done inside of postgres_fdw, which
>>>> essentially ignores the prescribed "Update/Delete one tuple" interface
>>>> for ExecForeignUpdate/ExecForeignDelete.
>>>>
>>>> I've spent the better part of the past two days trying to reason my way
>>>> around that while reviewing this patch and I haven't come out the other
>>>> side any happier with this approach than I was back in
>>>> 20140911153049.GC16422@tamriel.snowman.net.
>>>>
>>>> There are other things that don't look right to me, such as what's
>>>> going
>>>> on at the bottom of push_update_down(), but I don't think there's much
>>>> point going into it until we figure out what the core FDW API here
>>>> should look like.  It might not be all that far from what we have now,
>>>> but I don't think we can just ignore the existing, documented, API.

>>> OK, I'll try to introduce the core FDW API for this (and make changes
>>> to the
>>> core code) to address your previous comments.

>> I'm a bit behind in reading up on this, so maybe it's been covered
>> since, but is there a discussion of this API on another thread, or a
>> newer patch available?

To address Stephen's comments, I'd like to propose the following FDW APIs:

bool
PlanDMLPushdown (PlannerInfo *root,
                  ModifyTable *plan,
                  Index resultRelation,
                  int subplan_index);

This is called in make_modifytable, before calling PlanForeignModify.
This checks to see whether a given UPDATE/DELETE .. RETURNING .. is
pushdown-safe and if so, performs planning actions needed for the DML
pushdown.  The idea is to modify a ForeignScan subplan accordingly as in
the previous patch.  If the DML is pushdown-safe, this returns true, and
we don't call PlanForeignModify anymore.  (Else returns false and call
PlanForeignModify as before.)  When the DML is pushdown-safe, we hook
the following FDW APIs located in nodeForeignscan.c, instead of
BeginForeignModify, ExecForeignUpdate/ExecForeignDelete and
EndForeignModify:

void
BeginDMLPushdown (ForeignScanState *node,
                   int eflags);

This initializes the DML pushdown, like BeginForeignScan.

TupleTableSlot *
IterateDMLPushdown (ForeignScanState *node);

This fetches one returning result from the foreign server, like
IterateForeignScan, if having a RETURNING clause.  If not, just return
an empty slot.  (I'm thinking that it's required that the FDW replaces
the targetlist of the ForeignScan subplan to the RETURNING clause during
PlanDMLPushdown, if having the clause, so that we do nothing at
ModifyTable.)

void
EndDMLPushdown (ForeignScanState *node);

This finishes the DML pushdown, like EndForeignScan.

I'm attaching a WIP patch, which only includes changes to the core.  I'm
now working on the postgres_fdw patch to demonstrate that these APIs
work well, but I'd be happy if I could get any feedback earlier.

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:


On Mon, Dec 21, 2015 at 6:20 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2015/11/26 18:00, Etsuro Fujita wrote:
On 2015/11/25 20:36, Thom Brown wrote:
On 13 May 2015 at 04:10, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
wrote:
On 2015/05/13 0:55, Stephen Frost wrote:
While the EXPLAIN output changed, the structure hasn't really changed
from what was discussed previously and there's not been any real
involvment from the core code in what's happening here.

Clearly, the documentation around how to use the FDW API hasn't changed
at all and there's been no additions to it for handling bulk work.
Everything here continues to be done inside of postgres_fdw, which
essentially ignores the prescribed "Update/Delete one tuple" interface
for ExecForeignUpdate/ExecForeignDelete.

I've spent the better part of the past two days trying to reason my way
around that while reviewing this patch and I haven't come out the other
side any happier with this approach than I was back in
20140911153049.GC16422@tamriel.snowman.net.

There are other things that don't look right to me, such as what's
going
on at the bottom of push_update_down(), but I don't think there's much
point going into it until we figure out what the core FDW API here
should look like.  It might not be all that far from what we have now,
but I don't think we can just ignore the existing, documented, API.

OK, I'll try to introduce the core FDW API for this (and make changes
to the
core code) to address your previous comments.

I'm a bit behind in reading up on this, so maybe it's been covered
since, but is there a discussion of this API on another thread, or a
newer patch available?

To address Stephen's comments, I'd like to propose the following FDW APIs:

bool
PlanDMLPushdown (PlannerInfo *root,
                 ModifyTable *plan,
                 Index resultRelation,
                 int subplan_index);

This is called in make_modifytable, before calling PlanForeignModify. This checks to see whether a given UPDATE/DELETE .. RETURNING .. is pushdown-safe and if so, performs planning actions needed for the DML pushdown.  The idea is to modify a ForeignScan subplan accordingly as in the previous patch.  If the DML is pushdown-safe, this returns true, and we don't call PlanForeignModify anymore.  (Else returns false and call PlanForeignModify as before.)  When the DML is pushdown-safe, we hook the following FDW APIs located in nodeForeignscan.c, instead of BeginForeignModify, ExecForeignUpdate/ExecForeignDelete and EndForeignModify:

void
BeginDMLPushdown (ForeignScanState *node,
                  int eflags);

This initializes the DML pushdown, like BeginForeignScan.

TupleTableSlot *
IterateDMLPushdown (ForeignScanState *node);

This fetches one returning result from the foreign server, like IterateForeignScan, if having a RETURNING clause.  If not, just return an empty slot.  (I'm thinking that it's required that the FDW replaces the targetlist of the ForeignScan subplan to the RETURNING clause during PlanDMLPushdown, if having the clause, so that we do nothing at ModifyTable.)

void
EndDMLPushdown (ForeignScanState *node);

This finishes the DML pushdown, like EndForeignScan.

+1.

I like idea of separate FDW API for the DML Pushdown. Was thinking can't we can re-use the  IterateForeignScan(ForeignScanState *node) rather then introducing IterateDMLPushdown(ForeignScanState *node) new API ?


I'm attaching a WIP patch, which only includes changes to the core.  I'm now working on the postgres_fdw patch to demonstrate that these APIs work well, but I'd be happy if I could get any feedback earlier.

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers




--
Rushabh Lathia

Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Wed, Dec 23, 2015 at 5:50 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
> +1.
>
> I like idea of separate FDW API for the DML Pushdown. Was thinking can't we
> can re-use the  IterateForeignScan(ForeignScanState *node) rather then
> introducing IterateDMLPushdown(ForeignScanState *node) new API ?

Yeah, I think we need to ask ourselves what advantage we're getting
out of adding any new core APIs.  Marking the scan as a pushed-down
update or delete has some benefit in terms of making the information
visible via EXPLAIN, but even that's a pretty thin benefit.  The
iterate method seems to just complicate the core code without any
benefit at all.  More generally, there is very, very little code in
this patch that accomplishes anything that could not be done just as
well with the existing methods.  So why are we even doing these core
changes?

Tom seemed to think that we could centralize some checks in the core
code, say, related to triggers, or to LIMIT.  But there's nothing like
that in this patch, so I'm not really understanding the point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2015/12/24 4:34, Robert Haas wrote:
> On Wed, Dec 23, 2015 at 5:50 AM, Rushabh Lathia
> <rushabh.lathia@gmail.com> wrote:
>> +1.
>>
>> I like idea of separate FDW API for the DML Pushdown. Was thinking can't we
>> can re-use the  IterateForeignScan(ForeignScanState *node) rather then
>> introducing IterateDMLPushdown(ForeignScanState *node) new API ?

> Yeah, I think we need to ask ourselves what advantage we're getting
> out of adding any new core APIs.  Marking the scan as a pushed-down
> update or delete has some benefit in terms of making the information
> visible via EXPLAIN, but even that's a pretty thin benefit.  The
> iterate method seems to just complicate the core code without any
> benefit at all.  More generally, there is very, very little code in
> this patch that accomplishes anything that could not be done just as
> well with the existing methods.  So why are we even doing these core
> changes?

 From the FDWs' point of view, ISTM that what FDWs have to do for
IterateDMLPushdown is quite different from what FDWs have to do for
IterateForeignScan; eg, IterateDMLPushdown must work in accordance with
presence/absence of a RETURNING list.  (In addition to that,
IterateDMLPushdown has been designed so that it must make the scan tuple
available to later RETURNING projection in nodeModifyTable.c.)  So, I
think that it's better to FDWs to add separate APIs for the DML
pushdown, making the FDW code much simpler.  So based on that idea, I
added the postgres_fdw changes to the patch.  Attached is an updated
version of the patch, which is still WIP, but I'd be happy if I could
get any feedback.

> Tom seemed to think that we could centralize some checks in the core
> code, say, related to triggers, or to LIMIT.  But there's nothing like
> that in this patch, so I'm not really understanding the point.

For the trigger check, I added relation_has_row_level_triggers.  I
placed that function in postgres_fdw.c in the updated patch, but I think
that by placing that function in the core, FDWs can share that function.
  As for the LIMIT, I'm not sure we can do something about that.

I think the current design allows us to handle a pushed-down update on a
join, ie, "UPDATE foo ... FROM bar ..." where both foo and bar are
remote, which was Tom's concern, but I'll leave that for another patch.
  Also, I think the current design could also extend to push down INSERT
.. RETURNING .., but I'd like to leave that for future work.

I'll add this to the next CF.

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:
I started looking at updated patch and its definitely iked the new approach.

Patch passing the mandatory checks:

1) Patch applied cleanly using patch -p1 command
2) Source got compiled cleanly
3) make check, running cleanly


Explain output for the remote table and inheritance relations looks better
then earlier version of patch.

-- Inheritance foreign relation
postgres=# explain (ANALYZE, VERBOSE) update fp set a = 20;
                                                      QUERY PLAN                                                      
-----------------------------------------------------------------------------------------------------------------------
 Update on public.fp  (cost=100.00..767.60 rows=10920 width=6) (actual time=1.000..1.000 rows=0 loops=1)
   Foreign Update on public.fp
   Foreign Update on public.fc1
   Foreign Update on public.fc2
   Foreign Update on public.fc3
   ->  Foreign Update on public.fp  (cost=100.00..191.90 rows=2730 width=6) (actual time=0.493..0.493 rows=0 loops=1)
         Remote SQL: UPDATE public.p SET a = 20
   ->  Foreign Update on public.fc1  (cost=100.00..191.90 rows=2730 width=6) (actual time=0.177..0.177 rows=0 loops=1)
         Remote SQL: UPDATE public.c1 SET a = 20
   ->  Foreign Update on public.fc2  (cost=100.00..191.90 rows=2730 width=6) (actual time=0.163..0.163 rows=0 loops=1)
         Remote SQL: UPDATE public.c2 SET a = 20
   ->  Foreign Update on public.fc3  (cost=100.00..191.90 rows=2730 width=6) (actual time=0.158..0.158 rows=0 loops=1)
         Remote SQL: UPDATE public.c3 SET a = 20
 Planning time: 0.228 ms
 Execution time: 1.359 ms
(15 rows)

-- Foreign table update
postgres=# explain (ANALYZE, VERBOSE) update ft1 set c8 = 'updated'  where c1 = '200';
                                                      QUERY PLAN                                                     
----------------------------------------------------------------------------------------------------------------------
 Update on public.ft1  (cost=100.00..116.13 rows=2 width=144) (actual time=0.485..0.485 rows=0 loops=1)
   ->  Foreign Update on public.ft1  (cost=100.00..116.13 rows=2 width=144) (actual time=0.483..0.483 rows=0 loops=1)
         Remote SQL: UPDATE public.t1 SET c8 = 'updated   '::character(10) WHERE ((c1 = 200))
 Planning time: 0.158 ms
 Execution time: 0.786 ms
(5 rows)


-- Explain output for the returning clause:
postgres=# explain (ANALYZE, VERBOSE) update ft1 set c8 = 'updated'  where c1 = '200' returning c2;
                                                      QUERY PLAN                                                     
----------------------------------------------------------------------------------------------------------------------
 Update on public.ft1  (cost=100.00..116.13 rows=2 width=144) (actual time=0.516..0.516 rows=0 loops=1)
   Output: c2
   ->  Foreign Update on public.ft1  (cost=100.00..116.13 rows=2 width=144) (actual time=0.514..0.514 rows=0 loops=1)
         Remote SQL: UPDATE public.t1 SET c8 = 'updated   '::character(10) WHERE ((c1 = 200)) RETURNING c2
 Planning time: 0.172 ms
 Execution time: 0.938 ms
(6 rows)


-- Explain output when returning clause is not pushdown safe:
postgres=# explain (ANALYZE, VERBOSE) update ft1 set c8 = 'updated'  where c1 = '200' returning local_func(20);
                                                      QUERY PLAN                                                     
----------------------------------------------------------------------------------------------------------------------
 Update on public.ft1  (cost=100.00..116.13 rows=2 width=144) (actual time=0.364..0.364 rows=0 loops=1)
   Output: local_func(20)
   ->  Foreign Update on public.ft1  (cost=100.00..116.13 rows=2 width=144) (actual time=0.363..0.363 rows=0 loops=1)
         Remote SQL: UPDATE public.t1 SET c8 = 'updated   '::character(10) WHERE ((c1 = 200))
 Planning time: 0.142 ms
 Execution time: 0.623 ms
(6 rows)

-- Explain output with PREPARE:
postgres=# explain (ANALYZE, VERBOSE) execute ftupdate(20);
                                                      QUERY PLAN                                                     
----------------------------------------------------------------------------------------------------------------------
 Update on public.ft1  (cost=100.00..116.13 rows=2 width=144) (actual time=0.712..0.712 rows=0 loops=1)
   ->  Foreign Update on public.ft1  (cost=100.00..116.13 rows=2 width=144) (actual time=0.711..0.711 rows=1 loops=1)
         Remote SQL: UPDATE public.t1 SET c8 = 'updated   '::character(10) WHERE ((c1 = 20))
 Execution time: 1.001 ms
(4 rows)


With the initial look and test overall things looking great, I am still
reviewing the code changes but here are few early doubts/questions:

.) What the need of following change ?

@@ -833,9 +833,6 @@ appendWhereClause(StringInfo buf,
    int         nestlevel;
    ListCell   *lc;
 
-   if (params)
-       *params = NIL;          /* initialize result list to empty */
-
    /* Set up context struct for recursion */
    context.root = root;
    context.foreignrel = baserel;
@@ -971,6 +968,63 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 }


.) When Tom Lane and Stephen Frost suggested getting the core code involved,
I thought that we can do the mandatory checks into core it self and making
completely out of dml_is_pushdown_safe(). Please correct me

.) Documentation for the new API is missing (fdw-callbacks).



Regards,


On Fri, Dec 25, 2015 at 3:30 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2015/12/24 4:34, Robert Haas wrote:
On Wed, Dec 23, 2015 at 5:50 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
+1.

I like idea of separate FDW API for the DML Pushdown. Was thinking can't we
can re-use the  IterateForeignScan(ForeignScanState *node) rather then
introducing IterateDMLPushdown(ForeignScanState *node) new API ?

Yeah, I think we need to ask ourselves what advantage we're getting
out of adding any new core APIs.  Marking the scan as a pushed-down
update or delete has some benefit in terms of making the information
visible via EXPLAIN, but even that's a pretty thin benefit.  The
iterate method seems to just complicate the core code without any
benefit at all.  More generally, there is very, very little code in
this patch that accomplishes anything that could not be done just as
well with the existing methods.  So why are we even doing these core
changes?

>From the FDWs' point of view, ISTM that what FDWs have to do for IterateDMLPushdown is quite different from what FDWs have to do for IterateForeignScan; eg, IterateDMLPushdown must work in accordance with presence/absence of a RETURNING list.  (In addition to that, IterateDMLPushdown has been designed so that it must make the scan tuple available to later RETURNING projection in nodeModifyTable.c.)  So, I think that it's better to FDWs to add separate APIs for the DML pushdown, making the FDW code much simpler.  So based on that idea, I added the postgres_fdw changes to the patch.  Attached is an updated version of the patch, which is still WIP, but I'd be happy if I could get any feedback.

Tom seemed to think that we could centralize some checks in the core
code, say, related to triggers, or to LIMIT.  But there's nothing like
that in this patch, so I'm not really understanding the point.

For the trigger check, I added relation_has_row_level_triggers.  I placed that function in postgres_fdw.c in the updated patch, but I think that by placing that function in the core, FDWs can share that function.  As for the LIMIT, I'm not sure we can do something about that.

I think the current design allows us to handle a pushed-down update on a join, ie, "UPDATE foo ... FROM bar ..." where both foo and bar are remote, which was Tom's concern, but I'll leave that for another patch.  Also, I think the current design could also extend to push down INSERT .. RETURNING .., but I'd like to leave that for future work.

I'll add this to the next CF.

Best regards,
Etsuro Fujita




Rushabh Lathia

Re: Optimization for updating foreign tables in Postgres FDW

From
Thom Brown
Date:
On 25 December 2015 at 10:00, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2015/12/24 4:34, Robert Haas wrote:
>>
>> On Wed, Dec 23, 2015 at 5:50 AM, Rushabh Lathia
>> <rushabh.lathia@gmail.com> wrote:
>>>
>>> +1.
>>>
>>> I like idea of separate FDW API for the DML Pushdown. Was thinking can't
>>> we
>>> can re-use the  IterateForeignScan(ForeignScanState *node) rather then
>>> introducing IterateDMLPushdown(ForeignScanState *node) new API ?
>
>
>> Yeah, I think we need to ask ourselves what advantage we're getting
>> out of adding any new core APIs.  Marking the scan as a pushed-down
>> update or delete has some benefit in terms of making the information
>> visible via EXPLAIN, but even that's a pretty thin benefit.  The
>> iterate method seems to just complicate the core code without any
>> benefit at all.  More generally, there is very, very little code in
>> this patch that accomplishes anything that could not be done just as
>> well with the existing methods.  So why are we even doing these core
>> changes?
>
>
> From the FDWs' point of view, ISTM that what FDWs have to do for
> IterateDMLPushdown is quite different from what FDWs have to do for
> IterateForeignScan; eg, IterateDMLPushdown must work in accordance with
> presence/absence of a RETURNING list.  (In addition to that,
> IterateDMLPushdown has been designed so that it must make the scan tuple
> available to later RETURNING projection in nodeModifyTable.c.)  So, I think
> that it's better to FDWs to add separate APIs for the DML pushdown, making
> the FDW code much simpler.  So based on that idea, I added the postgres_fdw
> changes to the patch.  Attached is an updated version of the patch, which is
> still WIP, but I'd be happy if I could get any feedback.
>
>> Tom seemed to think that we could centralize some checks in the core
>> code, say, related to triggers, or to LIMIT.  But there's nothing like
>> that in this patch, so I'm not really understanding the point.
>
>
> For the trigger check, I added relation_has_row_level_triggers.  I placed
> that function in postgres_fdw.c in the updated patch, but I think that by
> placing that function in the core, FDWs can share that function.  As for the
> LIMIT, I'm not sure we can do something about that.
>
> I think the current design allows us to handle a pushed-down update on a
> join, ie, "UPDATE foo ... FROM bar ..." where both foo and bar are remote,
> which was Tom's concern, but I'll leave that for another patch.  Also, I
> think the current design could also extend to push down INSERT .. RETURNING
> .., but I'd like to leave that for future work.
>
> I'll add this to the next CF.

I've run into an issue:

*# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
tableoid::regclass;
ERROR:
CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
WHERE ((id = 16)) RETURNING NULL

However, this works:

*# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
tableoid::regclass, *;   tableoid     | id | name  |    company    | registered_date |
expiry_date | active | status  | account_level

-----------------+----+-------+---------------+-----------------+-------------+--------+---------+---------------local_customers
|22 | Bruce | Jo's Cupcakes | 2015-01-15      |
 
2017-01-14  | t      | running | basic
(1 row)

In this example, "local_customers" inherits from the remote table
"public"."customers", which inherits again from the local table
"master_customers"

Same issue with DELETE of course, and the ::regclass isn't important here.

Thom



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/01/06 18:58, Rushabh Lathia wrote:
> I started looking at updated patch and its definitely iked the new
> approach.

Thanks for the review!

> With the initial look and test overall things looking great, I am still
> reviewing the code changes but here are few early doubts/questions:

> .) What the need of following change ?
>
> @@ -833,9 +833,6 @@ appendWhereClause(StringInfo buf,
>      int         nestlevel;
>      ListCell   *lc;
>
> -   if (params)
> -       *params = NIL;          /* initialize result list to empty */
> -
>      /* Set up context struct for recursion */
>      context.root = root;
>      context.foreignrel = baserel;
> @@ -971,6 +968,63 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
>   }

It is needed for deparsePushedDownUpdateSql to store params in both 
WHERE clauses and expressions to assign to the target columns
into one params_list list.

> .) When Tom Lane and Stephen Frost suggested getting the core code involved,
> I thought that we can do the mandatory checks into core it self and making
> completely out of dml_is_pushdown_safe(). Please correct me

The reason why I put that function in postgres_fdw.c is Check point 4:

+  * 4. We can't push an UPDATE down, if any expressions to assign to 
the target
+  * columns are unsafe to evaluate on the remote server.

I think this depends on the capabilities of the FDW.

> .) Documentation for the new API is missing (fdw-callbacks).

Will add the docs.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/01/06 20:37, Thom Brown wrote:
> On 25 December 2015 at 10:00, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
>> Attached is an updated version of the patch, which is
>> still WIP, but I'd be happy if I could get any feedback.

> I've run into an issue:
>
> *# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
> tableoid::regclass;
> ERROR:
> CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
> WHERE ((id = 16)) RETURNING NULL
>
> However, this works:
>
> *# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
> tableoid::regclass, *;
>      tableoid     | id | name  |    company    | registered_date |
> expiry_date | active | status  | account_level
> -----------------+----+-------+---------------+-----------------+-------------+--------+---------+---------------
>   local_customers | 22 | Bruce | Jo's Cupcakes | 2015-01-15      |
> 2017-01-14  | t      | running | basic
> (1 row)
>
> In this example, "local_customers" inherits from the remote table
> "public"."customers", which inherits again from the local table
> "master_customers"
>
> Same issue with DELETE of course, and the ::regclass isn't important here.

Will fix.

Thanks for the testing!

Best regards,
Etsuro Fujita





On 2016/01/07 21:50, Etsuro Fujita wrote:
> On 2016/01/06 20:37, Thom Brown wrote:
>> On 25 December 2015 at 10:00, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>> Attached is an updated version of the patch, which is
>>> still WIP, but I'd be happy if I could get any feedback.

>> I've run into an issue:
>>
>> *# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
>> tableoid::regclass;
>> ERROR:
>> CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
>> WHERE ((id = 16)) RETURNING NULL

> Will fix.

While working on this, I noticed that the existing postgres_fdw system
shows similar behavior, so I changed the subject.

IIUC, the reason for that is when the local query specifies "RETURNING
tableoid::regclass", the FDW has fmstate->has_returning=false while the
remote query executed at ModifyTable has "RETURNING NULL", as shown in
the above example; that would cause an abnormal exit in executing the
remote query in postgresExecForeignUpdate, since that the FDW would get
PGRES_TUPLES_OK as a result of the query while the FDW would think that
the right result to get should be PGRES_COMMAND_OK, from the flag
fmstate->has_returning=false.

Attached is a patch to fix that.

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/01/07 21:45, Etsuro Fujita wrote:
> On 2016/01/06 18:58, Rushabh Lathia wrote:
>> .) Documentation for the new API is missing (fdw-callbacks).

> Will add the docs.

I added docs for new FDW APIs.

Other changes:

* Rename relation_has_row_level_triggers to relation_has_row_triggers
shortly, and move it to rewriteHandler.c.  I'm not sure rewriteHandler.c
is a good place for that, though.

* Revise code, including a helper function get_result_result, whcih I
implemented using a modified version of store_returning_result in the
previous patch, but on second thought, I think that that is a bit too
invasive.  So, I re-implemented that function directly using
make_tuple_from_result_row.

* Add more comments.

* Add more regression tests.

Attached is an updated version of the patch.  Comments are wellcome!
(If the fix [1] is okay, I'd like to update this patch on top of the
patch in [1].)

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/568F4430.6060805@lab.ntt.co.jp

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:


On Thu, Jan 7, 2016 at 6:15 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/01/06 18:58, Rushabh Lathia wrote:
I started looking at updated patch and its definitely iked the new
approach.

Thanks for the review!

With the initial look and test overall things looking great, I am still
reviewing the code changes but here are few early doubts/questions:

.) What the need of following change ?

@@ -833,9 +833,6 @@ appendWhereClause(StringInfo buf,
     int         nestlevel;
     ListCell   *lc;

-   if (params)
-       *params = NIL;          /* initialize result list to empty */
-
     /* Set up context struct for recursion */
     context.root = root;
     context.foreignrel = baserel;
@@ -971,6 +968,63 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
  }

It is needed for deparsePushedDownUpdateSql to store params in both WHERE clauses and expressions to assign to the target columns
into one params_list list.

Hmm sorry but I am still not getting the point, can you provide some
example to explain this ?

.) When Tom Lane and Stephen Frost suggested getting the core code involved,
I thought that we can do the mandatory checks into core it self and making
completely out of dml_is_pushdown_safe(). Please correct me

The reason why I put that function in postgres_fdw.c is Check point 4:

+  * 4. We can't push an UPDATE down, if any expressions to assign to the target
+  * columns are unsafe to evaluate on the remote server.


Here I was talking about checks related to triggers, or to LIMIT. I think
earlier thread talked about those mandatory check to the core. So may
be we can move those checks into make_modifytable() before calling
the PlanDMLPushdown.

I think this depends on the capabilities of the FDW.

.) Documentation for the new API is missing (fdw-callbacks).

Will add the docs.





--
Rushabh Lathia
On 8 January 2016 at 05:08, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/01/07 21:50, Etsuro Fujita wrote:
>>
>> On 2016/01/06 20:37, Thom Brown wrote:
>>>
>>> On 25 December 2015 at 10:00, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>>
>>>> Attached is an updated version of the patch, which is
>>>> still WIP, but I'd be happy if I could get any feedback.
>
>
>>> I've run into an issue:
>>>
>>> *# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
>>> tableoid::regclass;
>>> ERROR:
>>> CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
>>> WHERE ((id = 16)) RETURNING NULL
>
>
>> Will fix.
>
>
> While working on this, I noticed that the existing postgres_fdw system shows
> similar behavior, so I changed the subject.
>
> IIUC, the reason for that is when the local query specifies "RETURNING
> tableoid::regclass", the FDW has fmstate->has_returning=false while the
> remote query executed at ModifyTable has "RETURNING NULL", as shown in the
> above example; that would cause an abnormal exit in executing the remote
> query in postgresExecForeignUpdate, since that the FDW would get
> PGRES_TUPLES_OK as a result of the query while the FDW would think that the
> right result to get should be PGRES_COMMAND_OK, from the flag
> fmstate->has_returning=false.
>
> Attached is a patch to fix that.

I can't apply this patch in tandem with FDW DML pushdown patch (either
v2 or v3).

Thom



On 2016/01/12 20:36, Thom Brown wrote:
> On 8 January 2016 at 05:08, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:

>>> On 2016/01/06 20:37, Thom Brown wrote:
>>>> I've run into an issue:
>>>>
>>>> *# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
>>>> tableoid::regclass;
>>>> ERROR:
>>>> CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
>>>> WHERE ((id = 16)) RETURNING NULL

>> While working on this, I noticed that the existing postgres_fdw system shows
>> similar behavior, so I changed the subject.
>>
>> IIUC, the reason for that is when the local query specifies "RETURNING
>> tableoid::regclass", the FDW has fmstate->has_returning=false while the
>> remote query executed at ModifyTable has "RETURNING NULL", as shown in the
>> above example; that would cause an abnormal exit in executing the remote
>> query in postgresExecForeignUpdate, since that the FDW would get
>> PGRES_TUPLES_OK as a result of the query while the FDW would think that the
>> right result to get should be PGRES_COMMAND_OK, from the flag
>> fmstate->has_returning=false.

>> Attached is a patch to fix that.

> I can't apply this patch in tandem with FDW DML pushdown patch (either
> v2 or v3).

That patch is for fixing the similar issue in the existing postgres_fdw 
system.  So, please apply that patch without the DML pushdown patch.  If 
that patch is reasonable as a fix for the issue, I'll update the DML 
pushdown patch (v3) on top of that patch.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/01/12 20:31, Rushabh Lathia wrote:
> On Thu, Jan 7, 2016 at 6:15 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
>     On 2016/01/06 18:58, Rushabh Lathia wrote:
>         .) What the need of following change ?
>
>         @@ -833,9 +833,6 @@ appendWhereClause(StringInfo buf,
>               int         nestlevel;
>               ListCell   *lc;
>
>         -   if (params)
>         -       *params = NIL;          /* initialize result list to
>         empty */
>         -
>               /* Set up context struct for recursion */
>               context.root = root;
>               context.foreignrel = baserel;
>         @@ -971,6 +968,63 @@ deparseUpdateSql(StringInfo buf,
>         PlannerInfo *root,
>            }

>     It is needed for deparsePushedDownUpdateSql to store params in both
>     WHERE clauses and expressions to assign to the target columns
>     into one params_list list.

> Hmm sorry but I am still not getting the point, can you provide some
> example to explain this ?

Sorry, maybe my explanation was not enough.  Consider:

postgres=# create foreign table ft1 (a int, b int) server myserver 
options (table_name 't1');
postgres=# insert into ft1 values (0, 0);
postgres=# prepare mt(int, int) as update ft1 set a = $1 where b = $2;
postgres=# explain verbose execute mt(1, 0);
postgres=# explain verbose execute mt(1, 0);
postgres=# explain verbose execute mt(1, 0);
postgres=# explain verbose execute mt(1, 0);
postgres=# explain verbose execute mt(1, 0);

After the 5 executions of mt we have

postgres=# explain verbose execute mt(1, 0);                                     QUERY PLAN
------------------------------------------------------------------------------------ Update on public.ft1
(cost=100.00..140.35rows=12 width=10)   ->  Foreign Update on public.ft1  (cost=100.00..140.35 rows=12 width=10)
Remote SQL: UPDATE public.t1 SET a = $1::integer WHERE ((b = 
 
$2::integer))
(3 rows)

If we do that initialization in appendWhereClause, we would get a wrong 
params_list list and a wrong remote pushed-down query for the last mt() 
in deparsePushedDownUpdateSql.

>         .) When Tom Lane and Stephen Frost suggested getting the core
>         code involved,
>         I thought that we can do the mandatory checks into core it self
>         and making
>         completely out of dml_is_pushdown_safe(). Please correct me

>     The reason why I put that function in postgres_fdw.c is Check point 4:
>
>     +  * 4. We can't push an UPDATE down, if any expressions to assign
>     to the target
>     +  * columns are unsafe to evaluate on the remote server.

> Here I was talking about checks related to triggers, or to LIMIT. I think
> earlier thread talked about those mandatory check to the core. So may
> be we can move those checks into make_modifytable() before calling
> the PlanDMLPushdown.

Noticed that.  Will do.

BTW, I keep a ForeignScan node pushing down an update to the remote 
server, in the updated patches.  I have to admit that that seems like 
rather a misnomer.  So, it might be worth adding a new ForeignUpdate 
node, but my concern about that is that if doing so, we would have a lot 
of duplicate code in ForeignUpdate and ForeignScan.  What do you think 
about that?

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:


On Thu, Jan 14, 2016 at 2:00 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/01/12 20:31, Rushabh Lathia wrote:
On Thu, Jan 7, 2016 at 6:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
    On 2016/01/06 18:58, Rushabh Lathia wrote:
        .) What the need of following change ?

        @@ -833,9 +833,6 @@ appendWhereClause(StringInfo buf,
              int         nestlevel;
              ListCell   *lc;

        -   if (params)
        -       *params = NIL;          /* initialize result list to
        empty */
        -
              /* Set up context struct for recursion */
              context.root = root;
              context.foreignrel = baserel;
        @@ -971,6 +968,63 @@ deparseUpdateSql(StringInfo buf,
        PlannerInfo *root,
           }

    It is needed for deparsePushedDownUpdateSql to store params in both
    WHERE clauses and expressions to assign to the target columns
    into one params_list list.

Hmm sorry but I am still not getting the point, can you provide some
example to explain this ?

Sorry, maybe my explanation was not enough.  Consider:

postgres=# create foreign table ft1 (a int, b int) server myserver options (table_name 't1');
postgres=# insert into ft1 values (0, 0);
postgres=# prepare mt(int, int) as update ft1 set a = $1 where b = $2;
postgres=# explain verbose execute mt(1, 0);
postgres=# explain verbose execute mt(1, 0);
postgres=# explain verbose execute mt(1, 0);
postgres=# explain verbose execute mt(1, 0);
postgres=# explain verbose execute mt(1, 0);

After the 5 executions of mt we have

postgres=# explain verbose execute mt(1, 0);
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Update on public.ft1  (cost=100.00..140.35 rows=12 width=10)
   ->  Foreign Update on public.ft1  (cost=100.00..140.35 rows=12 width=10)
         Remote SQL: UPDATE public.t1 SET a = $1::integer WHERE ((b = $2::integer))
(3 rows)

If we do that initialization in appendWhereClause, we would get a wrong params_list list and a wrong remote pushed-down query for the last mt() in deparsePushedDownUpdateSql.

Strange, I am seeing same behaviour with or without that initialization in
appendWhereClause. After the 5 executions of mt I with or without I am
getting following output:

postgres=# explain verbose execute mt(1, 0);
                                     QUERY PLAN                                    
------------------------------------------------------------------------------------
 Update on public.ft2  (cost=100.00..140.35 rows=12 width=10)
   ->  Foreign Update on public.ft2  (cost=100.00..140.35 rows=12 width=10)
         Remote SQL: UPDATE public.t2 SET a = $1::integer WHERE ((b = $2::integer))
(3 rows)

 

        .) When Tom Lane and Stephen Frost suggested getting the core
        code involved,
        I thought that we can do the mandatory checks into core it self
        and making
        completely out of dml_is_pushdown_safe(). Please correct me

    The reason why I put that function in postgres_fdw.c is Check point 4:

    +  * 4. We can't push an UPDATE down, if any expressions to assign
    to the target
    +  * columns are unsafe to evaluate on the remote server.

Here I was talking about checks related to triggers, or to LIMIT. I think
earlier thread talked about those mandatory check to the core. So may
be we can move those checks into make_modifytable() before calling
the PlanDMLPushdown.

Noticed that.  Will do.

BTW, I keep a ForeignScan node pushing down an update to the remote server, in the updated patches.  I have to admit that that seems like rather a misnomer.  So, it might be worth adding a new ForeignUpdate node, but my concern about that is that if doing so, we would have a lot of duplicate code in ForeignUpdate and ForeignScan.  What do you think about that?


Yes, I noticed that in the patch and I was about to point that out in my
final review. As first review I was mainly focused on the functionality testing
and other overview things. Another reason I haven't posted that in my
first review round is, I was not quite sure whether we need the
separate new node ForeignUpdate, ForeignDelete  and want to duplicate
code? Was also not quite sure about the fact that what we will achieve
by doing that.

So I thought, I will have this open question in my final review comment,
and will take committer's opinion on this. Since you already raised this
question lets take others opinion on this.

Regards,



--
Rushabh Lathia
www.EnterpriseDB.come

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/01/14 21:36, Rushabh Lathia wrote:
> On Thu, Jan 14, 2016 at 2:00 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:

>     On 2016/01/12 20:31, Rushabh Lathia wrote:

>         On Thu, Jan 7, 2016 at 6:15 PM, Etsuro Fujita
>         <fujita.etsuro@lab.ntt.co.jp
>         <mailto:fujita.etsuro@lab.ntt.co.jp>
>         <mailto:fujita.etsuro@lab.ntt.co.jp
>         <mailto:fujita.etsuro@lab.ntt.co.jp>>> wrote:
>              On 2016/01/06 18:58, Rushabh Lathia wrote:
>                  .) What the need of following change ?
>
>                  @@ -833,9 +833,6 @@ appendWhereClause(StringInfo buf,
>                        int         nestlevel;
>                        ListCell   *lc;
>
>                  -   if (params)
>                  -       *params = NIL;          /* initialize result
>         list to
>                  empty */
>                  -
>                        /* Set up context struct for recursion */
>                        context.root = root;
>                        context.foreignrel = baserel;
>                  @@ -971,6 +968,63 @@ deparseUpdateSql(StringInfo buf,
>                  PlannerInfo *root,
>                     }

>              It is needed for deparsePushedDownUpdateSql to store params
>         in both
>              WHERE clauses and expressions to assign to the target columns
>              into one params_list list.

>         Hmm sorry but I am still not getting the point, can you provide some
>         example to explain this ?

>     Sorry, maybe my explanation was not enough.  Consider:
>
>     postgres=# create foreign table ft1 (a int, b int) server myserver
>     options (table_name 't1');
>     postgres=# insert into ft1 values (0, 0);
>     postgres=# prepare mt(int, int) as update ft1 set a = $1 where b = $2;
>     postgres=# explain verbose execute mt(1, 0);
>     postgres=# explain verbose execute mt(1, 0);
>     postgres=# explain verbose execute mt(1, 0);
>     postgres=# explain verbose execute mt(1, 0);
>     postgres=# explain verbose execute mt(1, 0);
>
>     After the 5 executions of mt we have
>
>     postgres=# explain verbose execute mt(1, 0);
>                                           QUERY PLAN
>     ------------------------------------------------------------------------------------
>       Update on public.ft1  (cost=100.00..140.35 rows=12 width=10)
>         ->  Foreign Update on public.ft1  (cost=100.00..140.35 rows=12
>     width=10)
>               Remote SQL: UPDATE public.t1 SET a = $1::integer WHERE ((b
>     = $2::integer))
>     (3 rows)
>
>     If we do that initialization in appendWhereClause, we would get a
>     wrong params_list list and a wrong remote pushed-down query for the
>     last mt() in deparsePushedDownUpdateSql.

> Strange, I am seeing same behaviour with or without that initialization in
> appendWhereClause. After the 5 executions of mt I with or without I am
> getting following output:
>
> postgres=# explain verbose execute mt(1, 0);
>                                       QUERY PLAN
> ------------------------------------------------------------------------------------
>   Update on public.ft2  (cost=100.00..140.35 rows=12 width=10)
>     ->  Foreign Update on public.ft2  (cost=100.00..140.35 rows=12 width=10)
>           Remote SQL: UPDATE public.t2 SET a = $1::integer WHERE ((b =
> $2::integer))
> (3 rows)

Really?  With that initialization in appendWhereClause, I got the 
following wrong result (note that both parameter numbers are $1):

postgres=# explain verbose execute mt(1, 0);                                     QUERY PLAN
------------------------------------------------------------------------------------ Update on public.ft1
(cost=100.00..140.35rows=12 width=10)   ->  Foreign Update on public.ft1  (cost=100.00..140.35 rows=12 width=10)
Remote SQL: UPDATE public.t1 SET a = $1::integer WHERE ((b = 
 
$1::integer))
(3 rows)

>     BTW, I keep a ForeignScan node pushing down an update to the remote
>     server, in the updated patches.  I have to admit that that seems
>     like rather a misnomer.  So, it might be worth adding a new
>     ForeignUpdate node, but my concern about that is that if doing so,
>     we would have a lot of duplicate code in ForeignUpdate and
>     ForeignScan.  What do you think about that?

> Yes, I noticed that in the patch and I was about to point that out in my
> final review. As first review I was mainly focused on the functionality
> testing
> and other overview things. Another reason I haven't posted that in my
> first review round is, I was not quite sure whether we need the
> separate new node ForeignUpdate, ForeignDelete  and want to duplicate
> code? Was also not quite sure about the fact that what we will achieve
> by doing that.
>
> So I thought, I will have this open question in my final review comment,
> and will take committer's opinion on this. Since you already raised this
> question lets take others opinion on this.

OK, let's do that.

Best regards,
Etsuro Fujita





On 2016/01/08 14:08, Etsuro Fujita wrote:
> On 2016/01/07 21:50, Etsuro Fujita wrote:
>> On 2016/01/06 20:37, Thom Brown wrote:

>>> I've run into an issue:
>>>
>>> *# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
>>> tableoid::regclass;
>>> ERROR:
>>> CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
>>> WHERE ((id = 16)) RETURNING NULL

> While working on this, I noticed that the existing postgres_fdw system
> shows similar behavior, so I changed the subject.
>
> IIUC, the reason for that is when the local query specifies "RETURNING
> tableoid::regclass", the FDW has fmstate->has_returning=false while the
> remote query executed at ModifyTable has "RETURNING NULL", as shown in
> the above example; that would cause an abnormal exit in executing the
> remote query in postgresExecForeignUpdate, since that the FDW would get
> PGRES_TUPLES_OK as a result of the query while the FDW would think that
> the right result to get should be PGRES_COMMAND_OK, from the flag
> fmstate->has_returning=false.
>
> Attached is a patch to fix that.

I added this to the next CF.

https://commitfest.postgresql.org/9/483/

Best regards,
Etsuro Fujita





On 12 January 2016 at 11:49, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/01/12 20:36, Thom Brown wrote:
>>
>> On 8 January 2016 at 05:08, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
>> wrote:
>
>
>>>> On 2016/01/06 20:37, Thom Brown wrote:
>>>>>
>>>>> I've run into an issue:
>>>>>
>>>>> *# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
>>>>> tableoid::regclass;
>>>>> ERROR:
>>>>> CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
>>>>> WHERE ((id = 16)) RETURNING NULL
>
>
>>> While working on this, I noticed that the existing postgres_fdw system
>>> shows
>>> similar behavior, so I changed the subject.
>>>
>>> IIUC, the reason for that is when the local query specifies "RETURNING
>>> tableoid::regclass", the FDW has fmstate->has_returning=false while the
>>> remote query executed at ModifyTable has "RETURNING NULL", as shown in
>>> the
>>> above example; that would cause an abnormal exit in executing the remote
>>> query in postgresExecForeignUpdate, since that the FDW would get
>>> PGRES_TUPLES_OK as a result of the query while the FDW would think that
>>> the
>>> right result to get should be PGRES_COMMAND_OK, from the flag
>>> fmstate->has_returning=false.
>
>
>>> Attached is a patch to fix that.
>
>
>> I can't apply this patch in tandem with FDW DML pushdown patch (either
>> v2 or v3).
>
>
> That patch is for fixing the similar issue in the existing postgres_fdw
> system.  So, please apply that patch without the DML pushdown patch.  If
> that patch is reasonable as a fix for the issue, I'll update the DML
> pushdown patch (v3) on top of that patch.

The patch seems to work for me:

Before:

*# UPDATE master_customers SET id = 22 WHERE id = 1 RETURNING
tableoid::regclass;
ERROR:
CONTEXT:  Remote SQL command: UPDATE public.customers SET id = $2
WHERE ctid = $1 RETURNING NULL

After:

*# UPDATE master_customers SET id = 22 WHERE id = 1 RETURNING
tableoid::regclass;    tableoid
------------------remote.customers
(1 row)

UPDATE 1

Thom



On Tue, Jan 19, 2016 at 1:59 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>>>> I've run into an issue:
>>>>
>>>> *# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
>>>> tableoid::regclass;
>>>> ERROR:
>>>> CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
>>>> WHERE ((id = 16)) RETURNING NULL
>
>> While working on this, I noticed that the existing postgres_fdw system
>> shows similar behavior, so I changed the subject.
>>
>> IIUC, the reason for that is when the local query specifies "RETURNING
>> tableoid::regclass", the FDW has fmstate->has_returning=false while the
>> remote query executed at ModifyTable has "RETURNING NULL", as shown in
>> the above example; that would cause an abnormal exit in executing the
>> remote query in postgresExecForeignUpdate, since that the FDW would get
>> PGRES_TUPLES_OK as a result of the query while the FDW would think that
>> the right result to get should be PGRES_COMMAND_OK, from the flag
>> fmstate->has_returning=false.
>>
>> Attached is a patch to fix that.
>
> I added this to the next CF.
>
> https://commitfest.postgresql.org/9/483/

Uggh, what a mess.  How about passing an additional boolean
"is_returning" to deparseTargetList()?   If false, then
deparseTargetList() behaves as now.  If false, then
deparseTargetList() doesn't append anything at all if there are no
columns to return, instead of (as at present) adding NULL.  On the
other hand, if there are columns to return, then it appends "
RETURNING " before the first column.  Then, deparseReturningList could
skip adding RETURNING itself, and just pass true to
deparseTargetList().  The advantage of this approach is that we don't
end up with two copies of the code that have to stay synchronized -
the decision is made inside deparseTargetList(), and
deparseReturningList() accepts the results.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On 2016/01/20 3:42, Robert Haas wrote:
> On Tue, Jan 19, 2016 at 1:59 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>>> I've run into an issue:
>>>>>
>>>>> *# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
>>>>> tableoid::regclass;
>>>>> ERROR:
>>>>> CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
>>>>> WHERE ((id = 16)) RETURNING NULL

>>> While working on this, I noticed that the existing postgres_fdw system
>>> shows similar behavior, so I changed the subject.
>>>
>>> IIUC, the reason for that is when the local query specifies "RETURNING
>>> tableoid::regclass", the FDW has fmstate->has_returning=false while the
>>> remote query executed at ModifyTable has "RETURNING NULL", as shown in
>>> the above example; that would cause an abnormal exit in executing the
>>> remote query in postgresExecForeignUpdate, since that the FDW would get
>>> PGRES_TUPLES_OK as a result of the query while the FDW would think that
>>> the right result to get should be PGRES_COMMAND_OK, from the flag
>>> fmstate->has_returning=false.
>>>
>>> Attached is a patch to fix that.

>> I added this to the next CF.
>>
>> https://commitfest.postgresql.org/9/483/

> Uggh, what a mess.  How about passing an additional boolean
> "is_returning" to deparseTargetList()?   If false, then
> deparseTargetList() behaves as now.  If false, then
> deparseTargetList() doesn't append anything at all if there are no
> columns to return, instead of (as at present) adding NULL.  On the
> other hand, if there are columns to return, then it appends "
> RETURNING " before the first column.  Then, deparseReturningList could
> skip adding RETURNING itself, and just pass true to
> deparseTargetList().  The advantage of this approach is that we don't
> end up with two copies of the code that have to stay synchronized -

Thanks for the review!  I think that is important.

> the decision is made inside deparseTargetList(), and
> deparseReturningList() accepts the results.

My concern about that is that would make the code in deparseTargetList() 
complicated.

Essentially, I think your propossal needs a two-pass algorithm for 
deparseTargetList; (1) create an integer List of the columns being 
retrieved from the given attrs_used (getRetrievedAttrs()), and (2) print 
those columns (printRetrievedAttrs()).  How about sharing those two 
functions between deparseTargetList and deparseReturningList?:

* In deparseTargetList, perform getRetrievedAttrs().  If 
getRetrievedAttrs()!=NIL, perform printRetrievedAttrs().  Otherwise, 
print NULL.
* In deparseReturningList, perform getRetrievedAttrs() before adding 
RETURNING.  If getRetrievedAttrs()!=NIL, print RETURNING and perform 
printRetrievedAttrs().  Otherwise, do nothing.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:


On Fri, Jan 15, 2016 at 9:06 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/01/14 21:36, Rushabh Lathia wrote:
On Thu, Jan 14, 2016 at 2:00 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:

    On 2016/01/12 20:31, Rushabh Lathia wrote:

        On Thu, Jan 7, 2016 at 6:15 PM, Etsuro Fujita
        <fujita.etsuro@lab.ntt.co.jp
        <mailto:fujita.etsuro@lab.ntt.co.jp>
        <mailto:fujita.etsuro@lab.ntt.co.jp

        <mailto:fujita.etsuro@lab.ntt.co.jp>>> wrote:
             On 2016/01/06 18:58, Rushabh Lathia wrote:
                 .) What the need of following change ?

                 @@ -833,9 +833,6 @@ appendWhereClause(StringInfo buf,
                       int         nestlevel;
                       ListCell   *lc;

                 -   if (params)
                 -       *params = NIL;          /* initialize result
        list to
                 empty */
                 -
                       /* Set up context struct for recursion */
                       context.root = root;
                       context.foreignrel = baserel;
                 @@ -971,6 +968,63 @@ deparseUpdateSql(StringInfo buf,
                 PlannerInfo *root,
                    }

             It is needed for deparsePushedDownUpdateSql to store params
        in both
             WHERE clauses and expressions to assign to the target columns
             into one params_list list.

        Hmm sorry but I am still not getting the point, can you provide some
        example to explain this ?

    Sorry, maybe my explanation was not enough.  Consider:

    postgres=# create foreign table ft1 (a int, b int) server myserver
    options (table_name 't1');
    postgres=# insert into ft1 values (0, 0);
    postgres=# prepare mt(int, int) as update ft1 set a = $1 where b = $2;
    postgres=# explain verbose execute mt(1, 0);
    postgres=# explain verbose execute mt(1, 0);
    postgres=# explain verbose execute mt(1, 0);
    postgres=# explain verbose execute mt(1, 0);
    postgres=# explain verbose execute mt(1, 0);

    After the 5 executions of mt we have

    postgres=# explain verbose execute mt(1, 0);
                                          QUERY PLAN
    ------------------------------------------------------------------------------------
      Update on public.ft1  (cost=100.00..140.35 rows=12 width=10)
        ->  Foreign Update on public.ft1  (cost=100.00..140.35 rows=12
    width=10)
              Remote SQL: UPDATE public.t1 SET a = $1::integer WHERE ((b
    = $2::integer))
    (3 rows)

    If we do that initialization in appendWhereClause, we would get a
    wrong params_list list and a wrong remote pushed-down query for the
    last mt() in deparsePushedDownUpdateSql.

Strange, I am seeing same behaviour with or without that initialization in
appendWhereClause. After the 5 executions of mt I with or without I am
getting following output:

postgres=# explain verbose execute mt(1, 0);
                                      QUERY PLAN
------------------------------------------------------------------------------------
  Update on public.ft2  (cost=100.00..140.35 rows=12 width=10)
    ->  Foreign Update on public.ft2  (cost=100.00..140.35 rows=12 width=10)
          Remote SQL: UPDATE public.t2 SET a = $1::integer WHERE ((b =
$2::integer))
(3 rows)

Really?  With that initialization in appendWhereClause, I got the following wrong result (note that both parameter numbers are $1):

postgres=# explain verbose execute mt(1, 0);
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Update on public.ft1  (cost=100.00..140.35 rows=12 width=10)
   ->  Foreign Update on public.ft1  (cost=100.00..140.35 rows=12 width=10)
         Remote SQL: UPDATE public.t1 SET a = $1::integer WHERE ((b = $1::integer))
(3 rows)


Oops sorry. I got the point now.
 
    BTW, I keep a ForeignScan node pushing down an update to the remote
    server, in the updated patches.  I have to admit that that seems
    like rather a misnomer.  So, it might be worth adding a new
    ForeignUpdate node, but my concern about that is that if doing so,
    we would have a lot of duplicate code in ForeignUpdate and
    ForeignScan.  What do you think about that?

Yes, I noticed that in the patch and I was about to point that out in my
final review. As first review I was mainly focused on the functionality
testing
and other overview things. Another reason I haven't posted that in my
first review round is, I was not quite sure whether we need the
separate new node ForeignUpdate, ForeignDelete  and want to duplicate
code? Was also not quite sure about the fact that what we will achieve
by doing that.

So I thought, I will have this open question in my final review comment,
and will take committer's opinion on this. Since you already raised this
question lets take others opinion on this.

OK, let's do that.


Overall I am quite done with the review of this patch. Patch is in good
shape and covered most of the things which been discussed earlier
or been mentioned during review process. Patch pass through the
make check and also includes good test coverage.

Here are couple of things which is still open for discussion:

1)

.) When Tom Lane and Stephen Frost suggested getting the core code involved,
I thought that we can do the mandatory checks into core it self and making
completely out of dml_is_pushdown_safe(). Please correct me

The reason why I put that function in postgres_fdw.c is Check point 4:

+  * 4. We can't push an UPDATE down, if any expressions to assign to the target
+  * columns are unsafe to evaluate on the remote server.


Here I was talking about checks related to triggers, or to LIMIT. I think
earlier thread talked about those mandatory check to the core. So may
be we can move those checks into make_modifytable() before calling
the PlanDMLPushdown.

This need to handle by the Owner.


2) Decision on whether we need the separate new node ForeignUpdate,
ForeignDelete. In my opinion I really don't see the need of this as we
that will add lot of duplicate. Having said that if committer or someone
else feel like that will make code more clean that is also true,

This need more comments from the committer.

Thanks

Rushabh Lathia
On Wed, Jan 20, 2016 at 4:50 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> My concern about that is that would make the code in deparseTargetList()
> complicated.
>
> Essentially, I think your propossal needs a two-pass algorithm for
> deparseTargetList; (1) create an integer List of the columns being retrieved
> from the given attrs_used (getRetrievedAttrs()), and (2) print those columns
> (printRetrievedAttrs()).  How about sharing those two functions between
> deparseTargetList and deparseReturningList?:

I don't see why we'd need that.  I adjusted the code in postgres_fdw
along the lines I had in mind and am attaching the result.  It doesn't
look complicated to me, and it passes the regression test you wrote.

By the way, I'm not too sure I understand the need for the core
changes that are part of this patch, and I think that point merits
some discussion.  Whenever you change core like this, you're changing
the contract between the FDW and core; it's not just postgres_fdw that
needs updating, but every FDW.  So we'd better be pretty sure we need
these changes and they are adequately justified before we think about
putting them into the tree.  Are these core changes really needed
here, or can we fix this whole issue in postgres_fdw and leave the
core code alone?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment
On 2016/01/21 5:06, Robert Haas wrote:
> On Wed, Jan 20, 2016 at 4:50 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> My concern about that is that would make the code in deparseTargetList()
>> complicated.
>>
>> Essentially, I think your propossal needs a two-pass algorithm for
>> deparseTargetList; (1) create an integer List of the columns being retrieved
>> from the given attrs_used (getRetrievedAttrs()), and (2) print those columns
>> (printRetrievedAttrs()).  How about sharing those two functions between
>> deparseTargetList and deparseReturningList?:

> I don't see why we'd need that.  I adjusted the code in postgres_fdw
> along the lines I had in mind and am attaching the result.  It doesn't
> look complicated to me, and it passes the regression test you wrote.

Thanks for the patch!  From the patch, I correctly understand what you 
proposed.  Good idea!

> By the way, I'm not too sure I understand the need for the core
> changes that are part of this patch, and I think that point merits
> some discussion.  Whenever you change core like this, you're changing
> the contract between the FDW and core; it's not just postgres_fdw that
> needs updating, but every FDW.  So we'd better be pretty sure we need
> these changes and they are adequately justified before we think about
> putting them into the tree.  Are these core changes really needed
> here, or can we fix this whole issue in postgres_fdw and leave the
> core code alone?

Well, if we think it is the FDW's responsibility to insert a valid value 
for tableoid in the returned slot during ExecForeignInsert, 
ExecForeignUpdate or ExecForeignDelete, we don't need those core 
changes.  However, I think it would be better that that's done by 
ModifyTable in the same way as ForeignScan does in ForeignNext, IMO. 
That eliminates the need for postgres_fdw or any other FDW to do that 
business in the callback routines.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/01/20 19:57, Rushabh Lathia wrote:
> Overall I am quite done with the review of this patch. Patch is in good
> shape and covered most of the things which been discussed earlier
> or been mentioned during review process. Patch pass through the
> make check and also includes good test coverage.

Thanks for the review!

> Here are couple of things which is still open for discussion:

> 1)
>         .) When Tom Lane and Stephen Frost suggested getting the core
>         code involved,
>         I thought that we can do the mandatory checks into core it self
>         and making
>         completely out of dml_is_pushdown_safe(). Please correct me

>     The reason why I put that function in postgres_fdw.c is Check point 4:
>
>     +  * 4. We can't push an UPDATE down, if any expressions to assign
>     to the target
>     +  * columns are unsafe to evaluate on the remote server.

> Here I was talking about checks related to triggers, or to LIMIT. I think
> earlier thread talked about those mandatory check to the core. So may
> be we can move those checks into make_modifytable() before calling
> the PlanDMLPushdown.
>
> This need to handle by the Owner.

Done.  For that, I modified relation_has_row_triggers a bit, renamed it
to has_row_triggers (more shortly), and moved it to plancat.c.  And I
merged dml_is_pushdown_safe with postgresPlanDMLPushdown, and revised
that callback routine a bit.  Attached is an updated version of the
patch created on top of Robert's version of the patch [1], which fixes
handling of RETURNING tableoid in updating foreign tables.

> 2) Decision on whether we need the separate new node ForeignUpdate,
> ForeignDelete. In my opinion I really don't see the need of this as we
> that will add lot of duplicate. Having said that if committer or someone
> else feel like that will make code more clean that is also true,
>
> This need more comments from the committer.

I agree with you.

Other changes:

* In previous version, I assumed that PlanDMLPushdown sets fsSystemCol
to true when rewriting the ForeignScan plan node so as to push down an
UPDATE/DELETE to the remote server, in order to initialize t_tableOid
for the scan tuple in ForeignNext.  The reason is that I created the
patch so that the scan tuple is provided to the local query's RETURNING
computation, which might see the tableoid column.  In this version,
however, I modified the patch so that the tableoid value is inserted by
ModifyTable.  This eliminates the need for postgres_fdw (or any other
FDW) to set fsSystemCol to true in PlanDMLPushdown.

* Add set_transmission_modes/reset_transmission_modes to
deparsePushedDownUpdateSql.

* Revise comments a bit further.

* Revise docs, including a fix for a wrong copy-and-paste.

Best regards,
Etsuro Fujita

[1]
http://www.postgresql.org/message-id/CA+TgmoZ40j2uC5aC1NXu03oj4CrVOLkS15XX+PTFP-1U-8zR1Q@mail.gmail.com

Attachment
On 2016/01/19 19:04, Thom Brown wrote:
> On 12 January 2016 at 11:49, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2016/01/12 20:36, Thom Brown wrote:

>>> On 8 January 2016 at 05:08, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
>>> wrote:

>>>>> On 2016/01/06 20:37, Thom Brown wrote:
>>>>>>
>>>>>> I've run into an issue:
>>>>>>
>>>>>> *# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
>>>>>> tableoid::regclass;
>>>>>> ERROR:
>>>>>> CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
>>>>>> WHERE ((id = 16)) RETURNING NULL

>>>> While working on this, I noticed that the existing postgres_fdw system
>>>> shows
>>>> similar behavior, so I changed the subject.
>>>>
>>>> IIUC, the reason for that is when the local query specifies "RETURNING
>>>> tableoid::regclass", the FDW has fmstate->has_returning=false while the
>>>> remote query executed at ModifyTable has "RETURNING NULL", as shown in
>>>> the
>>>> above example; that would cause an abnormal exit in executing the remote
>>>> query in postgresExecForeignUpdate, since that the FDW would get
>>>> PGRES_TUPLES_OK as a result of the query while the FDW would think that
>>>> the
>>>> right result to get should be PGRES_COMMAND_OK, from the flag
>>>> fmstate->has_returning=false.

>>>> Attached is a patch to fix that.

>>> I can't apply this patch in tandem with FDW DML pushdown patch (either
>>> v2 or v3).

>> That patch is for fixing the similar issue in the existing postgres_fdw
>> system.  So, please apply that patch without the DML pushdown patch.  If
>> that patch is reasonable as a fix for the issue, I'll update the DML
>> pushdown patch (v3) on top of that patch.

> The patch seems to work for me:
>
> Before:
>
> *# UPDATE master_customers SET id = 22 WHERE id = 1 RETURNING
> tableoid::regclass;
> ERROR:
> CONTEXT:  Remote SQL command: UPDATE public.customers SET id = $2
> WHERE ctid = $1 RETURNING NULL
>
> After:
>
> *# UPDATE master_customers SET id = 22 WHERE id = 1 RETURNING
> tableoid::regclass;
>       tableoid
> ------------------
>   remote.customers
> (1 row)
>
> UPDATE 1

Thanks for the testing!

I updated the DML pushdown patch on top of Robert's version of this 
bugfix patch.  Please see

http://www.postgresql.org/message-id/56A0A9F0.9090304@lab.ntt.co.jp

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:


On Thu, Jan 21, 2016 at 3:20 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/01/20 19:57, Rushabh Lathia wrote:
Overall I am quite done with the review of this patch. Patch is in good
shape and covered most of the things which been discussed earlier
or been mentioned during review process. Patch pass through the
make check and also includes good test coverage.

Thanks for the review!

Here are couple of things which is still open for discussion:

1)
        .) When Tom Lane and Stephen Frost suggested getting the core
        code involved,
        I thought that we can do the mandatory checks into core it self
        and making
        completely out of dml_is_pushdown_safe(). Please correct me

    The reason why I put that function in postgres_fdw.c is Check point 4:

    +  * 4. We can't push an UPDATE down, if any expressions to assign
    to the target
    +  * columns are unsafe to evaluate on the remote server.

Here I was talking about checks related to triggers, or to LIMIT. I think
earlier thread talked about those mandatory check to the core. So may
be we can move those checks into make_modifytable() before calling
the PlanDMLPushdown.

This need to handle by the Owner.

Done.  For that, I modified relation_has_row_triggers a bit, renamed it to has_row_triggers (more shortly), and moved it to plancat.c.  And I merged dml_is_pushdown_safe with postgresPlanDMLPushdown, and revised that callback routine a bit.  Attached is an updated version of the patch created on top of Robert's version of the patch [1], which fixes handling of RETURNING tableoid in updating foreign tables.


This looks great.
 

2) Decision on whether we need the separate new node ForeignUpdate,
ForeignDelete. In my opinion I really don't see the need of this as we
that will add lot of duplicate. Having said that if committer or someone
else feel like that will make code more clean that is also true,

This need more comments from the committer.

I agree with you.

Other changes:

* In previous version, I assumed that PlanDMLPushdown sets fsSystemCol to true when rewriting the ForeignScan plan node so as to push down an UPDATE/DELETE to the remote server, in order to initialize t_tableOid for the scan tuple in ForeignNext.  The reason is that I created the patch so that the scan tuple is provided to the local query's RETURNING computation, which might see the tableoid column.  In this version, however, I modified the patch so that the tableoid value is inserted by ModifyTable.  This eliminates the need for postgres_fdw (or any other FDW) to set fsSystemCol to true in PlanDMLPushdown.

* Add set_transmission_modes/reset_transmission_modes to deparsePushedDownUpdateSql.

* Revise comments a bit further.

* Revise docs, including a fix for a wrong copy-and-paste.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/CA+TgmoZ40j2uC5aC1NXu03oj4CrVOLkS15XX+PTFP-1U-8zR1Q@mail.gmail.com

Here are couple of comments:

1)

int
IsForeignRelUpdatable (Relation rel);


Documentation for IsForeignUpdatable() need to change as it says:

If the IsForeignRelUpdatable pointer is set to NULL, foreign tables are assumed
to be insertable, updatable, or deletable if the FDW provides ExecForeignInsert,
ExecForeignUpdate or ExecForeignDelete respectively.

With introduce of DMLPushdown API now this is no more correct, as even if
FDW don't provide ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete API
still foreign tables are assumed to be updatable or deletable with DMLPushdown
API's, right ?


2)

+     /* SQL statement to execute remotely (as a String node) */
+     FdwDmlPushdownPrivateUpdateSql,

FdwDmlPushdownPrivateUpdateSql holds the UPDATE/DELETE query, so name should
be something like FdwDmlPushdownPrivateQuery os FdwDmlPushdownPrivateSql ?


Later I realized that for FdwModifyPrivateIndex too the index name is
FdwModifyPrivateUpdateSql even though its holding any DML query. Not sure
whether we should consider to change this or not ?

Apart from this perform sanity testing on the new patch and things working
as expected.


--
Rushabh Lathia

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/01/25 17:03, Rushabh Lathia wrote:
> Here are couple of comments:

> 1)
>
> int
> IsForeignRelUpdatable (Relation rel);

> Documentation for IsForeignUpdatable() need to change as it says:
>
> If the IsForeignRelUpdatable pointer is set to NULL, foreign tables are
> assumed
> to be insertable, updatable, or deletable if the FDW provides
> ExecForeignInsert,
> ExecForeignUpdate or ExecForeignDelete respectively.
>
> With introduce of DMLPushdown API now this is no more correct, as even if
> FDW don't provide ExecForeignInsert, ExecForeignUpdate or
> ExecForeignDelete API
> still foreign tables are assumed to be updatable or deletable with
> DMLPushdown
> API's, right ?

That's what I'd like to discuss.

I intentionally leave that as-is, because I think we should determine 
the updatability of a foreign table in the current manner.  As you 
pointed out, even if the FDW doesn't provide eg, ExecForeignUpdate, an 
UPDATE on a foreign table could be done using the DML pushdown APIs if 
the UPDATE is *pushdown-safe*.  However, since all UPDATEs on the 
foreign table are not necessarily pushdown-safe, I'm not sure it's a 
good idea to assume the table-level updatability if the FDW provides the 
DML pushdown callback routines.  To keep the existing updatability 
decision, I think the FDW should provide the DML pushdown callback 
routines together with ExecForeignInsert, ExecForeignUpdate, or 
ExecForeignDelete.  What do you think about that?

> 2)
>
> +     /* SQL statement to execute remotely (as a String node) */
> +     FdwDmlPushdownPrivateUpdateSql,
>
> FdwDmlPushdownPrivateUpdateSql holds the UPDATE/DELETE query, so name should
> be something like FdwDmlPushdownPrivateQuery os FdwDmlPushdownPrivateSql ?

> Later I realized that for FdwModifyPrivateIndex too the index name is
> FdwModifyPrivateUpdateSql even though its holding any DML query. Not sure
> whether we should consider to change this or not ?

To tell the truth, I imitated FdwModifyPrivateIndex when adding 
FdwDmlPushdownPrivateIndex, because I think "UpdateSql" means INSERT, 
UPDATE, or DELETE, not just UPDATE.  (IsForeignRelUpdatable discussed 
above reports not only the updatability but the insertability and 
deletability of a foreign table!).  So, +1 for leaving that as-is.

> Apart from this perform sanity testing on the new patch and things working
> as expected.

Thanks for the review!

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:


On Tue, Jan 26, 2016 at 4:15 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/01/25 17:03, Rushabh Lathia wrote:
Here are couple of comments:

1)

int
IsForeignRelUpdatable (Relation rel);

Documentation for IsForeignUpdatable() need to change as it says:

If the IsForeignRelUpdatable pointer is set to NULL, foreign tables are
assumed
to be insertable, updatable, or deletable if the FDW provides
ExecForeignInsert,
ExecForeignUpdate or ExecForeignDelete respectively.

With introduce of DMLPushdown API now this is no more correct, as even if
FDW don't provide ExecForeignInsert, ExecForeignUpdate or
ExecForeignDelete API
still foreign tables are assumed to be updatable or deletable with
DMLPushdown
API's, right ?

That's what I'd like to discuss.

I intentionally leave that as-is, because I think we should determine the updatability of a foreign table in the current manner.  As you pointed out, even if the FDW doesn't provide eg, ExecForeignUpdate, an UPDATE on a foreign table could be done using the DML pushdown APIs if the UPDATE is *pushdown-safe*.  However, since all UPDATEs on the foreign table are not necessarily pushdown-safe, I'm not sure it's a good idea to assume the table-level updatability if the FDW provides the DML pushdown callback routines.  To keep the existing updatability decision, I think the FDW should provide the DML pushdown callback routines together with ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete.  What do you think about that?


Sorry but I am not in favour of adding compulsion that FDW should provide
the DML pushdown callback routines together with existing ExecForeignInsert,
ExecForeignUpdate or ExecForeignDelete APIs.

May be we should change the documentation in such way, that explains

a) If FDW PlanDMLPushdown is NULL, then check for ExecForeignInsert,
ExecForeignUpdate or ExecForeignDelete APIs
b) If FDW PlanDMLPushdown is non-NULL and plan is not pushable
check for ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete APIs
c) If FDW PlanDMLPushdown is non-NULL and plan is pushable
check for DMLPushdown APIs.

Does this sounds wired ?

  
2)

+     /* SQL statement to execute remotely (as a String node) */
+     FdwDmlPushdownPrivateUpdateSql,

FdwDmlPushdownPrivateUpdateSql holds the UPDATE/DELETE query, so name should
be something like FdwDmlPushdownPrivateQuery os FdwDmlPushdownPrivateSql ?

Later I realized that for FdwModifyPrivateIndex too the index name is
FdwModifyPrivateUpdateSql even though its holding any DML query. Not sure
whether we should consider to change this or not ?

To tell the truth, I imitated FdwModifyPrivateIndex when adding FdwDmlPushdownPrivateIndex, because I think "UpdateSql" means INSERT, UPDATE, or DELETE, not just UPDATE.  (IsForeignRelUpdatable discussed above reports not only the updatability but the insertability and deletability of a foreign table!).  So, +1 for leaving that as-is.


Make sense for now.
 
Apart from this perform sanity testing on the new patch and things working
as expected.

Thanks for the review!

Best regards,
Etsuro Fujita





--
Rushabh Lathia

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/01/26 22:57, Rushabh Lathia wrote:
> On Tue, Jan 26, 2016 at 4:15 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
>
>     On 2016/01/25 17:03, Rushabh Lathia wrote:

>         int
>         IsForeignRelUpdatable (Relation rel);

>         Documentation for IsForeignUpdatable() need to change as it says:
>
>         If the IsForeignRelUpdatable pointer is set to NULL, foreign
>         tables are
>         assumed
>         to be insertable, updatable, or deletable if the FDW provides
>         ExecForeignInsert,
>         ExecForeignUpdate or ExecForeignDelete respectively.
>
>         With introduce of DMLPushdown API now this is no more correct,
>         as even if
>         FDW don't provide ExecForeignInsert, ExecForeignUpdate or
>         ExecForeignDelete API
>         still foreign tables are assumed to be updatable or deletable with
>         DMLPushdown
>         API's, right ?

>     That's what I'd like to discuss.
>
>     I intentionally leave that as-is, because I think we should
>     determine the updatability of a foreign table in the current
>     manner.  As you pointed out, even if the FDW doesn't provide eg,
>     ExecForeignUpdate, an UPDATE on a foreign table could be done using
>     the DML pushdown APIs if the UPDATE is *pushdown-safe*.  However,
>     since all UPDATEs on the foreign table are not necessarily
>     pushdown-safe, I'm not sure it's a good idea to assume the
>     table-level updatability if the FDW provides the DML pushdown
>     callback routines.  To keep the existing updatability decision, I
>     think the FDW should provide the DML pushdown callback routines
>     together with ExecForeignInsert, ExecForeignUpdate, or
>     ExecForeignDelete.  What do you think about that?

> Sorry but I am not in favour of adding compulsion that FDW should provide
> the DML pushdown callback routines together with existing ExecForeignInsert,
> ExecForeignUpdate or ExecForeignDelete APIs.
>
> May be we should change the documentation in such way, that explains
>
> a) If FDW PlanDMLPushdown is NULL, then check for ExecForeignInsert,
> ExecForeignUpdate or ExecForeignDelete APIs
> b) If FDW PlanDMLPushdown is non-NULL and plan is not pushable
> check for ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete APIs
> c) If FDW PlanDMLPushdown is non-NULL and plan is pushable
> check for DMLPushdown APIs.
>
> Does this sounds wired ?

Yeah, but I think that that would be what is done during executor 
startup (see CheckValidResultRel()), while what the documentation is 
saying is about relation_is_updatable(); that is, how to decide the 
updatability of a given foreign table, not how the executor processes an 
individual INSERT/UPDATE/DELETE on a updatable foreign table.  So, I'm 
not sure it's a good idea to modify the documentation in such a way.

BTW, I have added the description about that check partially.  I added 
to the PlanDMLPushdown documentation:

+      If this function succeeds, <function>PlanForeignModify</>
+      won't be executed, and <function>BeginDMLPushdown</>,
+      <function>IterateDMLPushdown</> and <function>EndDMLPushdown</> will
+      be called at the execution stage, instead.

And for example, I added to the BeginDMLPushdown documentation:

+      If the <function>BeginDMLPushdown</> pointer is set to
+      <literal>NULL</>, attempts to execute the table update directly on
+      the remote server will fail with an error message.

However, I agree that we should add a documentation note about the 
compulsion somewhere.  Maybe something like this:

The FDW should provide DML pushdown callback routines together with 
table-updating callback routines described above.  Even if the callback 
routines are provided, the updatability of a foreign table is determined 
based on the presence of ExecForeignInsert, ExecForeignUpdate or 
ExecForeignDelete if the IsForeignRelUpdatable pointer is set to NULL.

What's your opinion?

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/01/27 12:20, Etsuro Fujita wrote:
> On 2016/01/26 22:57, Rushabh Lathia wrote:
>> On Tue, Jan 26, 2016 at 4:15 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
>>
>>     On 2016/01/25 17:03, Rushabh Lathia wrote:

>>         int
>>         IsForeignRelUpdatable (Relation rel);

>>         Documentation for IsForeignUpdatable() need to change as it says:
>>
>>         If the IsForeignRelUpdatable pointer is set to NULL, foreign
>>         tables are
>>         assumed
>>         to be insertable, updatable, or deletable if the FDW provides
>>         ExecForeignInsert,
>>         ExecForeignUpdate or ExecForeignDelete respectively.
>>
>>         With introduce of DMLPushdown API now this is no more correct,
>>         as even if
>>         FDW don't provide ExecForeignInsert, ExecForeignUpdate or
>>         ExecForeignDelete API
>>         still foreign tables are assumed to be updatable or deletable
>> with
>>         DMLPushdown
>>         API's, right ?

>>     That's what I'd like to discuss.
>>
>>     I intentionally leave that as-is, because I think we should
>>     determine the updatability of a foreign table in the current
>>     manner.  As you pointed out, even if the FDW doesn't provide eg,
>>     ExecForeignUpdate, an UPDATE on a foreign table could be done using
>>     the DML pushdown APIs if the UPDATE is *pushdown-safe*.  However,
>>     since all UPDATEs on the foreign table are not necessarily
>>     pushdown-safe, I'm not sure it's a good idea to assume the
>>     table-level updatability if the FDW provides the DML pushdown
>>     callback routines.  To keep the existing updatability decision, I
>>     think the FDW should provide the DML pushdown callback routines
>>     together with ExecForeignInsert, ExecForeignUpdate, or
>>     ExecForeignDelete.  What do you think about that?

>> Sorry but I am not in favour of adding compulsion that FDW should provide
>> the DML pushdown callback routines together with existing
>> ExecForeignInsert,
>> ExecForeignUpdate or ExecForeignDelete APIs.
>>
>> May be we should change the documentation in such way, that explains
>>
>> a) If FDW PlanDMLPushdown is NULL, then check for ExecForeignInsert,
>> ExecForeignUpdate or ExecForeignDelete APIs
>> b) If FDW PlanDMLPushdown is non-NULL and plan is not pushable
>> check for ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete APIs
>> c) If FDW PlanDMLPushdown is non-NULL and plan is pushable
>> check for DMLPushdown APIs.
>>
>> Does this sounds wired ?

> Yeah, but I think that that would be what is done during executor
> startup (see CheckValidResultRel()), while what the documentation is
> saying is about relation_is_updatable(); that is, how to decide the
> updatability of a given foreign table, not how the executor processes an
> individual INSERT/UPDATE/DELETE on a updatable foreign table.  So, I'm
> not sure it's a good idea to modify the documentation in such a way.

> However, I agree that we should add a documentation note about the
> compulsion somewhere.  Maybe something like this:
>
> The FDW should provide DML pushdown callback routines together with
> table-updating callback routines described above.  Even if the callback
> routines are provided, the updatability of a foreign table is determined
> based on the presence of ExecForeignInsert, ExecForeignUpdate or
> ExecForeignDelete if the IsForeignRelUpdatable pointer is set to NULL.

On second thought, I think it might be okay to assume the presence of 
PlanDMLPushdown, BeginDMLPushdown, IterateDMLPushdown, and 
EndDMLPushdown is also sufficient for the insertablity, updatability, 
and deletability of a foreign table, if the IsForeignRelUpdatable 
pointer is set to NULL.  How about modifying the documentation like this:

If the IsForeignRelUpdatable pointer is set to NULL, foreign tables are 
assumed to be insertable, updatable, or deletable if the FDW provides 
ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete respectively, 
or if the FDW provides PlanDMLPushdown, BeginDMLPushdown, 
IterateDMLPushdown, and EndDMLPushdown described below.

Of course, we also need to modify relation_is_updatable() accordingly.

What's your opinion?

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:


On Wed, Jan 27, 2016 at 2:50 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/01/27 12:20, Etsuro Fujita wrote:
On 2016/01/26 22:57, Rushabh Lathia wrote:
On Tue, Jan 26, 2016 at 4:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:

    On 2016/01/25 17:03, Rushabh Lathia wrote:

        int
        IsForeignRelUpdatable (Relation rel);

        Documentation for IsForeignUpdatable() need to change as it says:

        If the IsForeignRelUpdatable pointer is set to NULL, foreign
        tables are
        assumed
        to be insertable, updatable, or deletable if the FDW provides
        ExecForeignInsert,
        ExecForeignUpdate or ExecForeignDelete respectively.

        With introduce of DMLPushdown API now this is no more correct,
        as even if
        FDW don't provide ExecForeignInsert, ExecForeignUpdate or
        ExecForeignDelete API
        still foreign tables are assumed to be updatable or deletable
with
        DMLPushdown
        API's, right ?

    That's what I'd like to discuss.

    I intentionally leave that as-is, because I think we should
    determine the updatability of a foreign table in the current
    manner.  As you pointed out, even if the FDW doesn't provide eg,
    ExecForeignUpdate, an UPDATE on a foreign table could be done using
    the DML pushdown APIs if the UPDATE is *pushdown-safe*.  However,
    since all UPDATEs on the foreign table are not necessarily
    pushdown-safe, I'm not sure it's a good idea to assume the
    table-level updatability if the FDW provides the DML pushdown
    callback routines.  To keep the existing updatability decision, I
    think the FDW should provide the DML pushdown callback routines
    together with ExecForeignInsert, ExecForeignUpdate, or
    ExecForeignDelete.  What do you think about that?

Sorry but I am not in favour of adding compulsion that FDW should provide
the DML pushdown callback routines together with existing
ExecForeignInsert,
ExecForeignUpdate or ExecForeignDelete APIs.

May be we should change the documentation in such way, that explains

a) If FDW PlanDMLPushdown is NULL, then check for ExecForeignInsert,
ExecForeignUpdate or ExecForeignDelete APIs
b) If FDW PlanDMLPushdown is non-NULL and plan is not pushable
check for ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete APIs
c) If FDW PlanDMLPushdown is non-NULL and plan is pushable
check for DMLPushdown APIs.

Does this sounds wired ?

Yeah, but I think that that would be what is done during executor
startup (see CheckValidResultRel()), while what the documentation is
saying is about relation_is_updatable(); that is, how to decide the
updatability of a given foreign table, not how the executor processes an
individual INSERT/UPDATE/DELETE on a updatable foreign table.  So, I'm
not sure it's a good idea to modify the documentation in such a way.

However, I agree that we should add a documentation note about the
compulsion somewhere.  Maybe something like this:

The FDW should provide DML pushdown callback routines together with
table-updating callback routines described above.  Even if the callback
routines are provided, the updatability of a foreign table is determined
based on the presence of ExecForeignInsert, ExecForeignUpdate or
ExecForeignDelete if the IsForeignRelUpdatable pointer is set to NULL.

On second thought, I think it might be okay to assume the presence of PlanDMLPushdown, BeginDMLPushdown, IterateDMLPushdown, and EndDMLPushdown is also sufficient for the insertablity, updatability, and deletability of a foreign table, if the IsForeignRelUpdatable pointer is set to NULL.  How about modifying the documentation like this:

If the IsForeignRelUpdatable pointer is set to NULL, foreign tables are assumed to be insertable, updatable, or deletable if the FDW provides ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete respectively, or if the FDW provides PlanDMLPushdown, BeginDMLPushdown, IterateDMLPushdown, and EndDMLPushdown described below.

Of course, we also need to modify relation_is_updatable() accordingly.


What's your opinion?


If I understood correctly, above documentation means, that if FDW have DMLPushdown APIs that is enough. But in reality thats not the case, we need  ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete in case DML is not pushable.

And here fact is DMLPushdown APIs are optional for FDW, so that if FDW don't have DMLPushdown APIs they can still very well perform the DML operations using ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete. So documentation should be like:

If the IsForeignRelUpdatable pointer is set to NULL, foreign tables are assumed to be insertable, updatable, or deletable if the FDW provides ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete respectively,

If FDW provides DMLPushdown APIs and the DML are pushable to the foreign server, then FDW still needs ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete for the non-pushable DML operation.

What's your opinion ?


Best regards,
Etsuro Fujita





--
Rushabh Lathia
On Thu, Jan 21, 2016 at 4:05 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>> By the way, I'm not too sure I understand the need for the core
>> changes that are part of this patch, and I think that point merits
>> some discussion.  Whenever you change core like this, you're changing
>> the contract between the FDW and core; it's not just postgres_fdw that
>> needs updating, but every FDW.  So we'd better be pretty sure we need
>> these changes and they are adequately justified before we think about
>> putting them into the tree.  Are these core changes really needed
>> here, or can we fix this whole issue in postgres_fdw and leave the
>> core code alone?
>
> Well, if we think it is the FDW's responsibility to insert a valid value for
> tableoid in the returned slot during ExecForeignInsert, ExecForeignUpdate or
> ExecForeignDelete, we don't need those core changes.  However, I think it
> would be better that that's done by ModifyTable in the same way as
> ForeignScan does in ForeignNext, IMO. That eliminates the need for
> postgres_fdw or any other FDW to do that business in the callback routines.

I'm not necessarily opposed to the core changes, but I want to
understand better what complexity they are avoiding.  Can you send a
version of this patch that only touches postgres_fdw, so I can
compare?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/01/27 21:23, Rushabh Lathia wrote:
> If I understood correctly, above documentation means, that if FDW have
> DMLPushdown APIs that is enough. But in reality thats not the case, we
> need  ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete in case
> DML is not pushable.
>
> And here fact is DMLPushdown APIs are optional for FDW, so that if FDW
> don't have DMLPushdown APIs they can still very well perform the DML
> operations using ExecForeignInsert, ExecForeignUpdate, or
> ExecForeignDelete.

I agree with you.  I guess I was wrong. sorry.

> So documentation should be like:
>
> If the IsForeignRelUpdatable pointer is set to NULL, foreign tables are
> assumed to be insertable, updatable, or deletable if the FDW provides
> ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete respectively,
>
> If FDW provides DMLPushdown APIs and the DML are pushable to the foreign
> server, then FDW still needs ExecForeignInsert, ExecForeignUpdate, or
> ExecForeignDelete for the non-pushable DML operation.
>
> What's your opinion ?

I agree that we should add this to the documentation, too.

BTW, if I understand correctly, I think we should also modify 
relation_is_updatabale() accordingly.  Am I right?

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:


On Thu, Jan 28, 2016 at 11:33 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/01/27 21:23, Rushabh Lathia wrote:
If I understood correctly, above documentation means, that if FDW have
DMLPushdown APIs that is enough. But in reality thats not the case, we
need  ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete in case
DML is not pushable.

And here fact is DMLPushdown APIs are optional for FDW, so that if FDW
don't have DMLPushdown APIs they can still very well perform the DML
operations using ExecForeignInsert, ExecForeignUpdate, or
ExecForeignDelete.

I agree with you.  I guess I was wrong. sorry.

So documentation should be like:

If the IsForeignRelUpdatable pointer is set to NULL, foreign tables are
assumed to be insertable, updatable, or deletable if the FDW provides
ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete respectively,

If FDW provides DMLPushdown APIs and the DML are pushable to the foreign
server, then FDW still needs ExecForeignInsert, ExecForeignUpdate, or
ExecForeignDelete for the non-pushable DML operation.

What's your opinion ?

I agree that we should add this to the documentation, too.

BTW, if I understand correctly, I think we should also modify relation_is_updatabale() accordingly.  Am I right?

Yep, we need to modify relation_is_updatable().
 

Best regards,
Etsuro Fujita





--
Rushabh Lathia

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/01/28 15:20, Rushabh Lathia wrote:
> On Thu, Jan 28, 2016 at 11:33 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
>
>     On 2016/01/27 21:23, Rushabh Lathia wrote:
>
>         If I understood correctly, above documentation means, that if
>         FDW have
>         DMLPushdown APIs that is enough. But in reality thats not the
>         case, we
>         need  ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
>         in case
>         DML is not pushable.
>
>         And here fact is DMLPushdown APIs are optional for FDW, so that
>         if FDW
>         don't have DMLPushdown APIs they can still very well perform the DML
>         operations using ExecForeignInsert, ExecForeignUpdate, or
>         ExecForeignDelete.

>     I agree with you.  I guess I was wrong. sorry.
>
>         So documentation should be like:
>
>         If the IsForeignRelUpdatable pointer is set to NULL, foreign
>         tables are
>         assumed to be insertable, updatable, or deletable if the FDW
>         provides
>         ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
>         respectively,
>
>         If FDW provides DMLPushdown APIs and the DML are pushable to the
>         foreign
>         server, then FDW still needs ExecForeignInsert,
>         ExecForeignUpdate, or
>         ExecForeignDelete for the non-pushable DML operation.
>
>         What's your opinion ?

>     I agree that we should add this to the documentation, too.
>
>     BTW, if I understand correctly, I think we should also modify
>     relation_is_updatabale() accordingly.  Am I right?

> Yep, we need to modify relation_is_updatable().

OK, will do.

Best regards,
Etsuro Fujita





On 2016/01/28 12:58, Robert Haas wrote:
> On Thu, Jan 21, 2016 at 4:05 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>> By the way, I'm not too sure I understand the need for the core
>>> changes that are part of this patch, and I think that point merits
>>> some discussion.  Whenever you change core like this, you're changing
>>> the contract between the FDW and core; it's not just postgres_fdw that
>>> needs updating, but every FDW.  So we'd better be pretty sure we need
>>> these changes and they are adequately justified before we think about
>>> putting them into the tree.  Are these core changes really needed
>>> here, or can we fix this whole issue in postgres_fdw and leave the
>>> core code alone?

>> Well, if we think it is the FDW's responsibility to insert a valid value for
>> tableoid in the returned slot during ExecForeignInsert, ExecForeignUpdate or
>> ExecForeignDelete, we don't need those core changes.  However, I think it
>> would be better that that's done by ModifyTable in the same way as
>> ForeignScan does in ForeignNext, IMO. That eliminates the need for
>> postgres_fdw or any other FDW to do that business in the callback routines.

> I'm not necessarily opposed to the core changes, but I want to
> understand better what complexity they are avoiding.  Can you send a
> version of this patch that only touches postgres_fdw, so I can
> compare?

Attached is that version of the patch.

I think that postgres_fdw might be able to insert a tableoid value in
the returned slot in e.g., postgresExecForeignInsert if AFTER ROW
Triggers or RETURNING expressions reference that value, but I didn't do
anything about that.

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/01/28 15:20, Rushabh Lathia wrote:
> On Thu, Jan 28, 2016 at 11:33 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
>
>     On 2016/01/27 21:23, Rushabh Lathia wrote:
>
>         If I understood correctly, above documentation means, that if
>         FDW have
>         DMLPushdown APIs that is enough. But in reality thats not the
>         case, we
>         need  ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
>         in case
>         DML is not pushable.
>
>         And here fact is DMLPushdown APIs are optional for FDW, so that
>         if FDW
>         don't have DMLPushdown APIs they can still very well perform the DML
>         operations using ExecForeignInsert, ExecForeignUpdate, or
>         ExecForeignDelete.

>         So documentation should be like:
>
>         If the IsForeignRelUpdatable pointer is set to NULL, foreign
>         tables are
>         assumed to be insertable, updatable, or deletable if the FDW
>         provides
>         ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
>         respectively,
>
>         If FDW provides DMLPushdown APIs and the DML are pushable to the
>         foreign
>         server, then FDW still needs ExecForeignInsert,
>         ExecForeignUpdate, or
>         ExecForeignDelete for the non-pushable DML operation.
>
>         What's your opinion ?

>     I agree that we should add this to the documentation, too.

I added docs to the IsForeignRelUpdatable documentation.  Also, a brief
introductory remark has been added at the beginning of the DML pushdown
APIs' documentation.

>     BTW, if I understand correctly, I think we should also modify
>     relation_is_updatabale() accordingly.  Am I right?

> Yep, we need to modify relation_is_updatable().

I thought I'd modify that function in the same way as
CheckValidResultRel(), but I noticed that we cannot do that, because we
don't have any information on whether each update is pushed down to the
remote server by PlanDMLPushdown, during relation_is_updatabale().  So,
I left that function as-is.  relation_is_updatabale() is just used for
display in the information_schema views, so ISTM that that function is
fine as-is.  (As for CheckValidResultRel(), I revised it so as to check
the presence of DML pushdown APIs after checking the existing APIs if
the given command will be pushed down.  The reason is because we assume
the presence of the existing APIs, anyway.)

I revised other docs and some comments, mostly for consistency.

Attached is an updated version of the patch, which has been created on
top of the updated version of the bugfix patch posted by Robert in [1]
(attached).

Best regards,
Etsuro Fujita

[1]
http://www.postgresql.org/message-id/CA+TgmoZ40j2uC5aC1NXu03oj4CrVOLkS15XX+PTFP-1U-8zR1Q@mail.gmail.com

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:


On Wed, Feb 3, 2016 at 3:31 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/01/28 15:20, Rushabh Lathia wrote:
On Thu, Jan 28, 2016 at 11:33 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:

    On 2016/01/27 21:23, Rushabh Lathia wrote:

        If I understood correctly, above documentation means, that if
        FDW have
        DMLPushdown APIs that is enough. But in reality thats not the
        case, we
        need  ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
        in case
        DML is not pushable.

        And here fact is DMLPushdown APIs are optional for FDW, so that
        if FDW
        don't have DMLPushdown APIs they can still very well perform the DML
        operations using ExecForeignInsert, ExecForeignUpdate, or
        ExecForeignDelete.

        So documentation should be like:

        If the IsForeignRelUpdatable pointer is set to NULL, foreign
        tables are
        assumed to be insertable, updatable, or deletable if the FDW
        provides
        ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
        respectively,

        If FDW provides DMLPushdown APIs and the DML are pushable to the
        foreign
        server, then FDW still needs ExecForeignInsert,
        ExecForeignUpdate, or
        ExecForeignDelete for the non-pushable DML operation.

        What's your opinion ?

    I agree that we should add this to the documentation, too.

I added docs to the IsForeignRelUpdatable documentation.  Also, a brief introductory remark has been added at the beginning of the DML pushdown APIs' documentation.

    BTW, if I understand correctly, I think we should also modify
    relation_is_updatabale() accordingly.  Am I right?

Yep, we need to modify relation_is_updatable().

I thought I'd modify that function in the same way as CheckValidResultRel(), but I noticed that we cannot do that, because we don't have any information on whether each update is pushed down to the remote server by PlanDMLPushdown, during relation_is_updatabale().  So, I left that function as-is.  relation_is_updatabale() is just used for display in the information_schema views, so ISTM that that function is fine as-is.  (As for CheckValidResultRel(), I revised it so as to check the presence of DML pushdown APIs after checking the existing APIs if the given command will be pushed down.  The reason is because we assume the presence of the existing APIs, anyway.)

I revised other docs and some comments, mostly for consistency.


I just started reviewing this and realized that patch is not getting applied
cleanly on latest source, it having some conflicts. Can you please upload
the correct version of patch.


Attached is an updated version of the patch, which has been created on top of the updated version of the bugfix patch posted by Robert in [1] (attached).



--
Rushabh Lathia
On Thu, Jan 28, 2016 at 7:36 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Attached is that version of the patch.
>
> I think that postgres_fdw might be able to insert a tableoid value in the
> returned slot in e.g., postgresExecForeignInsert if AFTER ROW Triggers or
> RETURNING expressions reference that value, but I didn't do anything about
> that.

Thanks.  I went with the earlier version, but split it into two commits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/02/04 19:44, Rushabh Lathia wrote:
> I just started reviewing this and realized that patch is not getting applied
> cleanly on latest source, it having some conflicts. Can you please upload
> the correct version of patch.

I rebased the patch to the latest HEAD.  Attached is a rebased version
of the patch.  You don't need to apply the patch
fdw-foreign-modify-rmh-v2.patch attached before.

Thanks for the review!

Best regards,
Etsuro Fujita

Attachment
On 2016/02/05 12:28, Robert Haas wrote:
> On Thu, Jan 28, 2016 at 7:36 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> Attached is that version of the patch.
>>
>> I think that postgres_fdw might be able to insert a tableoid value in the
>> returned slot in e.g., postgresExecForeignInsert if AFTER ROW Triggers or
>> RETURNING expressions reference that value, but I didn't do anything about
>> that.

> Thanks.  I went with the earlier version, but split it into two commits.

Thank you!

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:


On Wed, Feb 3, 2016 at 3:31 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/01/28 15:20, Rushabh Lathia wrote:
On Thu, Jan 28, 2016 at 11:33 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:

    On 2016/01/27 21:23, Rushabh Lathia wrote:

        If I understood correctly, above documentation means, that if
        FDW have
        DMLPushdown APIs that is enough. But in reality thats not the
        case, we
        need  ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
        in case
        DML is not pushable.

        And here fact is DMLPushdown APIs are optional for FDW, so that
        if FDW
        don't have DMLPushdown APIs they can still very well perform the DML
        operations using ExecForeignInsert, ExecForeignUpdate, or
        ExecForeignDelete.

        So documentation should be like:

        If the IsForeignRelUpdatable pointer is set to NULL, foreign
        tables are
        assumed to be insertable, updatable, or deletable if the FDW
        provides
        ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
        respectively,

        If FDW provides DMLPushdown APIs and the DML are pushable to the
        foreign
        server, then FDW still needs ExecForeignInsert,
        ExecForeignUpdate, or
        ExecForeignDelete for the non-pushable DML operation.

        What's your opinion ?

    I agree that we should add this to the documentation, too.

I added docs to the IsForeignRelUpdatable documentation.  Also, a brief introductory remark has been added at the beginning of the DML pushdown APIs' documentation.

    BTW, if I understand correctly, I think we should also modify
    relation_is_updatabale() accordingly.  Am I right?

Yep, we need to modify relation_is_updatable().

I thought I'd modify that function in the same way as CheckValidResultRel(), but I noticed that we cannot do that, because we don't have any information on whether each update is pushed down to the remote server by PlanDMLPushdown, during relation_is_updatabale(). 

Sorry I didn't get you here. Can't resultRelInfo->ri_FdwPushdown gives information update whether update is pushed down safe or not ? What my concern here is, lets say resultRelInfo->ri_FdwPushdown marked as true (PlanDMLPushdown return true), but later into CheckValidResultRel() it found out that missing BeginDMLPushdown, IterateDMLPushdown and EndDMLPushdown APIs and it will end up with error.

What I think CheckValidResultRel() should do is, if resultRelInfo->ri_FdwPushdown is true then check for the DMLPushdown API and if it doesn't find those API then check for traditional APIs (ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete). And when it doesn't find both then it should return an error.

I changed CheckValidResultRel(), where

1) Don't throw an error if resultRelInfo->ri_FdwPushdown is true and DMLPushdown APIs are missing as query can still perform operation with traditional ExecForeign APIs. So in this situation just marked resultRelInfo->ri_FdwPushdown to false.

(Wondering can we add the checks for DMLPushdown APIs into PlanDMLPushdown as additional check? Means PlanDMLPushdown should return true only if FDW provides the BeginDMLPushdown & IterateDMLPushdown & EndDMLPushdown APIs ? What you say ?)

2) Don't throw an error if resultRelInfo->ri_FdwPushdown is true and DMLPushdown APIs is present but ExecForeign APIs are missing.
3) Throw an error if resultRelInfo->ri_FdwPushdown is false and ExecForeign APIs are missing.

Attaching is the WIP patch here, do share your thought.
(need to apply on top of V6 patch)


So, I left that function as-is.  relation_is_updatabale() is just used for display in the information_schema views, so ISTM that that function is fine as-is.  (As for CheckValidResultRel(), I revised it so as to check the presence of DML pushdown APIs after checking the existing APIs if the given command will be pushed down.  The reason is because we assume the presence of the existing APIs, anyway.)


I revised other docs and some comments, mostly for consistency.

Attached is an updated version of the patch, which has been created on top of the updated version of the bugfix patch posted by Robert in [1] (attached).



--
Rushabh Lathia
Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:


On Fri, Feb 5, 2016 at 4:46 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


On Wed, Feb 3, 2016 at 3:31 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/01/28 15:20, Rushabh Lathia wrote:
On Thu, Jan 28, 2016 at 11:33 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:

    On 2016/01/27 21:23, Rushabh Lathia wrote:

        If I understood correctly, above documentation means, that if
        FDW have
        DMLPushdown APIs that is enough. But in reality thats not the
        case, we
        need  ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
        in case
        DML is not pushable.

        And here fact is DMLPushdown APIs are optional for FDW, so that
        if FDW
        don't have DMLPushdown APIs they can still very well perform the DML
        operations using ExecForeignInsert, ExecForeignUpdate, or
        ExecForeignDelete.

        So documentation should be like:

        If the IsForeignRelUpdatable pointer is set to NULL, foreign
        tables are
        assumed to be insertable, updatable, or deletable if the FDW
        provides
        ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
        respectively,

        If FDW provides DMLPushdown APIs and the DML are pushable to the
        foreign
        server, then FDW still needs ExecForeignInsert,
        ExecForeignUpdate, or
        ExecForeignDelete for the non-pushable DML operation.

        What's your opinion ?

    I agree that we should add this to the documentation, too.

I added docs to the IsForeignRelUpdatable documentation.  Also, a brief introductory remark has been added at the beginning of the DML pushdown APIs' documentation.

    BTW, if I understand correctly, I think we should also modify
    relation_is_updatabale() accordingly.  Am I right?

Yep, we need to modify relation_is_updatable().

I thought I'd modify that function in the same way as CheckValidResultRel(), but I noticed that we cannot do that, because we don't have any information on whether each update is pushed down to the remote server by PlanDMLPushdown, during relation_is_updatabale(). 

Sorry I didn't get you here. Can't resultRelInfo->ri_FdwPushdown gives information update whether update is pushed down safe or not ? What my concern here is, lets say resultRelInfo->ri_FdwPushdown marked as true (PlanDMLPushdown return true), but later into CheckValidResultRel() it found out that missing BeginDMLPushdown, IterateDMLPushdown and EndDMLPushdown APIs and it will end up with error.

What I think CheckValidResultRel() should do is, if resultRelInfo->ri_FdwPushdown is true then check for the DMLPushdown API and if it doesn't find those API then check for traditional APIs (ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete). And when it doesn't find both then it should return an error.

I changed CheckValidResultRel(), where

1) Don't throw an error if resultRelInfo->ri_FdwPushdown is true and DMLPushdown APIs are missing as query can still perform operation with traditional ExecForeign APIs. So in this situation just marked resultRelInfo->ri_FdwPushdown to false.

(Wondering can we add the checks for DMLPushdown APIs into PlanDMLPushdown as additional check? Means PlanDMLPushdown should return true only if FDW provides the BeginDMLPushdown & IterateDMLPushdown & EndDMLPushdown APIs ? What you say ?)


On another thought, we should not give responsibility to check for the APIs to the FDW. So may be we should call PlanDMLPushdown only if BeginDMLPushdown & IterateDMLPushdown & EndDMLPushdown APIs are present into FDW. That means prepare DMLPushdown plan only when all the required APIs are available with FDW. This will also reduce the changes into CheckValidResultRel().

Thanks Ashutosh Bapat for healthy discussion.

PFA patch.

 
2) Don't throw an error if resultRelInfo->ri_FdwPushdown is true and DMLPushdown APIs is present but ExecForeign APIs are missing.
3) Throw an error if resultRelInfo->ri_FdwPushdown is false and ExecForeign APIs are missing.

Attaching is the WIP patch here, do share your thought.
(need to apply on top of V6 patch)


So, I left that function as-is.  relation_is_updatabale() is just used for display in the information_schema views, so ISTM that that function is fine as-is.  (As for CheckValidResultRel(), I revised it so as to check the presence of DML pushdown APIs after checking the existing APIs if the given command will be pushed down.  The reason is because we assume the presence of the existing APIs, anyway.)


I revised other docs and some comments, mostly for consistency.

Attached is an updated version of the patch, which has been created on top of the updated version of the bugfix patch posted by Robert in [1] (attached).



--
Rushabh Lathia



--
Rushabh Lathia
Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:
Fujita-san, I am attaching update version of the patch, which added
the documentation update.

Once we finalize this, I feel good with the patch and think that we
could mark this as ready for committer.

On Fri, Feb 5, 2016 at 5:33 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


On Fri, Feb 5, 2016 at 4:46 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


On Wed, Feb 3, 2016 at 3:31 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/01/28 15:20, Rushabh Lathia wrote:
On Thu, Jan 28, 2016 at 11:33 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:

    On 2016/01/27 21:23, Rushabh Lathia wrote:

        If I understood correctly, above documentation means, that if
        FDW have
        DMLPushdown APIs that is enough. But in reality thats not the
        case, we
        need  ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
        in case
        DML is not pushable.

        And here fact is DMLPushdown APIs are optional for FDW, so that
        if FDW
        don't have DMLPushdown APIs they can still very well perform the DML
        operations using ExecForeignInsert, ExecForeignUpdate, or
        ExecForeignDelete.

        So documentation should be like:

        If the IsForeignRelUpdatable pointer is set to NULL, foreign
        tables are
        assumed to be insertable, updatable, or deletable if the FDW
        provides
        ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
        respectively,

        If FDW provides DMLPushdown APIs and the DML are pushable to the
        foreign
        server, then FDW still needs ExecForeignInsert,
        ExecForeignUpdate, or
        ExecForeignDelete for the non-pushable DML operation.

        What's your opinion ?

    I agree that we should add this to the documentation, too.

I added docs to the IsForeignRelUpdatable documentation.  Also, a brief introductory remark has been added at the beginning of the DML pushdown APIs' documentation.

    BTW, if I understand correctly, I think we should also modify
    relation_is_updatabale() accordingly.  Am I right?

Yep, we need to modify relation_is_updatable().

I thought I'd modify that function in the same way as CheckValidResultRel(), but I noticed that we cannot do that, because we don't have any information on whether each update is pushed down to the remote server by PlanDMLPushdown, during relation_is_updatabale(). 

Sorry I didn't get you here. Can't resultRelInfo->ri_FdwPushdown gives information update whether update is pushed down safe or not ? What my concern here is, lets say resultRelInfo->ri_FdwPushdown marked as true (PlanDMLPushdown return true), but later into CheckValidResultRel() it found out that missing BeginDMLPushdown, IterateDMLPushdown and EndDMLPushdown APIs and it will end up with error.

What I think CheckValidResultRel() should do is, if resultRelInfo->ri_FdwPushdown is true then check for the DMLPushdown API and if it doesn't find those API then check for traditional APIs (ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete). And when it doesn't find both then it should return an error.

I changed CheckValidResultRel(), where

1) Don't throw an error if resultRelInfo->ri_FdwPushdown is true and DMLPushdown APIs are missing as query can still perform operation with traditional ExecForeign APIs. So in this situation just marked resultRelInfo->ri_FdwPushdown to false.

(Wondering can we add the checks for DMLPushdown APIs into PlanDMLPushdown as additional check? Means PlanDMLPushdown should return true only if FDW provides the BeginDMLPushdown & IterateDMLPushdown & EndDMLPushdown APIs ? What you say ?)


On another thought, we should not give responsibility to check for the APIs to the FDW. So may be we should call PlanDMLPushdown only if BeginDMLPushdown & IterateDMLPushdown & EndDMLPushdown APIs are present into FDW. That means prepare DMLPushdown plan only when all the required APIs are available with FDW. This will also reduce the changes into CheckValidResultRel().

Thanks Ashutosh Bapat for healthy discussion.

PFA patch.

 
2) Don't throw an error if resultRelInfo->ri_FdwPushdown is true and DMLPushdown APIs is present but ExecForeign APIs are missing.
3) Throw an error if resultRelInfo->ri_FdwPushdown is false and ExecForeign APIs are missing.

Attaching is the WIP patch here, do share your thought.
(need to apply on top of V6 patch)


So, I left that function as-is.  relation_is_updatabale() is just used for display in the information_schema views, so ISTM that that function is fine as-is.  (As for CheckValidResultRel(), I revised it so as to check the presence of DML pushdown APIs after checking the existing APIs if the given command will be pushed down.  The reason is because we assume the presence of the existing APIs, anyway.)


I revised other docs and some comments, mostly for consistency.

Attached is an updated version of the patch, which has been created on top of the updated version of the bugfix patch posted by Robert in [1] (attached).



--
Rushabh Lathia



--
Rushabh Lathia



--
Rushabh Lathia
Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Thom Brown
Date:
On 10 February 2016 at 08:00, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Fujita-san, I am attaching update version of the patch, which added
> the documentation update.
>
> Once we finalize this, I feel good with the patch and think that we
> could mark this as ready for committer.

I find this wording a bit confusing:

"If the IsForeignRelUpdatable pointer is set to NULL, foreign tables
are assumed to be insertable, updatable, or deletable either the FDW
provides ExecForeignInsert,ExecForeignUpdate, or ExecForeignDelete
respectively or if the FDW optimizes a foreign table update on a
foreign table using PlanDMLPushdown (PlanDMLPushdown still needs to
provide BeginDMLPushdown, IterateDMLPushdown and EndDMLPushdown to
execute the optimized update.)."

This is a very long sentence, and the word "either" doesn't work here.

Also:

"When the query doesn't has the clause, the FDW must also increment
the row count for the ForeignScanState node in the EXPLAIN ANALYZE
case."

Should read "doesn't have"

The rest looks fine AFAICT.

Regards

Thom



Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Wed, Feb 10, 2016 at 3:00 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
> Fujita-san, I am attaching update version of the patch, which added
> the documentation update.
>
> Once we finalize this, I feel good with the patch and think that we
> could mark this as ready for committer.

It would be nice to hear from Tom and/or Stephen whether the changes
that have been made since they last commented on it.  I feel like the
design is reasonably OK, but I don't want to push this through if
they're still not happy with it.  One thing I'm not altogether keen on
is the use of "pushdown" or "dml pushdown" as a term of art; on the
other hand, I'm not sure what other term would be better.

Comments on specific points follow.

This seems to need minor rebasing in the wake of the join pushdown patch.

+       /* Likewise for ForeignScan that has pushed down INSERT/UPDATE/DELETE */
+       if (IsA(plan, ForeignScan) &&
+               (((ForeignScan *) plan)->operation == CMD_INSERT ||
+                ((ForeignScan *) plan)->operation == CMD_UPDATE ||
+                ((ForeignScan *) plan)->operation == CMD_DELETE))
+               return;

I don't really understand why this is a good idea.  Since target lists
are only displayed with EXPLAIN (VERBOSE), I don't really understand
what is to be gained by suppressing them in any case at all, and
therefore I don't understand why it's a good idea to do it here,
either.  If there is a good reason, maybe the comment should explain
what it is.

+       /* Check point 1 */
+       if (operation == CMD_INSERT)
+               return false;
+
+       /* Check point 2 */
+       if (nodeTag(subplan) != T_ForeignScan)
+               return false;
+
+       /* Check point 3 */
+       if (subplan->qual != NIL)
+               return false;
+
+       /* Check point 4 */
+       if (operation == CMD_UPDATE)

These comments are referring to something in the function header
further up, but you could instead just delete the stuff from the
header and mention the actual conditions here.  Also:

- If the first condition is supposed accept only CMD_UPDATE or
CMD_DELETE, say if (operation != CMD_UPDATE || operation !=
CMD_DELETE) rather than doing it this way.  Is-not-insert may in this
context be functionally equivalent to is-update-or-delete, but it's
better to write the comment and the code so that they exactly match
rather than kinda-match.

- For point 2, use IsA(subplan, ForiegnScan).

+                       /*
+                        * ignore subplan if the FDW pushes down the
command to the remote
+                        * server
+                        */

This comment states what the code does, instead of explaining why it
does it.  Please update it so that it explains why it does it.

+       List       *fdwPushdowns;       /* per-target-table FDW
pushdown flags */

Isn't a list of booleans an awfully inefficient representation?  How
about a Bitmapset?

+       /*
+        * Prepare remote-parameter expressions for evaluation.  (Note: in
+        * practice, we expect that all these expressions will be just
Params, so
+        * we could possibly do something more efficient than using the full
+        * expression-eval machinery for this.  But probably there
would be little
+        * benefit, and it'd require postgres_fdw to know more than is desirable
+        * about Param evaluation.)
+        */
+       dpstate->param_exprs = (List *)
+               ExecInitExpr((Expr *) fsplan->fdw_exprs,
+                                        (PlanState *) node);

This is an exact copy of an existing piece of code and its associated
comment.  A good bit of the surrounding code is copied, too.  You need
to refactor to avoid duplication, like by putting some of the code in
a new function that both postgresBeginForeignScan and
postgresBeginForeignModify can call.

execute_dml_stmt() has some of the same disease.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
Hi Rushabh and Thom,

Thanks for the review!

On 2016/02/10 22:37, Thom Brown wrote:
> On 10 February 2016 at 08:00, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
>> Fujita-san, I am attaching update version of the patch, which added
>> the documentation update.

Thanks for updating the patch!

>> Once we finalize this, I feel good with the patch and think that we
>> could mark this as ready for committer.

> I find this wording a bit confusing:
>
> "If the IsForeignRelUpdatable pointer is set to NULL, foreign tables
> are assumed to be insertable, updatable, or deletable either the FDW
> provides ExecForeignInsert,ExecForeignUpdate, or ExecForeignDelete
> respectively or if the FDW optimizes a foreign table update on a
> foreign table using PlanDMLPushdown (PlanDMLPushdown still needs to
> provide BeginDMLPushdown, IterateDMLPushdown and EndDMLPushdown to
> execute the optimized update.)."
>
> This is a very long sentence, and the word "either" doesn't work here.

Agreed.

As a result of our discussions, we reached a conclusion that the DML 
pushdown APIs should be provided together with existing APIs such as 
ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete, IIUC.  So, 
how about (1) leaving the description for the existing APIs as-is and 
(2) adding a new description for the DML pushdown APIs in parenthesis, 
like this?:
     If the <function>IsForeignRelUpdatable</> pointer is set to     <literal>NULL</>, foreign tables are assumed to be
insertable,
 
updatable,     or deletable if the FDW provides <function>ExecForeignInsert</>,     <function>ExecForeignUpdate</>, or
<function>ExecForeignDelete</>    respectively.     (If the FDW attempts to optimize a foreign table update, it still
 needs to provide PlanDMLPushdown, BeginDMLPushdown,     IterateDMLPushdown and EndDMLPushdown.)
 

Actually, if the FDW provides the DML pushdown APIs, (pushdown-able) 
foreign table updates can be done without ExecForeignInsert, 
ExecForeignUpdate or ExecForeignDelete.  So, the above docs are not 
necessarily correct.  But we don't recommend to do that without the 
existing APIs, so I'm not sure it's worth complicating the docs.

> Also:
>
> "When the query doesn't has the clause, the FDW must also increment
> the row count for the ForeignScanState node in the EXPLAIN ANALYZE
> case."
>
> Should read "doesn't have"

Will fix.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
Hi Robert,

Thanks for the review!

On 2016/02/11 5:43, Robert Haas wrote:
> On Wed, Feb 10, 2016 at 3:00 AM, Rushabh Lathia
> <rushabh.lathia@gmail.com> wrote:
>> Fujita-san, I am attaching update version of the patch, which added
>> the documentation update.
>>
>> Once we finalize this, I feel good with the patch and think that we
>> could mark this as ready for committer.

> It would be nice to hear from Tom and/or Stephen whether the changes
> that have been made since they last commented on it.  I feel like the
> design is reasonably OK, but I don't want to push this through if
> they're still not happy with it.  One thing I'm not altogether keen on
> is the use of "pushdown" or "dml pushdown" as a term of art; on the
> other hand, I'm not sure what other term would be better.

I'm open to that naming.  Proposals are welcome!

> Comments on specific points follow.
>
> This seems to need minor rebasing in the wake of the join pushdown patch.

Will do.

> +       /* Likewise for ForeignScan that has pushed down INSERT/UPDATE/DELETE */
> +       if (IsA(plan, ForeignScan) &&
> +               (((ForeignScan *) plan)->operation == CMD_INSERT ||
> +                ((ForeignScan *) plan)->operation == CMD_UPDATE ||
> +                ((ForeignScan *) plan)->operation == CMD_DELETE))
> +               return;
>
> I don't really understand why this is a good idea.  Since target lists
> are only displayed with EXPLAIN (VERBOSE), I don't really understand
> what is to be gained by suppressing them in any case at all, and
> therefore I don't understand why it's a good idea to do it here,
> either.  If there is a good reason, maybe the comment should explain
> what it is.

I think that displaying target lists would be confusing for users.  Here 
is an example:

EXPLAIN (verbose, costs off)
DELETE FROM rem1;                 -- can be pushed down                 QUERY PLAN
--------------------------------------------- Delete on public.rem1   ->  Foreign Delete on public.rem1         Output:
ctid        Remote SQL: DELETE FROM public.loc1
 
(4 rows)

Should we output the "Output" line?

> +       /* Check point 1 */
> +       if (operation == CMD_INSERT)
> +               return false;
> +
> +       /* Check point 2 */
> +       if (nodeTag(subplan) != T_ForeignScan)
> +               return false;
> +
> +       /* Check point 3 */
> +       if (subplan->qual != NIL)
> +               return false;
> +
> +       /* Check point 4 */
> +       if (operation == CMD_UPDATE)
>
> These comments are referring to something in the function header
> further up, but you could instead just delete the stuff from the
> header and mention the actual conditions here.  Also:

Will fix.

> - If the first condition is supposed accept only CMD_UPDATE or
> CMD_DELETE, say if (operation != CMD_UPDATE || operation !=
> CMD_DELETE) rather than doing it this way.  Is-not-insert may in this
> context be functionally equivalent to is-update-or-delete, but it's
> better to write the comment and the code so that they exactly match
> rather than kinda-match.
>
> - For point 2, use IsA(subplan, ForiegnScan).

Will fix.

> +                       /*
> +                        * ignore subplan if the FDW pushes down the
> command to the remote
> +                        * server
> +                        */
>
> This comment states what the code does, instead of explaining why it
> does it.  Please update it so that it explains why it does it.

Will update.

> +       List       *fdwPushdowns;       /* per-target-table FDW
> pushdown flags */
>
> Isn't a list of booleans an awfully inefficient representation?  How
> about a Bitmapset?

OK, will do.

> +       /*
> +        * Prepare remote-parameter expressions for evaluation.  (Note: in
> +        * practice, we expect that all these expressions will be just
> Params, so
> +        * we could possibly do something more efficient than using the full
> +        * expression-eval machinery for this.  But probably there
> would be little
> +        * benefit, and it'd require postgres_fdw to know more than is desirable
> +        * about Param evaluation.)
> +        */
> +       dpstate->param_exprs = (List *)
> +               ExecInitExpr((Expr *) fsplan->fdw_exprs,
> +                                        (PlanState *) node);
>
> This is an exact copy of an existing piece of code and its associated
> comment.  A good bit of the surrounding code is copied, too.  You need
> to refactor to avoid duplication, like by putting some of the code in
> a new function that both postgresBeginForeignScan and
> postgresBeginForeignModify can call.
>
> execute_dml_stmt() has some of the same disease.

Will do.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Fri, Feb 12, 2016 at 7:19 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> I think that displaying target lists would be confusing for users.  Here is
> an example:
>
> EXPLAIN (verbose, costs off)
> DELETE FROM rem1;                 -- can be pushed down
>                  QUERY PLAN
> ---------------------------------------------
>  Delete on public.rem1
>    ->  Foreign Delete on public.rem1
>          Output: ctid
>          Remote SQL: DELETE FROM public.loc1
> (4 rows)
>
> Should we output the "Output" line?

I see your point, but what if there's a RETURNING clause?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/02/12 21:46, Robert Haas wrote:
> On Fri, Feb 12, 2016 at 7:19 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> I think that displaying target lists would be confusing for users.  Here is
>> an example:
>>
>> EXPLAIN (verbose, costs off)
>> DELETE FROM rem1;                 -- can be pushed down
>>                   QUERY PLAN
>> ---------------------------------------------
>>   Delete on public.rem1
>>     ->  Foreign Delete on public.rem1
>>           Output: ctid
>>           Remote SQL: DELETE FROM public.loc1
>> (4 rows)
>>
>> Should we output the "Output" line?

> I see your point, but what if there's a RETURNING clause?

IMO I think that would be confusing in that case.  Here is an example:

EXPLAIN (verbose, costs off)
DELETE FROM rem1 RETURNING *;                 -- can be pushed down                          QUERY PLAN
-------------------------------------------------------------- Delete on public.rem1   Output: f1, f2   ->  Foreign
Deleteon public.rem1         Output: ctid         Remote SQL: DELETE FROM public.loc1 RETURNING f1, f2
 
(5 rows)

The Output line beneath the ForeignScan node doesn't match the RETURNING 
expressions in the remote query as the Output line beneath the 
ModifyTable node does, so I think displaying that would be confusing 
even in that case.

Another example:

postgres=# explain verbose update foo set a = a + 1 returning *;                                  QUERY PLAN
------------------------------------------------------------------------------ Update on public.foo
(cost=100.00..137.50rows=1000 width=10)   Output: a   ->  Foreign Update on public.foo  (cost=100.00..137.50 rows=1000

width=10)         Output: (a + 1), ctid         Remote SQL: UPDATE public.foo SET a = (a + 1) RETURNING a
(5 rows)

Same above.

As for case of INSERT .. RETURNING .., I guess there is not such a 
mismatch, but I'm not sure that displaying that is that helpful, 
honestly, so I'd vote for suppressing that in all cases, for consistency.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:


On Fri, Feb 12, 2016 at 5:40 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
Hi Rushabh and Thom,

Thanks for the review!

On 2016/02/10 22:37, Thom Brown wrote:
On 10 February 2016 at 08:00, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
Fujita-san, I am attaching update version of the patch, which added
the documentation update.

Thanks for updating the patch!

Once we finalize this, I feel good with the patch and think that we
could mark this as ready for committer.

I find this wording a bit confusing:

"If the IsForeignRelUpdatable pointer is set to NULL, foreign tables
are assumed to be insertable, updatable, or deletable either the FDW
provides ExecForeignInsert,ExecForeignUpdate, or ExecForeignDelete
respectively or if the FDW optimizes a foreign table update on a
foreign table using PlanDMLPushdown (PlanDMLPushdown still needs to
provide BeginDMLPushdown, IterateDMLPushdown and EndDMLPushdown to
execute the optimized update.)."

This is a very long sentence, and the word "either" doesn't work here.

Agreed.

As a result of our discussions, we reached a conclusion that the DML pushdown APIs should be provided together with existing APIs such as ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete, IIUC.  So, how about (1) leaving the description for the existing APIs as-is and (2) adding a new description for the DML pushdown APIs in parenthesis, like this?:

     If the <function>IsForeignRelUpdatable</> pointer is set to
     <literal>NULL</>, foreign tables are assumed to be insertable, updatable,
     or deletable if the FDW provides <function>ExecForeignInsert</>,
     <function>ExecForeignUpdate</>, or <function>ExecForeignDelete</>
     respectively.
     (If the FDW attempts to optimize a foreign table update, it still
     needs to provide PlanDMLPushdown, BeginDMLPushdown,
     IterateDMLPushdown and EndDMLPushdown.)

Actually, if the FDW provides the DML pushdown APIs, (pushdown-able) foreign table updates can be done without ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete.  So, the above docs are not necessarily correct.  But we don't recommend to do that without the existing APIs, so I'm not sure it's worth complicating the docs.

Adding a new description for DML pushdown API seems good idea. I would suggest to add that as separate paragraph rather then into brackets.
 
 


Also:

"When the query doesn't has the clause, the FDW must also increment
the row count for the ForeignScanState node in the EXPLAIN ANALYZE
case."

Should read "doesn't have"

Will fix.

Best regards,
Etsuro Fujita





--
Rushabh Lathia

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/02/15 15:20, Rushabh Lathia wrote:
> On Fri, Feb 12, 2016 at 5:40 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:

>     As a result of our discussions, we reached a conclusion that the DML
>     pushdown APIs should be provided together with existing APIs such as
>     ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete, IIUC.
>     So, how about (1) leaving the description for the existing APIs
>     as-is and (2) adding a new description for the DML pushdown APIs in
>     parenthesis, like this?:
>
>           If the <function>IsForeignRelUpdatable</> pointer is set to
>           <literal>NULL</>, foreign tables are assumed to be insertable,
>     updatable,
>           or deletable if the FDW provides <function>ExecForeignInsert</>,
>           <function>ExecForeignUpdate</>, or <function>ExecForeignDelete</>
>           respectively.
>           (If the FDW attempts to optimize a foreign table update, it still
>           needs to provide PlanDMLPushdown, BeginDMLPushdown,
>           IterateDMLPushdown and EndDMLPushdown.)
>
>     Actually, if the FDW provides the DML pushdown APIs, (pushdown-able)
>     foreign table updates can be done without ExecForeignInsert,
>     ExecForeignUpdate or ExecForeignDelete.  So, the above docs are not
>     necessarily correct.  But we don't recommend to do that without the
>     existing APIs, so I'm not sure it's worth complicating the docs.

> Adding a new description for DML pushdown API seems good idea. I would
> suggest to add that as separate paragraph rather then into brackets.

OK, will do.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/02/12 21:19, Etsuro Fujita wrote:
>> Comments on specific points follow.
>>
>> This seems to need minor rebasing in the wake of the join pushdown patch.

> Will do.

Done.

>> +       /* Likewise for ForeignScan that has pushed down
>> INSERT/UPDATE/DELETE */
>> +       if (IsA(plan, ForeignScan) &&
>> +               (((ForeignScan *) plan)->operation == CMD_INSERT ||
>> +                ((ForeignScan *) plan)->operation == CMD_UPDATE ||
>> +                ((ForeignScan *) plan)->operation == CMD_DELETE))
>> +               return;
>>
>> I don't really understand why this is a good idea.  Since target lists
>> are only displayed with EXPLAIN (VERBOSE), I don't really understand
>> what is to be gained by suppressing them in any case at all, and
>> therefore I don't understand why it's a good idea to do it here,
>> either.  If there is a good reason, maybe the comment should explain
>> what it is.

> I think that displaying target lists would be confusing for users.

There seems no objection from you (or anyone), I left that as proposed,
and added more comments.

>> +       /* Check point 1 */
>> +       if (operation == CMD_INSERT)
>> +               return false;
>> +
>> +       /* Check point 2 */
>> +       if (nodeTag(subplan) != T_ForeignScan)
>> +               return false;
>> +
>> +       /* Check point 3 */
>> +       if (subplan->qual != NIL)
>> +               return false;
>> +
>> +       /* Check point 4 */
>> +       if (operation == CMD_UPDATE)
>>
>> These comments are referring to something in the function header
>> further up, but you could instead just delete the stuff from the
>> header and mention the actual conditions here.

> Will fix.

Done.

The patch doesn't allow the postgres_fdw to push down an UPDATE/DELETE
on a foreign join, so I added one more condition here not to handle such
cases.  (I'm planning to propose a patch to handle such cases, in the
next CF.)

>> - If the first condition is supposed accept only CMD_UPDATE or
>> CMD_DELETE, say if (operation != CMD_UPDATE || operation !=
>> CMD_DELETE) rather than doing it this way.  Is-not-insert may in this
>> context be functionally equivalent to is-update-or-delete, but it's
>> better to write the comment and the code so that they exactly match
>> rather than kinda-match.
>>
>> - For point 2, use IsA(subplan, ForiegnScan).

> Will fix.

Done.

>> +                       /*
>> +                        * ignore subplan if the FDW pushes down the
>> command to the remote
>> +                        * server
>> +                        */
>>
>> This comment states what the code does, instead of explaining why it
>> does it.  Please update it so that it explains why it does it.

> Will update.

Done.

>> +       List       *fdwPushdowns;       /* per-target-table FDW
>> pushdown flags */
>>
>> Isn't a list of booleans an awfully inefficient representation?  How
>> about a Bitmapset?

> OK, will do.

Done.

>> +       /*
>> +        * Prepare remote-parameter expressions for evaluation.
>> (Note: in
>> +        * practice, we expect that all these expressions will be just
>> Params, so
>> +        * we could possibly do something more efficient than using
>> the full
>> +        * expression-eval machinery for this.  But probably there
>> would be little
>> +        * benefit, and it'd require postgres_fdw to know more than is
>> desirable
>> +        * about Param evaluation.)
>> +        */
>> +       dpstate->param_exprs = (List *)
>> +               ExecInitExpr((Expr *) fsplan->fdw_exprs,
>> +                                        (PlanState *) node);
>>
>> This is an exact copy of an existing piece of code and its associated
>> comment.  A good bit of the surrounding code is copied, too.  You need
>> to refactor to avoid duplication, like by putting some of the code in
>> a new function that both postgresBeginForeignScan and
>> postgresBeginForeignModify can call.
>>
>> execute_dml_stmt() has some of the same disease.

> Will do.

Done.

Other changes:

* I fixed docs as discussed before with Rushabh Lathia and Thom Brown.

* I keep Rushabh's code change that we call PlanDMLPushdown only when
all the required APIs are available with FDW, but for
CheckValidResultRel, I left the code as-is (no changes to that
function), to match the docs saying that the FDW needs to provide the
DML pushdown callback functions together with existing table-updating
functions such as ExecForeignInsert, ExecForeignUpdate and
ExecForeignDelete.

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:

I did another round of review for the latest patch and well as performed the sanity test, and
haven't found any functional issues. Found couple of issue, see in-line comments
for the same.

On Thu, Feb 18, 2016 at 3:15 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2016/02/12 21:19, Etsuro Fujita wrote:
Comments on specific points follow.

This seems to need minor rebasing in the wake of the join pushdown patch.

Will do.

Done.

+       /* Likewise for ForeignScan that has pushed down
INSERT/UPDATE/DELETE */
+       if (IsA(plan, ForeignScan) &&
+               (((ForeignScan *) plan)->operation == CMD_INSERT ||
+                ((ForeignScan *) plan)->operation == CMD_UPDATE ||
+                ((ForeignScan *) plan)->operation == CMD_DELETE))
+               return;

I don't really understand why this is a good idea.  Since target lists
are only displayed with EXPLAIN (VERBOSE), I don't really understand
what is to be gained by suppressing them in any case at all, and
therefore I don't understand why it's a good idea to do it here,
either.  If there is a good reason, maybe the comment should explain
what it is.

I think that displaying target lists would be confusing for users.

There seems no objection from you (or anyone), I left that as proposed, and added more comments.

+       /* Check point 1 */
+       if (operation == CMD_INSERT)
+               return false;
+
+       /* Check point 2 */
+       if (nodeTag(subplan) != T_ForeignScan)
+               return false;
+
+       /* Check point 3 */
+       if (subplan->qual != NIL)
+               return false;
+
+       /* Check point 4 */
+       if (operation == CMD_UPDATE)

These comments are referring to something in the function header
further up, but you could instead just delete the stuff from the
header and mention the actual conditions here.

Will fix.

Done.

The patch doesn't allow the postgres_fdw to push down an UPDATE/DELETE on a foreign join, so I added one more condition here not to handle such cases.  (I'm planning to propose a patch to handle such cases, in the next CF.)

I think we should place the checking foreign join condition before the
target columns, as foreign join condition is less costly then the target
columns.
 

- If the first condition is supposed accept only CMD_UPDATE or
CMD_DELETE, say if (operation != CMD_UPDATE || operation !=
CMD_DELETE) rather than doing it this way.  Is-not-insert may in this
context be functionally equivalent to is-update-or-delete, but it's
better to write the comment and the code so that they exactly match
rather than kinda-match.

- For point 2, use IsA(subplan, ForiegnScan).

Will fix.

Done.

+                       /*
+                        * ignore subplan if the FDW pushes down the
command to the remote
+                        * server
+                        */

This comment states what the code does, instead of explaining why it
does it.  Please update it so that it explains why it does it.

Will update.

Done.

+       List       *fdwPushdowns;       /* per-target-table FDW
pushdown flags */

Isn't a list of booleans an awfully inefficient representation?  How
about a Bitmapset?

OK, will do.

Done.

+       /*
+        * Prepare remote-parameter expressions for evaluation.
(Note: in
+        * practice, we expect that all these expressions will be just
Params, so
+        * we could possibly do something more efficient than using
the full
+        * expression-eval machinery for this.  But probably there
would be little
+        * benefit, and it'd require postgres_fdw to know more than is
desirable
+        * about Param evaluation.)
+        */
+       dpstate->param_exprs = (List *)
+               ExecInitExpr((Expr *) fsplan->fdw_exprs,
+                                        (PlanState *) node);

This is an exact copy of an existing piece of code and its associated
comment.  A good bit of the surrounding code is copied, too.  You need
to refactor to avoid duplication, like by putting some of the code in
a new function that both postgresBeginForeignScan and
postgresBeginForeignModify can call.

execute_dml_stmt() has some of the same disease.

Will do.

Done.

Other changes:

* I fixed docs as discussed before with Rushabh Lathia and Thom Brown.

* I keep Rushabh's code change that we call PlanDMLPushdown only when all the required APIs are available with FDW, but for CheckValidResultRel, I left the code as-is (no changes to that function), to match the docs saying that the FDW needs to provide the DML pushdown callback functions together with existing table-updating functions such as ExecForeignInsert, ExecForeignUpdate and ExecForeignDelete.


I think we should also update the CheckValidResultRel(), because even though ExecForeignInsert,
ExecForeignUpdate and ExecForeignDelete not present, FDW still can perform
UPDATE/DELETE/INSERT using DML Pushdown APIs. Lets take committer's view on this.

PFA update patch, which includes changes into postgresPlanDMLPushdown() to check for join
condition before target columns and also fixed couple of whitespace issues.



Regards
Rushabh Lathia
Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/02/22 20:13, Rushabh Lathia wrote:
> I did another round of review for the latest patch and well as performed
> the sanity test, and
> haven't found any functional issues. Found couple of issue, see in-line
> comments
> for the same.

Thanks!

> On Thu, Feb 18, 2016 at 3:15 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
>
>     On 2016/02/12 21:19, Etsuro Fujita wrote:

>             +       /* Check point 1 */
>             +       if (operation == CMD_INSERT)
>             +               return false;
>             +
>             +       /* Check point 2 */
>             +       if (nodeTag(subplan) != T_ForeignScan)
>             +               return false;
>             +
>             +       /* Check point 3 */
>             +       if (subplan->qual != NIL)
>             +               return false;
>             +
>             +       /* Check point 4 */
>             +       if (operation == CMD_UPDATE)
>
>             These comments are referring to something in the function header
>             further up, but you could instead just delete the stuff from the
>             header and mention the actual conditions here.

>         Will fix.

>     Done.
>
>     The patch doesn't allow the postgres_fdw to push down an
>     UPDATE/DELETE on a foreign join, so I added one more condition here
>     not to handle such cases.  (I'm planning to propose a patch to
>     handle such cases, in the next CF.)

> I think we should place the checking foreign join condition before the
> target columns, as foreign join condition is less costly then the target
> columns.

Agreed.

>     Other changes:

>     * I keep Rushabh's code change that we call PlanDMLPushdown only
>     when all the required APIs are available with FDW, but for
>     CheckValidResultRel, I left the code as-is (no changes to that
>     function), to match the docs saying that the FDW needs to provide
>     the DML pushdown callback functions together with existing
>     table-updating functions such as ExecForeignInsert,
>     ExecForeignUpdate and ExecForeignDelete.

> I think we should also update the CheckValidResultRel(), because even
> though ExecForeignInsert,
> ExecForeignUpdate and ExecForeignDelete not present, FDW still can perform
> UPDATE/DELETE/INSERT using DML Pushdown APIs. Lets take committer's view
> on this.

OK

> PFA update patch, which includes changes into postgresPlanDMLPushdown()
> to check for join
> condition before target columns and also fixed couple of whitespace issues.

Thanks again for updating the patch and fixing the issues!

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/02/22 20:13, Rushabh Lathia wrote:
> PFA update patch, which includes changes into postgresPlanDMLPushdown()
> to check for join
> condition before target columns and also fixed couple of whitespace issues.

For pushing down an UPDATE/DELETE on a foreign join to the remote, I
created a WIP patch on top of the latest version of the DML pushdown
patch.  Attached is the WIP patch.  I'd like to propose this as part of
(I'd like to discuss this as a separate patch, though):

https://commitfest.postgresql.org/9/453/

The patch doesn't correctly evaluate the values of system columns of
joined relations in RETURNING, other than ctid.  I'll fix that ASAP.

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Tue, Feb 23, 2016 at 1:18 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Thanks again for updating the patch and fixing the issues!

Some comments on the latest version.  I haven't reviewed the
postgres_fdw changes in detail here, so this is just about the core
changes.

I see that show_plan_tlist checks whether the operation is any of
CMD_INSERT, CMD_UPDATE, or CMD_DELETE.  But practically every place
else where a similar test is needed instead tests whether the
operation is *not* CMD_SELECT.  I think this place should do it that
way, too.

+               resultRelInfo = mtstate->resultRelInfo;               for (i = 0; i < nplans; i++)               {
                ExecAuxRowMark *aerm;
 

+                       /*
+                        * ignore subplan if the FDW pushes down the
command to the remote
+                        * server; the ModifyTable won't have anything
to do except for
+                        * evaluation of RETURNING expressions
+                        */
+                       if (resultRelInfo->ri_FdwPushdown)
+                       {
+                               resultRelInfo++;
+                               continue;
+                       }
+                       subplan = mtstate->mt_plans[i]->plan;                       aerm = ExecBuildAuxRowMark(erm,
subplan->targetlist);                      mtstate->mt_arowmarks[i] =
 
lappend(mtstate->mt_arowmarks[i], aerm);
+                       resultRelInfo++;               }


This kind of thing creates a hazard for future people maintaining this
code.  If somebody adds some code to this loop that needs to execute
even when resultRelInfo->ri_FdwPushdown is true, they have to add two
copies of it.  It's much better to move the three lines of logic that
execute only in the non-pushdown case inside of if
(!resultRelInfo->ri_FdwPushdown).

This issue crops up elsewhere as well.  The changes to
ExecModifyTable() have the same problem -- in that case, it might be
wise to move the code that's going to have to be indented yet another
level into a separate function.   That code also says this:

+                       /* No need to provide scan tuple to
ExecProcessReturning. */
+                       slot = ExecProcessReturning(resultRelInfo,
NULL, planSlot);

...but, uh, why not?  The comment says what the code does, but what it
should do is explain why it does it.

On a broader level, I'm not very happy with the naming this patch
uses.  Here's an example:

+    <para>
+     If an FDW supports optimizing foreign table updates, it still needs to
+     provide <function>PlanDMLPushdown</>, <function>BeginDMLPushdown</>,
+     <function>IterateDMLPushdown</> and <function>EndDMLPushdown</>
+     described below.
+    </para>

"Optimizing foreign table updates" is both inaccurate (since it
doesn't only optimize updates) and so vague as to be meaningless
unless you already know what it means.  The actual patch uses
terminology like "fdwPushdowns" which is just as bad.  We might push a
lot of things to the foreign side -- sorts, joins, aggregates, limits
-- and this is just one of them.  Worse, "pushdown" is itself
something of a term of art - will people who haven't been following
all of the mammoth, multi-hundred-email threads on this topic know
what that means?  I think we need some better terminology here.

The best thing that I can come up with offhand is "bulk modify".  So
we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify,
EndBulkModify, ExplainBulkModify.  Other suggestions welcome.   The
ResultRelInfo flag could be ri_usesFDWBulkModify.  The documentation
could say something like this:

Some inserts, updates, and deletes to foreign tables can be optimized
by implementing an alternate set of interfaces.  The ordinary
interfaces for inserts, updates, and deletes fetch rows from the
remote server and then modify those rows one at a time.  In some
cases, this row-by-row approach is necessary, but it can be
inefficient.  If it is possible for the foreign server to determine
which rows should be modified without actually retrieving them, and if
there are no local triggers which would affect the operation, then it
is possible to arrange things so that the entire operation is
performed on the remote server.  The interfaces described below make
this possible.

+     Begin executing a foreign table update directly on the remote server.

I think this should say "Prepare to execute a bulk modification
directly on the remote server".  It shouldn't actually begin the
execution phase.

+     End the table update and release resources.  It is normally not important

And I think this one should say "Clean up following a bulk
modification on the remote server".  It's not actually ending the
update; the iterate method already did that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/03/05 5:45, Robert Haas wrote:
> Some comments on the latest version.  I haven't reviewed the
> postgres_fdw changes in detail here, so this is just about the core
> changes.

Thank you for taking the time to review the patch!

> I see that show_plan_tlist checks whether the operation is any of
> CMD_INSERT, CMD_UPDATE, or CMD_DELETE.  But practically every place
> else where a similar test is needed instead tests whether the
> operation is *not* CMD_SELECT.  I think this place should do it that
> way, too.
>
> +               resultRelInfo = mtstate->resultRelInfo;
>                  for (i = 0; i < nplans; i++)
>                  {
>                          ExecAuxRowMark *aerm;
>
> +                       /*
> +                        * ignore subplan if the FDW pushes down the
> command to the remote
> +                        * server; the ModifyTable won't have anything
> to do except for
> +                        * evaluation of RETURNING expressions
> +                        */
> +                       if (resultRelInfo->ri_FdwPushdown)
> +                       {
> +                               resultRelInfo++;
> +                               continue;
> +                       }
> +
>                          subplan = mtstate->mt_plans[i]->plan;
>                          aerm = ExecBuildAuxRowMark(erm, subplan->targetlist);
>                          mtstate->mt_arowmarks[i] =
> lappend(mtstate->mt_arowmarks[i], aerm);
> +                       resultRelInfo++;
>                  }
>
>
> This kind of thing creates a hazard for future people maintaining this
> code.  If somebody adds some code to this loop that needs to execute
> even when resultRelInfo->ri_FdwPushdown is true, they have to add two
> copies of it.  It's much better to move the three lines of logic that
> execute only in the non-pushdown case inside of if
> (!resultRelInfo->ri_FdwPushdown).

Another option to avoid such a hazard would be to remove the two changes 
from ExecInitModifyTable and create ExecAuxRowMarks and junk filters 
even in the pushdown case.  I made the changes because we won't use 
ExecAuxRowMarks in that case since we don't need to do EvalPlanQual 
rechecks and because we won't use junk filters in that case since we do 
UPDATE/DELETE in the subplan.  But the creating cost is enough small, so 
simply removing the changes seems like a good idea.

> This issue crops up elsewhere as well.  The changes to
> ExecModifyTable() have the same problem -- in that case, it might be
> wise to move the code that's going to have to be indented yet another
> level into a separate function.   That code also says this:
>
> +                       /* No need to provide scan tuple to
> ExecProcessReturning. */
> +                       slot = ExecProcessReturning(resultRelInfo,
> NULL, planSlot);
>
> ...but, uh, why not?  The comment says what the code does, but what it
> should do is explain why it does it.

As documented in IterateDMLPushdown in fdwhandler.sgml, the reason for 
that is that in the pushdown case it's the IterateDMLPushdown's 
responsiblity to get actually inserted/updated/deleted tuples and make 
those tuples available to the ExecProcessReturning.  I'll add comments.

> On a broader level, I'm not very happy with the naming this patch
> uses.  Here's an example:
>
> +    <para>
> +     If an FDW supports optimizing foreign table updates, it still needs to
> +     provide <function>PlanDMLPushdown</>, <function>BeginDMLPushdown</>,
> +     <function>IterateDMLPushdown</> and <function>EndDMLPushdown</>
> +     described below.
> +    </para>
>
> "Optimizing foreign table updates" is both inaccurate (since it
> doesn't only optimize updates) and so vague as to be meaningless
> unless you already know what it means.  The actual patch uses
> terminology like "fdwPushdowns" which is just as bad.  We might push a
> lot of things to the foreign side -- sorts, joins, aggregates, limits
> -- and this is just one of them.  Worse, "pushdown" is itself
> something of a term of art - will people who haven't been following
> all of the mammoth, multi-hundred-email threads on this topic know
> what that means?  I think we need some better terminology here.
>
> The best thing that I can come up with offhand is "bulk modify".  So
> we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify,
> EndBulkModify, ExplainBulkModify.  Other suggestions welcome.   The
> ResultRelInfo flag could be ri_usesFDWBulkModify.

I'm not sure that "bulk modify" is best.  Yeah, this would improve the 
performance especially in the bulk-modification case, but would improve 
the performance even in the case where an UPDATE/DELETE modifies just a 
single row.  Let me explain using an example.  Without the patch, we 
have the following plan for an UPDATE on a foreign table that updates a 
single row:

postgres=# explain verbose update foo set a = a + 1 where a = 1;                                    QUERY PLAN
---------------------------------------------------------------------------------- Update on public.foo
(cost=100.00..101.05rows=1 width=14)   Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1   ->  Foreign Scan on
public.foo (cost=100.00..101.05 rows=1 width=14)         Output: (a + 1), b, ctid         Remote SQL: SELECT a, b, ctid
FROMpublic.foo WHERE ((a = 1)) 
 
FOR UPDATE
(5 rows)

The plan requires two queries, SELECT and UPDATE, to do the update. 
(Actually, the plan have additional overheads in creating a cursor for 
the SELECT and establishing a prepared statement for the UPDATE.)  But 
with the patch, we have:

postgres=# explain verbose update foo set a = a + 1 where a = 1;                                QUERY PLAN
--------------------------------------------------------------------------- Update on public.foo  (cost=100.00..101.05
rows=1width=14)   ->  Foreign Update on public.foo  (cost=100.00..101.05 rows=1 width=14)         Remote SQL: UPDATE
public.fooSET a = (a + 1) WHERE ((a = 1))
 
(3 rows)

The optimized plan requires just a single UPDATE query to do that!  So, 
even in the single-row-modification case the patch could improve the 
performance.

How about "Direct Modify"; PlanDirectModify, BeginDirectModify, 
IterateDirectModify, EndDirectModify, ExplainDirectModify, and 
ri_usesFDWDirectModify.

> The documentation
> could say something like this:
>
> Some inserts, updates, and deletes to foreign tables can be optimized
> by implementing an alternate set of interfaces.  The ordinary
> interfaces for inserts, updates, and deletes fetch rows from the
> remote server and then modify those rows one at a time.  In some
> cases, this row-by-row approach is necessary, but it can be
> inefficient.  If it is possible for the foreign server to determine
> which rows should be modified without actually retrieving them, and if
> there are no local triggers which would affect the operation, then it
> is possible to arrange things so that the entire operation is
> performed on the remote server.  The interfaces described below make
> this possible.

Will update as proposed.

> +     Begin executing a foreign table update directly on the remote server.
>
> I think this should say "Prepare to execute a bulk modification
> directly on the remote server".  It shouldn't actually begin the
> execution phase.
>
> +     End the table update and release resources.  It is normally not important
>
> And I think this one should say "Clean up following a bulk
> modification on the remote server".  It's not actually ending the
> update; the iterate method already did that.

OK, will fix.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Mon, Mar 7, 2016 at 7:53 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Another option to avoid such a hazard would be to remove the two changes
> from ExecInitModifyTable and create ExecAuxRowMarks and junk filters even in
> the pushdown case.  I made the changes because we won't use ExecAuxRowMarks
> in that case since we don't need to do EvalPlanQual rechecks and because we
> won't use junk filters in that case since we do UPDATE/DELETE in the
> subplan.  But the creating cost is enough small, so simply removing the
> changes seems like a good idea.

Sure, that works.

>> This issue crops up elsewhere as well.  The changes to
>> ExecModifyTable() have the same problem -- in that case, it might be
>> wise to move the code that's going to have to be indented yet another
>> level into a separate function.   That code also says this:
>>
>> +                       /* No need to provide scan tuple to
>> ExecProcessReturning. */
>> +                       slot = ExecProcessReturning(resultRelInfo,
>> NULL, planSlot);
>>
>> ...but, uh, why not?  The comment says what the code does, but what it
>> should do is explain why it does it.
>
> As documented in IterateDMLPushdown in fdwhandler.sgml, the reason for that
> is that in the pushdown case it's the IterateDMLPushdown's responsiblity to
> get actually inserted/updated/deleted tuples and make those tuples available
> to the ExecProcessReturning.  I'll add comments.

Comments are good things to have.  :-)

>> On a broader level, I'm not very happy with the naming this patch
>> uses.  Here's an example:
>>
>> +    <para>
>> +     If an FDW supports optimizing foreign table updates, it still needs
>> to
>> +     provide <function>PlanDMLPushdown</>, <function>BeginDMLPushdown</>,
>> +     <function>IterateDMLPushdown</> and <function>EndDMLPushdown</>
>> +     described below.
>> +    </para>
>>
>> "Optimizing foreign table updates" is both inaccurate (since it
>> doesn't only optimize updates) and so vague as to be meaningless
>> unless you already know what it means.  The actual patch uses
>> terminology like "fdwPushdowns" which is just as bad.  We might push a
>> lot of things to the foreign side -- sorts, joins, aggregates, limits
>> -- and this is just one of them.  Worse, "pushdown" is itself
>> something of a term of art - will people who haven't been following
>> all of the mammoth, multi-hundred-email threads on this topic know
>> what that means?  I think we need some better terminology here.
>>
>> The best thing that I can come up with offhand is "bulk modify".  So
>> we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify,
>> EndBulkModify, ExplainBulkModify.  Other suggestions welcome.   The
>> ResultRelInfo flag could be ri_usesFDWBulkModify.
>
> I'm not sure that "bulk modify" is best.  Yeah, this would improve the
> performance especially in the bulk-modification case, but would improve the
> performance even in the case where an UPDATE/DELETE modifies just a single
> row.  Let me explain using an example.  Without the patch, we have the
> following plan for an UPDATE on a foreign table that updates a single row:
>
> postgres=# explain verbose update foo set a = a + 1 where a = 1;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>    Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1
>    ->  Foreign Scan on public.foo  (cost=100.00..101.05 rows=1 width=14)
>          Output: (a + 1), b, ctid
>          Remote SQL: SELECT a, b, ctid FROM public.foo WHERE ((a = 1)) FOR
> UPDATE
> (5 rows)
>
> The plan requires two queries, SELECT and UPDATE, to do the update.
> (Actually, the plan have additional overheads in creating a cursor for the
> SELECT and establishing a prepared statement for the UPDATE.)  But with the
> patch, we have:
>
> postgres=# explain verbose update foo set a = a + 1 where a = 1;
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
>  Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>    ->  Foreign Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>          Remote SQL: UPDATE public.foo SET a = (a + 1) WHERE ((a = 1))
> (3 rows)
>
> The optimized plan requires just a single UPDATE query to do that!  So, even
> in the single-row-modification case the patch could improve the performance.
>
> How about "Direct Modify"; PlanDirectModify, BeginDirectModify,
> IterateDirectModify, EndDirectModify, ExplainDirectModify, and
> ri_usesFDWDirectModify.

Works for me!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/03/08 2:35, Robert Haas wrote:
> On Mon, Mar 7, 2016 at 7:53 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> Another option to avoid such a hazard would be to remove the two changes
>> from ExecInitModifyTable and create ExecAuxRowMarks and junk filters even in
>> the pushdown case.  I made the changes because we won't use ExecAuxRowMarks
>> in that case since we don't need to do EvalPlanQual rechecks and because we
>> won't use junk filters in that case since we do UPDATE/DELETE in the
>> subplan.  But the creating cost is enough small, so simply removing the
>> changes seems like a good idea.

> Sure, that works.

OK, I removed the changes.

>>> This issue crops up elsewhere as well.  The changes to
>>> ExecModifyTable() have the same problem -- in that case, it might be
>>> wise to move the code that's going to have to be indented yet another
>>> level into a separate function.   That code also says this:
>>>
>>> +                       /* No need to provide scan tuple to
>>> ExecProcessReturning. */
>>> +                       slot = ExecProcessReturning(resultRelInfo,
>>> NULL, planSlot);
>>>
>>> ...but, uh, why not?  The comment says what the code does, but what it
>>> should do is explain why it does it.
>>
>> As documented in IterateDMLPushdown in fdwhandler.sgml, the reason for that
>> is that in the pushdown case it's the IterateDMLPushdown's responsiblity to
>> get actually inserted/updated/deleted tuples and make those tuples available
>> to the ExecProcessReturning.  I'll add comments.

> Comments are good things to have.  :-)

Yeah, I added comments.

>>> On a broader level, I'm not very happy with the naming this patch
>>> uses.  Here's an example:
>>>
>>> +    <para>
>>> +     If an FDW supports optimizing foreign table updates, it still needs
>>> to
>>> +     provide <function>PlanDMLPushdown</>, <function>BeginDMLPushdown</>,
>>> +     <function>IterateDMLPushdown</> and <function>EndDMLPushdown</>
>>> +     described below.
>>> +    </para>
>>>
>>> "Optimizing foreign table updates" is both inaccurate (since it
>>> doesn't only optimize updates) and so vague as to be meaningless
>>> unless you already know what it means.  The actual patch uses
>>> terminology like "fdwPushdowns" which is just as bad.  We might push a
>>> lot of things to the foreign side -- sorts, joins, aggregates, limits
>>> -- and this is just one of them.  Worse, "pushdown" is itself
>>> something of a term of art - will people who haven't been following
>>> all of the mammoth, multi-hundred-email threads on this topic know
>>> what that means?  I think we need some better terminology here.
>>>
>>> The best thing that I can come up with offhand is "bulk modify".  So
>>> we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify,
>>> EndBulkModify, ExplainBulkModify.  Other suggestions welcome.   The
>>> ResultRelInfo flag could be ri_usesFDWBulkModify.
>>
>> I'm not sure that "bulk modify" is best.  Yeah, this would improve the
>> performance especially in the bulk-modification case, but would improve the
>> performance even in the case where an UPDATE/DELETE modifies just a single
>> row.  Let me explain using an example.  Without the patch, we have the
>> following plan for an UPDATE on a foreign table that updates a single row:
>>
>> postgres=# explain verbose update foo set a = a + 1 where a = 1;
>>                                      QUERY PLAN
>> ----------------------------------------------------------------------------------
>>   Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>>     Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1
>>     ->  Foreign Scan on public.foo  (cost=100.00..101.05 rows=1 width=14)
>>           Output: (a + 1), b, ctid
>>           Remote SQL: SELECT a, b, ctid FROM public.foo WHERE ((a = 1)) FOR
>> UPDATE
>> (5 rows)
>>
>> The plan requires two queries, SELECT and UPDATE, to do the update.
>> (Actually, the plan have additional overheads in creating a cursor for the
>> SELECT and establishing a prepared statement for the UPDATE.)  But with the
>> patch, we have:
>>
>> postgres=# explain verbose update foo set a = a + 1 where a = 1;
>>                                  QUERY PLAN
>> ---------------------------------------------------------------------------
>>   Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>>     ->  Foreign Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>>           Remote SQL: UPDATE public.foo SET a = (a + 1) WHERE ((a = 1))
>> (3 rows)
>>
>> The optimized plan requires just a single UPDATE query to do that!  So, even
>> in the single-row-modification case the patch could improve the performance.
>>
>> How about "Direct Modify"; PlanDirectModify, BeginDirectModify,
>> IterateDirectModify, EndDirectModify, ExplainDirectModify, and
>> ri_usesFDWDirectModify.

> Works for me!

Great!  I changed the naming.  I also updated docs as proposed by you in
a previous email, and rebased the patch to the latest HEAD.  Please find
attached an updated version of the patch.

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Wed, Mar 9, 2016 at 3:30 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Great!  I changed the naming.  I also updated docs as proposed by you in a
> previous email, and rebased the patch to the latest HEAD.  Please find
> attached an updated version of the patch.

Thanks.  The new naming looks much better (and better also than what I
suggested).

I see that you went and changed all of the places that tested for !=
CMD_SELECT and made them test for == CMD_INSERT || == CMD_UPDATE || ==
CMD_DELETE instead.  I think that's the wrong direction.  I think that
we should use the != CMD_SELECT version of the test everywhere.
That's a single test instead of three, so marginally faster, and maybe
marginally more future-proof.

I think deparsePushedDownUpdateSql should be renamed to use the new
"direct modify" naming, like deparseDirectUpdateSql, maybe.

I would suggest not numbering the tests in postgresPlanDirectModify.
That just becomes a nuisance to keep up to date as things change.

Overall, I think this is looking pretty good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Overall, I think this is looking pretty good.

I hadn't been paying any attention to this thread, but I wonder whether
this entire approach isn't considerably inferior to what we can do now
with the planner pathification patch.  To quote from the new docs:
  PlanForeignModify and the other callbacks described in Section 54.2.3  are designed around the assumption that the
foreignrelation will be  scanned in the usual way and then individual row updates will be driven  by a local
ModifyTableplan node. This approach is necessary for the  general case where an update requires reading local tables as
wellas  foreign tables. However, if the operation could be executed entirely by  the foreign server, the FDW could
generatea path representing that and  insert it into the UPPERREL_FINAL upper relation, where it would  compete against
theModifyTable approach. This approach could also be  used to implement remote SELECT FOR UPDATE, rather than using the
row locking callbacks described in Section 54.2.4. Keep in mind that a path  inserted into UPPERREL_FINAL is
responsiblefor implementing all  behavior of the query.
 

I don't really see anything that this patch does that wouldn't be better
done that way.  And I'd kind of like to get a working example of that type
of path insertion into 9.6, so that we can find out if we need any hooks
or callbacks that aren't there today.
        regards, tom lane



Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Wed, Mar 9, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Overall, I think this is looking pretty good.
>
> I hadn't been paying any attention to this thread, but I wonder whether
> this entire approach isn't considerably inferior to what we can do now
> with the planner pathification patch.  To quote from the new docs:
>
>    PlanForeignModify and the other callbacks described in Section 54.2.3
>    are designed around the assumption that the foreign relation will be
>    scanned in the usual way and then individual row updates will be driven
>    by a local ModifyTable plan node. This approach is necessary for the
>    general case where an update requires reading local tables as well as
>    foreign tables. However, if the operation could be executed entirely by
>    the foreign server, the FDW could generate a path representing that and
>    insert it into the UPPERREL_FINAL upper relation, where it would
>    compete against the ModifyTable approach. This approach could also be
>    used to implement remote SELECT FOR UPDATE, rather than using the row
>    locking callbacks described in Section 54.2.4. Keep in mind that a path
>    inserted into UPPERREL_FINAL is responsible for implementing all
>    behavior of the query.
>
> I don't really see anything that this patch does that wouldn't be better
> done that way.  And I'd kind of like to get a working example of that type
> of path insertion into 9.6, so that we can find out if we need any hooks
> or callbacks that aren't there today.

Well, I guess I'm not quite seeing it.  What do you have in mind?
Just taking a guess here, you might be thinking that instead of
something like this...
 Update on public.ft2   ->  Foreign Update on public.ft2

We could boil it all the way down to this:
   Foreign Update on public.ft2

But can you, really?  What if the UPDATE is targeting an inheritance
hierarchy containing some local tables and some remote tables?

Apologies if I've completely misunderstood what you have in mind here,
but you haven't really explained what you have in mind here.

IMHO, if you want to do something really cool with the new
pathification stuff, the thing to do would be pushing down aggregates
to foreign servers.  A lot of people would be really happy to see
SELECT count(*) FROM ft ship the count operation to the remote side!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Mar 9, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I hadn't been paying any attention to this thread, but I wonder whether
>> this entire approach isn't considerably inferior to what we can do now
>> with the planner pathification patch.  To quote from the new docs:

> Well, I guess I'm not quite seeing it.  What do you have in mind?
> Just taking a guess here, you might be thinking that instead of
> something like this...

>   Update on public.ft2
>     ->  Foreign Update on public.ft2

> We could boil it all the way down to this:

>     Foreign Update on public.ft2

Exactly.  I'm not claiming that that would be particularly faster, but
it would eliminate a whole bunch of seriously ugly stuff that's in
this patch.

> But can you, really?  What if the UPDATE is targeting an inheritance
> hierarchy containing some local tables and some remote tables?

I don't really see why that couldn't be made to work.  You'd end up
with ForeignUpdates on the remote tables and a ModifyTable handling
the rest.
        regards, tom lane



Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Wed, Mar 9, 2016 at 3:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Mar 9, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I hadn't been paying any attention to this thread, but I wonder whether
>>> this entire approach isn't considerably inferior to what we can do now
>>> with the planner pathification patch.  To quote from the new docs:
>
>> Well, I guess I'm not quite seeing it.  What do you have in mind?
>> Just taking a guess here, you might be thinking that instead of
>> something like this...
>
>>   Update on public.ft2
>>     ->  Foreign Update on public.ft2
>
>> We could boil it all the way down to this:
>
>>     Foreign Update on public.ft2
>
> Exactly.  I'm not claiming that that would be particularly faster, but
> it would eliminate a whole bunch of seriously ugly stuff that's in
> this patch.

Like what?

>> But can you, really?  What if the UPDATE is targeting an inheritance
>> hierarchy containing some local tables and some remote tables?
>
> I don't really see why that couldn't be made to work.  You'd end up
> with ForeignUpdates on the remote tables and a ModifyTable handling
> the rest.

I don't get it.  I mean, what's the parent node going to be?  If it's
the ModifyTable, then the plan tree looks the same as what this
already does.  If not, then what?

Just to recap the history, this patch was rewritten, criticized by
Stephen and you and rewritten to match your feedback.  Then, both of
you ignored it for a long time while I and others but a lot of work
into it.  Now, we're up against the deadline for this release and you
don't like it again.  Well, OK.  If you want to rewrite it into some
form you think is better, I'm cool with that.  But it would be really
unfair if this slipped out of this release after so much work has been
put into making it match the design ideas that *you* put forward just
because, at the eleventh hour, you now have new ones.  Personally, I
think we should just commit the darned thing and you can rewrite it
later if you want.  If you want to rewrite it now instead, I can live
with that, too.  But let's figure out how we're going to move this
forward.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/03/10 19:51, Robert Haas wrote:
> On Wed, Mar 9, 2016 at 3:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Just taking a guess here, you might be thinking that instead of
>>> something like this...
>>
>>>    Update on public.ft2
>>>      ->  Foreign Update on public.ft2
>>
>>> We could boil it all the way down to this:
>>
>>>      Foreign Update on public.ft2
>>
>> Exactly.  I'm not claiming that that would be particularly faster, but
>> it would eliminate a whole bunch of seriously ugly stuff that's in
>> this patch.

>>> But can you, really?  What if the UPDATE is targeting an inheritance
>>> hierarchy containing some local tables and some remote tables?
>>
>> I don't really see why that couldn't be made to work.  You'd end up
>> with ForeignUpdates on the remote tables and a ModifyTable handling
>> the rest.
>
> I don't get it.  I mean, what's the parent node going to be?  If it's
> the ModifyTable, then the plan tree looks the same as what this
> already does.  If not, then what?

I don't get it, either.  If the ForeignUpdates would be executed 
separately from the ModifyTable, how would the query's reported row 
count (ie, estate->es_processed) be handled?

> Just to recap the history, this patch was rewritten, criticized by
> Stephen and you and rewritten to match your feedback.  Then, both of
> you ignored it for a long time while I and others but a lot of work
> into it.  Now, we're up against the deadline for this release and you
> don't like it again.  Well, OK.  If you want to rewrite it into some
> form you think is better, I'm cool with that.  But it would be really
> unfair if this slipped out of this release after so much work has been
> put into making it match the design ideas that *you* put forward just
> because, at the eleventh hour, you now have new ones.  Personally, I
> think we should just commit the darned thing and you can rewrite it
> later if you want.  If you want to rewrite it now instead, I can live
> with that, too.  But let's figure out how we're going to move this
> forward.

I agree with Robert.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/03/10 2:56, Robert Haas wrote:
> I see that you went and changed all of the places that tested for !=
> CMD_SELECT and made them test for == CMD_INSERT || == CMD_UPDATE || ==
> CMD_DELETE instead.  I think that's the wrong direction.  I think that
> we should use the != CMD_SELECT version of the test everywhere.
> That's a single test instead of three, so marginally faster, and maybe
> marginally more future-proof.
>
> I think deparsePushedDownUpdateSql should be renamed to use the new
> "direct modify" naming, like deparseDirectUpdateSql, maybe.
>
> I would suggest not numbering the tests in postgresPlanDirectModify.
> That just becomes a nuisance to keep up to date as things change.

Agreed.  I updated the patch to address these comments.  Attached is the
updated version of the patch.

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Fri, Mar 18, 2016 at 5:15 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/03/10 2:56, Robert Haas wrote:
>> I see that you went and changed all of the places that tested for !=
>> CMD_SELECT and made them test for == CMD_INSERT || == CMD_UPDATE || ==
>> CMD_DELETE instead.  I think that's the wrong direction.  I think that
>> we should use the != CMD_SELECT version of the test everywhere.
>> That's a single test instead of three, so marginally faster, and maybe
>> marginally more future-proof.
>>
>> I think deparsePushedDownUpdateSql should be renamed to use the new
>> "direct modify" naming, like deparseDirectUpdateSql, maybe.
>>
>> I would suggest not numbering the tests in postgresPlanDirectModify.
>> That just becomes a nuisance to keep up to date as things change.
>
> Agreed.  I updated the patch to address these comments.  Attached is the
> updated version of the patch.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/03/19 3:30, Robert Haas wrote:
> On Fri, Mar 18, 2016 at 5:15 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> Attached is the updated version of the patch.

> Committed.

Thank you.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Thom Brown
Date:
On 22 March 2016 at 02:30, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/03/19 3:30, Robert Haas wrote:
>>
>> On Fri, Mar 18, 2016 at 5:15 AM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>
>>> Attached is the updated version of the patch.

I've noticed that you now can't cancel a query if there's DML pushdown
to a foreign server.  This previously worked while it was sending
individual statements as it interrupted and rolled it back.

Here's what the local server sees when trying to cancel:

# DELETE FROM remote.contacts;
^CCancel request sent
DELETE 5000000

This should probably be fixed.

Thom



Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Wed, Mar 23, 2016 at 10:05 PM, Thom Brown <thom@linux.com> wrote:
> I've noticed that you now can't cancel a query if there's DML pushdown
> to a foreign server.  This previously worked while it was sending
> individual statements as it interrupted and rolled it back.
>
> Here's what the local server sees when trying to cancel:
>
> # DELETE FROM remote.contacts;
> ^CCancel request sent
> DELETE 5000000
>
> This should probably be fixed.

Looking at what has been committed, execute_dml_stmt is using
PQexecParams, so we'd want to use an asynchronous call and loop on
PQgetResult with CHECK_FOR_INTERRUPTS() in it.
-- 
Michael



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/03/24 11:14, Michael Paquier wrote:
> On Wed, Mar 23, 2016 at 10:05 PM, Thom Brown <thom@linux.com> wrote:
>> I've noticed that you now can't cancel a query if there's DML pushdown
>> to a foreign server.  This previously worked while it was sending
>> individual statements as it interrupted and rolled it back.
>>
>> Here's what the local server sees when trying to cancel:
>>
>> # DELETE FROM remote.contacts;
>> ^CCancel request sent
>> DELETE 5000000
>>
>> This should probably be fixed.

> Looking at what has been committed, execute_dml_stmt is using
> PQexecParams, so we'd want to use an asynchronous call and loop on
> PQgetResult with CHECK_FOR_INTERRUPTS() in it.

Will fix.

Thanks for the report, Thom!  Thanks for the advice, Michael!

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Thu, Mar 24, 2016 at 1:02 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/03/24 11:14, Michael Paquier wrote:
>>
>> On Wed, Mar 23, 2016 at 10:05 PM, Thom Brown <thom@linux.com> wrote:
>>>
>>> I've noticed that you now can't cancel a query if there's DML pushdown
>>> to a foreign server.  This previously worked while it was sending
>>> individual statements as it interrupted and rolled it back.
>>>
>>> Here's what the local server sees when trying to cancel:
>>>
>>> # DELETE FROM remote.contacts;
>>> ^CCancel request sent
>>> DELETE 5000000
>>>
>>> This should probably be fixed.
>
>
>> Looking at what has been committed, execute_dml_stmt is using
>> PQexecParams, so we'd want to use an asynchronous call and loop on
>> PQgetResult with CHECK_FOR_INTERRUPTS() in it.
>
>
> Will fix.
>
> Thanks for the report, Thom!  Thanks for the advice, Michael!

I am adding that to the list of open items of 9.6 to not forget about it.
-- 
Michael



Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Fri, Mar 25, 2016 at 3:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Mar 24, 2016 at 1:02 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> Thanks for the report, Thom!  Thanks for the advice, Michael!
>
> I am adding that to the list of open items of 9.6 to not forget about it.

My bad. Thom did it already :p
-- 
Michael



Re: Optimization for updating foreign tables in Postgres FDW

From
Noah Misch
Date:
On Thu, Mar 24, 2016 at 01:02:57PM +0900, Etsuro Fujita wrote:
> On 2016/03/24 11:14, Michael Paquier wrote:
> >On Wed, Mar 23, 2016 at 10:05 PM, Thom Brown <thom@linux.com> wrote:
> >>I've noticed that you now can't cancel a query if there's DML pushdown
> >>to a foreign server.  This previously worked while it was sending
> >>individual statements as it interrupted and rolled it back.
> >>
> >>Here's what the local server sees when trying to cancel:
> >>
> >># DELETE FROM remote.contacts;
> >>^CCancel request sent
> >>DELETE 5000000
> >>
> >>This should probably be fixed.
> 
> >Looking at what has been committed, execute_dml_stmt is using
> >PQexecParams, so we'd want to use an asynchronous call and loop on
> >PQgetResult with CHECK_FOR_INTERRUPTS() in it.
> 
> Will fix.
> 
> Thanks for the report, Thom!  Thanks for the advice, Michael!

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days of this
message.  Thanks.



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/03/31 14:07, Noah Misch wrote:
> On Thu, Mar 24, 2016 at 01:02:57PM +0900, Etsuro Fujita wrote:
>> On 2016/03/24 11:14, Michael Paquier wrote:
>>> On Wed, Mar 23, 2016 at 10:05 PM, Thom Brown <thom@linux.com> wrote:
>>>> I've noticed that you now can't cancel a query if there's DML pushdown
>>>> to a foreign server.  This previously worked while it was sending
>>>> individual statements as it interrupted and rolled it back.
>>>>
>>>> Here's what the local server sees when trying to cancel:
>>>>
>>>> # DELETE FROM remote.contacts;
>>>> ^CCancel request sent
>>>> DELETE 5000000
>>>>
>>>> This should probably be fixed.

>>> Looking at what has been committed, execute_dml_stmt is using
>>> PQexecParams, so we'd want to use an asynchronous call and loop on
>>> PQgetResult with CHECK_FOR_INTERRUPTS() in it.

>> Will fix.

> [This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
> since you committed the patch believed to have created it, you own this open
> item.  If that responsibility lies elsewhere, please let us know whose
> responsibility it is to fix this.  Since new open items may be discovered at
> any time and I want to plan to have them all fixed well in advance of the ship
> date, I will appreciate your efforts toward speedy resolution.  Please
> present, within 72 hours, a plan to fix the defect within seven days of this
> message.  Thanks.

Sorry for not having taken any action.  I've been busy with another task 
lately, but I started working on this.  I plan to post a patch early 
next week.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/03/31 16:38, Etsuro Fujita wrote:
> On 2016/03/31 14:07, Noah Misch wrote:
>> On Thu, Mar 24, 2016 at 01:02:57PM +0900, Etsuro Fujita wrote:
>>> On 2016/03/24 11:14, Michael Paquier wrote:
>>>> On Wed, Mar 23, 2016 at 10:05 PM, Thom Brown <thom@linux.com> wrote:
>>>>> I've noticed that you now can't cancel a query if there's DML pushdown
>>>>> to a foreign server.  This previously worked while it was sending
>>>>> individual statements as it interrupted and rolled it back.
>>>>>
>>>>> Here's what the local server sees when trying to cancel:
>>>>>
>>>>> # DELETE FROM remote.contacts;
>>>>> ^CCancel request sent
>>>>> DELETE 5000000
>>>>>
>>>>> This should probably be fixed.

>>>> Looking at what has been committed, execute_dml_stmt is using
>>>> PQexecParams, so we'd want to use an asynchronous call and loop on
>>>> PQgetResult with CHECK_FOR_INTERRUPTS() in it.

>>> Will fix.

>> [This is a generic notification.]

> Sorry for not having taken any action.  I've been busy with another task
> lately, but I started working on this.  I plan to post a patch early
> next week.

Here is a patch to fix this issue.  As proposed by Michael, I modified
execute_dml_stmt so that it uses PQsendQueryParams, not PQexecParams.
Any comments are welcome.

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Mon, Apr 4, 2016 at 7:49 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/03/31 16:38, Etsuro Fujita wrote:
>>
>> On 2016/03/31 14:07, Noah Misch wrote:
>>>
>>> On Thu, Mar 24, 2016 at 01:02:57PM +0900, Etsuro Fujita wrote:
>>>>
>>>> On 2016/03/24 11:14, Michael Paquier wrote:
>>>>>
>>>>> On Wed, Mar 23, 2016 at 10:05 PM, Thom Brown <thom@linux.com> wrote:
>>>>>>
>>>>>> I've noticed that you now can't cancel a query if there's DML pushdown
>>>>>> to a foreign server.  This previously worked while it was sending
>>>>>> individual statements as it interrupted and rolled it back.
>>>>>>
>>>>>> Here's what the local server sees when trying to cancel:
>>>>>>
>>>>>> # DELETE FROM remote.contacts;
>>>>>> ^CCancel request sent
>>>>>> DELETE 5000000
>>>>>>
>>>>>> This should probably be fixed.
>
>
>>>>> Looking at what has been committed, execute_dml_stmt is using
>>>>> PQexecParams, so we'd want to use an asynchronous call and loop on
>>>>> PQgetResult with CHECK_FOR_INTERRUPTS() in it.
>
>
>>>> Will fix.
>
>
>>> [This is a generic notification.]
>
>
>> Sorry for not having taken any action.  I've been busy with another task
>> lately, but I started working on this.  I plan to post a patch early
>> next week.
>
>
> Here is a patch to fix this issue.  As proposed by Michael, I modified
> execute_dml_stmt so that it uses PQsendQueryParams, not PQexecParams. Any
> comments are welcome.

+  * This is based on pqSocketCheck.
+  */
+ bool
+ CheckSocket(PGconn *conn)
+ {
+     int            ret;
+
+     Assert(conn != NULL);
Instead of copying again pqSocketQuery, which is as well copied in
libpqwalreceiver.c, wouldn't it be better to use WaitLatchOrSocket
with the socket returned by PQsocket?
-- 
Michael



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/04 20:35, Michael Paquier wrote:
> On Mon, Apr 4, 2016 at 7:49 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> Here is a patch to fix this issue.  As proposed by Michael, I modified
>> execute_dml_stmt so that it uses PQsendQueryParams, not PQexecParams. Any
>> comments are welcome.

> +  * This is based on pqSocketCheck.
> +  */
> + bool
> + CheckSocket(PGconn *conn)
> + {
> +     int            ret;
> +
> +     Assert(conn != NULL);
> Instead of copying again pqSocketQuery, which is as well copied in
> libpqwalreceiver.c, wouldn't it be better to use WaitLatchOrSocket
> with the socket returned by PQsocket?

Will check.  Thanks for the comment!

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Noah Misch
Date:
On Tue, Apr 05, 2016 at 03:22:03PM +0900, Etsuro Fujita wrote:
> On 2016/04/04 20:35, Michael Paquier wrote:
> >On Mon, Apr 4, 2016 at 7:49 PM, Etsuro Fujita
> ><fujita.etsuro@lab.ntt.co.jp> wrote:
> >>Here is a patch to fix this issue.  As proposed by Michael, I modified
> >>execute_dml_stmt so that it uses PQsendQueryParams, not PQexecParams. Any
> >>comments are welcome.
> 
> >+  * This is based on pqSocketCheck.
> >+  */
> >+ bool
> >+ CheckSocket(PGconn *conn)
> >+ {
> >+     int            ret;
> >+
> >+     Assert(conn != NULL);
> >Instead of copying again pqSocketQuery, which is as well copied in
> >libpqwalreceiver.c, wouldn't it be better to use WaitLatchOrSocket
> >with the socket returned by PQsocket?
> 
> Will check.  Thanks for the comment!

What do you think?  This open item's seven-day deadline has passed.  It would
help keep things moving to know whether you consider your latest patch optimal
or whether you wish to change it the way Michael described.



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/08 13:42, Noah Misch wrote:
> On Tue, Apr 05, 2016 at 03:22:03PM +0900, Etsuro Fujita wrote:
>> On 2016/04/04 20:35, Michael Paquier wrote:
>>> On Mon, Apr 4, 2016 at 7:49 PM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>> Here is a patch to fix this issue.  As proposed by Michael, I modified
>>>> execute_dml_stmt so that it uses PQsendQueryParams, not PQexecParams. Any
>>>> comments are welcome.

>>> +  * This is based on pqSocketCheck.
>>> +  */
>>> + bool
>>> + CheckSocket(PGconn *conn)
>>> + {
>>> +     int            ret;
>>> +
>>> +     Assert(conn != NULL);
>>> Instead of copying again pqSocketQuery, which is as well copied in
>>> libpqwalreceiver.c, wouldn't it be better to use WaitLatchOrSocket
>>> with the socket returned by PQsocket?

>> Will check.  Thanks for the comment!

> What do you think?  This open item's seven-day deadline has passed.  It would
> help keep things moving to know whether you consider your latest patch optimal
> or whether you wish to change it the way Michael described.

I wish to change it that way because it not only avoids the duplicate
but fixes a bug in the previous patch that I overlooked that there is a
race condition if a signal arrives just before entering the CheckSocket.

Attached is an updated version of the patch.

Sorry for the delay.

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Fri, Apr 8, 2016 at 3:05 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>> What do you think?  This open item's seven-day deadline has passed.  It
>> would
>> help keep things moving to know whether you consider your latest patch
>> optimal
>> or whether you wish to change it the way Michael described.
>
> I wish to change it that way because it not only avoids the duplicate but
> fixes a bug in the previous patch that I overlooked that there is a race
> condition if a signal arrives just before entering the CheckSocket.
>
> Attached is an updated version of the patch.

The comment just before the second hunk in the patch says:
      * We don't use a PG_TRY block here, so be careful not to throw error      * without releasing the PGresult.

But the patch adds a whole bunch of new things there that seem like
they can error out, like CHECK_FOR_INTERRUPTS(), for example.  Isn't
that a problem?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Rushabh Lathia
Date:


On Fri, Apr 8, 2016 at 6:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Apr 8, 2016 at 3:05 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>> What do you think?  This open item's seven-day deadline has passed.  It
>> would
>> help keep things moving to know whether you consider your latest patch
>> optimal
>> or whether you wish to change it the way Michael described.
>
> I wish to change it that way because it not only avoids the duplicate but
> fixes a bug in the previous patch that I overlooked that there is a race
> condition if a signal arrives just before entering the CheckSocket.
>
> Attached is an updated version of the patch.

The comment just before the second hunk in the patch says:

       * We don't use a PG_TRY block here, so be careful not to throw error
       * without releasing the PGresult.

But the patch adds a whole bunch of new things there that seem like
they can error out, like CHECK_FOR_INTERRUPTS(), for example.  Isn't
that a problem?

Basically we fetching the PGresult, after the newly added hunk, so there
should not be any problem.

But yes comment is definitely at wrong place.

PFA patch with correction.

 
 
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Rushabh Lathia
Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/08 22:21, Rushabh Lathia wrote:
> On Fri, Apr 8, 2016 at 6:28 PM, Robert Haas <robertmhaas@gmail.com
> <mailto:robertmhaas@gmail.com>> wrote:

>     The comment just before the second hunk in the patch says:
>
>             * We don't use a PG_TRY block here, so be careful not to
>     throw error
>             * without releasing the PGresult.
>
>     But the patch adds a whole bunch of new things there that seem like
>     they can error out, like CHECK_FOR_INTERRUPTS(), for example.  Isn't
>     that a problem?

> Basically we fetching the PGresult, after the newly added hunk, so there
> should not be any problem.
>
> But yes comment is definitely at wrong place.
>
> PFA patch with correction.

I agree with Rushabh.  Thanks for updating the patch!

Another thing I'd like to propose to revise the patch is to call
pgfdw_report_error in the newly added hunk, with the clear argument set
to *false*.  The PGresult argument is NULL there, so no need to release
the PGresult.

Attached is an updated patch.

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Mon, Apr 11, 2016 at 11:30 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> I agree with Rushabh.  Thanks for updating the patch!

Yes, not using a PG_TRY/CATCH block is better. We are not doing
anything that need to clean up PGresult in case of an unplanned
failure.

> Another thing I'd like to propose to revise the patch is to call
> pgfdw_report_error in the newly added hunk, with the clear argument set to
> *false*.  The PGresult argument is NULL there, so no need to release the
> PGresult.

Sure, this saves a couple of cycles. PQclear is anyway smart enough to
handle NULL results correctly, but this way is better.

> Attached is an updated patch.

+       if (wc & WL_SOCKET_READABLE)
+       {
+           if (!PQconsumeInput(dmstate->conn))
+               pgfdw_report_error(ERROR, NULL, dmstate->conn, false,
+                                  dmstate->query);
+       }
OK, so here we would fail with ERRCODE_CONNECTION_FAILURE in the case
where the socket is readable but no data can be consumed. I guess it
makes sense.

+                       if ((cancel = PQgetCancel(entry->conn)))
+                       {
+                           PQcancel(cancel, errbuf, sizeof(errbuf));
+                           PQfreeCancel(cancel);
+                       }
Wouldn't it be better to issue a WARNING here if PQcancel does not succeed?
-- 
Michael



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/11 12:30, Michael Paquier wrote:
> +                       if ((cancel = PQgetCancel(entry->conn)))
> +                       {
> +                           PQcancel(cancel, errbuf, sizeof(errbuf));
> +                           PQfreeCancel(cancel);
> +                       }
> Wouldn't it be better to issue a WARNING here if PQcancel does not succeed?

Seems like a good idea.  Attached is an updated version of the patch.

Thanks for the review!

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Mon, Apr 11, 2016 at 5:16 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/04/11 12:30, Michael Paquier wrote:
>>
>> +                       if ((cancel = PQgetCancel(entry->conn)))
>> +                       {
>> +                           PQcancel(cancel, errbuf, sizeof(errbuf));
>> +                           PQfreeCancel(cancel);
>> +                       }
>> Wouldn't it be better to issue a WARNING here if PQcancel does not
>> succeed?
>
> Seems like a good idea.  Attached is an updated version of the patch.

Thanks for the new version. The patch looks good to me.
-- 
Michael



Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Mon, Apr 11, 2016 at 9:45 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Apr 11, 2016 at 5:16 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2016/04/11 12:30, Michael Paquier wrote:
>>>
>>> +                       if ((cancel = PQgetCancel(entry->conn)))
>>> +                       {
>>> +                           PQcancel(cancel, errbuf, sizeof(errbuf));
>>> +                           PQfreeCancel(cancel);
>>> +                       }
>>> Wouldn't it be better to issue a WARNING here if PQcancel does not
>>> succeed?
>>
>> Seems like a good idea.  Attached is an updated version of the patch.
>
> Thanks for the new version. The patch looks good to me.

I'm wondering why we are fixing this specific case and not any of the
other calls to PQexec() or PQexecParams() in postgres_fdw.c.

I mean, many of those instances are cases where the query isn't likely
to run for very long, but certainly "FETCH %d FROM c%u" is in theory
just as bad as the new code introduced in 9.6.  In practice, it
probably isn't, because we're probably only fetching 50 rows at a time
rather than potentially a lot more, but if we're fixing this code up
to be interrupt-safe, maybe we should fix it all at the same time.
Even for the short-running queries like CLOSE and DEALLOCATE, it seems
possible that there could be a network-related hang which you might
want to interrupt.

How about we encapsulate the while (PQisBusy(...)) loop into a new
function pgfdw_get_result(), which can be called after first calling
PQsendQueryParams()?  So then this code will say dmstate->result =
pgfdw_get_result(dmstate->conn).  And we can do something similar for
the other call to PQexecParams() in create_cursor().  Then let's also
add something like pgfdw_exec_query() which calls PQsendQuery() and
then pgfdw_get_result, and use that to replace all of the existing
calls to PQexec().

Then all the SQL postgres_fdw executes would be interruptible, not
just the new stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/13 3:14, Robert Haas wrote:
> I'm wondering why we are fixing this specific case and not any of the
> other calls to PQexec() or PQexecParams() in postgres_fdw.c.
>
> I mean, many of those instances are cases where the query isn't likely
> to run for very long, but certainly "FETCH %d FROM c%u" is in theory
> just as bad as the new code introduced in 9.6.  In practice, it
> probably isn't, because we're probably only fetching 50 rows at a time
> rather than potentially a lot more, but if we're fixing this code up
> to be interrupt-safe, maybe we should fix it all at the same time.
> Even for the short-running queries like CLOSE and DEALLOCATE, it seems
> possible that there could be a network-related hang which you might
> want to interrupt.

Actually, I was wondering, too, but I didn't propose that because, as 
far as I know, there are no reports from the field.  But I agree with you.

> How about we encapsulate the while (PQisBusy(...)) loop into a new
> function pgfdw_get_result(), which can be called after first calling
> PQsendQueryParams()?  So then this code will say dmstate->result =
> pgfdw_get_result(dmstate->conn).  And we can do something similar for
> the other call to PQexecParams() in create_cursor().  Then let's also
> add something like pgfdw_exec_query() which calls PQsendQuery() and
> then pgfdw_get_result, and use that to replace all of the existing
> calls to PQexec().
>
> Then all the SQL postgres_fdw executes would be interruptible, not
> just the new stuff.

Seems like a good idea.  Will do.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Wed, Apr 13, 2016 at 11:24 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/04/13 3:14, Robert Haas wrote:
>>
>> I'm wondering why we are fixing this specific case and not any of the
>> other calls to PQexec() or PQexecParams() in postgres_fdw.c.
>>
>> I mean, many of those instances are cases where the query isn't likely
>> to run for very long, but certainly "FETCH %d FROM c%u" is in theory
>> just as bad as the new code introduced in 9.6.  In practice, it
>> probably isn't, because we're probably only fetching 50 rows at a time
>> rather than potentially a lot more, but if we're fixing this code up
>> to be interrupt-safe, maybe we should fix it all at the same time.
>> Even for the short-running queries like CLOSE and DEALLOCATE, it seems
>> possible that there could be a network-related hang which you might
>> want to interrupt.
>
>
> Actually, I was wondering, too, but I didn't propose that because, as far as
> I know, there are no reports from the field.  But I agree with you.

For something that is HEAD-only that's a great idea to put everything
into the same flag like that.
-- 
Michael



Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Tue, Apr 12, 2016 at 10:24 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>> How about we encapsulate the while (PQisBusy(...)) loop into a new
>> function pgfdw_get_result(), which can be called after first calling
>> PQsendQueryParams()?  So then this code will say dmstate->result =
>> pgfdw_get_result(dmstate->conn).  And we can do something similar for
>> the other call to PQexecParams() in create_cursor().  Then let's also
>> add something like pgfdw_exec_query() which calls PQsendQuery() and
>> then pgfdw_get_result, and use that to replace all of the existing
>> calls to PQexec().
>>
>> Then all the SQL postgres_fdw executes would be interruptible, not
>> just the new stuff.
>
> Seems like a good idea.  Will do.

When will you do this?  We are on a bit of a time budget here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Wed, Apr 13, 2016 at 9:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Apr 12, 2016 at 10:24 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>> How about we encapsulate the while (PQisBusy(...)) loop into a new
>>> function pgfdw_get_result(), which can be called after first calling
>>> PQsendQueryParams()?  So then this code will say dmstate->result =
>>> pgfdw_get_result(dmstate->conn).  And we can do something similar for
>>> the other call to PQexecParams() in create_cursor().  Then let's also
>>> add something like pgfdw_exec_query() which calls PQsendQuery() and
>>> then pgfdw_get_result, and use that to replace all of the existing
>>> calls to PQexec().
>>>
>>> Then all the SQL postgres_fdw executes would be interruptible, not
>>> just the new stuff.
>>
>> Seems like a good idea.  Will do.
>
> When will you do this?  We are on a bit of a time budget here.

Fujita-san, I can code that tomorrow or in two days if need be. That
should not be an issue from here.
-- 
Michael



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/13 21:50, Michael Paquier wrote:
> On Wed, Apr 13, 2016 at 9:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Apr 12, 2016 at 10:24 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>> How about we encapsulate the while (PQisBusy(...)) loop into a new
>>>> function pgfdw_get_result(), which can be called after first calling
>>>> PQsendQueryParams()?  So then this code will say dmstate->result =
>>>> pgfdw_get_result(dmstate->conn).  And we can do something similar for
>>>> the other call to PQexecParams() in create_cursor().  Then let's also
>>>> add something like pgfdw_exec_query() which calls PQsendQuery() and
>>>> then pgfdw_get_result, and use that to replace all of the existing
>>>> calls to PQexec().
>>>>
>>>> Then all the SQL postgres_fdw executes would be interruptible, not
>>>> just the new stuff.

>>> Seems like a good idea.  Will do.

>> When will you do this?  We are on a bit of a time budget here.

> Fujita-san, I can code that tomorrow or in two days if need be. That
> should not be an issue from here.

I would be happy if you work on that.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Thu, Apr 14, 2016 at 10:44 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/04/13 21:50, Michael Paquier wrote:
>> On Wed, Apr 13, 2016 at 9:46 PM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>>> On Tue, Apr 12, 2016 at 10:24 PM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>>>
>>>>> How about we encapsulate the while (PQisBusy(...)) loop into a new
>>>>> function pgfdw_get_result(), which can be called after first calling
>>>>> PQsendQueryParams()?  So then this code will say dmstate->result =
>>>>> pgfdw_get_result(dmstate->conn).  And we can do something similar for
>>>>> the other call to PQexecParams() in create_cursor().  Then let's also
>>>>> add something like pgfdw_exec_query() which calls PQsendQuery() and
>>>>> then pgfdw_get_result, and use that to replace all of the existing
>>>>> calls to PQexec().
>>>>>
>>>>> Then all the SQL postgres_fdw executes would be interruptible, not
>>>>> just the new stuff.
>
> I would be happy if you work on that.

OK, so I have finished with the attached. One thing that I noticed in
the previous patch version is that it completely ignored cases where
multiple PGresult could be returned by server, say when multiple
queries are sent in the same string: PQexec gets always the last one,
so I think that we had better do the same here. I have switched all
the PQexec calls to a custom routine that combines
PQsendQuery/PQgetResult, and PQexecParams is switched to
PQsendQueryParams/PQgetResult. This structure allows all queries run
though postgres_fdw.c to be interruptible.
--
Michael

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/15 14:31, Michael Paquier wrote:
> On Thu, Apr 14, 2016 at 10:44 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2016/04/13 21:50, Michael Paquier wrote:
>>> On Wed, Apr 13, 2016 at 9:46 PM, Robert Haas <robertmhaas@gmail.com>
>>> wrote:
>>>> On Tue, Apr 12, 2016 at 10:24 PM, Etsuro Fujita
>>>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>>>>
>>>>>> How about we encapsulate the while (PQisBusy(...)) loop into a new
>>>>>> function pgfdw_get_result(), which can be called after first calling
>>>>>> PQsendQueryParams()?  So then this code will say dmstate->result =
>>>>>> pgfdw_get_result(dmstate->conn).  And we can do something similar for
>>>>>> the other call to PQexecParams() in create_cursor().  Then let's also
>>>>>> add something like pgfdw_exec_query() which calls PQsendQuery() and
>>>>>> then pgfdw_get_result, and use that to replace all of the existing
>>>>>> calls to PQexec().
>>>>>>
>>>>>> Then all the SQL postgres_fdw executes would be interruptible, not
>>>>>> just the new stuff.

>> I would be happy if you work on that.

> OK, so I have finished with the attached.

Thank you for working on that!

> One thing that I noticed in
> the previous patch version is that it completely ignored cases where
> multiple PGresult could be returned by server, say when multiple
> queries are sent in the same string: PQexec gets always the last one,
> so I think that we had better do the same here.

Seems reasonable.

> so I think that we had better do the same here. I have switched all
> the PQexec calls to a custom routine that combines
> PQsendQuery/PQgetResult, and PQexecParams is switched to
> PQsendQueryParams/PQgetResult. This structure allows all queries run
> though postgres_fdw.c to be interruptible.

How about doing something similar for PQprepare/PQexecPrepared in 
postgresExecForeignInsert, postgresExecForeignUpdate, and 
postgresExecForeignDelete?  Also, how about doing that for PQexec in 
connection.c?

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Fri, Apr 15, 2016 at 8:25 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> How about doing something similar for PQprepare/PQexecPrepared in
> postgresExecForeignInsert, postgresExecForeignUpdate, and
> postgresExecForeignDelete?

Yes, I hesitated to touch those, but they are good candidates for this
new interface, and actually it has proved not to be complicated to
plug in the new routines in those code paths.

> Also, how about doing that for PQexec in connection.c?

Here I disagree, this is not adapted. All the PQexec calls are part of
callbacks that are triggered on failures, and we rely on such a
callback when issuing PQcancel. do_sql_command runs commands that take
a short amount of time, so I think as well that it is fine as-is with
PQexec.
--
Michael

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Fri, Apr 15, 2016 at 9:46 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Apr 15, 2016 at 8:25 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> How about doing something similar for PQprepare/PQexecPrepared in
>> postgresExecForeignInsert, postgresExecForeignUpdate, and
>> postgresExecForeignDelete?
>
> Yes, I hesitated to touch those, but they are good candidates for this
> new interface, and actually it has proved not to be complicated to
> plug in the new routines in those code paths.
>
>> Also, how about doing that for PQexec in connection.c?
>
> Here I disagree, this is not adapted. All the PQexec calls are part of
> callbacks that are triggered on failures, and we rely on such a
> callback when issuing PQcancel. do_sql_command runs commands that take
> a short amount of time, so I think as well that it is fine as-is with
> PQexec.

Here is a new version. I just recalled that I forgot a PQclear() call
to clean up results.
--
Michael

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Noah Misch
Date:
On Sat, Apr 16, 2016 at 08:59:40AM +0900, Michael Paquier wrote:
> On Fri, Apr 15, 2016 at 9:46 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > On Fri, Apr 15, 2016 at 8:25 PM, Etsuro Fujita
> > <fujita.etsuro@lab.ntt.co.jp> wrote:
> >> How about doing something similar for PQprepare/PQexecPrepared in
> >> postgresExecForeignInsert, postgresExecForeignUpdate, and
> >> postgresExecForeignDelete?
> >
> > Yes, I hesitated to touch those, but they are good candidates for this
> > new interface, and actually it has proved not to be complicated to
> > plug in the new routines in those code paths.
> >
> >> Also, how about doing that for PQexec in connection.c?
> >
> > Here I disagree, this is not adapted. All the PQexec calls are part of
> > callbacks that are triggered on failures, and we rely on such a
> > callback when issuing PQcancel. do_sql_command runs commands that take
> > a short amount of time, so I think as well that it is fine as-is with
> > PQexec.
> 
> Here is a new version. I just recalled that I forgot a PQclear() call
> to clean up results.

Robert, the deadline to fix this open item expired eleven days ago.  The
thread had been seeing regular activity, but it has now been quiet for three
days.  Do you have an updated plan for fixing this open item?



Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Tue, Apr 19, 2016 at 12:16 PM, Noah Misch <noah@leadboat.com> wrote:
> On Sat, Apr 16, 2016 at 08:59:40AM +0900, Michael Paquier wrote:
>> On Fri, Apr 15, 2016 at 9:46 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>> > On Fri, Apr 15, 2016 at 8:25 PM, Etsuro Fujita
>> > <fujita.etsuro@lab.ntt.co.jp> wrote:
>> >> How about doing something similar for PQprepare/PQexecPrepared in
>> >> postgresExecForeignInsert, postgresExecForeignUpdate, and
>> >> postgresExecForeignDelete?
>> >
>> > Yes, I hesitated to touch those, but they are good candidates for this
>> > new interface, and actually it has proved not to be complicated to
>> > plug in the new routines in those code paths.
>> >
>> >> Also, how about doing that for PQexec in connection.c?
>> >
>> > Here I disagree, this is not adapted. All the PQexec calls are part of
>> > callbacks that are triggered on failures, and we rely on such a
>> > callback when issuing PQcancel. do_sql_command runs commands that take
>> > a short amount of time, so I think as well that it is fine as-is with
>> > PQexec.
>>
>> Here is a new version. I just recalled that I forgot a PQclear() call
>> to clean up results.
>
> Robert, the deadline to fix this open item expired eleven days ago.  The
> thread had been seeing regular activity, but it has now been quiet for three
> days.  Do you have an updated plan for fixing this open item?

Note for Robert: pgfdw_get_result copycats PQexec by discarding all
PQresult received except the last one. I think that's fine for the
purposes of postgres_fdw, but perhaps you have a different opinion on
the matter.
--
Michael



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/19 12:26, Michael Paquier wrote:
> On Tue, Apr 19, 2016 at 12:16 PM, Noah Misch <noah@leadboat.com> wrote:
>> On Sat, Apr 16, 2016 at 08:59:40AM +0900, Michael Paquier wrote:

>>> Here is a new version. I just recalled that I forgot a PQclear() call
>>> to clean up results.

Thanks for updating the patch!

>> Robert, the deadline to fix this open item expired eleven days ago.  The
>> thread had been seeing regular activity, but it has now been quiet for three
>> days.  Do you have an updated plan for fixing this open item?

> Note for Robert: pgfdw_get_result copycats PQexec by discarding all
> PQresult received except the last one. I think that's fine for the
> purposes of postgres_fdw, but perhaps you have a different opinion on
> the matter.

That seemed reasonable to me, but sorry, on second thought, I'm not sure 
that's still a good idea.  One reason is (1) I think it's better for the 
in-postgres_fdw.c functions using pgfdw_get_result to verify that there 
are no more results, in itself.  I think that would improve the 
robustness of those functions.  Another reason is I don't think 
pgfdw_report_error, which is used in pgfdw_get_result, works well for 
cases where the query contains multiple SQL commands.  So, +1 for the 
idea of simply encapsulating the while (PQisBusy(...)) loop into a new 
function pgfdw_get_result().

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/19 12:45, Etsuro Fujita wrote:
> On 2016/04/19 12:26, Michael Paquier wrote:
>> Note for Robert: pgfdw_get_result copycats PQexec by discarding all
>> PQresult received except the last one. I think that's fine for the
>> purposes of postgres_fdw, but perhaps you have a different opinion on
>> the matter.

> That seemed reasonable to me, but sorry, on second thought, I'm not sure
> that's still a good idea.  One reason is (1) I think it's better for the
> in-postgres_fdw.c functions using pgfdw_get_result to verify that there
> are no more results, in itself.  I think that would improve the
> robustness of those functions.  Another reason is I don't think
> pgfdw_report_error, which is used in pgfdw_get_result, works well for
> cases where the query contains multiple SQL commands.  So, +1 for the
> idea of simply encapsulating the while (PQisBusy(...)) loop into a new
> function pgfdw_get_result().

Here is a proposed patch for that.

Other changes:

       * We don't use a PG_TRY block here, so be careful not to throw error
       * without releasing the PGresult.
       */
-    res = PQexecPrepared(fmstate->conn,
-                         fmstate->p_name,
-                         fmstate->p_nums,
-                         p_values,
-                         NULL,
-                         NULL,
-                         0);
+    if (!PQsendQueryPrepared(fmstate->conn,
+                             fmstate->p_name,
+                             fmstate->p_nums,
+                             p_values,
+                             NULL,
+                             NULL,
+                             0))
+        pgfdw_report_error(ERROR, NULL, fmstate->conn, true, fmstate->query);

The comment "We don't use a PG_TRY block here ..." seems to be wrongly
placed, so I moved that comment.  Also, I think it'd be better to call
pgfdw_report_error() with the clear argument set to false, not true,
since we don't need to clear the PGresult.  Same for
postgresExecForeignUpdate, postgresExecForeignUpdate, and
prepare_foreign_modify.

What do you think about that?

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Tue, Apr 19, 2016 at 1:14 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> The comment "We don't use a PG_TRY block here ..." seems to be wrongly
> placed, so I moved that comment.  Also, I think it'd be better to call
> pgfdw_report_error() with the clear argument set to false, not true, since
> we don't need to clear the PGresult.  Same for postgresExecForeignUpdate,
> postgresExecForeignUpdate, and prepare_foreign_modify.

No objection to moving those comment blocks where pgfdw_get_result is called.

> What do you think about that?

+   /* Wait for the result */
+   res = pgfdw_get_result(conn, query);
+   if (res == NULL)
+       pgfdw_report_error(ERROR, NULL, conn, false, query);
+   last_res = res;
+
+   /*
+    * Verify that there are no more results
+    *
+    * We don't use a PG_TRY block here, so be careful not to throw error
+    * without releasing the PGresult.
+    */
+   res = pgfdw_get_result(conn, query);
+   if (res != NULL)
+   {
+       PQclear(last_res);
+       pgfdw_report_error(ERROR, res, conn, true, query);
+   }

But huge objection to that because this fragilizes the current logic
postgres_fdw is based on: PQexec returns the last result to caller,
I'd rather not break that logic for 9.6 stability's sake.

A even better proof of that is the following, which just emulates what
your version of pgfdw_get_result is doing when consuming the results.
+   /* Verify that there are no more results */
+   res = pgfdw_get_result(fmstate->conn, fmstate->query);
+   if (res != NULL)
+       pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
This could even lead to incorrect errors in the future if multiple
queries are combined with those DMLs for a reason or another.
-- 
Michael



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/19 13:59, Michael Paquier wrote:
> On Tue, Apr 19, 2016 at 1:14 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> What do you think about that?

> +   /* Wait for the result */
> +   res = pgfdw_get_result(conn, query);
> +   if (res == NULL)
> +       pgfdw_report_error(ERROR, NULL, conn, false, query);
> +   last_res = res;
> +
> +   /*
> +    * Verify that there are no more results
> +    *
> +    * We don't use a PG_TRY block here, so be careful not to throw error
> +    * without releasing the PGresult.
> +    */
> +   res = pgfdw_get_result(conn, query);
> +   if (res != NULL)
> +   {
> +       PQclear(last_res);
> +       pgfdw_report_error(ERROR, res, conn, true, query);
> +   }
>
> But huge objection to that because this fragilizes the current logic
> postgres_fdw is based on: PQexec returns the last result to caller,
> I'd rather not break that logic for 9.6 stability's sake.

IIUC, I think each query submitted by PQexec in postgres_fdw.c contains 
just a single command.  Maybe I'm missing something, though.

> A even better proof of that is the following, which just emulates what
> your version of pgfdw_get_result is doing when consuming the results.
> +   /* Verify that there are no more results */
> +   res = pgfdw_get_result(fmstate->conn, fmstate->query);
> +   if (res != NULL)
> +       pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
> This could even lead to incorrect errors in the future if multiple
> queries are combined with those DMLs for a reason or another.

I'd like to leave such enhancements for future work...

Thanks for the comment!

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/19 14:25, Etsuro Fujita wrote:
> On 2016/04/19 13:59, Michael Paquier wrote:
>> But huge objection to that because this fragilizes the current logic
>> postgres_fdw is based on: PQexec returns the last result to caller,
>> I'd rather not break that logic for 9.6 stability's sake.

> IIUC, I think each query submitted by PQexec in postgres_fdw.c contains
> just a single command.  Maybe I'm missing something, though.

>> A even better proof of that is the following, which just emulates what
>> your version of pgfdw_get_result is doing when consuming the results.
>> +   /* Verify that there are no more results */
>> +   res = pgfdw_get_result(fmstate->conn, fmstate->query);
>> +   if (res != NULL)
>> +       pgfdw_report_error(ERROR, res, fmstate->conn, true,
>> fmstate->query);
>> This could even lead to incorrect errors in the future if multiple
>> queries are combined with those DMLs for a reason or another.

> I'd like to leave such enhancements for future work...

On reflection, I'd like to agree with Michael on that point.  As
mentioned by him, we should not lose the future extendability.  And I
don't think his version of pgfdw_get_result would damage the robustness
of in-postgres_fdw.c functions using that, which was my concern.  Sorry
for the noise.

Attached is an updated version of the patch, which modified Michael's
version of the patch, as I proposed in [1] (see "Other changes:").  I
modified comments for pgfdw_get_result/pgfdw_exec_query also, mainly
because words like "non-blocking mode" there seems confusing (note that
we have PQsetnonbloking).

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/5715B08A.2030404@lab.ntt.co.jp

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Thu, Apr 21, 2016 at 5:22 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Attached is an updated version of the patch, which modified Michael's
> version of the patch, as I proposed in [1] (see "Other changes:").  I
> modified comments for pgfdw_get_result/pgfdw_exec_query also, mainly because
> words like "non-blocking mode" there seems confusing (note that we have
> PQsetnonbloking).

OK, so that is what I sent except that the comments mentioning PG_TRY
are moved to their correct places. That's fine for me. Thanks for
gathering everything in a single patch and correcting it.
-- 
Michael



Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Thu, Apr 21, 2016 at 8:44 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Apr 21, 2016 at 5:22 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> Attached is an updated version of the patch, which modified Michael's
>> version of the patch, as I proposed in [1] (see "Other changes:").  I
>> modified comments for pgfdw_get_result/pgfdw_exec_query also, mainly because
>> words like "non-blocking mode" there seems confusing (note that we have
>> PQsetnonbloking).
>
> OK, so that is what I sent except that the comments mentioning PG_TRY
> are moved to their correct places. That's fine for me. Thanks for
> gathering everything in a single patch and correcting it.

I have committed this patch.  Thanks for working on this.  Sorry for the delay.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Thu, Apr 21, 2016 at 11:53 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Apr 21, 2016 at 8:44 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Apr 21, 2016 at 5:22 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>> Attached is an updated version of the patch, which modified Michael's
>>> version of the patch, as I proposed in [1] (see "Other changes:").  I
>>> modified comments for pgfdw_get_result/pgfdw_exec_query also, mainly because
>>> words like "non-blocking mode" there seems confusing (note that we have
>>> PQsetnonbloking).
>>
>> OK, so that is what I sent except that the comments mentioning PG_TRY
>> are moved to their correct places. That's fine for me. Thanks for
>> gathering everything in a single patch and correcting it.
>
> I have committed this patch.  Thanks for working on this.  Sorry for the delay.

Thanks for the push. open_item--;
-- 
Michael



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/22 9:22, Michael Paquier wrote:
> On Thu, Apr 21, 2016 at 11:53 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I have committed this patch.  Thanks for working on this.  Sorry for the delay.

> Thanks for the push. open_item--;

Thank you, Robert and Michael.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
Hi,

While re-reviewing the fix, I noticed that since PQcancel we added to 
pgfdw_xact_callback to cancel a DML pushdown query isn't followed by a 
ROLLBACK, the connection to the remote server will be discarded at the 
end of the while loop in that function, which will cause a FATAL error 
of "connection to client lost".  Probably, that was proposed by me in 
the first version of the patch, but I don't think that's a good idea. 
Shouldn't we execute ROLLBACK after that PQcancel?

Another thing I noticed is, ISTM that we miss the case where DML 
pushdown queries are performed in subtransactions.  I think cancellation 
logic would also need to be added to pgfdw_subxact_callback.

Comments are welcome!

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/26 21:45, Etsuro Fujita wrote:
> While re-reviewing the fix, I noticed that since PQcancel we added to
> pgfdw_xact_callback to cancel a DML pushdown query isn't followed by a
> ROLLBACK, the connection to the remote server will be discarded at the
> end of the while loop in that function, which will cause a FATAL error
> of "connection to client lost".  Probably, that was proposed by me in
> the first version of the patch, but I don't think that's a good idea.
> Shouldn't we execute ROLLBACK after that PQcancel?
>
> Another thing I noticed is, ISTM that we miss the case where DML
> pushdown queries are performed in subtransactions.  I think cancellation
> logic would also need to be added to pgfdw_subxact_callback.

Attached is a patch for that.

Best regards,
Etsuro Fujita

Attachment

Re: Optimization for updating foreign tables in Postgres FDW

From
Michael Paquier
Date:
On Wed, Apr 27, 2016 at 12:16 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2016/04/26 21:45, Etsuro Fujita wrote:
>> While re-reviewing the fix, I noticed that since PQcancel we added to
>> pgfdw_xact_callback to cancel a DML pushdown query isn't followed by a
>> ROLLBACK, the connection to the remote server will be discarded at the
>> end of the while loop in that function, which will cause a FATAL error
>> of "connection to client lost".  Probably, that was proposed by me in
>> the first version of the patch, but I don't think that's a good idea.
>> Shouldn't we execute ROLLBACK after that PQcancel?
>>
>> Another thing I noticed is, ISTM that we miss the case where DML
>> pushdown queries are performed in subtransactions.  I think cancellation
>> logic would also need to be added to pgfdw_subxact_callback.
>
>
> Attached is a patch for that.

I have spent some time looking at that...

And yeah, losing the connection because of that is a little bit
annoying if there are ways to make things clean, and as a START
TRANSACTION is always sent for such queries it seems really better to
issue a ROLLBACK in any case. Actually, by using PQcancel there is no
way to be sure if the cancel will be effective or not. So it could be
possible that the command is still able to complete correctly, or it
could be able to cancel correctly and it would return an ERROR
earlier. In any case, doing the ROLLBACK unconditionally seems adapted
to me because we had better clean up the remote state in both cases.
-- 
Michael



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/04/28 13:45, Michael Paquier wrote:
> On Wed, Apr 27, 2016 at 12:16 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2016/04/26 21:45, Etsuro Fujita wrote:
>>> While re-reviewing the fix, I noticed that since PQcancel we added to
>>> pgfdw_xact_callback to cancel a DML pushdown query isn't followed by a
>>> ROLLBACK, the connection to the remote server will be discarded at the
>>> end of the while loop in that function, which will cause a FATAL error
>>> of "connection to client lost".  Probably, that was proposed by me in
>>> the first version of the patch, but I don't think that's a good idea.
>>> Shouldn't we execute ROLLBACK after that PQcancel?
>>>
>>> Another thing I noticed is, ISTM that we miss the case where DML
>>> pushdown queries are performed in subtransactions.  I think cancellation
>>> logic would also need to be added to pgfdw_subxact_callback.

>> Attached is a patch for that.

> I have spent some time looking at that...
>
> And yeah, losing the connection because of that is a little bit
> annoying if there are ways to make things clean, and as a START
> TRANSACTION is always sent for such queries it seems really better to
> issue a ROLLBACK in any case. Actually, by using PQcancel there is no
> way to be sure if the cancel will be effective or not. So it could be
> possible that the command is still able to complete correctly, or it
> could be able to cancel correctly and it would return an ERROR
> earlier. In any case, doing the ROLLBACK unconditionally seems adapted
> to me because we had better clean up the remote state in both cases.

Thanks for the review!

I'll add this to the next CF.  I think this should be addressed in 
advance of the release of 9.6, though.

Best regards,
Etsuro Fujita





Re: Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Wed, May 11, 2016 at 3:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Thanks for the review!
>
> I'll add this to the next CF.  I think this should be addressed in advance
> of the release of 9.6, though.

I agree.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimization for updating foreign tables in Postgres FDW

From
Etsuro Fujita
Date:
On 2016/05/17 0:25, Robert Haas wrote:
> On Wed, May 11, 2016 at 3:20 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> Thanks for the review!
>>
>> I'll add this to the next CF.  I think this should be addressed in advance
>> of the release of 9.6, though.

> I agree.  Committed.

Thanks!

Best regards,
Etsuro Fujita





Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

From
Robert Haas
Date:
On Thu, Apr 21, 2016 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Apr 21, 2016 at 8:44 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Apr 21, 2016 at 5:22 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>> Attached is an updated version of the patch, which modified Michael's
>>> version of the patch, as I proposed in [1] (see "Other changes:").  I
>>> modified comments for pgfdw_get_result/pgfdw_exec_query also, mainly because
>>> words like "non-blocking mode" there seems confusing (note that we have
>>> PQsetnonbloking).
>>
>> OK, so that is what I sent except that the comments mentioning PG_TRY
>> are moved to their correct places. That's fine for me. Thanks for
>> gathering everything in a single patch and correcting it.
>
> I have committed this patch.  Thanks for working on this.  Sorry for the delay.

This 9.6-era patch, as it turns out, has a problem, which is that we
now respond to an interrupt by sending a cancel request and a
NON-interruptible ABORT TRANSACTION command to the remote side.  If
the reason that the user is trying to interrupt is that the network
connection has been cut, they interrupt the original query only to get
stuck in a non-interruptible wait for ABORT TRANSACTION.  That is
non-optimal.

It is not exactly clear to me how to fix this.  Could we get by with
just slamming the remote connection shut, instead of sending an
explicit ABORT TRANSACTION?  The remote side ought to treat a
disconnect as equivalent to an ABORT anyway, I think, but maybe our
local state would get confused.  (I have not checked.)

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Optimization for updating foreign tables in PostgresFDW

From
Kyotaro HORIGUCHI
Date:
At Thu, 13 Apr 2017 13:04:12 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoaxnNmuONgP=bXJojrgbnMPTi6Ms8OSwZBC2YQ2ueUiSg@mail.gmail.com>
> On Thu, Apr 21, 2016 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Thu, Apr 21, 2016 at 8:44 AM, Michael Paquier
> > <michael.paquier@gmail.com> wrote:
> >> On Thu, Apr 21, 2016 at 5:22 PM, Etsuro Fujita
> >> <fujita.etsuro@lab.ntt.co.jp> wrote:
> >>> Attached is an updated version of the patch, which modified Michael's
> >>> version of the patch, as I proposed in [1] (see "Other changes:").  I
> >>> modified comments for pgfdw_get_result/pgfdw_exec_query also, mainly because
> >>> words like "non-blocking mode" there seems confusing (note that we have
> >>> PQsetnonbloking).
> >>
> >> OK, so that is what I sent except that the comments mentioning PG_TRY
> >> are moved to their correct places. That's fine for me. Thanks for
> >> gathering everything in a single patch and correcting it.
> >
> > I have committed this patch.  Thanks for working on this.  Sorry for the delay.
> 
> This 9.6-era patch, as it turns out, has a problem, which is that we
> now respond to an interrupt by sending a cancel request and a
> NON-interruptible ABORT TRANSACTION command to the remote side.  If
> the reason that the user is trying to interrupt is that the network
> connection has been cut, they interrupt the original query only to get
> stuck in a non-interruptible wait for ABORT TRANSACTION.  That is
> non-optimal.

Agreed.

> It is not exactly clear to me how to fix this.  Could we get by with
> just slamming the remote connection shut, instead of sending an
> explicit ABORT TRANSACTION?  The remote side ought to treat a
> disconnect as equivalent to an ABORT anyway, I think, but maybe our
> local state would get confused.  (I have not checked.)
> 
> Thoughts?

Perhaps we will get stuck at query cancellation before ABORT
TRANSACTION in the case. A connection will be shut down when
anything wrong (PQstatus(conn) != CONNECTION_OK and so) on
aborting local transactoin . So I don't think fdw gets confused
or sholdn't be confused by shutting down there.

The most significant issue I can see is that the same thing
happens in the case of graceful ABORT TRANSACTION. It could be a
performance issue.

We could set timeout here but maybe we can just slamming the
connection down instead of sending a query cancellation. It is
caused only by timeout or interrupts so I suppose it is not a
problem *with a few connections*.


Things are a bit diffent with hundreds of connections. The
penalty of reconnection would be very high in the case.

If we are not willing to pay such high penalty, maybe we are to
manage busy-idle time of each connection and trying graceful
abort if it is short enough, maybe having a shoft timeout.

Furthermore, if most or all of the hundreds of connections get
stuck, such timeout will accumulate up like a mountain...


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

From
Ashutosh Bapat
Date:
On Mon, Apr 17, 2017 at 1:53 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Thu, 13 Apr 2017 13:04:12 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoaxnNmuONgP=bXJojrgbnMPTi6Ms8OSwZBC2YQ2ueUiSg@mail.gmail.com>
>> On Thu, Apr 21, 2016 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > On Thu, Apr 21, 2016 at 8:44 AM, Michael Paquier
>> > <michael.paquier@gmail.com> wrote:
>> >> On Thu, Apr 21, 2016 at 5:22 PM, Etsuro Fujita
>> >> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> >>> Attached is an updated version of the patch, which modified Michael's
>> >>> version of the patch, as I proposed in [1] (see "Other changes:").  I
>> >>> modified comments for pgfdw_get_result/pgfdw_exec_query also, mainly because
>> >>> words like "non-blocking mode" there seems confusing (note that we have
>> >>> PQsetnonbloking).
>> >>
>> >> OK, so that is what I sent except that the comments mentioning PG_TRY
>> >> are moved to their correct places. That's fine for me. Thanks for
>> >> gathering everything in a single patch and correcting it.
>> >
>> > I have committed this patch.  Thanks for working on this.  Sorry for the delay.
>>
>> This 9.6-era patch, as it turns out, has a problem, which is that we
>> now respond to an interrupt by sending a cancel request and a
>> NON-interruptible ABORT TRANSACTION command to the remote side.  If
>> the reason that the user is trying to interrupt is that the network
>> connection has been cut, they interrupt the original query only to get
>> stuck in a non-interruptible wait for ABORT TRANSACTION.  That is
>> non-optimal.
>
> Agreed.
>
>> It is not exactly clear to me how to fix this.  Could we get by with
>> just slamming the remote connection shut, instead of sending an
>> explicit ABORT TRANSACTION?  The remote side ought to treat a
>> disconnect as equivalent to an ABORT anyway, I think, but maybe our
>> local state would get confused.  (I have not checked.)
>>
>> Thoughts?
>
> Perhaps we will get stuck at query cancellation before ABORT
> TRANSACTION in the case. A connection will be shut down when
> anything wrong (PQstatus(conn) != CONNECTION_OK and so) on
> aborting local transactoin . So I don't think fdw gets confused
> or sholdn't be confused by shutting down there.
>
> The most significant issue I can see is that the same thing
> happens in the case of graceful ABORT TRANSACTION. It could be a
> performance issue.
>
> We could set timeout here but maybe we can just slamming the
> connection down instead of sending a query cancellation. It is
> caused only by timeout or interrupts so I suppose it is not a
> problem *with a few connections*.
>
>
> Things are a bit diffent with hundreds of connections. The
> penalty of reconnection would be very high in the case.
>
> If we are not willing to pay such high penalty, maybe we are to
> manage busy-idle time of each connection and trying graceful
> abort if it is short enough, maybe having a shoft timeout.
>
> Furthermore, if most or all of the hundreds of connections get
> stuck, such timeout will accumulate up like a mountain...

Even when the transaction is aborted because a user cancels a query,
we do want to preserve the connection, if possible, to avoid
reconnection. If the request to cancel the query itself fails, we
should certainly drop the connection. Here's the patch to do that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Optimization for updating foreign tables in PostgresFDW

From
Kyotaro HORIGUCHI
Date:
At Mon, 17 Apr 2017 17:50:58 +0530, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote in
<CAFjFpRdcWw4h0a-zrL-EiaekkPj8O0GR2M1FwZ1useSRfRm3-g@mail.gmail.com>
> On Mon, Apr 17, 2017 at 1:53 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > At Thu, 13 Apr 2017 13:04:12 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoaxnNmuONgP=bXJojrgbnMPTi6Ms8OSwZBC2YQ2ueUiSg@mail.gmail.com>
> >> On Thu, Apr 21, 2016 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> > On Thu, Apr 21, 2016 at 8:44 AM, Michael Paquier
> >> > <michael.paquier@gmail.com> wrote:
> >> >> On Thu, Apr 21, 2016 at 5:22 PM, Etsuro Fujita
> >> >> <fujita.etsuro@lab.ntt.co.jp> wrote:
> >> >>> Attached is an updated version of the patch, which modified Michael's
> >> >>> version of the patch, as I proposed in [1] (see "Other changes:").  I
> >> >>> modified comments for pgfdw_get_result/pgfdw_exec_query also, mainly because
> >> >>> words like "non-blocking mode" there seems confusing (note that we have
> >> >>> PQsetnonbloking).
> >> >>
> >> >> OK, so that is what I sent except that the comments mentioning PG_TRY
> >> >> are moved to their correct places. That's fine for me. Thanks for
> >> >> gathering everything in a single patch and correcting it.
> >> >
> >> > I have committed this patch.  Thanks for working on this.  Sorry for the delay.
> >>
> >> This 9.6-era patch, as it turns out, has a problem, which is that we
> >> now respond to an interrupt by sending a cancel request and a
> >> NON-interruptible ABORT TRANSACTION command to the remote side.  If
> >> the reason that the user is trying to interrupt is that the network
> >> connection has been cut, they interrupt the original query only to get
> >> stuck in a non-interruptible wait for ABORT TRANSACTION.  That is
> >> non-optimal.
> >
> > Agreed.
> >
> >> It is not exactly clear to me how to fix this.  Could we get by with
> >> just slamming the remote connection shut, instead of sending an
> >> explicit ABORT TRANSACTION?  The remote side ought to treat a
> >> disconnect as equivalent to an ABORT anyway, I think, but maybe our
> >> local state would get confused.  (I have not checked.)
> >>
> >> Thoughts?
> >
> > Perhaps we will get stuck at query cancellation before ABORT
> > TRANSACTION in the case. A connection will be shut down when
> > anything wrong (PQstatus(conn) != CONNECTION_OK and so) on
> > aborting local transactoin . So I don't think fdw gets confused
> > or sholdn't be confused by shutting down there.
> >
> > The most significant issue I can see is that the same thing
> > happens in the case of graceful ABORT TRANSACTION. It could be a
> > performance issue.
> >
> > We could set timeout here but maybe we can just slamming the
> > connection down instead of sending a query cancellation. It is
> > caused only by timeout or interrupts so I suppose it is not a
> > problem *with a few connections*.
> >
> >
> > Things are a bit diffent with hundreds of connections. The
> > penalty of reconnection would be very high in the case.
> >
> > If we are not willing to pay such high penalty, maybe we are to
> > manage busy-idle time of each connection and trying graceful
> > abort if it is short enough, maybe having a shoft timeout.
> >
> > Furthermore, if most or all of the hundreds of connections get
> > stuck, such timeout will accumulate up like a mountain...
> 
> Even when the transaction is aborted because a user cancels a query,
> we do want to preserve the connection, if possible, to avoid

Yes.

> reconnection. If the request to cancel the query itself fails, we
> should certainly drop the connection. Here's the patch to do that.

A problem I think on this would be that we still try to make
another connection for canceling and it would stall for several
minutes per connection on a packet stall, which should be done in
a second on ordinary circumstances. Perhaps we might want here is
async canceling with timeout.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

From
Ashutosh Bapat
Date:
On Tue, Apr 18, 2017 at 8:12 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Mon, 17 Apr 2017 17:50:58 +0530, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote in
<CAFjFpRdcWw4h0a-zrL-EiaekkPj8O0GR2M1FwZ1useSRfRm3-g@mail.gmail.com>
>> On Mon, Apr 17, 2017 at 1:53 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> > At Thu, 13 Apr 2017 13:04:12 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoaxnNmuONgP=bXJojrgbnMPTi6Ms8OSwZBC2YQ2ueUiSg@mail.gmail.com>
>> >> On Thu, Apr 21, 2016 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> >> > On Thu, Apr 21, 2016 at 8:44 AM, Michael Paquier
>> >> > <michael.paquier@gmail.com> wrote:
>> >> >> On Thu, Apr 21, 2016 at 5:22 PM, Etsuro Fujita
>> >> >> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> >> >>> Attached is an updated version of the patch, which modified Michael's
>> >> >>> version of the patch, as I proposed in [1] (see "Other changes:").  I
>> >> >>> modified comments for pgfdw_get_result/pgfdw_exec_query also, mainly because
>> >> >>> words like "non-blocking mode" there seems confusing (note that we have
>> >> >>> PQsetnonbloking).
>> >> >>
>> >> >> OK, so that is what I sent except that the comments mentioning PG_TRY
>> >> >> are moved to their correct places. That's fine for me. Thanks for
>> >> >> gathering everything in a single patch and correcting it.
>> >> >
>> >> > I have committed this patch.  Thanks for working on this.  Sorry for the delay.
>> >>
>> >> This 9.6-era patch, as it turns out, has a problem, which is that we
>> >> now respond to an interrupt by sending a cancel request and a
>> >> NON-interruptible ABORT TRANSACTION command to the remote side.  If
>> >> the reason that the user is trying to interrupt is that the network
>> >> connection has been cut, they interrupt the original query only to get
>> >> stuck in a non-interruptible wait for ABORT TRANSACTION.  That is
>> >> non-optimal.
>> >
>> > Agreed.
>> >
>> >> It is not exactly clear to me how to fix this.  Could we get by with
>> >> just slamming the remote connection shut, instead of sending an
>> >> explicit ABORT TRANSACTION?  The remote side ought to treat a
>> >> disconnect as equivalent to an ABORT anyway, I think, but maybe our
>> >> local state would get confused.  (I have not checked.)
>> >>
>> >> Thoughts?
>> >
>> > Perhaps we will get stuck at query cancellation before ABORT
>> > TRANSACTION in the case. A connection will be shut down when
>> > anything wrong (PQstatus(conn) != CONNECTION_OK and so) on
>> > aborting local transactoin . So I don't think fdw gets confused
>> > or sholdn't be confused by shutting down there.
>> >
>> > The most significant issue I can see is that the same thing
>> > happens in the case of graceful ABORT TRANSACTION. It could be a
>> > performance issue.
>> >
>> > We could set timeout here but maybe we can just slamming the
>> > connection down instead of sending a query cancellation. It is
>> > caused only by timeout or interrupts so I suppose it is not a
>> > problem *with a few connections*.
>> >
>> >
>> > Things are a bit diffent with hundreds of connections. The
>> > penalty of reconnection would be very high in the case.
>> >
>> > If we are not willing to pay such high penalty, maybe we are to
>> > manage busy-idle time of each connection and trying graceful
>> > abort if it is short enough, maybe having a shoft timeout.
>> >
>> > Furthermore, if most or all of the hundreds of connections get
>> > stuck, such timeout will accumulate up like a mountain...
>>
>> Even when the transaction is aborted because a user cancels a query,
>> we do want to preserve the connection, if possible, to avoid
>
> Yes.
>
>> reconnection. If the request to cancel the query itself fails, we
>> should certainly drop the connection. Here's the patch to do that.
>
> A problem I think on this would be that we still try to make
> another connection for canceling and it would stall for several
> minutes per connection on a packet stall, which should be done in
> a second on ordinary circumstances. Perhaps we might want here is
> async canceling with timeout.

I am not sure what do you mean "make another connection for
cancelling.". Do you mean to say that PQcancel would make another
connection?

The patch proposed simply improves the condition when PQcancel has
returned a failure. Right now we ignore that failure and try to ABORT
the transaction, which is most probably going to get stalled or fail
to be dispatched. With the patch such a connection will be reset.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Optimization for updating foreign tables in PostgresFDW

From
Kyotaro HORIGUCHI
Date:
At Tue, 18 Apr 2017 09:12:07 +0530, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote in
<CAFjFpRcRjohUZaCNon1zkP7c6fo2qd5hnfHCT6t_39SGtS_oKQ@mail.gmail.com>
> On Tue, Apr 18, 2017 at 8:12 AM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > At Mon, 17 Apr 2017 17:50:58 +0530, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote in
<CAFjFpRdcWw4h0a-zrL-EiaekkPj8O0GR2M1FwZ1useSRfRm3-g@mail.gmail.com>
> >> On Mon, Apr 17, 2017 at 1:53 PM, Kyotaro HORIGUCHI
> >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> > At Thu, 13 Apr 2017 13:04:12 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoaxnNmuONgP=bXJojrgbnMPTi6Ms8OSwZBC2YQ2ueUiSg@mail.gmail.com>
> >> >> On Thu, Apr 21, 2016 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> >> > On Thu, Apr 21, 2016 at 8:44 AM, Michael Paquier
> >> >> > <michael.paquier@gmail.com> wrote:
> >> >> >> On Thu, Apr 21, 2016 at 5:22 PM, Etsuro Fujita
> >> >> >> <fujita.etsuro@lab.ntt.co.jp> wrote:
> >> >> >>> Attached is an updated version of the patch, which modified Michael's
> >> >> >>> version of the patch, as I proposed in [1] (see "Other changes:").  I
> >> >> >>> modified comments for pgfdw_get_result/pgfdw_exec_query also, mainly because
> >> >> >>> words like "non-blocking mode" there seems confusing (note that we have
> >> >> >>> PQsetnonbloking).
> >> >> >>
> >> >> >> OK, so that is what I sent except that the comments mentioning PG_TRY
> >> >> >> are moved to their correct places. That's fine for me. Thanks for
> >> >> >> gathering everything in a single patch and correcting it.
> >> >> >
> >> >> > I have committed this patch.  Thanks for working on this.  Sorry for the delay.
> >> >>
> >> >> This 9.6-era patch, as it turns out, has a problem, which is that we
> >> >> now respond to an interrupt by sending a cancel request and a
> >> >> NON-interruptible ABORT TRANSACTION command to the remote side.  If
> >> >> the reason that the user is trying to interrupt is that the network
> >> >> connection has been cut, they interrupt the original query only to get
> >> >> stuck in a non-interruptible wait for ABORT TRANSACTION.  That is
> >> >> non-optimal.
> >> >
> >> > Agreed.
> >> >
> >> >> It is not exactly clear to me how to fix this.  Could we get by with
> >> >> just slamming the remote connection shut, instead of sending an
> >> >> explicit ABORT TRANSACTION?  The remote side ought to treat a
> >> >> disconnect as equivalent to an ABORT anyway, I think, but maybe our
> >> >> local state would get confused.  (I have not checked.)
> >> >>
> >> >> Thoughts?
> >> >
> >> > Perhaps we will get stuck at query cancellation before ABORT
> >> > TRANSACTION in the case. A connection will be shut down when
> >> > anything wrong (PQstatus(conn) != CONNECTION_OK and so) on
> >> > aborting local transactoin . So I don't think fdw gets confused
> >> > or sholdn't be confused by shutting down there.
> >> >
> >> > The most significant issue I can see is that the same thing
> >> > happens in the case of graceful ABORT TRANSACTION. It could be a
> >> > performance issue.
> >> >
> >> > We could set timeout here but maybe we can just slamming the
> >> > connection down instead of sending a query cancellation. It is
> >> > caused only by timeout or interrupts so I suppose it is not a
> >> > problem *with a few connections*.
> >> >
> >> >
> >> > Things are a bit diffent with hundreds of connections. The
> >> > penalty of reconnection would be very high in the case.
> >> >
> >> > If we are not willing to pay such high penalty, maybe we are to
> >> > manage busy-idle time of each connection and trying graceful
> >> > abort if it is short enough, maybe having a shoft timeout.
> >> >
> >> > Furthermore, if most or all of the hundreds of connections get
> >> > stuck, such timeout will accumulate up like a mountain...
> >>
> >> Even when the transaction is aborted because a user cancels a query,
> >> we do want to preserve the connection, if possible, to avoid
> >
> > Yes.
> >
> >> reconnection. If the request to cancel the query itself fails, we
> >> should certainly drop the connection. Here's the patch to do that.
> >
> > A problem I think on this would be that we still try to make
> > another connection for canceling and it would stall for several
> > minutes per connection on a packet stall, which should be done in
> > a second on ordinary circumstances. Perhaps we might want here is
> > async canceling with timeout.
> 
> I am not sure what do you mean "make another connection for
> cancelling.". Do you mean to say that PQcancel would make another
> connection?

Yes. It will take about 3 minutes on standard settings when no
ACK returned. I thought that this discussion is based on such
situation.

> The patch proposed simply improves the condition when PQcancel has
> returned a failure. Right now we ignore that failure and try to ABORT
> the transaction, which is most probably going to get stalled or fail
> to be dispatched. With the patch such a connection will be reset.

Ah, I understand that. It is surely an improvement since it
avoids useless ABORT TRANSACTION that is known to stall.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

From
Ashutosh Bapat
Date:
The subject of this thread is not directly related to this discussion
and we have a new thread [1] for relevant discussion. So, let's
discuss this further on that thread.


[1] https://www.postgresql.org/message-id/CAF1DzPU8Kx%2BfMXEbFoP289xtm3bz3t%2BZfxhmKavr98Bh-C0TqQ%40mail.gmail.com

On Tue, Apr 18, 2017 at 9:30 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Tue, 18 Apr 2017 09:12:07 +0530, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote in
<CAFjFpRcRjohUZaCNon1zkP7c6fo2qd5hnfHCT6t_39SGtS_oKQ@mail.gmail.com>
>> On Tue, Apr 18, 2017 at 8:12 AM, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> > At Mon, 17 Apr 2017 17:50:58 +0530, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote in
<CAFjFpRdcWw4h0a-zrL-EiaekkPj8O0GR2M1FwZ1useSRfRm3-g@mail.gmail.com>
>> >> On Mon, Apr 17, 2017 at 1:53 PM, Kyotaro HORIGUCHI
>> >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> >> > At Thu, 13 Apr 2017 13:04:12 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoaxnNmuONgP=bXJojrgbnMPTi6Ms8OSwZBC2YQ2ueUiSg@mail.gmail.com>
>> >> >> On Thu, Apr 21, 2016 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> >> >> > On Thu, Apr 21, 2016 at 8:44 AM, Michael Paquier
>> >> >> > <michael.paquier@gmail.com> wrote:
>> >> >> >> On Thu, Apr 21, 2016 at 5:22 PM, Etsuro Fujita
>> >> >> >> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> >> >> >>> Attached is an updated version of the patch, which modified Michael's
>> >> >> >>> version of the patch, as I proposed in [1] (see "Other changes:").  I
>> >> >> >>> modified comments for pgfdw_get_result/pgfdw_exec_query also, mainly because
>> >> >> >>> words like "non-blocking mode" there seems confusing (note that we have
>> >> >> >>> PQsetnonbloking).
>> >> >> >>
>> >> >> >> OK, so that is what I sent except that the comments mentioning PG_TRY
>> >> >> >> are moved to their correct places. That's fine for me. Thanks for
>> >> >> >> gathering everything in a single patch and correcting it.
>> >> >> >
>> >> >> > I have committed this patch.  Thanks for working on this.  Sorry for the delay.
>> >> >>
>> >> >> This 9.6-era patch, as it turns out, has a problem, which is that we
>> >> >> now respond to an interrupt by sending a cancel request and a
>> >> >> NON-interruptible ABORT TRANSACTION command to the remote side.  If
>> >> >> the reason that the user is trying to interrupt is that the network
>> >> >> connection has been cut, they interrupt the original query only to get
>> >> >> stuck in a non-interruptible wait for ABORT TRANSACTION.  That is
>> >> >> non-optimal.
>> >> >
>> >> > Agreed.
>> >> >
>> >> >> It is not exactly clear to me how to fix this.  Could we get by with
>> >> >> just slamming the remote connection shut, instead of sending an
>> >> >> explicit ABORT TRANSACTION?  The remote side ought to treat a
>> >> >> disconnect as equivalent to an ABORT anyway, I think, but maybe our
>> >> >> local state would get confused.  (I have not checked.)
>> >> >>
>> >> >> Thoughts?
>> >> >
>> >> > Perhaps we will get stuck at query cancellation before ABORT
>> >> > TRANSACTION in the case. A connection will be shut down when
>> >> > anything wrong (PQstatus(conn) != CONNECTION_OK and so) on
>> >> > aborting local transactoin . So I don't think fdw gets confused
>> >> > or sholdn't be confused by shutting down there.
>> >> >
>> >> > The most significant issue I can see is that the same thing
>> >> > happens in the case of graceful ABORT TRANSACTION. It could be a
>> >> > performance issue.
>> >> >
>> >> > We could set timeout here but maybe we can just slamming the
>> >> > connection down instead of sending a query cancellation. It is
>> >> > caused only by timeout or interrupts so I suppose it is not a
>> >> > problem *with a few connections*.
>> >> >
>> >> >
>> >> > Things are a bit diffent with hundreds of connections. The
>> >> > penalty of reconnection would be very high in the case.
>> >> >
>> >> > If we are not willing to pay such high penalty, maybe we are to
>> >> > manage busy-idle time of each connection and trying graceful
>> >> > abort if it is short enough, maybe having a shoft timeout.
>> >> >
>> >> > Furthermore, if most or all of the hundreds of connections get
>> >> > stuck, such timeout will accumulate up like a mountain...
>> >>
>> >> Even when the transaction is aborted because a user cancels a query,
>> >> we do want to preserve the connection, if possible, to avoid
>> >
>> > Yes.
>> >
>> >> reconnection. If the request to cancel the query itself fails, we
>> >> should certainly drop the connection. Here's the patch to do that.
>> >
>> > A problem I think on this would be that we still try to make
>> > another connection for canceling and it would stall for several
>> > minutes per connection on a packet stall, which should be done in
>> > a second on ordinary circumstances. Perhaps we might want here is
>> > async canceling with timeout.
>>
>> I am not sure what do you mean "make another connection for
>> cancelling.". Do you mean to say that PQcancel would make another
>> connection?
>
> Yes. It will take about 3 minutes on standard settings when no
> ACK returned. I thought that this discussion is based on such
> situation.
>
>> The patch proposed simply improves the condition when PQcancel has
>> returned a failure. Right now we ignore that failure and try to ABORT
>> the transaction, which is most probably going to get stalled or fail
>> to be dispatched. With the patch such a connection will be reset.
>
> Ah, I understand that. It is surely an improvement since it
> avoids useless ABORT TRANSACTION that is known to stall.
>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company