Thread: [BUG?] postgres_fdw incorrectly updates remote table if it hasinherited children.

Hello,

I noticed the following scenario under the development of truncate
support on FDW.

In case when 'ftable' maps a remote table that has inherited children,...

postgres=# create table rtable_parent (id int, label text, x text);
CREATE TABLE
postgres=# create table rtable_child () inherits (rtable_parent);
CREATE TABLE
postgres=# insert into rtable_parent (select x, 'parent', md5(x::text)
from generate_series(1,10) x);
INSERT 0 10
postgres=# insert into rtable_child (select x, 'child', md5(x::text)
from generate_series(6,15) x);
INSERT 0 10
postgres=# create foreign table ftable (id int, label text, x text)
                   server loopback options (table_name 'rtable_parent');
CREATE FOREIGN TABLE

The 'ftable' shows the results from both of the parent and children.
postgres=# select * from ftable;
 id | label  |                x
----+--------+----------------------------------
  1 | parent | c4ca4238a0b923820dcc509a6f75849b
  2 | parent | c81e728d9d4c2f636f067f89cc14862c
  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
  4 | parent | a87ff679a2f3e71d9181a67b7542122c
  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
  6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc
  7 | parent | 8f14e45fceea167a5a36dedd4bea2543
  8 | parent | c9f0f895fb98ab9159f51fd0297e236d
  9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26
 10 | parent | d3d9446802a44259755d38e6d163e820
  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
  8 | child  | c9f0f895fb98ab9159f51fd0297e236d
  9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
 10 | child  | d3d9446802a44259755d38e6d163e820
 11 | child  | 6512bd43d9caa6e02c990b0a82652dca
 12 | child  | c20ad4d76fe97759aa27a0c99bff6710
 13 | child  | c51ce410c124a10e0db5e4b97fc2af39
 14 | child  | aab3238922bcc25a6f606eb525ffdc56
 15 | child  | 9bf31c7ff062936a96d3c8bd1f8f2ff3
(20 rows)

When we try to update the foreign-table without DirectUpdate mode,
remote query tries to update the rows specified by "ctid" system column.
However, it was not a unique key in this case.

postgres=# explain update ftable set x = 'updated' where id > 10 and
pg_backend_pid() > 0;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Update on ftable  (cost=100.00..133.80 rows=414 width=74)
   ->  Result  (cost=100.00..133.80 rows=414 width=74)
         One-Time Filter: (pg_backend_pid() > 0)
         ->  Foreign Scan on ftable  (cost=100.00..133.80 rows=414 width=42)
(4 rows)

[*] Note that pg_backend_pid() prevent direct update.

postgres=# update ftable set x = 'updated' where id > 10 and
pg_backend_pid() > 0;
UPDATE 5
postgres=# select ctid,* from ftable;
  ctid  | id | label  |                x
--------+----+--------+----------------------------------
 (0,1)  |  1 | parent | c4ca4238a0b923820dcc509a6f75849b
 (0,2)  |  2 | parent | c81e728d9d4c2f636f067f89cc14862c
 (0,3)  |  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
 (0,4)  |  4 | parent | a87ff679a2f3e71d9181a67b7542122c
 (0,5)  |  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
 (0,11) |  6 | parent | updated
 (0,12) |  7 | parent | updated
 (0,13) |  8 | parent | updated
 (0,14) |  9 | parent | updated
 (0,15) | 10 | parent | updated
 (0,1)  |  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
 (0,2)  |  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
 (0,3)  |  8 | child  | c9f0f895fb98ab9159f51fd0297e236d
 (0,4)  |  9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
 (0,5)  | 10 | child  | d3d9446802a44259755d38e6d163e820
 (0,11) | 11 | child  | updated
 (0,12) | 12 | child  | updated
 (0,13) | 13 | child  | updated
 (0,14) | 14 | child  | updated
 (0,15) | 15 | child  | updated
(20 rows)

The WHERE-clause (id > 10) should affect only child table.
However, it updated the rows in the parent table with same ctid.

How about your thought?
Probably, we need to fetch a pair of tableoid and ctid to identify
the remote table exactly, if not direct-update cases.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>



Hi,

On Sun, Mar 1, 2020 at 12:00 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
>
> Hello,
>
> I noticed the following scenario under the development of truncate
> support on FDW.
>
> In case when 'ftable' maps a remote table that has inherited children,...
>
> postgres=# create table rtable_parent (id int, label text, x text);
> CREATE TABLE
> postgres=# create table rtable_child () inherits (rtable_parent);
> CREATE TABLE
> postgres=# insert into rtable_parent (select x, 'parent', md5(x::text)
> from generate_series(1,10) x);
> INSERT 0 10
> postgres=# insert into rtable_child (select x, 'child', md5(x::text)
> from generate_series(6,15) x);
> INSERT 0 10
> postgres=# create foreign table ftable (id int, label text, x text)
>                    server loopback options (table_name 'rtable_parent');
> CREATE FOREIGN TABLE
>
> The 'ftable' shows the results from both of the parent and children.
> postgres=# select * from ftable;
>  id | label  |                x
> ----+--------+----------------------------------
>   1 | parent | c4ca4238a0b923820dcc509a6f75849b
>   2 | parent | c81e728d9d4c2f636f067f89cc14862c
>   3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
>   4 | parent | a87ff679a2f3e71d9181a67b7542122c
>   5 | parent | e4da3b7fbbce2345d7772b0674a318d5
>   6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc
>   7 | parent | 8f14e45fceea167a5a36dedd4bea2543
>   8 | parent | c9f0f895fb98ab9159f51fd0297e236d
>   9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26
>  10 | parent | d3d9446802a44259755d38e6d163e820
>   6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
>   7 | child  | 8f14e45fceea167a5a36dedd4bea2543
>   8 | child  | c9f0f895fb98ab9159f51fd0297e236d
>   9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
>  10 | child  | d3d9446802a44259755d38e6d163e820
>  11 | child  | 6512bd43d9caa6e02c990b0a82652dca
>  12 | child  | c20ad4d76fe97759aa27a0c99bff6710
>  13 | child  | c51ce410c124a10e0db5e4b97fc2af39
>  14 | child  | aab3238922bcc25a6f606eb525ffdc56
>  15 | child  | 9bf31c7ff062936a96d3c8bd1f8f2ff3
> (20 rows)
>
> When we try to update the foreign-table without DirectUpdate mode,
> remote query tries to update the rows specified by "ctid" system column.
> However, it was not a unique key in this case.
>
> postgres=# explain update ftable set x = 'updated' where id > 10 and
> pg_backend_pid() > 0;
>                                  QUERY PLAN
> -----------------------------------------------------------------------------
>  Update on ftable  (cost=100.00..133.80 rows=414 width=74)
>    ->  Result  (cost=100.00..133.80 rows=414 width=74)
>          One-Time Filter: (pg_backend_pid() > 0)
>          ->  Foreign Scan on ftable  (cost=100.00..133.80 rows=414 width=42)
> (4 rows)
>
> [*] Note that pg_backend_pid() prevent direct update.
>
> postgres=# update ftable set x = 'updated' where id > 10 and
> pg_backend_pid() > 0;
> UPDATE 5
> postgres=# select ctid,* from ftable;
>   ctid  | id | label  |                x
> --------+----+--------+----------------------------------
>  (0,1)  |  1 | parent | c4ca4238a0b923820dcc509a6f75849b
>  (0,2)  |  2 | parent | c81e728d9d4c2f636f067f89cc14862c
>  (0,3)  |  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
>  (0,4)  |  4 | parent | a87ff679a2f3e71d9181a67b7542122c
>  (0,5)  |  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
>  (0,11) |  6 | parent | updated
>  (0,12) |  7 | parent | updated
>  (0,13) |  8 | parent | updated
>  (0,14) |  9 | parent | updated
>  (0,15) | 10 | parent | updated
>  (0,1)  |  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
>  (0,2)  |  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
>  (0,3)  |  8 | child  | c9f0f895fb98ab9159f51fd0297e236d
>  (0,4)  |  9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
>  (0,5)  | 10 | child  | d3d9446802a44259755d38e6d163e820
>  (0,11) | 11 | child  | updated
>  (0,12) | 12 | child  | updated
>  (0,13) | 13 | child  | updated
>  (0,14) | 14 | child  | updated
>  (0,15) | 15 | child  | updated
> (20 rows)
>
> The WHERE-clause (id > 10) should affect only child table.
> However, it updated the rows in the parent table with same ctid.
>
> How about your thought?
> Probably, we need to fetch a pair of tableoid and ctid to identify
> the remote table exactly, if not direct-update cases.

This was this discussed on this thread:

https://www.postgresql.org/message-id/CAFjFpRfcgwsHRmpvoOK-GUQi-n8MgAS%2BOxcQo%3DaBDn1COywmcg%40mail.gmail.com

Solutions have been proposed too, but none finalized yet.

Thanks,
Amit



Hi Amit,

Thanks, I didn't check the thread.

It looks to me the latest patch was submitted by Fujita-san, Oct-2018.
Then, Tom pointer out this simple approach has a problem of inefficient remote
query plan because of no intelligence on the structure of remote tables mapped
by postgres_fdw. After that, the patch has been left for a year.

Indeed, it is not an ideal query plan to execute for each updated rows...

postgres=# explain select * from rtable_parent where tableoid = 126397
and ctid = '(0,11)'::tid;
                               QUERY PLAN
-------------------------------------------------------------------------
 Append  (cost=0.00..5.18 rows=2 width=50)
   ->  Seq Scan on rtable_parent  (cost=0.00..1.15 rows=1 width=31)
         Filter: ((tableoid = '126397'::oid) AND (ctid = '(0,11)'::tid))
   ->  Tid Scan on rtable_child  (cost=0.00..4.02 rows=1 width=68)
         TID Cond: (ctid = '(0,11)'::tid)
         Filter: (tableoid = '126397'::oid)
(6 rows)

Rather than the refactoring at postgres_fdw, is it possible to have a
built-in partition
pruning rule when "tableoid = <OID>" was supplied?
If partition mechanism would have the feature, it should not be a
complicated problem.

Best regards,

2020年3月1日(日) 12:39 Amit Langote <amitlangote09@gmail.com>:
>
> Hi,
>
> On Sun, Mar 1, 2020 at 12:00 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> >
> > Hello,
> >
> > I noticed the following scenario under the development of truncate
> > support on FDW.
> >
> > In case when 'ftable' maps a remote table that has inherited children,...
> >
> > postgres=# create table rtable_parent (id int, label text, x text);
> > CREATE TABLE
> > postgres=# create table rtable_child () inherits (rtable_parent);
> > CREATE TABLE
> > postgres=# insert into rtable_parent (select x, 'parent', md5(x::text)
> > from generate_series(1,10) x);
> > INSERT 0 10
> > postgres=# insert into rtable_child (select x, 'child', md5(x::text)
> > from generate_series(6,15) x);
> > INSERT 0 10
> > postgres=# create foreign table ftable (id int, label text, x text)
> >                    server loopback options (table_name 'rtable_parent');
> > CREATE FOREIGN TABLE
> >
> > The 'ftable' shows the results from both of the parent and children.
> > postgres=# select * from ftable;
> >  id | label  |                x
> > ----+--------+----------------------------------
> >   1 | parent | c4ca4238a0b923820dcc509a6f75849b
> >   2 | parent | c81e728d9d4c2f636f067f89cc14862c
> >   3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
> >   4 | parent | a87ff679a2f3e71d9181a67b7542122c
> >   5 | parent | e4da3b7fbbce2345d7772b0674a318d5
> >   6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc
> >   7 | parent | 8f14e45fceea167a5a36dedd4bea2543
> >   8 | parent | c9f0f895fb98ab9159f51fd0297e236d
> >   9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26
> >  10 | parent | d3d9446802a44259755d38e6d163e820
> >   6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
> >   7 | child  | 8f14e45fceea167a5a36dedd4bea2543
> >   8 | child  | c9f0f895fb98ab9159f51fd0297e236d
> >   9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
> >  10 | child  | d3d9446802a44259755d38e6d163e820
> >  11 | child  | 6512bd43d9caa6e02c990b0a82652dca
> >  12 | child  | c20ad4d76fe97759aa27a0c99bff6710
> >  13 | child  | c51ce410c124a10e0db5e4b97fc2af39
> >  14 | child  | aab3238922bcc25a6f606eb525ffdc56
> >  15 | child  | 9bf31c7ff062936a96d3c8bd1f8f2ff3
> > (20 rows)
> >
> > When we try to update the foreign-table without DirectUpdate mode,
> > remote query tries to update the rows specified by "ctid" system column.
> > However, it was not a unique key in this case.
> >
> > postgres=# explain update ftable set x = 'updated' where id > 10 and
> > pg_backend_pid() > 0;
> >                                  QUERY PLAN
> > -----------------------------------------------------------------------------
> >  Update on ftable  (cost=100.00..133.80 rows=414 width=74)
> >    ->  Result  (cost=100.00..133.80 rows=414 width=74)
> >          One-Time Filter: (pg_backend_pid() > 0)
> >          ->  Foreign Scan on ftable  (cost=100.00..133.80 rows=414 width=42)
> > (4 rows)
> >
> > [*] Note that pg_backend_pid() prevent direct update.
> >
> > postgres=# update ftable set x = 'updated' where id > 10 and
> > pg_backend_pid() > 0;
> > UPDATE 5
> > postgres=# select ctid,* from ftable;
> >   ctid  | id | label  |                x
> > --------+----+--------+----------------------------------
> >  (0,1)  |  1 | parent | c4ca4238a0b923820dcc509a6f75849b
> >  (0,2)  |  2 | parent | c81e728d9d4c2f636f067f89cc14862c
> >  (0,3)  |  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
> >  (0,4)  |  4 | parent | a87ff679a2f3e71d9181a67b7542122c
> >  (0,5)  |  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
> >  (0,11) |  6 | parent | updated
> >  (0,12) |  7 | parent | updated
> >  (0,13) |  8 | parent | updated
> >  (0,14) |  9 | parent | updated
> >  (0,15) | 10 | parent | updated
> >  (0,1)  |  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
> >  (0,2)  |  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
> >  (0,3)  |  8 | child  | c9f0f895fb98ab9159f51fd0297e236d
> >  (0,4)  |  9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
> >  (0,5)  | 10 | child  | d3d9446802a44259755d38e6d163e820
> >  (0,11) | 11 | child  | updated
> >  (0,12) | 12 | child  | updated
> >  (0,13) | 13 | child  | updated
> >  (0,14) | 14 | child  | updated
> >  (0,15) | 15 | child  | updated
> > (20 rows)
> >
> > The WHERE-clause (id > 10) should affect only child table.
> > However, it updated the rows in the parent table with same ctid.
> >
> > How about your thought?
> > Probably, we need to fetch a pair of tableoid and ctid to identify
> > the remote table exactly, if not direct-update cases.
>
> This was this discussed on this thread:
>
> https://www.postgresql.org/message-id/CAFjFpRfcgwsHRmpvoOK-GUQi-n8MgAS%2BOxcQo%3DaBDn1COywmcg%40mail.gmail.com
>
> Solutions have been proposed too, but none finalized yet.
>
> Thanks,
> Amit



--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>



Hi KaiGai-san,

On Sun, Mar 1, 2020 at 1:47 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> It looks to me the latest patch was submitted by Fujita-san, Oct-2018.
> Then, Tom pointer out this simple approach has a problem of inefficient remote
> query plan because of no intelligence on the structure of remote tables mapped
> by postgres_fdw. After that, the patch has been left for a year.

Unfortunately, I didn't have time to work on that (and won't in the
development cycle for PG13.)

> Indeed, it is not an ideal query plan to execute for each updated rows...
>
> postgres=# explain select * from rtable_parent where tableoid = 126397
> and ctid = '(0,11)'::tid;
>                                QUERY PLAN
> -------------------------------------------------------------------------
>  Append  (cost=0.00..5.18 rows=2 width=50)
>    ->  Seq Scan on rtable_parent  (cost=0.00..1.15 rows=1 width=31)
>          Filter: ((tableoid = '126397'::oid) AND (ctid = '(0,11)'::tid))
>    ->  Tid Scan on rtable_child  (cost=0.00..4.02 rows=1 width=68)
>          TID Cond: (ctid = '(0,11)'::tid)
>          Filter: (tableoid = '126397'::oid)
> (6 rows)

IIRC, I think one of Tom's concerns about the solution I proposed was
that it added the tableoid restriction clause to the remote
UPDATE/DELETE query even if the remote table is not an inheritance
set.  To add the clause only if the remote table is an inheritance
set, what I have in mind is to 1) introduce a new postgres_fdw table
option to indicate whether the remote table is an inheritance set or
not, and 2) determine whether to add the clause or not, using the
option.

Best regards,
Etsuro Fujita



Fujita-san,

> Unfortunately, I didn't have time to work on that (and won't in the
> development cycle for PG13.)
>
> > Indeed, it is not an ideal query plan to execute for each updated rows...
> >
> > postgres=# explain select * from rtable_parent where tableoid = 126397
> > and ctid = '(0,11)'::tid;
> >                                QUERY PLAN
> > -------------------------------------------------------------------------
> >  Append  (cost=0.00..5.18 rows=2 width=50)
> >    ->  Seq Scan on rtable_parent  (cost=0.00..1.15 rows=1 width=31)
> >          Filter: ((tableoid = '126397'::oid) AND (ctid = '(0,11)'::tid))
> >    ->  Tid Scan on rtable_child  (cost=0.00..4.02 rows=1 width=68)
> >          TID Cond: (ctid = '(0,11)'::tid)
> >          Filter: (tableoid = '126397'::oid)
> > (6 rows)
>
> IIRC, I think one of Tom's concerns about the solution I proposed was
> that it added the tableoid restriction clause to the remote
> UPDATE/DELETE query even if the remote table is not an inheritance
> set.  To add the clause only if the remote table is an inheritance
> set, what I have in mind is to 1) introduce a new postgres_fdw table
> option to indicate whether the remote table is an inheritance set or
> not, and 2) determine whether to add the clause or not, using the
> option.
>
I don't think the new options in postgres_fdw is a good solution because
remote table structure is flexible regardless of the local configuration in
foreign-table options. People may add inherited child tables after the
declaration of foreign-tables. It can make configuration mismatch.
Even if we always add tableoid=OID restriction on the remote query,
it shall be evaluated after the TidScan fetched the row pointed by ctid.
Its additional cost is limited.

And, one potential benefit is tableoid=OID restriction can be used to prune
unrelated partition leafs/inherited children at the planner stage.
Probably, it is a separated groundwork from postgres_fdw.
One planner considers the built-in rule for this kind of optimization,
enhancement at postgres_fdw will be quite simple, I guess.

How about your thought?

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>