Thread: [BUG?] postgres_fdw incorrectly updates remote table if it hasinherited children.
[BUG?] postgres_fdw incorrectly updates remote table if it hasinherited children.
From
Kohei KaiGai
Date:
Hello, I noticed the following scenario under the development of truncate support on FDW. In case when 'ftable' maps a remote table that has inherited children,... postgres=# create table rtable_parent (id int, label text, x text); CREATE TABLE postgres=# create table rtable_child () inherits (rtable_parent); CREATE TABLE postgres=# insert into rtable_parent (select x, 'parent', md5(x::text) from generate_series(1,10) x); INSERT 0 10 postgres=# insert into rtable_child (select x, 'child', md5(x::text) from generate_series(6,15) x); INSERT 0 10 postgres=# create foreign table ftable (id int, label text, x text) server loopback options (table_name 'rtable_parent'); CREATE FOREIGN TABLE The 'ftable' shows the results from both of the parent and children. postgres=# select * from ftable; id | label | x ----+--------+---------------------------------- 1 | parent | c4ca4238a0b923820dcc509a6f75849b 2 | parent | c81e728d9d4c2f636f067f89cc14862c 3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3 4 | parent | a87ff679a2f3e71d9181a67b7542122c 5 | parent | e4da3b7fbbce2345d7772b0674a318d5 6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc 7 | parent | 8f14e45fceea167a5a36dedd4bea2543 8 | parent | c9f0f895fb98ab9159f51fd0297e236d 9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26 10 | parent | d3d9446802a44259755d38e6d163e820 6 | child | 1679091c5a880faf6fb5e6087eb1b2dc 7 | child | 8f14e45fceea167a5a36dedd4bea2543 8 | child | c9f0f895fb98ab9159f51fd0297e236d 9 | child | 45c48cce2e2d7fbdea1afc51c7c6ad26 10 | child | d3d9446802a44259755d38e6d163e820 11 | child | 6512bd43d9caa6e02c990b0a82652dca 12 | child | c20ad4d76fe97759aa27a0c99bff6710 13 | child | c51ce410c124a10e0db5e4b97fc2af39 14 | child | aab3238922bcc25a6f606eb525ffdc56 15 | child | 9bf31c7ff062936a96d3c8bd1f8f2ff3 (20 rows) When we try to update the foreign-table without DirectUpdate mode, remote query tries to update the rows specified by "ctid" system column. However, it was not a unique key in this case. postgres=# explain update ftable set x = 'updated' where id > 10 and pg_backend_pid() > 0; QUERY PLAN ----------------------------------------------------------------------------- Update on ftable (cost=100.00..133.80 rows=414 width=74) -> Result (cost=100.00..133.80 rows=414 width=74) One-Time Filter: (pg_backend_pid() > 0) -> Foreign Scan on ftable (cost=100.00..133.80 rows=414 width=42) (4 rows) [*] Note that pg_backend_pid() prevent direct update. postgres=# update ftable set x = 'updated' where id > 10 and pg_backend_pid() > 0; UPDATE 5 postgres=# select ctid,* from ftable; ctid | id | label | x --------+----+--------+---------------------------------- (0,1) | 1 | parent | c4ca4238a0b923820dcc509a6f75849b (0,2) | 2 | parent | c81e728d9d4c2f636f067f89cc14862c (0,3) | 3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3 (0,4) | 4 | parent | a87ff679a2f3e71d9181a67b7542122c (0,5) | 5 | parent | e4da3b7fbbce2345d7772b0674a318d5 (0,11) | 6 | parent | updated (0,12) | 7 | parent | updated (0,13) | 8 | parent | updated (0,14) | 9 | parent | updated (0,15) | 10 | parent | updated (0,1) | 6 | child | 1679091c5a880faf6fb5e6087eb1b2dc (0,2) | 7 | child | 8f14e45fceea167a5a36dedd4bea2543 (0,3) | 8 | child | c9f0f895fb98ab9159f51fd0297e236d (0,4) | 9 | child | 45c48cce2e2d7fbdea1afc51c7c6ad26 (0,5) | 10 | child | d3d9446802a44259755d38e6d163e820 (0,11) | 11 | child | updated (0,12) | 12 | child | updated (0,13) | 13 | child | updated (0,14) | 14 | child | updated (0,15) | 15 | child | updated (20 rows) The WHERE-clause (id > 10) should affect only child table. However, it updated the rows in the parent table with same ctid. How about your thought? Probably, we need to fetch a pair of tableoid and ctid to identify the remote table exactly, if not direct-update cases. Best regards, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei <kaigai@heterodb.com>
Re: [BUG?] postgres_fdw incorrectly updates remote table if it hasinherited children.
From
Amit Langote
Date:
Hi, On Sun, Mar 1, 2020 at 12:00 PM Kohei KaiGai <kaigai@heterodb.com> wrote: > > Hello, > > I noticed the following scenario under the development of truncate > support on FDW. > > In case when 'ftable' maps a remote table that has inherited children,... > > postgres=# create table rtable_parent (id int, label text, x text); > CREATE TABLE > postgres=# create table rtable_child () inherits (rtable_parent); > CREATE TABLE > postgres=# insert into rtable_parent (select x, 'parent', md5(x::text) > from generate_series(1,10) x); > INSERT 0 10 > postgres=# insert into rtable_child (select x, 'child', md5(x::text) > from generate_series(6,15) x); > INSERT 0 10 > postgres=# create foreign table ftable (id int, label text, x text) > server loopback options (table_name 'rtable_parent'); > CREATE FOREIGN TABLE > > The 'ftable' shows the results from both of the parent and children. > postgres=# select * from ftable; > id | label | x > ----+--------+---------------------------------- > 1 | parent | c4ca4238a0b923820dcc509a6f75849b > 2 | parent | c81e728d9d4c2f636f067f89cc14862c > 3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3 > 4 | parent | a87ff679a2f3e71d9181a67b7542122c > 5 | parent | e4da3b7fbbce2345d7772b0674a318d5 > 6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc > 7 | parent | 8f14e45fceea167a5a36dedd4bea2543 > 8 | parent | c9f0f895fb98ab9159f51fd0297e236d > 9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26 > 10 | parent | d3d9446802a44259755d38e6d163e820 > 6 | child | 1679091c5a880faf6fb5e6087eb1b2dc > 7 | child | 8f14e45fceea167a5a36dedd4bea2543 > 8 | child | c9f0f895fb98ab9159f51fd0297e236d > 9 | child | 45c48cce2e2d7fbdea1afc51c7c6ad26 > 10 | child | d3d9446802a44259755d38e6d163e820 > 11 | child | 6512bd43d9caa6e02c990b0a82652dca > 12 | child | c20ad4d76fe97759aa27a0c99bff6710 > 13 | child | c51ce410c124a10e0db5e4b97fc2af39 > 14 | child | aab3238922bcc25a6f606eb525ffdc56 > 15 | child | 9bf31c7ff062936a96d3c8bd1f8f2ff3 > (20 rows) > > When we try to update the foreign-table without DirectUpdate mode, > remote query tries to update the rows specified by "ctid" system column. > However, it was not a unique key in this case. > > postgres=# explain update ftable set x = 'updated' where id > 10 and > pg_backend_pid() > 0; > QUERY PLAN > ----------------------------------------------------------------------------- > Update on ftable (cost=100.00..133.80 rows=414 width=74) > -> Result (cost=100.00..133.80 rows=414 width=74) > One-Time Filter: (pg_backend_pid() > 0) > -> Foreign Scan on ftable (cost=100.00..133.80 rows=414 width=42) > (4 rows) > > [*] Note that pg_backend_pid() prevent direct update. > > postgres=# update ftable set x = 'updated' where id > 10 and > pg_backend_pid() > 0; > UPDATE 5 > postgres=# select ctid,* from ftable; > ctid | id | label | x > --------+----+--------+---------------------------------- > (0,1) | 1 | parent | c4ca4238a0b923820dcc509a6f75849b > (0,2) | 2 | parent | c81e728d9d4c2f636f067f89cc14862c > (0,3) | 3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3 > (0,4) | 4 | parent | a87ff679a2f3e71d9181a67b7542122c > (0,5) | 5 | parent | e4da3b7fbbce2345d7772b0674a318d5 > (0,11) | 6 | parent | updated > (0,12) | 7 | parent | updated > (0,13) | 8 | parent | updated > (0,14) | 9 | parent | updated > (0,15) | 10 | parent | updated > (0,1) | 6 | child | 1679091c5a880faf6fb5e6087eb1b2dc > (0,2) | 7 | child | 8f14e45fceea167a5a36dedd4bea2543 > (0,3) | 8 | child | c9f0f895fb98ab9159f51fd0297e236d > (0,4) | 9 | child | 45c48cce2e2d7fbdea1afc51c7c6ad26 > (0,5) | 10 | child | d3d9446802a44259755d38e6d163e820 > (0,11) | 11 | child | updated > (0,12) | 12 | child | updated > (0,13) | 13 | child | updated > (0,14) | 14 | child | updated > (0,15) | 15 | child | updated > (20 rows) > > The WHERE-clause (id > 10) should affect only child table. > However, it updated the rows in the parent table with same ctid. > > How about your thought? > Probably, we need to fetch a pair of tableoid and ctid to identify > the remote table exactly, if not direct-update cases. This was this discussed on this thread: https://www.postgresql.org/message-id/CAFjFpRfcgwsHRmpvoOK-GUQi-n8MgAS%2BOxcQo%3DaBDn1COywmcg%40mail.gmail.com Solutions have been proposed too, but none finalized yet. Thanks, Amit
Re: [BUG?] postgres_fdw incorrectly updates remote table if it hasinherited children.
From
Kohei KaiGai
Date:
Hi Amit, Thanks, I didn't check the thread. It looks to me the latest patch was submitted by Fujita-san, Oct-2018. Then, Tom pointer out this simple approach has a problem of inefficient remote query plan because of no intelligence on the structure of remote tables mapped by postgres_fdw. After that, the patch has been left for a year. Indeed, it is not an ideal query plan to execute for each updated rows... postgres=# explain select * from rtable_parent where tableoid = 126397 and ctid = '(0,11)'::tid; QUERY PLAN ------------------------------------------------------------------------- Append (cost=0.00..5.18 rows=2 width=50) -> Seq Scan on rtable_parent (cost=0.00..1.15 rows=1 width=31) Filter: ((tableoid = '126397'::oid) AND (ctid = '(0,11)'::tid)) -> Tid Scan on rtable_child (cost=0.00..4.02 rows=1 width=68) TID Cond: (ctid = '(0,11)'::tid) Filter: (tableoid = '126397'::oid) (6 rows) Rather than the refactoring at postgres_fdw, is it possible to have a built-in partition pruning rule when "tableoid = <OID>" was supplied? If partition mechanism would have the feature, it should not be a complicated problem. Best regards, 2020年3月1日(日) 12:39 Amit Langote <amitlangote09@gmail.com>: > > Hi, > > On Sun, Mar 1, 2020 at 12:00 PM Kohei KaiGai <kaigai@heterodb.com> wrote: > > > > Hello, > > > > I noticed the following scenario under the development of truncate > > support on FDW. > > > > In case when 'ftable' maps a remote table that has inherited children,... > > > > postgres=# create table rtable_parent (id int, label text, x text); > > CREATE TABLE > > postgres=# create table rtable_child () inherits (rtable_parent); > > CREATE TABLE > > postgres=# insert into rtable_parent (select x, 'parent', md5(x::text) > > from generate_series(1,10) x); > > INSERT 0 10 > > postgres=# insert into rtable_child (select x, 'child', md5(x::text) > > from generate_series(6,15) x); > > INSERT 0 10 > > postgres=# create foreign table ftable (id int, label text, x text) > > server loopback options (table_name 'rtable_parent'); > > CREATE FOREIGN TABLE > > > > The 'ftable' shows the results from both of the parent and children. > > postgres=# select * from ftable; > > id | label | x > > ----+--------+---------------------------------- > > 1 | parent | c4ca4238a0b923820dcc509a6f75849b > > 2 | parent | c81e728d9d4c2f636f067f89cc14862c > > 3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3 > > 4 | parent | a87ff679a2f3e71d9181a67b7542122c > > 5 | parent | e4da3b7fbbce2345d7772b0674a318d5 > > 6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc > > 7 | parent | 8f14e45fceea167a5a36dedd4bea2543 > > 8 | parent | c9f0f895fb98ab9159f51fd0297e236d > > 9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26 > > 10 | parent | d3d9446802a44259755d38e6d163e820 > > 6 | child | 1679091c5a880faf6fb5e6087eb1b2dc > > 7 | child | 8f14e45fceea167a5a36dedd4bea2543 > > 8 | child | c9f0f895fb98ab9159f51fd0297e236d > > 9 | child | 45c48cce2e2d7fbdea1afc51c7c6ad26 > > 10 | child | d3d9446802a44259755d38e6d163e820 > > 11 | child | 6512bd43d9caa6e02c990b0a82652dca > > 12 | child | c20ad4d76fe97759aa27a0c99bff6710 > > 13 | child | c51ce410c124a10e0db5e4b97fc2af39 > > 14 | child | aab3238922bcc25a6f606eb525ffdc56 > > 15 | child | 9bf31c7ff062936a96d3c8bd1f8f2ff3 > > (20 rows) > > > > When we try to update the foreign-table without DirectUpdate mode, > > remote query tries to update the rows specified by "ctid" system column. > > However, it was not a unique key in this case. > > > > postgres=# explain update ftable set x = 'updated' where id > 10 and > > pg_backend_pid() > 0; > > QUERY PLAN > > ----------------------------------------------------------------------------- > > Update on ftable (cost=100.00..133.80 rows=414 width=74) > > -> Result (cost=100.00..133.80 rows=414 width=74) > > One-Time Filter: (pg_backend_pid() > 0) > > -> Foreign Scan on ftable (cost=100.00..133.80 rows=414 width=42) > > (4 rows) > > > > [*] Note that pg_backend_pid() prevent direct update. > > > > postgres=# update ftable set x = 'updated' where id > 10 and > > pg_backend_pid() > 0; > > UPDATE 5 > > postgres=# select ctid,* from ftable; > > ctid | id | label | x > > --------+----+--------+---------------------------------- > > (0,1) | 1 | parent | c4ca4238a0b923820dcc509a6f75849b > > (0,2) | 2 | parent | c81e728d9d4c2f636f067f89cc14862c > > (0,3) | 3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3 > > (0,4) | 4 | parent | a87ff679a2f3e71d9181a67b7542122c > > (0,5) | 5 | parent | e4da3b7fbbce2345d7772b0674a318d5 > > (0,11) | 6 | parent | updated > > (0,12) | 7 | parent | updated > > (0,13) | 8 | parent | updated > > (0,14) | 9 | parent | updated > > (0,15) | 10 | parent | updated > > (0,1) | 6 | child | 1679091c5a880faf6fb5e6087eb1b2dc > > (0,2) | 7 | child | 8f14e45fceea167a5a36dedd4bea2543 > > (0,3) | 8 | child | c9f0f895fb98ab9159f51fd0297e236d > > (0,4) | 9 | child | 45c48cce2e2d7fbdea1afc51c7c6ad26 > > (0,5) | 10 | child | d3d9446802a44259755d38e6d163e820 > > (0,11) | 11 | child | updated > > (0,12) | 12 | child | updated > > (0,13) | 13 | child | updated > > (0,14) | 14 | child | updated > > (0,15) | 15 | child | updated > > (20 rows) > > > > The WHERE-clause (id > 10) should affect only child table. > > However, it updated the rows in the parent table with same ctid. > > > > How about your thought? > > Probably, we need to fetch a pair of tableoid and ctid to identify > > the remote table exactly, if not direct-update cases. > > This was this discussed on this thread: > > https://www.postgresql.org/message-id/CAFjFpRfcgwsHRmpvoOK-GUQi-n8MgAS%2BOxcQo%3DaBDn1COywmcg%40mail.gmail.com > > Solutions have been proposed too, but none finalized yet. > > Thanks, > Amit -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei <kaigai@heterodb.com>
Re: [BUG?] postgres_fdw incorrectly updates remote table if it hasinherited children.
From
Etsuro Fujita
Date:
Hi KaiGai-san, On Sun, Mar 1, 2020 at 1:47 PM Kohei KaiGai <kaigai@heterodb.com> wrote: > It looks to me the latest patch was submitted by Fujita-san, Oct-2018. > Then, Tom pointer out this simple approach has a problem of inefficient remote > query plan because of no intelligence on the structure of remote tables mapped > by postgres_fdw. After that, the patch has been left for a year. Unfortunately, I didn't have time to work on that (and won't in the development cycle for PG13.) > Indeed, it is not an ideal query plan to execute for each updated rows... > > postgres=# explain select * from rtable_parent where tableoid = 126397 > and ctid = '(0,11)'::tid; > QUERY PLAN > ------------------------------------------------------------------------- > Append (cost=0.00..5.18 rows=2 width=50) > -> Seq Scan on rtable_parent (cost=0.00..1.15 rows=1 width=31) > Filter: ((tableoid = '126397'::oid) AND (ctid = '(0,11)'::tid)) > -> Tid Scan on rtable_child (cost=0.00..4.02 rows=1 width=68) > TID Cond: (ctid = '(0,11)'::tid) > Filter: (tableoid = '126397'::oid) > (6 rows) IIRC, I think one of Tom's concerns about the solution I proposed was that it added the tableoid restriction clause to the remote UPDATE/DELETE query even if the remote table is not an inheritance set. To add the clause only if the remote table is an inheritance set, what I have in mind is to 1) introduce a new postgres_fdw table option to indicate whether the remote table is an inheritance set or not, and 2) determine whether to add the clause or not, using the option. Best regards, Etsuro Fujita
Re: [BUG?] postgres_fdw incorrectly updates remote table if it hasinherited children.
From
Kohei KaiGai
Date:
Fujita-san, > Unfortunately, I didn't have time to work on that (and won't in the > development cycle for PG13.) > > > Indeed, it is not an ideal query plan to execute for each updated rows... > > > > postgres=# explain select * from rtable_parent where tableoid = 126397 > > and ctid = '(0,11)'::tid; > > QUERY PLAN > > ------------------------------------------------------------------------- > > Append (cost=0.00..5.18 rows=2 width=50) > > -> Seq Scan on rtable_parent (cost=0.00..1.15 rows=1 width=31) > > Filter: ((tableoid = '126397'::oid) AND (ctid = '(0,11)'::tid)) > > -> Tid Scan on rtable_child (cost=0.00..4.02 rows=1 width=68) > > TID Cond: (ctid = '(0,11)'::tid) > > Filter: (tableoid = '126397'::oid) > > (6 rows) > > IIRC, I think one of Tom's concerns about the solution I proposed was > that it added the tableoid restriction clause to the remote > UPDATE/DELETE query even if the remote table is not an inheritance > set. To add the clause only if the remote table is an inheritance > set, what I have in mind is to 1) introduce a new postgres_fdw table > option to indicate whether the remote table is an inheritance set or > not, and 2) determine whether to add the clause or not, using the > option. > I don't think the new options in postgres_fdw is a good solution because remote table structure is flexible regardless of the local configuration in foreign-table options. People may add inherited child tables after the declaration of foreign-tables. It can make configuration mismatch. Even if we always add tableoid=OID restriction on the remote query, it shall be evaluated after the TidScan fetched the row pointed by ctid. Its additional cost is limited. And, one potential benefit is tableoid=OID restriction can be used to prune unrelated partition leafs/inherited children at the planner stage. Probably, it is a separated groundwork from postgres_fdw. One planner considers the built-in rule for this kind of optimization, enhancement at postgres_fdw will be quite simple, I guess. How about your thought? Best regards, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei <kaigai@heterodb.com>