Re: inherit support for foreign tables - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: inherit support for foreign tables
Date
Msg-id CAFjFpRd6PR3Seg8D3t-XcXGxXyb+p2WDsk_wBGge7b=PyzP_Qw@mail.gmail.com
Whole thread Raw
In response to Re: inherit support for foreign tables  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Responses Re: inherit support for foreign tables  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Re: inherit support for foreign tables  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
List pgsql-hackers


On Thu, Nov 27, 2014 at 3:52 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
(2014/11/17 17:55), Ashutosh Bapat wrote:
Here are my review comments for patch fdw-inh-3.patch.

Thanks for the review!

Tests
-------
1. It seems like you have copied from testcase inherit.sql to
postgres_fdw testcase. That's a good thing, but it makes the test quite
long. May be we should have two tests in postgres_fdw contrib module,
one for simple cases, and other for inheritance. What do you say?

IMO, the test is not so time-consuming, so it doesn't seem worth splitting it into two.

I am not worried about the timing but I am worried about the length of the file and hence ease of debugging in case we find any issues there. We will leave that for the commiter to decide.
 

Documentation
--------------------
1. The change in ddl.sgml
-        We will refer to the child tables as partitions, though they
-        are in every way normal <productname>PostgreSQL</> tables.
+        We will refer to the child tables as partitions, though we assume
+        that they are normal <productname>PostgreSQL</> tables.

adds phrase "we assume that", which confuses the intention behind the
sentence. The original text is intended to highlight the equivalence
between "partition" and "normal table", where as the addition esp. the
word "assume" weakens that equivalence. Instead now we have to highlight
the equivalence between "partition" and "normal or foreign table". The
wording similar to "though they are either normal or foreign tables"
should be used there.

You are right, but I feel that there is something out of place in saying that there (5.10.2. Implementing Partitioning) because the procedure there has been written based on normal tables only.  Put another way, if we say that, I think we'd need to add more docs, describing the syntax and/or the corresponding examples for foreign-table cases.  But I'd like to leave that for another patch.  So, how about the wording "we assume *here* that", instead of "we assume that", as I added the following notice in the previous section (5.10.1. Overview)?

@@ -2650,7 +2669,10 @@ VALUES ('Albany', NULL, NULL, 'NY');
     table of a single parent table.  The parent table itself is normally
     empty; it exists just to represent the entire data set.  You should be
     familiar with inheritance (see <xref linkend="ddl-inherit">) before
-    attempting to set up partitioning.
+    attempting to set up partitioning.  (The setup and management of
+    partitioned tables illustrated in this section assume that each
+    partition is a normal table.  However, you can do that in a similar way
+    for cases where some or all partitions are foreign tables.)


This looks ok, though, I would like to see final version of the document. But I think, we will leave that for committer to handle.
 
2. The wording "some kind of optimization" gives vague picture. May be
it can be worded as "Since the constraints are assumed to be true, they
are used in constraint-based query optimization like constraint
exclusion for partitioned tables.".
+    Those constraints are used in some kind of query optimization such
+    as constraint exclusion for partitioned tables (see
+    <xref linkend="ddl-partitioning">).

Will follow your revision.

Thanks.
 

Code
-------
1. In the following change
+/*
   * acquire_inherited_sample_rows -- acquire sample rows from
inheritance tree
   *
   * This has the same API as acquire_sample_rows, except that rows are
   * collected from all inheritance children as well as the specified table.
- * We fail and return zero if there are no inheritance children.
+ * We fail and return zero if there are no inheritance children or
there are
+ * inheritance children that foreign tables.

The addition should be "there are inheritance children that *are all
*foreign tables. Note the addition "are all".

Sorry, I incorrectly wrote the comment.  What I tried to write is "We fail and return zero if there are no inheritance children or if we are not in VAC_MODE_SINGLE case and inheritance tree contains at least one foreign table.".


You might want to use "English" description of VAC_MODE_SINGLE instead of that macro in the comment, so that reader doesn't have to look up VAC_MODE_SINGLE. But I think, we will leave this for the committer.
 
2. The function has_foreign() be better named has_foreign_child()? This

How about "has_foreign_table"?

has_foreign_child() would be better, since these are "children" in the inheritance hierarchy and not mere "table"s.
 

function loops through all the tableoids passed even after it has found
a foreign table. Why can't we return true immediately after finding the
first foreign table, unless the side effects of heap_open() on all the
table are required. But I don't see that to be the case, since these
tables are locked already through a previous call to heap_open(). In the

Good catch!  Will fix.

same function instead of argument name parentrelId may be we should use
name parent_oid, so that we use same notation for parent and child table
OIDs.

Will fix.

Thanks.
 

3.  Regarding enum VacuumMode - it's being used only in case of
acquire_inherited_sample_rows() and that too, to check only a single
value of the three defined there. May be we should just infer that value
inside acquire_inherited_sample_rows() or pass a boolean true or false
from do_analyse_rel() based on the VacuumStmt. I do not see need for a
separate three value enum of which only one value is used and also to
pass it down from vacuum() by changing signatures of the minion functions.

I introduced that for possible future use.  See the discussion in [1].

Will leave it for the commiter to decide.
 

4. In postgresGetForeignPlan(), the code added by this patch is required
to handle the case when the row mark is placed on a parent table and
hence is required to be applied on the child table. We need a comment
explaining this. Otherwise, the three step process to get the row mark
information isn't clear for a reader.

Will add the comment.

5. In expand_inherited_rtentry() why do you need a separate variable
hasForeign? Instead of using that variable, you can actually set/reset
rte->hasForeign since existence of even a single foreign child would
mark that member as true. - After typing this, I got the answer when I
looked at the function code. Every child's RTE is initially a copy of
parent's RTE and hence hasForeign status would be inherited by every
child after the first foreign child. I think, this reasoning should be
added as comment before assignment to rte->hasForeign at the end of the
function.

As you mentioned, I think we could set rte->hasForeign directly during the scan for the inheritance set, without the separate variable, hasForeign.  But ISTM that it'd improve code readability to set rte->hasForeign using the separate variable at the end of the function because rte->hasForeign has its meaning only when rte->inh is true and because we know whether rte->inh is true, at the end of the function.

Fine. Please use "hasForeignChild" instead of just "hasForeign" without a clue as to what is "foreign" here.
 

6. The tests in foreign_data.sql are pretty extensive. Thanks for that.
I think, we should also check the effect of each of the following
command using \d on appropriate tables.
+CREATE FOREIGN TABLE ft2 () INHERITS (pt1)
+  SERVER s0 OPTIONS (delimiter ',', quote '"', "be quoted" 'value');
+ALTER FOREIGN TABLE ft2 NO INHERIT pt1;
+DROP FOREIGN TABLE ft2;
+CREATE FOREIGN TABLE ft2 (
+   c1 integer NOT NULL,
+   c2 text,
+   c3 date
+) SERVER s0 OPTIONS (delimiter ',', quote '"', "be quoted" 'value');
+ALTER FOREIGN TABLE ft2 INHERIT pt1;

Will fix.

Apart from the above, I noticed that the patch doesn't consider to call ExplainForeignModify during EXPLAIN for an inherited UPDATE/DELETE, as shown below (note that there are no UPDATE remote queries displayed):

postgres=# explain verbose update parent set a = a * 2 where a = 5;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Update on public.parent  (cost=0.00..280.77 rows=25 width=10)
   ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)
         Output: (parent.a * 2), parent.ctid
         Filter: (parent.a = 5)
   ->  Foreign Scan on public.ft1  (cost=100.00..140.38 rows=12 width=10)
         Output: (ft1.a * 2), ft1.ctid
         Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE
   ->  Foreign Scan on public.ft2  (cost=100.00..140.38 rows=12 width=10)
         Output: (ft2.a * 2), ft2.ctid
         Remote SQL: SELECT a, ctid FROM public.mytable_2 WHERE ((a = 5)) FOR UPDATE
(10 rows)

So, I'd like to modify explain.c to show those queries like this:

postgres=# explain verbose update parent set a = a * 2 where a = 5;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Update on public.parent  (cost=0.00..280.77 rows=25 width=10)
   ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)
         Output: (parent.a * 2), parent.ctid
         Filter: (parent.a = 5)
   Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1
   ->  Foreign Scan on public.ft1  (cost=100.00..140.38 rows=12 width=10)
         Output: (ft1.a * 2), ft1.ctid
         Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE
   Remote SQL: UPDATE public.mytable_2 SET a = $2 WHERE ctid = $1
   ->  Foreign Scan on public.ft2  (cost=100.00..140.38 rows=12 width=10)
         Output: (ft2.a * 2), ft2.ctid
         Remote SQL: SELECT a, ctid FROM public.mytable_2 WHERE ((a = 5)) FOR UPDATE
(12 rows)

What do you say?

Two "remote SQL" under a single node would be confusing. Also the node is labelled as "Foreign Scan". It would be confusing to show an "UPDATE" command under this "scan" node.

BTW, I was experimenting with DMLs being executed on multiple FDW server under same transaction and found that the transactions may not be atomic (and may be inconsistent), if one or more of the server fails to commit while rest of them commit the transaction. The reason for this is, we do not "rollback" the already "committed" changes to the foreign server, if one or more of them fail to "commit" a transaction. With foreign tables under inheritance hierarchy a single DML can span across multiple servers and the result may not be atomic (and may be inconsistent). So, either we have to disable DMLs on an inheritance hierarchy which spans multiple servers. OR make sure that such transactions follow 2PC norms.
 

Sorry for the delay.

[1] http://www.postgresql.org/message-id/1316566782-sup-2678@alvh.no-ip.org

Best regards,
Etsuro Fujita



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

pgsql-hackers by date:

Previous
From: Dean Rasheed
Date:
Subject: Re: Marginal performance improvement: replace bms_first_member loops
Next
From: Alexander Korotkov
Date:
Subject: Re: Fillfactor for GIN indexes