Re: [HACKERS] Partitioned tables and relfilenode - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] Partitioned tables and relfilenode
Date
Msg-id CAB7nPqREJRXSTk9fQC2oQgAyQXapeNxjs1SQC2NDS7JOm-FKBQ@mail.gmail.com
Whole thread Raw
In response to [HACKERS] Partitioned tables and relfilenode  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: [HACKERS] Partitioned tables and relfilenode
Re: [HACKERS] Partitioned tables and relfilenode
List pgsql-hackers
On Fri, Feb 10, 2017 at 3:19 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> The new partitioned tables do not contain any data by themselves.  Any
> data inserted into a partitioned table is routed to and stored in one of
> its partitions.  In fact, it is impossible to insert *any* data before a
> partition (to be precise, a leaf partition) is created.  It seems wasteful
> then to allocate physical storage (files) for partitioned tables.  If we
> do not allocate the storage, then we must make sure that the right thing
> happens when a command that is intended to manipulate a table's storage
> encounters a partitioned table, the "right thing" here being that the
> command's code either throws an error or warning (in some cases) if the
> specified table is a partitioned table or ignores any partitioned tables
> when it reads the list of relations to process from pg_class.  Commands
> that need to be taught about this are vacuum, analyze, truncate, and alter
> table.  Specifically:

Thanks. I have been looking a bit at this set of patches.

> - In case of vacuum, specifying a partitioned table causes a warning
> - In case of analyze, we do not throw an error or warning but simply
>   avoid calling do_analyze_rel() *non-recursively*.  Further in
>   acquire_inherited_sample_rows(), any partitioned tables in the list
>   returned by find_all_inheritors() are skipped.
> - In case of truncate, only the part which manipulates table's physical
>   storage is skipped for partitioned tables.

I am wondering if instead those should not just be errors saying that
such operations are just not support. This could be relaxed in the
future to mean that a vacuum, truncate or analyze on the partitioned
table triggers an operator in cascade.

> - ATRewriteTables() skips on the AlteredTableInfo entries for partitioned
>   tables, because there is nothing to be done.

Perhaps that makes sense, foreign tables do that.

> - Since we cannot create indexes on partitioned tables anyway, there is
>   no need to handle cluster and reindex (they throw a meaningful error
>   already due to the lack of indexes.)

Yep.

> Patches 0001 and 0002 of the attached implement the above part.  0001
> teaches the above mentioned commands to do the "right thing" as described
> above and 0002 teaches heap_create() and heap_create_with_catalog() to not
> create any physical storage (none of the forks) for partitioned tables.

-   else
+   /*
+    * Although we cannot analyze partitioned tables themselves, we are
+    * still be able to do the recursive ANALYZE.
+    */
+   else if (onerel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)   {       /* No need for a WARNING if we already
complainedduring VACUUM */       if (!(options & VACOPT_VACUUM))
 
It seems to me that it is better style to use an empty else if with
only the comment. Comments related to a condition that are outside
this condition should be conditional in their formulations. Comments
within a condition can be affirmations when they refer to a decided
state of things.

From patch 2:
@@ -1351,7 +1352,12 @@ heap_create_with_catalog(const char *relname,              relkind == RELKIND_TOASTVALUE ||
       relkind == RELKIND_PARTITIONED_TABLE);
 

-       heap_create_init_fork(new_rel_desc);
+       /*
+        * We do not want to create any storage objects for a partitioned
+        * table.
+        */
+       if (relkind != RELKIND_PARTITIONED_TABLE)
+           heap_create_init_fork(new_rel_desc);   }
This does not make much sense with an assertion telling exactly the
contrary a couple of lines before. I think that it would make more
sense to move the assertion on relkind directly in
heap_create_init_fork().

> Then comes 0003, which concerns inheritance planning.  In a regular
> inheritance set (ie, the inheritance set corresponding to an inheritance
> hierarchy whose root is a regular table), the inheritance parents are
> included in their role as child members, because they might contribute
> rows to the final result.  So AppendRelInfo's are created for each such
> table by the planner prep phase, which the later planning steps use to
> create a scan plan for those tables as the Append's child plans.
> Currently, the partitioned tables are also processed by the optimizer as
> inheritance sets.  Partitioned table inheritance parents do not own any
> storage, so we *must* not create scan plans for them.  So we do not need
> to process them as child members of the inheritance set.  0003 teaches
> expand_inherited_rtentry() to not add partitioned tables as child members.
>  Also, since the root partitioned table RTE is no longer added to the
> Append list as the 1st child member, inheritance_planner() cannot assume
> that it can install the 1st child RTE as the nominalRelation of a given
> ModifyTable node, instead the original root parent table RTE is installed
> as the nominalRelation.

This is a collection of checks on relkind == RELKIND_PARTITIONED_TABLE
to avoid interactions with partition tables. Did you consider doing
something in the executor instead?
-- 
Michael



pgsql-hackers by date:

Previous
From: "Seki, Eiji"
Date:
Subject: Re: [HACKERS] Proposal: GetOldestXminExtend for ignoring arbitraryvacuum flags
Next
From: Amit Khandekar
Date:
Subject: Re: [HACKERS] Parallel Append implementation