Re: [HACKERS] Partitioned tables and relfilenode - Mailing list pgsql-hackers

From Amit Langote
Subject Re: [HACKERS] Partitioned tables and relfilenode
Date
Msg-id 30f05f6d-7341-b9b0-a089-2b626abc133b@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] Partitioned tables and relfilenode  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Partitioned tables and relfilenode  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2017/03/08 2:27, Robert Haas wrote:
> On Tue, Mar 7, 2017 at 12:11 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> I see that all the changes by Amit and myself to what was earlier 0003
>> patch are now part of 0002 patch. The patch looks ready for committer.
> 
> Reviewing 0002:

Thanks for the review.

> 
> This patch seems to have falsified the header comments for
> expand_inherited_rtentry.

Oops, updated.

> I don't quite understand the need for the change to set_rel_size().
> The rte->inh case is handled before reaching the new code, and IIUC
> the !rte->inh case should never happen given the changes to
> expand_inherited_rtentry.  Or am I confused?

In expand_inherited_rtentry(), patch only changes the rule about the
minimum number of child RTEs required to keep rte->inh true.  In the
traditional case, it is 2 (the table itself and at least one child).  For
a partitioned table, since we don't want to scan the table itself, that
becomes 1 (at least one leaf partition).

So, it's still possible for rte->inh to become false if the required
minimum is not met, even for partitioned tables.

> If not, I think we
> should just add an Assert() to the "plain relation" case that this is
> not RELKIND_PARTITIONED_TABLE (with a comment explaining why it can't
> happen).

How about adding Assert(rte->relkind != RELKIND_PARTITIONED_TABLE) at the
beginning of the following functions as the updated patch does:

set_plain_rel_size()
set_tablesample_rel_size()
set_plain_rel_pathlist()
set_tablesample_rel_pathlist()

> In inheritance_planner, this comment addition does not seem adequate:
> 
> +         * If the parent is a partitioned table, we already set the nominal
> +         * relation.
> 
> That basically contradicts the previous paragraph.  I think you need
> to adjust the previous paragraph instead of adding a new one, probably
> in multiple places.  For example, the parenthesized bit now only
> applies in the non-partitioned case.

Hmm, that's true.  I rewrote the entire comment.

> 
> -    rel = mtstate->resultRelInfo->ri_RelationDesc;
> +    nominalRTE = rt_fetch(node->nominalRelation, estate->es_range_table);
> +    nominalRel = heap_open(nominalRTE->relid, NoLock);
> 
> No lock?

Hmm, I think I missed that a partitioned parent table would not be locked
by the *executor* at all after applying this patch.  As of now, InitPlan()
takes care of locking all the result relations in the
PlannedStmt.resultRelations list.  This patch removes partitioned
parent(s) from appearing in this list.  But that makes me wonder - aren't
the locks taken by expand_inherited_rtentry() kept long enough?  Why does
InitPlan need to acquire them again?  I see this comment in
expand_inherited_rtentry:

   * If the parent relation is the query's result relation, then we need
   * RowExclusiveLock.  Otherwise, if it's accessed FOR UPDATE/SHARE, we
   * need RowShareLock; otherwise AccessShareLock.  We can't just grab
   * AccessShareLock because then the executor would be trying to upgrade
   * the lock, leading to possible deadlocks.  (This code should match the
   * parser and rewriter.)

So, I assumed all the necessary locking is being taken care of here.

Anyway, I changed NoLock to RowExclusiveLock here for now, but maybe it's
either not necessary or a way should be found to do that in InitPlan along
with other result relations.

> Another thing that bothers me about this is that, apparently, the use
> of nominalRelation is no longer strictly for EXPLAIN, as the comments
> in plannodes.h/relation.h still claim that it is.  I'm not sure how to
> adjust that exactly; there's not a lot of room in those comments to
> give all the details.  Maybe they should simply say something like /*
> Parent RT index */ instead of /* Parent RT index for use of EXPLAIN
> */.  But we can't just go around changing the meaning of things
> without updating the comments accordingly.

It seems that this comment and one more need to be updated.  The other
comment change is in explain.c as follows:

  * Adds the relid of each referenced RTE to *rels_used.  The result controls
  * which RTEs are assigned aliases by select_rtable_names_for_explain.
  * This ensures that we don't confusingly assign un-suffixed aliases to RTEs
- * that never appear in the EXPLAIN output (such as inheritance parents).
+ * that never appear in the EXPLAIN output (such as non-partitioned
+ * inheritance parents).
  */
 static bool
 ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)

I studied the EXPLAIN code a bit to see whether the problem cited for
using inheritance parent RTE as nominalRelation (for example, in comments
in inheritance_planner) apply to the partitioned parent RTE case, which it
doesn't.  The partitioned parent RTE won't appear anywhere else in the
plan.  So, different aliases for what's the same table/RTE won't happen in
the partitioned parent case.

> A related question is
> whether we should really be using nominalRelation for this or whether
> there's some other way we should be getting at the parent -- I don't
> have another idea, though.

This is just for the ModifyTable node.  Instead of adding any new member,
I thought nominalRelation would work fine.  But as I said above regarding
locking, we *may* need to find a different place for the partitioned
parent RTE (other than ModifyTable.nominalRelation).

Thanks,
Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Andrew Gierth
Date:
Subject: Re: [HACKERS] Hash support for grouping sets
Next
From: Rushabh Lathia
Date:
Subject: Re: [HACKERS] wait events for disk I/O