Re: [HACKERS] Partitioned tables and relfilenode - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: [HACKERS] Partitioned tables and relfilenode
Date
Msg-id CANP8+jLYZ7NMjXzZH+MH57ME3Hq5hR-y9hfczwgH6wnTVqMgdw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partitioned tables and relfilenode  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Partitioned tables and relfilenode
List pgsql-hackers
On 21 March 2017 at 16:33, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 21, 2017 at 12:19 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 16 March 2017 at 10:03, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> On 2017/03/15 7:09, Robert Haas wrote:
>>
>>>> I think that eliding the Append node when there's only one child may
>>>> be unsafe in the case where the child's attribute numbers are
>>>> different from the parent's attribute numbers.  I remember Tom making
>>>> some comment about this when I was working on MergeAppend, although I
>>>> no longer remember the specific details.
>>>
>>> Append node elision does not occur in the one-child case.  With the patch:
>> ...
>>> create table q1 partition of q for values in (1);
>>> explain select * from q;
>>>                          QUERY PLAN
>>> ------------------------------------------------------------
>>>  Append  (cost=0.00..35.50 rows=2550 width=4)
>>>    ->  Seq Scan on q1  (cost=0.00..35.50 rows=2550 width=4)
>>> (2 rows)
>>>
>>> Maybe that should be done, but this patch doesn't implement that.
>>
>> Robert raises the possible problem that removing the Append wouldn't
>> work when the parent and child attribute numbers don't match. Surely
>> that never happens with partitions, by definition?
>
> No, the attribute numbers don't have to match.  This decision was made
> a long time ago, and there have been a whole bunch of followup commits
> since the original partitioning patch that were dedicated to fixing up
> cases where that wasn't working properly in the original commit.  It
> seems superficially attractive to require the attribute numbers to
> match, but it creates some really unpleasant cases.  For example,
> suppose a user tries to creates a partitioned table, drops a column,
> then creates a standalone table which matches the apparent column list
> of the partitioned table, then tries to attach it as a partition.
>
> ERROR: the columns you previously dropped from the parent that you
> can't see and don't know about aren't the same as the ones dropped
> from the standalone table you're trying to attach as a partition
> DETAIL: Try recreating your proposed new partition with a
> pass-by-value column of width 8 after the third column, and then
> dropping that column before trying to attach it as a partition.
> HINT: Abandon all hope, ye who enter here.
>
> Not cool with that.

Thanks for the explanation.

> The decision not to require the attribute numbers to match doesn't
> necessarily mean we can't get rid of the Append node, though.  First
> of all, in a lot of practical cases the attribute numbers will all
> match.  Second, if they don't, the most that would be required is a
> projection step, which could usually be done without a separate node
> because most nodes are projection-capable.  And maybe not even that
> much is needed; I'd have to go back and look at what Tom was worried
> about the last time this came up.  (Hmm, maybe the problem had to do
> with varnos matching, rather then attribute numbers?)

There used to be some code there to fix them up, not sure where that went.

> Another and independent problem with eliding the Append node is that,
> if we did that, we'd still have to guarantee that the parent relation
> corresponding to the Append node got locked somehow.  Otherwise, we'll
> be accessing the tuple routing information for a table on which we
> don't have a lock.  That's probably a solvable problem, too, but it
> hasn't been solved yet.

Hmm, why would we need to access tuple routing information?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Next
From: Masahiko Sawada
Date:
Subject: Re: [HACKERS] GUC for cleanup indexes threshold.