Re: [HACKERS] Adding support for Default partition in partitioning - Mailing list pgsql-hackers

From Keith Fiske
Subject Re: [HACKERS] Adding support for Default partition in partitioning
Date
Msg-id CAG1_KcAS6ernbvQC65XOCDjmtvb+aacvDs-o-HjGYGrstuYzbQ@mail.gmail.com
In response to [HACKERS] Adding support for Default partition in partitioning  (Rahila Syed <rahilasyed90@gmail.com>)
List pgsql-hackers

On Thu, Apr 6, 2017 at 1:18 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/04/06 13:08, Keith Fiske wrote:
> On Wed, Apr 5, 2017 at 2:51 PM, Keith Fiske wrote:
>> Only issue I see with this, and I'm not sure if it is an issue, is what
>> happens to that default constraint clause when 1000s of partitions start
>> getting added? From what I gather the default's constraint is built based
>> off the cumulative opposite of all other child constraints. I don't
>> understand the code well enough to see what it's actually doing, but if
>> there are no gaps, is the method used smart enough to aggregate all the
>> child constraints to make a simpler constraint that is simply outside the
>> current min/max boundaries? If so, for serial/time range partitioning this
>> should typically work out fine since there are rarely gaps. This actually
>> seems more of an issue for list partitioning where each child is a distinct
>> value or range of values that are completely arbitrary. Won't that check
>> and re-evaluation of the default's constraint just get worse and worse as
>> more children are added? Is there really even a need for the default to
>> have an opposite constraint like this? Not sure on how the planner works
>> with partitioning now, but wouldn't it be better to first check all
>> non-default children for a match the same as it does now without a default
>> and, failing that, then route to the default if one is declared? The
>> default should accept any data then so I don't see the need for the
>> constraint unless it's required for the current implementation. If that's
>> the case, could that be changed?

Unless I misread your last sentence, I think there might be some
confusion.  Currently, the partition constraint (think of these as you
would of user-defined check constraints) is needed for two reasons: 1. to
prevent direct insertion of rows into the default partition for which a
non-default partition exists; no two partitions should ever have duplicate
rows.  2. so that planner can use the constraint to determine if the
default partition needs to be scanned for a query using constraint
exclusion; no need, for example, to scan the default partition if the
query requests only key=3 rows and a partition for the same exists (no
other partition should have key=3 rows by definition, not even the
default).  As things stand today, planner needs to look at every partition
individually for using constraint exclusion to possibly exclude it, *even*
with declarative partitioning and that would include the default partition.
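To make the two roles of the default's constraint concrete, here is a minimal Python sketch for list partitioning. It is only a toy model, not PostgreSQL code: the dictionary `partitions` and the functions `default_accepts` and `must_scan_default` are hypothetical names invented for illustration.

```python
# Toy model: each non-default list partition is a set of accepted key values.
# All names here are illustrative assumptions, not PostgreSQL internals.
partitions = {"p1": {1, 2}, "p3": {3}}


def default_accepts(key, partitions):
    """Implied constraint of the default: key is in none of the sibling lists."""
    return all(key not in values for values in partitions.values())


def must_scan_default(requested_key, partitions):
    """Exclusion check for a query `key = requested_key`.

    The default can be skipped whenever its constraint rules the key out,
    that is, whenever some non-default partition already covers the key.
    """
    return default_accepts(requested_key, partitions)
```

In this model a direct insert of key 3 into the default is rejected because `p3` covers it, and a query for `key = 3` does not need to scan the default for the same reason.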

I had forgotten about constraint exclusion. My follow-up email, which you answered below, was about how to keep rows out of the default partition if it had no constraint. My main concern is how manageable the default's cumulative opposite constraint will be over time, especially with list partitioning, and whether the code is smart enough to consolidate conditions when two or more of them cover a continuous range.
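The kind of consolidation in question can be sketched as follows. This is a hypothetical illustration in Python, not what the patch does: `merge_ranges` and `default_constraint` are made-up names, the column is assumed to be called `key`, and range bounds are assumed to be half-open (lower inclusive, upper exclusive).

```python
def merge_ranges(ranges):
    """Merge half-open [lo, hi) ranges that touch or overlap."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            # Contiguous with the previous range: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def default_constraint(ranges):
    """Render the default's 'opposite' constraint from the merged sibling ranges."""
    merged = merge_ranges(ranges)
    if not merged:
        return "TRUE"  # no siblings, so the default accepts every key
    return " AND ".join(f"NOT (key >= {lo} AND key < {hi})" for lo, hi in merged)
```

With no gaps between children the constraint collapses to a single condition outside the current min/max bounds, however many children exist. Arbitrary list values would not shrink this way, which is the case I am worried about.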
 

> Actually, thinking on this more, I realized this does again come back to
> the lack of a global index. Without the constraint, data could be put
> directly into the default that could technically conflict with the
> partition scheme elsewhere. Perhaps, instead of the constraint, inserts
> directly to the default could be prevented on the user level. Writing to
> valid children directly certainly has its place, but having thought about
> it, I can't see any reason why one would ever want to write directly to
> the default. Its use case seems to be as a sort of temporary
> storage until that data can be moved to a valid location. Would still need
> to allow removal of data, though.

As mentioned above, the default partition will not allow directly
inserting a row whose key maps to some existing (non-default) partition.

As far as tuple-routing is concerned, it will choose the default partition
only if no other partition is found for the key.  Tuple-routing doesn't
use the partition constraints directly per se, like one of the two things
mentioned above do.  One could say that tuple-routing assigns the incoming
rows to partitions such that their individual partition constraints are
not violated.
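The routing rule described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual tuple-routing code: `route` is a hypothetical function, and list partitions are modelled as sets of key values.

```python
def route(key, partitions, default=None):
    """Pick the partition for `key`; fall back to the default only if nothing else matches."""
    for name, values in partitions.items():
        if key in values:
            return name
    if default is not None:
        return default
    # Without a default partition, an unmatched key is an error.
    raise LookupError(f"no partition found for key {key!r}")
```

Because the default is chosen only after every other partition has been ruled out, a routed row can never violate the default's constraint, even though the constraint itself is never evaluated during routing in this sketch.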
 
Finally, we don't yet offer global guarantees for constraints like unique.
The only guarantee that's in place is that no two partitions can contain
the same partition key.

Thanks,
Amit


