Re: Declarative partitioning - another take - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Declarative partitioning - another take
Date
Msg-id 1b9d028b-26fe-a91a-7d6d-234f4665dee9@lab.ntt.co.jp
Whole thread Raw
In response to Re: Declarative partitioning - another take  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Declarative partitioning - another take
Re: Declarative partitioning - another take
List pgsql-hackers
On 2016/11/02 2:34, Robert Haas wrote:
> On Fri, Oct 28, 2016 at 3:53 AM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> [ new patches ]
> 
> Reviewing 0006:

Thanks for the review!

> This patch seems scary.  I sort of assumed from the title -- "Teach a
> few places to use partition check quals." -- that this was an optional
> thing, some kind of optimization from which we could reap further
> advantage once the basic infrastructure was in place.  But it's not
> that at all.  It's absolutely necessary that we do this, or data
> integrity is fundamentally compromised.  How do we know that we've
> found all of the places that need to be taught about these new,
> uncatalogued constraints?

Making this a separate commit from 0003 was essentially to avoid this
getting lost among all of its other changes.  In fact, it was to bring to
notice for closer scrutiny whether all the sites in the backend code that
are critical for data integrity in face of the implicit partition
constraints are being informed about those constraints.

> I'm feeling fairly strongly like you should rewind and make the
> partitioning constraints normal catalogued constraints.  That's got a
> number of advantages, most notably that we can be sure they will be
> properly enforced by the entire system (modulo existing bugs, of
> course).  Also, they'll show up automatically in tools like psql's \d
> output, pgAdmin, and anything else that is accustomed to being able to
> find constraints in the catalog.  We do need to make sure that those
> constraints can't be dropped (or altered?) inappropriately, but that's
> a relatively small problem.  If we stick with the design you've got
> here, every client tool in the world needs to be updated, and I'm not
> seeing nearly enough advantage in this system to justify that kind of
> upheaval.

As for which parts of the system need to know about these implicit
partition constraints to *enforce* them for data integrity, we could say
that it's really just one site - ExecConstraints() called from
ExecInsert()/ExecUpdate().

Admittedly, the current error message style as in this patch exposes the
implicit constraint approach to a certain criticism: "ERROR:  new row
violates the partition boundary specification of \"%s\"".  It would say
the following if it were a named constraint: "ERROR: new row for relation
\"%s\" violates check constraint \"%s\""

For constraint exclusion optimization, we teach get_relation_constraints()
to look at these constraints.  Although greatly useful, it's not the case
of being absolutely critical.

Beside the above two cases, there is bunch of code (relcache, DDL) that
looks at regular constraints, but IMHO, we need not let any of that code
concern itself with the implicit partition constraints.  Especially, I
wonder why the client tools should want the implicit partitioning
constraint to be shown as a CHECK constraint?  As the proposed patch 0004
(psql) currently does, isn't it better to instead show the partition
bounds as follows?

+CREATE TABLE part_b PARTITION OF parted (
+    b WITH OPTIONS NOT NULL DEFAULT 1 CHECK (b >= 0),
+    CONSTRAINT check_a CHECK (length(a) > 0)
+) FOR VALUES IN ('b');

+\d part_b
+         Table "public.part_b"
+ Column |  Type   |     Modifiers
+--------+---------+--------------------
+ a      | text    |
+ b      | integer | not null default 1
+Partition of: parted FOR VALUES IN ('b')
+Check constraints:
+    "check_a" CHECK (length(a) > 0)
+    "part_b_b_check" CHECK (b >= 0)

Needless to say, that could save a lot of trouble thinking about
generating collision-free names of these constraints, their dependency
handling, unintended altering of these constraints, pg_dump, etc.

> In fact, as far as I can see, the only advantage of this approach is
> that when the insert arrives through the parent and is routed to the
> child by whatever tuple-routing code we end up with (I guess that's
> what 0008 does), we get to skip checking the constraint, saving CPU
> cycles.  That's probably an important optimization, but I don't think
> that putting the partitioning constraint in the catalog in any way
> rules out the possibility of performing that optimization.  It's just
> that instead of having the partitioning excluded-by-default and then
> sometimes choosing to include it, you'll have it included-by-default
> and then sometimes choose to exclude it.

Hmm, doesn't it seem like we would be making *more* modifications to the
existing code (backend or otherwise) to teach it about excluding the
implicit partition constraints than the other way around?  The other way
around being to modify the existing code to include the implicit
constraints which is what this patch is about.

Having said all that, I am open to switching to the catalogued partition
constraints if the arguments I make above in favor of this patch are not
all that sound.

Thanks,
Amit





pgsql-hackers by date:

Previous
From: Kuntal Ghosh
Date:
Subject: Re: WAL consistency check facility
Next
From: Michael Paquier
Date:
Subject: Re: WAL consistency check facility