Thread: [HACKERS] Constraint exclusion for partitioned tables
Hi, In relation_excluded_by_constraints(), we do not apply constraint exclusion if rte->inh is true. /* Only plain relations have constraints */ if (rte->rtekind != RTE_RELATION || rte->inh) return false; Thus every partitioned table will not benefit from the constraint exclusion, even when constraint_exclusion = on. Hence for a partitioned table \d+ t1 Table "public.t1" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+---------+-----------+----------+---------+---------+--------------+------------- a | integer | | not null | | plain | | b | integer | | | | plain | | Partition key: RANGE (a) Check constraints: "t1_b_check" CHECK (b > 100) Partitions: t1p1 FOR VALUES FROM (0) TO (100), t1p2 FOR VALUES FROM (100) TO (200) while executing a query "select * from t1 where b < 100" set_rel_size() doesn't mark t1 as dummy. It gets marked dummy only after all the children have been deemed dummy by constraint exclusion. This means that we will unnecessarily examine children when the parent itself is known dummy. I am guessing that for normal inheritance, a constraint on parent doesn't necessarily imply the same constraint on the child (Amit Langote gives me an example of NOT NULL constraint). But in case of partitioned table, every constraint on the parent is applicable to the child as well. So, we can apply constraint exclusion on partitioned relation. Here's patch to do that. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Apr 6, 2017 at 6:47 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > I am guessing that for normal inheritance, a constraint on parent > doesn't necessarily imply the same constraint on the child (Amit > Langote gives me an example of NOT NULL constraint). CHECK constraints that apply to the parent would apply to all children, unless they are NO INHERIT, so even for regular inheritance, it might still be possible to prove something by ignoring things that won't necessarily cascade. For partitioning, it may be that we've got enough restrictions in place on what can happen that we can assume everything can cascade. Actually, I hope that's true, since the partitioned table has no storage of its own. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi,
I had a look at these changes and here are my observations:
1. Patch applies cleanly with "git apply'.
2. make / make install / make check-world all are good.
This patch clearly improves the planning time with given conditions.
To verify that, I have created a table like:
create table foo(a int, b int check (b > 100), c text) partition by range(a);
And then used following query to get planning time:
select * from foo where b < 100;
And on my local setup, I have observed that,
For 16 partitions, planning time was 0.234692 ms, which reduced to 0.112948 ms with this patch.
For 128 partitions, planning time was 1.62305 ms, which reduced to 0.654252 ms with this patch.
For 1024 partitions, planning time was 18.720993 ms, which reduced to 9.667395 ms with this patch.
This clearly shows an improvement in planning time.
Patch looks good to me. So passing that to the committer.
ThanksI had a look at these changes and here are my observations:
1. Patch applies cleanly with "git apply'.
2. make / make install / make check-world all are good.
This patch clearly improves the planning time with given conditions.
To verify that, I have created a table like:
create table foo(a int, b int check (b > 100), c text) partition by range(a);
And then used following query to get planning time:
select * from foo where b < 100;
And on my local setup, I have observed that,
For 16 partitions, planning time was 0.234692 ms, which reduced to 0.112948 ms with this patch.
For 128 partitions, planning time was 1.62305 ms, which reduced to 0.654252 ms with this patch.
For 1024 partitions, planning time was 18.720993 ms, which reduced to 9.667395 ms with this patch.
This clearly shows an improvement in planning time.
Patch looks good to me. So passing that to the committer.
On Tue, Aug 1, 2017 at 7:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
-- On Thu, Apr 6, 2017 at 6:47 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> I am guessing that for normal inheritance, a constraint on parent
> doesn't necessarily imply the same constraint on the child (Amit
> Langote gives me an example of NOT NULL constraint).
CHECK constraints that apply to the parent would apply to all
children, unless they are NO INHERIT, so even for regular inheritance,
it might still be possible to prove something by ignoring things that
won't necessarily cascade.
For partitioning, it may be that we've got enough restrictions in
place on what can happen that we can assume everything can cascade.
Actually, I hope that's true, since the partitioned table has no
storage of its own.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Tue, Sep 12, 2017 at 7:08 AM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote: > This patch clearly improves the planning time with given conditions. > > To verify that, I have created a table like: > create table foo(a int, b int check (b > 100), c text) partition by > range(a); > And then used following query to get planning time: > select * from foo where b < 100; > > And on my local setup, I have observed that, > For 16 partitions, planning time was 0.234692 ms, which reduced to 0.112948 > ms with this patch. > For 128 partitions, planning time was 1.62305 ms, which reduced to 0.654252 > ms with this patch. > For 1024 partitions, planning time was 18.720993 ms, which reduced to > 9.667395 ms with this patch. > > This clearly shows an improvement in planning time. What about the extra cost of checking the parent when it doesn't help?In that case we will have some loss. I'm inclined to think that's OK, but it's something to think about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 12, 2017 at 8:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I have updated query like:
select * from foo where b > 100;
Which matches with the CHECK constraint, and here are the result on my local setup:
Time in milliseconds
Partitions | without patch | with patch
-----------|---------------|------------
2 | 0.072551 | 0.074154
4 | 0.102537 | 0.108024
8 | 0.162703 | 0.175017
16 | 0.288589 | 0.305285
128 | 2.7119 | 2.636247
1024 | 29.101347 | 29.48275
So yes, as you said, it will have slight (may be negligible) overhead.
This observation are from local setup and I have also seen a large standard deviation in the runs.
On Tue, Sep 12, 2017 at 7:08 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> This patch clearly improves the planning time with given conditions.
>
> To verify that, I have created a table like:
> create table foo(a int, b int check (b > 100), c text) partition by
> range(a);
> And then used following query to get planning time:
> select * from foo where b < 100;
>
> And on my local setup, I have observed that,
> For 16 partitions, planning time was 0.234692 ms, which reduced to 0.112948
> ms with this patch.
> For 128 partitions, planning time was 1.62305 ms, which reduced to 0.654252
> ms with this patch.
> For 1024 partitions, planning time was 18.720993 ms, which reduced to
> 9.667395 ms with this patch.
>
> This clearly shows an improvement in planning time.
What about the extra cost of checking the parent when it doesn't help?
In that case we will have some loss.
I'm inclined to think that's OK, but it's something to think about.
I have updated query like:
select * from foo where b > 100;
Which matches with the CHECK constraint, and here are the result on my local setup:
Time in milliseconds
Partitions | without patch | with patch
-----------|---------------|------------
2 | 0.072551 | 0.074154
4 | 0.102537 | 0.108024
8 | 0.162703 | 0.175017
16 | 0.288589 | 0.305285
128 | 2.7119 | 2.636247
1024 | 29.101347 | 29.48275
So yes, as you said, it will have slight (may be negligible) overhead.
This observation are from local setup and I have also seen a large standard deviation in the runs.
Thanks
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Wed, Sep 13, 2017 at 12:17 PM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote: > > > On Tue, Sep 12, 2017 at 8:12 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> >> On Tue, Sep 12, 2017 at 7:08 AM, Jeevan Chalke >> <jeevan.chalke@enterprisedb.com> wrote: >> > This patch clearly improves the planning time with given conditions. >> > >> > To verify that, I have created a table like: >> > create table foo(a int, b int check (b > 100), c text) partition by >> > range(a); >> > And then used following query to get planning time: >> > select * from foo where b < 100; >> > >> > And on my local setup, I have observed that, >> > For 16 partitions, planning time was 0.234692 ms, which reduced to >> > 0.112948 >> > ms with this patch. >> > For 128 partitions, planning time was 1.62305 ms, which reduced to >> > 0.654252 >> > ms with this patch. >> > For 1024 partitions, planning time was 18.720993 ms, which reduced to >> > 9.667395 ms with this patch. >> > >> > This clearly shows an improvement in planning time. >> >> What about the extra cost of checking the parent when it doesn't help? >> In that case we will have some loss. >> >> I'm inclined to think that's OK, but it's something to think about. > > > I have updated query like: Thanks a lot Jeevan for all your experiments. They are very useful. > select * from foo where b > 100; > Which matches with the CHECK constraint, and here are the result on my local > setup: So, in this case, constraint exclusion fails since the WHERE condition can not be refuted by the constraints. > > Time in milliseconds > Partitions | without patch | with patch > -----------|---------------|------------ > 2 | 0.072551 | 0.074154 > 4 | 0.102537 | 0.108024 > 8 | 0.162703 | 0.175017 > 16 | 0.288589 | 0.305285 > 128 | 2.7119 | 2.636247 > 1024 | 29.101347 | 29.48275 > > So yes, as you said, it will have slight (may be negligible) overhead. > > This observation are from local setup and I have also seen a large standard > deviation in the runs. For a regular table if the constraint exclusion fails, we will waste those many CPU cycles. But if the relation is excluded we will save disk I/O or buffer access and time to apply the conditions on all the rows in the relation. Given the magnitude of difference in the time to run constraint exclusion and time for all those things, we take the hit and run constraint exclusion always. For a partitioned table, this patch saves the time to run constraint exclusion on all the partitions if constraint exclusion succeeds on the partitioned table. If constraint exclusion fails, we have wasted CPU cycles on one run of constraint exclusion. The difference between the time spent in the two scenarios increases with the number of partitions. Practically, users will have a handful partitions rather than a couple and thus running overhead of running constraint exclusion on partitioned table would be justified given the time it will save when CE succeeds. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Dec 1, 2017 at 12:21 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Sep 13, 2017 at 4:07 PM, Ashutosh Bapat > <ashutosh.bapat@enterprisedb.com> wrote: >> For a partitioned table, this patch saves the time to run constraint >> exclusion on all the partitions if constraint exclusion succeeds on >> the partitioned table. If constraint exclusion fails, we have wasted >> CPU cycles on one run of constraint exclusion. The difference between >> the time spent in the two scenarios increases with the number of >> partitions. Practically, users will have a handful partitions rather >> than a couple and thus running overhead of running constraint >> exclusion on partitioned table would be justified given the time it >> will save when CE succeeds. > > Moved patch to next CF. Committed after adding a comment. Generally, code changes should be accompanied by comment updates. I tested this and found out that this is quite useful for cases where multiple levels of partitioning are in use. Consider creating 100 partitions like this: #!/usr/bin/perl use strict; use warnings; print "create table foo (a int, b int, c text) partition by list (a);\n"; for $a (1..10) { print "create table foo$a partition of foo for values in ($a) partition by list (b);\n"; for $b (1..10) { print "create table foo${a}_$b partition of foo$a for values in ($b);\n"; } } Then consider this query: select * from foo where a = 5; Without this patch, we have to reject 90 leaf partitions individually, but with the patch, we can reject the intermediate partitioned tables; each time we do, it substitutes for rejecting 10 children individually. This seems to me to be a case that is quite likely to come up in the real world. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Dec 2, 2017 at 1:11 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Dec 1, 2017 at 12:21 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Wed, Sep 13, 2017 at 4:07 PM, Ashutosh Bapat >> <ashutosh.bapat@enterprisedb.com> wrote: >>> For a partitioned table, this patch saves the time to run constraint >>> exclusion on all the partitions if constraint exclusion succeeds on >>> the partitioned table. If constraint exclusion fails, we have wasted >>> CPU cycles on one run of constraint exclusion. The difference between >>> the time spent in the two scenarios increases with the number of >>> partitions. Practically, users will have a handful partitions rather >>> than a couple and thus running overhead of running constraint >>> exclusion on partitioned table would be justified given the time it >>> will save when CE succeeds. >> >> Moved patch to next CF. > > Committed after adding a comment. Generally, code changes should be > accompanied by comment updates. Thanks for committing the patch. Sorry for not including the comments. Your comment looks good. > > I tested this and found out that this is quite useful for cases where > multiple levels of partitioning are in use. Consider creating 100 > partitions like this: > > #!/usr/bin/perl > > use strict; > use warnings; > > print "create table foo (a int, b int, c text) partition by list (a);\n"; > for $a (1..10) > { > print "create table foo$a partition of foo for values in ($a) > partition by list (b);\n"; > for $b (1..10) > { > print "create table foo${a}_$b partition of foo$a for values > in ($b);\n"; > } > } > > Then consider this query: select * from foo where a = 5; > > Without this patch, we have to reject 90 leaf partitions individually, > but with the patch, we can reject the intermediate partitioned tables; > each time we do, it substitutes for rejecting 10 children > individually. This seems to me to be a case that is quite likely to > come up in the real world. > Right. Thanks for the testing. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company