Thread: Constraint Exclusion (Partitioning) - Initial Review requested
I enclose a fully working implementation of Constraint Exclusion, a very basic form of Partitioning. Initial review is requested, to allow us all to assess what further work is required on this prior to Beta freeze. Patch against current cvstip; passes make check and all special tests. The main purpose of this feature is to reduce access time against large tables that have been split into partitions by using the PostgreSQL inheritance facility. It has been written in a very generic way allowing a whole range of applications. If a) a table is part of an inheritance set b) the table has check constraints defined upon it c) enable_constraint_exclusion = true then the planner will attempt to use the definition of the Constraints to see if that relation could ever have rows in it that the query might see. *No* additional SQL DDL syntax is required to define this. Only query clauses of the form ATTR OP CONSTANT will be considered, in a very similar way to the way partial indexes work already. The code changes effect only the planner, building upon the partial index logic to allow refutation as well as implication. There are clearly many questions to be answered by me and I'm happy to do so, so please fire away. My hope is to get a more polished form of this functionality into 8.1. Further developments on Partitioning are foreseen, though the feature submitted today is the main building block for any further work/optimization in this area and so additional features will be discussed at a later time. A full test suite has been specially written for this feature. This is included here also, though no attempt has been made as yet to integrate that with the main regression test suite (as yet). Required files are included in a single tar file with this email. Extract these to the PostgreSQL installation directory and run using ./testprange.sh The test suite executes around 100 queries against 7 different database designs, comparing results with/without the new enable option. Full and pruned EXPLAINs are also derived during execution to allow easier analysis of the success of the exclusion process (view the testprange_t*e.out files). There are no cases where any of the test queries returns a logically incorrect answer; hence fully working. There are a few cases where queries have not been optimised as far as possible; in those cases checks on my propositional logic are requested... This is extremely complex and my expectation is that testers/reviewers will find at least of couple of logic improvements. The most frequent queries are believed to work optimally. There is no documentation at this time. Main questions: 1. How should we handle the case where *all* inherited relations are excluded? (This is not currently covered in the code). 2. Should this feature be available for all queries or just inherited relations? 3. And should we extend RelOptInfo to include constraint information? 4. Do we want to integrate the test suite also? 5. Presumably a section under Performance tips would be appropriate to document this feature? (As well as section in run-time parameters). Additional thoughts: 1. We should be able to optimise the case where there is only a single non-excluded relation by removing the Append node. Best Regards, Simon Riggs
Attachment
On Mon, 2005-06-27 at 01:41 +0100, Simon Riggs wrote: > I enclose a fully working implementation of Constraint Exclusion, a very > basic form of Partitioning. Initial review is requested, to allow us all > to assess what further work is required on this prior to Beta freeze. > > Patch against current cvstip; passes make check and all special tests. Has this patch been registered for review? I haven't heard anything. Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes: > Has this patch been registered for review? I haven't heard anything. I think Bruce is still out of town, and maintaining his patch list isn't his highest priority ... regards, tom lane
Simon Riggs wrote: > On Mon, 2005-06-27 at 01:41 +0100, Simon Riggs wrote: > > I enclose a fully working implementation of Constraint Exclusion, a very > > basic form of Partitioning. Initial review is requested, to allow us all > > to assess what further work is required on this prior to Beta freeze. > > > > Patch against current cvstip; passes make check and all special tests. > > Has this patch been registered for review? I haven't heard anything. Well, it has been three days, and no one said anything, so I will put it the queue. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Seems you have managed to combine inheritance, check constraints, and partial index into table paritioning. It is nice it requires no new syntax. Here is an example from your tests: DROP TABLE pm cascade; CREATE TABLE pm ( dkey INT NOT NULL ); CREATE TABLE p1 ( CHECK (dkey BETWEEN 10000 AND 19999)) INHERITS (pm); CREATE TABLE p2 ( CHECK (dkey BETWEEN 20000 AND 29999)) INHERITS (pm); CREATE TABLE p3 ( CHECK (dkey BETWEEN 30000 AND 39999)) INHERITS (pm); So, in this case, a SELECT from pm would pull from the three base tables, and those tables can be located on different tablespaces, and the backend will only look in child tables that might contain rows basd on the check constraints. It is that last phrase that is the new functionality here. Oh, why would someone want to set enable_constraint_exclusion to false? You had a few questions: > Main questions: > 1. How should we handle the case where *all* inherited relations are > excluded? (This is not currently covered in the code). I assume this means we don't return any rows. Why it is an issue? > 2. Should this feature be available for all queries or just inherited > relations? I don't see why other queries should not use this. Our TODO already has: * Use CHECK constraints to influence optimizer decisions CHECK constraints contain information about the distribution of values within the table. This is also useful for implementing subtables where a tables content is distributed across several subtables. and this looks like what you are doing. However, again, I see the constraint as just informing whether there might be any rows in the table. Am I missing something? Are you thinking views with UNION could benefit from this? > 3. And should we extend RelOptInfo to include constraint information? Is the problem that you have to do multiple lookups without it? > 4. Do we want to integrate the test suite also? No, once this thing works, I don't see it getting broken frequently. We usually don't test the optimizer, but we could add a single test for one row in each table. > 5. Presumably a section under Performance tips would be appropriate to > document this feature? (As well as section in run-time parameters). Yep. I am surprised no one else has commented on it, which I think means your code is ready for the queue. Do you want to adjust it based on this feedback or should I apply and you can adjust it later. --------------------------------------------------------------------------- Simon Riggs wrote: > > I enclose a fully working implementation of Constraint Exclusion, a very > basic form of Partitioning. Initial review is requested, to allow us all > to assess what further work is required on this prior to Beta freeze. > > Patch against current cvstip; passes make check and all special tests. > > The main purpose of this feature is to reduce access time against large > tables that have been split into partitions by using the PostgreSQL > inheritance facility. It has been written in a very generic way allowing > a whole range of applications. > > If > a) a table is part of an inheritance set > b) the table has check constraints defined upon it > c) enable_constraint_exclusion = true > > then the planner will attempt to use the definition of the Constraints > to see if that relation could ever have rows in it that the query might > see. *No* additional SQL DDL syntax is required to define this. > > Only query clauses of the form ATTR OP CONSTANT will be considered, in a > very similar way to the way partial indexes work already. > > The code changes effect only the planner, building upon the partial > index logic to allow refutation as well as implication. > > There are clearly many questions to be answered by me and I'm happy to > do so, so please fire away. My hope is to get a more polished form of > this functionality into 8.1. Further developments on Partitioning are > foreseen, though the feature submitted today is the main building block > for any further work/optimization in this area and so additional > features will be discussed at a later time. > > A full test suite has been specially written for this feature. This is > included here also, though no attempt has been made as yet to integrate > that with the main regression test suite (as yet). Required files are > included in a single tar file with this email. Extract these to the > PostgreSQL installation directory and run using ./testprange.sh > The test suite executes around 100 queries against 7 different database > designs, comparing results with/without the new enable option. Full and > pruned EXPLAINs are also derived during execution to allow easier > analysis of the success of the exclusion process (view the > testprange_t*e.out files). > > There are no cases where any of the test queries returns a logically > incorrect answer; hence fully working. There are a few cases where > queries have not been optimised as far as possible; in those cases > checks on my propositional logic are requested... This is extremely > complex and my expectation is that testers/reviewers will find at least > of couple of logic improvements. The most frequent queries are believed > to work optimally. > > There is no documentation at this time. > > Main questions: > 1. How should we handle the case where *all* inherited relations are > excluded? (This is not currently covered in the code). > 2. Should this feature be available for all queries or just inherited > relations? > 3. And should we extend RelOptInfo to include constraint information? > 4. Do we want to integrate the test suite also? > 5. Presumably a section under Performance tips would be appropriate to > document this feature? (As well as section in run-time parameters). > > Additional thoughts: > 1. We should be able to optimise the case where there is only a single > non-excluded relation by removing the Append node. > > Best Regards, Simon Riggs [ Attachment, skipping... ] [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Sat, Jul 02, 2005 at 03:56:48PM -0400, Bruce Momjian wrote: > Simon Riggs wrote: > > > > I enclose a fully working implementation of Constraint Exclusion, a very > > basic form of Partitioning. Initial review is requested, to allow us all > > to assess what further work is required on this prior to Beta freeze. > I am surprised no one else has commented on it, which I think means your > code is ready for the queue. Do you want to adjust it based on this > feedback or should I apply and you can adjust it later. I think that the fact that no one commented on it means that nobody has thoroughly reviewed it. I skimmed over the patch, and it seems OK to me (save a typo in a comment, it said "OR" and the code said "and" in one of the truth tables), but my knowledge of the optimizer is not enough for my comments to have any weight. -- Alvaro Herrera (<alvherre[a]surnet.cl>) "Ciencias políticas es la ciencia de entender por qué los políticos actúan como lo hacen" (netfunny.com)
Alvaro Herrera <alvherre@surnet.cl> writes: > On Sat, Jul 02, 2005 at 03:56:48PM -0400, Bruce Momjian wrote: >> I am surprised no one else has commented on it, which I think means your >> code is ready for the queue. Do you want to adjust it based on this >> feedback or should I apply and you can adjust it later. > I think that the fact that no one commented on it means that nobody has > thoroughly reviewed it. I haven't looked at it at all, because I've had a few other things to do this week ;-). I do intend however to review it closely before it goes in, and put dibs on the applying. regards, tom lane
On Sat, 2005-07-02 at 19:05 -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@surnet.cl> writes: > > On Sat, Jul 02, 2005 at 03:56:48PM -0400, Bruce Momjian wrote: > >> I am surprised no one else has commented on it, which I think means your > >> code is ready for the queue. Do you want to adjust it based on this > >> feedback or should I apply and you can adjust it later. > > > I think that the fact that no one commented on it means that nobody has > > thoroughly reviewed it. > > I do intend however to review it closely before it goes > in, and put dibs on the applying. Good thanks, I was hoping you'd say that. Best Regards, Simon Riggs
On Sat, 2005-07-02 at 15:56 -0400, Bruce Momjian wrote: > Seems you have managed to combine inheritance, check constraints, and > partial index into table partitioning. It is nice it requires no new > syntax. Here is an example from your tests: > > DROP TABLE pm cascade; > CREATE TABLE pm > ( dkey INT NOT NULL > ); > > CREATE TABLE p1 ( CHECK (dkey BETWEEN 10000 AND 19999)) INHERITS (pm); > CREATE TABLE p2 ( CHECK (dkey BETWEEN 20000 AND 29999)) INHERITS (pm); > CREATE TABLE p3 ( CHECK (dkey BETWEEN 30000 AND 39999)) INHERITS (pm); > > So, in this case, a SELECT from pm would pull from the three base > tables, and those tables can be located on different tablespaces, and > the backend will only look in child tables that might contain rows basd > on the check constraints. It is that last phrase that is the new > functionality here. Yes, dead on. Thank you for this elegant summary. The main idea was originally Hannu Krosing's, I believe, with suggestion from Tom to enhance the partial index machinery to this end. So a query such as select * from pm where dkey = 25000 will have an EXPLAIN that completely ignores p1 and p3, since these can be provably excluded from the plan without effecting the result. I see the "no syntax" version as the first step towards additional functionality that would require additional syntax. > Oh, why would someone want to set enable_constraint_exclusion to false? The included functionality performs the exclusion at plan time. If a query was prepared for later execution, it *could* return the wrong answer when the plan was executed at a later time since plans are not invalidated when constraints change. So, in general, this should be set to false except for circumstances where the user can guarantee no such mistake would be made. > You had a few questions: > > > Main questions: > > 1. How should we handle the case where *all* inherited relations are > > excluded? (This is not currently covered in the code). > > I assume this means we don't return any rows. Why it is an issue? A code question only. No issue, just how should the code look? > > 2. Should this feature be available for all queries or just inherited > > relations? > > I don't see why other queries should not use this. Our TODO already > has: > > * Use CHECK constraints to influence optimizer decisions > > CHECK constraints contain information about the distribution of values > within the table. This is also useful for implementing subtables where > a tables content is distributed across several subtables. > > and this looks like what you are doing. However, again, I see the > constraint as just informing whether there might be any rows in the > table. Am I missing something? Are you thinking views with UNION could > benefit from this? In general, it seems you might want this. In normal use check constraints tend to be on minor columns, not key columns. Queries that would be provably able to exclude tables based upon this would be strange queries. i.e. select count(distinct item_pk) from warehouse where quantity < 0 is not a very common query. So we would pay the overhead of checking for exclusion for all queries when only a few wierd ones would ever take advantage of it. Sounds like a poor trade-off to me. IMHO, the only time you might expect to see benefit is when you have many similar tables that are partitioned by design into pieces that lend themselves to exclusion. If you specifically designed a set of tables and used UNION to bring them together, then I can see that you would want it then also.... but is there any benefit in supporting two different ways of achieving the same basic design: partitioned table. > > 3. And should we extend RelOptInfo to include constraint information? > > Is the problem that you have to do multiple lookups without it? > > 4. Do we want to integrate the test suite also? > > No, once this thing works, I don't see it getting broken frequently. We > usually don't test the optimizer, but we could add a single test for one > row in each table. > > > 5. Presumably a section under Performance tips would be appropriate to > > document this feature? (As well as section in run-time parameters). > > Yep. > > I am surprised no one else has commented on it, which I think means your > code is ready for the queue. Do you want to adjust it based on this > feedback or should I apply and you can adjust it later. I think that it means I did not explain myself well enough. :-) Best Regards, Simon Riggs
Simon Riggs wrote: > Yes, dead on. Thank you for this elegant summary. The main idea was > originally Hannu Krosing's, I believe, with suggestion from Tom to > enhance the partial index machinery to this end. > > So a query such as > > select * from pm where dkey = 25000 > > will have an EXPLAIN that completely ignores p1 and p3, since these can > be provably excluded from the plan without effecting the result. > > I see the "no syntax" version as the first step towards additional > functionality that would require additional syntax. OK, makes sense. > > Oh, why would someone want to set enable_constraint_exclusion to false? > > The included functionality performs the exclusion at plan time. If a > query was prepared for later execution, it *could* return the wrong > answer when the plan was executed at a later time since plans are not > invalidated when constraints change. So, in general, this should be set > to false except for circumstances where the user can guarantee no such > mistake would be made. Ah, so there is a small additional restriction (changing constraints on planned queries) that this would affect. > > You had a few questions: > > > > > Main questions: > > > 1. How should we handle the case where *all* inherited relations are > > > excluded? (This is not currently covered in the code). > > > > I assume this means we don't return any rows. Why it is an issue? > > A code question only. No issue, just how should the code look? Ah, so there is no sequential/index scan on anything then. Don't we have another case like this in the code? > > > 2. Should this feature be available for all queries or just inherited > > > relations? > > > > I don't see why other queries should not use this. Our TODO already > > has: > > > > * Use CHECK constraints to influence optimizer decisions > > > > CHECK constraints contain information about the distribution of values > > within the table. This is also useful for implementing subtables where > > a tables content is distributed across several subtables. > > > > and this looks like what you are doing. However, again, I see the > > constraint as just informing whether there might be any rows in the > > table. Am I missing something? Are you thinking views with UNION could > > benefit from this? > > In general, it seems you might want this. In normal use check > constraints tend to be on minor columns, not key columns. Queries that > would be provably able to exclude tables based upon this would be > strange queries. > > i.e. > select count(distinct item_pk) from warehouse where quantity < 0 > > is not a very common query. So we would pay the overhead of checking for > exclusion for all queries when only a few wierd ones would ever take > advantage of it. Sounds like a poor trade-off to me. > > IMHO, the only time you might expect to see benefit is when you have > many similar tables that are partitioned by design into pieces that lend > themselves to exclusion. If you specifically designed a set of tables > and used UNION to bring them together, then I can see that you would > want it then also.... but is there any benefit in supporting two > different ways of achieving the same basic design: partitioned table. I think you are probably right with the GUC at this stage. As the feature is expanded in 8.2, we can then turn it on automatically and remove the GUC. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Simon Riggs wrote: >>> Oh, why would someone want to set enable_constraint_exclusion to false? >> >> The included functionality performs the exclusion at plan time. If a >> query was prepared for later execution, it *could* return the wrong >> answer when the plan was executed at a later time since plans are not >> invalidated when constraints change. So, in general, this should be set >> to false except for circumstances where the user can guarantee no such >> mistake would be made. > Ah, so there is a small additional restriction (changing constraints on > planned queries) that this would affect. This is a stopgap until we have automatic plan invalidation. regards, tom lane
On Sat, 2005-07-02 at 15:56 -0400, Bruce Momjian wrote: > > Main questions: > > 1. How should we handle the case where *all* inherited relations are > > excluded? (This is not currently covered in the code). > > I assume this means we don't return any rows. Why it is an issue? Not so much an issue as tidy up for a corner case within the code. I have 3 ways of handling this case: 1. Set the query to scan the first parent relation only (even though we know we don't have to). 2. Add a flag to RelOptInfo that can be set when this situation occurs, which would then be used to add a Result(false) top level node at the end of query_planner in planmain.c 3. Add function return values for a flag to make_one_rel(), just as in pull_constant_clauses() that would be handled as a a Result(false) top level node at the end of query_planner in planmain.c. This would require changes to call parameters of set_base_rel_pathlists() and set_inherited_rel_pathlist() in allpaths.c At the moment, this case is not handled in my patch. (3) is probably structurally the most similar to other existing calls. I'm not that fussed either way, so will settle towards (3) unless somebody screams.... Best Regards, Simon Riggs
On Sat, 2005-07-02 at 18:29 -0400, Alvaro Herrera wrote: > On Sat, Jul 02, 2005 at 03:56:48PM -0400, Bruce Momjian wrote: > > > Simon Riggs wrote: > > > > > > I enclose a fully working implementation of Constraint Exclusion, a very > > > basic form of Partitioning. Initial review is requested, to allow us all > > > to assess what further work is required on this prior to Beta freeze. > > save a typo in a comment, it said "OR" and the code said "and" in one > of the truth tables There is one place where I say if OR => OR then use the AND truth table, which could possible say "use the same truth table as AND". So the comment, and the code are correct there. Best Regards, Simon Riggs
On Mon, 2005-06-27 at 01:41 +0100, Simon Riggs wrote: > I enclose a fully working implementation of Constraint Exclusion, a very > basic form of Partitioning. Initial review is requested, to allow us all > to assess what further work is required on this prior to Beta freeze. > > Patch against current cvstip; passes make check and all special tests. It has been pointed out to me that the patch has a minor, yet basic error in the planning. This has now been corrected in this patch. It appears that my midnight-patch submission was not such a good idea after all. My full and sincere apologies to anybody that tried and failed on that patch. To err is human... I have also now filled the gap for when all relations are excluded, mentioned as question 1 in my original post. I have now re-tested the patch against cvstip... The special tests files are unchanged. Performance tests and comments welcome. Best Regards, Simon Riggs