Thread: A Guide to Constraint Exclusion (Partitioning)
A Guide to the Constraint Exclusion Feature =========================================== Simon Riggs 2ndQuadrant simon@2ndquadrant.com INTRODUCTION Constraint Exclusion (CE) is an optimizer patch submitted for PostgreSQL 8.1. CE aims to greatly improve the performance for certain types of common queries against large tables. CE extends the logic originally developed for Partial Indexes to allow the optimizer to avoid scanning particular tables for certain queries. CE will exclude a table from the query plan when the query's WHERE clause is proven not to "overlap" with any of the Constraints defined on those tables. No new syntax is required to take advantage of this feature, which is designed to work in conjunction with existing Inheritance features. Some syntax *is* required to declare Constraints, which are already ANSI/ISO SQL:2003 compliant. In other RDBMS these features are often referred to as Partitioning. The PostgreSQL CE feature is much more widely applicable than that name might suggest and will be of benefit to many users, not just for Business Intelligence applications. The aim of this guide is to help prospective users understand how the feature will work, to allow improvement during the 8.1 beta test cycle. All feedback is welcome. TABLE PARTITIONING OVERVIEW Let's look at a practical example of how this works and the performance benefits it provides: In many BI workloads there are a small number of very large tables, often referred to as Fact tables. In order to improve load and query performance against those tables it is often useful to split these larger tables into pieces, known as partitions. PostgreSQL 8.0 could support pseudo-partitioning using the Inheritance feature. Here's a BI Fact table for a Retail DW as an example of that: CREATE TABLE Sales_DateItemOutlet ( DateKey Integer , OutletKey Integer , ItemKey Integer , SoldQty Integer ); This table could be accessed using a query likeSELECT sum(soldqty) FROM Sales_DateItemOutletWHERE DateKey between 20050101and 20050101 This can then be split up so that we have a number of sub-tables: CREATE TABLE Sales_Jan_DateItemOutlet () INHERITS (Sales_DateItemOutlet); CREATE TABLE Sales_Feb_DateItemOutlet () INHERITS (Sales_DateItemOutlet); CREATE TABLE Sales_Mar_DateItemOutlet () INHERITS (Sales_DateItemOutlet); When we load this we put January's data in the Jan table etc, though put no rows at all in the "parent" table. Now, if we run our example query againSELECT sum(soldqty) FROM Sales_DateItemOutletWHERE DateKey between 20050101 and 20050101 we find that the query willScan all rows in Sales_DateItemOutlet (which is empty)Scan all rows in Sales_Jan_DateItemOutletScanall rows in Sales_Feb_DateItemOutletScan all rows in Sales_Mar_DateItemOutlet and return the correct answer. But we know that the query did not really need to have scanned the Feb and Mar tables, since these do not contain any data that would satisfy the query. (The full EXPLAIN output is not shown above, for clarity only.) The new CE functionality aims to improve the performance of such queries. Ideally we would like the query to Scan all rows in Sales_Jan_DateItemOutlet and ignore the parent table, and all other child tables. To do allow this, we must provide more declarative information to allow the optimizer to understand as much as we do. To do this, we add constraints on to each table, so that it is clear what rows they can contain. CREATE TABLE Sales_Jan_DateItemOutlet ( CHECK (DateKey BETWEEN 20050101 AND 20050131) ) INHERITS (Sales_DateItemOutlet); CREATE TABLE Sales_Feb_DateItemOutlet ( CHECK (DateKey BETWEEN 20050201 AND 20050229) ) INHERITS (Sales_DateItemOutlet); CREATE TABLE Sales_Mar_DateItemOutlet ( CHECK (DateKey BETWEEN 20050301 AND 20050331) ) INHERITS (Sales_DateItemOutlet); Now, when we execute our test query, the optimizer can understand that the Feb and Mar tables can *never* hold rows that would match our query. As a result it is able to provably exclude them from the query plan. Thus, with Constraint Exclusion enabled our test querySELECT sum(soldqty) FROM Sales_DateItemOutletWHERE DateKey between20050101 and 20050101 will perform the following scansScan all rows in Sales_DateItemOutlet (which is empty)Scan all rows in Sales_Jan_DateItemOutlet Running an EXPLAIN or EXPLAIN ANALYZE will allow you to see which tables have been included in the query. There is no explicit message to say that a table has been excluded. We aren't able to exclude the parent table from the above query because no Constraint was defined upon it. Since, in our example, the parent is empty there will be little effect on the query performance. It would be a mistake to attempt to get around this by placing a Constraint on the parent, since that would then automatically be created on all child tables also. So we can never exclude the parent without automatically excluding *all* of the children also. Overall then, if the partitions are all the same size, we will have reduced execution time of the query to around one-third of its previous elapsed time. If we had 10 or 100 child tables, certain queries could be 10 or 100 times faster than without the CE feature. In summary, the CE feature will be a huge performance gain for qualifying queries against large tables in PostgreSQL databases. Since no new syntax is required, existing PostgreSQL databases designed to take advantage of inheritance will automatically benefit from the performance enhancements. CONSTRAINT EXCLUSION If you wish to enable this feature you must set the server parameterenable_constraint_exclusion = true This is a user-settable parameter, so could be enabled for specific users or queries. Constraint exclusion is not enabled by default. This is because PostgreSQL does not yet have full plan invalidation when DDL changes are made. If a change were made to any of the tables, the query would not automatically re-optimize, so it would be possible to get an incorrect answer returned from a CE plan. This is not a fault of CE, but simply a side-effect of the current lack of full plan invalidation in PostgreSQL. CE will work only when the query contains a direct and simple restriction of the table, such as Attribute > Constant If Operators are available for that datatype, CE will use = < <= > >= <> These operators must be defined as IMMUTABLE. CE will work for most simple Constraints. Constraints are already limited to IMMUTABLE predicates, such as Attribute > Constant. Constraints likeAttribute BETWEEN Const1 and Const2 can be used to produce ranges of values, and would be generally described as RANGE PARTITIONING. Constraints likeAttribute IN (Const1, Const2, Const3 ...) can be used to produce lists of values, and would generally be described as LIST PARTITIONING. All of these Constraint types could be mixed, to allow Constraints likeAttribute1 IN (Const1, Const2) AND Attribute2 >= Const3 AND Attribute2 < Const4 Thus, the CE feature allows a very flexible partitioning scheme to be developed that could mix LIST, RANGE style partitioning clauses. Currently, there is no restriction that all constraints *must* be mutually exclusive, nor even that the constraints may be similar on each table. This can be useful for some designs where the inheritance hierarchy is not "disjoint", as UML would term this situation. CE does not prevent direct access to one of the child tables in an inheritance hierarchy. In this case, no exclusion test would be performed. Exclusion tests are performed *only* when the parent table in an inheritance hierarchy is accessed. Exclusion tests are performed even if the inheritance hierarchy is many levels deep (e.g. parent-child- grandchild). CE also supports multiple inheritance. PostgreSQL makes no restriction upon what indices are defined on the various child tables. CE is not dependent upon the existence or absence of indices, only upon the Query's WHERE clause and the Constraints defined upon the tables. CE can be very useful for large historical databases. As an example, PostgreSQL would allow a historical data table split into sections like this: - Last 3 months: one table per week, each table defined with 3 indices - 3-12 months: one table per month, no indices defined - 12-36 months: tablespaces defined on hierarchical storage (near-line), with no indices, so de-archived when required for use. The CE feature would allow the optimizer to avoid wasteful de-archiving of the older data, with an appropriate database/query design. The current patch also includes a suite of 700 tests that successfully exclude child tables in all common query types (whilst returning the correct answer!). These include examples of List, Range and mixed partitioning scenarios. CURRENT RESTRICTIONS It is not yet possible to specify that Constraints on child tables will be mutually exclusive of each other. Currently, it would be up to the designer to ensure that, if desired. It is not yet possible to specify that an inheritance parent has no rows, and, if so, should always be excluded from the query. If a parent table has a Constraint defined upon it, then this will be automatically copied to all child tables. Currently, there is no way to tell which Constraints have been inherited from the parent, so exclusion tests will be re-executed against all child tables. This will cause additional optimization time. Currently, all child tables will be considered. It may be possible in the future to pre-sort the list of child tables, so that optimization time can be reduced for parent tables with large numbers of partitions. Currently, there is no index on the pg_inherits system table. As a result, parents with more than 1000 child tables are likely to experience longer than desirable planning times for their queries. CE checks will not currently recognise STABLE functions within a query. So WHERE clauses such asDateKey > CURRENT DATE will not cause exclusion because CURRENT DATE is a STABLE function. CE checks are not made when the parent table is involved in a join. Other existing restrictions on Inherited tables continue to apply. Further enhancements to the CE feature can be expected in the future. COMPARISONS WITH OTHER RDBMS In brief: Teradata and Oracle already have table Partitioning. Oracle's partitioning allows either LIST or RANGE partitioning. The new PostgreSQL CE feature is considerably more flexible than Oracle's declarative Partitioning syntax, and is achieved without adding non-standard SQL extensions. Both Sybase and MS SQLServer 2005 have new Partitioning features released this year. The Microsoft feature does allow joins between tables to exclude partitions. However, it also requires that all partitions should have identical indexes, which is a major blocking factor to the use of very large tables. DB2 has partitioning using UNION ALL views, which includes somewhat similar functionality to the PostgreSQL CE feature. DB2 uses the PARTITION keyword to signify a different type of feature, so do not be confused that they do have this feature with declarative syntax. =================================================================
Sounds very useful - even for my small systems. Does it/would it work on an ordinary table (for those cases currently using UNION ALL)? -- Richard Huxton Archonet Ltd
On Wed, 2005-07-13 at 14:02 +0100, Richard Huxton wrote: > Sounds very useful - even for my small systems. Does it/would it work on > an ordinary table (for those cases currently using UNION ALL)? I'm looking into that aspect right now. I see no reason why it shouldn't work for UNION ALL views/queries in the same manner as it does for inheritance. Best Regards, Simon Riggs
On K, 2005-07-13 at 16:10 +0100, Simon Riggs wrote: > On Wed, 2005-07-13 at 14:02 +0100, Richard Huxton wrote: > > Sounds very useful - even for my small systems. Does it/would it work on > > an ordinary table (for those cases currently using UNION ALL)? > > I'm looking into that aspect right now. > > I see no reason why it shouldn't work for UNION ALL views/queries in the > same manner as it does for inheritance. Why does the CE speed depend on having index on pg_inherits ? Can't you just check the constraints on individual tables ? Or are the constraints not actually copied to child tables, and you must look them up from parent tabless ? -- Hannu Krosing <hannu@tm.ee>
On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote: > > We aren't able to exclude the parent table from the above query because > no Constraint was defined upon it. Since, in our example, the parent is > empty there will be little effect on the query performance. It would be > a mistake to attempt to get around this by placing a Constraint on the > parent, since that would then automatically be created on all child > tables also. So we can never exclude the parent without automatically > excluding *all* of the children also. At least in 8.0 you can drop the inherited constraint from child table: hannu=# create table empty_master_table ( hannu(# id serial, hannu(# data text, hannu(# constraint table_must_be_empty check(false) hannu(# ); NOTICE: CREATE TABLE will create implicit sequence "empty_master_table_id_seq" for "serial" column "empty_master_table.id" CREATE TABLE hannu=# insert into empty_master_table (data) values (NULL); ERROR: new row for relation "empty_master_table" violates check constraint "tab le_must_be_empty" hannu=# create table first_partition() inherits (empty_master_table); CREATE TABLE hannu=# \d first_partition Table "public.first_partition"Column | Type | Modifiers --------+--------- +--------------------------------------------------------------------id | integer | not null default nextval ('public.empty_master_table_id_seq'::text)data | text | Check constraints: "table_must_be_empty" CHECK (false) Inherits: empty_master_table hannu=# alter table first_partition drop constraint table_must_be_empty; ALTER TABLE hannu=# \d first_partition Table "public.first_partition"Column | Type | Modifiers --------+--------- +--------------------------------------------------------------------id | integer | not null default nextval ('public.empty_master_table_id_seq'::text)data | text | Inherits: empty_master_table hannu=# \d empty_master_table Table "public.empty_master_table"Column | Type | Modifiers --------+--------- +--------------------------------------------------------------------id | integer | not null default nextval ('public.empty_master_table_id_seq'::text)data | text | Check constraints: "table_must_be_empty" CHECK (false) hannu=# insert into first_partition(data) values ('first_partition'); INSERT 19501405 1 hannu=# I imagine that this kind of thing does not work well with pg_dump, but it is at least possible. > Currently, there is no restriction that all constraints *must* be > mutually exclusive, nor even that the constraints may be similar on each > table. This can be useful for some designs where the inheritance > hierarchy is not "disjoint", as UML would term this situation. actually this is GOOD, as this way I can have a constraint on both insert_timestamp and primary_key fields, which are mostly but not absolutely in the same order. And also to add extra IN (X,Y,Z) constraints for some other fields. > CE does not prevent direct access to one of the child tables in an > inheritance hierarchy. In this case, no exclusion test would be > performed. Exclusion tests are performed *only* when the parent table in > an inheritance hierarchy is accessed. Exclusion tests are performed even > if the inheritance hierarchy is many levels deep (e.g. parent-child- > grandchild). CE also supports multiple inheritance. I'd like to see an option to ALWAYS do CE, inheritance or union (all) or even simple queries. > CURRENT RESTRICTIONS > > It is not yet possible to specify that Constraints on child tables will > be mutually exclusive of each other. Currently, it would be up to the > designer to ensure that, if desired. > > It is not yet possible to specify that an inheritance parent has no > rows, and, if so, should always be excluded from the query. I think that a simple "CHECK(false)" constraint should be enough for this. > If a parent table has a Constraint defined upon it, then this will be > automatically copied to all child tables. But they can be removed later if desired. > Currently, there is no way to > tell which Constraints have been inherited from the parent, so exclusion > tests will be re-executed against all child tables. This will cause > additional optimization time. Have you done any performance testing, i.e. what is the actual impact of CE on planning time ? > Currently, all child tables will be considered. It may be possible in > the future to pre-sort the list of child tables, so that optimization > time can be reduced for parent tables with large numbers of partitions. > > Currently, there is no index on the pg_inherits system table. As a > result, parents with more than 1000 child tables are likely to > experience longer than desirable planning times for their queries. Am I right that this is a general postgresql issue and has nothing to do with CE ? > CE checks will not currently recognise STABLE functions within a query. > So WHERE clauses such as > DateKey > CURRENT DATE > will not cause exclusion because CURRENT DATE is a STABLE function. > > CE checks are not made when the parent table is involved in a join. Is this also the case where parent table is inside subquery and that subquery is involved in a join? Or do I have to make sure that it is not lifted out of that subquery using something like pl/pgsql function ? > Other existing restrictions on Inherited tables continue to apply. WHat happens for multiple inheritance ? Is it detected and then also excluded from CE ? Or is that just a "don't do it" item ? > Further enhancements to the CE feature can be expected in the future. Great! :D And a big Thank You! -- Hannu Krosing <hannu@tm.ee>
On N, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote: > On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote: > > > > > We aren't able to exclude the parent table from the above query because > > no Constraint was defined upon it. Since, in our example, the parent is > > empty there will be little effect on the query performance. It would be > > a mistake to attempt to get around this by placing a Constraint on the > > parent, since that would then automatically be created on all child > > tables also. So we can never exclude the parent without automatically > > excluding *all* of the children also. hannu=# create table ttt(id serial, t text); NOTICE: CREATE TABLE will create implicit sequence "ttt_id_seq" for "serial" co lumn "ttt.id" CREATE TABLE hannu=# create table ttt2() inherits (ttt); CREATE TABLE hannu=# alter table ONLY ttt add constraint ccc check(false); ALTER TABLE this also works, but new inherited tables will still inherit the constraint. Perhaps we just need a way to store the "ONLY" status of the constraint, and then not inherit these. -- Hannu Krosing <hannu@skype.net>
Hannu Krosing <hannu@tm.ee> writes: > > It is not yet possible to specify that an inheritance parent has no > > rows, and, if so, should always be excluded from the query. > > I think that a simple "CHECK(false)" constraint should be enough for > this. huh, that's clever. > > CE checks are not made when the parent table is involved in a join. > > Is this also the case where parent table is inside subquery and that > subquery is involved in a join? I assume this is a TODO and just not in your first batch of work? It seems like a pretty important piece eventually. Are there any fundamental difficulties with handling joins eventually? -- greg
On Thu, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote: > On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote: > > We aren't able to exclude the parent table from the above query because > > no Constraint was defined upon it. Since, in our example, the parent is > > empty there will be little effect on the query performance. It would be > > a mistake to attempt to get around this by placing a Constraint on the > > parent, since that would then automatically be created on all child > > tables also. So we can never exclude the parent without automatically > > excluding *all* of the children also. > > At least in 8.0 you can drop the inherited constraint from child table: Yes, you can. But in discussion on -hackers on 20 May there was agreement (OK, me agreeing with Tom) that being able to drop inherited constraints was a bug (or at least a deprecated feature...) > > Currently, there is no restriction that all constraints *must* be > > mutually exclusive, nor even that the constraints may be similar on each > > table. This can be useful for some designs where the inheritance > > hierarchy is not "disjoint", as UML would term this situation. > > actually this is GOOD, as this way I can have a constraint on both > insert_timestamp and primary_key fields, which are mostly but not > absolutely in the same order. And also to add extra IN (X,Y,Z) > constraints for some other fields. Yes, understood. My description was not very good. I did not mean that constraints on any one table should be mutually exclusive. I meant that a set of constraints across a set of child tables should be able to be defined mutually exclusive (e.g. just as Oracle partitions are). > > CE does not prevent direct access to one of the child tables in an > > inheritance hierarchy. In this case, no exclusion test would be > > performed. Exclusion tests are performed *only* when the parent table in > > an inheritance hierarchy is accessed. Exclusion tests are performed even > > if the inheritance hierarchy is many levels deep (e.g. parent-child- > > grandchild). CE also supports multiple inheritance. > > I'd like to see an option to ALWAYS do CE, inheritance or union (all) or > even simple queries. I've argued against that, but I guess if there is enough opinion I can be persuaded. My argument is that most constraints look something likeCHECK ( salary > 0 and salary < 1000000) If you ask a query likeselect count(*) from emp where salary < -10000 then CE would work great. But you don't find many people asking such nonsensical questions, or at least very often. Those queries do get asked, but they seem either naive or "data profiling" type queries. So my conclusion is that for most constraints and most queries, CE is literally just a waste of planning time. When a designer deliberately creates a scenario where constraints have meaning, as in a mutually exclusive inheritance hierarchy or classic partitioned table design, then you have a high chance of CE being effective. The indicator for the value of CE, IMHO, is the "and/or" aspect of multiple related tables. Thats my argument, but I'm willing to hear counter arguments. I just wish to avoid overselling CE and slowing most people's queries down for no good reason. > > CURRENT RESTRICTIONS > > > > It is not yet possible to specify that Constraints on child tables will > > be mutually exclusive of each other. Currently, it would be up to the > > designer to ensure that, if desired. > > > > It is not yet possible to specify that an inheritance parent has no > > rows, and, if so, should always be excluded from the query. > > I think that a simple "CHECK(false)" constraint should be enough for > this. That works, but as I said.... > > If a parent table has a Constraint defined upon it, then this will be > > automatically copied to all child tables. > > But they can be removed later if desired. But should not be... > > Currently, there is no way to > > tell which Constraints have been inherited from the parent, so exclusion > > tests will be re-executed against all child tables. This will cause > > additional optimization time. > > Have you done any performance testing, i.e. what is the actual impact of > CE on planning time ? I think "bad" would be my assessment. But not "very bad". But this must be offset by the extraordinary time saving in execution time. Planning time depends upon the complexity of the query, the number and complexity of the constraints and the number of child tables. I think I would guess currently at child tables of around 1 GB, with sizes as small as 100 MB being reasonable. Trying to keep table size relatively evenly distributed would be useful also. With those recommendations it will always be worth 0.1 sec investment in trying to avoid 100+ sec sequential scan times. But again, you need to consider your workload. > > Currently, all child tables will be considered. It may be possible in > > the future to pre-sort the list of child tables, so that optimization > > time can be reduced for parent tables with large numbers of partitions. > > > > Currently, there is no index on the pg_inherits system table. As a > > result, parents with more than 1000 child tables are likely to > > experience longer than desirable planning times for their queries. > > Am I right that this is a general postgresql issue and has nothing to do > with CE ? Yep. > > CE checks will not currently recognise STABLE functions within a query. > > So WHERE clauses such as > > DateKey > CURRENT DATE > > will not cause exclusion because CURRENT DATE is a STABLE function. > > > > CE checks are not made when the parent table is involved in a join. > > Is this also the case where parent table is inside subquery and that > subquery is involved in a join? My comment was too terse. What I meant was that you can't do dynamic exclusion based upon the results of a join. i.e. PPUC2 > Or do I have to make sure that it is not lifted out of that subquery > using something like pl/pgsql function ? > > > Other existing restrictions on Inherited tables continue to apply. > > WHat happens for multiple inheritance ? CE works. > Is it detected and then also excluded from CE ? No > Or is that just a "don't do it" item ? Thats a religious debate.... I'll leave that one. Best Regards, Simon Riggs
On Thu, 2005-07-14 at 15:30 -0400, Greg Stark wrote: > Hannu Krosing <hannu@tm.ee> writes: > > > > It is not yet possible to specify that an inheritance parent has no > > > rows, and, if so, should always be excluded from the query. > > > > I think that a simple "CHECK(false)" constraint should be enough for > > this. > > huh, that's clever. But not clever enough... see my reply to Hannu. Sorry. > > > CE checks are not made when the parent table is involved in a join. > > > > Is this also the case where parent table is inside subquery and that > > subquery is involved in a join? > > I assume this is a TODO and just not in your first batch of work? It seems > like a pretty important piece eventually. Are there any fundamental > difficulties with handling joins eventually? Its a reasonable size piece of work, and could not be reworked in time for 8.1. Some other mental leaps may also be required. Best Regards, Simon Riggs
>>> CE checks will not currently recognise STABLE functions within a query. >>> So WHERE clauses such as >>> DateKey > CURRENT DATE >>> will not cause exclusion because CURRENT DATE is a STABLE function. >>> >>> CE checks are not made when the parent table is involved in a join. >> >> Is this also the case where parent table is inside subquery and that >> subquery is involved in a join? > > My comment was too terse. What I meant was that you can't do dynamic > exclusion based upon the results of a join. i.e. PPUC2 Phew! Correlated subqueries won't CE then, but the more common complex queries will. We'll test with some common ones soon. - Luke
On N, 2005-07-14 at 21:29 +0100, Simon Riggs wrote: > On Thu, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote: > > On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote: > > > > We aren't able to exclude the parent table from the above query because > > > no Constraint was defined upon it. Since, in our example, the parent is > > > empty there will be little effect on the query performance. It would be > > > a mistake to attempt to get around this by placing a Constraint on the > > > parent, since that would then automatically be created on all child > > > tables also. So we can never exclude the parent without automatically > > > excluding *all* of the children also. > > > > At least in 8.0 you can drop the inherited constraint from child table: > > Yes, you can. But in discussion on -hackers on 20 May there was > agreement (OK, me agreeing with Tom) that being able to drop inherited > constraints was a bug (or at least a deprecated feature...) Why is it a bug ? Tom wrote on May 20: > I think a good argument can be made that the above behavior is a bug, > and that the ALTER command should have been rejected. We've gone to > great lengths to make sure you can't ALTER a child table to make it > incompatible with the parent in terms of the column names and types; > shouldn't this be true of check constraints as well? There is a good fundamental reason why we dont let people drop columns from children or to add them to parent ONLY: if we did not, then there would be no way to query from the hierarchy. I can't see any such reason for forbidding dropping constraints from child tables or disallowing CREAETE CONSTRAINT ON parent ONLY , at least not for CHECK constraints. And even disallowing it can probably be circumvented by a clever functional CHECK constraint, which checks for also the table it is defined on. OTOH, disallowing all this for child FOREIGN KEY's seems logical though. OTOOH, to be symmetrical with previous, we should also have UNIQUE and PK constraints that span all the inheritance hierarchy, but I don't want to go there now :) > > > CE does not prevent direct access to one of the child tables in an > > > inheritance hierarchy. In this case, no exclusion test would be > > > performed. Exclusion tests are performed *only* when the parent table in > > > an inheritance hierarchy is accessed. Exclusion tests are performed even > > > if the inheritance hierarchy is many levels deep (e.g. parent-child- > > > grandchild). CE also supports multiple inheritance. > > > > I'd like to see an option to ALWAYS do CE, inheritance or union (all) or > > even simple queries. > > I've argued against that, but I guess if there is enough opinion I can > be persuaded. > > My argument is that most constraints look something like > CHECK ( salary > 0 and salary < 1000000) > If you ask a query like > select count(*) from emp where salary < -10000 > then CE would work great. But you don't find many people asking such > nonsensical questions, or at least very often. Those queries do get > asked, but they seem either naive or "data profiling" type queries. So > my conclusion is that for most constraints and most queries, CE is > literally just a waste of planning time. CE may have some use for automatically generated queries, but in general I agree with you. Btw, not just UNION ALL, but also simple UNION, INTERSECT and EXCEPT could probably be taught to use CE at some stage. Also CE could at some stage be used to drop the UNIQUEifying nodes from UNION (without ALL), if it can prove that the UNION is already UNIQUE. > > > If a parent table has a Constraint defined upon it, then this will be > > > automatically copied to all child tables. > > > > But they can be removed later if desired. > > But should not be... FOREIGN KEY constraints should not, but I think that simple CHECK's could. > > Have you done any performance testing, i.e. what is the actual impact of > > CE on planning time ? > > I think "bad" would be my assessment. But not "very bad". But this must > be offset by the extraordinary time saving in execution time. > > Planning time depends upon the complexity of the query, the number and > complexity of the constraints and the number of child tables. > > I think I would guess currently at child tables of around 1 GB, with > sizes as small as 100 MB being reasonable. Trying to keep table size > relatively evenly distributed would be useful also. With those > recommendations it will always be worth 0.1 sec investment in trying to > avoid 100+ sec sequential scan times. But again, you need to consider > your workload. That's why I asked for GUC, not a default behaviour ;) > > > CE checks will not currently recognise STABLE functions within a query. > > > So WHERE clauses such as > > > DateKey > CURRENT DATE > > > will not cause exclusion because CURRENT DATE is a STABLE function. > > > > > > CE checks are not made when the parent table is involved in a join. > > > > Is this also the case where parent table is inside subquery and that > > subquery is involved in a join? > > My comment was too terse. What I meant was that you can't do dynamic > exclusion based upon the results of a join. i.e. PPUC2 but what about _static_ exlusion based on constraints ? I mean if there is a left side table with say a single partition having CHECK(id_order BETWEEN 1 AND 1000) (either originally or left after eliminating other by other constraints) and 3 right side partition with CHECK(key_order BETWEEN 1 AND 1000) CHECK(key_order BETWEEN 1001 AND 2000) CHECK(key_orderBETWEEN 2001 AND 3000) then the 3rd one could be eliminated statically from a join on id_order=key_order ... > > WHat happens for multiple inheritance ? > > CE works. > > > Is it detected and then also excluded from CE ? > > No > > > Or is that just a "don't do it" item ? > > Thats a religious debate.... I'll leave that one. :) > Best Regards, Simon Riggs And thanks for the good work so far! -- Hannu Krosing <hannu@skype.net>
On Fri, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote: > On N, 2005-07-14 at 21:29 +0100, Simon Riggs wrote: > > On Thu, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote: > > > On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote: > > > > > > We aren't able to exclude the parent table from the above query because > > > > no Constraint was defined upon it. Since, in our example, the parent is > > > > empty there will be little effect on the query performance. It would be > > > > a mistake to attempt to get around this by placing a Constraint on the > > > > parent, since that would then automatically be created on all child > > > > tables also. So we can never exclude the parent without automatically > > > > excluding *all* of the children also. > > > > > > At least in 8.0 you can drop the inherited constraint from child table: > > > > Yes, you can. But in discussion on -hackers on 20 May there was > > agreement (OK, me agreeing with Tom) that being able to drop inherited > > constraints was a bug (or at least a deprecated feature...) > > Why is it a bug ? > > Tom wrote on May 20: > > I think a good argument can be made that the above behavior is a bug, > > and that the ALTER command should have been rejected. We've gone to > > great lengths to make sure you can't ALTER a child table to make it > > incompatible with the parent in terms of the column names and types; > > shouldn't this be true of check constraints as well? > > There is a good fundamental reason why we dont let people drop columns > from children or to add them to parent ONLY: if we did not, then there > would be no way to query from the hierarchy. > > I can't see any such reason for forbidding dropping constraints from > child tables or disallowing CREAETE CONSTRAINT ON parent ONLY , at least > not for CHECK constraints. If we allow DROPing them, why allow them to be inherited in the first place? One or the other, not both. I do still agree with Tom on that, but as I said at the time, I don't see it as a big enough problem to spend time removing that feature. But I personally wouldn't grow to rely on its existence either. There are other ways... > And even disallowing it can probably be circumvented by a clever > functional CHECK constraint, which checks for also the table it is > defined on. > > OTOH, disallowing all this for child FOREIGN KEY's seems logical though. > > OTOOH, to be symmetrical with previous, we should also have UNIQUE and > PK constraints that span all the inheritance hierarchy, but I don't want > to go there now :) Well, I am working towards that. Best Regards, Simon Riggs
On Fri, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote: > Btw, not just UNION ALL, but also simple UNION, INTERSECT and EXCEPT > could probably be taught to use CE at some stage. It turns out that to solve this problem you very nearly have to solve the "any table" problem. Thats an extra argument in favour of making this work for any table. > That's why I asked for GUC, not a default behaviour ;) > but what about _static_ exlusion based on constraints ? > > I mean if there is a left side table with say a single partition having > CHECK(id_order BETWEEN 1 AND 1000) > (either originally or left after eliminating other by other constraints) > > and 3 right side partition with > CHECK(key_order BETWEEN 1 AND 1000) > CHECK(key_order BETWEEN 1001 AND 2000) > CHECK(key_order BETWEEN 2001 AND 3000) > > then the 3rd one could be eliminated statically from a join on > id_order=key_order Well, SQL allows you to express almost any query, but that doesn't mean it is all 3 of: frequently occcuring, meaningful and interesting. Have you ever seen such a construct? I think we might be able to use equivalence to show that a restriction on one table could be translated into a restriction on the Fact table. e.g. SELECT FROM Fact, OtherTable WHERE Fact.Key = OtherTable.Key AND OtherTable.Key > 28000 But the harder and yet more common problem is where there is no direct restriction on the equivalent join column. Thats the one I would eventually seek to solve e.g. SELECT FROM Fact, Dimension WHERE Fact.Key = Dimension.Key AND Dimension.DescriptiveField = 'Blah' where there is a relationship between DescriptiveField and Key enshrined within the Dimension table. > and thanks for the good work so far! Thank you, Best Regards, Simon Riggs
>> I assume this is a TODO and just not in your first batch of work? It seems >> like a pretty important piece eventually. Are there any fundamental >> difficulties with handling joins eventually? > > Its a reasonable size piece of work, and could not be reworked in time > for 8.1. As you've said previously, when you refer to joins not benefiting from CE, you mean data-driven joins, or joins whose predicates are dynamically determined, right? - Luke
Hannu, >> My comment was too terse. What I meant was that you can't do dynamic >> exclusion based upon the results of a join. i.e. PPUC2 > > but what about _static_ exlusion based on constraints ? > > I mean if there is a left side table with say a single partition having > CHECK(id_order BETWEEN 1 AND 1000) > (either originally or left after eliminating other by other constraints) > > and 3 right side partition with > CHECK(key_order BETWEEN 1 AND 1000) > CHECK(key_order BETWEEN 1001 AND 2000) > CHECK(key_order BETWEEN 2001 AND 3000) > > then the 3rd one could be eliminated statically from a join on > id_order=key_order I would expect that the 2nd and 3rd partitions would be CE'ed if the basic support is there. Does your current implementation include this capability Simon? Or is it limited to use with constant predicates? - Luke
Simon, > SELECT > FROM Fact, Dimension > WHERE Fact.Key = Dimension.Key > AND Dimension.DescriptiveField = 'Blah' So, what happens with this: SELECT FROM Fact, Dimension WHERE Fact.Key = Dimension.Key AND Fact.part = 100; With Fact defined with 3 partitions: CHECK(Part BETWEEN 1 AND 1000) CHECK(Part BETWEEN 1001 AND 2000) CHECK(PartBETWEEN 2001 AND 3000) Will the 2nd and 3rd partitions be eliminated? - Luke
On R, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote: > > but what about _static_ exlusion based on constraints ? > > I mean if there is a left side table with say a single partition having > CHECK(id_order BETWEEN 1 AND 1000) > (either originally or left after eliminating other by other constraints) > > and 3 right side partition with > CHECK(key_order BETWEEN 1 AND 1000) > CHECK(key_order BETWEEN 1001 AND 2000) > CHECK(key_order BETWEEN 2001 AND 3000) > > then the 3rd one could be eliminated statically from a join on > id_order=key_order the simplest form of this seems to be carrying checks to both ends of joins before CE. so for two partitioned tables "main" and "detail", and query select * from main m, detail d where m.id_main = d.key_main and m.id_main in (1,7,42) CE is done based on main.id_main in (1,7,42) and detail.key_mainin (1,7,42) Or perhaps this carrying over is already done automatically by postgres planner before CE ? -- Hannu Krosing <hannu@skype.net>
On Thu, 2005-07-14 at 15:16 -0700, Luke Lonergan wrote: > Simon, > > > SELECT > > FROM Fact, Dimension > > WHERE Fact.Key = Dimension.Key > > AND Dimension.DescriptiveField = 'Blah' > > So, what happens with this: > > SELECT > FROM Fact, Dimension > WHERE Fact.Key = Dimension.Key > AND Fact.part = 100; > > With Fact defined with 3 partitions: > CHECK(Part BETWEEN 1 AND 1000) > CHECK(Part BETWEEN 1001 AND 2000) > CHECK(Part BETWEEN 2001 AND 3000) > > Will the 2nd and 3rd partitions be eliminated? Yes. That is because the clause "part = 100" is a direct restriction on the partitioned table, using an immutable operator and a constant. Best Regards, Simon Riggs
On Thu, 2005-07-14 at 15:07 -0700, Luke Lonergan wrote: > Hannu, > > >> My comment was too terse. What I meant was that you can't do dynamic > >> exclusion based upon the results of a join. i.e. PPUC2 > > > > but what about _static_ exlusion based on constraints ? > > > > I mean if there is a left side table with say a single partition having > > CHECK(id_order BETWEEN 1 AND 1000) > > (either originally or left after eliminating other by other constraints) > > > > and 3 right side partition with > > CHECK(key_order BETWEEN 1 AND 1000) > > CHECK(key_order BETWEEN 1001 AND 2000) > > CHECK(key_order BETWEEN 2001 AND 3000) > > > > then the 3rd one could be eliminated statically from a join on > > id_order=key_order > > I would expect that the 2nd and 3rd partitions would be CE'ed if the basic > support is there. Does your current implementation include this capability > Simon? No > Or is it limited to use with constant predicates? Yes. It's the first time I've thought to compare the constraint predicates on joined tables based upon the join restriction. That's possible, but would take some time to work out. I've argued that such a construct is not common. I'm open to suggestions about what *is* common... Best Regards, Simon Riggs
On Fri, 2005-07-15 at 01:20 +0300, Hannu Krosing wrote: > On R, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote: > > > > > but what about _static_ exlusion based on constraints ? > > > > I mean if there is a left side table with say a single partition having > > CHECK(id_order BETWEEN 1 AND 1000) > > (either originally or left after eliminating other by other constraints) > > > > and 3 right side partition with > > CHECK(key_order BETWEEN 1 AND 1000) > > CHECK(key_order BETWEEN 1001 AND 2000) > > CHECK(key_order BETWEEN 2001 AND 3000) > > > > then the 3rd one could be eliminated statically from a join on > > id_order=key_order > > the simplest form of this seems to be carrying checks to both ends of > joins before CE. > > so for two partitioned tables "main" and "detail", and query > > select * > from main m, > detail d > where m.id_main = d.key_main > and m.id_main in (1,7,42) > > CE is done based on main.id_main in (1,7,42) and detail.key_mainin > (1,7,42) > > Or perhaps this carrying over is already done automatically by postgres > planner before CE ? Not sure... will check. I don't do it explicitly, but could do, given time. Best Regards, Simon Riggs
Simon, > It's the first time I've thought to compare the constraint predicates on > joined tables based upon the join restriction. That's possible, but > would take some time to work out. > > I've argued that such a construct is not common. I'm open to suggestions > about what *is* common... I also don't believe it is commonly used. There are occasions where fact tables are joined on a common key, but for CE to be effective, the key would have to be used in a similar way in constructing the partitions. There are many other ways to accomplish useful patterns, like partitioning fact tables based on a date based key, then using a constant predicate in the join, e.g: SELECT * FROM Fact AND Dimension WHERE Fact.a = Dimension.a AND to_char(T1.key,"MONTH") = 'JUNE' AND to_char(T1.key,"YYYY")= '2003'; With Fact defined with 12 partitions CHECK(to_char(key,"MONTH") EQUAL 'JANUARY') ... CHECK(to_char(key,"MONTH") EQUAL 'DECEMBER') - Luke
On Fri, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote: > And thanks for the good work so far! I think I should add: thanks for the good ideas so far. Your ideas and challenges have been essential to progress to date, as has been all the discussions and feedback. This is the beginning of a journey, not the end. CE for 8.1 is really just a taster of what could be for 8.2. I hope that doesn't dissuade anybody from thinking we should "wait for the full functionality". Best Regards, Simon Riggs
On R, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote: > There is a good fundamental reason why we dont let people drop columns > from children or to add them to parent ONLY: if we did not, then there > would be no way to query from the hierarchy. > > I can't see any such reason for forbidding dropping constraints from > child tables or disallowing CREAETE CONSTRAINT ON parent ONLY , at least > not for CHECK constraints. > > And even disallowing it can probably be circumvented by a clever > functional CHECK constraint, which checks for also the table it is > defined on. > > OTOH, disallowing all this for child FOREIGN KEY's seems logical though. Well, having thought more about it, I don't think that row-level constraints (CHECK, FK and NULL) should be restricted at all by inheritance hierarchy. Table-level constraints (UNIQUE and PK) should, but they need to be fixed first. What should be done with CHECK, FK and NULL is making a way for marking them as being defined on some table 'ONLY' if created as such. It should also be marked with ONLY when any of the constraints inherited from this one is dropped from child. Then all direct ancestor copies should get 'ONLY' status so thet they can be dumped properly. Or perhaps just disallow dropping constraints from children, but still allow creating constraints on ONLY parent table, which seems a cleaner solution. Why I also think that allowing this for FK's is good, is that it allows partitioning tables on both sides of FK relation even without global UNIQUE/PK constraints, by partitioning both of them on same or compatible boundaries. By compatible I mean here that we could partition the PK table on say PK values with a step of 1M but FK table on values with step of 100K, so that they both will have approximately the same number of rows per partition and there will be 10 tables with FK constraints pointing to the same PK partition. What comes to Toms's May 20 argument that people would be surprised when they select form a table whith check(i>0) constraint and get out i<0 then I think this is a question of education. If they don't know better, they will be equally surprised when changing a row in parent table actually changes a row in child table, or when a row inserted into child shows up in parent. The ONLY status of constraints should be accounted for in pg_dump so it will work right and also shown in psql's \d output so it confuses users less :) -- Hannu Krosing <hannu@tm.ee>
Bruce: could you change this TODO item o Prevent child tables from altering constraints like CHECK that were inherited from the parent table to o Prevent new child tables from inheriting constraints defined on parents with keyword ONLY: alter table ONLY t addconstraint c check(x=y); Prevent child tables from altering constraints that were inherited from the parent table(i.e were defined without ONLY/with ALL) This is a less restrictive solution to the same problem. On N, 2005-07-14 at 22:41 +0100, Simon Riggs wrote: > On Fri, 2005-07-15 at 00:24 +0300, Hannu Krosing wrote: > > On N, 2005-07-14 at 21:29 +0100, Simon Riggs wrote: > > > On Thu, 2005-07-14 at 00:13 +0300, Hannu Krosing wrote: > > > > On K, 2005-07-13 at 11:53 +0100, Simon Riggs wrote: > > > > > > > > We aren't able to exclude the parent table from the above query because > > > > > no Constraint was defined upon it. Since, in our example, the parent is > > > > > empty there will be little effect on the query performance. It would be > > > > > a mistake to attempt to get around this by placing a Constraint on the > > > > > parent, since that would then automatically be created on all child > > > > > tables also. So we can never exclude the parent without automatically > > > > > excluding *all* of the children also. > > > > > > > > At least in 8.0 you can drop the inherited constraint from child table: > > > > > > Yes, you can. But in discussion on -hackers on 20 May there was > > > agreement (OK, me agreeing with Tom) that being able to drop inherited > > > constraints was a bug (or at least a deprecated feature...) > > > > Why is it a bug ? > > > > Tom wrote on May 20: > > > I think a good argument can be made that the above behavior is a bug, > > > and that the ALTER command should have been rejected. We've gone to > > > great lengths to make sure you can't ALTER a child table to make it > > > incompatible with the parent in terms of the column names and types; > > > shouldn't this be true of check constraints as well? > > > > There is a good fundamental reason why we dont let people drop columns > > from children or to add them to parent ONLY: if we did not, then there > > would be no way to query from the hierarchy. > > > > I can't see any such reason for forbidding dropping constraints from > > child tables or disallowing CREATE CONSTRAINT ON parent ONLY , at least > > not for CHECK constraints. > > If we allow DROPing them, why allow them to be inherited in the first > place? One or the other, not both. > > I do still agree with Tom on that, but as I said at the time, I don't > see it as a big enough problem to spend time removing that feature. But > I personally wouldn't grow to rely on its existence either. > > There are other ways... > > > And even disallowing it can probably be circumvented by a clever > > functional CHECK constraint, which checks for also the table it is > > defined on. > > > > OTOH, disallowing all this for child FOREIGN KEY's seems logical though. > > > > OTOOH, to be symmetrical with previous, we should also have UNIQUE and > > PK constraints that span all the inheritance hierarchy, but I don't want > > to go there now :) > > Well, I am working towards that. > > Best Regards, Simon Riggs > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org -- Hannu Krosing <hannu@skype.net>
On Fri, 2005-07-15 at 11:10 +0300, Hannu Krosing wrote: > Bruce: could you change this TODO item > > o Prevent child tables from altering constraints like CHECK that were > inherited from the parent table > > to > > o Prevent new child tables from inheriting constraints defined on > parents with keyword ONLY: > alter table ONLY t add constraint c check(x=y); > Prevent child tables from altering constraints that were > inherited from the parent table (i.e were defined without > ONLY/with ALL) My thought was to introduce a new type of constraint, ABSTRACT. This would not be inherited (it is not a check constraint). It would allow a table to be declared "will never contain rows". This fits better with the ideas and common understandings of inheritance. Best Regards, Simon Riggs
On Fri, 15 Jul 2005, Hannu Krosing wrote: > What comes to Toms's May 20 argument that people would be surprised when > they select form a table whith check(i>0) constraint and get out i<0 > then I think this is a question of education. I potentially disagree. What are we trying to model here? Systems which allow children to have values that are not in the domain of the parent are useful for some things, but they also potentially lose some other useful properties like substitutability since a child isn't a parent (or more specifically f(parent) cannot assume parent's invarients are valid on its argument which may be a child unless they are applied to the child at call time).
Simon Riggs <simon@2ndquadrant.com> writes: > In summary, the CE feature will be a huge performance gain for > qualifying queries against large tables in PostgreSQL databases. BTW, before we spend too much time chasing an emperor that may have no clothes, it's worth asking whether this feature is really going to buy anything in the real world. What is bothering me at the moment is the thought that the partitioning key would normally be indexed within each table, and that an indexscan that's off the end of the indexed range is cheap. For example, you write > Now, if we run our example query again > SELECT sum(soldqty) FROM Sales_DateItemOutlet > WHERE DateKey between 20050101 and 20050101 > we find that the query will > Scan all rows in Sales_DateItemOutlet (which is empty) > Scan all rows in Sales_Jan_DateItemOutlet > Scan all rows in Sales_Feb_DateItemOutlet > Scan all rows in Sales_Mar_DateItemOutlet but the "scan all rows" will only happen if no index is provided on DateKey in the child tables. Otherwise the planner will probably select plans like this: -> Index Scan using i1 on sales_jan_dateitemoutlet sales_dateitemoutlet (cost=0.00..5.98 rows=1 width=0) Index Cond: ((datekey >= 20050101) AND (datekey <= 20050101)) for each child table for which the statistics indicate that no rows are likely to be selected. This will fall through quite quickly in practice, meaning that the "huge performance gain" from not doing it at all is a bit oversold. (Note that it's already true that each child table is planned separately, so the plan for the partition that *is* targeted by the query may be different.) AFAICS, CE as proposed is only worth bothering with if the partitioning key is something you would not want to create indexes on; which does not strike me as a major use-case. It'd be more attractive if the methodology allowed an inheritance or union-all collection to be reduced to one single table (ie, CE gets rid of all but one collection member) and then that could be planned as if it were a primitive table entry (ie, no Append plan node). This doesn't help much for simple queries on the fact table but it is interesting for join cases, because if the Append is in the way there's no way to handle inner indexscan join plans. Unfortunately the patch as proposed is a long way from being able to do that, and given the current semantics of inherited constraints it's not even remotely feasible, since as you note we can't put a constraint on just the parent table. We could maybe do it for UNION ALL views, but the patch doesn't handle that case. So at the moment I'm feeling a bit dubious about the real value. regards, tom lane
Tom, On 7/22/05 3:32 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > but the "scan all rows" will only happen if no index is provided on > DateKey in the child tables. Otherwise the planner will probably > select plans like this: > > -> Index Scan using i1 on sales_jan_dateitemoutlet > sales_dateitemoutlet (cost=0.00..5.98 rows=1 width=0) Good point. > So at the moment I'm feeling a bit dubious about the real value. What about the need to run DML against the partition master? Partitioning itself is a big deal to BI/DW users, and to make it really usable will require that DML be automatically applied based on the mapping of partition master to the partitions. Does the CE strategy help with the implementation of automated DML? - Luke
On Fri, 2005-07-22 at 18:32 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > In summary, the CE feature will be a huge performance gain for > > qualifying queries against large tables in PostgreSQL databases. > > BTW, before we spend too much time chasing an emperor that may have no > clothes, it's worth asking whether this feature is really going to buy > anything in the real world. Perfectly valid thing to ask... > What is bothering me at the moment is the > thought that the partitioning key would normally be indexed within > each table, and that an indexscan that's off the end of the indexed > range is cheap. > > For example, you write > > > Now, if we run our example query again > > SELECT sum(soldqty) FROM Sales_DateItemOutlet > > WHERE DateKey between 20050101 and 20050101 > > we find that the query will > > Scan all rows in Sales_DateItemOutlet (which is empty) > > Scan all rows in Sales_Jan_DateItemOutlet > > Scan all rows in Sales_Feb_DateItemOutlet > > Scan all rows in Sales_Mar_DateItemOutlet > > but the "scan all rows" will only happen if no index is provided on > DateKey in the child tables. Otherwise the planner will probably > select plans like this: > > -> Index Scan using i1 on sales_jan_dateitemoutlet sales_dateitemoutlet (cost=0.00..5.98 rows=1 width=0) > Index Cond: ((datekey >= 20050101) AND (datekey <= 20050101)) > > for each child table for which the statistics indicate that no rows are > likely to be selected. This will fall through quite quickly in > practice, meaning that the "huge performance gain" from not doing it at > all is a bit oversold. (Note that it's already true that each child > table is planned separately, so the plan for the partition that *is* > targeted by the query may be different.) > > AFAICS, CE as proposed is only worth bothering with if the partitioning > key is something you would not want to create indexes on; which does not > strike me as a major use-case. Yes you can do that now, at cost, if you have range partitioning and you want indexes on every table on the partitioning key. That is a major use case but there are important variations on that theme. (I actually want to build upon that use case also, but more on that later). It's very common to scan whole ranges of dates on a large table, so in those cases you are really just maintaining the indexes for partitioning purposes. On older data it may be desirable not to have lots of indexes, or at least use their resources on the indexes they really do want. Also, if you have a List partitioned table where all rows in that table have a single value, then you maintain an index for no reason other than partitioning. Thats an expensive waste. Simply put, adding a constraint is faster and cheaper than adding an pointless index. CE gives people that option. Having constraints also allows (eventually) for us to have mutually exclusive constraints. That then allows a more optimal scan of constraints as well as some more advanced possibilities. How would we implement mutual exclusivity? Use the proving logic for CE, of course. In my experience, there is benefit from combined Range and List partitioning and that is a frequent design choice. That makes it harder to select which indexes to have: a single two-key index or 2 one-key indexes. A later objective is to have read-only tables on various kinds of media to allow occasional access. Partition-wise joins are also an eventual goal. The total feature set of partitioning is fairly large. The technique you've described is possible on most DBMS, yet almost all now or will shortly support Partitioning also. Many things are possible once the right declarative structures are created for the optimizer. CE is the basis on which to hang other forthcoming features; I feel happy that I got CE done in time for the 8.1 freeze. More things are to come. The emperor's clothes are all hand-made, one stitch at a time. > It'd be more attractive if the methodology allowed an inheritance or > union-all collection to be reduced to one single table (ie, CE gets rid > of all but one collection member) and then that could be planned as if > it were a primitive table entry (ie, no Append plan node). This doesn't > help much for simple queries on the fact table but it is interesting for > join cases, because if the Append is in the way there's no way to handle > inner indexscan join plans. That's been discussed on -hackers recently. I have that planned and am half-way through implementing it, but I wanted to get CE accepted first. > Unfortunately the patch as proposed is a long way from being able to do > that Well, I think its *very* close. > and given the current semantics of inherited constraints it's not > even remotely feasible, since as you note we can't put a constraint on > just the parent table. That's the bit that was discussed on -hackers. I suggested the use of the keyword ABSTRACT to denote a table that could be sub-classed but that would not allow any rows to be inserted into it. Such a table could be immediately excluded from any query, then we can do get rid of the Append... > We could maybe do it for UNION ALL views, but > the patch doesn't handle that case. Yet. I've discussed a simple implementation for that on -hackers based on feedback from the initial patch. Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes: > It's very common to scan whole ranges of dates on a large table, so in > those cases you are really just maintaining the indexes for partitioning > purposes. On older data it may be desirable not to have lots of indexes, > or at least use their resources on the indexes they really do want. > > Also, if you have a List partitioned table where all rows in that table > have a single value, then you maintain an index for no reason other than > partitioning. Thats an expensive waste. > > Simply put, adding a constraint is faster and cheaper than adding an > pointless index. CE gives people that option. Note also that the index is only useful if the index is *being used*. And index scans are much slower than sequential scans. So a query like "select * from invoices where fiscal_year = ?" is best implemented by doing a sequential scan across invoices_fy05. This is *much* faster than using indexes even if the indexes manage to speed up the empty partitions simply because an index scan across the full partition would be so much slower than a sequential scan. -- greg
Greg Stark <gsstark@mit.edu> writes: > Note also that the index is only useful if the index is *being used*. And > index scans are much slower than sequential scans. You miss my point entirely: an indexscan that hasn't got to retrieve any rows (because it has a constraint that points off the end of the index range) is extremely fast, and the planner will reliably detect that and use the index scan over a seqscan (assuming it has statistics showing the range of indexed values). And this decision is made separately for each child table, so the fact that a seqscan might be the best bet for the target partition doesn't stop the planner from using the indexscan in other partitions. However, Simon made a fair argument that there are useful cases where you don't need an index on a partitioning key, so my objection is answered. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > And this decision is made separately for each child table, so the fact that > a seqscan might be the best bet for the target partition doesn't stop the > planner from using the indexscan in other partitions. That was the detail I was missing. I'm surprised because I actually tested this before I sent the message and saw a plan like this with a single sequential scan node despite the three child tables: staging=> explain select * from _test where a=1; QUERY PLAN ------------------------------------------------------Seq Scan on _test (cost=0.00..22.50 rows=5 width=4) Filter: (a =1) (2 rows) [This is on 7.4, maybe the 8.0 plans are more explicit though I don't recall any mention of changes in that area] -- greg
Greg Stark <gsstark@mit.edu> writes: > That was the detail I was missing. I'm surprised because I actually tested > this before I sent the message and saw a plan like this with a single > sequential scan node despite the three child tables: > staging=> explain select * from _test where a=1; > QUERY PLAN > ------------------------------------------------------ > Seq Scan on _test (cost=0.00..22.50 rows=5 width=4) > Filter: (a = 1) > (2 rows) Uh, maybe you have sql_inheritance turned off? Every version I can remember would show you a pretty explicit Append plan for inheritance scans... regards, tom lane
Simon Riggs wrote: > in those cases you are really just maintaining the indexes for partitioning > purposes. On older data it may be desirable not to have lots of indexes, > or at least use their resources on the indexes they really do want. > > Also, if you have a List partitioned table where all rows in that table > have a single value, then you maintain an index for no reason other than > partitioning. Thats an expensive waste. > > Simply put, adding a constraint is faster and cheaper than adding an > pointless index. CE gives people that option. It seems with a partial index that whose partial index condition specifies a range outside that of a partition would make the expense much cheaper. For example, if I created a couple partial indexes ON sales_2005(year) WHERE year<2005 ON sales_2005(year) WHERE year>2005 I would think it would be a very cheap index to maintain (they'd be very small because they're empty, I'd think) and give many of the same benefits for excluding tables as a non-partial index on year would have given. Ron I like the other features Simon mentioned, though, that sound like they're based on these constraints.
On Sun, 2005-07-24 at 06:44 -0700, Ron Mayer wrote: > Simon Riggs wrote: > > in those cases you are really just maintaining the indexes for partitioning > > purposes. On older data it may be desirable not to have lots of indexes, > > or at least use their resources on the indexes they really do want. > > > > Also, if you have a List partitioned table where all rows in that table > > have a single value, then you maintain an index for no reason other than > > partitioning. Thats an expensive waste. > > > > Simply put, adding a constraint is faster and cheaper than adding an > > pointless index. CE gives people that option. > > > It seems with a partial index that whose partial index condition > specifies a range outside that of a partition would make the expense > much cheaper. > > For example, if I created a couple partial indexes > > ON sales_2005(year) WHERE year<2005 > ON sales_2005(year) WHERE year>2005 > > I would think it would be a very cheap index to maintain > (they'd be very small because they're empty, I'd think) > and give many of the same benefits for excluding tables > as a non-partial index on year would have given. The indexes would only be empty if you could guarantee that no rows were inserted that actually did match the index criteria. How would you do that? Check Constraints. So you'd have a more complex arrangement than if you used CE. Internally, CE uses very similar logic to that used by partial indexes and they actually share sections of code. The logic is reversed however, so you are looking for tables to provably exclude and that gives different results from the positive-proof case. That means that partial indexes wouldn't be able to be used in all cases and you'd end up with sequential scans: exactly what we are trying to avoid. ...Check Constraints are very, very cheap to maintain. Best Regards, Simon Riggs
Tom Lane <tgl@sss.pgh.pa.us> writes: > Uh, maybe you have sql_inheritance turned off? Every version I can > remember would show you a pretty explicit Append plan for inheritance > scans... staging=> show sql_inheritance;sql_inheritance -----------------on (1 row) Maybe I'm doing something more obvious wrong? db=> create table _test (a integer); CREATE TABLE db=> create table _test2 (like _test); CREATE TABLE db=> create table _test3 (like _test); CREATE TABLE db=> create index _i on _test2 (a); CREATE INDEX db=> explain select * from _test where a=1; QUERY PLAN ------------------------------------------------------ Seq Scan on _test (cost=0.00..22.50 rows=5 width=4) Filter: (a= 1) (2 rows) -- greg
On Sun, 2005-07-24 at 15:00 -0400, Greg Stark wrote: > Maybe I'm doing something more obvious wrong? > > db=> create table _test (a integer); > CREATE TABLE > > db=> create table _test2 (like _test); > CREATE TABLE Yes, unfortunately. You need the phrase "INHERITS (_test)" after the bracket and before the ; for any tables you want to be children of _test, such as _test2. The syntax "like..." creates a similar table, but with no linkage between parent and children. Best Regards, Simon Riggs