Re: A Guide to Constraint Exclusion (Partitioning) - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: A Guide to Constraint Exclusion (Partitioning) |
Date | |
Msg-id | 1122105739.21502.265.camel@localhost.localdomain Whole thread Raw |
In response to | Re: A Guide to Constraint Exclusion (Partitioning) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: A Guide to Constraint Exclusion (Partitioning)
Re: A Guide to Constraint Exclusion (Partitioning) |
List | pgsql-hackers |
On Fri, 2005-07-22 at 18:32 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > In summary, the CE feature will be a huge performance gain for > > qualifying queries against large tables in PostgreSQL databases. > > BTW, before we spend too much time chasing an emperor that may have no > clothes, it's worth asking whether this feature is really going to buy > anything in the real world. Perfectly valid thing to ask... > What is bothering me at the moment is the > thought that the partitioning key would normally be indexed within > each table, and that an indexscan that's off the end of the indexed > range is cheap. > > For example, you write > > > Now, if we run our example query again > > SELECT sum(soldqty) FROM Sales_DateItemOutlet > > WHERE DateKey between 20050101 and 20050101 > > we find that the query will > > Scan all rows in Sales_DateItemOutlet (which is empty) > > Scan all rows in Sales_Jan_DateItemOutlet > > Scan all rows in Sales_Feb_DateItemOutlet > > Scan all rows in Sales_Mar_DateItemOutlet > > but the "scan all rows" will only happen if no index is provided on > DateKey in the child tables. Otherwise the planner will probably > select plans like this: > > -> Index Scan using i1 on sales_jan_dateitemoutlet sales_dateitemoutlet (cost=0.00..5.98 rows=1 width=0) > Index Cond: ((datekey >= 20050101) AND (datekey <= 20050101)) > > for each child table for which the statistics indicate that no rows are > likely to be selected. This will fall through quite quickly in > practice, meaning that the "huge performance gain" from not doing it at > all is a bit oversold. (Note that it's already true that each child > table is planned separately, so the plan for the partition that *is* > targeted by the query may be different.) > > AFAICS, CE as proposed is only worth bothering with if the partitioning > key is something you would not want to create indexes on; which does not > strike me as a major use-case. Yes you can do that now, at cost, if you have range partitioning and you want indexes on every table on the partitioning key. That is a major use case but there are important variations on that theme. (I actually want to build upon that use case also, but more on that later). It's very common to scan whole ranges of dates on a large table, so in those cases you are really just maintaining the indexes for partitioning purposes. On older data it may be desirable not to have lots of indexes, or at least use their resources on the indexes they really do want. Also, if you have a List partitioned table where all rows in that table have a single value, then you maintain an index for no reason other than partitioning. Thats an expensive waste. Simply put, adding a constraint is faster and cheaper than adding an pointless index. CE gives people that option. Having constraints also allows (eventually) for us to have mutually exclusive constraints. That then allows a more optimal scan of constraints as well as some more advanced possibilities. How would we implement mutual exclusivity? Use the proving logic for CE, of course. In my experience, there is benefit from combined Range and List partitioning and that is a frequent design choice. That makes it harder to select which indexes to have: a single two-key index or 2 one-key indexes. A later objective is to have read-only tables on various kinds of media to allow occasional access. Partition-wise joins are also an eventual goal. The total feature set of partitioning is fairly large. The technique you've described is possible on most DBMS, yet almost all now or will shortly support Partitioning also. Many things are possible once the right declarative structures are created for the optimizer. CE is the basis on which to hang other forthcoming features; I feel happy that I got CE done in time for the 8.1 freeze. More things are to come. The emperor's clothes are all hand-made, one stitch at a time. > It'd be more attractive if the methodology allowed an inheritance or > union-all collection to be reduced to one single table (ie, CE gets rid > of all but one collection member) and then that could be planned as if > it were a primitive table entry (ie, no Append plan node). This doesn't > help much for simple queries on the fact table but it is interesting for > join cases, because if the Append is in the way there's no way to handle > inner indexscan join plans. That's been discussed on -hackers recently. I have that planned and am half-way through implementing it, but I wanted to get CE accepted first. > Unfortunately the patch as proposed is a long way from being able to do > that Well, I think its *very* close. > and given the current semantics of inherited constraints it's not > even remotely feasible, since as you note we can't put a constraint on > just the parent table. That's the bit that was discussed on -hackers. I suggested the use of the keyword ABSTRACT to denote a table that could be sub-classed but that would not allow any rows to be inserted into it. Such a table could be immediately excluded from any query, then we can do get rid of the Append... > We could maybe do it for UNION ALL views, but > the patch doesn't handle that case. Yet. I've discussed a simple implementation for that on -hackers based on feedback from the initial patch. Best Regards, Simon Riggs
pgsql-hackers by date: