Re: A Guide to Constraint Exclusion (Partitioning) - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: A Guide to Constraint Exclusion (Partitioning)
Date
Msg-id 1122105739.21502.265.camel@localhost.localdomain
Whole thread Raw
In response to Re: A Guide to Constraint Exclusion (Partitioning)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: A Guide to Constraint Exclusion (Partitioning)
Re: A Guide to Constraint Exclusion (Partitioning)
List pgsql-hackers
On Fri, 2005-07-22 at 18:32 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > In summary, the CE feature will be a huge performance gain for
> > qualifying queries against large tables in PostgreSQL databases.
> 
> BTW, before we spend too much time chasing an emperor that may have no
> clothes, it's worth asking whether this feature is really going to buy
> anything in the real world.  

Perfectly valid thing to ask...

> What is bothering me at the moment is the
> thought that the partitioning key would normally be indexed within
> each table, and that an indexscan that's off the end of the indexed
> range is cheap.
> 
> For example, you write
> 
> > Now, if we run our example query again
> >     SELECT sum(soldqty) FROM Sales_DateItemOutlet
> >     WHERE DateKey between 20050101 and 20050101
> > we find that the query will
> >     Scan all rows in Sales_DateItemOutlet (which is empty)
> >     Scan all rows in Sales_Jan_DateItemOutlet
> >     Scan all rows in Sales_Feb_DateItemOutlet
> >     Scan all rows in Sales_Mar_DateItemOutlet
> 
> but the "scan all rows" will only happen if no index is provided on
> DateKey in the child tables.  Otherwise the planner will probably
> select plans like this:
> 
>         ->  Index Scan using i1 on sales_jan_dateitemoutlet sales_dateitemoutlet  (cost=0.00..5.98 rows=1 width=0)
>                Index Cond: ((datekey >= 20050101) AND (datekey <= 20050101))
> 
> for each child table for which the statistics indicate that no rows are
> likely to be selected.  This will fall through quite quickly in
> practice, meaning that the "huge performance gain" from not doing it at
> all is a bit oversold.  (Note that it's already true that each child
> table is planned separately, so the plan for the partition that *is*
> targeted by the query may be different.)
> 
> AFAICS, CE as proposed is only worth bothering with if the partitioning
> key is something you would not want to create indexes on; which does not
> strike me as a major use-case.

Yes you can do that now, at cost, if you have range partitioning and you
want indexes on every table on the partitioning key. That is a major use
case but there are important variations on that theme. (I actually want
to build upon that use case also, but more on that later).

It's very common to scan whole ranges of dates on a large table, so in
those cases you are really just maintaining the indexes for partitioning
purposes. On older data it may be desirable not to have lots of indexes,
or at least use their resources on the indexes they really do want.

Also, if you have a List partitioned table where all rows in that table
have a single value, then you maintain an index for no reason other than
partitioning. Thats an expensive waste. 

Simply put, adding a constraint is faster and cheaper than adding an
pointless index. CE gives people that option.

Having constraints also allows (eventually) for us to have mutually
exclusive constraints. That then allows a more optimal scan of
constraints as well as some more advanced possibilities. How would we
implement mutual exclusivity? Use the proving logic for CE, of course.

In my experience, there is benefit from combined Range and List
partitioning and that is a frequent design choice. That makes it harder
to select which indexes to have: a single two-key index or 2 one-key
indexes.

A later objective is to have read-only tables on various kinds of media
to allow occasional access. Partition-wise joins are also an eventual
goal.

The total feature set of partitioning is fairly large. The technique
you've described is possible on most DBMS, yet almost all now or will
shortly support Partitioning also. Many things are possible once the
right declarative structures are created for the optimizer.

CE is the basis on which to hang other forthcoming features; I feel
happy that I got CE done in time for the 8.1 freeze. More things are to
come. The emperor's clothes are all hand-made, one stitch at a time.

> It'd be more attractive if the methodology allowed an inheritance or
> union-all collection to be reduced to one single table (ie, CE gets rid
> of all but one collection member) and then that could be planned as if
> it were a primitive table entry (ie, no Append plan node).  This doesn't
> help much for simple queries on the fact table but it is interesting for
> join cases, because if the Append is in the way there's no way to handle
> inner indexscan join plans.

That's been discussed on -hackers recently.

I have that planned and am half-way through implementing it, but I
wanted to get CE accepted first.

> Unfortunately the patch as proposed is a long way from being able to do
> that

Well, I think its *very* close.

> and given the current semantics of inherited constraints it's not
> even remotely feasible, since as you note we can't put a constraint on
> just the parent table.  

That's the bit that was discussed on -hackers. I suggested the use of
the keyword ABSTRACT to denote a table that could be sub-classed but
that would not allow any rows to be inserted into it. Such a table could
be immediately excluded from any query, then we can do get rid of the
Append...

> We could maybe do it for UNION ALL views, but
> the patch doesn't handle that case.

Yet. I've discussed a simple implementation for that on -hackers based
on feedback from the initial patch.

Best Regards, Simon Riggs




pgsql-hackers by date:

Previous
From: "Luke Lonergan"
Date:
Subject: Re: A Guide to Constraint Exclusion (Partitioning)
Next
From: Sam Mason
Date:
Subject: Re: [PERFORM] Planner doesn't look at LIMIT?