Re: A Guide to Constraint Exclusion (Partitioning) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: A Guide to Constraint Exclusion (Partitioning)
Date
Msg-id 17011.1122071572@sss.pgh.pa.us
Whole thread Raw
In response to A Guide to Constraint Exclusion (Partitioning)  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: A Guide to Constraint Exclusion (Partitioning)  ("Luke Lonergan" <llonergan@greenplum.com>)
Re: A Guide to Constraint Exclusion (Partitioning)  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Simon Riggs <simon@2ndquadrant.com> writes:
> In summary, the CE feature will be a huge performance gain for
> qualifying queries against large tables in PostgreSQL databases.

BTW, before we spend too much time chasing an emperor that may have no
clothes, it's worth asking whether this feature is really going to buy
anything in the real world.  What is bothering me at the moment is the
thought that the partitioning key would normally be indexed within
each table, and that an indexscan that's off the end of the indexed
range is cheap.

For example, you write

> Now, if we run our example query again
>     SELECT sum(soldqty) FROM Sales_DateItemOutlet
>     WHERE DateKey between 20050101 and 20050101
> we find that the query will
>     Scan all rows in Sales_DateItemOutlet (which is empty)
>     Scan all rows in Sales_Jan_DateItemOutlet
>     Scan all rows in Sales_Feb_DateItemOutlet
>     Scan all rows in Sales_Mar_DateItemOutlet

but the "scan all rows" will only happen if no index is provided on
DateKey in the child tables.  Otherwise the planner will probably
select plans like this:
       ->  Index Scan using i1 on sales_jan_dateitemoutlet sales_dateitemoutlet  (cost=0.00..5.98 rows=1 width=0)
      Index Cond: ((datekey >= 20050101) AND (datekey <= 20050101))
 

for each child table for which the statistics indicate that no rows are
likely to be selected.  This will fall through quite quickly in
practice, meaning that the "huge performance gain" from not doing it at
all is a bit oversold.  (Note that it's already true that each child
table is planned separately, so the plan for the partition that *is*
targeted by the query may be different.)

AFAICS, CE as proposed is only worth bothering with if the partitioning
key is something you would not want to create indexes on; which does not
strike me as a major use-case.

It'd be more attractive if the methodology allowed an inheritance or
union-all collection to be reduced to one single table (ie, CE gets rid
of all but one collection member) and then that could be planned as if
it were a primitive table entry (ie, no Append plan node).  This doesn't
help much for simple queries on the fact table but it is interesting for
join cases, because if the Append is in the way there's no way to handle
inner indexscan join plans.

Unfortunately the patch as proposed is a long way from being able to do
that, and given the current semantics of inherited constraints it's not
even remotely feasible, since as you note we can't put a constraint on
just the parent table.  We could maybe do it for UNION ALL views, but
the patch doesn't handle that case.

So at the moment I'm feeling a bit dubious about the real value.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Dawid Kuroczko
Date:
Subject: Re: [PERFORM] Planner doesn't look at LIMIT?
Next
From: Alvaro Herrera
Date:
Subject: Re: Autovacuum loose ends