Thread: Partitioning and constraint exclusion

Partitioning and constraint exclusion

From
Samuel Smith
Date:
Howdy,
I spent a majority of today playing around with pg_partman (awesome tool
btw!). I am mainly using the time-static method with an interval of one
month.
I wanted to see what performance improvements I could get with some
common queries that are used by our analytics team. A lot of these
queries summarize data by day or by month. Our largest database gets 10+
million rows a day to several different tables (each). I played around
with a subset of the data spread across about 6 months (about 10 million
rows total).

I noticed that I could get very nice partition elimination using
constant values in the where clause.

Ex:
select * from <table> where <constraint_col> between '2015-01-01' and
'2015-02-15'

However, I could not get any partition elimination for queries that did
not have constant values in the where clause.

Ex:
select * from <table> where <constraint_col> >= (select max(date) from
<other_table>)

Unfortunately all of our queries on the analytics team need to be
dynamic like this and summarize data based around certain recorded
events and dates from other tables. I saw the note in the docs about not
being able to use current_timestamp in the where clause but I really
need to be able to use a sub select or CTE in the where clause for the
needed dates.

I tried about 10 different ways (on both 9.1 and 9.4) to dynamically get
the data (sub selects, cte, joins) for my constraint column but all of
them resulted in a full scan of all partitions.

I am kind of bummed out by this as dropping in partitioning in this
method will just hurt performance and not improve it. The only good
thing I see is the ability to delete (drop) older data from the table,
but this is not a functionality we need right now.

I am going to try a few other ways tomorrow, I am hoping I am doing
something wrong, or is this just typical?

Thanks,
Sam


Re: Partitioning and constraint exclusion

From
David G Johnston
Date:
Samuel Smith wrote
> I noticed that I could get very nice partition elimination using
> constant values in the where clause.
>
> Ex:
> select * from

>  where
> <constraint_col>
>  between '2015-01-01' and
> '2015-02-15'
>
> However, I could not get any partition elimination for queries that did
> not have constant values in the where clause.
>
> Ex:
> select * from

>  where
> <constraint_col>
>  >= (select max(date) from
> <other_table>
> )
>
> Unfortunately all of our queries on the analytics team need to be
> dynamic like this and summarize data based around certain recorded
> events and dates from other tables. I saw the note in the docs about not
> being able to use current_timestamp in the where clause but I really
> need to be able to use a sub select or CTE in the where clause for the
> needed dates.
>
> I tried about 10 different ways (on both 9.1 and 9.4) to dynamically get
> the data (sub selects, cte, joins) for my constraint column but all of
> them resulted in a full scan of all partitions.
>
> I am going to try a few other ways tomorrow, I am hoping I am doing
> something wrong, or is this just typical?

In short - since the planner determines exclusion constraints and the
executor, which strictly follows the planner in the query execution process,
would be the one to determine what the value of your date is - there is no
way for a single query to provide data that would then be used to determine
exclusion constraints.

Now, that said, I don't believe you should be actual full table scans during
processing if you have proper indexes setup.  An index scan should be usable
and quickly determine which tables lack data to contribute to the query
results.

In terms of separating out the date query and partition query:

PREPARE/EXECUTE in pure SQL (can, must?, be direct)
EXECUTE/USING in pl/pgsql (via a function)

You may have other reasonable options on the client side...

You should consider providing EXPLAIN ANALYZE results and maybe a test case
so others can give pointers.

David J.




--
View this message in context: http://postgresql.nabble.com/Partitioning-and-constraint-exclusion-tp5840353p5840356.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Partitioning and constraint exclusion

From
Stephen Frost
Date:
Samuel,

* Samuel Smith (pgsql@net153.net) wrote:
> I noticed that I could get very nice partition elimination using
> constant values in the where clause.
>
> Ex:
> select * from <table> where <constraint_col> between '2015-01-01'
> and '2015-02-15'
>
> However, I could not get any partition elimination for queries that
> did not have constant values in the where clause.
>
> Ex:
> select * from <table> where <constraint_col> >= (select max(date)
> from <other_table>)

That's correct.

> Unfortunately all of our queries on the analytics team need to be
> dynamic like this and summarize data based around certain recorded
> events and dates from other tables. I saw the note in the docs about
> not being able to use current_timestamp in the where clause but I
> really need to be able to use a sub select or CTE in the where
> clause for the needed dates.

Not sure if this will help, but the planner is smart enough to implement
one-time filters for certain cases.  Instead of using inheiritance-based
partitioning, you can use a view like so:

CREATE VIEW v AS
  SELECT * FROM table1 WHERE column1 = 5
UNION ALL
  SELECT * FROM table2 WHERE column1 = 6
;

Then for cases where we can prove that no results will be returned from
the individual union-all branch, we'll skip it:

SELECT * FROM v WHERE column1 = (select max(column1) from table3);

Unfortunately, we don't appear to support that for an inequality as you
show above.  I'm not sure offhand why not but it didn't work in my
testing.

Another approach to dealing with this is to use plpgsql functions and
'return execute' which essentially compute the constant and then build a
dyanmic SQL query using the constant and return the results.  It's a bit
awkward compared to just writing the query, but it does work.

    Thanks!

        Stephen

Attachment

Re: Partitioning and constraint exclusion

From
Samuel Smith
Date:
On 03/04/2015 12:11 AM, David G Johnston wrote:
>
> In short - since the planner determines exclusion constraints and the
> executor, which strictly follows the planner in the query execution process,
> would be the one to determine what the value of your date is - there is no
> way for a single query to provide data that would then be used to determine
> exclusion constraints.
>
> Now, that said, I don't believe you should be actual full table scans during
> processing if you have proper indexes setup.  An index scan should be usable
> and quickly determine which tables lack data to contribute to the query
> results.
>
> In terms of separating out the date query and partition query:
>
> PREPARE/EXECUTE in pure SQL (can, must?, be direct)
> EXECUTE/USING in pl/pgsql (via a function)
>
> You may have other reasonable options on the client side...
>
> You should consider providing EXPLAIN ANALYZE results and maybe a test case
> so others can give pointers.
>
> David J.
>


Just to correct, the constraint column is also an indexed column. What I
meant by "scanning all partitions" was that the index of all the
partitions was being checked. Not the biggest issue, but still unwanted.

I spent most of yesterday comparing how DB2 handles table partitions and
running similar explain queries. It indeed can do partition elimination
when the values of the constraint column are coming from a sub query or
other none constant value.

Does anyone know if there is a wishlist item for improving this in
postgresql or is this as good as it gets for now?

I can work around the issues, but it just kind of caught me off guard
since there are so many nice benchmarks on the web with partitioning in
postgresql (and now that I go back an look at them, they all have
constants in the where clause).


--Sam


Re: Partitioning and constraint exclusion

From
Stephen Frost
Date:
Sam,

* Samuel Smith (pgsql@net153.net) wrote:
> Does anyone know if there is a wishlist item for improving this in
> postgresql or is this as good as it gets for now?

It's absolutely on the todo list for PG to support declarative
partitioning and handle these cases better.  There has been a good sized
thread very recently discussing how to make that happen over on the
hackers mailing list recently and while it almost certainly won't be in
the next release (expected to be 9.5, released around September or
October), it may very well be in the release the following year.

    Thanks!

        Stephen

Attachment