Re: nested partitioning - Mailing list pgsql-general

From Gabriel E. Sánchez Martínez
Subject Re: nested partitioning
Date
Msg-id 5239CF54.4000208@gmail.com
Whole thread Raw
In response to Re: nested partitioning  (Shaun Thomas <sthomas@optionshouse.com>)
Responses Re: nested partitioning  (Shaun Thomas <sthomas@optionshouse.com>)
List pgsql-general
On 09/18/2013 11:34 AM, Shaun Thomas wrote:
> On 09/18/2013 07:45 AM, "Gabriel E. Sánchez Martínez" wrote:
>
>> Now to the questions.  Would constraint exclusion work in a tree
>> fashion to prune tables from a query plan?
>
> Constraint exclusion works by checking the constraints themselves. So
> if your constraints are based on date, they will always work as
> expected. Assuming they're generated, you could easily check down to
> the day level by >= desired day, and < next day.
>
> With that in mind, if you queried a specific day, the query would
> remove every partition but the relevant day (or range of days).
I understand that, but I want to know whether all the child day tables
would be checked for exclusion, or if instead entire groups of tables
can be excluded by checking the year and month first, and then checking
the days only for the queried year-month.  For 10 years of data, that is
a difference between checking 10 years + 12 months + 30 days = 52
partitions vs. checking all 3652 partitions.

Example: Querying for data of 2012-01-01.

Would a query plan prune tables in a tree fashion, as follows?

#
Check partitions 2000, 2001, 2002, ... , 2013.  14 checks.  All years
but 2012 are excluded.  Approximately 4748 partitions excluded with only
14 constraint exclusion checks.

Check partitions 2012-01, 2012-02, ... , 2012-12.  12 checks.  All
months but 2012-01 are excluded.  325 partitions excluded with only 12
constraint exclusion checks.

Check partitions 2012-01-01, 2012-01-02, ... , 2012-01-31. 31 checks.
All days but 2012-01-01 are excluded.  30 partitions excluded with 31
constraint exclusion checks.

Sequential scan on partition 2012-01-01.
#

Or would it have to check every child partition, as follows?

#
Check partition 2000-01-01, 2000-01-02, ... , 2013-09-18.  5010 checks.
All days but 2012-01-01 are excluded.  5009 partitions excluded with
5010 constraint exclusion checks.

Sequential scan on partition 2012-01-01.
#

The former would be much more efficient.

>
>> One can see that with a decade of data stored, there would be
>> approximately 3652 partitions. The documentation discourages having
>> this many partitions.
>
> And you shouldn't. In our partitions, we try to keep it around one
> partition per month. I've recently written a library that eschews
> triggers in favor of a nightly movement job to redistribute data, and
> it can handle partition granularity on a sliding scale. The default is
> one month, but my tests show it works properly down to 1 day. Still,
> the lowest I'd use it is probably 1 week. That would still drastically
> reduce your amount of estimated partitions.
>
> However you decide to do it, don't optimize prematurely. Monthly
> partitions are enough for us and our 130GB table with 250M rows, and
> that's for a single year.
>
>



pgsql-general by date:

Previous
From: Shaun Thomas
Date:
Subject: Re: nested partitioning
Next
From: Shaun Thomas
Date:
Subject: Re: Something Weird Going on with VACUUM ANALYZE