Re: Transparent table partitioning in future version of PG? - Mailing list pgsql-performance

From david@lang.hm
Subject Re: Transparent table partitioning in future version of PG?
Date
Msg-id alpine.DEB.1.10.0905071932050.15782@asgard
Whole thread Raw
In response to Re: Transparent table partitioning in future version of PG?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Transparent table partitioning in future version of PG?  (Robert Haas <robertmhaas@gmail.com>)
Re: Transparent table partitioning in future version of PG?  (Scott Carey <scott@richrelevance.com>)
List pgsql-performance
On Thu, 7 May 2009, Robert Haas wrote:

> On Wed, May 6, 2009 at 6:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Agreed. Perhaps I should say then that the syntax needs to express the
>> requirements of the planner/executor behaviour, rather than being the
>> main aspect of the feature, as some have suggested.
>
> Agreed.
>
>> Hopefully, notions of partitioning won't be directly tied to chunking of
>> data for parallel query access. Most queries access recent data and
>> hence only a single partition (or stripe), so partitioning and
>> parallelism and frequently exactly orthogonal.
>
> Yes, I think those things are unrelated.

I'm not so sure (warning, I am relativly inexperianced in this area)

it sounds like you can take two basic approaches to partition a database

1. The Isolation Plan

    you want to have it so that your queries match your partitioning.

   this is with the goal of only having to query a small number of
paritions, minimizing the total amount of data touched (including
minimumizing the number of indexes searched)

  this matches the use case mentioned above, with the partition based on
date and only looking at the most recent date range.

2. The Load Balancing Plan

   you want to have your partitioning and your queries _not_ match as much
as possible

   this is with the goal of having the query hit as many partitions as
possible, so that the different parts of the search can happen in parallel


However, with either partitioning plan, you will have queries that
degenerate to look like the other plan.

In the case of the isolation plan, you may need to search for all
instances of a rare thing over the entire history (after all, if you never
need to access that history, why do you pay for disks to store it? ;-)

and even when you are searching a narrow time window, it may still span
multiple partitions. I have a log analysis setup using the Splunk
prioriatary database, it paritions by time, creating a new parition as the
current one hits a configurable size (by default 10G on 64 bit systems).
for my volume of logs I end up with each parition only covering a few
hours. it's very common to want to search over a few days, which can be a
few dozen partitions (this is out of many hundreds of partitions, so it's
still a _huge_ win to narrow the timeframe)


In the case of the load balancing plan, you may run into a query that
happens to only fall into one partition (the query matches your
paritioning logic)




I think the only real difference is how common it is to need to search
multiple partitions.

If the expectation is that you will frequently need to search most/all of
the partitions (the load balancing plan), then it's a waste of time to
analyse the query to try and figure out which paritions you need to look
at.

If the expectation is that you will frequently only need to search a small
number of the partitions (the isolation plan), then it's extremely valuble
to spend as much time as needed working to analyse the query to try and
figure out which partitions you need to look at.


I believe that the isolation plan is probably more common than the load
balancing plan, but I don't see them as being that different for the
database engine point of view. To tune a system that can handle the
isolation plan for load balancing, the key thing to do would be to have a
knob to disable the partition planning, and just blindly send the search
out to every partition.

David Lang

pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: Transparent table partitioning in future version of PG?
Next
From: Rohan Pethkar
Date:
Subject: Rohan Pethkar sent you a Friend Request on Yaari