Thread: non-overlapping, consecutive partitions

non-overlapping, consecutive partitions

From
Hans-Jürgen Schönig
Date:
hello everybody,

i have just come across some issue which has been bugging me for a while.
consider:
SELECT * FROM foo ORDER BY bar;

if we have an index on bar, we can nicely optimize away the sort step by consulting the index - a btree will return
sortedoutput. 
under normal circumstances it will be seq->sort but doing some config settings we can turn this into an index scan
nicelyto avoid to the sort (disk space is my issue here). 

this is not so easy anymore:
create table foo ( x date );create table foo_2010 () INHERITS (foo)create table foo_2009 () INHERITS (foo)create table
foo_2008() INHERITS (foo) 

now we add constraints to make sure that data is only in 2008, 2009 and 2010.
we assume that everything is indexed:

SELECT * FROM foo ORDER BY bar  will now demand an ugly sort for this data.
this is not an option if you need more than a handful of rows ...

if constraints are non overlapping and if they are based on a "sortable" data type, we might be able to scan one index
afterthe other and get a sorted list. 
why is this an issue? imagine a case where you want to do billing, eg. some phone calls. the job now is: the last 10
callsof a customer are free and you want to sum up those which are not free. 
to do that you basically need a sorted list per customer. if you have data here which is partitioned over time you are
screwedup because you want to return a sorted list taken from X partitions to some higher level operation (windowing or
whatever).
resorting vast amounts of data is a killer here. in the particular case i am talking about my problem is roughly 2 TB
scaledout to some PL/proxy farm. 

does anybody see a solution to this problem?
what are the main showstoppers to make something like this work?
many thanks,
    hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: non-overlapping, consecutive partitions

From
Marko Tiikkaja
Date:
On 7/23/2010 11:04 PM, Hans-Jürgen Schönig wrote:
> does anybody see a solution to this problem?
> what are the main showstoppers to make something like this work?

I think we should absolutely make this work when we have a good 
partitioning implementation.

That said, I don't think it's wise to put a lot of effort into making 
this work with our current partitioning method when the partitioning 
patches are just around the corner.  The developer time should be 
directed at those patches instead.


Regards,
Marko Tiikkaja


Re: non-overlapping, consecutive partitions

From
Martijn van Oosterhout
Date:
On Fri, Jul 23, 2010 at 10:04:00PM +0200, Hans-Jürgen Schönig wrote:
>     create table foo ( x date );
>     create table foo_2010 () INHERITS (foo)
>     create table foo_2009 () INHERITS (foo)
>     create table foo_2008 () INHERITS (foo)
>
> now we add constraints to make sure that data is only in 2008, 2009 and 2010.
> we assume that everything is indexed:
>
> SELECT * FROM foo ORDER BY bar  will now demand an ugly sort for this data.
> this is not an option if you need more than a handful of rows ...

I think the right way to approach this is to teach the planner about
merge sorts. This is, if the planner has path to foo_* all ordered by
the same key (because they have the same indexes) then it has a path to
the UNION of those tables simply by merging the results of those paths.

This would be fairly straight forward to implement I think, you may
even be able to reuse the merge sort in the normal sort machinery.
(You'll need to watch out for UNION vs UNION ALL.)

The real advantage of this approach is that you no longer have to prove
anything about the constraints or various datatypes and it is more
general. Say you have partitioned by start_date but you want to sort by
end_date, simple index scanning won't work while a merge sort will work
beautifully.

You're also not limited to how the partitioning machinery will
eventually work.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

Re: non-overlapping, consecutive partitions

From
PostgreSQL - Hans-Jürgen Schönig
Date:
On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:

> On Fri, Jul 23, 2010 at 10:04:00PM +0200, Hans-Jürgen Schönig wrote:
>>     create table foo ( x date );
>>     create table foo_2010 () INHERITS (foo)
>>     create table foo_2009 () INHERITS (foo)
>>     create table foo_2008 () INHERITS (foo)
>>
>> now we add constraints to make sure that data is only in 2008, 2009 and 2010.
>> we assume that everything is indexed:
>>
>> SELECT * FROM foo ORDER BY bar  will now demand an ugly sort for this data.
>> this is not an option if you need more than a handful of rows ...
>
> I think the right way to approach this is to teach the planner about
> merge sorts. This is, if the planner has path to foo_* all ordered by
> the same key (because they have the same indexes) then it has a path to
> the UNION of those tables simply by merging the results of those paths.
>
> This would be fairly straight forward to implement I think, you may
> even be able to reuse the merge sort in the normal sort machinery.
> (You'll need to watch out for UNION vs UNION ALL.)
>
> The real advantage of this approach is that you no longer have to prove
> anything about the constraints or various datatypes and it is more
> general. Say you have partitioned by start_date but you want to sort by
> end_date, simple index scanning won't work while a merge sort will work
> beautifully.
>
> You're also not limited to how the partitioning machinery will
> eventually work.
>
> Hope this helps,


i think this is excellent input.
i will do some research going into that direction.
many thanks,
    hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de



Re: non-overlapping, consecutive partitions

From
Robert Haas
Date:
2010/7/25 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>:
>
> On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:
>
>> On Fri, Jul 23, 2010 at 10:04:00PM +0200, Hans-Jürgen Schönig wrote:
>>>      create table foo ( x date );
>>>      create table foo_2010 () INHERITS (foo)
>>>      create table foo_2009 () INHERITS (foo)
>>>      create table foo_2008 () INHERITS (foo)
>>>
>>> now we add constraints to make sure that data is only in 2008, 2009 and 2010.
>>> we assume that everything is indexed:
>>>
>>> SELECT * FROM foo ORDER BY bar  will now demand an ugly sort for this data.
>>> this is not an option if you need more than a handful of rows ...
>>
>> I think the right way to approach this is to teach the planner about
>> merge sorts. This is, if the planner has path to foo_* all ordered by
>> the same key (because they have the same indexes) then it has a path to
>> the UNION of those tables simply by merging the results of those paths.
>>
>> This would be fairly straight forward to implement I think, you may
>> even be able to reuse the merge sort in the normal sort machinery.
>> (You'll need to watch out for UNION vs UNION ALL.)
>>
>> The real advantage of this approach is that you no longer have to prove
>> anything about the constraints or various datatypes and it is more
>> general. Say you have partitioned by start_date but you want to sort by
>> end_date, simple index scanning won't work while a merge sort will work
>> beautifully.
>>
>> You're also not limited to how the partitioning machinery will
>> eventually work.
>>
>> Hope this helps,
>
>
> i think this is excellent input.
> i will do some research going into that direction.

Greg Stark had a patch to do this a while back called merge append,
but it never got finished...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: non-overlapping, consecutive partitions

From
Greg Stark
Date:
2010/7/25 Robert Haas <robertmhaas@gmail.com>:
> 2010/7/25 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>:
>>
>> On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:
>>
>>> I think the right way to approach this is to teach the planner about
>>> merge sorts.

For what it's worth I think this is a belt-and-suspenders type of
situation where we want two solutions which overlap somewhat.

I would really like to have merge-append nodes because there are all
sorts of plans where append nodes destroying the ordering of their
inputs eliminates a lot of good plans. Those cases can be UNION ALL
nodes, or partitions where there's no filter on the partition key at
all.

But for partitioned tables like the OPs the "real" solution would be
to have more structured meta-data about the partitions that allows the
planner to avoid needing the merge at all. It would also means the
planner wouldn't need to look at every node; it could do a binary
search or equivalent for the right partitions.

> Greg Stark had a patch to do this a while back called merge append,
> but it never got finished...

I was basically in over my head with the planner. I don't understand
how equivalent classes are used or should be used and didn't
understand the code I was pointed at as being analogous. It's probably
not so complicated as all that, but I never really wrapped my head
around it and moved onto tasks I could make more progress on.

--
greg


Re: non-overlapping, consecutive partitions

From
Robert Haas
Date:
On Sun, Jul 25, 2010 at 6:40 PM, Greg Stark <gsstark@mit.edu> wrote:
> 2010/7/25 Robert Haas <robertmhaas@gmail.com>:
>> 2010/7/25 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>:
>>>
>>> On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:
>>>
>>>> I think the right way to approach this is to teach the planner about
>>>> merge sorts.
>
> For what it's worth I think this is a belt-and-suspenders type of
> situation where we want two solutions which overlap somewhat.
>
> I would really like to have merge-append nodes because there are all
> sorts of plans where append nodes destroying the ordering of their
> inputs eliminates a lot of good plans. Those cases can be UNION ALL
> nodes, or partitions where there's no filter on the partition key at
> all.
>
> But for partitioned tables like the OPs the "real" solution would be
> to have more structured meta-data about the partitions that allows the
> planner to avoid needing the merge at all. It would also means the
> planner wouldn't need to look at every node; it could do a binary
> search or equivalent for the right partitions.

Agreed on all points.

>> Greg Stark had a patch to do this a while back called merge append,
>> but it never got finished...
>
> I was basically in over my head with the planner. I don't understand
> how equivalent classes are used or should be used and didn't
> understand the code I was pointed at as being analogous. It's probably
> not so complicated as all that, but I never really wrapped my head
> around it and moved onto tasks I could make more progress on.

Yeah, I don't fully understand those either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: non-overlapping, consecutive partitions

From
Hans-Jürgen Schönig
Date:
hello ...

yeah, this is fairly complicated.

greg:
can you send me how far you got?
i would be curious to see how you have attacked this issue.

i am still in the process of checking the codes.
we somehow have to find a solution for that. otherwise we are in slight trouble here.
it seems we have to solve it no matter what it takes.
many thanks,
    hans


On Jul 26, 2010, at 1:14 AM, Robert Haas wrote:

> On Sun, Jul 25, 2010 at 6:40 PM, Greg Stark <gsstark@mit.edu> wrote:
>> 2010/7/25 Robert Haas <robertmhaas@gmail.com>:
>>> 2010/7/25 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>:
>>>>
>>>> On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:
>>>>
>>>>> I think the right way to approach this is to teach the planner about
>>>>> merge sorts.
>>
>> For what it's worth I think this is a belt-and-suspenders type of
>> situation where we want two solutions which overlap somewhat.
>>
>> I would really like to have merge-append nodes because there are all
>> sorts of plans where append nodes destroying the ordering of their
>> inputs eliminates a lot of good plans. Those cases can be UNION ALL
>> nodes, or partitions where there's no filter on the partition key at
>> all.
>>
>> But for partitioned tables like the OPs the "real" solution would be
>> to have more structured meta-data about the partitions that allows the
>> planner to avoid needing the merge at all. It would also means the
>> planner wouldn't need to look at every node; it could do a binary
>> search or equivalent for the right partitions.
>
> Agreed on all points.
>
>>> Greg Stark had a patch to do this a while back called merge append,
>>> but it never got finished...
>>
>> I was basically in over my head with the planner. I don't understand
>> how equivalent classes are used or should be used and didn't
>> understand the code I was pointed at as being analogous. It's probably
>> not so complicated as all that, but I never really wrapped my head
>> around it and moved onto tasks I could make more progress on.
>
> Yeah, I don't fully understand those either.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de