Thread: non-overlapping, consecutive partitions
hello everybody,

i have just come across an issue which has been bugging me for a while.
consider:

    SELECT * FROM foo ORDER BY bar;

if we have an index on bar, we can nicely optimize away the sort step by consulting the index - a btree will return sorted output.
under normal circumstances the plan will be seq scan -> sort, but with some config settings we can turn this into an index scan to avoid the sort (disk space is my issue here).

this is not so easy anymore:

    create table foo ( x date );
    create table foo_2010 () INHERITS (foo);
    create table foo_2009 () INHERITS (foo);
    create table foo_2008 () INHERITS (foo);

now we add constraints to make sure that data is only in 2008, 2009 and 2010.
we assume that everything is indexed:

SELECT * FROM foo ORDER BY bar will now demand an ugly sort for this data.
this is not an option if you need more than a handful of rows ...

if constraints are non-overlapping and if they are based on a "sortable" data type, we might be able to scan one index after the other and get a sorted list.

why is this an issue? imagine a case where you want to do billing, e.g. for some phone calls.
the job now is: the last 10 calls of a customer are free and you want to sum up those which are not free.
to do that you basically need a sorted list per customer.
if your data is partitioned over time you are screwed, because you want to return a sorted list taken from X partitions to some higher-level operation (windowing or whatever).
re-sorting vast amounts of data is a killer here.
in the particular case i am talking about, my problem is roughly 2 TB scaled out to a PL/Proxy farm.

does anybody see a solution to this problem?
what are the main showstoppers to make something like this work?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
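To make the "scan one index after the other" idea concrete, here is a minimal sketch in Python. It is not PostgreSQL code: the `partitions` dictionary, `non_overlapping` and `ordered_append` are hypothetical names, and each list merely stands in for the already sorted output of a per-partition btree index scan. The assumption is that every partition carries a half-open range constraint `[lo, hi)` on the sort key.

```python
from datetime import date
from itertools import chain

# Hypothetical stand-ins for per-partition index scans: the key is the
# partition's [lo, hi) constraint, the value is its rows in index order.
partitions = {
    (date(2009, 1, 1), date(2010, 1, 1)): [date(2009, 2, 14), date(2009, 11, 30)],
    (date(2008, 1, 1), date(2009, 1, 1)): [date(2008, 3, 1), date(2008, 7, 4)],
    (date(2010, 1, 1), date(2011, 1, 1)): [date(2010, 1, 5), date(2010, 6, 9)],
}


def non_overlapping(bounds):
    """Return True if the [lo, hi) ranges, ordered by lo, never overlap."""
    ordered = sorted(bounds)
    return all(prev_hi <= lo for (_, prev_hi), (lo, _) in zip(ordered, ordered[1:]))


def ordered_append(parts):
    """Concatenate sorted partitions in constraint order, without a sort step."""
    if not non_overlapping(parts):
        raise ValueError("constraints overlap; a plain append would not be ordered")
    return chain.from_iterable(parts[bounds] for bounds in sorted(parts))


result = list(ordered_append(partitions))
```

The sketch only works because the constraints can be proven disjoint and are defined on the same key as the ORDER BY; if either condition fails, concatenation no longer yields sorted output.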
On 7/23/2010 11:04 PM, Hans-Jürgen Schönig wrote:
> does anybody see a solution to this problem?
> what are the main showstoppers to make something like this work?

I think we should absolutely make this work when we have a good partitioning implementation. That said, I don't think it's wise to put a lot of effort into making this work with our current partitioning method when the partitioning patches are just around the corner. The developer time should be directed at those patches instead.

Regards,
Marko Tiikkaja
On Fri, Jul 23, 2010 at 10:04:00PM +0200, Hans-Jürgen Schönig wrote:
> create table foo ( x date );
> create table foo_2010 () INHERITS (foo)
> create table foo_2009 () INHERITS (foo)
> create table foo_2008 () INHERITS (foo)
>
> now we add constraints to make sure that data is only in 2008, 2009 and 2010.
> we assume that everything is indexed:
>
> SELECT * FROM foo ORDER BY bar will now demand an ugly sort for this data.
> this is not an option if you need more than a handful of rows ...

I think the right way to approach this is to teach the planner about merge sorts. That is, if the planner has paths to foo_* that are all ordered by the same key (because they have the same indexes), then it has a path to the UNION of those tables simply by merging the results of those paths.

This would be fairly straightforward to implement, I think; you may even be able to reuse the merge sort in the normal sort machinery. (You'll need to watch out for UNION vs UNION ALL.)

The real advantage of this approach is that you no longer have to prove anything about the constraints or the various datatypes, and it is more general. Say you have partitioned by start_date but you want to sort by end_date: simple index scanning won't work, while a merge sort will work beautifully.

You're also not limited by how the partitioning machinery will eventually work.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle
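The merge approach described above can be sketched with Python's `heapq.merge`, which lazily merges inputs that are each already sorted. This is only an illustration of the technique, not the planner code or any existing patch. The row layout `(start_date, end_date)` and the sample rows are assumptions: the tables are partitioned by the year of start_date, and each list is in the order an index on end_date would return.

```python
import heapq
from datetime import date
from operator import itemgetter

# Hypothetical rows (start_date, end_date), partitioned by year of start_date.
# Each partition is listed in end_date order, as its own index would return it.
foo_2008 = [(date(2008, 5, 1), date(2008, 6, 1)), (date(2008, 12, 20), date(2009, 3, 1))]
foo_2009 = [(date(2009, 1, 10), date(2009, 1, 15)), (date(2009, 8, 1), date(2010, 2, 1))]
foo_2010 = [(date(2010, 1, 3), date(2010, 1, 9)), (date(2010, 4, 1), date(2010, 4, 2))]

end_date = itemgetter(1)

# Plain concatenation (what an Append node does) loses the ordering, because
# end_date ranges of different partitions overlap.
appended = foo_2008 + foo_2009 + foo_2010

# A k-way merge keeps one candidate row per input and repeatedly emits the
# smallest, so the output is ordered without re-sorting all rows.
merged = list(heapq.merge(foo_2008, foo_2009, foo_2010, key=end_date))
```

Nothing about the partition constraints has to be proven here: the only requirement is that every input arrives sorted on the same key. The merge holds just one row per partition at a time, so it needs no sort space on disk.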
On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:
> I think the right way to approach this is to teach the planner about
> merge sorts.
> [...]
> Hope this helps,

i think this is excellent input.
i will do some research going into that direction.

many thanks,

hans
2010/7/25 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>:
> i think this is excellent input.
> i will do some research going into that direction.

Greg Stark had a patch to do this a while back called merge append, but it never got finished...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
2010/7/25 Robert Haas <robertmhaas@gmail.com>:
> 2010/7/25 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>:
>> On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:
>>> I think the right way to approach this is to teach the planner about
>>> merge sorts.

For what it's worth I think this is a belt-and-suspenders type of situation where we want two solutions which overlap somewhat.

I would really like to have merge-append nodes because there are all sorts of plans where append nodes destroying the ordering of their inputs eliminates a lot of good plans. Those cases can be UNION ALL nodes, or partitions where there's no filter on the partition key at all.

But for partitioned tables like the OP's, the "real" solution would be to have more structured meta-data about the partitions that allows the planner to avoid needing the merge at all. It would also mean the planner wouldn't need to look at every node; it could do a binary search or equivalent for the right partitions.

> Greg Stark had a patch to do this a while back called merge append,
> but it never got finished...

I was basically in over my head with the planner. I don't understand how equivalence classes are used or should be used, and didn't understand the code I was pointed at as being analogous. It's probably not so complicated as all that, but I never really wrapped my head around it and moved on to tasks I could make more progress on.

--
greg
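As a rough illustration of the "binary search or equivalent" point, the following Python sketch looks up partitions in a sorted array of lower bounds using the standard `bisect` module. The metadata layout (`lower_bounds`, `names`, `upper_limit`) and both helper functions are hypothetical; PostgreSQL had no such structure at the time of this thread. Partitions are assumed to cover consecutive half-open ranges.

```python
import bisect
from datetime import date

# Hypothetical structured partition metadata: partition i covers
# [lower_bounds[i], lower_bounds[i + 1]); the last one ends at upper_limit.
lower_bounds = [date(2008, 1, 1), date(2009, 1, 1), date(2010, 1, 1)]
names = ["foo_2008", "foo_2009", "foo_2010"]
upper_limit = date(2011, 1, 1)


def partition_for(value):
    """Find the single partition that can contain value, in O(log n)."""
    if not (lower_bounds[0] <= value < upper_limit):
        return None
    return names[bisect.bisect_right(lower_bounds, value) - 1]


def partitions_for_range(lo, hi):
    """Partitions that can hold keys in the closed range [lo, hi], in key order."""
    if lo > hi or hi < lower_bounds[0] or lo >= upper_limit:
        return []
    first = max(bisect.bisect_right(lower_bounds, lo) - 1, 0)
    last = bisect.bisect_right(lower_bounds, hi) - 1
    return names[first:last + 1]
```

Because the result of `partitions_for_range` is already in key order, scanning those partitions one after another on the partition key would return sorted rows with no merge. Checking each partition's constraint individually, by contrast, costs time proportional to the number of partitions.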
On Sun, Jul 25, 2010 at 6:40 PM, Greg Stark <gsstark@mit.edu> wrote:
> For what it's worth I think this is a belt-and-suspenders type of
> situation where we want two solutions which overlap somewhat.
> [...]

Agreed on all points.

> I was basically in over my head with the planner. I don't understand
> how equivalence classes are used or should be used [...]

Yeah, I don't fully understand those either.

--
Robert Haas
hello ...

yeah, this is fairly complicated.

greg: can you send me how far you got? i would be curious to see how you have attacked this issue.
i am still in the process of checking the code.
we somehow have to find a solution for that, otherwise we are in slight trouble here.
it seems we have to solve it no matter what it takes.

many thanks,

hans

On Jul 26, 2010, at 1:14 AM, Robert Haas wrote:
> On Sun, Jul 25, 2010 at 6:40 PM, Greg Stark <gsstark@mit.edu> wrote:
>> I was basically in over my head with the planner. [...]
>
> Yeah, I don't fully understand those either.