Thread: Re: [HACKERS] No: implied sort with group by
> postgres=> select b,c,sum(a) from t1 group by b,c; > b|c|sum > -+-+--- > |x| 5 > |z| 3 > |x| 0 > (3 rows) > > postgres=> select * from t1; > a|b|c > -+-+- > 1| |x > 2| |x > 2| |x > 3| |z > 0| |x > (5 rows) > > I just inserted a single out-of-order row at the end of the table which, since the > integer value is zero, should have not affected the result. Sorry I didn't understand > the nature of the test case. The order of the implicit sort would be arbitrary, but should first sort on any fields in a given ORDER BY to help speed things up later in the tree. What are the effects of sorted or partially sorted input data to the sort code? The current group/aggregate code seems to just loop over the tuples as they are. I see two ways to fix the above, one w/minimal code, second w/more work, but potentially better speed for large queries. 1. Put a sort node immediately before the group node, taking into account any user given ordering. Also make sure the optimizer is aware of this sort when calculating query costs. 2. Instead of sorting the tuples before grouping, add a hashing system to the group node so that the pre-sorting is not necessary. Hmmm...is this a grouping problem or an aggregate problem? Or both? The first query above should have the data sorted before aggregating, shouldn't it, or I am still missing a piece of this puzzle? darrenk
Darren King wrote: > [examples deleted] > I see two ways to fix the above, one w/minimal code, second w/more work, but > potentially better speed for large queries. > > 1. Put a sort node immediately before the group node, taking into account > any user given ordering. Also make sure the optimizer is aware of this sort > when calculating query costs. > > 2. Instead of sorting the tuples before grouping, add a hashing system to > the group node so that the pre-sorting is not necessary. > > Hmmm...is this a grouping problem or an aggregate problem? Or both? The first > query above should have the data sorted before aggregating, shouldn't it, or I > am still missing a piece of this puzzle? > > darrenk The hash should work. If the hash key is built on the group-by items, then any row with the same entries in these columns will get hashed to the same result row. At this point, it should be fairly easy to perform aggregation (test and substitute for min and max, add for sum,avg, etc). Ocie
> > postgres=> select b,c,sum(a) from t1 group by b,c; > > b|c|sum > > -+-+--- > > |x| 5 > > |z| 3 > > |x| 0 > > (3 rows) > > > > postgres=> select * from t1; > > a|b|c > > -+-+- > > 1| |x > > 2| |x > > 2| |x > > 3| |z > > 0| |x > > (5 rows) > > > > I just inserted a single out-of-order row at the end of the table which, since the > > integer value is zero, should have not affected the result. Sorry I didn't understand > > the nature of the test case. > Hmmm...is this a grouping problem or an aggregate problem? Or both? The first > query above should have the data sorted before aggregating, shouldn't it, or I > am still missing a piece of this puzzle? fwiw, I see the same incorrect behavior in v6.2.1p5. - Tom