Thread: reorder GROUP BY list
Per recent discussion on -hackers, we should sometimes try to reorder the columns of the grouping clause to avoid redundant sorts. The optimizer is not currently capable of doing this, so this patch implements a simple hack in the analysis phase (transformGroupClause): if any subset of the GROUP BY clause matches a prefix of the ORDER BY list, that prefix is moved to the front of the GROUP BY clause. This shouldn't change the semantics of the query, and allows a redundant sort to be avoided for queries like "GROUP BY a, b ORDER BY b". One question about the implementation: to avoid redundant and potentially expensive calls to findTargetlistEntry(), I constructed a temporary list of TLEs. I think that's probably a good tradeoff, but suggestions for improvement are welcome. Also, I released the list via list_free() when finished with it -- is that a waste of cycles? Barring any objections, I'll apply this to HEAD tomorrow. -Neil
Attachment
Neil Conway <neilc@samurai.com> writes: > Per recent discussion on -hackers, we should sometimes try to reorder > the columns of the grouping clause to avoid redundant sorts. The > optimizer is not currently capable of doing this, so this patch > implements a simple hack in the analysis phase (transformGroupClause): > if any subset of the GROUP BY clause matches a prefix of the ORDER BY > list, that prefix is moved to the front of the GROUP BY clause. I find this patch a tad inelegant. It seems like a confusing way to think about the problem, and it also means that the routine is effectively matching the sort clause and group clause twice --- once in the code you added and once in the existing code. The idea that I had about how the revised routine should work is basically: 1. For each ORDER BY item: 1a. Find a matching GROUP BY item. (Break out of loop if no match.) 1b. Add it to the output list with ORDER BY item's ordering op. 1c. Remove the matched item from the GROUP BY list. 2. For each remaining GROUP BY item: 2a. Add it to output list using default ordering op. This might require a bit of code duplication from the two "add to output list" steps, but I think it'd be a lot clearer. You could probably avoid most of the code duplication by making a preliminary list of TargetEntrys (findTargetlistEntry and the UNKNOWN->TEXT hack) before applying the above steps. Or make a subroutine. regards, tom lane
On Sat, 2006-03-04 at 21:21 -0500, Tom Lane wrote: > 1. For each ORDER BY item: > 1a. Find a matching GROUP BY item. > (Break out of loop if no match.) > 1b. Add it to the output list with ORDER BY item's ordering op. > 1c. Remove the matched item from the GROUP BY list. > 2. For each remaining GROUP BY item: > 2a. Add it to output list using default ordering op. Okay, attached is a revised patch that implements this. Barring any objections I'll apply it tomorrow. -Neil
Attachment
On Sun, 2006-03-05 at 00:04 -0500, Neil Conway wrote: > Okay, attached is a revised patch that implements this. Applied. -Neil