On Sun, Dec 4, 2016 at 4:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Of course, we would also have to teach cost_sort or someplace near there
> that non-C sorting is much more expensive than C-collation sorting. Not
> sure about exactly how to set that up without it being a kluge.

We've talked about that before, in the context of parallel query (at
the 2014 developer meeting, IIRC).

> A related problem is that if you have "GROUP BY x,y" and no particular
> ORDER BY requirement, you could sort by either x,y or y,x before the
> GroupAgg. This would matter if, say, there was an index matching one
> but not the other. Right now we're very stupid and only consider x,y,
> but if there were room to consider more than one set of target pathkeys
> it would be fairly simple to make that better.

That sounds valuable, especially because it seems natural to make the
leading group-on var the least selective one within a GROUP BY. An
index that actually matches that order might be less common in practice
than the habit itself, at least unless and until the partial sort patch
is committed.

I will tend to write "GROUP BY country, province, city" -- never
"GROUP BY city, province, country". I speak a language that is written
left-to-right, but I bet that if I spoke a language written
right-to-left, both the reading direction and the written order would
be reversed, which comes to the same thing. This might be a very
prevalent habit. In general, a tuplesort will be faster with a
high-cardinality leading attribute, because fewer comparisons tie on
the first column and have to go on to the next one, so this habit works
against tuplesort. (That assumes a leading attribute of pass-by-value
type, or one with abbreviated key support.)
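
To make the last point a bit more concrete, here is a toy model in
Python. It is only a sketch, not PostgreSQL code: Python's sorted()
stands in for tuplesort, the row layout and the helper names
(sort_counting_tiebreaks, group_counts, make_rows) are made up for
illustration, and "comparisons that tie on the leading column" is used
as a rough proxy for sort cost. Abbreviated keys and datum
representation are not modeled at all. It sorts the same rows in both
column orders, checks that a GroupAgg-style pass sees the same groups
either way, and counts how often a comparison has to look past the
leading column.

```python
import random
from functools import cmp_to_key
from itertools import groupby


def sort_counting_tiebreaks(rows, key_order):
    """Sort rows by key_order; also count comparisons that tie on the leading column."""
    tiebreaks = 0

    def compare(a, b):
        nonlocal tiebreaks
        for depth, col in enumerate(key_order):
            if a[col] != b[col]:
                return -1 if a[col] < b[col] else 1
            if depth == 0:
                # Leading column was equal, so the next column must be examined.
                tiebreaks += 1
        return 0

    ordered = sorted(rows, key=cmp_to_key(compare))
    return ordered, tiebreaks


def group_counts(ordered, key_order):
    """GroupAgg-style pass over sorted input: row count per distinct group key."""
    counts = {}
    for key, grp in groupby(ordered, key=lambda r: tuple(r[c] for c in key_order)):
        # Store the key as a set of (column, value) pairs so that results from
        # different column orders can be compared directly.
        counts[frozenset(zip(key_order, key))] = sum(1 for _ in grp)
    return counts


def make_rows(n=2000, n_countries=5, n_cities=1000, seed=0):
    """Hypothetical data: few distinct countries, many distinct cities."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        city = rng.randrange(n_cities)
        # Each city belongs to exactly one country.
        rows.append({"country": city % n_countries, "city": city})
    return rows


rows = make_rows()
by_country, country_ties = sort_counting_tiebreaks(rows, ["country", "city"])
by_city, city_ties = sort_counting_tiebreaks(rows, ["city", "country"])

# Either sort order feeds the grouping step the same set of groups.
same_groups = (group_counts(by_country, ["country", "city"])
               == group_counts(by_city, ["city", "country"]))

print("same groups either way:", same_groups)
print("leading-column ties, country first:", country_ties)
print("leading-column ties, city first:", city_ties)
```

In this model the low-cardinality-first order has to fall through to
the second column far more often, which is the effect described above.
The actual cost difference in PostgreSQL would depend on the datatypes
and collations involved.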
--
Peter Geoghegan