Re: B-Tree support function number 3 (strxfrm() optimization) - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: B-Tree support function number 3 (strxfrm() optimization) |
Date | |
Msg-id | CAM3SWZSoE3Do7Edm07xpSDL6soYx1yYQ1K5G5=jmRXwARFGxYQ@mail.gmail.com |
In response to | Re: B-Tree support function number 3 (strxfrm() optimization) (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Responses |
Re: B-Tree support function number 3 (strxfrm() optimization)
Re: B-Tree support function number 3 (strxfrm() optimization) |
List | pgsql-hackers |
On Tue, Jan 20, 2015 at 3:46 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> The comment in tuplesort_begin_datum that abbreviation can't be used
> seems wrong to me; why is the copy of the original value pointed to by
> stup->tuple (in the case of by-reference types, and abbreviation is
> obviously not needed for by-value types) not sufficient?

We haven't formalized the idea that pass-by-value types are not targets for abbreviation (it's just that the practical application of abbreviated keys is likely to be limited to pass-by-reference types, generating a compact pass-by-value abbreviated representation). That could be a useful restriction to formalize, and certainly seems likely to be a harmless one, but for now that's the way it is.

It might be sufficient for some tuplesort_begin_datum() callers. Datum tuple sorts require the original values. Aside from the formalization of abbreviation only applying to pass-by-reference types, you'd have to teach tuplesort_getdatum() to reconstruct the non-abbreviated representation transparently from each SortTuple's "tuple proper". However, the actual tuplesort_getdatum() calls could be the dominant cost, not the sort (I'm not sure of that offhand - that's total speculation).

Basically, the intersection of the datum sort case with abbreviated keys seems complicated. I tended to think that the solution was to force a heaptuple sort instead (where abbreviation naturally can be used), since clearly that could work in some important cases like nodeAgg.c, iff the gumption to do it that way was readily available. Rightly or wrongly, I preferred that idea to the idea of teaching the Datum case to handle abbreviation across the board.

Maybe that's the wrong way of fixing that, but for now I don't think it's acceptable that abbreviation isn't always used in certain cases where it could make sense (e.g. not for simple GroupAggregates with a single attribute -- only multiple attribute GroupAggregates). After all, most sort cases (e.g. B-Tree builds) didn't use SortSupport for several years, simply because no one got around to it until I finally did a few months back.

Note that most tuplesort non-users of abbreviation don't use abbreviation for sensible reasons. For example, abbreviation simply doesn't make sense for Top-N heap sorts, or MJExamineQuals(). The non-single-attribute GroupAggregate/nodeAgg.c case seems bad, but I don't have a good sense of how bad the orderedsetaggs.c non-use is... it might matter less than the other case.

--
Peter Geoghegan
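
A minimal sketch of the abbreviated-key idea discussed in this message: a pass-by-reference value (here a C string) gets a compact pass-by-value key, comparisons use the key first, and the original value is consulted only on a tie. This is an illustration under stated assumptions, not PostgreSQL's code. The names `sketch_tuple`, `sketch_abbreviate`, `sketch_compare`, and `sketch_sort` are hypothetical, and the key is simply the first 8 bytes packed big-endian (plain byte order), where the patch under discussion derives it from strxfrm().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a sort tuple: a by-value abbreviated key
 * plus a pointer to the original by-reference value. */
typedef struct {
    uint64_t    abbrev; /* first 8 bytes of the string, packed big-endian */
    const char *full;   /* original value, needed for tie-breaks and output */
} sketch_tuple;

/* Build the abbreviated key. Short strings are zero-padded, so integer
 * order on the key agrees with strcmp() order on the first 8 bytes. */
static uint64_t sketch_abbreviate(const char *s)
{
    uint64_t key = 0;
    size_t   len;

    assert(s != NULL);
    len = strlen(s);
    for (size_t i = 0; i < sizeof(key); i++) {
        key <<= 8;
        if (i < len)
            key |= (unsigned char) s[i];
    }
    return key;
}

/* Cheap integer comparison first; fall back to the full value only
 * when the abbreviated keys are equal. */
static int sketch_compare(const void *a, const void *b)
{
    const sketch_tuple *x = a;
    const sketch_tuple *y = b;

    if (x->abbrev != y->abbrev)
        return x->abbrev < y->abbrev ? -1 : 1;
    return strcmp(x->full, y->full);
}

/* Fill in the tuples and sort them; the originals stay reachable
 * through ->full, so callers can still return non-abbreviated values. */
static void sketch_sort(sketch_tuple *tuples, const char **values, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        tuples[i].abbrev = sketch_abbreviate(values[i]);
        tuples[i].full = values[i];
    }
    qsort(tuples, n, sizeof(sketch_tuple), sketch_compare);
}
```

Keeping the `full` pointer next to the key is what makes the tie-break possible. It also hints at why the Datum sort case is awkward: that case has to hand back the original value, so the original must remain reachable somewhere alongside the abbreviated key.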