Re: sortsupport for text - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: sortsupport for text |
Date | |
Msg-id | CAEYLb_UzBidXtMAqdBQ_9QV1tiRy9Rh2u179vHzurgmpX9C=SA@mail.gmail.com Whole thread Raw |
In response to | Re: sortsupport for text (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: sortsupport for text
|
List | pgsql-hackers |
On 20 June 2012 15:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <peter@2ndquadrant.com> writes: >> I think that this change may have made the difference between the >> Hungarians getting away with it and not getting away with it. Might it >> have been that for text, they were using some operator that wasn't '=' >> (perhaps one which has no fastpath, and thus correctly made a >> representation about equivalency) rather than texteq prior to this >> commit? > > Uh, no. There aren't any magic variants of equality now, and there were > not back then either. I'm inclined to think that the "Hungarian > problem" did exist long before we fixed it. I was suggesting that an operator other than equality may have been used for some reason. It probably isn't a significant issue though. > Um ... are you proposing to replace text btree index entries with > strxfrm values? Because if you aren't, I don't see that this is likely > to win anything. And if you are, it loses on the grounds of (a) index > bloat and (b) loss of ability to do index-only scans. No, I'm suggesting it would probably be at least a bit of a win here to cache the constant, and only have to do a strxfrm() + strcmp() per comparison. Not enough to justify all the added complexity of course, but I wouldn't seek to justify this on the basis of improvements to the speed of btree traversal. It would obviously be much more valuable for tuple sorting. > Personally I think this is not a direction we want to go in. I think > that it's going to end up a significant performance loss in many cases, > break backwards compatibility in numerous ways, and provide a useful > behavioral improvement to only a small minority of users. I may have over-emphasized the improvement in correctness that this would bring, which I personally consider to be very much a secondary benefit. The fact is that this is likely to be a fairly significant performance win, because strxfrm() is quite simply the way you're supposed to do collation-aware sorting, and is documented as such. For that reason, C standard library implementations should not be expected to emphasize its performance - they assume that you're using strxfrm() + their highly optimised strcmp() (as I've said, the glibc implementation is written in ASM, and makes use of hand-written SSSE3 instructions on x86_64 for example). The only performance downside that I can think of is the added check for equivalence for each tuple within _bt_checkkeys() - perhaps you were thinking of something else that hasn't occurred to me though. Were you? The added overhead is only going to be paid only once per index scan, and not once per tuple returned by an index scan. Since equality implies equivalence for us, there is no need to check equivalence until something is unequal, and when that happens, and equivalence doesn't hold, we're done. Meanwhile, that entire index scan just got measurably faster from caching the constant's strxfrm() blob. Now, maybe it is a bit funky that this equivalence test has to happen even though the vast majority of locales don't care. If the only alternative is to bloat up the strxfrm() blobs with the original string, I'd judge that the funkiness is well worth it. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
pgsql-hackers by date: