Re: B-Tree support function number 3 (strxfrm() optimization) - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: B-Tree support function number 3 (strxfrm() optimization) |
Date | |
Msg-id | CAM3SWZTf80f7tENG9T_2cgHZWo9Anskeh4+0_ixAyweB3G2P8Q@mail.gmail.com Whole thread Raw |
In response to | Re: B-Tree support function number 3 (strxfrm() optimization) (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: B-Tree support function number 3 (strxfrm() optimization)
|
List | pgsql-hackers |
On Mon, Apr 7, 2014 at 3:49 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Now that is definitely interesting, and it does seem to demonstrate > that the worst case for this patch might not be as bad as I had feared > - it's about a 5% regression: not great, but perhaps tolerable. It's > not actually a worse case unless firstname is a fair ways into a tuple > with lots of variable-length columns before it, because part of what > the datum1 thing saves you is the cost of repeatedly walking through > the tuple's column list. I only use strxfrm() when the leading key is text. Otherwise, it's pretty much just what your original sort support for text patch from 2012 does, and we don't bother with anything other than fmgr-elision. The only thing that stops the above test case being perfectly pessimal is that we don't always chance a memcmp(), due to some comparisons realizing that len1 != len2. Also, exactly one comparison actually benefits from that same "chance a memcmp()" trick in practice (there is one duplicate). But it really is vanishingly close to perfectly pessimal on my dev system, or that is at least my sincere belief. I think that we're going to find ourselves with more and more cases where to risk wasting lots of compute bandwidth to save a little memory bandwidth is, counter-intuitively, a useful optimization. My opportunistic memcmp() is probably an example of this. I'm aware of others. This is perhaps a contributing factor to how inexpensive it is to waste all the effort that goes into strxfrm() and so on. Having said that, this is an idea from the 1960s, which might explain something about the name of the technique. > But I still think that a lot more could be done - and I'd challenge > you (or others) to do it - to look for cases where this might be a > pessimization. I get that the patch has an upside, but nearly every > patch that anybody proposes does, and the author usually points out > those cases quite prominently, as you have, and right so. But what > makes for a much more compelling submission is when the author *also* > tries really hard to break the patch, and hopefully fails. I agree > that the test result shown above is good news for this patch's future > prospects, but I *don't* agree that it nails the door shut. What > about other locales? Other operating systems? Other versions of > libc? Longer strings? Wider tuples? Systems where datums are only > 32-bits? Sure, you can leave those things to the reviewer and/or > committer to worry about, but that's not the way to expedite the > patch's trip into the tree. I don't think that 32-bit systems are all that high a priority for performance work. Cellphones will come out with 64-bit processors this year. Even still, it's hard to reason about how much difference that will make, except to say that the worst case cannot be that bad. As for other locales, it only gets better for this patch, because I believe en_US.UTF-8 is one of the cheapest to compare locales (which is to say, requires fewest passes). If you come up with some test case with complicated collation rules (I'm thinking of hu_HU, Hungarian), it surely makes the patch look much better. Having said that, I still don't do anything special with the C locale (just provide a non-fmgr-accessed comparator), which should probably be fixed, on the grounds that sorting using the C locale is probably now more expensive than with the en_US.UTF-8 collation. > I have to admit, I didn't really view the original postings on this > topic to be anything more than, hey, we've got something promising > here, it's worth more study. That's part of why I was so taken aback > by Greg's email. There's certainly good potential here, but I think > there's quite a bit left of work left to do before you can declare > victory... I think that Greg's choice of words was a little imprudent, but must be viewed in the context of an offline discussion during the hall track of pgConf NYC. Clearly Greg wasn't about to go off and unilaterally commit this. FWIW, I think I put him off the idea a few hours after he made those remarks, without intending for what I'd said to have that specific effect (or the opposite effect). -- Peter Geoghegan
pgsql-hackers by date: