Re: B-Tree support function number 3 (strxfrm() optimization) - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: B-Tree support function number 3 (strxfrm() optimization) |
Date | |
Msg-id | CAM3SWZQsVit6xGGJZqt2sU-gMJ9zRHRo3+202U4EUuT_=58M0Q@mail.gmail.com |
In response to | Re: B-Tree support function number 3 (strxfrm() optimization) (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses | Re: B-Tree support function number 3 (strxfrm() optimization) |
List | pgsql-hackers |
On Sun, Sep 14, 2014 at 7:37 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> Got to be careful to not let the compiler optimize away microbenchmarks like
> this. At least with my version of gcc, the strcoll calls get optimized away,
> as do the memcmp calls, if you don't use the result for anything. Clang was
> even more aggressive; it ran both comparisons in 0.0 seconds. Apparently it
> optimizes away the loops altogether.

I suppose the fact that I saw results that fit my pre-conceived notion
of what was happening made me lose my initial concern about that.

> Also, there should be a setlocale(LC_ALL, "") call somewhere. Otherwise it
> runs in C locale, and we don't use strcoll() at all for C locale.

Oops. This might be a useful mistake, though -- if the strcoll() using
the C locale is enough to make the memcmp() not free, then that
suggests that strcoll() is the "hiding place" for the useless
memcmp(), where instructions relating to the memcmp() can execute in
parallel to instructions relating to strcoll() that add latency from
memory accesses (for non-C locales). With the C locale, strcoll() is
equivalent to strcmp()/memcmp().

Commenting out the setlocale(LC_ALL, "") in your revised versions
shows something like my original numbers (so I guess my compiler
wasn't smart enough to optimize away the strcoll() + memcmp() cases).
Whereas, there is no noticeable regression/difference between each
case when I run the revised program unmodified. That seems to prove
that strcoll() is a good enough "hiding place".

> Both values vary in range 5.9 - 6.1 s, so it's fair to say that the useless
> memcmp() is free with these parameters.
>
> Is this the worst case scenario?

Other than pushing the differences much much later in the strings
(which you surely thought of already), yes. I think it's worse than
the worst, because we've boiled this down to just the comparison part,
leaving only the strcoll() as a "hiding place", which is evidently
good enough.
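The two pitfalls discussed above (discarded results and a missing setlocale() call) can be illustrated with a minimal sketch. This is not the test program from the thread: `time_compares`, `sink`, `coll_fn` and `memcmp_fn` are hypothetical names, and routing the calls through volatile function pointers is just one portable way to keep the optimizer from removing them. It adds indirect-call overhead that a real benchmark may not want.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Results are folded into a volatile sink so the optimizer must keep them. */
static volatile long sink;

/*
 * Hypothetical indirection: calling through volatile function pointers
 * forces a real call on every iteration, so loop-invariant strcoll() and
 * memcmp() calls cannot be hoisted out of the loop or deleted.
 */
static int (*volatile coll_fn)(const char *, const char *) = strcoll;
static int (*volatile memcmp_fn)(const void *, const void *, size_t) = memcmp;

/*
 * Runs `iters` comparisons of NUL-terminated strings a and b (len bytes
 * are compared by the optional memcmp() pre-check). Returns CPU seconds
 * used; *checksum receives the sum of the comparison signs.
 */
double
time_compares(const char *a, const char *b, size_t len,
              long iters, int with_memcmp, long *checksum)
{
    long    acc = 0;
    clock_t start, end;

    assert(a != NULL && b != NULL && checksum != NULL && iters >= 0);

    start = clock();
    for (long i = 0; i < iters; i++)
    {
        int r;

        if (with_memcmp && memcmp_fn(a, b, len) == 0)
            r = 0;                  /* bytewise equal: skip strcoll() */
        else
            r = coll_fn(a, b);

        acc += (r > 0) - (r < 0);   /* keep only the sign */
    }
    end = clock();

    sink = acc;                     /* the result is observably used */
    *checksum = acc;
    return (double) (end - start) / CLOCKS_PER_SEC;
}
```

A caller would invoke setlocale(LC_ALL, "") (declared in <locale.h>) once before timing; without it the process stays in the C locale and the strcoll() path being measured is not the locale-aware one.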
I thought that it was important that there be an unpredictable access
pattern (characteristic of quicksort), so that memory latency is added
here and there. I'm happy to learn that I was wrong about that, and
that a strcoll() alone hides the would-be memcmp() latency.

Large strings matter much less anyway, I think. If you have a pair of
strings both longer than CACHE_LINE_SIZE bytes, and the first
CACHE_LINE_SIZE bytes are identical, and the lengths are known to
match, it seems like a very sensible bet to anticipate that they're
fully equal. So in a world where that affects the outcome of this test
program, I think it still changes nothing (if, indeed, it matters at
all, which it appears not to anyway, at least with 256 byte strings).

We should probably do a fully opportunistic "memcmp() == 0" within
varstr_cmp() itself, so that Windows has the benefit of this too, as
well as callers like compareJsonbScalarValue().

Actually, looking at it closely, I think that there might still be a
microscopic regression, as there might have also been with my variant
of your SQL test case [1] - certainly in the noise, but perhaps
measurable with enough runs. If there is, that seems like an
acceptable price to pay.

When I test this stuff, I'm now very careful about power management
settings on my laptop... there are many ways to be left with egg on
your face with this kind of benchmark.

[1] http://www.postgresql.org/message-id/CAM3SWZQY95Sow00b+zJycrGMR-uF1mz8rYv4_Ou2ENcvsTnxYA@mail.gmail.com

--
Peter Geoghegan
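The opportunistic "memcmp() == 0" check proposed in the message can be sketched roughly as follows. This is an illustration only, not PostgreSQL source: `cmp_with_equality_fast_path` is a hypothetical stand-in for a varstr_cmp()-style comparator, and it assumes both inputs are NUL-terminated so that strcoll() can be called on them directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical comparator with an equality fast path. The lengths are
 * passed in because the caller is assumed to know them already, which is
 * what makes the up-front check cheap.
 */
int
cmp_with_equality_fast_path(const char *a, size_t len_a,
                            const char *b, size_t len_b)
{
    int result;

    assert(a != NULL && b != NULL);

    /* Bytewise-identical strings are equal in any collation. */
    if (len_a == len_b && memcmp(a, b, len_a) == 0)
        return 0;

    /* Otherwise fall through to the locale-aware comparison. */
    result = strcoll(a, b);

    /*
     * Assumption for this sketch: if the locale reports a tie for strings
     * that differ bytewise, break it with strcmp() so that only identical
     * strings compare as equal.
     */
    if (result == 0)
        result = strcmp(a, b);

    return result;
}
```

Because the check only short-circuits when the bytes match exactly, it never changes the result of a comparison; it only skips the strcoll() call for inputs that are already known to be equal.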