Re: PageRepairFragmentation performance - Mailing list pgsql-hackers
From: José Luis Tallón
Subject: Re: PageRepairFragmentation performance
Date:
Msg-id: 546B91EC.7070500@adv-solutions.net
In response to: PageRepairFragmentation performance (Heikki Linnakangas <hlinnakangas@vmware.com>)
List: pgsql-hackers
On 11/18/2014 07:03 PM, Heikki Linnakangas wrote:
> When profiling replay of the WAL generated by pgbench, I noticed that
> PageRepairFragmentation consumes a large fraction of the CPU time:
> [snip]
>
> 1. Replace the qsort with something cheaper. The itemid arrays being
> sorted are small, a few hundred items at most, usually even smaller. In
> this pgbench test case I used, the typical size is about 60. With a
> small array a plain insertion sort is cheaper than the generic
> qsort(), because it can avoid the function call overhead etc. involved
> with generic qsort. Or we could use something smarter, like a radix
> sort, knowing that we're sorting small integers. Or we could implement
> an inlineable version of qsort and use that.

IIRC, the theoretical complexity of quicksort and radix sort should be approximately the same for 256-1024 items: O(n*log(n)) vs O(d*n), where d is ~log2(n), or just 16 in this case. However, lexicographical ("bitstring-wise") ordering might not be what we are aiming for here.

AFAIK, an inlined quicksort should be about the best-performing sort available (most of the improvement coming from staying within the I-cache).

> 2. Instead of sorting the array and using memmove in-place, we could
> copy all the tuples to a temporary buffer in arbitrary order, and
> finally copy the temporary copy back to the buffer. That requires two
> memory copies per tuple, instead of one memmove, but memcpy() is
> pretty darn fast. It would be a loss when there are only a few large
> tuples on the page, so that avoiding the sort doesn't help, or when
> the tuples are mostly already in the correct places, so that most of
> the memmove()s are no-ops. But with a lot of small tuples, it would be
> a win, and it would be simple.

memmove() *should* be no slower than memcpy()... if both are actually translated by the compiler into intrinsics, as opposed to calling the library functions --- which seems to be the case here (cf. __memmove_ssse3_back).

A simple "if" to eliminate the no-op memmoves might do the trick, too. Just my two (euro) cents, though.

> The second option would change behaviour slightly, as the tuples would
> be placed on the page in a different physical order than before. It
> wouldn't be visible to users, though.
>
> I spent some time hacking approach 1, and replaced the qsort() call
> with a bucket sort. I'm not sure if a bucket sort is optimal, or
> better than a specialized quicksort implementation, but it seemed
> simple.
>
> With the testcase I've been using - replaying about 2GB of WAL
> generated by pgbench - this reduces the replay time from about 55 s
> to 45 s.

Not bad at all... though I suspect most of the gain might come from staying within the I-cache, as opposed to regular qsort. The smaller itemIdSortData structure surely helps a bit, too :)

> Thoughts? Attached is the patch I put together. It's actually two
> patches: the first is just refactoring, putting the common code
> between PageRepairFragmentation, PageIndexMultiDelete, and
> PageIndexDeleteNoCompact into a function. The second replaces the
> qsort().

Definitely worthwhile, even if just for the refactoring. The speed-up sounds very good, too.

Thanks,

    / J.L.
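For illustration, the kind of specialized insertion sort discussed under approach 1 could be sketched as below. The struct and names here are hypothetical stand-ins, not PostgreSQL's actual itemIdSortData definition; the point is only that a direct comparison on a small fixed-size array avoids generic qsort()'s comparator-call overhead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an item-descriptor entry: sort by the
 * tuple's current offset within the page, descending, so compaction
 * can walk the page from its end downwards. */
typedef struct
{
    uint16_t offset;   /* current offset of the tuple in the page */
    uint16_t len;      /* tuple length */
    uint16_t lpindex;  /* index of the owning line pointer */
} item_sort_entry;

/* Plain insertion sort: for arrays of a few dozen entries this stays
 * in the I-cache and avoids the indirect comparator calls that a
 * generic qsort() must make. */
static void
sort_items_desc(item_sort_entry *items, size_t n)
{
    for (size_t i = 1; i < n; i++)
    {
        item_sort_entry key = items[i];
        size_t j = i;

        while (j > 0 && items[j - 1].offset < key.offset)
        {
            items[j] = items[j - 1];
            j--;
        }
        items[j] = key;
    }
}
```

Whether this beats an inlined quicksort for ~60 entries would of course need measuring on the actual workload.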
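Approach 2 (copy out, then copy back, no sort) could be sketched as below. The layout and names are illustrative only, not PostgreSQL's actual page format: tuples are packed toward the end of a scratch buffer in whatever order the line pointers are scanned, then the packed region is copied back in one memcpy().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tuple reference: where the tuple currently lives. */
typedef struct
{
    uint16_t offset;  /* current tuple offset within the page */
    uint16_t len;     /* tuple length */
} tuple_ref;

/* Compact tuples toward the end of the page without sorting: two
 * memcpy()s per tuple instead of one memmove(), but no qsort() call.
 * Assumes page_size <= 8192 (typical PostgreSQL block size). */
static void
compact_no_sort(char *page, size_t page_size, tuple_ref *refs, size_t n)
{
    char   scratch[8192];
    size_t upper = page_size;   /* grows downward as tuples are packed */

    for (size_t i = 0; i < n; i++)
    {
        upper -= refs[i].len;
        memcpy(scratch + upper, page + refs[i].offset, refs[i].len);
        refs[i].offset = (uint16_t) upper;  /* record the new location */
    }

    /* Copy the whole packed tuple area back in one go. */
    memcpy(page + upper, scratch + upper, page_size - upper);
}
```

As noted in the quoted mail, this trades one memmove() per tuple for two memcpy()s, so it should win with many small tuples and lose when most tuples are already in place.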