Home > mailing lists

Re: PageRepairFragmentation performance - Mailing list pgsql-hackers

From	José Luis Tallón
Subject	Re: PageRepairFragmentation performance
Date	November 18, 2014 21:37:49
Msg-id	546B91EC.7070500@adv-solutions.net Whole thread Raw
In response to	PageRepairFragmentation performance (Heikki Linnakangas <hlinnakangas@vmware.com>)
List	pgsql-hackers

Tree view

On 11/18/2014 07:03 PM, Heikki Linnakangas wrote:
> When profiling replay the WAL generated by pgbench, I noticed the 
> PageRepairFragmentation consumes a large fraction of the CPU time:
> [snip]
>
> 1. Replace the qsort with something cheaper. The itemid arrays being 
> sorted are small, a few hundred item at most, usually even smaller. In 
> this pgbench test case I used, the typical size is about 60. With a 
> small array a plain insertion sort is cheaper than the generic 
> qsort(), because it can avoid the function overhead etc. involved with 
> generic qsort. Or we could use something smarter, like a radix sort, 
> knowing that we're sorting small integers. Or we could implement an 
> inlineable version of qsort and use that.
IIRC, we would have a theoretical complexity of quicksort and radix sort 
should be approximately the same for 256-1024 items... O(n*log(n)) vs 
O(d*n), where d is ~log2(n) or just 16 in this case. However, 
lexicographical ("bitstring-wise" ordering) might not be what we are 
aiming for here

AFAIK, an inlined quicksort should be about the best performing sort 
available (most of the enhancement coming from staying within the I-cache)

> 2. Instead of sorting the array and using memmove in-place, we could 
> copy all the tuples to a temporary buffer in arbitrary order, and 
> finally copy the temporary copy back to the buffer. That requires two 
> memory copies per tuple, instead of one memmove, but memcpy() is 
> pretty darn fast. It would be a loss when there are only a few large 
> tuples on the page, so that avoiding the sort doesn't help, or when 
> the tuples are mostly already in the correct places, so that most of 
> the memmove()s are no-ops. But with a lot of small tuples, it would be 
> a win, and it would be simple.

Memmove *should* be no slower than memcpy.... if both are actually 
translated by the compiler to use intrinsics as opposed to calling the 
functions --- as it seems to be done here (cfr. __memmove_ssse3_back )
A simple "if" in order to eliminate the no-op memmoves might as well do 
it, too.

Just my two (euro) cents, though

> The second option would change behaviour slightly, as the tuples would 
> be placed on the page in different physical order than before. It 
> wouldn't be visible to to users, though.
>
> I spent some time hacking approach 1, and replaced the qsort() call 
> with a bucket sort. I'm not sure if a bucket sort is optimal, or 
> better than a specialized quicksort implementation, but it seemed simple.
>
> With the testcase I've been using - replaying about 2GB of WAL 
> generated by pgbench - this reduces the replay time from about 55 s to 
> 45 s.

Not bad at all... though I suspect most of it might come from staying 
within the I-cache as opposed to regular qsort.
The smaller itemIdSortData structure surely helps a bit, too :)

> Thoughts? Attached is the patch I put together. It's actually two 
> patches: the first is just refactoring, putting the common code 
> between PageRepairFragmentation, PageIndexMultiDelete, and 
> PageIndexDeleteNoCompact to function. The second replaces the qsort(). 

Definitively worth-while, even if just for the refactor. The speed-up 
sounds very good, too.



Thanks,
    / J.L.

pgsql-hackers by date:

From: Heikki Linnakangas
Date: 18 November 2014, 21:03:58
Subject: PageRepairFragmentation performance

From: Josh Berkus
Date: 18 November 2014, 21:43:05
Subject: pg_locks doesn't check for interrupts?

Re: PageRepairFragmentation performance - Mailing list pgsql-hackers

Previous

Next