Re: Minor performance improvement in transition to external sort - Mailing list pgsql-hackers

From Jeremy Harris
Subject Re: Minor performance improvement in transition to external sort
Date
Msg-id 52F40CE9.1070509@wizmail.org
Whole thread Raw
In response to Re: Minor performance improvement in transition to external sort  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 06/02/14 14:22, Robert Haas wrote:
> On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris <jgh@wizmail.org> wrote:
>> On 05/02/14 21:56, Robert Haas wrote:
>>> On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris <jgh@wizmail.org> wrote:
>>>> The attached patch replaces the existing siftup method for heapify with
>>>> a siftdown method. Tested with random integers it does 18% fewer
>>>> compares and takes 10% less time for the heapify, over the work_mem
>>>> range 1024 to 1048576.
>>>>
>>>> Both algorithms appear to be O(n) (contradicting Wikipedia's claim
>>>> that a siftup heapify is O(n log n)).
>>>
>>>
>>> I think Wikipedia is right.  Inserting an a tuple into a heap is O(lg
>>> n); doing that n times is therefore O(n lg n).  Your patch likewise
>>> implements an outer loop which runs O(n) times and an inner loop which
>>> runs at most O(lg n) times, so a naive analysis of that algorithm also
>>> comes out to O(n lg n).
>>
>> The patch implements a siftdown.  Measurements attached.
>
> Uh, this spreadsheet is useless.  You haven't explained how these
> numbers were generated, or what the columns in the spreadsheet
> actually mean.  Ideally, you should include enough information that
> someone else can try to reproduce your results.
>

I'm sorry I wasn't clear.

Test code was groups of the form:

set work_mem to 1024;
CREATE TABLE jgh (i integer);
INSERT INTO jgh (i) SELECT 20000 * random() FROM generate_series(1, 20000);
VACUUM ANALYZE jgh;
explain analyze SELECT * FROM jgh ORDER BY i;
set enable_siftdown_heapify to off;
explain analyze SELECT * FROM jgh ORDER BY i;
set enable_siftdown_heapify to on;
DROP TABLE jgh;

.. for the tabulated work_mem, and scaled values for the INSERT.

Compares were counted for the heapify stage (only), and timings
taken using pg_rusage_init() before and pg_rusage_show() after
it.  Times are in seconds.

Baseline value for compares and time used 858ec11858.
Sift-down compares and time are for the patch.

"Baseline K cmps" is compares divided by work_mem (which should
be a scaled version of compares-per-tuple being heapified) pre-patch.
"Siftdown K cmps" is likewise with the patch.

"K ratio cmps" is the ratio  (patched compares)/(unpatched compares).

Repeat for time measurements.

-- 
Cheers,   Jeremy



pgsql-hackers by date:

Previous
From: David Johnston
Date:
Subject: Re: 'dml' value for log_statement
Next
From: Greg Stark
Date:
Subject: Re: Recovery inconsistencies, standby much larger than primary