Re: Polyphase merge is obsolete - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Polyphase merge is obsolete
Msg-id 6a15881f-0cc5-3564-abc0-03ec28883380@iki.fi
In response to Re: [HACKERS] Polyphase merge is obsolete  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Polyphase merge is obsolete  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On 11/09/2017 13:37, Tomas Vondra wrote:
> I planned to do some benchmarking on this patch, but apparently the
> patch no longer applies. Rebase please?

Here's a rebase of this. Sorry to keep you waiting :-).

I split the patch into two: one patch to refactor the logtape.c APIs, 
and another to change the merge algorithm. With the new logtape.c API, 
you can create and destroy LogicalTapes in a tape set on the fly, which 
simplified the new hash agg spilling code somewhat. I think that's nice, 
even without the sort merge algorithm change.
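
To illustrate what "create and destroy tapes on the fly" buys you, here is a toy model in Python. It is not the logtape.c API: the class and method names (ToyTapeSet, create_tape, append_block, close_tape) are made up for this sketch, and it only assumes that tapes in a set share one pool of blocks, so that blocks freed by a destroyed tape can be reused by a tape created later.

```python
class ToyTapeSet:
    """Toy stand-in for a tape set; all names here are hypothetical."""

    def __init__(self):
        self.free_blocks = []   # block numbers released by closed tapes
        self.next_block = 0     # next never-used block number
        self.tapes = {}         # tape id -> block numbers the tape owns
        self.next_tape_id = 0

    def create_tape(self) -> int:
        """Add a new, empty tape to the set at any time."""
        tape_id = self.next_tape_id
        self.next_tape_id += 1
        self.tapes[tape_id] = []
        return tape_id

    def append_block(self, tape_id: int) -> int:
        """Give the tape one more block, preferring a recycled one."""
        if self.free_blocks:
            block = self.free_blocks.pop()
        else:
            block = self.next_block
            self.next_block += 1
        self.tapes[tape_id].append(block)
        return block

    def close_tape(self, tape_id: int) -> None:
        """Destroy a tape and return its blocks to the shared pool."""
        self.free_blocks.extend(self.tapes.pop(tape_id))
```

A caller that spills in batches can open a tape per batch and close it as soon as the batch has been read back, without deciding the number of tapes up front.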

I did some benchmarking too. I used four data sets: ordered integers, 
random integers, ordered text, and random text, with 1 GB of data in 
each. I sorted those datasets with different work_mem settings, and 
plotted the results. I only ran each combination once, and all on my 
laptop, so there's a fair amount of noise in individual data points. The 
trend is clear, though.
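
For readers who want to try something similar at a small scale, the four kinds of input can be sketched as below. This is not one of the attached scripts. The function name, the seed, and the zero-padded text format are assumptions made for the illustration, and the row count is a parameter rather than the 1 GB used in the actual runs.

```python
import random


def make_datasets(n: int, seed: int = 0) -> dict:
    """Build small versions of the four input shapes (hypothetical helper)."""
    rng = random.Random(seed)

    ordered_ints = list(range(n))
    random_ints = ordered_ints[:]
    rng.shuffle(random_ints)          # same values, random order

    # Zero-padding keeps text order identical to numeric order.
    ordered_text = [f"{i:010d}" for i in ordered_ints]
    random_text = [f"{i:010d}" for i in random_ints]

    return {
        "ordered_ints": ordered_ints,
        "random_ints": random_ints,
        "ordered_text": ordered_text,
        "random_text": random_text,
    }
```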

The results show that the patch makes sorting faster with small work_mem 
settings. As work_mem grows so that there's only one merge pass, there's 
no difference. The rightmost data points are with work_mem='2500 MB', so 
that the data fits completely in work_mem and you get a quicksort. 
That's unaffected by the patch.
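
As a rough way to see why the difference vanishes at larger settings, the toy model below counts merge passes for a balanced k-way merge. It is not taken from the patch or from tuplesort.c: it assumes each initial run is about run_mb in size and that every pass merges fan_in runs at a time, and both parameters are hypothetical stand-ins for what work_mem would determine.

```python
import math


def merge_passes(data_mb: int, run_mb: int, fan_in: int) -> int:
    """Count merge passes in a balanced k-way merge (toy model).

    Returns 0 when the data forms a single run, i.e. nothing to merge.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    runs = math.ceil(data_mb / run_mb)    # number of initial sorted runs
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / fan_in)   # each pass merges fan_in runs into one
        passes += 1
    return passes
```

With 1024 MB of data and a fan-in of 6, 4 MB runs need four passes in this model, 256 MB runs need one, and a run size of 2500 MB needs none, which corresponds to the in-memory quicksort case. The merge algorithm can only matter in the multi-pass region.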

I've attached the test scripts I used for the plots, although they are 
quite messy and not very automated. I don't expect them to be very 
helpful to anyone, but we might as well have them archived.

- Heikki

