On 11/09/2017 13:37, Tomas Vondra wrote:
> I planned to do some benchmarking on this patch, but apparently the
> patch no longer applies. Rebase please?
Here's a rebase of this. Sorry to keep you waiting :-).
I split the patch into two, one patch to refactor the logtape.c APIs,
and another to change the merge algorithm. With the new logtape.c API,
you can create and destroy LogicalTapes in a tape set on the fly, which
simplified the new hash agg spilling code somewhat. I think that's nice,
even without the sort merge algorithm change.
I did some benchmarking too. I used four data sets: ordered integers,
random integers, ordered text, and random text, with 1 GB of data in
each. I sorted those datasets with different work_mem settings, and
plotted the results. I only ran each combination once, and all on my
laptop, so there's a fair amount of noise in individual data points. The
trend is clear, though.
The results show that this makes sorting faster with small work_mem
settings. As work_mem grows so that there's only one merge pass, there's
no difference. The rightmost data points are with work_mem='2500 MB', so
that the data fits completely in work_mem and you get a quicksort.
That's unaffected by the patch.
I've attached the test scripts I used for the plots, although they are
quite messy and not very automated. I don't expect them to be very
helpful to anyone, but might as well have it archived.
- Heikki