Home > mailing lists

Re: Sorting Improvements for 8.4 - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: Sorting Improvements for 8.4
Date	December 3, 2007 20:42:56
Msg-id	1196717340.4255.65.camel@ebony.site Whole thread Raw
In response to	Re: Sorting Improvements for 8.4 (Gregory Stark <stark@enterprisedb.com>)
Responses	Re: Sorting Improvements for 8.4
List	pgsql-hackers

Tree view

On Mon, 2007-12-03 at 20:40 +0000, Gregory Stark wrote:

> I think the way to do what you're proposing is...

Don't understand that. Algorithm F covers that already doesn't it?

> So the question is just how many seeks are we doing during sorting. If we're
> doing 0.1% seeks and 99.9% sequential i/o then eliminating the 1% entirely
> (which we can't do) isn't going to speed up seeking all that much. If we're
> doing 20% seeks and can get that down to 10% it might be worthwhile.

The buffer size at max tapes is an optimum - a trade off between
avoiding intermediate merging and merging efficiently. Freeing more
memory is definitely going to help in the case of low work_mem and lots
of runs.

You're right that there is a limit to the benefit you can get. I wrote a
patch in 2005/6 to optimise the memory usage when there were few runs
and lots of memory. I still think there's value in that.

> I'm not sure where the idea of keeping the current bounds of the input tapes
> comes into it. We only preread when we run out of tuples anyways and then we
> don't really have a choice about which tape we want to preread from.

You have to decide whether to perform intermediate merges or whether you
can do everything at the final merge. Otherwise you can't merge more
runs than you have buffers for, since you'd at some point freeze up and
not be able to input.
And it's
> a good thing too since maintaining such a list of bounds and finding the
> lowest or highest would mean maintaining a second heap which would basically
> double the cpu cost of sorting.

I think you're not understanding me.

You only need to record the lowest or highest when a run
completes/starts. When all runs have been written we then have a table
of the highest and lowest values for each run. We then scan that to see
whether we can perform merging in one pass, or if not what kind of
intermediate merging is required. We keep the merge plan in memory and
then follow it. So probably very small % of total sort cost, though
might save you doing intermediate merges with huge costs.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com

pgsql-hackers by date:

From: Kris Jurka
Date: 03 December 2007, 20:42:06
Subject: Re: Is postgres.gif missing in cvs?

From: Alvaro Herrera
Date: 03 December 2007, 20:57:48
Subject: Re: Is postgres.gif missing in cvs?

Re: Sorting Improvements for 8.4 - Mailing list pgsql-hackers

Previous

Next