Home > mailing lists

Re: Sorting Improvements for 8.4 - Mailing list pgsql-hackers

From	Gregory Stark
Subject	Re: Sorting Improvements for 8.4
Date	December 3, 2007 19:41:37
Msg-id	87prxnseag.fsf@oxford.xeocode.com Whole thread Raw
In response to	Re: Sorting Improvements for 8.4 (Simon Riggs <simon@2ndquadrant.com>)
Responses	Re: Sorting Improvements for 8.4 (Simon Riggs <simon@2ndquadrant.com>) Re: Sorting Improvements for 8.4 (Jeff Davis <pgsql@j-davis.com>) Re: Sorting Improvements for 8.4 (Jeff Davis <pgsql@j-davis.com>)
List	pgsql-hackers

Tree view

"Simon Riggs" <simon@2ndquadrant.com> writes:

> On Mon, 2007-12-03 at 10:32 -0800, Jeff Davis wrote:
>
>> If I understand correctly, we just keep track of the maximum value of
>> the last block read from each run, and then always read from the run in
>> which the last block read has the lowest maximum.

So it sounds like the use case where this is the biggest win would be a
situation where you have presorted input which has been sliced up. So for
example sorting by "zip code" in a table which was clustered by city. The
alphabetic order of the cities isn't correlated to the results but all the zip
codes for a city are in a contiguous block somewhere in the output.

In such a case after doing a single pass we would have a bunch of tapes each
of which corresponded to a single city and was able to completely reorder the
zip codes in that city to be ordered. So the desired results would be, for
example, all the tuples from tape 17 (NYC) followed by all the tuples from
tape 3 (Buffalo) followed by all the tuples from tape 1 (Albuquerque), etc.

We currently preread an equal amount from each tape and then would empty all
the preread tuples from tape 17, refill them, preread them again, repeat until
tape 17 is empty then move on to tape 3. All the tuples except the currently
active tape are completely idle.

I think the way to do what you're proposing is to preread one tuple from each
tape, then when one preread bunch is emptied refill it with twice as many and
repeat. In this case you would end up with nearly all of workmem full of
tuples from NYC until you're done with NYC. That would increase the prereading
block size by a factor of 20 in this case. 

So the question is just how many seeks are we doing during sorting. If we're
doing 0.1% seeks and 99.9% sequential i/o then eliminating the 1% entirely
(which we can't do) isn't going to speed up seeking all that much. If we're
doing 20% seeks and can get that down to 10% it might be worthwhile.

I'm not sure where the idea of keeping the current bounds of the input tapes
comes into it. We only preread when we run out of tuples anyways and then we
don't really have a choice about which tape we want to preread from. And it's
a good thing too since maintaining such a list of bounds and finding the
lowest or highest would mean maintaining a second heap which would basically
double the cpu cost of sorting.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!

pgsql-hackers by date:

From: Devrim GÜNDÜZ
Date: 03 December 2007, 19:28:22
Subject: Is postgres.gif missing in cvs?

From: Kris Jurka
Date: 03 December 2007, 20:42:06
Subject: Re: Is postgres.gif missing in cvs?

Re: Sorting Improvements for 8.4 - Mailing list pgsql-hackers

Previous

Next