Re: Sorting Improvements for 8.4 - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Sorting Improvements for 8.4
Date
Msg-id 1196453253.22428.416.camel@dogma.ljc.laika.com
Whole thread Raw
In response to Sorting Improvements for 8.4  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Sorting Improvements for 8.4  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Tue, 2007-11-27 at 18:03 +0000, Simon Riggs wrote:
> 5. DYNAMIC RUN HANDLING (in Final Merge)
> 
> Another way of addressing a) is to simply make better use of memory
> itself. Let's look at that in more detail:
> 
> Number of runs that can be merged at once is currently fixed, based upon
> available memory. This has the underlying assumption that all runs will
> be concurrently active during final merging, which may not always be
> true.
> 
> If we have random data then almost all runs will overlap with all other
> runs, i.e. the min and max values are sufficiently wide that the runs do
> all overlap. In many cases, data arrives in somewhat sorted order, e.g.
> financial data is fairly regular with some late payers but not many, and
> those trail off with a fairly tight decay. In the somewhat sorted case
> we find that the actual overlap is less than total, so there are many
> later runs that don't overlap the earlier ones. In the best case we
> would find run 1 and 2 overlap, runs 2 and 3 overlap, then 3 and 4
> overlap.

I have spoken with Len Shapiro, a professor at Portland State
University, regarding sorting before.

He suggests that PostgreSQL should implement forecasting, which is
similar to what you're describing. Forecasting does not require that
entire runs are disjoint, it works by tracking the maximum values from
the last block read from every run. This allows you to know which run
you will need more blocks from the soonest.

I'm still looking into the problem to understand it better, but the
algorithm is in Knuth Vol 3.

I can look at it in more detail, but have you already looked into this
idea? Is there a reason we don't do this currently?

Regards,Jeff Davis



pgsql-hackers by date:

Previous
From: Sam Mason
Date:
Subject: Re: PostGreSQL and recursive queries...
Next
From: Andrew Dunstan
Date:
Subject: Re: [GENERAL] plperl and regexps with accented characters - incompatible?