Home > mailing lists

Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: Using quicksort for every external sort run
Date	February 7, 2016 18:52:40
Msg-id	CAM-w4HPOTjeUqjahY+Dh2JzaiNRz_VU4Z9-9wVbeiJsFN4rmvw@mail.gmail.com Whole thread Raw
In response to	Re: Using quicksort for every external sort run (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Using quicksort for every external sort run
List	pgsql-hackers

Tree view

On Sun, Feb 7, 2016 at 8:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 7, 2016 at 11:00 AM, Peter Geoghegan <pg@heroku.com> wrote:
> > It was my understanding, based on your emphasis on producing only a
> > single run, as well as your recent remarks on this thread about the
> > first run being special, that you are really only interested in the
> > presorted case, where one run is produced. That is, you are basically
> > not interested in preserving the general ability of replacement
> > selection to double run size in the event of a uniform distribution.
>...
> > You don't want to change the behavior of the current patch for the
> > second or subsequent run; that should remain a quicksort, pure and
> > simple. Do I have that right?
>
> Yes.

I'm not even sure this is necessary. The idea of missing out on
producing a single sorted run sounds bad but in practice since we
normally do the final merge on the fly there doesn't seem like there's
really any difference between reading one tape or reading two or three
tapes when outputing the final results. There will be the same amount
of I/O happening and a 2-way or 3-way merge for most data types should
be basically free.

On Sun, Feb 7, 2016 at 8:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 3. At this point, we have one sorted tape per worker.  Perform a final
> merge pass to get the final result.

I don't even think you have to merge until you get one tape per
worker. You can statically decide how many tapes you can buffer in
memory based on work_mem and merge until you get N/workers tapes so
that a single merge in the gather node suffices. I would expect that
to nearly always mean the workers are only responsible for generating
the initial sorted runs and the single merge pass is done in the
gather node on the fly as the tuples are read.

-- 
greg

pgsql-hackers by date:

From: Robert Haas
Date: 07 February 2016, 17:21:27
Subject: Re: Using quicksort for every external sort run

From: Tom Lane
Date: 07 February 2016, 19:17:29
Subject: Re: First-draft release notes for next week's back-branch releases

Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

Previous

Next