Re: Parallel tuplesort, partitioning, merging, and the future - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Parallel tuplesort, partitioning, merging, and the future
Date
Msg-id CAM3SWZQ0F7skDggoUG0KMGLb6G9yBC=0x0PP3t01yy4G+JqN6Q@mail.gmail.com
Whole thread Raw
In response to Re: Parallel tuplesort, partitioning, merging, and the future  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-hackers
On Wed, Aug 10, 2016 at 12:08 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> I think it's a great design, but for that, per-worker final tapes have
> to always be random-access.

Thanks. I don't think I need to live with the randomAccess
restriction, because I can be clever about reading only the first
tuple on each logtape.c block initially. Much later, when the binary
search gets down to seeking within a single block, everything in the
block can be read at once into memory, and we can take the binary
search to that other representation. This latter part only needs to
happen once or twice per partition boundary per worker.

> I'm not hugely familiar with the code, but IIUC there's some penalty
> to making them random-access right?

Yeah, there is. For one thing, you have to store the length of the
tuple twice, to support incremental seeking in both directions. For
another, you cannot perform the final merge on-the-fly; you must
produce a serialized tape as output, which is used subsequently to
support random seeks. There is no penalty when you manage to do the
sort in memory, though (not that that has anything to do with parallel
sort).

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: new pgindent run before branch?
Next
From: Tom Lane
Date:
Subject: Re: Set log_line_prefix and application name in test drivers