Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date	December 22, 2016 00:21:16
Msg-id	CAM3SWZRckVHLH2xCZVxDvzy7s0g3Vh2tax1_oxRn-cR=NVXvVA@mail.gmail.com
In response to	Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
List	pgsql-hackers

On Wed, Dec 21, 2016 at 6:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> 3. Just live with the waste of space.

I am loath to create a special case for the parallel interface too,
but I think it's possible that *no* caller will ever actually need to
live with this restriction at any time in the future. I am strongly
convinced that adopting tuplesort.c for parallelism should involve
partitioning [1]. With that approach, even randomAccess callers will
not want to read at random from one big materialized tape, since that's
at odds with the whole point of partitioning, which is to remove any
dependencies between workers quickly and early, so that as much work
as possible is pushed down into workers. If a merge join were
performed in a world where we have this kind of partitioning, we
definitely wouldn't require one big materialized tape that is
accessible within each worker.

What are the chances of any real user actually having to live with the
waste of space at some point in the future?

> Another tangentially-related problem I just realized is that we need
> to somehow handle the issues that tqueue.c does when transferring
> tuples between backends -- most of the time there's no problem, but if
> anonymous record types are involved then tuples require "remapping".
> It's probably harder to provoke a failure in the tuplesort case than
> with parallel query per se, but it's probably not impossible.

Thanks for pointing that out. I'll look into it.

BTW, I discovered a bug that occurs when very little memory is
available within each worker -- tuplesort.c immediately throws an
error within workers. The fix is just a matter of making sure that
they have at least 64KB of workMem, which is pretty straightforward.
Obviously it makes no sense to use so little memory in the first
place; this is a corner case.
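
To illustrate the kind of floor I mean, here is a minimal sketch. It
assumes the total budget is split evenly across participants, which is
an assumption made for the example only. The function name and the
macro are hypothetical; this is not code from tuplesort.c or from the
patch.

```c
#include <assert.h>

/* Hypothetical floor in kilobytes; 64KB is the figure mentioned above. */
#define MIN_WORKER_SORT_MEM_KB 64

/*
 * Split a total sort memory budget (in KB) evenly across participants,
 * but never hand any participant less than the floor.
 * Illustrative only; the even split is an assumption of this sketch.
 */
int
worker_sort_mem_kb(int total_kb, int nparticipants)
{
    int share;

    assert(nparticipants > 0);
    share = total_kb / nparticipants;

    /* Clamp so that a tiny budget cannot starve a worker outright. */
    return (share < MIN_WORKER_SORT_MEM_KB) ? MIN_WORKER_SORT_MEM_KB : share;
}
```

With a 128KB budget and 8 participants, the even share would be 16KB,
so each participant gets the 64KB floor instead. That means the total
can exceed the nominal budget in this corner case, which seems
acceptable given how small the numbers are.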

[1] https://www.postgresql.org/message-id/CAM3SWZR+ATYAzyMT+hm-Bo=1L1smtJbNDtibwBTKtYqS0dYZVg@mail.gmail.com
-- 
Peter Geoghegan
