Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date
Msg-id CAH2-WznvQFDLRk7z8MAwFQVt6zBiLPStNGcGXgqh=HeNfeXZ+Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Rushabh Lathia <rushabh.lathia@gmail.com>)
Responses Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Rushabh Lathia <rushabh.lathia@gmail.com>)
List pgsql-hackers
On Thu, Oct 26, 2017 at 4:22 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
> Attaching the re based patch according to the v22 parallel-hash patch sets

I took a quick look at this today, and noticed a few issues:

* make_name() is used to name files in sharedtuplestore.c, which is
what is passed to BufFileOpenShared() for parallel hash join. Your
using your own logic for that within the equivalent logtape.c call to
BufFileOpenShared(), presumably because make_name() wants to identify
participants by PID rather than by an ordinal identifier number.

I think that we need some kind of central registry for things that use
shared buffiles. It could be that sharedtuplestore.c is further
generalized to support this, or it could be that they both call
something else that takes care of naming. It's not okay to have this
left to random chance.

You're going to have to ask Thomas about this. You should also use
MAXPGPATH for the char buffer on the stack.

* This logtape.c comment needs to be updated, as it's no longer true:
* successfully.  In general, workers can take it that the leader will* reclaim space in files under their ownership,
andso should not* reread from tape.
 

* Robert hated the comment changes in the header of nbtsort.c. You
might want to change it back, because he is likely to be the one that
commits this.

* You should look for similar comments in tuplesort.c (IIRC a couple
of places will need to be revised).

* tuplesort_begin_common() should actively reject a randomAccess
parallel case using elog(ERROR).

* tuplesort.h should note that randomAccess isn't supported, too.

* What's this all about?:

+ /* Accessor for the SharedBufFileSet that is at the end of Sharedsort. */
+ #define GetSharedBufFileSet(shared)                    \
+   ((BufFileSet *) (&(shared)->tapes[(shared)->nTapes]))

You can't just cast from one type to the other without regard for the
underling size of the shared memory buffer, which is what this looks
like to me. This only fails to crash because you're only abusing the
last member in the tapes array for this purpose, and there happens to
be enough shared memory slop that you get away with it. I'm pretty
sure that ltsUnify() ends up clobbering the last/leader tape, which is
a place where BufFileSet is also used, so this is just wrong. You
should rethink the shmem structure a little bit.

* There is still that FIXME comment within leader_takeover_tapes(). I
believe that you should still have a leader tape (at least in local
memory in the leader), even though you'll never be able to do anything
with it, since randomAccess is no longer supported. You can remove the
FIXME, and just note that you have a leader tape to be consistent with
the serial case, though recognize that it's not useful. Note that even
with randomAccess, we always had the leader tape, so it's not that
different, really.

I suppose it might make sense to make shared->tapes not have a leader
tape. It hardly matters -- perhaps you should leave it there in order
to keep the code simple, as you'll be keeping the leader tape in local
memory, too. (But it still won't fly to continue to clobber it, of
course -- you still need to find a dedicated place for BufFileSet in
shared memory.)

That's all I have right now.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
Next
From: Gilles Darold
Date:
Subject: Re: [HACKERS] proposal: schema variables