Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From Rushabh Lathia
Subject Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date
Msg-id CAGPqQf17vxDXFi=u9yf=aDF_0LJ+NGmRKJ0fUJYVUtzseQm6Zw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Thanks Peter and Thomas for the review comments.


On Wed, Nov 1, 2017 at 3:59 AM, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Oct 26, 2017 at 4:22 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
> Attaching the re based patch according to the v22 parallel-hash patch sets

I took a quick look at this today, and noticed a few issues:

* make_name() is used to name files in sharedtuplestore.c, which is
what is passed to BufFileOpenShared() for parallel hash join. Your
using your own logic for that within the equivalent logtape.c call to
BufFileOpenShared(), presumably because make_name() wants to identify
participants by PID rather than by an ordinal identifier number.

I think that we need some kind of central registry for things that use
shared buffiles. It could be that sharedtuplestore.c is further
generalized to support this, or it could be that they both call
something else that takes care of naming. It's not okay to have this
left to random chance.

You're going to have to ask Thomas about this. You should also use
MAXPGPATH for the char buffer on the stack.


Used MAXPGPATH for the char buffer.
 
* This logtape.c comment needs to be updated, as it's no longer true:
 
 * successfully.  In general, workers can take it that the leader will
 * reclaim space in files under their ownership, and so should not
 * reread from tape.


Done.
 
* Robert hated the comment changes in the header of nbtsort.c. You
might want to change it back, because he is likely to be the one that
commits this.
 
* You should look for similar comments in tuplesort.c (IIRC a couple
of places will need to be revised).


Pending.
 
* tuplesort_begin_common() should actively reject a randomAccess
parallel case using elog(ERROR).


Done.
 
* tuplesort.h should note that randomAccess isn't supported, too.


Done.
 
* What's this all about?:

+ /* Accessor for the SharedBufFileSet that is at the end of Sharedsort. */
+ #define GetSharedBufFileSet(shared)                    \
+   ((BufFileSet *) (&(shared)->tapes[(shared)->nTapes]))

You can't just cast from one type to the other without regard for the
underling size of the shared memory buffer, which is what this looks
like to me. This only fails to crash because you're only abusing the
last member in the tapes array for this purpose, and there happens to
be enough shared memory slop that you get away with it. I'm pretty
sure that ltsUnify() ends up clobbering the last/leader tape, which is
a place where BufFileSet is also used, so this is just wrong. You
should rethink the shmem structure a little bit.


Fixed this by adding a SharedFileSet directly into the Sharedsort struct.

Thanks Thomas Munro for the offline help here.

* There is still that FIXME comment within leader_takeover_tapes(). I
believe that you should still have a leader tape (at least in local
memory in the leader), even though you'll never be able to do anything
with it, since randomAccess is no longer supported. You can remove the
FIXME, and just note that you have a leader tape to be consistent with
the serial case, though recognize that it's not useful. Note that even
with randomAccess, we always had the leader tape, so it's not that
different, really.


Done.
 
I suppose it might make sense to make shared->tapes not have a leader
tape. It hardly matters -- perhaps you should leave it there in order
to keep the code simple, as you'll be keeping the leader tape in local
memory, too. (But it still won't fly to continue to clobber it, of
course -- you still need to find a dedicated place for BufFileSet in
shared memory.)


Attaching the latest patch (v13) here and I will continue working on the comment
improvement part for nbtsort.c and tuplesort.c.  Also will perform more testing
with the attached patch.

Patch is re-base of v25 patch set of Parallel hash.


Thanks,
Rushabh Lathia
Attachment

pgsql-hackers by date:

Previous
From: "Moon Insung"
Date:
Subject: RE: [HACKERS][PATCH]pg_buffercache add a buffer state column, Add fuction to decode buffer state
Next
From: David Rowley
Date:
Subject: Re: [HACKERS] Proposal: Local indexes for partitioned table