Thread: Re: pgsql: Support parallel btree index builds.

Re: pgsql: Support parallel btree index builds.

From: Alvaro Herrera
Date:
On 2018-Feb-02, Robert Haas wrote:

> Support parallel btree index builds.

While looking at a complaint related to progress report of parallel
index builds[1], I noticed this comment

+   /*
+    * Execute this worker's part of the sort.
+    *
+    * Unlike leader and serial cases, we cannot avoid calling
+    * tuplesort_performsort() for spool2 if it ends up containing no dead
+    * tuples (this is disallowed for workers by tuplesort).
+    */
+   tuplesort_performsort(btspool->sortstate);
+   if (btspool2)
+       tuplesort_performsort(btspool2->sortstate);

I've been trying to understand why this says "Unlike leader and serial
cases, ...".  I understand the "serial" part -- it refers to
_bt_leafbuild(), which I take it is supposed to work differently; but see
below.  Why does it mention the leader case, though?  As far as I can see,
the leader executes exactly the same code, so what is the comment
talking about?

Now, if you do look at _bt_leafbuild(), it can be seen that nothing is
done differently there either; we're not actually skipping any calls to
tuplesort_performsort().  Any differentiation between serial/leader/
worker cases seems to be done inside that routine.  So the comment is
not very useful there either.

I am wondering if these comments are leftovers from early development
versions of this patch.  Maybe we could remove them -- or rewrite them
to indicate not that we avoid calling tuplesort_performsort(), but
instead to say that that function behaves differently.

[1] https://postgr.es/m/CAEze2Wgm-NnZe3rOnwjYTVriS8xsVhzzVBCGj34h06cDNuaTig@mail.gmail.com

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"



Re: pgsql: Support parallel btree index builds.

From: Peter Geoghegan
Date:
On Mon, Jun 7, 2021 at 4:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Now, if you do look at _bt_leafbuild(), it can be seen that nothing is
> done differently there either; we're not actually skipping any calls to
> tuplesort_performsort().  Any differentiation between serial/leader/
> worker cases seems to be done inside that routine.  So the comment is
> not very useful there either.
>
> I am wondering if these comments are leftovers from early development
> versions of this patch.  Maybe we could remove them -- or rewrite them
> to indicate not that we avoid calling tuplesort_performsort(), but
> instead to say that that function behaves differently.

It's talking about something described in the tuplesort.h contract. It
applies to a tuplesort state, not a process -- the leader always has two
tuplesort states (the leader tuplesort state, plus its own worker
tuplesort state).

The leader tuplesort is very much like a serial tuplesort. In
particular, as step 8 in tuplesort.h points out, the leader doesn't
have to call tuplesort_performsort() for the leader tuplesort state if
it already knows that there is no input to sort.
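
To make the state-versus-process distinction concrete, here is a toy
model of the rule as described above. It is only a sketch: ToySortRole,
ToySortState and toy_may_skip_performsort() are hypothetical names made
up for illustration, not anything in tuplesort.c or nbtsort.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical role of a single sort state (not a PostgreSQL type). */
typedef enum
{
    TOY_SORT_SERIAL,    /* ordinary non-parallel sort */
    TOY_SORT_LEADER,    /* leader state that merges worker output */
    TOY_SORT_WORKER     /* per-participant worker state */
} ToySortRole;

/* Hypothetical stand-in for a sort state: its role plus an input count. */
typedef struct
{
    ToySortRole role;
    size_t      ninput;     /* tuples known to have been fed in */
} ToySortState;

/*
 * Models the rule discussed above: a serial or leader state with no
 * input may skip the "perform sort" step, but a worker state may not,
 * even when it is empty.
 */
static bool
toy_may_skip_performsort(const ToySortState *state)
{
    if (state->role == TOY_SORT_WORKER)
        return false;
    return state->ninput == 0;
}
```

In this model a participating leader process would own two such states,
one TOY_SORT_LEADER and one TOY_SORT_WORKER, and the answer differs per
state even though both belong to the same process.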

This matters less than it would if we had a caller of parallel
tuplesort in which the leader did not always simply participate as a
worker. There is actually a build-time testing option in nbtsort.c
that disables leader participation for parallel CREATE INDEX -- see
DISABLE_LEADER_PARTICIPATION.
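
As a rough illustration of how a build-time switch like that changes the
picture, here is a minimal sketch. Only the macro name comes from
nbtsort.c; toy_participant_count() and its parameter are hypothetical,
and the real code may be structured differently.

```c
#include <stdbool.h>

/*
 * Uncomment to model a test build in which the leader process does not
 * also act as a worker.
 */
/* #define DISABLE_LEADER_PARTICIPATION */

/*
 * Hypothetical helper: how many worker sort states take part, given the
 * number of worker processes actually launched.
 */
static int
toy_participant_count(int nworkers_launched)
{
    bool        leaderparticipates = true;

#ifdef DISABLE_LEADER_PARTICIPATION
    leaderparticipates = false;
#endif

    /* The leader contributes one extra worker state only if it participates. */
    return nworkers_launched + (leaderparticipates ? 1 : 0);
}
```

With the macro left undefined, the leader counts as one more
participant; with it defined, the leader would hold only the leader
sort state and no worker state of its own.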

You kind of have a point about this being something that made more
sense in revisions of the patch from before commit, though. There was
a question about the cost model and the role of the leader that was
ultimately resolved by inventing the current simple behavior. So
feel free to change the wording now.

--
Peter Geoghegan