Home > mailing lists

Re: Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From	Claudio Freire
Subject	Re: Parallel tuplesort (for parallel B-Tree index creation)
Date	September 6, 2016 19:57:36
Msg-id	CAGTBQpa6y=z3E4nDys6NyB9SBirQOZ63wfToG38r-TD752f_PQ@mail.gmail.com Whole thread
In response to	Re: Parallel tuplesort (for parallel B-Tree index creation) (Peter Geoghegan <pg@heroku.com>)
Responses	Re: Parallel tuplesort (for parallel B-Tree index creation)
List	pgsql-hackers

Tree view

On Mon, Aug 15, 2016 at 9:33 PM, Peter Geoghegan <pg@heroku.com> wrote:
> The patch is intended to be applied on top of parallel B-Tree patches
> 0001-* and 0002-* [1]. I happened to test it with parallelism, but
> these are all independently useful, and will be entered as a separate
> CF entry (perhaps better to commit the earlier two patches first, to
> avoid merge conflicts). I'm optimistic that we can get those 3 patches
> in the series out of the way early, without blocking on discussing
> parallel sort.

Applied patches 1 and 2, builds fine, regression tests run fine. It
was a prerequisite to reviewing patch 3 (which I'm going to do below),
so I thought I might as well report on that tidbit of info, fwiw.

> The patch makes tuplesort merging shift down and displace the root
> tuple with the tape's next preread tuple, rather than compacting and
> then inserting into the heap anew. This approach to maintaining the
> heap as tuples are returned to caller will always produce fewer
> comparisons overall. The new approach is also simpler. We were already
> shifting down to compact the heap within the misleadingly named [2]
> function tuplesort_heap_siftup() -- why not instead just use the
> caller tuple (the tuple that we currently go on to insert) when
> initially shifting down (not the heap's preexisting last tuple, which
> is guaranteed to go straight to the leaf level again)? That way, we
> don't need to enlarge the heap at all through insertion, shifting up,
> etc. We're done, and are *guaranteed* to have performed less work
> (fewer comparisons and swaps) than with the existing approach (this is
> the reason for my optimism about getting this stuff out of the way
> early).

Patch 3 applies fine to git master as of
25794e841e5b86a0f90fac7f7f851e5d950e51e2 (on top of patches 1 and 2).

Builds fine and without warnings on gcc 4.8.5 AFAICT, regression test
suite runs without issues as well.

Patch lacks any new tests, but the changed code paths seem covered
sufficiently by existing tests. A little bit of fuzzing on the patch
itself, like reverting some key changes, or flipping some key
comparisons, induces test failures as it should, mostly in cluster.

The logic in tuplesort_heap_root_displace seems sound, except:

+                */
+               memtuples[i] = memtuples[imin];
+               i = imin;
+       }
+
+       Assert(state->memtupcount > 1 || imin == 0);
+       memtuples[imin] = *newtup;
+}

Why that assert? Wouldn't it make more sense to Assert(imin < n) ?

In the meanwhile, I'll go and do some perf testing.

Assuming the speedup is realized during testing, LGTM.

pgsql-hackers by date:

From: Peter Geoghegan
Date: 06 September 2016, 19:52:11
Subject: Re: Bug in 9.6 tuplesort batch memory growth logic

From: Andrew Dunstan
Date: 06 September 2016, 20:08:41
Subject: Re: [PATCH] Alter or rename enum value

Re: Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

Previous

Next