Home > mailing lists

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date	January 17, 2018 19:47:24
Msg-id	CAA4eK1KcqrRyVEvn6PBsR=MeY1C8NUO0o3jmgOWhZEUOvRkaig@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
List	pgsql-hackers

Tree view

On Tue, Jan 16, 2018 at 6:24 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Fri, Jan 12, 2018 at 10:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> More comments:
>
> Attached patch has all open issues worked through, including those
> that I respond to or comment on below, as well as the other feedback
> from your previous e-mails. Note also that I fixed the issue that Amit
> raised,
>

I could still reproduce it. I think the way you have fixed it has a
race condition.  In _bt_parallel_scan_and_sort(), the value of
brokenhotchain is set after you signal the leader that the worker is
done (by incrementing workersFinished). Now, the leader is free to
decide based on the current shared state which can give the wrong
value.  Similarly, I think the value of havedead and reltuples can
also be wrong.

You neither seem to have fixed nor responded to the second problem
mentioned in my email upthread [1].  To reiterate, the problem is that
we can't assume that the workers we have launched will always start
and finish. It is possible that postmaster fails to start the worker
due to fork failure. In such conditions, tuplesort_leader_wait will
hang indefinitely because it will wait for the workersFinished count
to become equal to launched workers (+1, if leader participates) which
will never happen.  Am I missing something due to which this won't be
a problem?

Now, I think one argument is that such a problem can happen in a
parallel query, so it is not the responsibility of this patch to solve
it.  However, we already have a patch (there are some review comments
that needs to be addressed in the proposed patch) to solve it and this
patch is adding a new path in the code which has similar symptoms
which can't be fixed with the already proposed patch.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BizMyxzFD6k81Deyar35YJ5qdpbRTUp9cQvo%2BniQom7Q%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Fabien COELHO
Date: 17 January 2018, 19:10:10
Subject: Re: Setting BLCKSZ 4kB

From: Robert Haas
Date: 17 January 2018, 19:49:26
Subject: Re: [HACKERS] Proposal: Local indexes for partitioned table

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

Previous

Next