Re: Broken order-of-operations in parallel query latch manipulation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Broken order-of-operations in parallel query latch manipulation
Date
Msg-id CAA4eK1J8cP0ceOBZodcbBfkPz7zhM28-OwJH+pyeYHjsaznefQ@mail.gmail.com
In response to Broken order-of-operations in parallel query latch manipulation  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Broken order-of-operations in parallel query latch manipulation  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Aug 1, 2016 at 1:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Both shm_mq.c and nodeGather.c contain instances of this coding pattern:
>
>             WaitLatch(MyLatch, WL_LATCH_SET, 0);
>             CHECK_FOR_INTERRUPTS();
>             ResetLatch(MyLatch);
>
> I believe this is wrong and the CHECK_FOR_INTERRUPTS needs to be before
> or after the two latch operations.  As-is, if the reason somebody set
> our latch was to get us to notice that a CHECK_FOR_INTERRUPTS condition
> happened, there's a race condition where we'd fail to realize that.
>

I can see that in nodeGather.c, it might fail to notice a SetLatch
done by a worker process, or it might be woken up spuriously by a
SetLatch done for some unrelated reason.  However, I don't see what
problem that can cause, apart from one extra loop cycle in which it
tries to process a tuple when there is actually no tuple in the queue.
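
For reference, the interleaving under discussion can be sketched as a
small single-threaded simulation.  This is only an illustration under
stated assumptions: every sim_* name and the three scenario functions
are hypothetical stand-ins invented here, not PostgreSQL's latch or
interrupt code, and the "signal" is injected at a fixed point rather
than arriving asynchronously.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for latch and interrupt state; not PostgreSQL's API. */
static bool latch_is_set;
static bool interrupt_pending;

static void sim_reset_state(void)
{
    latch_is_set = false;
    interrupt_pending = false;
}

/* What a signal handler would do: flag the interrupt, then set the latch. */
static void sim_signal_arrives(void)
{
    interrupt_pending = true;
    latch_is_set = true;
}

static void sim_reset_latch(void)
{
    latch_is_set = false;
}

/* Service a pending interrupt, if there is one. */
static void sim_check_for_interrupts(void)
{
    interrupt_pending = false;
}

/* True if the next wait would sleep while an interrupt is still unserviced. */
static bool sim_wakeup_lost(void)
{
    return interrupt_pending && !latch_is_set;
}

/* Order from the quoted pattern: wait, check, reset. */
bool wait_check_reset_loses_wakeup(void)
{
    sim_reset_state();
    latch_is_set = true;         /* an earlier SetLatch ends the wait */
    sim_check_for_interrupts();  /* nothing pending yet */
    sim_signal_arrives();        /* signal lands between check and reset */
    sim_reset_latch();           /* clears the latch the handler just set */
    return sim_wakeup_lost();
}

/* Alternative order: wait, reset, check. */
bool wait_reset_check_loses_wakeup(void)
{
    sim_reset_state();
    latch_is_set = true;
    sim_reset_latch();
    sim_signal_arrives();        /* signal lands between reset and check */
    sim_check_for_interrupts();  /* the interrupt is serviced here */
    return sim_wakeup_lost();
}

/* Alternative order again, with the signal landing after the check. */
bool wait_reset_check_late_signal_loses_wakeup(void)
{
    sim_reset_state();
    latch_is_set = true;
    sim_reset_latch();
    sim_check_for_interrupts();
    sim_signal_arrives();        /* latch stays set, so the next wait returns at once */
    return sim_wakeup_lost();
}

/* Runs all three scenarios and checks the expected outcomes. */
void run_latch_order_checks(void)
{
    assert(wait_check_reset_loses_wakeup());
    assert(!wait_reset_check_loses_wakeup());
    assert(!wait_reset_check_late_signal_loses_wakeup());
}
```

Calling run_latch_order_checks() from a test program exercises all
three cases.  In this model only the wait/check/reset order ends with
an interrupt pending and the latch clear; whether that state matters
in nodeGather.c depends on what the surrounding loop does before it
waits again.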

> Other places such as ProcWaitForSignal() do it that way; only recently
> introduced (and unproven in the field) code has it like this.
>
> Anyone want to argue it's okay as-is?
>

I am not arguing against changing it to match what we do in other
places, but I would like to understand the problem you are seeing with
the current coding pattern in nodeGather.c.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


