Re: Windows buildfarm members vs. new async-notify isolation test - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Windows buildfarm members vs. new async-notify isolation test
Date
Msg-id CAA4eK1+eo0t20xkPX9-WGdO+tDwEv9krba-R2F3OSqyLAWW59w@mail.gmail.com
Whole thread Raw
In response to Re: Windows buildfarm members vs. new async-notify isolation test  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Windows buildfarm members vs. new async-notify isolation test  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Dec 8, 2019 at 10:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Dec 8, 2019 at 1:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > So, just idly looking at the code in src/backend/port/win32/signal.c
> > and src/port/kill.c, I have to wonder why we have this baroque-looking
> > design of using *two* signal management threads.  And, if I'm
> > reading it right, we create an entire new pipe object and an entire
> > new instance of the second thread for each incoming signal.  Plus, the
> > signal senders use CallNamedPipe (hence, underneath, TransactNamedPipe)
> > which means they in effect wait for the recipient's signal-handling
> > thread to ack receipt of the signal.  Maybe there's a good reason for
> > all this but it sure seems like a lot of wasted cycles from here.
> >
> > I have to wonder why we don't have a single named pipe that lasts as
> > long as the recipient process does, and a signal sender just writes
> > one byte to it, and considers the signal delivered if it is able to
> > do that.  The "message" semantics seem like overkill for that.
> >
> > I dug around in the contemporaneous archives and could only find
> > https://www.postgresql.org/message-id/303E00EBDD07B943924382E153890E5434AA47%40cuthbert.rcsinc.local
> > which describes the existing approach but fails to explain why we
> > should do it like that.
> >
> > This might or might not have much to do with the immediate problem,
> > but I can't help wondering if there's some race-condition-ish behavior
> > in there that's contributing to what we're seeing.
> >
>
> On the receiving side, the work we do after the 'notify' is finished
> (or before CallNamedPipe gets control back) is as follows:
>
> pg_signal_dispatch_thread()
> {
> ..
> FlushFileBuffers(pipe);
> DisconnectNamedPipe(pipe);
> CloseHandle(pipe);
>
> pg_queue_signal(sigNum);
> }
>
> It seems most of these are the system calls which makes me think that
> they might be slow enough on some Windows version that it could lead
> to such race condition.
>

IIUC, once the dispatch thread has queued the signal
(pg_queue_signal), the next CHECK_FOR_INTERRUPTS by the main thread
will execute the signal.  So, if we move pg_queue_signal() before we
do WriteFile in pg_signal_dispatch_thread(), this race condition will
be closed.  Now, we might not want to do this as that will add some
more time (even though very less) before notify on the other side can
finish or maybe there is some technical problem with this idea which I
am not able to see immediately.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Hadi Moshayedi
Date:
Subject: Fix a comment in CreateLockFile
Next
From: Amit Kapila
Date:
Subject: Re: Fix a comment in CreateLockFile