Re: kqueue - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: kqueue
Date
Msg-id CAEepm=3B5P_-aSAGUUHqtN06UezKW-zXxD+jF3KoGPmg_O6wPQ@mail.gmail.com
Whole thread Raw
In response to Re: kqueue  (Marko Tiikkaja <marko@joh.to>)
Responses Re: kqueue  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Wed, Sep 7, 2016 at 12:32 AM, Marko Tiikkaja <marko@joh.to> wrote:
> I've tested and reviewed this, and it looks good to me, other than this
> part:
>
> +   /*
> +    * kevent guarantees that the change list has been processed in the
> EINTR
> +    * case.  Here we are only applying a change list so EINTR counts as
> +    * success.
> +    */
>
> this doesn't seem to be guaranteed on old versions of FreeBSD or any other
> BSD flavors, so I don't think it's a good idea to bake the assumption into
> this code.  Or what do you think?

Thanks for the testing and review!

Hmm.  Well spotted.  I wrote that because the man page from FreeBSD 10.3 says:

  When kevent() call fails with EINTR error, all changes in the changelist
  have been applied.

This sentence is indeed missing from the OpenBSD, NetBSD and OSX man
pages.  It was introduced by FreeBSD commit r280818[1] which made
kevent a Pthread cancellation point.  I investigated whether it is
also true in older FreeBSD and the rest of the BSD family.  I believe
the answer is yes.

1.  That commit doesn't do anything that would change the situation:
it just adds thread cancellation wrapper code to libc and libthr which
exits under certain conditions but otherwise lets EINTR through to the
caller.  So I think the new sentence is documentation of the existing
behaviour of the syscall.

2.  I looked at the code in FreeBSD 4.1[2] (the original kqueue
implementation from which all others derive) and the four modern
OSes[3][4][5][6].  They vary a bit but in all cases, the first place
that can produce EINTR appears to be in kqueue_scan when the
(variously named) kernel sleep routine is invoked, which can return
EINTR or ERESTART  (later translated to EINTR because kevent doesn't
support restarting).  That comes after all changes have been applied.
In fact it's unreachable if nevents is 0: OSX doesn't call kqueue_scan
in that case, and the others return early from kqueue_scan in that
case.

3.  An old email[7] from Jonathan Lemon (creator of kqueue) seems to
support that at least in respect of ancient FreeBSD.  He wrote:
"Technically, an EINTR is returned when a signal interrupts the
process after it goes to sleep (that is, after it calls tsleep).  So
if (as an example) you call kevent() with a zero valued timespec,
you'll never get EINTR, since there's no possibility of it sleeping."

So if I've understood correctly, what I wrote in the v4 patch is
universally true, but it's also moot in this case: kevent cannot fail
with errno == EINTR because nevents == 0.  On that basis, here is a
new version with the comment and special case for EINTR removed.

[1] https://svnweb.freebsd.org/base?view=revision&revision=280818
[2] https://github.com/freebsd/freebsd/blob/release/4.1.0/sys/kern/kern_event.c
[3] https://github.com/freebsd/freebsd/blob/master/sys/kern/kern_event.c
[4] https://github.com/IIJ-NetBSD/netbsd-src/blob/master/sys/kern/kern_event.c
[5] https://github.com/openbsd/src/blob/master/sys/kern/kern_event.c
[6] https://github.com/opensource-apple/xnu/blob/master/bsd/kern/kern_event.c
[7] http://marc.info/?l=freebsd-arch&m=98147346707952&w=2

--
Thomas Munro
http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Claudio Freire
Date:
Subject: Re: Tuplesort merge pre-reading
Next
From: Peter Geoghegan
Date:
Subject: cost_sort() may need to be updated