Thread: Excessive PostmasterIsAlive calls slow down WAL redo

Excessive PostmasterIsAlive calls slow down WAL redo

From

Heikki Linnakangas

Date:

05 April 2018, 10:23:43

I started looking at the "Improve compactify_tuples and 
PageRepairFragmentation" patch, and set up a little performance test of 
WAL replay. I ran pgbench, scale 5, to generate about 1 GB of WAL, and 
timed how long it takes to replay that WAL. To focus purely on CPU 
overhead, I kept the data directory in /dev/shm/.

Profiling that, without any patches applied, I noticed that a lot of 
time was spent in read()s on the postmaster-death pipe, i.e. in 
PostmasterIsAlive(). We call that between *every* WAL record.

As a quick test to see how much that matters, I commented out the 
PostmasterIsAlive() call from HandleStartupProcInterrupts(). On 
unpatched master, replaying that 1 GB of WAL takes about 20 seconds on 
my laptop. Without the PostmasterIsAlive() call, 17 seconds.

That seems like an utter waste of time. I'm almost inclined to call that 
a performance bug. As a straightforward fix, I'd suggest that we call 
HandleStartupProcInterrupts() in the WAL redo loop, not on every record, 
but only e.g. every 32 records. That would make the main redo loop less 
responsive to shutdown, SIGHUP, or postmaster death, but that seems OK. 
There are also calls to HandleStartupProcInterrupts() in the various 
other loops, that wait for new WAL to arrive or recovery delay, so this 
would only affect the case where we're actively replaying records.

- Heikki

Attachment

0001-Call-HandleStartupProcInterrupts-less-frequently-in-.patch

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Alvaro Herrera

Date:

05 April 2018, 16:42:37

Heikki Linnakangas wrote:

> That seems like an utter waste of time. I'm almost inclined to call that a
> performance bug. As a straightforward fix, I'd suggest that we call
> HandleStartupProcInterrupts() in the WAL redo loop, not on every record, but
> only e.g. every 32 records. That would make the main redo loop less
> responsive to shutdown, SIGHUP, or postmaster death, but that seems OK.
> There are also calls to HandleStartupProcInterrupts() in the various other
> loops, that wait for new WAL to arrive or recovery delay, so this would only
> affect the case where we're actively replaying records.

I think calling PostmasterIsAlive only every 32 records is sensible, but
I'm not so sure about the other two things happening in
HandleStartupProcInterrupts (which are much cheaper anyway, since they
only read a local flag); if records are large or otherwise slow to
replay, I'd rather not wait for 32 of them before the process honoring
my shutdown request.  Why not split the function in two, or maybe add a
flag "check for postmaster alive too", passed as true every 32 records?

The lesson seems to be that PostmasterIsAlive is moderately expensive
(one syscall).  It's also used in WaitLatchOrSocket when
WL_POSTMASTER_DEATH is given.  I didn't audit its callers terribly
carefully but I think these uses are not as performance-sensitive.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Simon Riggs

Date:

05 April 2018, 16:50:24

On 5 April 2018 at 08:23, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> That seems like an utter waste of time. I'm almost inclined to call that a
> performance bug. As a straightforward fix, I'd suggest that we call
> HandleStartupProcInterrupts() in the WAL redo loop, not on every record, but
> only e.g. every 32 records. That would make the main redo loop less
> responsive to shutdown, SIGHUP, or postmaster death, but that seems OK.
> There are also calls to HandleStartupProcInterrupts() in the various other
> loops, that wait for new WAL to arrive or recovery delay, so this would only
> affect the case where we're actively replaying records.

+1

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Andres Freund

Date:

05 April 2018, 21:27:58

Hi,

On 2018-04-05 10:23:43 +0300, Heikki Linnakangas wrote:
> Profiling that, without any patches applied, I noticed that a lot of time
> was spent in read()s on the postmaster-death pipe, i.e. in
> PostmasterIsAlive(). We call that between *every* WAL record.

> That seems like an utter waste of time. I'm almost inclined to call that a
> performance bug. As a straightforward fix, I'd suggest that we call
> HandleStartupProcInterrupts() in the WAL redo loop, not on every record, but
> only e.g. every 32 records.

I agree this is a performance problem. I do however not like the fix.
ISTM the better approach would be to try to reduce the cost of
PostmasterIsAlive() on common platforms - it should be nearly free if
done right.

One way to achieve that would e.g. to stop ignoring SIGPIPE and instead
check for postmaster death inside the handler, without reacting to
it. Then the the actual PostmasterIsAlive() checks are just a check of a
single sig_atomic_t.

Greetings,

Andres Freund

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Tom Lane

Date:

05 April 2018, 21:39:27

Andres Freund <andres@anarazel.de> writes:
> ISTM the better approach would be to try to reduce the cost of
> PostmasterIsAlive() on common platforms - it should be nearly free if
> done right.

+1 if it's doable.

> One way to achieve that would e.g. to stop ignoring SIGPIPE and instead
> check for postmaster death inside the handler, without reacting to
> it. Then the the actual PostmasterIsAlive() checks are just a check of a
> single sig_atomic_t.

AFAIR, we do not get SIGPIPE on the postmaster pipe, because nobody
ever writes to it.  So this sketch seems off to me, even assuming that
not-ignoring SIGPIPE causes no problems elsewhere.

While it's not POSIX, at least some platforms are capable of delivering
a separate signal on parent process death.  Perhaps using that where
available would be enough of an answer.

            regards, tom lane

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Andres Freund

Date:

05 April 2018, 22:20:38

On 2018-04-05 14:39:27 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > ISTM the better approach would be to try to reduce the cost of
> > PostmasterIsAlive() on common platforms - it should be nearly free if
> > done right.
> 
> +1 if it's doable.
> 
> > One way to achieve that would e.g. to stop ignoring SIGPIPE and instead
> > check for postmaster death inside the handler, without reacting to
> > it. Then the the actual PostmasterIsAlive() checks are just a check of a
> > single sig_atomic_t.
> 
> AFAIR, we do not get SIGPIPE on the postmaster pipe, because nobody
> ever writes to it.  So this sketch seems off to me, even assuming that
> not-ignoring SIGPIPE causes no problems elsewhere.

Yea, you're probably right. I'm mostly brainstorming here.

(FWIW, I don't think not ignoring SIGPIPE would be a huge issue if we
don't immediately take any action on its account)


> While it's not POSIX, at least some platforms are capable of delivering
> a separate signal on parent process death.  Perhaps using that where
> available would be enough of an answer.

Yea, that'd work on linux. Which is probably the platform 80-95% of
performance critical PG workloads run on.  There's
JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE on windows, which might also work,
but I'm not sure it provides enough opportunity for cleanup.

Greetings,

Andres Freund

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Michael Paquier

Date:

06 April 2018, 08:08:36

On Thu, Apr 05, 2018 at 02:39:27PM -0400, Tom Lane wrote:
> While it's not POSIX, at least some platforms are capable of delivering
> a separate signal on parent process death.  Perhaps using that where
> available would be enough of an answer.

Are you referring to prctl here?

+1 on improving performance of PostmasterIsAlive() if possible instead
of checking for it every N records.  You barely see records creating a
database from a large template close to each other but that could hurt
the response time of the redo process.
--
Michael

Attachment

signature.asc

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Stephen Frost

Date:

06 April 2018, 14:39:28

Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2018-04-05 14:39:27 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > ISTM the better approach would be to try to reduce the cost of
> > > PostmasterIsAlive() on common platforms - it should be nearly free if
> > > done right.
> >
> > +1 if it's doable.
[...]
> > While it's not POSIX, at least some platforms are capable of delivering
> > a separate signal on parent process death.  Perhaps using that where
> > available would be enough of an answer.
>
> Yea, that'd work on linux. Which is probably the platform 80-95% of
> performance critical PG workloads run on.  There's
> JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE on windows, which might also work,
> but I'm not sure it provides enough opportunity for cleanup.

While I tend to agree that it'd be nice to just make it cheaper, that
doesn't seem like something that we'd be likely to back-patch and I tend
to share Heikki's feelings that this is a performance regression we
should be considering fixing in released versions.

What Alvaro posted up-thread seems like it might be a small enough
change to still be reasonable to back-patch and we can still think about
ways to make PostmasterIsAlive() cheaper in the future.

Thanks!

Stephen

Attachment

signature.asc

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Andres Freund

Date:

06 April 2018, 19:39:29

Hi,

On 2018-04-06 07:39:28 -0400, Stephen Frost wrote:
> While I tend to agree that it'd be nice to just make it cheaper, that
> doesn't seem like something that we'd be likely to back-patch and I tend
> to share Heikki's feelings that this is a performance regression we
> should be considering fixing in released versions.

I'm doubtful about fairly characterizing this as a performance bug. It's
not like we've O(n^2) behaviour on our hand, and if your replay isn't of
a toy workload normally that one syscall isn't going to make a huge
difference because you've actual IO and such going on.

I'm also doubtful that it's sane to just check every 32 records. There's
records that can take a good chunk of time, and just continuing for
another 31 records seems like a bad idea.

Greetings,

Andres Freund

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Heikki Linnakangas

Date:

06 April 2018, 20:09:30

On 06/04/18 19:39, Andres Freund wrote:
> On 2018-04-06 07:39:28 -0400, Stephen Frost wrote:
>> While I tend to agree that it'd be nice to just make it cheaper, that
>> doesn't seem like something that we'd be likely to back-patch and I tend
>> to share Heikki's feelings that this is a performance regression we
>> should be considering fixing in released versions.

To be clear, this isn't a performance *regression*. It's always been bad.

I'm not sure if I'd backpatch this. Maybe after it's been in 'master' 
for a while and we've gotten some field testing of it.

> I'm doubtful about fairly characterizing this as a performance bug. It's
> not like we've O(n^2) behaviour on our hand, and if your replay isn't of
> a toy workload normally that one syscall isn't going to make a huge
> difference because you've actual IO and such going on.

If all the data fits in the buffer cache, then there would be no I/O. 
Think of a smallish database that's heavily updated. There are a lot of 
real applications like that.

> I'm also doubtful that it's sane to just check every 32 records. There's
> records that can take a good chunk of time, and just continuing for
> another 31 records seems like a bad idea.

It's pretty arbitrary, I admit. It's the best I could come with, though. 
If we could get a signal on postmaster death, that'd be best, but that's 
a much bigger patch, and I'm worried that it would bring new portability 
and reliability issues.

I'm not too worried about 32 records being too long an interval. True, 
replaying 32 CREATE DATABASE records would take a long time. But pretty 
much all other WAL records are fast enough to apply. We could make it 
every 8 records rather than 32, if that makes you feel better. Or add 
some extra conditions, like always check it when stepping to a new WAL 
segment. In any case, the fundamental difference would be though to not 
check it between every record.

- Heikki

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Stephen Frost

Date:

06 April 2018, 22:09:52

Greetings,

* Heikki Linnakangas (hlinnaka@iki.fi) wrote:
> On 06/04/18 19:39, Andres Freund wrote:
> >On 2018-04-06 07:39:28 -0400, Stephen Frost wrote:
> >>While I tend to agree that it'd be nice to just make it cheaper, that
> >>doesn't seem like something that we'd be likely to back-patch and I tend
> >>to share Heikki's feelings that this is a performance regression we
> >>should be considering fixing in released versions.
>
> To be clear, this isn't a performance *regression*. It's always been bad.

Oh, I see, apologies for the confusion, my initial read was that this
was due to some patch that had gone in previously, hence it was an
actual regression.  I suppose I tend to view performance issues as
either "regression" or "opportunity for improvement" and when you said
"bug" it made me think it was a regression. :)

> I'm not sure if I'd backpatch this. Maybe after it's been in 'master' for a
> while and we've gotten some field testing of it.

If it's not a regression then there's I definitely think the bar is much
higher to consider this something to back-patch.  I wouldn't typically
argue for back-patching a performance improvement unless it's to address
a specific regression.

Thanks!

Stephen

Attachment

signature.asc

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Andres Freund

Date:

10 April 2018, 03:53:30

Hi,

On 2018-04-05 12:20:38 -0700, Andres Freund wrote:
> > While it's not POSIX, at least some platforms are capable of delivering
> > a separate signal on parent process death.  Perhaps using that where
> > available would be enough of an answer.
> 
> Yea, that'd work on linux. Which is probably the platform 80-95% of
> performance critical PG workloads run on.  There's
> JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE on windows, which might also work,
> but I'm not sure it provides enough opportunity for cleanup.

I coincidentally got pinged about our current approach causing
performance problems on FreeBSD and started writing a patch.  The
problem there appears to be that constantly attaching events to the read
pipe end, from multiple processes, causes significant contention inside
the kernel. Which isn't that surprising.   That's distinct from the
problem netbsd/openbsd reported a while back (superflous wakeups).

That person said he'd work on adding an equivalent of linux'
prctl(PR_SET_PDEATHSIG) to FreeBSD.

It's not particularly hard to whip something up that kind of
works. Getting some of the details right isn't entirely as clear
however:

It's trivial to make PostmasterIsAlive() cheap. In my prototype I've
setup PR_SET_PDEATHSIG to deliver SIGINFO to postmaster children. That
sets parent_potentially_died to true. PostmasterIsAlive() only runs the
main body when it's true, making it very cheap in the common case.

What I'm however not quite as clear on is how to best change
WL_POSTMASTER_DEATH.

One appraoch approach I can come up with is to use the self-pipe for
both WL_POSTMASTER_DEATH and WL_LATCH_SET. As we can cheaply check if
either set->latch->is_set or parent_potentially_died, re-using the
selfpipe works well enough.  What I'm however not quite sure about is
how to best do so with epoll() - there we explicitly do not iterate over
all registered events, but use epoll_event->data to get just the wait
event associated with a readyness event.  Therefor we've to keep track
of whether a WaitEventSet has both WL_POSTMASTER_DEATH and WL_LATCH_SET
registered, and check in both WL_POSTMASTER_DEATH/WL_LATCH_SET whether
the event is being waited for.

Another approach, that's simpler to implement, is to simply have a
second selfpipe, just for WL_POSTMASTER_DEATH.

A third question is whether we want to keep postmaster_alive_fds if we
have PR_SET_PDEATHSIG. I'd want to continue re-checking that the parent
is actually dead, but we could also do so by kill()ing postmaster like
we used to do before 89fd72cbf26f5d2e3d86ab19c1ead73ab8fac0fe.  It's not
bad to get rid of unnecessary filedescriptors. Would imply a semantic
change however.

Greetings,

Andres Freund

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Alvaro Herrera

Date:

10 April 2018, 04:31:07

Andres Freund wrote:

> Another approach, that's simpler to implement, is to simply have a
> second selfpipe, just for WL_POSTMASTER_DEATH.

Would it work to use this second pipe, to which each child writes a byte
that postmaster never reads, and then rely on SIGPIPE when postmaster
dies?  Then we never need to do a syscall.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Andres Freund

Date:

10 April 2018, 04:34:31


On April 9, 2018 6:31:07 PM PDT, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>Andres Freund wrote:
>
>> Another approach, that's simpler to implement, is to simply have a
>> second selfpipe, just for WL_POSTMASTER_DEATH.
>
>Would it work to use this second pipe, to which each child writes a
>byte
>that postmaster never reads, and then rely on SIGPIPE when postmaster
>dies?  Then we never need to do a syscall.

I'm not following, could you expand on what you're suggesting?  Note that you do not get SIGPIPE for already buffered
writes. Which syscall can we avoid? 

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Thomas Munro

Date:

10 April 2018, 04:36:19

On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund <andres@anarazel.de> wrote:
> I coincidentally got pinged about our current approach causing
> performance problems on FreeBSD and started writing a patch.  The
> problem there appears to be that constantly attaching events to the read
> pipe end, from multiple processes, causes significant contention inside
> the kernel. Which isn't that surprising.   That's distinct from the
> problem netbsd/openbsd reported a while back (superflous wakeups).
>
> That person said he'd work on adding an equivalent of linux'
> prctl(PR_SET_PDEATHSIG) to FreeBSD.

Just an idea, not tested: what about a reusable WaitEventSet with zero
timeout?  Using the kqueue patch, that'd call kevent() which'd return
immediately and tell you if any postmaster death notifications had
arrive on your queue since last time you asked.  It doesn't even touch
the pipe, or any other kernel objects apart from your own queue IIUC.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Andres Freund

Date:

10 April 2018, 04:40:37


On April 9, 2018 6:36:19 PM PDT, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund <andres@anarazel.de>
>wrote:
>> I coincidentally got pinged about our current approach causing
>> performance problems on FreeBSD and started writing a patch.  The
>> problem there appears to be that constantly attaching events to the
>read
>> pipe end, from multiple processes, causes significant contention
>inside
>> the kernel. Which isn't that surprising.   That's distinct from the
>> problem netbsd/openbsd reported a while back (superflous wakeups).
>>
>> That person said he'd work on adding an equivalent of linux'
>> prctl(PR_SET_PDEATHSIG) to FreeBSD.
>
>Just an idea, not tested: what about a reusable WaitEventSet with zero
>timeout?  Using the kqueue patch, that'd call kevent() which'd return
>immediately and tell you if any postmaster death notifications had
>arrive on your queue since last time you asked.  It doesn't even touch
>the pipe, or any other kernel objects apart from your own queue IIUC.

We still create a lot of WES adhoc in a number of places. I don't think this would solve that?  The problem isn't that
IsAliveis expensive, it's that polling is expensive. It's possible that using kqueue would fix that, because the
highestfrequency cases use a persistent WES. 

Andres

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Alvaro Herrera

Date:

10 April 2018, 04:57:23

Andres Freund wrote:
> 
> On April 9, 2018 6:31:07 PM PDT, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> >Would it work to use this second pipe, to which each child writes a
> >byte that postmaster never reads, and then rely on SIGPIPE when
> >postmaster dies?  Then we never need to do a syscall.
> 
> I'm not following, could you expand on what you're suggesting?  Note
> that you do not get SIGPIPE for already buffered writes.  Which
> syscall can we avoid?

Ah.  I was thinking we'd get SIGPIPE from the byte sent at the start, as
soon as the kernel saw that postmaster abandoned the fd by dying.
Scratch that then.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Andres Freund

Date:

10 April 2018, 04:59:04


On April 9, 2018 6:57:23 PM PDT, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>Andres Freund wrote:
>>
>> On April 9, 2018 6:31:07 PM PDT, Alvaro Herrera
><alvherre@alvh.no-ip.org> wrote:
>
>> >Would it work to use this second pipe, to which each child writes a
>> >byte that postmaster never reads, and then rely on SIGPIPE when
>> >postmaster dies?  Then we never need to do a syscall.
>>
>> I'm not following, could you expand on what you're suggesting?  Note
>> that you do not get SIGPIPE for already buffered writes.  Which
>> syscall can we avoid?
>
>Ah.  I was thinking we'd get SIGPIPE from the byte sent at the start,
>as
>soon as the kernel saw that postmaster abandoned the fd by dying.
>Scratch that then.

Had the same idea, but unfortunately reality, in the form of a test program, cured me of my hope ;)
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Heikki Linnakangas

Date:

11 April 2018, 13:22:28

On 10/04/18 04:36, Thomas Munro wrote:
> On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund <andres@anarazel.de> wrote:
>> I coincidentally got pinged about our current approach causing
>> performance problems on FreeBSD and started writing a patch.  The
>> problem there appears to be that constantly attaching events to the read
>> pipe end, from multiple processes, causes significant contention inside
>> the kernel. Which isn't that surprising.   That's distinct from the
>> problem netbsd/openbsd reported a while back (superflous wakeups).
>>
>> That person said he'd work on adding an equivalent of linux'
>> prctl(PR_SET_PDEATHSIG) to FreeBSD.
> 
> Just an idea, not tested: what about a reusable WaitEventSet with zero
> timeout?  Using the kqueue patch, that'd call kevent() which'd return
> immediately and tell you if any postmaster death notifications had
> arrive on your queue since last time you asked.  It doesn't even touch
> the pipe, or any other kernel objects apart from your own queue IIUC.

Hmm. In PostmasterIsAlive(), you'd still need to call kevent() to check 
if postmaster has died. It would just replace the current read() syscall 
on the pipe with the kevent() syscall. Is it faster?

- Heikki

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Thomas Munro

Date:

11 April 2018, 13:55:58

On Wed, Apr 11, 2018 at 10:22 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 10/04/18 04:36, Thomas Munro wrote:
>> Just an idea, not tested: what about a reusable WaitEventSet with zero
>> timeout?  Using the kqueue patch, that'd call kevent() which'd return
>> immediately and tell you if any postmaster death notifications had
>> arrive on your queue since last time you asked.  It doesn't even touch
>> the pipe, or any other kernel objects apart from your own queue IIUC.
>
> Hmm. In PostmasterIsAlive(), you'd still need to call kevent() to check if
> postmaster has died. It would just replace the current read() syscall on the
> pipe with the kevent() syscall. Is it faster?

It should be (based on the report of read() being slow here because of
contention on the pipe itself, I guess because of frequent poll() in
WaitLatch() elsewhere?).

But as I said over on another thread[1] (sorry, it got tangled up with
that other conversation about a related topic), maybe testing
getppid() would be simpler and about as fast as possible given you
have to make a syscall (all processes should normally be children of
postmaster, right?).  And only check every nth time through the loop,
as you said, to avoid high frequency syscalls.  I think I might have
been guilty of having a solution looking for a problem, there ;-)

[1] https://www.postgresql.org/message-id/CAEepm%3D298omvRS2C8WO%3DCxp%2BWcM-Vn8V3x4_QhxURhKTRCSfYg%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Thomas Munro

Date:

18 April 2018, 08:04:17

On Wed, Apr 11, 2018 at 10:22 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund <andres@anarazel.de>
>> wrote:
>>> That person said he'd work on adding an equivalent of linux'
>>> prctl(PR_SET_PDEATHSIG) to FreeBSD.

Here is an implementation of Andres's idea for Linux, and also for
patched FreeBSD (for later if/when that lands).  Do you think this
makes sense Heikki?  I am planning to add this to the next CF.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Thomas Munro

Date:

19 April 2018, 06:05:50

On Wed, Apr 18, 2018 at 5:04 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Apr 11, 2018 at 10:22 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund <andres@anarazel.de>
>>> wrote:
>>>> That person said he'd work on adding an equivalent of linux'
>>>> prctl(PR_SET_PDEATHSIG) to FreeBSD.
>
> Here is an implementation of Andres's idea for Linux, and also for
> patched FreeBSD (for later if/when that lands).  Do you think this
> makes sense Heikki?  I am planning to add this to the next CF.

Here's a new version with a stupid bug fixed (I accidentally posted a
testing version that returned false instead of true, as cfbot quickly
pointed out -- d'oh).

By the way, these patches only use the death signal to make
PostmasterIsAlive() fast, for use by busy loops like recovery.  The
postmaster pipe is still used for IO/timeout loops to detect
postmaster death.  In theory you could get rid of the postmaster pipe
completely when USE_POSTMASTER_DEATH_SIGNAL is defined and make it
like the latch code, using the same self-pipe.  I'm not sure if there
is anything to be gained by that (that wasn't already gained by using
epoll/kqueue) so I'm not proposing it.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Andres Freund

Date:

19 April 2018, 09:20:33


On April 18, 2018 8:05:50 PM PDT, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>On Wed, Apr 18, 2018 at 5:04 PM, Thomas Munro
><thomas.munro@enterprisedb.com> wrote:
>> On Wed, Apr 11, 2018 at 10:22 PM, Heikki Linnakangas
><hlinnaka@iki.fi> wrote:
>>>> On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund
><andres@anarazel.de>
>>>> wrote:
>>>>> That person said he'd work on adding an equivalent of linux'
>>>>> prctl(PR_SET_PDEATHSIG) to FreeBSD.
>>
>> Here is an implementation of Andres's idea for Linux, and also for
>> patched FreeBSD (for later if/when that lands).  Do you think this
>> makes sense Heikki?  I am planning to add this to the next CF.
>
>Here's a new version with a stupid bug fixed (I accidentally posted a
>testing version that returned false instead of true, as cfbot quickly
>pointed out -- d'oh).
>
>By the way, these patches only use the death signal to make
>PostmasterIsAlive() fast, for use by busy loops like recovery.  The
>postmaster pipe is still used for IO/timeout loops to detect
>postmaster death.  In theory you could get rid of the postmaster pipe
>completely when USE_POSTMASTER_DEATH_SIGNAL is defined and make it
>like the latch code, using the same self-pipe.  I'm not sure if there
>is anything to be gained by that (that wasn't already gained by using
>epoll/kqueue) so I'm not proposing it.

In my local prototype patch I'd done so. And I think it makes sense, because it's s somewhat contended object, even
whenusing epoll/kqueue. On the other hand, it makes the chide changed a bit harder, making it pretty was were I
suspendedthe work for a bit... 
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Thomas Munro

Date:

19 April 2018, 10:34:46

On Thu, Apr 19, 2018 at 6:20 PM, Andres Freund <andres@anarazel.de> wrote:
> On April 18, 2018 8:05:50 PM PDT, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>>By the way, these patches only use the death signal to make
>>PostmasterIsAlive() fast, for use by busy loops like recovery.  The
>>postmaster pipe is still used for IO/timeout loops to detect
>>postmaster death.  In theory you could get rid of the postmaster pipe
>>completely when USE_POSTMASTER_DEATH_SIGNAL is defined and make it
>>like the latch code, using the same self-pipe.  I'm not sure if there
>>is anything to be gained by that (that wasn't already gained by using
>>epoll/kqueue) so I'm not proposing it.
>
> In my local prototype patch I'd done so. And I think it makes sense, because it's s somewhat contended object, even
whenusing epoll/kqueue. On the other hand, it makes the chide changed a bit harder, making it pretty was were I
suspendedthe work for a bit... 

Hmm.  I thought the whole idea of these interfaces was "don't call us,
we'll call you" inside the kernel, so you can add thousands of pipes
if you like, but epoll/kevent will only check the queue and then wait
for notification, rather than interacting with the referenced objects?

If that is true, and given that we need to maintain the pipe-based
code anyway for platforms that need it, then what would we gain by
adding a different code path just because we can?  Obviously
WaitLatch() (the thing that creates, adds pipe, waits, then destroys
every time) could save time by not having to deal with a postmaster
pipe, but the solution to that is another patch that switches more
things to reusable WES objects.

--
Thomas Munro
http://www.enterprisedb.com

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Thomas Munro

Date:

21 April 2018, 03:25:27

Here's a new version, because FreeBSD's new interface changed slightly.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Michael Paquier

Date:

24 April 2018, 10:37:40

On Sat, Apr 21, 2018 at 12:25:27PM +1200, Thomas Munro wrote:
> Here's a new version, because FreeBSD's new interface changed slightly.

I have been looking at the proposed set for Linux, and the numbers are
here.  By replaying 1GB worth of WAL after a pgbench run with the data
folder on a tmpfs the recovery time goes from 33s to 28s, so that's a
nice gain.

Do you have numbers with FreeBSD?  I get that this would be more
difficult to set up without a GA release perhaps...

I can also see the difference in profiles by looking for
HandleStartupProcInterrupts which gets close 10% of the attention when
unpatched, and down to 0.1% when patched.

@@ -2484,6 +2484,8 @@ ClosePostmasterPorts(bool am_syslogger)
     if (bonjour_sdref)
         close(DNSServiceRefSockFD(bonjour_sdref));
 #endif
+
+    PostmasterDeathInit();

Thomas, trying to understand here...  Why this place for the signal
initialization?  Wouldn't InitPostmasterChild() be a more logical place
as we'd want to have this logic caught by all other processes?
--
Michael

Attachment

signature.asc

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Thomas Munro

Date:

25 April 2018, 09:23:26

On Tue, Apr 24, 2018 at 7:37 PM, Michael Paquier <michael@paquier.xyz> wrote:
> I have been looking at the proposed set for Linux, and the numbers are
> here.  By replaying 1GB worth of WAL after a pgbench run with the data
> folder on a tmpfs the recovery time goes from 33s to 28s, so that's a
> nice gain.

Thanks for testing.

> Do you have numbers with FreeBSD?  I get that this would be more
> difficult to set up without a GA release perhaps...

I don't have production build numbers, but a similar test to yours
went from 91s to 61s on a debug kernel in a virtual machine.

> I can also see the difference in profiles by looking for
> HandleStartupProcInterrupts which gets close 10% of the attention when
> unpatched, and down to 0.1% when patched.
>
> @@ -2484,6 +2484,8 @@ ClosePostmasterPorts(bool am_syslogger)
>      if (bonjour_sdref)
>          close(DNSServiceRefSockFD(bonjour_sdref));
>  #endif
> +
> +    PostmasterDeathInit();
>
> Thomas, trying to understand here...  Why this place for the signal
> initialization?  Wouldn't InitPostmasterChild() be a more logical place
> as we'd want to have this logic caught by all other processes?

Yeah, you're right.  Here's one like that.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Thomas Munro

Date:

27 June 2018, 08:26:39

On Wed, Apr 25, 2018 at 6:23 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Tue, Apr 24, 2018 at 7:37 PM, Michael Paquier <michael@paquier.xyz> wrote:
>> Thomas, trying to understand here...  Why this place for the signal
>> initialization?  Wouldn't InitPostmasterChild() be a more logical place
>> as we'd want to have this logic caught by all other processes?
>
> Yeah, you're right.  Here's one like that.

Amazingly, due to the way the project schedules fell and thanks to the
help of a couple of very supportive FreeBSD committers and reviewers,
the new PROC_PDEATHSIG_CTL kernel facility[1] landed in a production
release today, beating the Commitfest by several days.

My question is whether this patch set is enough, or people (Andres?) want
to go further and actually kill the postmaster death pipe completely.
My kqueue patch would almost completely kill it on BSDs as it happens,
but for Linux there are a number of races and complications to plug to
do that IIUC.  I don't immediately see what there is to gain by doing
that work (assuming we reuse WaitEventSet objects in enough places),
and we'll always need to maintain the pipe code for other OSes anyway.
So I'm -0.5 on doing that, even though the technical puzzle is
appealing...

[1]
https://www.freebsd.org/cgi/man.cgi?query=procctl&apropos=0&sektion=2&manpath=FreeBSD+11.2-RELEASE&arch=default&format=html

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Heikki Linnakangas

Date:

10 July 2018, 14:39:38

On 27/06/18 08:26, Thomas Munro wrote:
> On Wed, Apr 25, 2018 at 6:23 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Tue, Apr 24, 2018 at 7:37 PM, Michael Paquier <michael@paquier.xyz> wrote:
>>> Thomas, trying to understand here...  Why this place for the signal
>>> initialization?  Wouldn't InitPostmasterChild() be a more logical place
>>> as we'd want to have this logic caught by all other processes?
>>
>> Yeah, you're right.  Here's one like that.
> 
> Amazingly, due to the way the project schedules fell and thanks to the
> help of a couple of very supportive FreeBSD committers and reviewers,
> the new PROC_PDEATHSIG_CTL kernel facility[1] landed in a production
> release today, beating the Commitfest by several days.
> 
> My question is whether this patch set is enough, or people (Andres?) want
> to go further and actually kill the postmaster death pipe completely.
> My kqueue patch would almost completely kill it on BSDs as it happens,
> but for Linux there are a number of races and complications to plug to
> do that IIUC.  I don't immediately see what there is to gain by doing
> that work (assuming we reuse WaitEventSet objects in enough places),
> and we'll always need to maintain the pipe code for other OSes anyway.
> So I'm -0.5 on doing that, even though the technical puzzle is
> appealing...
> 
> [1]
https://www.freebsd.org/cgi/man.cgi?query=procctl&apropos=0&sektion=2&manpath=FreeBSD+11.2-RELEASE&arch=default&format=html

Yeah, I don't think we can kill the pipe completely. As long as we still 
need it for the other OSes, I don't see much point in eliminating it on 
Linux and BSDs either. I'd rather keep the code similar across platforms.

Looking at the patch:

The 'postmaster_possibly_dead' flag is not reset anywhere. So if a 
process receives a spurious death signal, even though postmaster is 
still alive, PostmasterIsAlive() will continue to use the slow path.

postmaster_possibly_dead needs to be marked as 'volatile', no?

The autoconf check for PR_SET_PDEATHSIG seems slightly misplaced. And I 
think we can simplify it with AC_CHECK_HEADER(). I'd also like to avoid 
adding code to c.h for this, that seems too global.

After some kibitzing, I ended up with the attached. It fixes the 
postmaster_possible_dead issues mentioned above, and moves around the 
autoconf and #ifdef logic a bit to make it a bit nicer, at least in my 
opinion. I don't have a FreeBSD machine at hand, so I didn't try fixing 
that patch.

- Heikki

Attachment

0001-Use-signals-for-postmaster-death-on-Linux-2.patch

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Thomas Munro

Date:

11 July 2018, 04:16:31

On Tue, Jul 10, 2018 at 11:39 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> The 'postmaster_possibly_dead' flag is not reset anywhere. So if a process
> receives a spurious death signal, even though postmaster is still alive,
> PostmasterIsAlive() will continue to use the slow path.

+1

> postmaster_possibly_dead needs to be marked as 'volatile', no?

+1

> The autoconf check for PR_SET_PDEATHSIG seems slightly misplaced. And I
> think we can simplify it with AC_CHECK_HEADER(). I'd also like to avoid
> adding code to c.h for this, that seems too global.

+1, much nicer, thanks.

> After some kibitzing, I ended up with the attached. It fixes the
> postmaster_possible_dead issues mentioned above, and moves around the
> autoconf and #ifdef logic a bit to make it a bit nicer, at least in my
> opinion.

Thanks, that looks good to me.  I added your name as co-author and
pushed to master.

I also made a couple of minor cosmetic changes in
PostmasterDeathSignalInit() to make the follow-up patch prettier (#if
defined() instead of #ifdef, and a signum variable because I later
need its address).

> I don't have a FreeBSD machine at hand, so I didn't try fixing that
> patch.

I updated the FreeBSD version to use the header test approach you
showed, and pushed that too.  FWIW the build farm has some FreeBSD
animals with and without PROC_PDEATHSIG_CTL.

I suppose it's possibly that we might want to reconsider the choice of
signal in the future (SIGINFO or SIGPWR).

(Random archeological note: TIL that Linux stole <sys/prctl.h> from
Irix (RIP), but it had PR_TERMCHILD instead of PR_SET_PRDEATHSIG.)

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Excessive PostmasterIsAlive calls slow down WAL redo

From

Heikki Linnakangas

Date:

11 July 2018, 11:10:14

On 11/07/18 04:16, Thomas Munro wrote:
> On Tue, Jul 10, 2018 at 11:39 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I don't have a FreeBSD machine at hand, so I didn't try fixing that
>> patch.
> 
> I updated the FreeBSD version to use the header test approach you
> showed, and pushed that too.  FWIW the build farm has some FreeBSD
> animals with and without PROC_PDEATHSIG_CTL.

Thanks!

> I suppose it's possibly that we might want to reconsider the choice of
> signal in the future (SIGINFO or SIGPWR).

We could reuse SIGUSR1 for this. If we set the flag in SIGUSR1 handler, 
then some PostmasterIsAlive() calls would take the slow path 
unnecessarily, but it would probably be OK. The slow path isn't that 
slow. But using SIGINFO/SIGPWR seems fine.

- Heikki