Re: [HACKERS] subscription worker signalling wal writer too much - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] subscription worker signalling wal writer too much
Date
Msg-id 20170614232922.igl2qhdeqdp77niq@alap3.anarazel.de
Whole thread Raw
In response to Re: [HACKERS] subscription worker signalling wal writer too much  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: [HACKERS] subscription worker signalling wal writer too much  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On 2017-06-14 16:24:27 -0700, Jeff Janes wrote:
> On Wed, Jun 14, 2017 at 3:20 PM, Andres Freund <andres@anarazel.de> wrote:
> 
> > On 2017-06-14 15:08:49 -0700, Jeff Janes wrote:
> > > On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes <jeff.janes@gmail.com>
> > wrote:
> > >
> > > > If I publish a pgbench workload and subscribe to it, the subscription
> > > > worker is signalling the wal writer thousands of times a second, once
> > for
> > > > every async commit.  This has a noticeable performance cost.
> > > >
> > >
> > > I've used a local variable to avoid waking up the wal writer more than
> > once
> > > for the same page boundary.  This reduces the number of wake-ups by about
> > > 7/8.
> >
> > Maybe I'm missing something here, but isn't that going to reduce our
> > guarantees about when asynchronously committed xacts are flushed out?
> > You can easily fit a number of commits into the same page...   As this
> > isn't specific to logical-rep, I don't think that's ok.
> >
> 
> The guarantee is based on wal_writer_delay not on SIGUSR1, so I don't think
> this changes that. (Also, it isn't really a guarantee, the fsync can take
> many seconds to complete once we do initiate it, and there is absolutely
> nothing we can do about that, other than do the fsync synchronously in the
> first place).

Well, wal_writer_delay doesn't work if walwriter is in sleep mode, and
this afaics would allow wal writer to go into sleep mode with half a
page filled, and it'd not be woken up again until the page is filled.
No?


> > Have you chased down why there's that many wakeups?  Normally I'd have
> > expected that a number of the SetLatch() calls get consolidated
> > together, but I guess walwriter is "too quick" in waking up and
> > resetting the latch?

> I'll have to dig into that some more.  The 7/8 reduction I cited was just
> in calls to SetLatch from that part of the code, I didn't measure whether
> the SetLatch actually called kill(owner_pid, SIGUSR1) or not when I
> determined that reduction, so it wasn't truly wake ups I measured.  Actual
> wake ups were measured only indirectly via the impact on performance.  I'll
> need to figure out how to instrument that without distorting the
> performance too much in the process..

I'd suspect that just measuring the number of kill() calls should be
doable, if measured via perf or something like hta.t


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Jeff Janes
Date:
Subject: Re: [HACKERS] subscription worker signalling wal writer too much
Next
From: "David G. Johnston"
Date:
Subject: Re: [HACKERS] logical replication: \dRp+ and "for all tables"