Re: [HACKERS] subscription worker signalling wal writer too much - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [HACKERS] subscription worker signalling wal writer too much |
Date | |
Msg-id | 20170625000952.552krzrphu74mzzf@alap3.anarazel.de Whole thread Raw |
In response to | Re: [HACKERS] subscription worker signalling wal writer too much (Jeff Janes <jeff.janes@gmail.com>) |
Responses |
Re: [HACKERS] subscription worker signalling wal writer too much
|
List | pgsql-hackers |
Hi, On 2017-06-15 15:06:43 -0700, Jeff Janes wrote: > > Well, wal_writer_delay doesn't work if walwriter is in sleep mode, and > > this afaics would allow wal writer to go into sleep mode with half a > > page filled, and it'd not be woken up again until the page is filled. > > No? > > > > If it is taking the big sleep, then we always wake it up, since > acd4c7d58baf09f. > > I didn't change that part. I only changed what we do if it not hibernating. Right, I hadn't read enough of the context. > It looks like only limited consolidation was happening, the number of kills > invoked was more than half of the number of SetLatch. I think the reason > is what when kill(owner_pid, SIGUSR1) is called, the kernel immediately > suspends the calling process and gives the signalled process a chance to > run in that time slice. The Wal Writer gets woken up, sees that it only > has a partial page to write and decides not to do anything, but resets the > latch. It does this fast enough that the subscription worker hasn't > migrated CPUs yet. I think part of the problem here is that latches signal the owning process even if the owning process isn't actually sleeping - that's obviously quite pointless. In cases where walwriter is busy, that actually causes a lot of superflous interrupted syscalls, because it actually never ends up waiting on the latch. There's also a lot of superflous context switches. I think we can avoid doing so quite easily, as e.g. with the attached patch. Could you check how much that helps your benchmark? > The first change made the WALWriter wake up and do a write and flush > whenever an async commit occurred and there was a completed WAL page to be > written. This way the hint bits could be set while the heap page was still > hot, because they couldn't get set until the WAL covering the hinted-at > transaction commit is flushed. > > The second change said we can set hint bits before the WAL covering the > hinted-at transaction is flushed, as long the page LSN is newer than that > (so the wal covering the hinted-at transaction commit must be flushed > before the page containing the hint bit can be written). > > Then the third change makes the wal writer only write the full pages most > of the times it is woken up, not flush them. It only flushes them once > every wal_writer_delay or wal_writer_flush_after, regardless of how often > it is woken up. > > It seems like the third change rendered the first one useless. I don't think so. Isn't the walwriter writing out the contents of the WAL is quite important because it makes room in wal_buffers for new WAL? I suspect we actually should wake up wal-writer *more* aggressively, to offload wal fsyncs from individual backends, even when they're not committing. Right now we fsync whenever a segment is finished - we really don't want to do that in backends that could do other useful work. I suspect it'd be a good idea to remove that logic from XLogWrite() and replace it with if (ProcGlobal->walwriterLatch) SetLatch(ProcGlobal->walwriterLatch); > Wouldn't it > better, and much simpler, just to have reverted the first change once the > second one was done? I think both can actually happen independently, no? It's pretty common for the page lsn to be *older* than the corresponding commit. In fact you'll always hit that case unless there's also concurrent writes also touching said page. > If that were done, would the third change still be > needed? (It did seem to add some other features as well, so I'm going to > assume we still want those...). It's still useful, yes. It avoids flushing the WAL too often (page-by-page when there's a lot of writes), which can eat up a lot of IOPS on fast storage. > But now the first change is even worse than useless, it is positively > harmful. The only thing to stop it from waking the WALWriter for every > async commit is this line: > > /* if we have already flushed that far, we're done */ > if (WriteRqstPtr <= LogwrtResult.Flush) > return; > > Since LogwrtResult.Flush doesn't advance anymore, this condition doesn't > becomes false due to us waking walwriter, it only becomes false once the > timeout expires (which it would have done anyway with no help from us), or > once wal_writer_flush_after is exceeded. So now every async commit is > waking the walwriter. Most of those wake up do nothing (there is not a > completely new patch to write), some write one completed page but don't > flush anything, and very few do a flush, and that one would have been done > anyway. s/completely new patch/completely new page/? In my opinion we actually *do* want to write (but not flush!) out such pages, so I'm not sure I agree with that logic. Have to think about this some more... Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: