Re: WALWriteLock contention - Mailing list pgsql-hackers

From	Jeff Janes
Subject	Re: WALWriteLock contention
Date	May 18, 2015 17:57:33
Msg-id	CAMkU=1w7nwz89FQWhbetDgOctjxOSBvRo0hDg+6mCSmCA4B1iA@mail.gmail.com
In response to	Re: WALWriteLock contention (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

>
> My goal there was to further improve group commit. When running pgbench
> -j10 -c10, it was common to see fsyncs that alternated between flushing 1
> transaction, and 9 transactions. Because the first one to the gate would go
> through it and slam it on all the others, and it would take one fsync cycle
> for it to reopen.

Hmm, yeah. I remember someone (Peter Geoghegan, I think) mentioning
behavior like that before, but I had not made the connection to this
issue at that time. This blog post is pretty depressing:

http://oldblog.antirez.com/post/fsync-different-thread-useless.html

It suggests that an fsync in progress blocks out not only other
fsyncs, but other writes to the same file, which for our purposes is
just awful. More Googling around reveals that this is apparently
well-known to Linux kernel developers and that they don't seem excited
about fixing it. :-(

I think they already did. I don't see the effect in ext4, even on a rather old kernel like 2.6.32, using the code from the link above.

<crazy-idea>I wonder if we could write WAL to two different files in
alternation, so that we could be writing to one file while fsync-ing
the other.</crazy-idea>
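
A minimal sketch of the alternation idea above, not anything PostgreSQL does today. The class name `AlternatingLog`, its methods, and the two-path layout are all hypothetical, and the sketch is single-threaded: it only shows the switching, whereas the point of the idea is that other writers would append to the newly active file while the fsync of the other file is still in flight.

```python
import os


class AlternatingLog:
    """Hypothetical two-file log: append to one file while the other is synced."""

    def __init__(self, path_a, path_b):
        flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
        self.fds = [os.open(p, flags, 0o600) for p in (path_a, path_b)]
        self.active = 0  # index of the file currently receiving appends

    def append(self, record):
        # New records always go to the active file.
        os.write(self.fds[self.active], record)

    def switch_and_sync(self):
        # Redirect new appends to the other file first, then fsync the
        # file that was active until now. Returns the index that was synced.
        to_sync = self.active
        self.active = 1 - self.active
        os.fsync(self.fds[to_sync])
        return to_sync

    def close(self):
        for fd in self.fds:
            os.close(fd)
```

Recovery would need some way to interleave records from the two files in commit order (for example, a sequence number in each record); that part is left out here.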

I thought the most promising thing, once there were timers and sleeps with resolution much better than a centisecond, was to record the time at which each fsync finished, and then sleep until "then + commit_delay". That way you don't do any harm to the sleeper, as the write head is not positioned to process the fsync until then anyway, and you give other workers the chance to get their commit records in.
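
The "then + commit_delay" scheme could be sketched roughly as follows. This is an illustration under assumptions, not PostgreSQL code: `delay_before_fsync` and `PacedFlusher` are hypothetical names, `commit_delay` is taken in seconds here (the PostgreSQL setting of that name is in microseconds), and the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time


def delay_before_fsync(last_fsync_done, commit_delay, now):
    """Seconds to wait so the next fsync starts no earlier than
    last_fsync_done + commit_delay; 0.0 if that moment has already passed."""
    return max(0.0, last_fsync_done + commit_delay - now)


class PacedFlusher:
    """Hypothetical helper: space fsyncs at least commit_delay apart,
    measured from the completion of the previous fsync."""

    def __init__(self, commit_delay, clock=time.monotonic, sleep=time.sleep):
        self.commit_delay = commit_delay
        self.clock = clock
        self.sleep = sleep
        self.last_fsync_done = None  # no fsync has completed yet

    def flush(self, do_fsync):
        if self.last_fsync_done is not None:
            wait = delay_before_fsync(
                self.last_fsync_done, self.commit_delay, self.clock()
            )
            if wait > 0:
                # Other committers can add their records during this sleep.
                self.sleep(wait)
        do_fsync()
        # Record when this fsync finished, for the next caller.
        self.last_fsync_done = self.clock()
```

The difference from sleeping a fixed `commit_delay` on every commit is that a committer arriving long after the previous fsync finished does not wait at all.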

But then I kind of lost interest, because anyone who cares very much about commit performance will probably get a nonvolatile write cache, and anything done would be too hardware/platform dependent.

Of course a BBU (battery-backed write cache) isn't magic; the kernel still has to spend time scrubbing the buffer pool and sending the dirty buffers to the disk/controller when it gets an fsync, even if the confirmation does come back quickly. But it still seems too hardware/platform dependent to find a general purpose optimization.

Cheers,

Jeff
