Re: Checkpoint cost, looks like it is WAL/CRC - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Checkpoint cost, looks like it is WAL/CRC
Date
Msg-id 1122502691.3670.219.camel@localhost.localdomain
In response to Re: Checkpoint cost, looks like it is WAL/CRC  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Checkpoint cost, looks like it is WAL/CRC
List pgsql-hackers
On Tue, 2005-07-26 at 19:15 -0400, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> >> We should run tests with much higher wal_buffers numbers to nullify the
> >> effect described above and reduce contention. That way we will move
> >> towards the log disk speed being the limiting factor, patch or no patch.
> 
> > I've run such tests, at a glance they do seem to improve performance.   I 
> > need some time to collate the results.
> 
> With larger wal_buffers values it might also be interesting to take some
> measures to put a larger share of the WAL writing burden on the bgwriter.
> 
> Currently the bgwriter only writes out WAL buffers in two scenarios:
> 
> 1. It wants to write a dirty shared buffer that has LSN beyond the
> current WAL flush marker.  Just like any backend, the bgwriter must
> flush WAL as far as the LSN before writing the buffer.
> 
> 2. The bgwriter is completing a checkpoint.  It must flush WAL as far as
> the checkpoint record before updating pg_control.
> 
> It might be interesting to add some logic to explicitly check for and
> write out any full-but-unwritten WAL buffers during the bgwriter's
> main loop.
> 
> In a scenario with many small transactions, this is probably a waste of
> effort since backends will be forcing WAL write/flush any time they
> commit.  (This is why I haven't pursued the idea already.)  However,
> given a large transaction and adequate wal_buffer space, such a tactic
> should offload WAL writing work nicely.
> 
> I have no idea whether the DBT benchmarks would benefit at all, but
> given that they are affected positively by increasing wal_buffers,
> they must have a fair percentage of not-small transactions.

Yes, I was musing on that also. I think it would help keep response time
even, which seems to be the route to higher performance anyway. This is
more important in the real world than in benchmarks, where a nice even
stream of commits arrives to save the day...

I guess I'd be concerned that the poor bgwriter can't do all of this
work. I was thinking about a separate log writer, so we could have both
bgwriter and logwriter active simultaneously on I/O. It has taken a
while to get bgwriter to perform its duties efficiently, so I'd rather
not give it so many that it performs them all badly.

The logwriter would be more of a helper, using LWLockConditionalAcquire
to see if the WALWriteLock was kept active. Each backend would still
perform its own commit write. (We could change that in the future, but
that's a lot more work.) We would only need one new GUC, log_writer_delay,
defaulting to 50 ms (??) - if set to zero, the default, then we don't
spawn a logwriter daemon at all. (Perhaps we also need another one to
say how many blocks get written each time it's active... but I'm not
hugely in favour of more parameters to get wrong.)

That way we could take the LWLockConditionalAcquire on WALWriteLock out
of the top of XLogInsert, which was effectively doing that work.
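
To make the helper idea concrete, here is a rough toy model in Python.
It is not PostgreSQL code and not a proposed patch. ToyWal, helper_pass,
commit_flush and start_logwriter are all made-up names, a plain
threading.Lock stands in for WALWriteLock, and a non-blocking acquire
stands in for LWLockConditionalAcquire. It assumes the "zero means no
logwriter" reading of log_writer_delay.

```python
import threading
from collections import deque


class ToyWal:
    """Toy stand-in for the WAL buffers; hypothetical, not PostgreSQL code."""

    def __init__(self):
        self.write_lock = threading.Lock()   # plays the role of WALWriteLock
        self.insert_lock = threading.Lock()  # protects the queue of full buffers
        self.full_buffers = deque()          # full-but-unwritten WAL buffers
        self.written = []                    # what has reached the "log disk"

    def insert(self, record):
        # Backend side: only queue the buffer, no opportunistic write here.
        with self.insert_lock:
            self.full_buffers.append(record)

    def _write_pending(self):
        # Caller must hold write_lock. Returns how many buffers were written.
        with self.insert_lock:
            pending = list(self.full_buffers)
            self.full_buffers.clear()
        self.written.extend(pending)
        return len(pending)

    def commit_flush(self):
        # Backend side at commit: block on the lock and write everything.
        with self.write_lock:
            return self._write_pending()

    def helper_pass(self):
        # Logwriter side: conditional acquire; skip this pass if the lock is busy.
        if not self.write_lock.acquire(blocking=False):
            return None
        try:
            return self._write_pending()
        finally:
            self.write_lock.release()


def start_logwriter(wal, log_writer_delay_ms, stop_event):
    """Spawn the helper thread, or return None when the delay is zero."""
    if log_writer_delay_ms <= 0:
        return None

    def loop():
        # Event.wait returns False on timeout and True once stop is requested.
        while not stop_event.wait(log_writer_delay_ms / 1000.0):
            wal.helper_pass()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

The point of the conditional acquire is that the helper never queues
behind a committing backend: if the lock is held, somebody is already
writing, so the helper just goes back to sleep until the next pass.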

I think this would also reduce the apparent need for high wal_buffers
settings - we could probably get away with a lot less than the 2048 that
recent performance results would suggest.

Best Regards, Simon Riggs


