Thread: ACM Paper relevant to our buffer algorithm

ACM Paper relevant to our buffer algorithm

From: Gregory Stark
Incidentally I found this paper in ACM SIGMETRICS 1992 covering more or less
precisely the same algorithm we're using for our clock sweep. I haven't quite
digested it yet myself so I'm not sure what the conclusions about weights tell
us to do with our buffer usage counter.
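
For reference, the scheme in question, in miniature, looks roughly like this (a
standalone toy with made-up constants and struct fields, not the actual bufmgr
code):

/* Standalone sketch of a clock sweep with a bounded usage counter.
 * Illustrative only; not the actual PostgreSQL buffer manager code. */
#include <stdio.h>
#include <stdbool.h>

#define NBUFFERS  8
#define MAX_USAGE 5   /* cap on the usage counter (illustrative) */

typedef struct
{
    int  page;        /* which page the buffer holds              */
    int  usage_count; /* bumped on access, decayed by the sweep   */
    bool pinned;      /* in use by a backend right now            */
} Buffer;

static Buffer buffers[NBUFFERS];
static int    sweep_hand = 0;

/* Called whenever a buffer is hit: reward it with a higher weight. */
static void
buffer_accessed(Buffer *buf)
{
    if (buf->usage_count < MAX_USAGE)
        buf->usage_count++;
}

/* Clock sweep: advance the hand, decaying usage counts, until an
 * unpinned buffer with usage_count == 0 is found. */
static Buffer *
find_victim(void)
{
    for (;;)
    {
        Buffer *buf = &buffers[sweep_hand];

        sweep_hand = (sweep_hand + 1) % NBUFFERS;

        if (buf->pinned)
            continue;           /* can't evict a pinned buffer */

        if (buf->usage_count > 0)
        {
            buf->usage_count--; /* give it another lap to prove itself */
            continue;
        }

        return buf;             /* usage_count == 0: evict this one */
    }
}

int
main(void)
{
    /* Fill the pool with pages 0..7, then touch a couple of them. */
    for (int i = 0; i < NBUFFERS; i++)
        buffers[i] = (Buffer) { .page = i, .usage_count = 1, .pinned = false };

    buffer_accessed(&buffers[2]);
    buffer_accessed(&buffers[2]);
    buffer_accessed(&buffers[5]);

    printf("evicting buffer holding page %d\n", find_victim()->page);
    return 0;
}

The question the paper's weight analysis bears on is how much to increment on a
hit and how far to let the counter climb before the sweep can reclaim a buffer.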

I put a copy up for download since, even though copying is permitted, I don't
know how to find a public link. It's kind of big, so please download it and
read it locally rather than reading it from my machine:

http://stark.xeocode.com/~stark/p35-nicola.pdf.gz

Regarding copyright, the paper says:

Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the ACM copyright notice and the
title of the publication and its date appear, and notice is given
that copying is by permission of the Association for Computing
Machinery. To copy otherwise, or to republish, requires a fee
and/or specific permission.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: ACM Paper relevant to our buffer algorithm

From: Greg Smith
Here are some more recent papers that also give good insight into research 
in this area:

http://www.cs.usask.ca/~wew036/comprehensive.pdf
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-3.pdf

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: ACM Paper relevant to our buffer algorithm

From: Heikki Linnakangas
Greg Smith wrote:
> Here are some more recent papers that also give good insight into 
> research in this area:
> 
> http://www.cs.usask.ca/~wew036/comprehensive.pdf
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-3.pdf

Nice papers.

What I'd like to see is a paper on precleaning strategies. I tried to 
google for one but couldn't find any. That would be very relevant to the 
bgwriter tuning discussions.

I'm still struggling to understand why and how bgwriter increases 
performance. Under what circumstances, what workload?

The only benefit I can see is that it moves the write() of a page out of 
the critical path. But as long as the OS cache can absorb the write, it 
should be very cheap compared to doing real I/O. Apparently the workload 
that benefits most is an OLTP workload where response times are 
critical, on a database that doesn't fit in shared_buffers, but fits in 
OS cache.
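
As far as I understand it, the precleaning idea is just this (a toy standalone
sketch with invented names and numbers, not the actual bgwriter code): a
background process writes out dirty, zero-usage buffers just ahead of the clock
hand with plain write()s, so a backend hunting for a victim usually finds a
clean one and never issues the write itself.

/* Standalone sketch of the precleaning idea: a background writer pushes
 * out dirty, about-to-be-recycled buffers with plain write()s so that a
 * backend looking for a victim rarely has to write anything itself.
 * All names and numbers here are illustrative, not PostgreSQL's. */
#include <stdio.h>
#include <stdbool.h>

#define NBUFFERS 8

typedef struct
{
    int  usage_count;   /* decayed by the clock sweep               */
    bool dirty;         /* needs a write() before it can be reused  */
    bool pinned;        /* currently in use by a backend            */
} Buffer;

static Buffer buffers[NBUFFERS];
static int    sweep_hand = 0;   /* where the clock sweep will look next */

/* Stand-in for issuing write() on the buffer's page. */
static void
write_buffer(int i)
{
    printf("bgwriter: wrote buffer %d in the background\n", i);
    buffers[i].dirty = false;
}

/* One precleaning pass: walk ahead of the clock hand and clean buffers
 * that the sweep is likely to hand out soon (unpinned, usage_count 0). */
static void
bgwriter_lru_pass(int max_writes)
{
    int hand = sweep_hand;

    for (int scanned = 0; scanned < NBUFFERS && max_writes > 0; scanned++)
    {
        Buffer *buf = &buffers[hand];

        hand = (hand + 1) % NBUFFERS;

        if (!buf->pinned && buf->usage_count == 0 && buf->dirty)
        {
            write_buffer((int) (buf - buffers));
            max_writes--;
        }
    }
}

int
main(void)
{
    /* A few dirty buffers that the sweep would pick soon. */
    buffers[1] = (Buffer) { .usage_count = 0, .dirty = true };
    buffers[3] = (Buffer) { .usage_count = 2, .dirty = true };  /* still "hot" */
    buffers[4] = (Buffer) { .usage_count = 0, .dirty = true };

    bgwriter_lru_pass(4);   /* buffers 1 and 4 get cleaned, 3 is left alone */
    return 0;
}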

Am I missing something? What kind of a test would show the most benefit?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: ACM Paper relevant to our buffer algorithm

From: Martijn van Oosterhout
On Wed, Jul 04, 2007 at 11:09:19AM +0100, Heikki Linnakangas wrote:
> The only benefit I can see is that it moves the write() of a page out of
> the critical path. But as long as the OS cache can absorb the write, it
> should be very cheap compared to doing real I/O. Apparently the workload
> that benefits most is an OLTP workload where response times are
> critical, on a database that doesn't fit in shared_buffers, but fits in
> OS cache.

I thought the point was to make checkpoints cheaper. Sure, the OS can
probably absorb the write() but you execute a fsync() shortly after so
you're going to block on that. The point being that by executing the
writes earlier you can get some of the writing done in the background
prior to the fsync.

So it would be targeting people with lots of dirty shared buffers
where a checkpoint is going to eat your I/O bandwidth. An fsync will
make the OS write everything ASAP.
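
To make the write()-versus-fsync() distinction concrete, a tiny demo (file name
and sizes are arbitrary): the buffered write()s return almost immediately, and
nearly all of the waiting happens in the fsync() at the end.

/* Buffered write()s land in the OS cache and return quickly; the fsync()
 * that follows blocks until the data is really on disk.  Spreading the
 * write()s out earlier gives the kernel a chance to start that I/O
 * before the fsync arrives. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static double
seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int
main(void)
{
    char   block[8192];
    int    fd = open("fsync_demo.dat", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    double t0, t1, t2;

    if (fd < 0) { perror("open"); return 1; }
    memset(block, 'x', sizeof(block));

    t0 = seconds();
    for (int i = 0; i < 10000; i++)         /* ~80 MB of dirty pages */
        if (write(fd, block, sizeof(block)) != sizeof(block))
        { perror("write"); return 1; }
    t1 = seconds();

    fsync(fd);                              /* now we wait for the disk */
    t2 = seconds();

    printf("write()s: %.3f s   fsync(): %.3f s\n", t1 - t0, t2 - t1);
    close(fd);
    unlink("fsync_demo.dat");
    return 0;
}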

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: ACM Paper relevant to our buffer algorithm

From: Heikki Linnakangas
Martijn van Oosterhout wrote:
> On Wed, Jul 04, 2007 at 11:09:19AM +0100, Heikki Linnakangas wrote:
>> The only benefit I can see is that it moves the write() of a page out of 
>> the critical path. But as long as the OS cache can absorb the write, it 
>> should be very cheap compared to doing real I/O. Apparently the workload 
>> that benefits most is an OLTP workload where response times are 
>> critical, on a database that doesn't fit in shared_buffers, but fits in 
>> OS cache.
> 
> I thought the point was to make checkpoints cheaper. Sure, the OS can
> probably absorb the write() but you execute a fsync() shortly after so
> you're going to block on that. The point being that by executing the
> writes earlier you can get some of the writing done in the background
> prior to the fsync.

That was the purpose of the bgwriter "all"-scan. It was removed as part 
of the load distributed checkpoints patch, as it's not needed anymore.

What's the purpose of the lru-scan?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: ACM Paper relevant to our buffer algorithm

From: Gregory Stark
Heikki Linnakangas <heikki@enterprisedb.com> writes:

> Greg Smith wrote:
> > Here are some more recent papers that also give good insight into research in
> > this area:
> > http://www.cs.usask.ca/~wew036/comprehensive.pdf
> > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-3.pdf
> 
> Nice papers.
> 
> What I'd like to see is a paper on precleaning strategies. I tried to google
> for one but couldn't find any. That would be very relevant to the bgwriter
> tuning discussions.

I saw some listed but they were for VM managers so they may not be perfectly
adaptable.

> I'm still struggling to understand why and how bgwriter increases performance.
> Under what circumstances, what workload?
> 
> The only benefit I can see is that it moves the write() of a page out of the
> critical path. But as long as the OS cache can absorb the write, it should be
> very cheap compared to doing real I/O. 

Well you can't keep writing indefinitely faster than the i/o subsystem can
execute the writes. Eventually the kernel will block your write until a kernel
buffer becomes free. Ie, throttle your writes to the actual write bandwidth
available.

I think most systems are limited by the latency of reads though. Reads can't
be re-ordered as well as writes can; they happen synchronously as far as an
individual backend is concerned.
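
If you want to see the throttling directly, something like this crude test
shows it: keep writing and track the slowest individual write(); once the
kernel stops absorbing dirty pages, single calls jump from microseconds to
disk-like latencies. (The file name and the ~2 GB figure are arbitrary; where
the wall is depends entirely on the machine.)

/* Keep issuing buffered write()s and report the slowest single call so
 * far.  While the page cache can absorb the data, each call returns in
 * microseconds; once the kernel runs out of room for dirty pages,
 * individual write()s start taking as long as real disk I/O. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
    char   block[8192];
    int    fd = open("throttle_demo.dat", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    double worst = 0.0;

    if (fd < 0) { perror("open"); return 1; }
    memset(block, 'x', sizeof(block));

    for (long i = 0; i < 256 * 1024; i++)   /* about 2 GB total */
    {
        struct timespec a, b;
        double elapsed;

        clock_gettime(CLOCK_MONOTONIC, &a);
        if (write(fd, block, sizeof(block)) != sizeof(block))
        { perror("write"); break; }
        clock_gettime(CLOCK_MONOTONIC, &b);

        elapsed = (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
        if (elapsed > worst)
        {
            worst = elapsed;
            printf("after %ld MB, slowest write() so far: %.6f s\n",
                   (i * 8192) / (1024 * 1024), worst);
        }
    }

    close(fd);
    unlink("throttle_demo.dat");
    return 0;
}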

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: ACM Paper relevant to our buffer algorithm

From: Tom Lane
Gregory Stark <gsstark@mit.edu> writes:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> I'm still struggling to understand why and how bgwriter increases performance.
>> Under what circumstances, what workload?
>> 
>> The only benefit I can see is that it moves the write() of a page out of the
>> critical path. But as long as the OS cache can absorb the write, it should be
>> very cheap compared to doing real I/O. 

> Well you can't keep writing indefinitely faster than the i/o subsystem can
> execute the writes. Eventually the kernel will block your write until a kernel
> buffer becomes free. Ie, throttle your writes to the actual write bandwidth
> available.

Right.  Also, a buffer write isn't "merely" a kernel call --- for
instance, you might have to flush some more WAL before you can execute
the write, and there are cases where you'd have to fsync the write
yourself (ie, if you can't pass it off to the bgwriter).  The more of
that we can take out of foreground query paths, the better.
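
To illustrate the WAL-before-data part of that, a toy sketch (the types and
names are invented, not the real bufmgr/xlog code): before a dirty page can go
out, the WAL must have been flushed at least up to that page's LSN, so the
"cheap" write can drag a WAL flush along with it.

/* Sketch of the WAL-before-data rule: before a dirty page is written,
 * flush WAL up to the LSN of the last record that touched the page.
 * Everything here is invented for illustration. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t Lsn;               /* WAL position, monotonically increasing */

static Lsn wal_flushed_up_to = 1000;    /* how far WAL has been fsync'd */

typedef struct
{
    int  block;
    bool dirty;
    Lsn  page_lsn;  /* LSN of the last WAL record that modified this page */
} Page;

/* Stand-ins for flushing WAL and writing a data page. */
static void
flush_wal_to(Lsn lsn)
{
    printf("flushing WAL up to %llu before the data write\n",
           (unsigned long long) lsn);
    wal_flushed_up_to = lsn;
}

static void
write_page(Page *p)
{
    printf("writing block %d\n", p->block);
    p->dirty = false;
}

/* What a buffer write can actually cost: possibly a WAL flush first. */
static void
flush_dirty_page(Page *p)
{
    if (p->page_lsn > wal_flushed_up_to)
        flush_wal_to(p->page_lsn);  /* the extra work mentioned above */
    write_page(p);
}

int
main(void)
{
    Page recently_modified = { .block = 42, .dirty = true, .page_lsn = 1500 };
    Page old_page          = { .block =  7, .dirty = true, .page_lsn =  900 };

    flush_dirty_page(&recently_modified);   /* forces a WAL flush */
    flush_dirty_page(&old_page);            /* WAL already covers it */
    return 0;
}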
        regards, tom lane


Re: ACM Paper relevant to our buffer algorithm

From: Gregory Stark
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <gsstark@mit.edu> writes:
>> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>>> I'm still struggling to understand why and how bgwriter increases performance.
>>> Under what circumstances, what workload?
>>> 
>>> The only benefit I can see is that it moves the write() of a page out of the
>>> critical path. But as long as the OS cache can absorb the write, it should be
>>> very cheap compared to doing real I/O. 
>
>> Well you can't keep writing indefinitely faster than the i/o subsystem can
>> execute the writes. Eventually the kernel will block your write until a kernel
>> buffer becomes free. Ie, throttle your writes to the actual write bandwidth
>> available.
>
> Right.  Also, a buffer write isn't "merely" a kernel call --- for
> instance, you might have to flush some more WAL before you can execute
> the write, and there are cases where you'd have to fsync the write
> yourself (ie, if you can't pass it off to the bgwriter).  The more of
> that we can take out of foreground query paths, the better.

So it sounds like a good starting point for benchmarking something where the
bgwriter helps might be a setup which starts with a very large table (like
1-10M rows) where each transaction deletes a large number of random tuples
(~1k rows), possibly waiting briefly before committing to give the dirty pages
a chance to be needed before the commit flushes the WAL.

That way each transaction dirties a large number of pages, and the next
transaction is likely to need a fresh page and find one which has been dirtied
but hasn't had its WAL record flushed yet. Deletes mean not much WAL will be
generated, so WAL won't be a bottleneck and you won't get checkpoints due to
checkpoint_segments being reached.
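
In case it helps anyone set that up, here is a rough libpq sketch of the
transaction loop just described. The connection string, table name, row counts,
and the pause are all placeholders; it assumes a pre-populated table something
like CREATE TABLE big (id int PRIMARY KEY, filler text) with ids 1..10M.

/* Rough sketch of the benchmark loop described above, using libpq.
 * All names and numbers are placeholders to be tuned. */
#include <libpq-fe.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=bench");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    for (int xact = 0; xact < 1000; xact++)
    {
        PGresult *res;

        PQclear(PQexec(conn, "BEGIN"));

        /* Delete ~1k random tuples, dirtying pages scattered across the
         * whole table while generating relatively little WAL. */
        res = PQexec(conn,
                     "DELETE FROM big WHERE id IN "
                     "(SELECT (random() * 10000000)::int "
                     " FROM generate_series(1, 1000))");
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "delete failed: %s", PQerrorMessage(conn));
        PQclear(res);

        /* Brief pause before committing, per the suggestion above, so the
         * dirty pages have a chance to be wanted before the WAL flush. */
        usleep(100 * 1000);

        PQclear(PQexec(conn, "COMMIT"));
    }

    PQfinish(conn);
    return 0;
}

Build with something like "cc bench.c -lpq" and run several copies concurrently
if you want contention for victim buffers.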

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com