Thread: Background LRU Writer/free list

Background LRU Writer/free list

From
Greg Smith
Date:
I'm mostly done with my review of the "Automatic adjustment of 
bgwriter_lru_maxpages" patch.  In addition to issues already brought up 
with that code, there are some small things that need to be done to merge 
it with the recent pg_stat_bgwriter patch, and I have some concerns about 
its unbounded scanning of the buffer pool; I'll write that up in more 
detail or just submit an improved patch as I get time this week.

But there's a fundamental question that has been bugging me, and I think 
it impacts the direction that code should take.  Unless I'm missing 
something in my reading, buffers written out by the LRU writer aren't ever 
put onto the free list.  I assume this is to stop from prematurely 
removing buffers that contain useful data.  In cases where a substantial 
percentage of the buffer cache is dirty, the LRU writer has to scan a 
significant portion of the pool looking for one of the rare clean buffers, 
then write it out.  When a client goes to grab a free buffer afterward, it 
has to scan the same section of the pool to find the now clean buffer, 
which seems redundant.

With the new patch, the LRU writer is fairly well bounded in that it 
doesn't write out more than it thinks it will need; you shouldn't get into 
a situation where many more pages are written than will be used in the 
near future.  Given that mindset, shouldn't pages the LRU scan writes just 
get moved onto the free list?
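
To make what I'm asking concrete, here's a toy sketch of the behavior
(made-up structures, not actual bufmgr.c code): the writer cleans
usage_count 0 buffers and pushes them onto a free list, and a backend then
pops from that list instead of repeating the scan over the same stretch of
the pool.

/*
 * Toy model only -- not bufmgr.c.
 */
#include <stdbool.h>
#include <stdio.h>

#define NBUFFERS 16

typedef struct
{
    bool dirty;
    bool on_freelist;
    int  usage_count;
    int  free_next;             /* index of next free buffer, -1 = end */
} ToyBuffer;

static ToyBuffer pool[NBUFFERS];
static int clock_hand = 0;
static int free_head = -1;

/* Proposed LRU-writer behavior: write out dirty usage_count 0 buffers,
 * then put them (now clean) on the free list instead of leaving them to
 * be rediscovered by a backend scan. */
static void
lru_writer_pass(int pages_wanted)
{
    for (int scanned = 0; scanned < NBUFFERS && pages_wanted > 0; scanned++)
    {
        ToyBuffer  *buf = &pool[clock_hand];

        clock_hand = (clock_hand + 1) % NBUFFERS;

        if (buf->usage_count > 0 || buf->on_freelist)
            continue;           /* recently used, or already available */

        if (buf->dirty)
            buf->dirty = false; /* "write" the page out */

        buf->on_freelist = true;
        buf->free_next = free_head;
        free_head = (int) (buf - pool);
        pages_wanted--;
    }
}

/* Backend side: popping the free list replaces the redundant second scan. */
static int
grab_free_buffer(void)
{
    int         victim = free_head;

    if (victim >= 0)
    {
        free_head = pool[victim].free_next;
        pool[victim].on_freelist = false;
    }
    return victim;              /* -1 means fall back to the clock sweep */
}

int
main(void)
{
    for (int i = 0; i < NBUFFERS; i++)
        pool[i] = (ToyBuffer) {.dirty = (i % 2 == 0), .usage_count = i % 3,
                               .free_next = -1};

    lru_writer_pass(4);
    printf("backend got buffer %d straight off the free list\n",
           grab_free_buffer());
    return 0;
}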

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: Background LRU Writer/free list

From
"Jim C. Nasby"
Date:
On Wed, Apr 18, 2007 at 09:09:11AM -0400, Greg Smith wrote:
> I'm mostly done with my review of the "Automatic adjustment of 
> bgwriter_lru_maxpages" patch.  In addition to issues already brought up 
> with that code, there are some small things that need to be done to merge 
> it with the recent pg_stat_bgwriter patch, and I have some concerns about 
> its unbounded scanning of the buffer pool; I'll write that up in more 
> detail or just submit an improved patch as I get time this week.
> 
> But there's a fundamental question that has been bugging me, and I think 
> it impacts the direction that code should take.  Unless I'm missing 
> something in my reading, buffers written out by the LRU writer aren't ever 
> put onto the free list.  I assume this is to stop from prematurely 
> removing buffers that contain useful data.  In cases where a substantial 
> percentage of the buffer cache is dirty, the LRU writer has to scan a 
> significant portion of the pool looking for one of the rare clean buffers, 
> then write it out.  When a client goes to grab a free buffer afterward, it 
> has to scan the same section of the pool to find the now clean buffer, 
> which seems redundant.
> 
> With the new patch, the LRU writer is fairly well bounded in that it 
> doesn't write out more than it thinks it will need; you shouldn't get into 
> a situation where many more pages are written than will be used in the 
> near future.  Given that mindset, shouldn't pages the LRU scan writes just 
> get moved onto the free list?

I've wondered the same thing myself.

If we're worried about freeing pages that we might want back, we could
change the code so that ReadBuffer would also look at the free list if
it couldn't find a page before going to the OS for it.

So if you make this change will BgBufferSync start incrementing
StrategyControl->nextVictimBuffer and decrementing buf->usage_count like
StrategyGetBuffer does now?
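
For anyone following along, the part of the sweep I mean looks roughly
like this (a simplified model, not the real freelist.c code); if
BgBufferSync starts feeding the free list, presumably it needs to move the
same hand and age the counts the same way:

/* Simplified clock sweep along the lines of StrategyGetBuffer: advance
 * the hand, decrement usage_count, stop at the first buffer that has
 * reached zero. */
#include <stdio.h>

#define NBUFFERS 8

static int usage_count[NBUFFERS] = {3, 0, 1, 2, 0, 5, 1, 0};
static int next_victim = 0;     /* stand-in for nextVictimBuffer */

static int
clock_sweep_victim(void)
{
    for (;;)
    {
        int         buf = next_victim;

        next_victim = (next_victim + 1) % NBUFFERS;     /* advance the hand */

        if (usage_count[buf] == 0)
            return buf;                                 /* reusable buffer */

        usage_count[buf]--;                             /* age it, keep going */
    }
}

int
main(void)
{
    printf("first victim: %d\n", clock_sweep_victim());
    printf("next victim:  %d\n", clock_sweep_victim());
    return 0;
}
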
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: Background LRU Writer/free list

From
Gregory Stark
Date:
"Greg Smith" <gsmith@gregsmith.com> writes:

> I'm mostly done with my review of the "Automatic adjustment of
> bgwriter_lru_maxpages" patch.  In addition to issues already brought up with
> that code, there are some small things that need to be done to merge it with
> the recent pg_stat_bgwriter patch, and I have some concerns about its unbounded
> scanning of the buffer pool; I'll write that up in more detail or just submit
> an improved patch as I get time this week.

I had a thought on this. Instead of sleeping for a constant amount of time and
then estimating the number of pages needed for that constant amount of time
perhaps what bgwriter should be doing is sleeping for a variable amount of
time and estimating the length of time it needs to sleep to arrive at a
constant number of pages being needed.

The reason I think this may be better is that "what percentage of the shared
buffers the bgwriter allows to get old between wakeups" seems more likely to
be a universal constant that people won't have to adjust than "fixed time
interval between bgwriter cleanup operations".
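
Roughly the calculation I have in mind, as a sketch (the target and the
clamps are numbers I made up, not anything from a patch):

#include <stdio.h>

/* Sleep just long enough that about target_pages buffers will be needed,
 * based on the recently observed allocation rate. */
static double
next_sleep_ms(double allocs_per_ms, double target_pages,
              double min_ms, double max_ms)
{
    double      sleep_ms;

    if (allocs_per_ms <= 0.0)
        return max_ms;          /* idle system: just sleep the maximum */

    sleep_ms = target_pages / allocs_per_ms;

    if (sleep_ms < min_ms)
        sleep_ms = min_ms;      /* can't sleep finer than the tick rate */
    if (sleep_ms > max_ms)
        sleep_ms = max_ms;

    return sleep_ms;
}

int
main(void)
{
    /* e.g. 2 buffer allocations per ms, aiming for ~100 pages per wakeup */
    printf("sleep %.0f ms\n", next_sleep_ms(2.0, 100.0, 10.0, 1000.0));
    return 0;
}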

Just a thought.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: Background LRU Writer/free list

From
Tom Lane
Date:
Greg Smith <gsmith@gregsmith.com> writes:
> With the new patch, the LRU writer is fairly well bounded in that it 
> doesn't write out more than it thinks it will need; you shouldn't get into 
> a situation where many more pages are written than will be used in the 
> near future.  Given that mindset, shouldn't pages the LRU scan writes just 
> get moved onto the free list?

This just seems like a really bad idea: throwing away data we might
want.  Furthermore, if the page was dirty, then it's probably been
accessed more recently than adjacent pages that are clean, so
preferentially zapping just-written pages seems backwards.
        regards, tom lane


Re: Background LRU Writer/free list

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> I had a thought on this. Instead of sleeping for a constant amount of time and
> then estimating the number of pages needed for that constant amount of time
> perhaps what bgwriter should be doing is sleeping for a variable amount of
> time and estimating the length of time it needs to sleep to arrive at a
> constant number of pages being needed.

That's an interesting idea, but a possible problem with it is that we
can't vary the granularity of a sleep time as finely as we can vary the
number of buffers processed per iteration.  Assuming that the system's
tick rate is the typical 100Hz, we have only 10ms resolution on sleep
times.

> The reason I think this may be better is that "what percentage of the shared
> buffers the bgwriter allows to get old between wakeups" seems more likely to
> be a universal constant that people won't have to adjust than "fixed time
> interval between bgwriter cleanup operations".

Why?  What you're really trying to determine, I think, is the I/O load
imposed by the bgwriter, and pages-per-second seems a pretty natural
way to think about that; percentage of shared buffers not so much.
        regards, tom lane


Re: Background LRU Writer/free list

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Why?  What you're really trying to determine, I think, is the I/O load
> imposed by the bgwriter, and pages-per-second seems a pretty natural
> way to think about that; percentage of shared buffers not so much.

What I'm saying is that pages/s will vary from system to system. Busier
systems will have higher i/o rates. So the DBA on a system with a higher
rate will want to adjust the bgwriter sleep time lower than the DBA on a
system where bgwriter isn't doing much work.

In particular I'm worried about what happens on a very busy cpu-bound system
where adjusting the sleep times would result in it deciding to not sleep at
all. On such a system sleeping for even 10ms might be too long. But we
probably don't want to make the default even as low as 10ms.

Anyways, if we have a working patch that works the other way around we could
experiment with that and see if there are actual situations where sleeping for
0ms is necessary. Perhaps a mixture of the two approaches will be necessary
anyways because of the granularity issue.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: Background LRU Writer/free list

From
Greg Smith
Date:
On Wed, 18 Apr 2007, Tom Lane wrote:

> Furthermore, if the page was dirty, then it's probably been accessed 
> more recently than adjacent pages that are clean, so preferentially 
> zapping just-written pages seems backwards.

The LRU background writer only writes out pages that have a usage_count of 
0, so they can't have been accessed too recently.  Assuming the buffer 
allocation rate continues its historical trend, these are the pages that 
are going to be written out and then allocated for something new one way 
or another in the next interval; the content is expected to be lost 
shortly no matter what.

As for preferring dirty pages over clean ones, on a re-read my question 
wasn't as clear as I wanted it to be.  I think that clean pages near the 
strategy point should also be moved to the free list by the background 
writer.  You know clients are expected to require x buffers in the next y 
ms based on the history of the server (the new piece of information 
provided by the patch in the queue), and the LRU background writer is 
working in advance to make them available.  If you're doing all that, 
doesn't it make sense to finish the job by putting the pages on the free 
list, where the clients can grab them without running their own scan over 
the buffer cache?
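
To be concrete about "based on the history of the server", I mean
something along these lines (the smoothing constant and the headroom are
invented for the example, not what the patch in the queue actually does):

#include <stdio.h>

static double smoothed_allocs = 0.0;    /* running average of allocs/cycle */

/* Estimate upcoming demand from recent allocation counts; that estimate
 * bounds how many buffers get cleaned and pushed onto the free list. */
static int
buffers_to_prepare(int allocs_this_cycle)
{
    const double smoothing = 0.16;      /* arbitrary choice for the sketch */

    smoothed_allocs += smoothing * (allocs_this_cycle - smoothed_allocs);

    /* a little headroom over the trend, but still bounded by recent demand */
    return (int) (smoothed_allocs * 1.1) + 1;
}

int
main(void)
{
    int         history[] = {40, 55, 60, 48, 52};

    for (int i = 0; i < 5; i++)
        printf("cycle %d: prepare %d buffers\n",
               i, buffers_to_prepare(history[i]));
    return 0;
}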

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: Background LRU Writer/free list

From
Greg Smith
Date:
On Wed, 18 Apr 2007, Jim C. Nasby wrote:

> So if you make this change will BgBufferSync start incrementing
> StrategyControl->nextVictimBuffer and decrementing buf->usage_count like
> StrategyGetBuffer does now?

Something will need to keep advancing nextVictimBuffer; I hadn't really 
finished the implementation yet.  I just wanted to get an idea of whether 
this was even feasible, or if there was some larger issue that made the 
whole idea moot.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: Background LRU Writer/free list

From
Greg Smith
Date:
On Wed, 18 Apr 2007, Gregory Stark wrote:

> In particular I'm worried about what happens on a very busy cpu-bound 
> system where adjusting the sleep times would result in it deciding to 
> not sleep at all. On such a system sleeping for even 10ms might be too 
> long... Anyways, if we have a working patch that works the other way 
> around we could experiment with that and see if there are actual 
> situations where sleeping for 0ms is necessary.

I've been waiting for 8.3 to settle down before packaging the prototype 
auto-tuning background writer concept I'm working on (you can peek at the 
code at http://www.westnet.com/~gsmith/content/postgresql/bufmgr.c ), 
which already implements some of the ideas you're talking about in your 
messages today.  I estimate how much of the buffer pool is dirty, use that 
to compute an expected I/O rate, and try to adjust parameters to meet a 
quality of service guarantee for how often the entire buffer pool is 
scanned.  This is one of those problems that gets more difficult the more 
you dig into it; with all that done I still feel like I'm only halfway 
finished, and several parts worked radically differently in reality than I 
expected them to.

If you're allowing the background writer to write 1000 pages at a clip, 
that's 8MB each interval.  Doing that every 200ms makes for an I/O rate of 
40MB/s.  In a system that cares about data integrity, you'll exceed the 
ability of the WAL to sustain page writes (which limits how fast you can 
dirty pages) long before the interval approaches 0ms.  What I do in my 
code is set the interval to 200ms, compute what the maximum pages to write 
must be, and if it's >1000 then I reduce the interval.  I've tested 
dumping into a fairly fast disk array with tons of cache and I've never 
been able to get useful throughput below an 80ms interval; the OS just 
clamps down and makes you wait for I/O instead regardless of how little 
you intended to sleep.  Eventually, it's got to hit disk, and you can only 
buffer for so long before that starts to slow you down.
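
Spelling that arithmetic out, along with the interval adjustment rule
(8kB pages assumed; the 1500-page estimate is just an example):

#include <stdio.h>

int
main(void)
{
    const double page_kb = 8.0;
    double      max_pages = 1000.0;     /* page cap per interval */
    double      interval_ms = 200.0;

    /* 1000 pages x 8kB ~= 8MB per interval; at 200ms that's ~40MB/s */
    double      mb_per_s = (max_pages * page_kb / 1000.0) / (interval_ms / 1000.0);

    printf("%.0f pages every %.0f ms = %.0f MB/s\n",
           max_pages, interval_ms, mb_per_s);

    /* If the estimate says more pages than the cap are needed per
     * interval, keep the cap and shorten the interval instead. */
    double      needed_pages = 1500.0;

    if (needed_pages > max_pages)
        interval_ms *= max_pages / needed_pages;
    printf("adjusted interval: %.0f ms\n", interval_ms);

    return 0;
}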

Anyway, this is a tangent discussion.  The LRU patch that's in the queue 
doesn't really care if it runs with a short interval or a long one, 
because it automatically scales how much work it does according to how 
much time passed.  I think that may only be a bit of tweaking away from a 
solid solution.  Tuning the all scan, which is what you're talking about 
when you speak in terms of the statistics about the overall buffer pool, 
is a much harder job.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: Background LRU Writer/free list

From
Bruce Momjian
Date:
This has been saved for the 8.4 release:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Greg Smith wrote:
> I'm mostly done with my review of the "Automatic adjustment of 
> bgwriter_lru_maxpages" patch.  In addition to issues already brought up 
> with that code, there are some small things that need to be done to merge 
> it with the recent pg_stat_bgwriter patch, and I have some concerns about 
> its unbounded scanning of the buffer pool; I'll write that up in more 
> detail or just submit an improved patch as I get time this week.
> 
> But there's a fundamental question that has been bugging me, and I think 
> it impacts the direction that code should take.  Unless I'm missing 
> something in my reading, buffers written out by the LRU writer aren't ever 
> put onto the free list.  I assume this is to stop from prematurely 
> removing buffers that contain useful data.  In cases where a substantial 
> percentage of the buffer cache is dirty, the LRU writer has to scan a 
> significant portion of the pool looking for one of the rare clean buffers, 
> then write it out.  When a client goes to grab a free buffer afterward, it 
> has to scan the same section of the pool to find the now clean buffer, 
> which seems redundant.
> 
> With the new patch, the LRU writer is fairly well bounded in that it 
> doesn't write out more than it thinks it will need; you shouldn't get into 
> a situation where many more pages are written than will be used in the 
> near future.  Given that mindset, shouldn't pages the LRU scan writes just 
> get moved onto the free list?
> 
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: Background LRU Writer/free list

From
Bruce Momjian
Date:
Added to TODO:

* Consider adding buffers the BGW finds reusable to the free list
 http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php

* Automatically tune bgwriter_delay based on activity rather than using a fixed interval
 http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php



---------------------------------------------------------------------------

Greg Smith wrote:
> I'm mostly done with my review of the "Automatic adjustment of 
> bgwriter_lru_maxpages" patch.  In addition to issues already brought up 
> with that code, there are some small things that need to be done to merge 
> it with the recent pg_stat_bgwriter patch, and I have some concerns about 
> its unbounded scanning of the buffer pool; I'll write that up in more 
> detail or just submit an improved patch as I get time this week.
> 
> But there's a fundamental question that has been bugging me, and I think 
> it impacts the direction that code should take.  Unless I'm missing 
> something in my reading, buffers written out by the LRU writer aren't ever 
> put onto the free list.  I assume this is to stop from prematurely 
> removing buffers that contain useful data.  In cases where a substantial 
> percentage of the buffer cache is dirty, the LRU writer has to scan a 
> significant portion of the pool looking for one of the rare clean buffers, 
> then write it out.  When a client goes to grab a free buffer afterward, it 
> has to scan the same section of the pool to find the now clean buffer, 
> which seems redundant.
> 
> With the new patch, the LRU writer is fairly well bounded in that it 
> doesn't write out more than it thinks it will need; you shouldn't get into 
> a situation where many more pages are written than will be used in the 
> near future.  Given that mindset, shouldn't pages the LRU scan writes just 
> get moved onto the free list?
> 
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +