Thread: RC2 and open issues

RC2 and open issues

From
Bruce Momjian
Date:
We are now packaging RC2.  If nothing comes up after RC2 is released, we
can move to final release.

The open items list is attached.  The doc changes can be easily
completed before final.  The only code issue left is with bgwriter.  We
always knew we needed to find better defaults for its parameters, but we
are only now finding more fundamental issues.

I think the summary I have seen recently pegs it right --- our use of %
of dirty buffers requires a scan of the entire buffer cache, and the
current delay of bgwriter is too high, but we can't lower it because the
buffer cache scan will become too expensive if done too frequently.

I think the ideal solution would be to remove bgwriter_percent or change
it to be a percentage of all buffers, not just dirty buffers, so we
don't have to scan the entire list.  If we set the new value to 10% with
a delay of 1 second, and the bgwriter remembers the place it stopped
scanning the buffer cache, you will clean out the buffer cache
completely every 10 seconds.
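
As a rough check of the arithmetic above, the following sketch models a scan that covers a fixed share of the buffer array per round and resumes where it stopped. The class and method names are hypothetical and are not taken from the PostgreSQL source. It only shows that 10% per round covers every buffer once in ten rounds, which is ten seconds at a 1-second delay.

```python
# Illustrative model only; names are hypothetical, not PostgreSQL code.
class ResumableScan:
    """Visits a fixed share of the buffer array per round, resuming
    from where the previous round stopped."""

    def __init__(self, n_buffers: int, percent: int) -> None:
        self.n_buffers = n_buffers
        # Number of buffers examined per round (at least one).
        self.per_round = max(1, n_buffers * percent // 100)
        self.position = 0  # index where the next round starts

    def next_round(self) -> list:
        """Return the buffer indexes examined in this round."""
        visited = [(self.position + i) % self.n_buffers
                   for i in range(self.per_round)]
        self.position = (self.position + self.per_round) % self.n_buffers
        return visited


scan = ResumableScan(n_buffers=1000, percent=10)
seen = set()
rounds = 0
while len(seen) < scan.n_buffers:
    seen.update(scan.next_round())
    rounds += 1
# With one round per second of delay, `rounds` is the number of
# seconds needed for one full pass over the cache.
print(rounds)
```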

Right now it seems no one can find proper values.  We were clear that
this was an issue but it is bad news that we are only addressing it
during RC.

The 8.1 solution is to have some feedback system so writes by individual
backends cause the bgwriter to work more frequently.

The big question is what to do during RC2.  Do we just leave it as
suboptimal, knowing we will revisit it in 8.1, or try an incremental
solution for 8.0 that might work better?

We have to decide now.

---------------------------------------------------------------------------
                       PostgreSQL 8.0 Open Items
                       =========================

Current version at http://candle.pha.pa.us/cgi-bin/pgopenitems.

Changes
-------
* change bgwriter buffer scan behavior?
* adjust bgwriter defaults

Documentation
-------------
* synchronize supported encodings and docs
* improve external interfaces documentation section
* manual pages

Fixed Since Last Beta
---------------------

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: RC2 and open issues

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I think the ideal solution would be to remove bgwriter_percent or change
> it to be a percentage of all buffers, not just dirty buffers, so we
> don't have to scan the entire list.  If we set the new value to 10% with
> a delay of 1 second, and the bgwriter remembers the place it stopped
> scanning the buffer cache, you will clean out the buffer cache
> completely every 10 seconds.

But we don't *want* it to clean out the buffer cache completely.
There's no point in writing a "hot" page every few seconds.  So I don't
think I believe in remembering where we stopped anyway.

I think there's a reasonable case to be made for redefining
bgwriter_percent as the max percent of the total buffer list to scan
(not the max percent of the list to return --- Jan correctly pointed out
that the latter is useless).  Then we could modify
StrategyDirtyBufferList so that the percent and maxpages parameters are
passed in, so it can stop as soon as either one is satisfied.  This
would be a fairly small/safe code change and I wouldn't have a problem
doing it even at this late stage of the cycle.
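
A minimal sketch of the redefinition described above, assuming a plain Python list stands in for the buffer list ordered from the LRU end. The function name and its arguments are hypothetical; this is not the actual StrategyDirtyBufferList code. It only shows the two limits acting together: the scan stops when it has examined `percent` of the whole list or collected `maxpages` dirty buffers, whichever comes first.

```python
# Illustrative model only; this is not the StrategyDirtyBufferList code.
def dirty_buffers_to_write(lru_order, is_dirty, percent, maxpages):
    """Walk the list from the LRU end and collect dirty buffers.

    Stops as soon as either limit is hit:
      - percent:  max share of the whole list that may be scanned
      - maxpages: max number of dirty buffers to return
    """
    scan_limit = len(lru_order) * percent // 100
    found = []
    for scanned, buf in enumerate(lru_order):
        if scanned >= scan_limit or len(found) >= maxpages:
            break
        if is_dirty[buf]:
            found.append(buf)
    return found


lru_order = list(range(10))                     # buffer ids, LRU end first
is_dirty = {b: b % 2 == 0 for b in lru_order}   # even ids are dirty
print(dirty_buffers_to_write(lru_order, is_dirty, percent=50, maxpages=2))
```

With a small `percent` the scan never looks past the first few entries, which is why the default would have to be raised under this definition.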

However ... we would have to crank up the default bgwriter_percent,
and I don't know if we have any better idea what to set it to after
such a change than we do now ...
        regards, tom lane


Re: RC2 and open issues

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I think the ideal solution would be to remove bgwriter_percent or change
> > it to be a percentage of all buffers, not just dirty buffers, so we
> > don't have to scan the entire list.  If we set the new value to 10% with
> > a delay of 1 second, and the bgwriter remembers the place it stopped
> > scanning the buffer cache, you will clean out the buffer cache
> > completely every 10 seconds.
> 
> But we don't *want* it to clean out the buffer cache completely.

You are only cleaning out in pieces over a 10 second period so it is
getting dirty.  You are not scanning the entire buffer at one time.

> There's no point in writing a "hot" page every few seconds.  So I don't
> think I believe in remembering where we stopped anyway.

I was thinking if you are doing this scanning every X milliseconds then
after a while the front of the buffer cache will be mostly clean and the
end will be dirty so you will always be going over the same early ones
to get to the later dirty ones.  Remembering the location gives the scan
more uniform coverage of the buffer cache.

You need a "clock sweep" like BSD uses (and probably others).

> I think there's a reasonable case to be made for redefining
> bgwriter_percent as the max percent of the total buffer list to scan
> (not the max percent of the list to return --- Jan correctly pointed out
> that the latter is useless).  Then we could modify
> StrategyDirtyBufferList so that the percent and maxpages parameters are
> passed in, so it can stop as soon as either one is satisfied.  This
> would be a fairly small/safe code change and I wouldn't have a problem
> doing it even at this late stage of the cycle.
> 
> However ... we would have to crank up the default bgwriter_percent,
> and I don't know if we have any better idea what to set it to after
> such a change than we do now ...

Once we make the change we will have to get our testers working on it.
We need those figures to change over time based on backends doing writes,
but that isn't going to happen for 8.0.

 


Re: RC2 and open issues

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> You need a "clock sweep" like BSD uses (and probably others).

No, that's *fundamentally* wrong.

The reason we are going to the trouble of maintaining a complicated
cache algorithm like ARC is so that we can tell the heavily used pages
from the lesser used ones.  To throw away that knowledge in favor of
doing I/O with a plain clock sweep algorithm is just wrong.

What's more, I don't even understand what clock sweep would mean given
that the ordering of the list is constantly changing.
        regards, tom lane


Re: RC2 and open issues

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I am confused.  If we change the percentage to be X% of the entire
> buffer cache, and we set it to 1%, and we exit when either the dirty
> pages or % are reached, don't we end up just scanning the first 1% of
> the cache over and over again?

Exactly.  But 1% would be uselessly small with this definition.  Offhand
I'd think something like 50% might be a starting point; maybe even more.
What that says is that a page isn't a candidate to be written out by the
bgwriter until it's fallen halfway down the LRU list.
        regards, tom lane


Re: RC2 and open issues

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I am confused.  If we change the percentage to be X% of the entire
> > buffer cache, and we set it to 1%, and we exit when either the dirty
> > pages or % are reached, don't we end up just scanning the first 1% of
> > the cache over and over again?
> 
> Exactly.  But 1% would be uselessly small with this definition.  Offhand
> I'd think something like 50% might be a starting point; maybe even more.
> What that says is that a page isn't a candidate to be written out by the
> bgwriter until it's fallen halfway down the LRU list.

So we are not scanning by buffer address but using the LRU list?  Are we
sure they are mostly dirty?

 


Re: RC2 and open issues

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Exactly.  But 1% would be uselessly small with this definition.  Offhand
>> I'd think something like 50% might be a starting point; maybe even more.
>> What that says is that a page isn't a candidate to be written out by the
>> bgwriter until it's fallen halfway down the LRU list.

> So we are not scanning by buffer address but using the LRU list?  Are we
> sure they are mostly dirty?

No.  The entire point is to keep the LRU end of the list mostly clean.

Now that you mention it, it might be interesting to try the approach of
doing a clock scan on the buffer array and ignoring the ARC lists
entirely.  That would be a fundamentally different way of envisioning
what the bgwriter is supposed to do, though.  I think the main reason
Jan didn't try that was he wanted to be sure the LRU page was usually
clean so that backends would seldom end up doing writes for themselves
when they needed to get a free buffer.

Maybe we need a hybrid approach: clean a few percent of the LRU end of
the ARC list in order to keep backends from blocking on writes, plus run
a clock scan to keep checkpoints from having to do much.  But that's way
beyond what we have time for in the 8.0 cycle.
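
The hybrid idea above can be sketched roughly as follows. Everything here is an assumption made for illustration (the class name, the parameters `lru_percent` and `clock_step`, and the use of a Python set for dirty buffers); no such code exists in the thread. One round does two things: it cleans dirty buffers within a small span at the LRU end of the list, and it advances a clock hand a few slots through the buffer array.

```python
# Illustrative model only; names and structure are hypothetical.
class HybridWriter:
    def __init__(self, n_buffers, lru_percent, clock_step):
        self.n_buffers = n_buffers
        # How many entries at the LRU end to inspect per round.
        self.lru_span = max(1, n_buffers * lru_percent // 100)
        # How many array slots the clock hand advances per round.
        self.clock_step = clock_step
        self.hand = 0  # clock position in the buffer array

    def run_round(self, lru_order, dirty):
        """Return the buffer ids to write in this round.

        lru_order: buffer ids, least recently used first
        dirty:     set of dirty buffer ids
        """
        # Track 1: keep the LRU end clean so that backends needing a
        # free buffer are rarely handed a dirty one.
        to_write = [b for b in lru_order[:self.lru_span] if b in dirty]
        # Track 2: slow clock scan over the array, ignoring list order,
        # to reduce the work left for the next checkpoint.
        for i in range(self.clock_step):
            buf = (self.hand + i) % self.n_buffers
            if buf in dirty and buf not in to_write:
                to_write.append(buf)
        self.hand = (self.hand + self.clock_step) % self.n_buffers
        return to_write


writer = HybridWriter(n_buffers=10, lru_percent=20, clock_step=3)
lru_order = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
first = writer.run_round(lru_order, dirty={9, 1, 7})
print(first)
```

In this model buffer 7 is dirty but sits outside the LRU span, so it is only written once the clock hand reaches it in a later round.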
        regards, tom lane


Re: RC2 and open issues

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> Exactly.  But 1% would be uselessly small with this definition.  Offhand
> >> I'd think something like 50% might be a starting point; maybe even more.
> >> What that says is that a page isn't a candidate to be written out by the
> >> bgwriter until it's fallen halfway down the LRU list.
> 
> > So we are not scanning by buffer address but using the LRU list?  Are we
> > sure they are mostly dirty?
> 
> No.  The entire point is to keep the LRU end of the list mostly clean.
> 
> Now that you mention it, it might be interesting to try the approach of
> doing a clock scan on the buffer array and ignoring the ARC lists
> entirely.  That would be a fundamentally different way of envisioning
> what the bgwriter is supposed to do, though.  I think the main reason
> Jan didn't try that was he wanted to be sure the LRU page was usually
> clean so that backends would seldom end up doing writes for themselves
> when they needed to get a free buffer.
> 
> Maybe we need a hybrid approach: clean a few percent of the LRU end of
> the ARC list in order to keep backends from blocking on writes, plus run
> a clock scan to keep checkpoints from having to do much.  But that's way
> beyond what we have time for in the 8.0 cycle.

OK, so we scan from the end of the LRU.  If we scan X% and find _no_
dirty buffers perhaps we should start where we left off last time.

If we don't start where we left off, I am thinking if you do a lot of
writes then do nothing, the next checkpoint would be huge because a lot
of the LRU will be dirty because the bgwriter never got to it.

 


Re: RC2 and open issues

From
Gavin Sherry
Date:
On Mon, 20 Dec 2004, Tom Lane wrote:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> Exactly.  But 1% would be uselessly small with this definition.  Offhand
> >> I'd think something like 50% might be a starting point; maybe even more.
> >> What that says is that a page isn't a candidate to be written out by the
> >> bgwriter until it's fallen halfway down the LRU list.
>
> > So we are not scanning by buffer address but using the LRU list?  Are we
> > sure they are mostly dirty?
>
> No.  The entire point is to keep the LRU end of the list mostly clean.
>
> Now that you mention it, it might be interesting to try the approach of
> doing a clock scan on the buffer array and ignoring the ARC lists
> entirely.  That would be a fundamentally different way of envisioning
> what the bgwriter is supposed to do, though.  I think the main reason
> Jan didn't try that was he wanted to be sure the LRU page was usually
> clean so that backends would seldom end up doing writes for themselves
> when they needed to get a free buffer.

Neil and I spoke with Jan briefly last week and he mentioned a few
different approaches he'd been tossing over. Firstly, for alternative
runs, start X% on from the LRU, so that we aren't scanning clean buffers
all the time. Secondly, follow something like the approach you've
mentioned above but remember the offset. So, if we're scanning 10%, after
10 runs we will have written out all buffers.

I was also thinking of benchmarking the effect of changing the algorithm
in StrategyDirtyBufferList(): currently, for each iteration of the loop we
read a buffer from each of T1 and T2. I was wondering what effect reading
T1 first then T2 and vice versa would have on performance. I haven't
thought about this too hard, though, so it might be wrong headed.
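
To make the two orderings concrete, here is a small sketch that treats T1 and T2 as plain Python lists ordered from their LRU ends. The helper names are hypothetical and this is not the StrategyDirtyBufferList code; it only shows the order in which buffers would be examined under each scheme.

```python
from itertools import chain, zip_longest

_MISSING = object()  # sentinel for the shorter list running out


def interleaved(t1, t2):
    """One buffer from T1, then one from T2, per loop iteration."""
    for a, b in zip_longest(t1, t2, fillvalue=_MISSING):
        if a is not _MISSING:
            yield a
        if b is not _MISSING:
            yield b


def sequential(first, second):
    """All of one list, then all of the other."""
    return chain(first, second)


t1 = [1, 2, 3]
t2 = [10, 20]
print(list(interleaved(t1, t2)))
print(list(sequential(t1, t2)))   # T1 first, then T2
print(list(sequential(t2, t1)))   # T2 first, then T1
```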


>
> Maybe we need a hybrid approach: clean a few percent of the LRU end of
> the ARC list in order to keep backends from blocking on writes, plus run
> a clock scan to keep checkpoints from having to do much.  But that's way
> beyond what we have time for in the 8.0 cycle.

Definitely.

>
>             regards, tom lane


Thanks,

Gavin


Re: RC2 and open issues

From
Bruce Momjian
Date:
Gavin Sherry wrote:
> Neil and I spoke with Jan briefly last week and he mentioned a few
> different approaches he'd been tossing over. Firstly, for alternative
> runs, start X% on from the LRU, so that we aren't scanning clean buffers
> all the time. Secondly, follow something like the approach you've
> mentioned above but remember the offset. So, if we're scanning 10%, after
> 10 runs we will have written out all buffers.
> 
> I was also thinking of benchmarking the effect of changing the algorithm
> in StrategyDirtyBufferList(): currently, for each iteration of the loop we
> read a buffer from each of T1 and T2. I was wondering what effect reading
> T1 first then T2 and vice versa would have on performance. I haven't
> thought about this too hard, though, so it might be wrong headed.

So we are all thinking in the same direction.  We might have only a few
days to finalize this before final release.

 


Re: RC2 and open issues

From
Tom Lane
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:
> I was also thinking of benchmarking the effect of changing the algorithm
> in StrategyDirtyBufferList(): currently, for each iteration of the loop we
> read a buffer from each of T1 and T2. I was wondering what effect reading
> T1 first then T2 and vice versa would have on performance.

Looking at StrategyGetBuffer, it definitely seems like a good idea to
try to keep the bottom end of both T1 and T2 lists clean.  But we should
work at T1 a bit harder.

The insight I take away from today's discussion is that there are two
separate goals here: try to keep backends that acquire a buffer via
StrategyGetBuffer from being fed a dirty buffer they have to write,
and try to keep the next upcoming checkpoint from having too much work
to do.  Those are both laudable goals but I hadn't really seen before
that they may require different strategies to achieve.  I'm liking the
idea that bgwriter should alternate between doing writes in pursuit of
the one goal and doing writes in pursuit of the other.
        regards, tom lane


Re: RC2 and open issues

From
"Zeugswetter Andreas DAZ SD"
Date:
> If we don't start where we left off, I am thinking if you do a lot of
> writes then do nothing, the next checkpoint would be huge because a lot
> of the LRU will be dirty because the bgwriter never got to it.

I think the problem is that we can't see whether a "read hot"
page is also "write hot". We would want to write dirty "read hot" pages,
but not "write hot" pages. It does not make sense to write a "write hot"
page, since it will be dirty again when the checkpoint comes.

Andreas


Bgwriter behavior

From
Bruce Momjian
Date:
Tom Lane wrote:
> Gavin Sherry <swm@linuxworld.com.au> writes:
> > I was also thinking of benchmarking the effect of changing the algorithm
> > in StrategyDirtyBufferList(): currently, for each iteration of the loop we
> > read a buffer from each of T1 and T2. I was wondering what effect reading
> > T1 first then T2 and vice versa would have on performance.
> 
> Looking at StrategyGetBuffer, it definitely seems like a good idea to
> try to keep the bottom end of both T1 and T2 lists clean.  But we should
> work at T1 a bit harder.
> 
> The insight I take away from today's discussion is that there are two
> separate goals here: try to keep backends that acquire a buffer via
> StrategyGetBuffer from being fed a dirty buffer they have to write,
> and try to keep the next upcoming checkpoint from having too much work
> to do.  Those are both laudable goals but I hadn't really seen before
> that they may require different strategies to achieve.  I'm liking the
> idea that bgwriter should alternate between doing writes in pursuit of
> the one goal and doing writes in pursuit of the other.

It seems we have added a new limitation to bgwriter by not doing a full
scan.  With a full scan we could easily grab the first X pages starting
from the end of the LRU list and write them.  By not scanning the full
list we are opening the possibility of not seeing some of the front-most
LRU dirty pages.  And the full scan was removed so we can run bgwriter
more frequently, but we might end up with other problems.

I have a new proposal.  The idea is to cause bgwriter to increase its
frequency based on how quickly it finds dirty pages.

First, we remove the GUC bgwriter_maxpages because I don't see a good
way to set a default for that.  A default value needs to be based on a
percentage of the full buffer cache size.  Second, we make
bgwriter_percent cause the bgwriter to stop its scan once it has found a
number of dirty buffers that matches X% of the buffer cache size.  So,
if it is set to 5%, the bgwriter scan stops once it finds enough dirty
buffers to equal 5% of the buffer cache size. 

Bgwriter continues to scan starting from the end of the LRU list, just
like it does now.

Now, to control the bgwriter frequency we multiply the percent of the
list it had to span by the bgwriter_delay value to determine when to run
bgwriter next.  For example, if you find enough dirty pages by looking
at only 10% of the buffer cache, you multiply 10% (0.10) * bgwriter_delay
and that is when you run next.  If you have to scan 50%, bgwriter runs
next at 50% (0.50) * bgwriter_delay, and if it has to scan the entire
list it is 100% (1.00) * bgwriter_delay.
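
The rule in the paragraph above reduces to one line of arithmetic. The sketch below assumes a hypothetical helper name and an example base delay of 200 ms; it is only meant to show how the next sleep scales with the share of the list that had to be scanned.

```python
def next_bgwriter_delay_ms(buffers_scanned, total_buffers, bgwriter_delay_ms):
    """Delay before the next run: share of the list scanned times the
    configured base delay (hypothetical helper, not PostgreSQL code)."""
    return buffers_scanned * bgwriter_delay_ms / total_buffers


base_delay_ms = 200  # example value, chosen for illustration
for scanned in (100, 500, 1000):
    delay = next_bgwriter_delay_ms(scanned, 1000, base_delay_ms)
    print(f"scanned {scanned} of 1000 buffers -> sleep {delay} ms")
```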

What this does is to cause bgwriter to run more frequently when there
are a lot of dirty buffers on the end of the LRU _and_ when the bgwriter
scan will be quick.  When there are few writes, bgwriter will run less
frequently but will write dirty buffers nearer to the head of the LRU.

 


Re: Bgwriter behavior

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> First, we remove the GUC bgwriter_maxpages because I don't see a good
> way to set a default for that.  A default value needs to be based on a
> percentage of the full buffer cache size.

This is nonsense.  The admin knows what he set shared_buffers to, and so
maxpages and percent of shared buffers are not really distinct ways of
specifying things.  The cases that make a percent spec useful are if
(a) it is a percent of a non-constant number (eg, percent of total dirty
pages as in the current code), or (b) it is defined in a way that lets
it limit the amount of scanning work done (which it isn't useful for in
the current code).  But a maxpages spec is useful for (b) too.  More to
the point, maxpages is useful to set a hard limit on the amount of I/O
generated by the bgwriter, and I think people will want to be able to do
that.

> Now, to control the bgwriter frequency we multiply the percent of the
> list it had to span by the bgwriter_delay value to determine when to run
> bgwriter next.

I'm less than enthused about this.  The idea of the bgwriter is to
trickle out writes in a way that doesn't affect overall performance too
much.  Not to write everything in sight at any cost.

I like the hybrid "keep the bottom of the ARC list clean, plus do a slow
clock scan on the main buffer array" approach better.  I can see that
that directly impacts both of the goals that the bgwriter has.  I don't
see how a variable I/O rate really improves life on either score; it
just makes things harder to predict.
        regards, tom lane


Re: Bgwriter behavior

From
"Jim C. Nasby"
Date:
A quick $0.02 on how DB2 does this (at least in 7.x).

They used a combination of everything that's been discussed. The first
priority of their background writer was to keep the LRU end of the cache
free so individual backends would never have to wait to get a page.
Then, they would look to pages that had been dirty for 'a long time',
which was user configurable. Pages older than this setting were
candidates to be written out even if they weren't close to LRU. Finally,
I believe there were also settings for how often the writer would fire
up, and how much work it would do at once.

I agree that the first priority should be to keep clean pages near LRU,
but that you also don't want to get hammered at checkpoint time. I think
what might be interesting to consider is keeping a list of dirty pages,
which would remove the need to scan a very large buffer. Of course, in
an environment with a heavy update load, it could be better to just
scan the buffers, especially if you don't do a clock-sweep but instead
look at where the last page you wrote out has ended up in the LRU list
since you last ran, and start scanning from there (by definition
everything after that page would have to be clean). Of course this is
just conjecture on my part and would need testing to verify, and it's
obviously beyond the scope of 8.0.

As for 8.0, I suspect at this point it's probably best to just go with
whatever method has the smallest amount of code impact unless it's
inherently broken.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: Bgwriter behavior

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > First, we remove the GUC bgwriter_maxpages because I don't see a good
> > way to set a default for that.  A default value needs to be based on a
> > percentage of the full buffer cache size.
> 
> This is nonsense.  The admin knows what he set shared_buffers to, and so
> maxpages and percent of shared buffers are not really distinct ways of
> specifying things.  The cases that make a percent spec useful are if
> (a) it is a percent of a non-constant number (eg, percent of total dirty
> pages as in the current code), or (b) it is defined in a way that lets
> it limit the amount of scanning work done (which it isn't useful for in
> the current code).  But a maxpages spec is useful for (b) too.  More to
> the point, maxpages is useful to set a hard limit on the amount of I/O
> generated by the bgwriter, and I think people will want to be able to do
> that.

I figured that if we specify a percentage users would not need to update
this value regularly if they increase their shared buffers.  I agree that
if you want to limit total I/O by the bgwriter an actual page count is
better, but I assumed we were looking for bgwriter to do a certain
percentage of total writes.  If the system is doing a lot of writes then
limiting the bgwriter doesn't help because then the backends are going
to have to do the writes themselves.

> > Now, to control the bgwriter frequency we multiply the percent of the
> > list it had to span by the bgwriter_delay value to determine when to run
> > bgwriter next.
> 
> I'm less than enthused about this.  The idea of the bgwriter is to
> trickle out writes in a way that doesn't affect overall performance too
> much.  Not to write everything in sight at any cost.

No question my idea makes tuning difficult.  I was hoping it would be
self-tuning but I am not sure.

> I like the hybrid "keep the bottom of the ARC list clean, plus do a slow
> clock scan on the main buffer array" approach better.  I can see that
> that directly impacts both of the goals that the bgwriter has.  I don't
> see how a variable I/O rate really improves life on either score; it
> just makes things harder to predict.

So what are we doing for 8.0?

 


Re: Bgwriter behavior

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> So what are we doing for 8.0?

Well, it looks like RC2 has already crashed and burned --- I can't
imagine that Marc will let us release without an RC3 given what was
committed today, never mind the btree bug that Mark Wong seems to have
found.  So maybe we should just bite the bullet and do something real
about this.

I'm willing to code up a proposed patch for the two-track idea I
suggested, and if anyone else has a favorite maybe they could write
something too.  But do we have the resources to test such patches and
make a decision in the next few days?

At the moment my inclination is to sit on what we have.  I've not seen
any indication that 8.0 is really worse than earlier releases; the most
you could argue against it is that it's not as much better as we hoped.
That's not grounds to muck around at the RC3 stage.
        regards, tom lane


Re: Bgwriter behavior

From
"Joshua D. Drake"
Date:
>At the moment my inclination is to sit on what we have.  I've not seen
>any indication that 8.0 is really worse than earlier releases; the most
>you could argue against it is that it's not as much better as we hoped.
>That's not grounds to muck around at the RC3 stage.
>
>
If it is any help, CMD is basically dead right now and I expect
it will be that way until the new year. 4 of my 5 C programmers
are on vacation, but I do have one and a couple of non-C programmers.

We can't fix, but we can definitely help test.

Sincerely,

Joshua D. Drake


>            regards, tom lane


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


Re: Bgwriter behavior

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > So what are we doing for 8.0?
> 
> Well, it looks like RC2 has already crashed and burned --- I can't
> imagine that Marc will let us release without an RC3 given what was
> committed today, never mind the btree bug that Mark Wong seems to have
> found.  So maybe we should just bite the bullet and do something real
> about this.

Oh, is it that bad?

> I'm willing to code up a proposed patch for the two-track idea I
> suggested, and if anyone else has a favorite maybe they could write
> something too.  But do we have the resources to test such patches and
> make a decision in the next few days?
> 
> At the moment my inclination is to sit on what we have.  I've not seen
> any indication that 8.0 is really worse than earlier releases; the most
> you could argue against it is that it's not as much better as we hoped.
> That's not grounds to muck around at the RC3 stage.

That was my question.  It seems bgwriter is fine for low to medium
traffic but doesn't handle high traffic, and increasing the scan rate
makes things worse.

I am fine with doing nothing, but if we are going to do something, I
would like to do it now rather than later.

The only way I could see it being worse than pre-8.0 is that the
bgwriter is doing fsync of all open files rather than using sync. Other
than that, I think it should behave the same, or slightly better, 
right?

 


Re: Bgwriter behavior

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> The only way I could see it being worse than pre-8.0 is that the
> bgwriter is doing fsync of all open files rather than using sync. Other
> than that, I think it should behave the same, or slightly better, 
> right?

It's possible that there exist platforms on which this is a loss ---
that is, the OS's handling of fsync is so inefficient that multiple
fsync calls are worse than one sync call even though less I/O is forced.
But I haven't seen any actual evidence of that; and if such platforms
do exist I'm not sure I'd blink anyway.  We are not required to optimize
for brain-dead kernels.
        regards, tom lane


Re: RC2 and open issues

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Maybe we need a hybrid approach: clean a few percent of the LRU end of
> the ARC list in order to keep backends from blocking on writes, plus run
> a clock scan to keep checkpoints from having to do much.  

Well if you just keep note of when the last clock scan started then when you
get to the end of the list you've _done_ a checkpoint.

Put another way, we already have such a clock scan, it's called checkpoint.
You could have checkpoint delay between each page write long enough to spread
the checkpoint i/o out over a configurable amount of time -- say half the
checkpoint interval -- and be done with that side of things.
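
A back-of-the-envelope sketch of the pacing suggested above: given the number of dirty pages and the checkpoint interval, compute the sleep between page writes so the writes spread over a chosen share of the interval. The function name and the `spread_fraction` parameter are hypothetical, and the model ignores the time the writes themselves take.

```python
def per_page_delay_s(dirty_pages, checkpoint_interval_s, spread_fraction=0.5):
    """Sleep between page writes so that all writes fit within
    spread_fraction of the checkpoint interval (illustrative only)."""
    if dirty_pages <= 0:
        return 0.0
    return checkpoint_interval_s * spread_fraction / dirty_pages


# Example: 1500 dirty pages, 300-second checkpoint interval, spread the
# writes over half of it.
delay = per_page_delay_s(dirty_pages=1500, checkpoint_interval_s=300)
print(f"sleep {delay:.3f} s between page writes")
```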

-- 
greg



Re: Bgwriter behavior

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > So what are we doing for 8.0?
> 
> Well, it looks like RC2 has already crashed and burned --- I can't
> imagine that Marc will let us release without an RC3 given what was
> committed today, never mind the btree bug that Mark Wong seems to have
> found.  So maybe we should just bite the bullet and do something real
> about this.
> 
> I'm willing to code up a proposed patch for the two-track idea I
> suggested, and if anyone else has a favorite maybe they could write
> something too.  But do we have the resources to test such patches and
> make a decision in the next few days?
> 
> At the moment my inclination is to sit on what we have.  I've not seen
> any indication that 8.0 is really worse than earlier releases; the most
> you could argue against it is that it's not as much better as we hoped.
> That's not grounds to muck around at the RC3 stage.

I remember the other difference between 8.0 and pre-8.0.  When a backend
has to write a block in 8.0, it does a write _plus_ fsync(), while in
pre-8.0 it did only a write.  There was a proposal to pass backend write
information to the background writer so it would know to fsync at
checkpoint, but it was decided that backend writing would be rare.  I
think we have to rethink that assumption.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Bgwriter behavior

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I remember the other difference between 8.0 and pre-8.0.  When a backend
> has to write a block in 8.0, it does a write _plus_ fsync(), while in
> pre-8.0 it did only a write.  There was a proposal to pass backend write
> information to the background writer so it would know to fsync at
> checkpoint, but it was decided that backend writing would be rare.  I
> think we have to rethink that assumption.

No, just read the code.  The above assertions are all wet.
        regards, tom lane


Re: Bgwriter behavior

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I remember the other difference between 8.0 and pre-8.0.  When a backend
> > has to write a block in 8.0, it does a write _plus_ fsync(), while in
> > pre-8.0 it did only a write.  There was a proposal to pass backend write
> > information to the background writer so it would know to fsync at
> > checkpoint, but it was decided that backend writing would be rare.  I
> > think we have to rethink that assumption.
> 
> No, just read the code.  The above assertions are all wet.

Oh, I forgot you added that array to pass fsync info.

Shouldn't we send a log message when the array gets full in md.c:
    {
        if (ForwardFsyncRequest(reln->smgr_rnode, seg->mdfd_segno))
            return true;
    }

    if (FileSync(seg->mdfd_vfd) < 0)
        return false;

Seems that could fill up quickly.  I see no checking for existing
matching records in the array.
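
The concern above (a fixed-size request array that can fill up, with no check for entries already present) can be illustrated with a small stand-alone sketch. This is not the md.c or bgwriter code: FsyncQueue, forward_fsync_request and the capacity of 4 are hypothetical names and values chosen for illustration. It shows a duplicate check before insertion and an overflow counter at the point where a log message could be emitted:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not PostgreSQL's actual types. */
typedef struct {
    unsigned rel;    /* relation identifier */
    unsigned segno;  /* segment number within the relation */
} FsyncRequest;

#define QUEUE_CAPACITY 4   /* deliberately tiny so overflow is easy to see */

typedef struct {
    FsyncRequest items[QUEUE_CAPACITY];
    size_t       count;
    unsigned     overflows;  /* times a caller was told to fsync locally */
} FsyncQueue;

/*
 * Try to hand an fsync request to the background writer.
 * Returns true if the request is queued (or already was queued),
 * false if the queue is full and the caller must fsync for itself.
 */
static bool
forward_fsync_request(FsyncQueue *q, unsigned rel, unsigned segno)
{
    /* Duplicate check: one pending entry per segment is enough. */
    for (size_t i = 0; i < q->count; i++)
        if (q->items[i].rel == rel && q->items[i].segno == segno)
            return true;

    if (q->count == QUEUE_CAPACITY)
    {
        q->overflows++;   /* a server could emit a log message here */
        return false;
    }

    q->items[q->count].rel = rel;
    q->items[q->count].segno = segno;
    q->count++;
    return true;
}
```

With the duplicate check, repeated writes to the same segment occupy a single slot, so the queue fills only when more distinct segments are dirtied than it can hold before it is next drained.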

 


Re: Bgwriter behavior

From
Simon Riggs
Date:
On Wed, 2004-12-22 at 04:43, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > So what are we doing for 8.0?
> 
> Well, it looks like RC2 has already crashed and burned --- I can't
> imagine that Marc will let us release without an RC3 given what was
> committed today, never mind the btree bug that Mark Wong seems to have
> found.  So maybe we should just bite the bullet and do something real
> about this.
> 
> I'm willing to code up a proposed patch for the two-track idea I
> suggested, and if anyone else has a favorite maybe they could write
> something too.  But do we have the resources to test such patches and
> make a decision in the next few days?
> 
> At the moment my inclination is to sit on what we have.  I've not seen
> any indication that 8.0 is really worse than earlier releases; the most
> you could argue against it is that it's not as much better as we hoped.
> That's not grounds to muck around at the RC3 stage.

Agreed, if somewhat reluctantly.

We may have the time to test, but it is clear that we do not have the
time to validate those tests, then discuss and agree on the results.

Time to go with what we have.

[Mark's possible bug seems a higher priority for me.]

-- 
Best Regards, Simon Riggs



Re: RC2 and open issues

From
Kenneth Marshall
Date:
On Mon, Dec 20, 2004 at 11:20:46PM -0500, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> Exactly.  But 1% would be uselessly small with this definition.  Offhand
> >> I'd think something like 50% might be a starting point; maybe even more.
> >> What that says is that a page isn't a candidate to be written out by the
> >> bgwriter until it's fallen halfway down the LRU list.
> 
> > So we are not scanning by buffer address but using the LRU list?  Are we
> > sure they are mostly dirty?
> 
> No.  The entire point is to keep the LRU end of the list mostly clean.
> 
> Now that you mention it, it might be interesting to try the approach of
> doing a clock scan on the buffer array and ignoring the ARC lists
> entirely.  That would be a fundamentally different way of envisioning
> what the bgwriter is supposed to do, though.  I think the main reason
> Jan didn't try that was he wanted to be sure the LRU page was usually
> clean so that backends would seldom end up doing writes for themselves
> when they needed to get a free buffer.
> 
> Maybe we need a hybrid approach: clean a few percent of the LRU end of
> the ARC list in order to keep backends from blocking on writes, plus run
> a clock scan to keep checkpoints from having to do much.  But that's way
> beyond what we have time for in the 8.0 cycle.
> 
>             regards, tom lane
> 

I have not had a chance to investigate, but there is a modification of
the ARC cache strategy called CAR that replaces the LRU linked lists
with the clock approximation to the LRU lists. This algorithm is virtually
identical to the current ARC but reduces the contention at the MRU end
of the lists. This may dovetail nicely into your idea of a "clock" bgwriter
functionality as well as help with the cache-line performance problem.

Yours,
Ken Marshall
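
The "clock" bgwriter idea mentioned above can be sketched in a few lines. This is neither CAR nor PostgreSQL code; Buffer, ClockScan and clock_scan_round are hypothetical names, and "writing" a buffer is reduced to clearing a flag. The point is only that the scan remembers where it stopped, so each round touches a bounded number of buffers and successive rounds eventually cover the whole array:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool dirty;
} Buffer;

typedef struct {
    Buffer *buffers;
    size_t  nbuffers;
    size_t  hand;      /* index where the previous round stopped */
} ClockScan;

/*
 * Visit at most scan_limit buffers, starting at the saved hand position
 * and wrapping around the array.  Every dirty buffer visited is "written"
 * (here: just marked clean).  Returns the number of buffers cleaned.
 */
static size_t
clock_scan_round(ClockScan *cs, size_t scan_limit)
{
    size_t cleaned = 0;

    if (cs->nbuffers == 0)
        return 0;
    if (scan_limit > cs->nbuffers)
        scan_limit = cs->nbuffers;   /* never lap the array in one round */

    for (size_t i = 0; i < scan_limit; i++)
    {
        Buffer *b = &cs->buffers[cs->hand];

        if (b->dirty)
        {
            b->dirty = false;
            cleaned++;
        }
        cs->hand = (cs->hand + 1) % cs->nbuffers;
    }
    return cleaned;
}
```

A round that scans 10% of the array once per second would pass over every buffer in about ten seconds, without ever walking the whole buffer cache in a single call.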


Re: RC2 and open issues

From
Bruce Momjian
Date:
Greg Stark wrote:
> 
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > Maybe we need a hybrid approach: clean a few percent of the LRU end of
> > the ARC list in order to keep backends from blocking on writes, plus run
> > a clock scan to keep checkpoints from having to do much.  
> 
> Well if you just keep note of when the last clock scan started then when you
> get to the end of the list you've _done_ a checkpoint.
> 
> Put another way, we already have such a clock scan, it's called checkpoint.
> You could have checkpoint delay between each page write long enough to spread
> the checkpoint i/o out over a configurable amount of time -- say half the
> checkpoint interval -- and be done with that side of things.

But don't you have to keep the WAL files around longer then?

 


Re: RC2 and open issues

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Greg Stark wrote:
>> Put another way, we already have such a clock scan, it's called checkpoint.
>> You could have checkpoint delay between each page write long enough to spread
>> the checkpoint i/o out over a configurable amount of time -- say half the
>> checkpoint interval -- and be done with that side of things.

> But don't you have to keep the WAL files around longer then.

Yeah, but do you care?  It seems like what Greg is suggesting is a
"checkpoint slowdown" knob comparable to the "vacuum slowdown"
feature that Jan added for 8.0.  It strikes me as not necessarily
a bad idea.

Suppose that you run a checkpoint every 5 minutes, and with the knob
you slow down the checkpoint to extend over say 3 minutes on average,
rather than the normal blast-it-out-as-fast-as-possible.  Then you'll
be keeping an average of 8 minutes worth of WAL files instead of 5.
Not exactly a killer objection.

Shutdown checkpoints would still need to go as fast as possible,
so we might need two separate code paths; or maybe we could just
change the delay setting locally during a shutdown.

One issue is that while we can regulate the rate at which we issue
write()s, we still have to issue fsync()s at the end, and we can't
control what happens in response to those.  It's quite possible that
all the I/O would happen in response to the fsync()s anyway, in which
case the whole exercise would be a waste of time.
        regards, tom lane


Re: RC2 and open issues

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Suppose that you run a checkpoint every 5 minutes, and with the knob
> you slow down the checkpoint to extend over say 3 minutes on average,
> rather than the normal blast-it-out-as-fast-as-possible.  Then you'll
> be keeping an average of 8 minutes worth of WAL files instead of 5.
> Not exactly a killer objection.

Right. I was thinking that the goal would be to spread the checkpoint out over
exactly the checkpoint interval, minus some safety factor. So if it has some
estimate of the total number of dirty buffers that need flushing it could just
divide the checkpoint interval by that and calculate the delay needed to
finish in some fraction of the checkpoint interval, 60% seems like a
reasonable guess.
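
The arithmetic behind this pacing idea, together with the WAL retention estimate quoted above, can be sketched as follows. The function names are hypothetical, and the sketch assumes that the writes themselves take negligible time compared with the inserted delays and that the dirty-buffer estimate is accurate:

```c
#include <stddef.h>

/*
 * Delay to insert after each page write so that dirty_pages writes are
 * spread over `fraction` of the checkpoint interval.
 * Returns milliseconds; 0 means "write as fast as possible".
 */
static double
checkpoint_write_delay_ms(double interval_sec, size_t dirty_pages,
                          double fraction)
{
    if (dirty_pages == 0 || fraction <= 0.0)
        return 0.0;
    return interval_sec * fraction * 1000.0 / (double) dirty_pages;
}

/*
 * Rough amount of WAL (in seconds of activity) that must be kept when a
 * checkpoint is stretched over `fraction` of the interval: one full
 * interval plus the time the checkpoint itself takes.
 */
static double
wal_retention_sec(double interval_sec, double fraction)
{
    return interval_sec * (1.0 + fraction);
}
```

For a 5-minute interval, a 60% target and an estimated 9000 dirty buffers, this gives a delay of about 20 ms per page write and about 480 seconds (8 minutes) of retained WAL, which matches the 5-versus-8-minute example quoted above.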

> One issue is that while we can regulate the rate at which we issue
> write()s, we still have to issue fsync()s at the end, and we can't
> control what happens in response to those.  It's quite possible that
> all the I/O would happen in response to the fsync()s anyway, in which
> case the whole exercise would be a waste of time.

Well you could fsync earlier as well, say just before whenever you sleep.
Obviously the delay on the checkpoint process doesn't matter to performance if
it's about to sleep. It could end up scheduling i/o earlier than necessary and
cause redundant seeks but then I guess that's an inherent tension between
trying to spread out the i/o evenly and trying to get the ideal ordering of
i/o.

-- 
greg



Re: Bgwriter behavior

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Wed, 2004-12-22 at 04:43, Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > So what are we doing for 8.0?
> > 
> > Well, it looks like RC2 has already crashed and burned --- I can't
> > imagine that Marc will let us release without an RC3 given what was
> > committed today, never mind the btree bug that Mark Wong seems to have
> > found.  So maybe we should just bite the bullet and do something real
> > about this.
> > 
> > I'm willing to code up a proposed patch for the two-track idea I
> > suggested, and if anyone else has a favorite maybe they could write
> > something too.  But do we have the resources to test such patches and
> > make a decision in the next few days?
> > 
> > At the moment my inclination is to sit on what we have.  I've not seen
> > any indication that 8.0 is really worse than earlier releases; the most
> > you could argue against it is that it's not as much better as we hoped.
> > That's not grounds to muck around at the RC3 stage.
> 
> Agreed, if somewhat reluctantly.
> 
> We may have the time to test, but it is clear that we do not have the
> time to validate those tests, then discuss and agree on the results.
> 
> Time to go with what we have.

I ran some tests last week and can report results similar on Tom's test:

    pgbench -i -s 10 bench
    pgbench -c 10 -t 10000 bench

The tests were on a machine with a single SCSI drive that doesn't lie
about fsync.  I found 7.4.X got around 75tps while 8.0 got 100tps, very
similar to the 65/107 numbers Tom had.

First, I am confused why we have such a large improvement in 8.0.  Does
anyone know?  This is a pretty long test so a 33-50% increase is a big
jump.

Second, I added a little code to my local tree to check whether the
pendingOpsTable overflows and register_dirty_segment() has to make the
local backend do an fsync().  I found one pgbench test had 54 local
fsyncs, but the next test had none, and 54 isn't a very large number.

Should we emit a server log message when this happens so users can
reduce the bgwriter delay?

It seems having the backend do the writes is not so bad (same as 7.4.X)
and our only big problem with current bgwriter is the inability to
reduce checkpoint load for busy servers.

Should we consider at least adjusting the meaning of bgwriter_percent?

 


Re: Bgwriter behavior

From
John Hansen
Date:
> I ran some tests last week and can report results similar on Tom's test:
> 
>     pgbench -i -s 10 bench
>     pgbench -c 10 -t 10000 bench
> 
> The tests were on a machine with a single SCSI drive that doesn't lie
> about fsync.  I found 7.4.X got around 75tps while 8.0 got 100tps, very
> similar to the 65/107 numbers Tom had.

You do realize, that pgbench result comparisons are about as useful as a
fork for eating soup?

On another note, how do you know for sure, that your drive does not lie
about fsync?

Did you run the tests with fsync turned off vs fsync on?

> First, I am confused why we have such a large improvement in 8.0.  Does
> anyone know?  This is a pretty long test so a 33-50% increase is a big
> jump.

bgwriter is responsible I imagine,... I experienced the same improvement
in an early 7.5, just after the bgwriter was added.
(though my results were about 4-5 times higher in terms of tps rates, hehe)

... John



Re: RC2 and open issues

From
Bruce Momjian
Date:
Greg Stark wrote:
> 
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > Suppose that you run a checkpoint every 5 minutes, and with the knob
> > you slow down the checkpoint to extend over say 3 minutes on average,
> > rather than the normal blast-it-out-as-fast-as-possible.  Then you'll
> > be keeping an average of 8 minutes worth of WAL files instead of 5.
> > Not exactly a killer objection.
> 
> Right. I was thinking that the goal would be to spread the checkpoint out over
> exactly the checkpoint interval, minus some safety factor. So if it has some
> estimate of the total number of dirty buffers that need flushing it could just
> divide the checkpoint interval by that and calculate the delay needed to
> finish in some fraction of the checkpoint interval, 60% seems like a
> reasonable guess.
> 
> > One issue is that while we can regulate the rate at which we issue
> > write()s, we still have to issue fsync()s at the end, and we can't
> > control what happens in response to those.  It's quite possible that
> > all the I/O would happen in response to the fsync()s anyway, in which
> > case the whole exercise would be a waste of time.
> 
> Well you could fsync earlier as well, say just before whenever you sleep.
> Obviously the delay on the checkpoint process doesn't matter to performance if
> it's about to sleep. It could end up scheduling i/o earlier than necessary and
> cause redundant seeks but then I guess that's an inherent tension between
> trying to spread out the i/o evenly and trying to get the ideal ordering of
> i/o.

It certainly is an interesting idea to have the checkpoint span a longer
time period.  We couldn't do that with sync, but now that we fsync each
file it is possible.

It would be easy to do this if we didn't also need the fsync.  The original
idea was that we would write() the dirty buffers long before the
checkpoint, and the kernel would write many of these dirty buffers
before we got to checkpoint time.

We could go with the checkpoint clock sweep idea but then we aren't
writing them but actually doing write/fsync a lot more.  I can't think
of a way this would be a win.

 


Re: Bgwriter behavior

From
Bruce Momjian
Date:
Added to TODO:

* Improve the background writer
 Allow the background writer to more efficiently write dirty buffers
 from the end of the LRU cache and use a clock sweep algorithm to write
 other dirty buffers to reduce checkpoint I/O
 


---------------------------------------------------------------------------

Simon Riggs wrote:
> On Wed, 2004-12-22 at 04:43, Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > So what are we doing for 8.0?
> > 
> > Well, it looks like RC2 has already crashed and burned --- I can't
> > imagine that Marc will let us release without an RC3 given what was
> > committed today, never mind the btree bug that Mark Wong seems to have
> > found.  So maybe we should just bite the bullet and do something real
> > about this.
> > 
> > I'm willing to code up a proposed patch for the two-track idea I
> > suggested, and if anyone else has a favorite maybe they could write
> > something too.  But do we have the resources to test such patches and
> > make a decision in the next few days?
> > 
> > At the moment my inclination is to sit on what we have.  I've not seen
> > any indication that 8.0 is really worse than earlier releases; the most
> > you could argue against it is that it's not as much better as we hoped.
> > That's not grounds to muck around at the RC3 stage.
> 
> Agreed, if somewhat reluctantly.
> 
> We may have the time to test, but it is clear that we do not have the
> time to validate those tests, then discuss and agree on the results.
> 
> Time to go with what we have.
> 
> [Mark's possible bug seems a higher priority for me.]
> 
> -- 
> Best Regards, Simon Riggs
> 

 


Re: Bgwriter behavior

From
Bruce Momjian
Date:
John Hansen wrote:
> > I ran some tests last week and can report results similar on Tom's test:
> > 
> >     pgbench -i -s 10 bench
> >     pgbench -c 10 -t 10000 bench
> > 
> > The tests were on a machine with a single SCSI drive that doesn't lie
> > about fsync.  I found 7.4.X got around 75tps while 8.0 got 100tps, very
> > similar to the 65/107 numbers Tom had.
> 
> You do realize, that pgbench result comparisons are about as useful as a
> fork for eating soup?


> 
> On another note, how do you know for sure, that your drive does not lie
> about fsync?

> 
> Did you run the tests with fsync turned off vs fsync on?

I just tried and got 115tps with fsync off vs 100 with fsync on, so
fsync is certainly doing something.

 


Re: Bgwriter behavior

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> John Hansen wrote:
>> On another note, how do you know for sure, that your drive does not lie
>> about fsync?

> I just tried and got 115tps with fsync off vs 100 with fsync on, so
> fsync is certainly doing something.

[ raised eyebrow... ]  Something is wrong with that.  I'd expect a
*much* higher difference.  It's difficult to credit a tps rate higher
than your disk's RPM rating with fsync on, but most modern CPUs can do
a lot better than that with fsync off.  If you have a 7200 RPM drive
then I'd believe the 100 figure, but not the other ...
        regards, tom lane
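
The sanity check above rests on a rule of thumb: if every commit has to wait for its own WAL flush, and the drive can complete at most one such flush per platter rotation, a single session cannot commit faster than the platter turns. A minimal sketch of that arithmetic, assuming no write cache and no grouping of commits across sessions (both are assumptions of this sketch, and the function name is hypothetical):

```c
/*
 * Upper bound on commits per second for one session when each commit
 * waits for a WAL flush and the drive completes at most one flush per
 * rotation.  rpm is the drive's rotational speed.
 */
static double
max_serial_commits_per_sec(double rpm)
{
    return rpm / 60.0;   /* rotations per second */
}
```

A 7200 RPM drive gives a ceiling of 120 commits per second and a 10,000 RPM drive about 167, so a figure near 100 tps with fsync on is plausible for either. With several concurrent clients the real ceiling can be higher, because more than one commit may be flushed in the same rotation.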


Re: Bgwriter behavior

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > John Hansen wrote:
> >> On another note, how do you know for sure, that your drive does not lie
> >> about fsync?
> 
> > I just tried and got 115tps with fsync off vs 100 with fsync on, so
> > fsync is certainly doing something.
> 
> [ raised eyebrow... ]  Something is wrong with that.  I'd expect a
> *much* higher difference.  It's difficult to credit a tps rate higher
> than your disk's RPM rating with fsync on, but most modern CPUs can do
> a lot better than that with fsync off.  If you have a 7200 RPM drive
> then I'd believe the 100 figure, but not the other ...

I think it is a 10k RPM drive, Seagate Cheetah ST336607LW.

 


Re: Bgwriter behavior

From
Simon Riggs
Date:
On Tue, 2004-12-28 at 07:23, John Hansen wrote:
> > I ran some tests last week and can report results similar on Tom's test:
> > 
> >     pgbench -i -s 10 bench
> >     pgbench -c 10 -t 10000 bench
> > 
> > The tests were on a machine with a single SCSI drive that doesn't lie
> > about fsync.  I found 7.4.X got around 75tps while 8.0 got 100tps, very
> > similar to the 65/107 numbers Tom had.
> 
> You do realize, that pgbench result comparisons are about as useful as a
> fork for eating soup?

I'd have to agree. I find it hard to get comparable results on my test
server, let alone discuss other people's findings.

The only tests I have reasonable faith in these days are those performed
to a rigorous test method, which is also published, visible and
challengeable. OSDL is the nearest thing we have to that.

-- 
Best Regards, Simon Riggs



Re: Bgwriter behavior

From
John Hansen
Date:
> > > I ran some tests last week and can report results similar on Tom's test:
> > > 
> > >     pgbench -i -s 10 bench
> > >     pgbench -c 10 -t 10000 bench
> > > 

don't you have to specify the scaling factor for the benchmark as well?
as in pgbench -c 10 -t 10000 -s 10 bench ?

> I just tried and got 115tps with fsync off vs 100 with fsync on, so
> fsync is certainly doing something.

well, I usually get results that differ by that much from run to run.
Probably you ran into more checkpoints on the second test.

Also, did you reinitialize the bench database with pgbench -i ?

... John



Re: Bgwriter behavior

From
Bruce Momjian
Date:
John Hansen wrote:
> > > > I ran some tests last week and can report results similar on Tom's test:
> > > > 
> > > >     pgbench -i -s 10 bench
> > > >     pgbench -c 10 -t 10000 bench
> > > > 
> 
> don't you have to specify the scaling factor for the benchmark as well?
> as in pgbench -c 10 -t 10000 -s 10 bench ?
> 
> > I just tried and got 115tps with fsync off vs 100 with fsync on, so
> > fsync is certainly doing something.
> 
> well, I usually get results that differ by that much from run to run.
> Probably you ran in to more checkpoints on the second test.
> 
> Also, did you reinitialize the bench database with pgbench -i ?

I destroyed the database and recreated it.

 


Re: Bgwriter behavior

From
Mark Kirkwood
Date:

Bruce Momjian wrote:

>>well, I usually get results that differ by that much from run to run.
>>Probably you ran in to more checkpoints on the second test.
>>
>>Also, did you reinitialize the bench database with pgbench -i ?
>>    
>>
>
>I destroyed the database and recreated it.
>  
>
The only way I managed to control the variability in Pgbench was to 
*reboot the machine* and recreate the database for each test. In 
addition it seems that using a larger scale factor (e.g 200) helped as well.

Having said that, on FreeBSD 5.3 with hw.ata.wc=0 (i.e no write cache) 
my results for s=200, t=10000 and c=4 were 49  (+/- 0.5) tps for both 
7.4.6 and 8.0.0RC1 - no measurable difference. If I  reduced the number 
of transactions to t=1000, then 7.4.6 jumped ahead by about 10 tps.

Bruce - are you able to try s=200? It would be interesting to see what 
your setup does.

regards

Mark


Re: Bgwriter behavior

From
Manfred Koizar
Date:
[I know I'm late and this has already been discussed by Richard, Tom,
et al., but ...]

On Tue, 21 Dec 2004 16:17:17 -0600, "Jim C. Nasby"
<decibel@decibel.org> wrote:
>look at where the last page you wrote out has ended up in the LRU list
>since you last ran, and start scanning from there (by definition
>everything after that page would have to be clean).

This is a bit oversimplified, because that page will be moved to the
start of the list when it is accessed the next time.
 A = B = C = D = E = F = G = H = I = J = K = L = m = n = o = p = q
                                                 ^
would become
 M = A = B = C = D = E = F = G = H = I = J = K = L = n = o = p = q
 ^

(a-z ... known to be clean, A-Z ... possibly dirty)

But with a bit of cooperation from the backends this could be made to
work.  Whenever a backend takes the page which is the start of the
clean tail out of the list (most probably to insert it into another
list or to re-insert it at the start of the same list) the clean tail
pointer is advanced to the next list element, if any.  So we would get
 M = A = B = C = D = E = F = G = H = I = J = K = L = n = o = p = q
                                                     ^

As a little improvement the clean tail could be prevented from
shrinking unnecessarily fast by moving the pointer to the previous
list element if this is found to be clean:
 M = A = B = C = D = E = F = G = H = I = J = K = l = n = o = p = q
                                                 ^

Maybe this approach could serve both goals, (1) keeping a number of
clean pages at the LRU end of the list and (2) writing out other dirty
pages if there's not much to do near the end of the list.
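
The clean-tail pointer described above can be sketched with a plain doubly linked list. This is an illustration only, not the ARC code: Page, LruList and touch_page are hypothetical names, and the sketch assumes that every page a backend uses goes through touch_page (which is exactly what the cntxDirty case would violate).

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Page {
    struct Page *prev;   /* toward the MRU (head) end */
    struct Page *next;   /* toward the LRU (tail) end */
    bool         dirty;
} Page;

typedef struct {
    Page *head;          /* most recently used */
    Page *tail;          /* least recently used */
    Page *clean_tail;    /* first page of the known-clean run, or NULL */
} LruList;

static void
list_unlink(LruList *l, Page *p)
{
    if (p->prev) p->prev->next = p->next; else l->head = p->next;
    if (p->next) p->next->prev = p->prev; else l->tail = p->prev;
    p->prev = p->next = NULL;
}

static void
list_push_head(LruList *l, Page *p)
{
    p->prev = NULL;
    p->next = l->head;
    if (l->head) l->head->prev = p; else l->tail = p;
    l->head = p;
}

/*
 * A backend uses page p: move it to the MRU end.  If p was the start of
 * the clean tail, repair the pointer first: step back onto the previous
 * page if that one is clean, otherwise advance to the next page.
 */
static void
touch_page(LruList *l, Page *p, bool dirtied)
{
    if (l->clean_tail == p)
    {
        if (p->prev != NULL && !p->prev->dirty)
            l->clean_tail = p->prev;
        else
            l->clean_tail = p->next;
    }
    list_unlink(l, p);
    if (dirtied)
        p->dirty = true;
    list_push_head(l, p);
}
```

The bgwriter would then write pages just ahead of clean_tail and move the pointer toward the head as it goes; that part is not shown here.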

But ...
On Tue, 21 Dec 2004 10:26:48 -0500, Tom Lane <tgl@sss.pgh.pa.us>
wrote:
>Also, the cntxDirty mechanism allows a block to be dirtied without
>changing the ARC state at all.

... which might kill this proposal anyway.

Servus
 Manfred



Re: Bgwriter behavior

From
Simon Riggs
Date:
On Mon, 2004-12-27 at 22:21, Bruce Momjian wrote:
> Should we consider at least adjusting the meaning of bgwriter_percent?

Yes. As things stand, this is the only change that seems safe.

Here's a very short patch that implements this change within BufferSync
in bufmgr.c

- No algorithm changes
- No error message changes
- Only change is the call to StrategyDirtyBufferList is made using the
maximum number of buffers that will be cleaned, rather than uselessly
trawling through all of shared_buffers

This changes the meaning of bgwriter_percent from "percent of dirty
buffers" to "percent of shared_buffers". The default settings of 1% of
1000 buffers gives up to 10 dirty block writes every 250ms
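
The effect of the new meaning can be sketched as simple arithmetic. This is not the patch itself: the function names, the rounding up, and the treatment of a page cap in the spirit of bgwriter_maxpages (0 meaning "no cap") are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Upper limit on pages written in one bgwriter round when the percentage
 * applies to all shared buffers rather than to the dirty ones.
 * max_pages == 0 means "no separate cap" (an assumption of this sketch).
 */
static size_t
pages_per_round(size_t shared_buffers, unsigned percent, size_t max_pages)
{
    size_t n = (shared_buffers * percent + 99) / 100;   /* round up */

    if (max_pages > 0 && n > max_pages)
        n = max_pages;
    return n;
}

/* Pages per second implied by the delay between rounds, in milliseconds. */
static double
pages_per_second(size_t per_round, unsigned delay_ms)
{
    return delay_ms == 0 ? 0.0 : per_round * 1000.0 / delay_ms;
}
```

With the values quoted above (1000 buffers, 1%, 250 ms) this gives at most 10 pages per round and at most 40 pages per second. The limit no longer depends on how many buffers happen to be dirty, so there is no need to walk the whole buffer cache to compute it.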

Benefit: allows performance tuning by increasing the options for setting
bgwriter_delay, which would otherwise have an ineffectually high minimum
setting

Risk: low

1-line doc patch to follow, if this is approved.

--
Best Regards, Simon Riggs

Attachment

Re: Bgwriter behavior

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Mon, 2004-12-27 at 22:21, Bruce Momjian wrote:
> > Should we consider at least adjusting the meaning of bgwriter_percent?
>
> Yes. As things stand, this is the only change that seems safe.
>
> Here's a very short patch that implements this change within BufferSync
> in bufmgr.c
>
> - No algorithm changes
> - No error message changes
> - Only change is the call to StrategyDirtyBufferList is made using the
> maximum number of buffers that will be cleaned, rather than uselessly
> trawling through all of shared_buffers
>
> This changes the meaning of bgwriter_percent from "percent of dirty
> buffers" to "percent of shared_buffers". The default settings of 1% of
> 1000 buffers gives up to 10 dirty block writes every 250ms
>
> Benefit: allows performance tuning by increases options for setting
> bgwriter_delay which would otherwise have an ineffectually high minimum
> setting
>
> Risk: low
>
> 1-line doc patch to follow, if this is approved.

I am not objecting to the patch, but what value is there in having both
bgwriter_percent and bgwriter_maxpages?  Seems both are redundant and
that one would be enough.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Bgwriter behavior

From
Simon Riggs
Date:
On Fri, 2004-12-31 at 01:14, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Mon, 2004-12-27 at 22:21, Bruce Momjian wrote:
> > > Should we consider at least adjusting the meaning of bgwriter_percent?
> >
> > Yes. As things stand, this is the only change that seems safe.
> >
> > Here's a very short patch that implements this change within BufferSync
> > in bufmgr.c
> >
> > - No algorithm changes
> > - No error message changes
> > - Only change is the call to StrategyDirtyBufferList is made using the
> > maximum number of buffers that will be cleaned, rather than uselessly
> > trawling through all of shared_buffers
> >
> > This changes the meaning of bgwriter_percent from "percent of dirty
> > buffers" to "percent of shared_buffers". The default settings of 1% of
> > 1000 buffers gives up to 10 dirty block writes every 250ms
> >
> > Benefit: allows performance tuning by increases options for setting
> > bgwriter_delay which would otherwise have an ineffectually high minimum
> > setting
> >
> > Risk: low
> >
> > 1-line doc patch to follow, if this is approved.
>
> I am not objecting to the patch, but what value is there in having both
> bgwriter_percent and bgwriter_maxpages?  Seems both are redundant and
> that one would be enough.

In brief:
i) for now: as little change as possible is good
ii) the two parameters are OK
iii) trying to decide an alternative takes time, which we do not have
iv) what is presented here is simply a performance bug fix, not the best
long term alternative...

I'd like to move quickly: if we do this (or an alternative), it has to
be done soon and it would be easy to discuss this until we run out of
time. Could we vote: in RC3, or not?

In more detail...

The value of having both is:
i) as little change as possible at this stage of RC - the main one
...which gives us stability
...and also avoids having to re-discuss what they *should* be

ii) Having two isn't that bad. bgwriter_percent auto adjusts the length
of the to-be-cleaned-list, so it is roughly useful anywhere between 500
and 10000 shared_buffers. That is IMHO slightly more useful than a hard
definition set via bgwriter_maxpages, since that is likely to be set
wrong anyway - but has some value as an outside limit on the number of
pages. [You may wish to set shared_buffers > 10000 even on smaller
servers, since many now have 2GB RAM and yet a relatively poor I/O
subsystem. Having maxpages set separately allows the majority of people
to set shared_buffers higher without swamping their I/O subsystems
because they didn't know about the r8.0 bgwriter feature/parameters]

iii) changing the parameters might tempt us towards changing the
algorithm, which is not a topic we have reached agreement on

iv) I see it as a goal to remove all of those parameters anyway, as well
as explore some of the many options and ideas everybody has presented,
so further change is likely at the next release whatever is done now.

The patch is as simple as I can make it and yet remove the unnecessary
performance effect in the existing code. Thanks to Neil and others for
showing that this was possible...I see this patch as a team effort.

I've already spoken against larger change and would do so again now: if
we don't agree on this change, then I would vote for no-change.... simply
because this patch is minimal change. We *suspect* further change is
beneficial but we have no evidence to support what that change should
be, amongst the large range of possible solutions proposed.

--
Best Regards, Simon Riggs


Re: [PATCHES] Bgwriter behavior

From
Bruce Momjian
Date:
This change isn't going to make it for RC3, and it is probably not
something we want to rush.

I think there are a few issues involved:

    o  everyone agrees the current meaning of bgwriter_percent is
       useless (percent of dirty buffers)
    o  removal of bgwriter_percent will cause problems because
       postgresql.conf is only installed via initdb, so beta users
       will have to have some workaround so their existing
       postgresql.conf files work.
    o  bgwriter_percent and bgwriter_maxpages are duplicate for a
       given number of buffers and it isn't clear which one takes
       precedence.
    o  8.1 might use these variables with different meanings,
       causing slight upgrade confusion.
    o  Another idea is for bgwriter_percent to control how much of
       the buffer is scanned.

Tom feels bgwriter_maxpages is good because it allows the user to
specify the I/O traffic, while bgwriter_percent as total pages (not just
dirty ones) is perhaps easier to set a default (I/O load varies based on
buffer cache size) and perhaps easier to understand.

I am not sure what to suggest at this point but whatever solution we use
should take the above issues into account.
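
The trade-off between the two parameters can be illustrated with a little arithmetic. This is a rough sketch only: the helper name is made up, the 250ms delay is the figure quoted later in this message, and 8192 bytes is the default PostgreSQL block size.

```python
BLOCK_SIZE = 8192  # bytes per buffer page (PostgreSQL default block size)


def max_write_rate(pages_per_round: int, delay_ms: int) -> float:
    """Upper bound on bgwriter write traffic, in bytes per second."""
    rounds_per_second = 1000 / delay_ms
    return pages_per_round * rounds_per_second * BLOCK_SIZE


# A fixed bgwriter_maxpages gives the same I/O ceiling for any cache size.
fixed = max_write_rate(100, 250)

# A percent of all buffers (1% here) gives a ceiling that grows with
# shared_buffers, so the I/O load varies with the buffer cache size.
scaled = [max_write_rate(n * 1 // 100, 250) for n in (1000, 10000, 50000)]

print(fixed)
print(scaled)
```

The fixed limit caps the traffic directly, while the percent form adapts to the cache size but can exceed what a weak I/O subsystem handles once shared_buffers is large.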

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Fri, 2004-12-31 at 01:14, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > On Mon, 2004-12-27 at 22:21, Bruce Momjian wrote:
> > > > Should we consider at least adjusting the meaning of bgwriter_percent?
> > >
> > > Yes. As things stand, this is the only change that seems safe.
> > >
> > > Here's a very short patch that implements this change within BufferSync
> > > in bufmgr.c
> > >
> > > - No algorithm changes
> > > - No error message changes
> > > - Only change is the call to StrategyDirtyBufferList is made using the
> > > maximum number of buffers that will be cleaned, rather than uselessly
> > > trawling through all of shared_buffers
> > >
> > > This changes the meaning of bgwriter_percent from "percent of dirty
> > > buffers" to "percent of shared_buffers". The default settings of 1% of
> > > 1000 buffers gives up to 10 dirty block writes every 250ms
> > >
> > > Benefit: allows performance tuning by increasing the options for setting
> > > bgwriter_delay, which would otherwise have an ineffectually high minimum
> > > setting
> > >
> > > Risk: low
> > >
> > > 1-line doc patch to follow, if this is approved.
> >
> > I am not objecting to the patch, but what value is there in having both
> > bgwriter_percent and bgwriter_maxpages?  Seems both are redundant and
> > that one would be enough.
>
> In brief:
> i) for now: as little change as possible is good
> ii) the two parameters are OK
> iii) trying to decide an alternative takes time, which we do not have
> iv) what is presented here is simply a performance bug fix, not the best
> long term alternative...
>
> I'd like to move quickly: if we do this (or an alternative), it has to
> be done soon and it would be easy to discuss this until we run out of
> time. Could we vote: in RC3, or not?
>
> In more detail...
>
> The value of having both is:
> i) as little change as possible at this stage of RC - the main one
> ...which gives us stability
> ...and also avoids having to re-discuss what they *should* be
>
> ii) Having two isn't that bad. bgwriter_percent auto adjusts the length
> of the to-be-cleaned-list, so it is roughly useful anywhere between 500
> and 10000 shared_buffers. That is IMHO slightly more useful than a hard
> definition set via bgwriter_maxpages, since that is likely to be set
> wrong anyway - but has some value as an outside limit on the number of
> pages. [You may wish to set shared_buffers > 10000 even on smaller
> servers, since many now have 2GB RAM and yet a relatively poor I/O
> subsystem. Having maxpages set separately allows the majority of people
> to set shared_buffers higher without swamping their I/O subsystems
> because they didn't know about the r8.0 bgwriter feature/parameters]
>
> iii) changing the parameters might tempt us towards changing the
> algorithm, which is not a topic we have reached agreement on
>
> iv) I see it as a goal to remove all of those parameters anyway, as well
> as explore some of the many options and ideas everybody has presented,
> so further change is likely at the next release whatever is done now.
>
> The patch is as simple as I can make it and yet remove the unnecessary
> performance effect in the existing code. Thanks to Neil and others for
> showing that this was possible...I see this patch as a team effort.
>
> I've already spoken against larger change and would do so again now: if
> we don't agree on this change, then I would vote for no-change.... simply
> because this patch is minimal change. We *suspect* further change is
> beneficial but we have no evidence to support what that change should
> be, amongst the large range of possible solutions proposed.
>
> --
> Best Regards, Simon Riggs
>
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [PATCHES] Bgwriter behavior

From
Simon Riggs
Date:
On Sat, 2005-01-01 at 06:20, Bruce Momjian wrote:
> This change isn't going to make it for RC3, and it is probably not
> something we want to rush.

OK. Thank you.

> I think there are a few issues involved:
>
>     o  everyone agrees the current meaning of bgwriter_percent is
>        useless (percent of dirty buffers)
>     o  removal of bgwriter_percent will cause problems because
>        postgresql.conf is only installed via initdb, so beta users
>        will have to have some workaround so their existing
>        postgresql.conf files work.
>     o  bgwriter_percent and bgwriter_maxpages are duplicate for a
>        given number of buffers and it isn't clear which one takes
>        precedence.
>     o  8.1 might use these variables with different meanings,
>        causing slight upgrade confusion.
>     o  Another idea is for bgwriter_percent to control how much of
>        the buffer is scanned.
>

Agreed.

Would add as item #1: current behaviour of bgwriter causes sub-optimal
performance for 8.0, for systems with a high write workload, more CPUs
and higher shared_buffers.

> Tom feels bgwriter_maxpages is good because it allows the user to
> specify the I/O traffic, while bgwriter_percent as total pages (not just
> dirty ones) is perhaps easier to set a default (I/O load varies based on
> buffer cache size) and perhaps easier to understand.
>

Agreed.

> I am not sure what to suggest at this point but whatever solution we use
> should take the above issues into account.

Well, I think we're saying: it's not in 8.0 now, and we take our time to
consider patches for 8.1 and accept the situation that the parameter
names/meaning will change in next release.

The patch is there if that decision changes, but I'll say no more on it.

--
Best Regards, Simon Riggs


Re: [PATCHES] Bgwriter behavior

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Sat, 2005-01-01 at 06:20, Bruce Momjian wrote:
> > This change isn't going to make it for RC3, and it is probably not
> > something we want to rush.
>
> OK. Thank you.
>
> > I think there are a few issues involved:
> >
> >     o  everyone agrees the current meaning of bgwriter_percent is
> >        useless (percent of dirty buffers)
> >     o  removal of bgwriter_percent will cause problems because
> >        postgresql.conf is only installed via initdb, so beta users
> >        will have to have some workaround so their existing
> >        postgresql.conf files work.
> >     o  bgwriter_percent and bgwriter_maxpages are duplicate for a
> >        given number of buffers and it isn't clear which one takes
> >        precedence.
> >     o  8.1 might use these variables with different meanings,
> >        causing slight upgrade confusion.
> >     o  Another idea is for bgwriter_percent to control how much of
> >        the buffer is scanned.
> >
>
> Agreed.
>
> Would add as item #1: current behaviour of bgwriter causes sub-optimal
> performance for 8.0, for systems with a high write workload, more CPUs
> and higher shared_buffers.
>
> > Tom feels bgwriter_maxpages is good because it allows the user to
> > specify the I/O traffic, while bgwriter_percent as total pages (not just
> > dirty ones) is perhaps easier to set a default (I/O load varies based on
> > buffer cache size) and perhaps easier to understand.
> >
>
> Agreed.
>
> > I am not sure what to suggest at this point but whatever solution we use
> > should take the above issues into account.
>
> Well, I think we're saying: it's not in 8.0 now, and we take our time to
> consider patches for 8.1 and accept the situation that the parameter
> names/meaning will change in next release.

I have no problem doing something for 8.0 if we can find something that
meets all the items I mentioned.

One idea would be to just remove bgwriter_percent.  Beta/RC users would
still have it in their postgresql.conf, but it is commented out so it
should be OK.  If they uncomment it their server would not start but we
could just tell testers to remove it.  I see that as better than having
conflicting parameters.

Another idea is to have bgwriter_percent be the percent of the buffer it
will scan.  We could default that to 50% or 100%, but we then need to
make sure all beta/RC users update their postgresql.conf with the new
default because the commented-out default will not be correct.

At this point I see these as our only two viable options, aside from
doing nothing.

I realize our current behavior requires a full scan of the buffer cache,
but how often is the bgwriter_maxpages limit met?  If it is not, a full
scan is done anyway, right?  It seems the only way to really add
functionality is to change bgwriter_percent to control how much of the
buffer is scanned.


Re: [PATCHES] Bgwriter behavior

From
Simon Riggs
Date:
On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> Simon Riggs wrote:
>
> > Well, I think we're saying: it's not in 8.0 now, and we take our time to
> > consider patches for 8.1 and accept the situation that the parameter
> > names/meaning will change in next release.
>
> I have no problem doing something for 8.0 if we can find something that
> meets all the items I mentioned.
>
> One idea would be to just remove bgwriter_percent.  Beta/RC users would
> still have it in their postgresql.conf, but it is commented out so it
> should be OK.  If they uncomment it their server would not start but we
> could just tell testers to remove it.  I see that as better than having
> conflicting parameters.

Can't say I like that at first thought. I'll think some more though...

> Another idea is to have bgwriter_percent be the percent of the buffer it
> will scan.

Hmmm....well that was my original suggestion (bg2.patch on 12 Dec)
(...though with a bug, as Neil pointed out)

> We could default that to 50% or 100%, but we then need to
> make sure all beta/RC users update their postgresql.conf with the new
> default because the commented-out default will not be correct.

...we just differ/ed on what the default should be...

> At this point I see these as our only two viable options, aside from
> doing nothing.

> I realize our current behavior requires a full scan of the buffer cache,
> but how often is the bgwriter_maxpages limit met?  If it is not, a full
> scan is done anyway, right?

Well, if you have a very heavy read workload then that would be a
problem. I was more worried about concurrency in a heavy write
situation, but I can see your point, and agree.

(Idea #1 still suffers from this, so we should rule it out...)

> It seems the only way to really add
> functionality is to change bgwriter_percent to control how much of the
> buffer is scanned.

OK. I think you've persuaded me on idea #2, if I understand you right:

bgwriter_percent = 50 (default)
bgwriter_maxpages = 100 (default)

percent is the percentage of shared_buffers we scan, limited by maxpages.

(I'll code it up in a couple of hours when the kids are in bed)
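
As a rough illustration of idea #2 (not the attached patch), the scan could stop as soon as either limit is reached. The sketch below models the buffer list as a plain list of dirty flags; the function name and that representation are assumptions, and the real code walks the buffer strategy lists under a lock.

```python
from typing import List, Tuple


def scan_for_dirty(buffers: List[bool], percent: int,
                   maxpages: int) -> Tuple[List[int], int]:
    """Return (indexes of dirty buffers to write, number of buffers examined).

    buffers[i] is True if buffer i is dirty. The scan stops once
    `percent` of the list has been examined or `maxpages` dirty
    buffers have been collected, whichever happens first.
    """
    scan_limit = len(buffers) * percent // 100
    to_write: List[int] = []
    examined = 0
    for idx, dirty in enumerate(buffers):
        if examined >= scan_limit or len(to_write) >= maxpages:
            break
        examined += 1
        if dirty:
            to_write.append(idx)
    return to_write, examined


# 1000 buffers, only two of them dirty (a read-heavy workload).
cache = [False] * 1000
cache[3] = True
cache[700] = True

# With no scan limit (percent = 100) and maxpages never reached,
# every buffer is examined: the full scan described above.
print(scan_for_dirty(cache, 100, 100))
# With percent = 50 the scan stops halfway, even though maxpages is not met.
print(scan_for_dirty(cache, 50, 100))
```

The second call shows why a scan limit adds something maxpages alone cannot: on a mostly clean cache maxpages is never reached, so only the percent bound shortens the scan.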

--
Best Regards, Simon Riggs


Re: [PATCHES] Bgwriter behavior

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>     o  everyone agrees the current meaning of bgwriter_percent is
>        useless (percent of dirty buffers)

Oh?

It's not useless by any means; it's a perfectly reasonable and useful
definition that happens to be expensive to implement.  One of the
questions that is not answered to my satisfaction is what is an adequate
substitute that doesn't lose needed functionality.

>     o  bgwriter_percent and bgwriter_maxpages are duplicate for a
>        given number of buffers and it isn't clear which one takes
>        precedence.

Not unless the current definition of bgwriter_percent is changed.

Please try to make sure that your summaries reduce confusion instead
of increasing it.

            regards, tom lane

Re: [PATCHES] Bgwriter behavior

From
Simon Riggs
Date:
On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
> On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> > Simon Riggs wrote:
> >
> > > Well, I think we're saying: it's not in 8.0 now, and we take our time to
> > > consider patches for 8.1 and accept the situation that the parameter
> > > names/meaning will change in next release.
> >
> > I have no problem doing something for 8.0 if we can find something that
> > meets all the items I mentioned.
> >
> > One idea would be to just remove bgwriter_percent.  Beta/RC users would
> > still have it in their postgresql.conf, but it is commented out so it
> > should be OK.  If they uncomment it their server would not start but we
> > could just tell testers to remove it.  I see that as better than having
> > conflicting parameters.
>
> Can't say I like that at first thought. I'll think some more though...
>
> > Another idea is to have bgwriter_percent be the percent of the buffer it
> > will scan.
>
> Hmmm....well that was my original suggestion (bg2.patch on 12 Dec)
> (...though with a bug, as Neil pointed out)
>
> > We could default that to 50% or 100%, but we then need to
> > make sure all beta/RC users update their postgresql.conf with the new
> > default because the commented-out default will not be correct.
>
> ...we just differ/ed on what the default should be...
>
> > At this point I see these as our only two viable options, aside from
> > doing nothing.
>
> > I realize our current behavior requires a full scan of the buffer cache,
> > but how often is the bgwriter_maxpages limit met?  If it is not, a full
> > scan is done anyway, right?
>
> Well, if you have a very heavy read workload then that would be a
> problem. I was more worried about concurrency in a heavy write
> situation, but I can see your point, and agree.
>
> (Idea #1 still suffers from this, so we should rule it out...)
>
> > It seems the only way to really add
> > functionality is to change bgwriter_percent to control how much of the
> > buffer is scanned.
>
> OK. I think you've persuaded me on idea #2, if I understand you right:
>
> bgwriter_percent = 50 (default)
> bgwriter_maxpages = 100 (default)
>
> percent is the percentage of shared_buffers we scan, limited by maxpages.
>
> (I'll code it up in a couple of hours when the kids are in bed)

Here's the basic patch - no changes to current default values or docs.

Not sure if this is still interesting or not...

--
Best Regards, Simon Riggs

Attachment

Re: [PATCHES] Bgwriter behavior

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >     o  everyone agrees the current meaning of bgwriter_percent is
> >        useless (percent of dirty buffers)
>
> Oh?
>
> It's not useless by any means; it's a perfectly reasonable and useful
> definition that happens to be expensive to implement.  One of the
> questions that is not answered to my satisfaction is what is an adequate
> substitute that doesn't lose needed functionality.

I remembered this statement:

> I think there's a reasonable case to be made for redefining
> bgwriter_percent as the max percent of the total buffer list to scan
> (not the max percent of the list to return --- Jan correctly pointed out
> that the latter is useless).  Then we could modify
> StrategyDirtyBufferList so that the percent and maxpages parameters are
> passed in, so it can stop as soon as either one is satisfied.  This
> would be a fairly small/safe code change and I wouldn't have a problem
> doing it even at this late stage of the cycle.

Referenced here:

    http://archives.postgresql.org/pgsql-hackers/2004-12/msg00703.php

But I now see that Jan was objecting to the idea of the previous patch
where bgwriter_percent is a percent of all buffers to return, which we
just discussed as redundant.

> >     o  bgwriter_percent and bgwriter_maxpages are duplicate for a
> >        given number of buffers and it isn't clear which one takes
> >        precedence.
>
> Not unless the current definition of bgwriter_percent is changed.
>
> Please try to make sure that your summaries reduce confusion instead
> of increasing it.

OK, whatever.  My point is that many have criticized the current
behavior of bgwriter_percent and I haven't heard anyone defend it,
including Jan.

What bothers me is that we have known bgwriter needs tuning for months
and I am not sure we are any closer to improving it.


Re: [PATCHES] Bgwriter behavior

From
Bruce Momjian
Date:
OK, we have a submitted patch that attempts to improve bgwriter by
making bgwriter_percent control what percentage of the buffer is
scanned.

The patch still needs doc changes and a change to the default value but
at this point we need a vote on the patch.  Is it:

    * too late for 8.0
    * not the right improvement
    * to be applied with doc/default additions

Comments?


Re: [PATCHES] Bgwriter behavior

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> OK, we have a submitted patch that attempts to improve bgwriter by
> making bgwriter_percent control what percentage of the buffer is
> scanned.

> The patch still needs doc changes and a change to the default value but
> at this point we need a vote on the patch.  Is it:

>     * too late for 8.0
>     * not the right improvement
>     * to be applied with doc/default additions

My vote: too late for 8.0.  There is no hard evidence that this is a
useful improvement, and no time for such evidence to be obtained.

            regards, tom lane

Re: [PATCHES] Bgwriter behavior

From
"Marc G. Fournier"
Date:
On Mon, 3 Jan 2005, Bruce Momjian wrote:

>
> OK, we have a submitted patch that attempts to improve bgwriter by
> making bgwriter_percent control what percentage of the buffer is
> scanned.
>
> The patch still needs doc changes and a change to the default value but
> at this point we need a vote on the patch.  Is it:
>
>     * too late for 8.0

Too late by at least 3 RCs ...


----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: [PATCHES] Bgwriter behavior

From
Simon Riggs
Date:
On Mon, 2005-01-03 at 20:09, Bruce Momjian wrote:
> OK, we have a submitted patch that attempts to improve bgwriter by
> making bgwriter_percent control what percentage of the buffer is
> scanned.
>
> The patch still needs doc changes and a change to the default value but
> at this point we need a vote on the patch.  Is it:
>
>     * too late for 8.0
>     * not the right improvement
>     * to be applied with doc/default additions
>
> Comments?
>
> ---------------------------------------------------------------------------
>
> Simon Riggs wrote:
> > On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
> > > On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> > > > Simon Riggs wrote:
> > > >
> > > > > Well, I think we're saying: it's not in 8.0 now, and we take our time to
> > > > > consider patches for 8.1 and accept the situation that the parameter
> > > > > names/meaning will change in next release.
> > > >

I hear veto ... so the above situation stands then: 8.1 it is.

Not unhappy...I want this thing released as much as the next man...

--
Best Regards, Simon Riggs


Re: [PATCHES] Bgwriter behavior

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Mon, 2005-01-03 at 20:09, Bruce Momjian wrote:
> > OK, we have a submitted patch that attempts to improve bgwriter by
> > making bgwriter_percent control what percentage of the buffer is
> > scanned.
> >
> > The patch still needs doc changes and a change to the default value but
> > at this point we need a vote on the patch.  Is it:
> >
> >     * too late for 8.0
> >     * not the right improvement
> >     * to be applied with doc/default additions
> >
> > Comments?
> >
> > ---------------------------------------------------------------------------
> >
> > Simon Riggs wrote:
> > > On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
> > > > On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> > > > > Simon Riggs wrote:
> > > > >
> > > > > > Well, I think we're saying: it's not in 8.0 now, and we take our time to
> > > > > > consider patches for 8.1 and accept the situation that the parameter
> > > > > > names/meaning will change in next release.
> > > > >
>
> I hear veto ... so the above situation stands then: 8.1 it is.
>
> Not unhappy...I want this thing released as much as the next man...

Well, we went through the process and that's the best we can do.


Re: [PATCHES] Bgwriter behavior

From
Simon Riggs
Date:
On Mon, 2005-01-03 at 23:03, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Mon, 2005-01-03 at 20:09, Bruce Momjian wrote:
> > > OK, we have a submitted patch that attempts to improve bgwriter by
> > > making bgwriter_percent control what percentage of the buffer is
> > > scanned.
> > >
> > > The patch still needs doc changes and a change to the default value but
> > > at this point we need a vote on the patch.  Is it:
> > >
> > >     * too late for 8.0
> > >     * not the right improvement
> > >     * to be applied with doc/default additions
> > >
> > > Comments?
> > >
> > > ---------------------------------------------------------------------------
> > >
> > > Simon Riggs wrote:
> > > > On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
> > > > > On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> > > > > > Simon Riggs wrote:
> > > > > >
> > > > > > > Well, I think we're saying: it's not in 8.0 now, and we take our time to
> > > > > > > consider patches for 8.1 and accept the situation that the parameter
> > > > > > > names/meaning will change in next release.
> > > > > >
> >
> > I hear veto ... so the above situation stands then: 8.1 it is.
> >
> > Not unhappy...I want this thing released as much as the next man...
>
> Well, we went through the process and that's the best we can do.

Here's my bgwriter instrumentation patch, which gives info that could
allow the bgwriter settings to be tuned.

--
Best Regards, Simon Riggs

Attachment

Re: [PATCHES] Bgwriter behavior

From
Bruce Momjian
Date:
Simon Riggs wrote:
> Here's my bgwriter instrumentation patch, which gives info that could
> allow the bgwriter settings to be tuned.

Uh, what does this do exactly?  Add additional logging output?


Re: [PATCHES] Bgwriter behavior

From
Bruce Momjian
Date:
This has been saved for the 8.1 release:

    http://momjian.postgresql.org/cgi-bin/pgpatches2

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
> > On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> > > Simon Riggs wrote:
> > >
> > > > Well, I think we're saying: it's not in 8.0 now, and we take our time to
> > > > consider patches for 8.1 and accept the situation that the parameter
> > > > names/meaning will change in next release.
> > >
> > > I have no problem doing something for 8.0 if we can find something that
> > > meets all the items I mentioned.
> > >
> > > One idea would be to just remove bgwriter_percent.  Beta/RC users would
> > > still have it in their postgresql.conf, but it is commented out so it
> > > should be OK.  If they uncomment it their server would not start but we
> > > could just tell testers to remove it.  I see that as better than having
> > > conflicting parameters.
> >
> > Can't say I like that at first thought. I'll think some more though...
> >
> > > Another idea is to have bgwriter_percent be the percent of the buffer it
> > > will scan.
> >
> > Hmmm....well that was my original suggestion (bg2.patch on 12 Dec)
> > (...though with a bug, as Neil pointed out)
> >
> > > We could default that to 50% or 100%, but we then need to
> > > make sure all beta/RC users update their postgresql.conf with the new
> > > default because the commented-out default will not be correct.
> >
> > ...we just differ/ed on what the default should be...
> >
> > > At this point I see these as our only two viable options, aside from
> > > doing nothing.
> >
> > > I realize our current behavior requires a full scan of the buffer cache,
> > > but how often is the bgwriter_maxpages limit met?  If it is not, a full
> > > scan is done anyway, right?
> >
> > Well, if you have a very heavy read workload then that would be a
> > problem. I was more worried about concurrency in a heavy write
> > situation, but I can see your point, and agree.
> >
> > (Idea #1 still suffers from this, so we should rule it out...)
> >
> > > It seems the only way to really add
> > > functionality is to change bgwriter_percent to control how much of the
> > > buffer is scanned.
> >
> > OK. I think you've persuaded me on idea #2, if I understand you right:
> >
> > bgwriter_percent = 50 (default)
> > bgwriter_maxpages = 100 (default)
> >
> > percent is the number of shared_buffers we scan, limited by maxpages.
> >
> > (I'll code it up in a couple of hours when the kids are in bed)
>
> Here's the basic patch - no changes to current default values or docs.
>
> Not sure if this is still interesting or not...
>
> --
> Best Regards, Simon Riggs

[ Attachment, skipping... ]
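
For readers following the proposal quoted above, here is a minimal Python sketch of one reading of idea #2: each round scans bgwriter_percent of shared_buffers and writes at most bgwriter_maxpages dirty pages. The function name, the list-of-flags buffer model, and the resume position (suggested earlier in this thread) are assumptions made for illustration; this is not the code in Simon's patch.

```python
def bgwriter_round(dirty, start, percent, maxpages):
    """One hypothetical bgwriter round over a list of dirty flags.

    Scans at most `percent` percent of the buffers beginning at `start`,
    wrapping around, and cleans at most `maxpages` dirty buffers.
    Returns the cleaned buffer indexes and the position to resume from.
    """
    nbuffers = len(dirty)
    to_scan = nbuffers * percent // 100  # size of this round's scan window
    cleaned = []
    pos = start
    for _ in range(to_scan):
        if len(cleaned) >= maxpages:
            break  # write limit reached before the window was exhausted
        if dirty[pos]:
            dirty[pos] = False  # stands in for writing the page out
            cleaned.append(pos)
        pos = (pos + 1) % nbuffers
    return cleaned, pos


# Illustrative values only: 10 buffers, all dirty, the defaults quoted above.
buffers = [True] * 10
cleaned, resume_at = bgwriter_round(buffers, start=0, percent=50, maxpages=100)
```

With this model the cost of a round is bounded by the scan window rather than by the size of the whole buffer cache, which is the property the thread is after.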



Re: [PATCHES] Bgwriter behavior

From
Simon Riggs
Date:
On Mon, 2005-01-03 at 19:14 -0500, Bruce Momjian wrote:
> Simon Riggs wrote:
> > Here's my bgwriter instrumentation patch, which gives info that could
> > allow the bgwriter settings to be tuned.
>
> Uh, what does this do exactly?  Add additional logging output?
>

Produces output like this...

DEBUG:ARC T1target=  45 B1len= 4954 T1len=   40 T2len= 4960 B2len=   46
DEBUG:ARC total   =  98% B1hit=   0% T1hit=   0% T2hit=  98% B2hit=   0%
DEBUG:ARC buffer dirty misses=   22% (wasted=    0); cleaned=     4494

when you have debug_shared_buffers (= n) set
and you have server messages DEBUG1 available.

The last line of log output has been replaced by this version.
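
As a rough aid for anyone collecting these lines for tuning, the sketch below pulls the name/value pairs out of one DEBUG:ARC line. It assumes only the layout visible in the three sample lines above; the function and pattern names are hypothetical, and the real patch output may differ.

```python
import re

# A field is a name (possibly several words), "=", and an integer that may
# be followed by "%". The percent sign is dropped from the parsed value.
ARC_FIELD = re.compile(r"(\w[\w ]*?)\s*=\s*(\d+)%?")


def parse_arc_line(line):
    """Return {field name: integer value} for one DEBUG:ARC log line."""
    prefix = "DEBUG:ARC"
    if not line.startswith(prefix):
        return {}  # not an ARC statistics line
    body = line[len(prefix):]
    return {name.strip(): int(value) for name, value in ARC_FIELD.findall(body)}


stats = parse_arc_line(
    "DEBUG:ARC buffer dirty misses=   22% (wasted=    0); cleaned=     4494"
)
```

Here `stats` holds the "buffer dirty misses", "wasted", and "cleaned" counters as integers, ready to be compared across runs with different bgwriter settings.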

--
Best Regards, Simon Riggs


Re: [PATCHES] Bgwriter behavior

From
Bruce Momjian
Date:
Do we want to add this additional log info to CVS for 8.0?

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Mon, 2005-01-03 at 19:14 -0500, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > Here's my bgwriter instrumentation patch, which gives info that could
> > > allow the bgwriter settings to be tuned.
> >
> > Uh, what does this do exactly?  Add additional logging output?
> >
>
> Produces output like this...
>
> DEBUG:ARC T1target=  45 B1len= 4954 T1len=   40 T2len= 4960 B2len=   46
> DEBUG:ARC total   =  98% B1hit=   0% T1hit=   0% T2hit=  98% B2hit=   0%
> DEBUG:ARC buffer dirty misses=   22% (wasted=    0); cleaned=     4494
>
> when you have debug_shared_buffers (= n) set
> and you have server messages DEBUG1 available.
>
> The last line of log output has been replaced by this version.
>
> --
> Best Regards, Simon Riggs


Re: [PATCHES] Bgwriter behavior

From
"Marc G. Fournier"
Date:
On Fri, 7 Jan 2005, Bruce Momjian wrote:

>
> Do we want to add this additional log info to CVS for 8.0?

No, unless we're looking for an RC5?


>
> ---------------------------------------------------------------------------
>
> Simon Riggs wrote:
>> On Mon, 2005-01-03 at 19:14 -0500, Bruce Momjian wrote:
>>> Simon Riggs wrote:
>>>> Here's my bgwriter instrumentation patch, which gives info that could
>>>> allow the bgwriter settings to be tuned.
>>>
>>> Uh, what does this do exactly?  Add additional logging output?
>>>
>>
>> Produces output like this...
>>
>> DEBUG:ARC T1target=  45 B1len= 4954 T1len=   40 T2len= 4960 B2len=   46
>> DEBUG:ARC total   =  98% B1hit=   0% T1hit=   0% T2hit=  98% B2hit=   0%
>> DEBUG:ARC buffer dirty misses=   22% (wasted=    0); cleaned=     4494
>>
>> when you have debug_shared_buffers (= n) set
>> and you have server messages DEBUG1 available.
>>
>> The last line of log output has been replaced by this version.
>>
>> --
>> Best Regards, Simon Riggs

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: [PATCHES] Bgwriter behavior

From
Tom Lane
Date:
"Marc G. Fournier" <scrappy@postgresql.org> writes:
> On Fri, 7 Jan 2005, Bruce Momjian wrote:
>> Do we want to add this additional log info to CVS for 8.0?

> No, unless we're looking for an RC5?

I vote no as well.  While it's probably not a dangerous change, the need
for it has not been demonstrated.

            regards, tom lane

Re: [PATCHES] Bgwriter behavior

From
Bruce Momjian
Date:
Tom Lane wrote:
> "Marc G. Fournier" <scrappy@postgresql.org> writes:
> > On Fri, 7 Jan 2005, Bruce Momjian wrote:
> >> Do we want to add this additional log info to CVS for 8.0?
>
> > No, unless we're looking for an RC5?
>
> I vote no as well.  While it's probably not a dangerous change, the need
> for it has not been demonstrated.

OK, Simon, would you email me a copy of the patch again privately so I
can put it in the 8.1 queue.  I seem to have lost the email.  Thanks.


Re: [PATCHES] Bgwriter behavior

From
Bruce Momjian
Date:
Later version of this patch added to the patch queue.

Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


Simon Riggs wrote:
> On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
> > On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> > > Simon Riggs wrote:
> > >
> > > > Well, I think we're saying: it's not in 8.0 now, and we take our time to
> > > > consider patches for 8.1 and accept the situation that the parameter
> > > > names/meaning will change in next release.
> > >
> > > I have no problem doing something for 8.0 if we can find something that
> > > meets all the items I mentioned.
> > >
> > > One idea would be to just remove bgwriter_percent.  Beta/RC users would
> > > still have it in their postgresql.conf, but it is commented out so it
> > > should be OK.  If they uncomment it their server would not start but we
> > > could just tell testers to remove it.  I see that as better than having
> > > conflicting parameters.
> >
> > Can't say I like that at first thought. I'll think some more though...
> >
> > > Another idea is to have bgwriter_percent be the percent of the buffer it
> > > will scan.
> >
> > Hmmm....well that was my original suggestion (bg2.patch on 12 Dec)
> > (...though with a bug, as Neil pointed out)
> >
> > > We could default that to 50% or 100%, but we then need to
> > > make sure all beta/RC users update their postgresql.conf with the new
> > > default because the commented-out default will not be correct.
> >
> > ...we just differ/ed on what the default should be...
> >
> > > At this point I see these as our only two viable options, aside from
> > > doing nothing.
> >
> > > I realize our current behavior requires a full scan of the buffer cache,
> > > but how often is the bgwriter_maxpages limit met?  If it is not, a full
> > > scan is done anyway, right?
> >
> > Well, if you have a very heavy read workload then that would be a
> > problem. I was more worried about concurrency in a heavy write
> > situation, but I can see your point, and agree.
> >
> > (Idea #1 still suffers from this, so we should rule it out...)
> >
> > > It seems the only way to really add
> > > functionality is to change bgwriter_percent to control how much of the
> > > buffer is scanned.
> >
> > OK. I think you've persuaded me on idea #2, if I understand you right:
> >
> > bgwriter_percent = 50 (default)
> > bgwriter_maxpages = 100 (default)
> >
> > percent is the number of shared_buffers we scan, limited by maxpages.
> >
> > (I'll code it up in a couple of hours when the kids are in bed)
>
> Here's the basic patch - no changes to current default values or docs.
>
> Not sure if this is still interesting or not...
>
> --
> Best Regards, Simon Riggs

[ Attachment, skipping... ]

