Thread: Performance features the 4th

Performance features the 4th

From
Jan Wieck
Date:
I've just uploaded

http://developer.postgresql.org/~wieck/all_performance.v4.74.diff.gz

This patch contains the "still not yet ready" performance improvements 
discussed over the last couple of days.

_Shared buffer replacement_:

The buffer replacement strategy is a slightly modified version of ARC. 
The modifications are specializations of how CDBs get promoted. Since 
PostgreSQL always looks up a buffer multiple times when updating (first 
during the scan, then during heap_update() etc.), every updated 
block would jump right into the T2 (frequently accessed) queue. To prevent 
that, the Xid under which a buffer was added to the T1 queue is remembered, 
and if a block is found in T1, the same transaction will not promote it into 
T2. This also affects blocks accessed like SELECT ... FOR UPDATE; UPDATE, 
as this is a usual strategy and does not mean that this particular datum 
is accessed frequently.
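
The promotion rule described above can be sketched in a few lines of Python. This is only an illustration and not code from the patch: the class name PromotionGuard and its methods are hypothetical, the ghost queues B1/B2 and all eviction logic are left out, and refreshing a block's position on a same-transaction hit is an assumption the message does not spell out.

```python
from collections import OrderedDict


class PromotionGuard:
    """Toy T1/T2 pair that models only the Xid-based promotion rule."""

    def __init__(self):
        # Both queues are ordered from LRU (first) to MRU (last).
        self.t1 = OrderedDict()  # block -> xid that added it to T1
        self.t2 = OrderedDict()  # block -> None

    def access(self, block, xid):
        if block in self.t2:
            # Already considered frequently accessed: just refresh it.
            self.t2.move_to_end(block)
        elif block in self.t1:
            if self.t1[block] == xid:
                # Same transaction looking at the block again
                # (scan, then heap_update()): no promotion.
                # Assumption: the position inside T1 is refreshed.
                self.t1.move_to_end(block)
            else:
                # A different transaction found it in T1: promote to T2.
                del self.t1[block]
                self.t2[block] = None
        else:
            # First sighting: enter T1 and remember who brought it in.
            self.t1[block] = xid
```

With this rule, two lookups of the same block under Xid 100 leave it in T1, and only a later lookup under a different Xid moves it to T2.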

Blocks faulted in by vacuum are handled specially: they end up at 
the LRU end of the T1 queue, and when evicted from there their CDB gets 
destroyed instead of being added to the B1 queue, to prevent vacuum from 
polluting the cache's autotuning.
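
A rough Python sketch of that special case, assuming a toy T1 queue with a fixed capacity. The names T1Queue, fault_in and evict are made up for illustration, B1 is left unbounded here although real ARC limits it, and nothing below is taken from the patch itself.

```python
from collections import deque


class T1Queue:
    """Toy model of T1 plus its ghost list B1 (hypothetical names)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.t1 = deque()  # left = LRU end, right = MRU end; (block, by_vacuum)
        self.b1 = deque()  # ghost entries: CDBs kept after the buffer is gone

    def fault_in(self, block, by_vacuum=False):
        if len(self.t1) >= self.capacity:
            self.evict()
        if by_vacuum:
            # Vacuum's blocks go to the LRU end, so they are evicted first.
            self.t1.appendleft((block, True))
        else:
            # Normal placement at the MRU end.
            self.t1.append((block, False))

    def evict(self):
        block, by_vacuum = self.t1.popleft()
        if not by_vacuum:
            # Regular blocks leave a ghost entry that feeds the autotuning.
            self.b1.append(block)
        # For vacuum-faulted blocks the CDB is simply dropped.
        return block
```

The point of the sketch is the asymmetry in evict(): a block that vacuum dragged in never reaches B1, so a later hit on it cannot shift the balance between the queues.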

A guc variable
    buffer_strategy_status_interval = 0 # 0-600 seconds

controls DEBUG1 messages every n seconds showing the current queue sizes 
and the cache hit rates during the last interval.


_Vacuum page delay_:

Tom Lane's napping during vacuums with another tuning option. I replaced 
the usleep() call with a PG_DELAY(msec) macro in miscadmin.h, which does 
use select(2) instead. That should address the possible portability 
problems.
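
For readers unfamiliar with the trick: calling select() with no file descriptors and only a timeout is a classic way to sleep with sub-second resolution. The following is a Python analogue, not the PG_DELAY macro from the patch; the function name pg_delay and the injectable _select parameter are assumptions made for the example. Python documents that passing three empty lists works on Unix but not on Windows.

```python
import select


def pg_delay(msec, _select=select.select):
    """Sleep for msec milliseconds via select() with no descriptors.

    _select exists only so the call can be replaced in tests.
    """
    if msec > 0:
        # No descriptors to watch, so select() just waits for the timeout.
        _select([], [], [], msec / 1000.0)
```

A call such as pg_delay(10) then blocks for roughly 10 milliseconds, subject to the timer granularity of the operating system.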

The config options
    vacuum_page_group_delay = 0  # 0-100 milliseconds
    vacuum_page_group_size  = 10 # 1-1000 pages

control how many pages get vacuumed as a group and how long vacuum will 
nap between groups.
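
The interplay of the two options can be pictured with a small Python loop. It is a sketch under assumptions, not the patch's code: vacuum_pages, process_page and the injectable sleep argument are hypothetical, and treating a delay of 0 as "never nap" is my reading of the default.

```python
import time


def vacuum_pages(pages, process_page, group_size=10, group_delay_msec=0,
                 sleep=time.sleep):
    """Process pages, napping group_delay_msec after every group_size pages.

    Returns the number of naps taken.
    """
    naps = 0
    for count, page in enumerate(pages, start=1):
        process_page(page)
        # A delay of 0 is assumed to switch the napping off entirely.
        if group_delay_msec > 0 and count % group_size == 0:
            sleep(group_delay_msec / 1000.0)
            naps += 1
    return naps
```

With group_size = 10 and a 20 ms delay, vacuuming 35 pages would nap three times, after pages 10, 20 and 30.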

I think this can be improved further if vacuum gets feedback from the 
buffer manager on whether a page was found clean or already dirty in 
the cache, or had to be faulted in. This, together with whether vacuum 
actually dirties the page, would result in a sort of "vacuum page cost" 
that is accumulated and controls how often to nap. Vacuuming a 
page that is found in the cache and has no dead tuples is cheap, but 
vacuuming a page that caused another dirty block to get evicted, then had 
to be read in, and finally ends up dirty because of dead tuples is expensive.
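
To make the "vacuum page cost" idea concrete, here is a small Python sketch. All weights and the nap threshold are invented for the example, since the message proposes the idea without numbers, and the names PageReport, page_cost and count_naps are hypothetical. The distinction between finding a page clean or already dirty in the cache is left out to keep it short.

```python
from dataclasses import dataclass

# Hypothetical weights; the proposal does not give any values.
COST_HIT = 1            # page was found in the buffer cache
COST_MISS = 10          # page had to be read in
COST_EVICT_DIRTY = 20   # reading it in forced out a dirty block
COST_DIRTIED = 20       # vacuum itself dirtied the page
COST_LIMIT = 200        # accumulated cost that triggers a nap


@dataclass
class PageReport:
    """Feedback the buffer manager would hand back for one page."""
    cache_hit: bool
    evicted_dirty: bool
    dirtied_by_vacuum: bool


def page_cost(report):
    cost = COST_HIT if report.cache_hit else COST_MISS
    if report.evicted_dirty:
        cost += COST_EVICT_DIRTY
    if report.dirtied_by_vacuum:
        cost += COST_DIRTIED
    return cost


def count_naps(reports, limit=COST_LIMIT):
    """Accumulate page costs and count how often vacuum would nap."""
    balance = 0
    naps = 0
    for report in reports:
        balance += page_cost(report)
        if balance >= limit:
            naps += 1
            balance = 0
    return naps
```

Under these made-up weights, a thousand clean cache hits cause only a handful of naps, while a thousand pages that each evict a dirty block and end up dirty cause a nap every few pages.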


_Lazy checkpoint_:

This is the checkpoint process with the ability to schedule the buffer 
flushing over some time. Also the buffers are written in an order told 
by the buffer replacement strategy. Currently that is a merged list of 
dirty buffers in the order of the T1 and T2 queues of ARC. Since buffers 
are replaced in that order, it causes backends to find clean buffers for 
eviction more often.

The config options
    lazy_checkpoint_time = 0        # 0-3600 seconds
    lazy_checkpoint_group_size = 50 # 10-1000 pages
    lazy_checkpoint_maxdelay = 500  # 100-1000 milliseconds

control how long the buffer flushing "should" take and how many dirty pages 
to write as a group before syncing and napping. The maxdelay is a 
parameter that keeps a really small amount of changes from being spread 
out over that long.
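
One plausible reading of how the three options combine, sketched in Python. This is my interpretation and not the patch's arithmetic: I assume the flush time is divided evenly over the write groups, that maxdelay caps the nap between groups, and that a time of 0 disables spreading. The function name checkpoint_nap_msec is hypothetical.

```python
import math


def checkpoint_nap_msec(n_dirty, lazy_checkpoint_time=0, group_size=50,
                        maxdelay_msec=500):
    """Nap between write groups, under the assumptions stated above."""
    if lazy_checkpoint_time <= 0 or n_dirty <= 0:
        # Assumed: 0 means flush as fast as possible.
        return 0
    groups = math.ceil(n_dirty / group_size)
    # Spread the target duration evenly over all groups ...
    even_spread = lazy_checkpoint_time * 1000.0 / groups
    # ... but never nap longer than maxdelay between two groups.
    return min(even_spread, maxdelay_msec)
```

With 100000 dirty pages and a 300 second target, this gives 2000 groups and a 150 ms nap. With only 100 dirty pages the even spread would be 150 seconds per group, so the 500 ms cap applies and the small checkpoint finishes in about a second instead of five minutes.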

The syncing is currently done in a new function in md.c, mdfsyncrecent(), 
called through the smgr. The intention is to maintain some LRU of 
written-to file descriptors and pg_fdatasync() them. I haven't found the 
right place for that yet, so it simply does a system global sync().

My idea here is that it really does not matter how accurate the single 
files are forced to disk during this, all we care for is to cause some 
physical writes performed by the kernel while we're writing them out, 
and not to buffer those writes in the OS until we finish the checkpoint.

The lazy checkpoint configuration should only affect automatic 
checkpoints started by postmaster because a checkpoint_timeout occurred. 
Actually, it seems to apply to manually started checkpoints as well. 
BufferSync() monitors the time to finish, held in shared memory, so it 
would be relatively easy to hurry up a running lazy checkpoint by 
setting that to zero. It's just that the postmaster can't do that 
because it does not have a PGPROC structure and therefore can't lock 
that shmem structure. This is a must-fix item because hurrying up the 
checkpointer is critical at shutdown time.


_TODO_:

* Replace the global sync() in mdfsyncrecent(int max) with calls to
  pg_fdatasync()

* Add functionality to postmaster to hurry up a running checkpoint
  at shutdown.

* Make sure that manual checkpoints are not affected by the lazy
  checkpoint config options and that they too hurry up a running one.

* Further improve vacuum's napping strategy depending on actual caused
  IO per page.


_NOTE_:

The core team is well aware of the high demand for these features. As 
things stand however, it is impossible to get this functionality 
released in version 7.4.

That does not mean that we have no chance to include some or all of the 
functionality in a subsequent 7.4.x release. But for that to happen, the 
TODOs mentioned above must get done first. Further, we need a 
good amount of evidence that these changes actually achieve the desired 
effect to a degree that justifies breaking our "no features in dot 
releases" rule. We also need a good amount of evidence that the features 
don't break anything or sacrifice stability and that a backward 
compatible behaviour (where possible ... not possible with ARC vs. LRU) 
is the default.

I personally would like to see this work included in a 7.4.x release. 
But it requires people to actually run tests, stress some hardware, 
check platform portability and *give us feedback*, because this is what 
we get for the release candidates and these improvements can under no 
circumstances be of any lower quality than that. If this goes into a 7.4.x 
release and there is any platform dependent issue in it, it endangers 
the timely fix of other bugs for those platforms, and that's a no-go.


Happy testing


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Performance features the 4th

From
Manfred Spraul
Date:
Jan Wieck wrote:

>
> _Vacuum page delay_:
>
> Tom Lane's napping during vacuums with another tuning option. I 
> replaced the usleep() call with a PG_DELAY(msec) macro in miscadmin.h, 
> which does use select(2) instead. That should address the possible 
> portability problems.

What about skipping the delay if there are no outstanding disk 
operations? Then vacuum would get the full disk bandwidth if the system 
is idle.

--   Manfred





Re: Performance features the 4th

From
Neil Conway
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> This patch contains the "still not yet ready" performance improvements
> discussed over the couple last days.

Cool stuff!

> The buffer replacement strategy is a slightly modified version of
> ARC.

BTW Jan, I got your message about taking a look at the ARC code; I'm
really busy at the moment, but I'll definitely take a look at it when
I get a chance.

> I personally would like to see this work included in a 7.4.x
> release.

Personally, I can't see any circumstance under which I would view this
as appropriate for integration into the 7.4 branch -- the changes this
patch introduces are pretty fundamental to the system; even with
testing I'd rather not see a stable release series potentially
destabilized. Furthermore, it's not as if these performance issues
have been recently discovered: we've been aware of most of them for at
least one or two prior releases (if not much longer).

-Neil



Re: Performance features the 4th

From
Jan Wieck
Date:
Manfred Spraul wrote:

> Jan Wieck wrote:
> 
>>
>> _Vacuum page delay_:
>>
>> Tom Lane's napping during vacuums with another tuning option. I 
>> replaced the usleep() call with a PG_DELAY(msec) macro in miscadmin.h, 
>> which does use select(2) instead. That should address the possible 
>> portability problems.
> 
> What about skipping the delay if there are no outstanding disk 
> operations? Then vacuum would get the full disk bandwidth if the system 
> is idle.

All we could do is to monitor our own recent activity. I doubt that 
anything else would be portable. And on a dedicated DB server that is 
very close to the truth anyway.

How portable is getrusage()? Could the postmaster issue that frequently 
for RUSAGE_CHILDREN and leave the result somewhere in the shared memory 
for whoever is concerned?


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Performance features the 4th

From
Jan Wieck
Date:
Neil Conway wrote:

> Jan Wieck <JanWieck@Yahoo.com> writes:
>> This patch contains the "still not yet ready" performance improvements
>> discussed over the couple last days.
> 
> Cool stuff!
> 
>> The buffer replacement strategy is a slightly modified version of
>> ARC.
> 
> BTW Jan, I got your message about taking a look at the ARC code; I'm
> really busy at the moment, but I'll definitely take a look at it when
> I get a chance.
> 
>> I personally would like to see this work included in a 7.4.x
>> release.
> 
> Personally, I can't see any circumstance under which I would view this
> as appropriate for integration into the 7.4 branch -- the changes this
> patch introduces are pretty fundamental to the system; even with
> testing I'd rather not see a stable release series potentially
> destabilized. Furthermore, it's not as if these performance issues
> have been recently discovered: we've been aware of most of them for at
> least one or two prior releases (if not much longer).

There are many aspects to this, and a full consensus will probably not 
be reachable.

As a matter of fact, people who have performance problems are likely to 
be the same who have upgrade problems. And as Gaetano pointed out 
correctly, we will see wildforms with one or the other feature applied.

My opinion is that it is best for us as supporters and for the 
reputation of PostgreSQL to try to keep the number of wildforms as small 
as possible and to provide those features applied in the best possible 
quality.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Performance features the 4th

From
Andrew Dunstan
Date:
Jan Wieck wrote:

>
> How portable is getrusage()? Could the postmaster issue that 
> frequently for RUSAGE_CHILDREN and leave the result somewhere in the 
> shared memory for whoever is concerned?
>
SVr4, BSD4.3, SUS2 and POSIX1003.1, I believe.

I also believe there is a M$ dll available that gives that functionality 
(psapi.dll).

cheers

andrew



Re: Performance features the 4th

From
Jan Wieck
Date:
Andrew Dunstan wrote:

> Jan Wieck wrote:
> 
>>
>> How portable is getrusage()? Could the postmaster issue that 
>> frequently for RUSAGE_CHILDREN and leave the result somewhere in the 
>> shared memory for whoever is concerned?
>>
> SVr4, BSD4.3, SUS2 and POSIX1003.1, I believe.
> 
> I also believe there is a M$ dll available that gives that functionality 
> (psapi.dll).

Remains the question when it is updated, the manpage doesn't tell. If 
the RUSAGE_CHILDREN information is updated only when the child exits, 
each backend has to do it.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Performance features the 4th

From
Andrew Sullivan
Date:
On Wed, Nov 05, 2003 at 03:08:53PM -0500, Neil Conway wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
> > I personally would like to see this work included in a 7.4.x
> > release.
> 
> Personally, I can't see any circumstance under which I would view this
> as appropriate for integration into the 7.4 branch -- the changes this

As unhappy as I am to say so, I agree strongly.  Dot releases don't
get anything like enough testing to make me comfortable with putting
this kind of patch into such a release.  I'm just a user though.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110



Re: Performance features the 4th

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> Manfred Spraul wrote:
>> What about skipping the delay if there are no outstanding disk 
>> operations?

> How portable is getrusage()? Could the postmaster issue that frequently 
> for RUSAGE_CHILDREN and leave the result somewhere in the shared memory 
> for whoever is concerned?

How would that tell you about currently outstanding operations?

Manfred's idea is interesting but AFAICS completely unimplementable
in any portable fashion.  You'd have to have hooks into the kernel.
        regards, tom lane


Re: Performance features the 4th

From
Kurt Roeckx
Date:
On Wed, Nov 05, 2003 at 03:49:54PM -0500, Jan Wieck wrote:
> Andrew Dunstan wrote:
> 
> >Jan Wieck wrote:
> >
> >>
> >>How portable is getrusage()? Could the postmaster issue that 
> >>frequently for RUSAGE_CHILDREN and leave the result somewhere in the 
> >>shared memory for whoever is concerned?
> >>
> >SVr4, BSD4.3, SUS2 and POSIX1003.1, I believe.
> >
> >I also believe there is a M$ dll available that gives that functionality 
> >(psapi.dll).
> 
> Remains the question when it is updated, the manpage doesn't tell. If 
> the RUSAGE_CHILDREN information is updated only when the child exits, 
> each backend has to do it.

"If the value of the who argument is RUSAGE_CHILDREN,
information shall be returned about resources used by the
terminated and waited-for children of the current process"
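
The passage quoted above can be observed from Python, whose resource module wraps getrusage() on Unix-like systems. The sketch below is only an illustration: the helper names and the CPU-burning child command are made up, and the exact numbers depend on the machine.

```python
import resource
import subprocess
import sys


def children_cpu_seconds():
    """CPU time (user + system) charged to terminated, waited-for children."""
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return usage.ru_utime + usage.ru_stime


def measure_child():
    before = children_cpu_seconds()
    # Hypothetical workload: a child interpreter that burns a little CPU.
    child = subprocess.Popen([sys.executable, "-c", "sum(range(2000000))"])
    while_running = children_cpu_seconds()  # child has not been waited for
    child.wait()                            # reaps the child
    after = children_cpu_seconds()
    return before, while_running, after
```

The child's CPU time shows up in the RUSAGE_CHILDREN figures only once it has been waited for, which is why a postmaster polling this value would learn nothing about backends that are still running.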


Kurt



Re: Performance features the 4th

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> As a matter of fact, people who have performance problems are likely to 
> be the same who have upgrade problems. And as Gaetano pointed out 
> correctly, we will see wildforms with one or the other feature applied.

I'd believe that for patches of the size of my original VACUUM-delay
hack (or even a production-grade version of same, which'd probably be
10x larger).  The kind of wholesale rewrite you are currently proposing
is much too large to consider folding back into 7.4.*, IMHO.
        regards, tom lane


Re: Performance features the 4th

From
Manfred Spraul
Date:
Tom Lane wrote:

>Manfred's idea is interesting but AFAICS completely unimplementable
>in any portable fashion.  You'd have to have hooks into the kernel.
>  
>
I thought about outstanding operations from postgres - I don't know 
enough about the buffer layer if it's possible to keep a counter of the 
currently running read() and write() operations, or something similar.
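
A counter of that kind could look roughly like the following Python sketch. It uses threads and a lock as a stand-in for backends and shared memory, and the names IoTracker, operation, busy and maybe_nap are hypothetical; it says nothing about whether the buffer layer can actually be instrumented this way.

```python
import threading
from contextlib import contextmanager


class IoTracker:
    """Counts read()/write() calls that are currently in progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0

    @contextmanager
    def operation(self):
        # Wrap each read() or write() call in this context manager.
        with self._lock:
            self._in_flight += 1
        try:
            yield
        finally:
            with self._lock:
                self._in_flight -= 1

    def busy(self):
        with self._lock:
            return self._in_flight > 0


def maybe_nap(tracker, nap):
    """Nap only if somebody else has I/O outstanding; report what happened."""
    if tracker.busy():
        nap()
        return True
    return False
```

Vacuum would call maybe_nap() where it naps unconditionally today, so an idle system would let it run at full speed. Note that this only sees I/O issued by PostgreSQL itself, not by other processes on the machine.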

--   Manfred



Re: Performance features the 4th

From
"Matthew T. O'Connor"
Date:
Tom Lane wrote:

>Jan Wieck <JanWieck@Yahoo.com> writes:
>  
>
>>As a matter of fact, people who have performance problems are likely to 
>>be the same who have upgrade problems. And as Gaetano pointed out 
>>correctly, we will see wildforms with one or the other feature applied.
>>    
>>
>
>I'd believe that for patches of the size of my original VACUUM-delay
>hack (or even a production-grade version of same, which'd probably be
>10x larger).  The kind of wholesale rewrite you are currently proposing
>is much too large to consider folding back into 7.4.*, IMHO.
>  
>
Do people think that the VACUUM-delay patch by itself would be useful 
enough on its own to consider working it into 7.4.1 or something?  From 
the little feedback I have read on the VACUUM-delay patch used in 
isolation, it certainly does help.  I would love to see it put into 7.4 
somehow. 

The far more rigorous changes that Jan is working on will be welcome 
improvements for 7.5.



Re: Performance features the 4th

From
Bruce Momjian
Date:
Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
> > As a matter of fact, people who have performance problems are likely to 
> > be the same who have upgrade problems. And as Gaetano pointed out 
> > correctly, we will see wildforms with one or the other feature applied.
> 
> I'd believe that for patches of the size of my original VACUUM-delay
> hack (or even a production-grade version of same, which'd probably be
> 10x larger).  The kind of wholesale rewrite you are currently proposing
> is much too large to consider folding back into 7.4.*, IMHO.

What Jan could do is to have a 7.4 patch available that people can test,
and he can improve it during the 7.5 development cycle with feedback
from users.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Performance features the 4th

From
Christopher Browne
Date:
A long time ago, in a galaxy far, far away, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
> Tom Lane wrote:
>> Jan Wieck <JanWieck@Yahoo.com> writes:
>> > As a matter of fact, people who have performance problems are likely to 
>> > be the same who have upgrade problems. And as Gaetano pointed out 
>> > correctly, we will see wildforms with one or the other feature applied.
>> 
>> I'd believe that for patches of the size of my original VACUUM-delay
>> hack (or even a production-grade version of same, which'd probably be
>> 10x larger).  The kind of wholesale rewrite you are currently proposing
>> is much too large to consider folding back into 7.4.*, IMHO.
>
> What Jan could do is to have a 7.4 patch available that people can test,
> and he can improve it during the 7.5 development cycle with feedback
> from users.

The thing is, there are two patches that seem likely to be of
interest:
 a) There's the ARC changes, which really feel like they are 7.5
    development, not likely to be readily backportable;

 b) On the other hand, a "simple delay" on the VACUUM seems likely
    to be useful, and reasonably backportable.

And these are two quite different things, both of which may be worth
having.
-- 
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','acm.org').
http://www.ntlug.org/~cbbrowne/unix.html
If I could put Klein in a bottle...


Re: Performance features the 4th

From
Bruce Momjian
Date:
Christopher Browne wrote:
> A long time ago, in a galaxy far, far away, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
> > Tom Lane wrote:
> >> Jan Wieck <JanWieck@Yahoo.com> writes:
> >> > As a matter of fact, people who have performance problems are likely to 
> >> > be the same who have upgrade problems. And as Gaetano pointed out 
> >> > correctly, we will see wildforms with one or the other feature applied.
> >> 
> >> I'd believe that for patches of the size of my original VACUUM-delay
> >> hack (or even a production-grade version of same, which'd probably be
> >> 10x larger).  The kind of wholesale rewrite you are currently proposing
> >> is much too large to consider folding back into 7.4.*, IMHO.
> >
> > What Jan could do is to have a 7.4 patch available that people can test,
> > and he can improve it during the 7.5 development cycle with feedback
> > from users.
> 
> The thing is, there are two patches that seem likely to be of
> interest:
> 
>  a) There's the ARC changes, which really feel like they are 7.5
>     development, not likely to be readily backportable;
> 
>  b) On the other hand, a "simple delay" on the VACUUM seems likely
>     to be useful, and reasonably backportable.
> 
> And these are two quite different things, both of which may be worth
> having.

Yes, Tom has already said "b" is possible in a 7.4.X subrelease, but not
for 7.4.0.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Performance features the 4th

From
Jan Wieck
Date:
Christopher Browne wrote:

> A long time ago, in a galaxy far, far away, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
>> Tom Lane wrote:
>>> Jan Wieck <JanWieck@Yahoo.com> writes:
>>> > As a matter of fact, people who have performance problems are likely to 
>>> > be the same who have upgrade problems. And as Gaetano pointed out 
>>> > correctly, we will see wildforms with one or the other feature applied.
>>> 
>>> I'd believe that for patches of the size of my original VACUUM-delay
>>> hack (or even a production-grade version of same, which'd probably be
>>> 10x larger).  The kind of wholesale rewrite you are currently proposing
>>> is much too large to consider folding back into 7.4.*, IMHO.
>>
>> What Jan could do is to have a 7.4 patch available that people can test,
>> and he can improve it during the 7.5 development cycle with feedback
>> from users.
> 
> The thing is, there are two patches that seem likely to be of
> interest:
> 
>  a) There's the ARC changes, which really feel like they are 7.5
>     development, not likely to be readily backportable;
> 
>  b) On the other hand, a "simple delay" on the VACUUM seems likely
>     to be useful, and reasonably backportable.
> 
> And these are two quite different things, both of which may be worth
> having.


I only need to know the three W's, when, what and where (when do people 
want what pieces of the stuff where?).

However, I have not seen much evidence yet that the vacuum delay alone 
does that much. In conjunction with putting vacuum dirtied blocks at LRU 
instead of MRU maybe, but that's again another functional change. So I 
am not sure what the outcome of that for 7.4 is. The general opinion is 
that the whole thing is too much. But nobody has done anything to show 
how the vacuum delay alone compares to that.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Performance features the 4th

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> However, I have not seen much evidence yet that the vacuum delay alone 
> does that much.

Gaetano and a couple of other people did experiments that seemed to show
it was useful.  I think we'd want to change the shape of the knob per
later suggestions (sleep 10 ms every N blocks, instead of N ms every
block) but it did seem that there was useful bang for little buck there.
        regards, tom lane


Re: Performance features the 4th

From
Jan Wieck
Date:
Tom Lane wrote:

> Jan Wieck <JanWieck@Yahoo.com> writes:
>> However, I have not seen much evidence yet that the vacuum delay alone 
>> does that much.
> 
> Gaetano and a couple of other people did experiments that seemed to show
> it was useful.  I think we'd want to change the shape of the knob per
> later suggestions (sleep 10 ms every N blocks, instead of N ms every
> block) but it did seem that there was useful bang for little buck there.

I thought it was "sleep N ms every M blocks".

Have we seen any numbers? Anything at all? Something that gives us a 
clue by what factor one has to multiply the total time a "VACUUM 
ANALYZE" takes, to get what effect in return?


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Performance features the 4th

From
"Matthew T. O'Connor"
Date:
----- Original Message ----- 
From: "Jan Wieck" <JanWieck@Yahoo.com>
> Tom Lane wrote:
> > Gaetano and a couple of other people did experiments that seemed to show
> > it was useful.  I think we'd want to change the shape of the knob per
> > later suggestions (sleep 10 ms every N blocks, instead of N ms every
> > block) but it did seem that there was useful bang for little buck there.
>
> I thought it was "sleep N ms every M blocks".
>
> Have we seen any numbers? Anything at all? Something that gives us a
> clue by what factor one has to multiply the total time a "VACUUM
> ANALYZE" takes, to get what effect in return?

I have some time on Sunday to do some testing.  Is there a patch that I can
apply that implements either of the two options? (sleep 10ms every M blocks
or sleep N ms every M blocks).

I know Tom posted the original patch that slept N ms every 1 block (where N
is > 10 due to OS limitations).  Jan can you post a patch that has just the
sleep code in it? Or should it be easy enough for me to cull out of the
larger patch you posted?



Re: Performance features the 4th

From
"scott.marlowe"
Date:
On Fri, 7 Nov 2003, Matthew T. O'Connor wrote:

> ----- Original Message ----- 
> From: "Jan Wieck" <JanWieck@Yahoo.com>
> > Tom Lane wrote:
> > > Gaetano and a couple of other people did experiments that seemed to show
> > > it was useful.  I think we'd want to change the shape of the knob per
> > > later suggestions (sleep 10 ms every N blocks, instead of N ms every
> > > block) but it did seem that there was useful bang for little buck there.
> >
> > I thought it was "sleep N ms every M blocks".
> >
> > Have we seen any numbers? Anything at all? Something that gives us a
> > clue by what factor one has to multiply the total time a "VACUUM
> > ANALYZE" takes, to get what effect in return?
> 
> I have some time on sunday to do some testing.  Is there a patch that I can
> apply that implements either of the two options? (sleep 10ms every M blocks
> or sleep N ms every M blocks).
> 
> I know Tom posted the original patch that sleept N ms every 1 block (where N
> is > 10 due to OS limitations).  Jan can you post a patch that has just the
> sleep code in it? Or should it be easy enough for me to cull out of the
> larger patch you posted?

The reason for the change is that the minimum sleep period on many systems 
is 10 ms, which meant that vacuum was running 20X slower than normal.  
While it might be necessary in certain very I/O starved situations to make 
it this slow, it would probably be better to be able to get a vacuum that 
ran at about 1/2 to 1/5 speed for most folks.  So, since the delta can't 
be less than 10 ms on most systems, it's better to just leave it at a fixed 
amount and change the number of pages vacuumed per sleep.
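
The arithmetic behind those figures can be written down as a short Python function. The 0.5 ms per page used in the note after the code is an assumption picked so that a 10 ms nap after every page lands near the 20X figure; the function name vacuum_slowdown is hypothetical.

```python
def vacuum_slowdown(page_msec, nap_msec, pages_per_nap):
    """Factor by which napping stretches vacuum's total run time."""
    # Time spent actually working on one group of pages.
    work = page_msec * pages_per_nap
    # Each group costs its work time plus one nap.
    return (work + nap_msec) / work
```

Assuming 0.5 ms of work per page and a fixed 10 ms nap, napping after every page gives a factor of 21, napping every 5 pages gives a factor of 5 (1/5 speed), and napping every 20 pages gives a factor of 2 (1/2 speed).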

I'm certainly gonna test the patch out too.  We aren't really I/O bound, 
but it would be nice to have a database that only slowed down ~1% or so 
during vacuuming.



Re: Performance features the 4th

From
"Stephen"
Date:
Yes, I would like to see the vacuum delay patch go into 7.4.1 if possible.
It's really useful. I don't think there is any major risk in adding the
delay patch into a minor revision given the small amount of code change.

Stephen


""Matthew T. O'Connor"" <matthew@zeut.net> wrote in message
news:3FA97470.3020803@zeut.net...
> Tom Lane wrote:
>
> >Jan Wieck <JanWieck@Yahoo.com> writes:
> >
> >
> >>As a matter of fact, people who have performance problems are likely to
> >>be the same who have upgrade problems. And as Gaetano pointed out
> >>correctly, we will see wildforms with one or the other feature applied.
> >>
> >>
> >
> >I'd believe that for patches of the size of my original VACUUM-delay
> >hack (or even a production-grade version of same, which'd probably be
> >10x larger).  The kind of wholesale rewrite you are currently proposing
> >is much too large to consider folding back into 7.4.*, IMHO.
> >
> >
> Do people think that the VACUUM-delay patch by itself, would be usefully
> enough on it's own to consider working it into 7.4.1 or something?  From
> the little feedback I have read on the VACUUM-delay patch used in
> isolation, it certainly does help.  I would love to see it put into 7.4
> somehow.
>
> The far more rigorous changes that Jan is working on, will be welcome
> improvements for 7.5.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>




Re: Performance features the 4th

From
Gaetano Mendola
Date:
Tom Lane wrote:

> Jan Wieck <JanWieck@Yahoo.com> writes:
> 
>>However, I have not seen much evidence yet that the vacuum delay alone 
>>does that much.
> 
> 
> Gaetano and a couple of other people did experiments that seemed to show
> it was useful.  I think we'd want to change the shape of the knob per
> later suggestions (sleep 10 ms every N blocks, instead of N ms every
> block) but it did seem that there was useful bang for little buck there.

Right, I'd like to try the patch now: "sleep N ms every M blocks".
Can you please post this patch?

BTW, I'll see if I'm able to apply it also to a 7.3.X (our production
DB).

Regards
Gaetano Mendola



Re: Performance features the 4th

From
Jan Wieck
Date:
scott.marlowe wrote:

> On Fri, 7 Nov 2003, Matthew T. O'Connor wrote:
> 
>> ----- Original Message ----- 
>> From: "Jan Wieck" <JanWieck@Yahoo.com>
>> > Tom Lane wrote:
>> > > Gaetano and a couple of other people did experiments that seemed to show
>> > > it was useful.  I think we'd want to change the shape of the knob per
>> > > later suggestions (sleep 10 ms every N blocks, instead of N ms every
>> > > block) but it did seem that there was useful bang for little buck there.
>> >
>> > I thought it was "sleep N ms every M blocks".
>> >
>> > Have we seen any numbers? Anything at all? Something that gives us a
>> > clue by what factor one has to multiply the total time a "VACUUM
>> > ANALYZE" takes, to get what effect in return?
>> 
>> I have some time on sunday to do some testing.  Is there a patch that I can
>> apply that implements either of the two options? (sleep 10ms every M blocks
>> or sleep N ms every M blocks).
>> 
>> I know Tom posted the original patch that sleept N ms every 1 block (where N
>> is > 10 due to OS limitations).  Jan can you post a patch that has just the
>> sleep code in it? Or should it be easy enough for me to cull out of the
>> larger patch you posted?
> 
> The reason for the change is that the minumum sleep period on many systems 
> is 10mS, which meant that vacuum was running 20X slower than normal.  
> While it might be necessary in certain very I/O starved situations to make 
> it this slow, it would probably be better to be able to get a vacuum that 
> ran at about 1/2 to 1/5 speed for most folks.  So, since the delta can't 
> less than 10mS on most systems, it's better to just leave it at a fixed 
> amount and change the number of pages vacuumed per sleep.

I disagree with that. If you limit yourself to the number of pages being 
the only knob you have and set the napping time fixed, you can only 
lower the number of sequentially read pages to slow it down. Making read 
ahead absurd in an IO starved situation ...

I'll post a patch doing
    every N pages nap for M milliseconds

using two GUC variables and based on a select(2) call later.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Performance features the 4th

From
Jan Wieck
Date:
Matthew T. O'Connor wrote:

> ----- Original Message -----
> From: "Jan Wieck" <JanWieck@Yahoo.com>
>> Tom Lane wrote:
>> > Gaetano and a couple of other people did experiments that seemed to show
>> > it was useful.  I think we'd want to change the shape of the knob per
>> > later suggestions (sleep 10 ms every N blocks, instead of N ms every
>> > block) but it did seem that there was useful bang for little buck there.
>>
>> I thought it was "sleep N ms every M blocks".
>>
>> Have we seen any numbers? Anything at all? Something that gives us a
>> clue by what factor one has to multiply the total time a "VACUUM
>> ANALYZE" takes, to get what effect in return?
>
> I have some time on sunday to do some testing.  Is there a patch that I can
> apply that implements either of the two options? (sleep 10ms every M blocks
> or sleep N ms every M blocks).
>
> I know Tom posted the original patch that slept N ms every 1 block (where N
> is > 10 due to OS limitations).  Jan can you post a patch that has just the
> sleep code in it? Or should it be easy enough for me to cull out of the
> larger patch you posted?

Sorry for the delay, had to finish some other concept yesterday (will be
published soon).

The attached patch adds

     vacuum_group_delay_size = 10 (range 1-1000)
     vacuum_group_delay_msec = 0  (range 0-1000)

and does the sleeping via select(2). It does it only at the same places
where Tom had done the usleep() in his hack, so I guess there is still
some more to do besides the documentation, before it can be added to
7.4.1. But it should be enough to get some testing done.
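
For readers who want to see how the two settings interact before
applying anything, the following is a stand-alone sketch of the counting
logic in C. It mirrors the "count pages, nap when the group is full"
check, but the struct and function names (NapState, nap_due,
total_nap_msec) are invented for this illustration, and it returns the
nap length instead of actually sleeping.

```c
#include <assert.h>

/* Illustrative stand-ins for the two settings and the shared counter */
typedef struct NapState
{
    int group_size;   /* pages per group (cf. vacuum_group_delay_size) */
    int nap_msec;     /* nap length in msec; 0 disables napping */
    int count;        /* pages seen since the last nap */
} NapState;

/*
 * Called once per page.  Returns the number of milliseconds the caller
 * should nap now, or 0 if no nap is due.
 */
int
nap_due(NapState *st)
{
    assert(st->group_size >= 1 && st->nap_msec >= 0);

    if (st->nap_msec > 0 && ++st->count >= st->group_size)
    {
        st->count = 0;
        return st->nap_msec;
    }
    return 0;
}

/* Total nap time, in msec, accumulated over npages page visits. */
long
total_nap_msec(NapState *st, int npages)
{
    long total = 0;
    int  i;

    for (i = 0; i < npages; i++)
        total += nap_due(st);
    return total;
}
```

With a group size of 10 and a 5 ms nap, 95 page visits produce 9 naps
(45 ms) and leave 5 pages counted toward the next group. Since the
counter is shared by every place that checks it, that remainder carries
over into whichever scan runs next. With the nap set to 0 the counter is
never touched.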


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Index: src/backend/access/nbtree/nbtree.c
===================================================================
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/access/nbtree/nbtree.c,v
retrieving revision 1.106
diff -c -b -r1.106 nbtree.c
*** src/backend/access/nbtree/nbtree.c    2003/09/29 23:40:26    1.106
--- src/backend/access/nbtree/nbtree.c    2003/11/09 23:39:36
***************
*** 27,32 ****
--- 27,40 ----
  #include "storage/smgr.h"


+ /*
+  * Variables for vacuum_group_delay option (in commands/vacuumlazy.c)
+  */
+ extern int    vacuum_group_delay_size;    /* vacuum N pages */
+ extern int    vacuum_group_delay_msec;    /* then sleep M msec */
+ extern int    vacuum_group_delay_count;
+
+
  /* Working state for btbuild and its callback */
  typedef struct
  {
***************
*** 610,615 ****
--- 618,632 ----

              CHECK_FOR_INTERRUPTS();

+             if (vacuum_group_delay_msec > 0)
+             {
+                 if (++vacuum_group_delay_count >= vacuum_group_delay_size)
+                 {
+                     PG_DELAY(vacuum_group_delay_msec);
+                     vacuum_group_delay_count = 0;
+                 }
+             }
+
              ndeletable = 0;
              page = BufferGetPage(buf);
              opaque = (BTPageOpaque) PageGetSpecialPointer(page);
***************
*** 736,741 ****
--- 753,769 ----
          Buffer        buf;
          Page        page;
          BTPageOpaque opaque;
+
+         CHECK_FOR_INTERRUPTS();
+
+         if (vacuum_group_delay_msec > 0)
+         {
+             if (++vacuum_group_delay_count >= vacuum_group_delay_size)
+             {
+                 PG_DELAY(vacuum_group_delay_msec);
+                 vacuum_group_delay_count = 0;
+             }
+         }

          buf = _bt_getbuf(rel, blkno, BT_READ);
          page = BufferGetPage(buf);
Index: src/backend/commands/vacuumlazy.c
===================================================================
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/commands/vacuumlazy.c,v
retrieving revision 1.32
diff -c -b -r1.32 vacuumlazy.c
*** src/backend/commands/vacuumlazy.c    2003/09/25 06:57:59    1.32
--- src/backend/commands/vacuumlazy.c    2003/11/09 23:40:13
***************
*** 88,93 ****
--- 88,100 ----
  static TransactionId OldestXmin;
  static TransactionId FreezeLimit;

+ /*
+  * Variables for vacuum_group_delay option (in commands/vacuumlazy.c)
+  */
+ int    vacuum_group_delay_size = 10;    /* vacuum N pages */
+ int    vacuum_group_delay_msec = 0;    /* then sleep M msec */
+ int    vacuum_group_delay_count = 0;
+

  /* non-export function prototypes */
  static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
***************
*** 228,233 ****
--- 235,249 ----

          CHECK_FOR_INTERRUPTS();

+         if (vacuum_group_delay_msec > 0)
+         {
+             if (++vacuum_group_delay_count >= vacuum_group_delay_size)
+             {
+                 PG_DELAY(vacuum_group_delay_msec);
+                 vacuum_group_delay_count = 0;
+             }
+         }
+
          /*
           * If we are close to overrunning the available space for
           * dead-tuple TIDs, pause and do a cycle of vacuuming before we
***************
*** 469,474 ****
--- 485,499 ----

          CHECK_FOR_INTERRUPTS();

+         if (vacuum_group_delay_msec > 0)
+         {
+             if (++vacuum_group_delay_count >= vacuum_group_delay_size)
+             {
+                 PG_DELAY(vacuum_group_delay_msec);
+                 vacuum_group_delay_count = 0;
+             }
+         }
+
          tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
          buf = ReadBuffer(onerel, tblk);
          LockBufferForCleanup(buf);
***************
*** 799,804 ****
--- 824,838 ----
                      hastup;

          CHECK_FOR_INTERRUPTS();
+
+         if (vacuum_group_delay_msec > 0)
+         {
+             if (++vacuum_group_delay_count >= vacuum_group_delay_size)
+             {
+                 PG_DELAY(vacuum_group_delay_msec);
+                 vacuum_group_delay_count = 0;
+             }
+         }

          blkno--;

Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/utils/misc/guc.c,v
retrieving revision 1.164.2.1
diff -c -b -r1.164.2.1 guc.c
*** src/backend/utils/misc/guc.c    2003/11/07 21:27:50    1.164.2.1
--- src/backend/utils/misc/guc.c    2003/11/09 23:27:49
***************
*** 73,78 ****
--- 73,80 ----
  extern int    CommitDelay;
  extern int    CommitSiblings;
  extern char *preload_libraries_string;
+ extern int    vacuum_group_delay_size;
+ extern int    vacuum_group_delay_msec;

  #ifdef HAVE_SYSLOG
  extern char *Syslog_facility;
***************
*** 1188,1193 ****
--- 1190,1213 ----
          },
          &log_min_duration_statement,
          -1, -1, INT_MAX / 1000, NULL, NULL
+     },
+
+     {
+         {"vacuum_group_delay_msec", PGC_USERSET, RESOURCES,
+             gettext_noop("Sets VACUUM's delay in milliseconds between processing groups of pages."),
+             NULL
+         },
+         &vacuum_group_delay_msec,
+         0, 0, 1000, NULL, NULL
+     },
+
+     {
+         {"vacuum_group_delay_size", PGC_USERSET, RESOURCES,
+             gettext_noop("Sets VACUUM's group size for the vacuum_group_delay_msec option."),
+             NULL
+         },
+         &vacuum_group_delay_size,
+         10, 1, 1000, NULL, NULL
      },

      /* End-of-list marker */
Index: src/backend/utils/misc/postgresql.conf.sample
===================================================================
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/utils/misc/postgresql.conf.sample,v
retrieving revision 1.92
diff -c -b -r1.92 postgresql.conf.sample
*** src/backend/utils/misc/postgresql.conf.sample    2003/10/08 03:49:38    1.92
--- src/backend/utils/misc/postgresql.conf.sample    2003/11/09 23:04:21
***************
*** 69,74 ****
--- 69,79 ----
  #max_files_per_process = 1000    # min 25
  #preload_libraries = ''

+ # - Vacuum napping -
+
+ #vacuum_group_delay_size = 10    # range 1-1000 pages ; vacuum this many pages
+ #vacuum_group_delay_msec = 0    # range 0-1000 msec  ; then nap this long
+

  #---------------------------------------------------------------------------
  # WRITE AHEAD LOG
Index: src/include/miscadmin.h
===================================================================
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/include/miscadmin.h,v
retrieving revision 1.134
diff -c -b -r1.134 miscadmin.h
*** src/include/miscadmin.h    2003/09/24 18:54:01    1.134
--- src/include/miscadmin.h    2003/11/09 23:02:03
***************
*** 96,101 ****
--- 96,111 ----
          CritSectionCount--; \
      } while(0)

+ /*
+  * Macro using select(2) to nap for milliseconds
+  */
+ #define PG_DELAY(_msec) \
+ { \
+     struct timeval _delay; \
+     _delay.tv_sec  = (_msec) / 1000; \
+     _delay.tv_usec = ((_msec) % 1000) * 1000; \
+     (void) select(0, NULL, NULL, NULL, &_delay);\
+ }

  /*****************************************************************************
   *      globals.h --                                                             *
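
A side note on the nap itself: a macro whose body is a bare { ... }
block, when followed by a semicolon, cannot be used as the unbraced body
of an if that has an else. One way around that is a plain function. The
sketch below is only an illustration of the same select(2) technique,
not part of the patch; the names msec_to_timeval and nap_msec are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/select.h>

/* Split a millisecond count into the seconds/microseconds select(2) wants. */
struct timeval
msec_to_timeval(int msec)
{
    struct timeval tv;

    assert(msec >= 0);
    tv.tv_sec = msec / 1000;
    tv.tv_usec = (msec % 1000) * 1000;
    return tv;
}

/*
 * Nap for msec milliseconds using select(2) with no file descriptors.
 * Returns select's result: 0 after a full timeout, -1 if the call was
 * interrupted (for example by a signal, with errno set to EINTR).
 */
int
nap_msec(int msec)
{
    struct timeval tv = msec_to_timeval(msec);

    return select(0, NULL, NULL, NULL, &tv);
}
```

A caller would write nap_msec(vacuum_group_delay_msec) wherever the
macro is invoked. Whether an interrupted nap should be resumed or simply
dropped is a design choice this sketch leaves to the caller.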

Re: Performance features the 4th

From
"scott.marlowe"
Date:
On Sun, 9 Nov 2003, Jan Wieck wrote:

> scott.marlowe wrote:
> 
> > On Fri, 7 Nov 2003, Matthew T. O'Connor wrote:
> > 
> >> ----- Original Message ----- 
> >> From: "Jan Wieck" <JanWieck@Yahoo.com>
> >> > Tom Lane wrote:
> >> > > Gaetano and a couple of other people did experiments that seemed to show
> >> > > it was useful.  I think we'd want to change the shape of the knob per
> >> > > later suggestions (sleep 10 ms every N blocks, instead of N ms every
> >> > > block) but it did seem that there was useful bang for little buck there.
> >> >
> >> > I thought it was "sleep N ms every M blocks".
> >> >
> >> > Have we seen any numbers? Anything at all? Something that gives us a
> >> > clue by what factor one has to multiply the total time a "VACUUM
> >> > ANALYZE" takes, to get what effect in return?
> >> 
> >> I have some time on sunday to do some testing.  Is there a patch that I can
> >> apply that implements either of the two options? (sleep 10ms every M blocks
> >> or sleep N ms every M blocks).
> >> 
> >> I know Tom posted the original patch that slept N ms every 1 block (where N
> >> is > 10 due to OS limitations).  Jan can you post a patch that has just the
> >> sleep code in it? Or should it be easy enough for me to cull out of the
> >> larger patch you posted?
> > 
> > The reason for the change is that the minimum sleep period on many systems 
> > is 10 ms, which meant that vacuum was running 20X slower than normal.  
> > While it might be necessary in certain very I/O starved situations to make 
> > it this slow, it would probably be better to be able to get a vacuum that 
> > ran at about 1/2 to 1/5 speed for most folks.  So, since the delta can't 
> > be less than 10 ms on most systems, it's better to just leave it at a fixed 
> > amount and change the number of pages vacuumed per sleep.
> 
> I disagree with that. If the number of pages is the only knob you have 
> and the napping time is fixed, you can only lower the number of 
> sequentially read pages to slow it down, which makes read-ahead absurd 
> in an I/O-starved situation ...
> 
> I'll post a patch doing
> 
>      every N pages nap for M milliseconds
> 
> using two GUC variables and based on a select(2) call later.

I didn't mean "fixed in the code"; I meant fixed in your setup.  I.e. find 
a delay (10 ms, 50, 100 etc...), then vary the number of pages processed at 
a time until you start to notice the load, then back it off.

You aren't forced by the code to use one and only one delay value; you set 
it yourself.