Thread: Re: [GENERAL] Autovacuum Improvements

Re: [GENERAL] Autovacuum Improvements

From
"Simon Riggs"
Date:
On Fri, 2006-12-29 at 20:25 -0300, Alvaro Herrera wrote:
> Christopher Browne wrote:
>
> > Seems to me that you could get ~80% of the way by having the simplest
> > "2 queue" implementation, where tables with size < some threshold get
> > thrown at the "little table" queue, and tables above that size go to
> > the "big table" queue.
> >
> > That should keep any small tables from getting "vacuum-starved."
>
> Hmm, would it make sense to keep 2 queues, one that goes through the
> tables in smaller-to-larger order, and the other one in the reverse
> direction?
>
> I am currently writing a design on how to create "vacuum queues" but I'm
> thinking that maybe it's getting too complex to handle, and a simple
> idea like yours is enough (given sufficient polish).

Sounds good to me. My colleague Pavan has just suggested multiple
autovacuums and then prototyped something almost as a side issue while
trying to solve other problems. I'll show him this entry, maybe he saw
it already? I wasn't following this discussion until now.

The 2 queue implementation seemed to me to be the most straightforward
implementation, mirroring Chris' suggestion. A few aspects that haven't
been mentioned are:
- if you have more than one VACUUM running, we'll need to watch memory
management. Having different queues based upon table size is a good way
of doing that, since the smaller queues have a naturally limited memory
consumption.
- with different size-based queues, the larger VACUUMs can be delayed so
they take much longer, while the small tables can go straight through

Some feedback from initial testing is that 2 queues probably isn't
enough. If you have tables with 100s of blocks and tables with millions
of blocks, the tables in the mid-range still lose out. So I'm thinking
that a design with 3 queues based upon size ranges, plus the idea that
when a queue is empty it will scan for tables slightly above/below its
normal range. That way we wouldn't need to specify the cut-offs with a
difficult to understand new set of GUC parameters, define them exactly
and then have them be wrong when databases grow.

The largest queue would be the one reserved for Xid wraparound
avoidance. No table would be eligible for more than one queue at a time,
though it might change between queues as it grows.

Alvaro, have you completed your design?

Pavan, what are your thoughts?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: [GENERAL] Autovacuum Improvements

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> Some feedback from initial testing is that 2 queues probably isn't
> enough. If you have tables with 100s of blocks and tables with millions
> of blocks, the tables in the mid-range still lose out. So I'm thinking
> that a design with 3 queues based upon size ranges, plus the idea that
> when a queue is empty it will scan for tables slightly above/below its
> normal range.

Yeah, eventually it occurred to me the fact that as soon as you have 2
queues, you may as well want to have 3 or in fact any number.  Which in
my proposal is very easily achieved.


> Alvaro, have you completed your design?

No, I haven't, and the part that's missing is precisely the queues
stuff.  I think I've been delaying posting it for too long, and that is
harmful because it makes other people waste time thinking on issues that
I may already have resolved, and delays the bashing that yet others will
surely inflict on my proposal, which is never a good thing ;-)  So maybe
I'll put in a stub about the "queues" stuff and see how people like the
whole thing.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [GENERAL] Autovacuum Improvements

From
"Simon Riggs"
Date:
On Fri, 2007-01-12 at 19:33 -0300, Alvaro Herrera wrote:

> > Alvaro, have you completed your design?
>
> No, I haven't, and the part that's missing is precisely the queues
> stuff.  I think I've been delaying posting it for too long, and that is
> harmful because it makes other people waste time thinking on issues that
> I may already have resolved, and delays the bashing that yet others will
> surely inflict on my proposal, which is never a good thing ;-)  So maybe
> I'll put in a stub about the "queues" stuff and see how people like the
> whole thing.

I've not read a word spoken against the general idea, so I think we
should pursue this actively for 8.3. It should be straightforward to
harvest the good ideas, though there will definitely be many.

Perhaps we should focus on the issues that might result, so that we
address those before we spend time on the details of the user interface.
Can we deadlock or hang from running multiple autovacuums?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: [GENERAL] Autovacuum Improvements

From
Christopher Browne
Date:
In an attempt to throw the authorities off his trail, alvherre@commandprompt.com (Alvaro Herrera) transmitted:
> Simon Riggs wrote:
>
>> Some feedback from initial testing is that 2 queues probably isn't
>> enough. If you have tables with 100s of blocks and tables with
>> millions of blocks, the tables in the mid-range still lose out. So
>> I'm thinking that a design with 3 queues based upon size ranges,
>> plus the idea that when a queue is empty it will scan for tables
>> slightly above/below its normal range.
>
> Yeah, eventually it occurred to me the fact that as soon as you have
> 2 queues, you may as well want to have 3 or in fact any number.
> Which in my proposal is very easily achieved.

Adding an extra attribute to reflect a different ordering or a
different policy allows having as many queues in one queue table as
you might need.

>> Alvaro, have you completed your design?
>
> No, I haven't, and the part that's missing is precisely the queues
> stuff.  I think I've been delaying posting it for too long, and that
> is harmful because it makes other people waste time thinking on
> issues that I may already have resolved, and delays the bashing that
> yet others will surely inflict on my proposal, which is never a good
> thing ;-) So maybe I'll put in a stub about the "queues" stuff and
> see how people like the whole thing.

Seems like a good idea to me.

Implementing multiple queues amounts to having different worker
processes/threads that operate on the queue table using varying
policies.
-- 
output = reverse("gro.mca" "@" "enworbbc")
http://linuxdatabases.info/info/lisp.html
Rules of the  Evil Overlord #60. "My five-year-old  child advisor will
also  be asked to  decipher any  code I  am thinking  of using.  If he
breaks the code  in under 30 seconds, it will not  be used. Note: this
also applies to passwords." <http://www.eviloverlord.com/>


Re: [GENERAL] Autovacuum Improvements

From
Christopher Browne
Date:
In an attempt to throw the authorities off his trail, alvherre@commandprompt.com (Alvaro Herrera) transmitted:
> Simon Riggs wrote:
>
>> Some feedback from initial testing is that 2 queues probably isn't
>> enough. If you have tables with 100s of blocks and tables with
>> millions of blocks, the tables in the mid-range still lose out. So
>> I'm thinking that a design with 3 queues based upon size ranges,
>> plus the idea that when a queue is empty it will scan for tables
>> slightly above/below its normal range.
>
> Yeah, eventually it occurred to me the fact that as soon as you have
> 2 queues, you may as well want to have 3 or in fact any number.
> Which in my proposal is very easily achieved.

Adding an extra attribute to reflect a different ordering or a
different policy allows having as many queues in one queue table as
you might need.

>> Alvaro, have you completed your design?
>
> No, I haven't, and the part that's missing is precisely the queues
> stuff.  I think I've been delaying posting it for too long, and that
> is harmful because it makes other people waste time thinking on
> issues that I may already have resolved, and delays the bashing that
> yet others will surely inflict on my proposal, which is never a good
> thing ;-) So maybe I'll put in a stub about the "queues" stuff and
> see how people like the whole thing.

Seems like a good idea to me.

Implementing multiple queues amounts to having different worker
processes/threads that operate on the queue table using varying
policies.
-- 
output = reverse("gro.mca" "@" "enworbbc")
http://linuxdatabases.info/info/lisp.html
Rules of the  Evil Overlord #60. "My five-year-old  child advisor will
also  be asked to  decipher any  code I  am thinking  of using.  If he
breaks the code  in under 30 seconds, it will not  be used. Note: this
also applies to passwords." <http://www.eviloverlord.com/>


Re: [GENERAL] Autovacuum Improvements

From
elein
Date:
On Fri, Jan 12, 2007 at 07:33:05PM -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
>
> > Some feedback from initial testing is that 2 queues probably isn't
> > enough. If you have tables with 100s of blocks and tables with millions
> > of blocks, the tables in the mid-range still lose out. So I'm thinking
> > that a design with 3 queues based upon size ranges, plus the idea that
> > when a queue is empty it will scan for tables slightly above/below its
> > normal range.
>
> Yeah, eventually it occurred to me the fact that as soon as you have 2
> queues, you may as well want to have 3 or in fact any number.  Which in
> my proposal is very easily achieved.
>
>
> > Alvaro, have you completed your design?
>
> No, I haven't, and the part that's missing is precisely the queues
> stuff.  I think I've been delaying posting it for too long, and that is
> harmful because it makes other people waste time thinking on issues that
> I may already have resolved, and delays the bashing that yet others will
> surely inflict on my proposal, which is never a good thing ;-)  So maybe
> I'll put in a stub about the "queues" stuff and see how people like the
> whole thing.

Have you made any consideration of providing feedback on autovacuum to users?
Right now we don't even know what tables were vacuumed when and what was
reaped.  This might actually be another topic.

---elein
elein@varlena.com

Re: [GENERAL] Autovacuum Improvements

From
"Pavan Deolasee"
Date:
Simon Riggs wrote:
> On Fri, 2006-12-29 at 20:25 -0300, Alvaro Herrera wrote:
>> Christopher Browne wrote:
>>
>>> Seems to me that you could get ~80% of the way by having the simplest
>>> "2 queue" implementation, where tables with size < some threshold get
>>> thrown at the "little table" queue, and tables above that size go to
>>> the "big table" queue.
>>>
>>> That should keep any small tables from getting "vacuum-starved."
>

This is exectly what I am trying, two process autovacuum and a GUC to
seperate small tables.

In this case, one process takes up vacuuming of the small tables and
other process vacuuming of the remaining tables as well as Xid
avoidance related vacuuming. The goal is to avoid starvation of small
tables when a large table is being vacuumed (which may take
several hours) without adding too much complexity to the code.

>
> Some feedback from initial testing is that 2 queues probably isn't
> enough. If you have tables with 100s of blocks and tables with millions
> of blocks, the tables in the mid-range still lose out. So I'm thinking
> that a design with 3 queues based upon size ranges, plus the idea that
> when a queue is empty it will scan for tables slightly above/below its
> normal range. That way we wouldn't need to specify the cut-offs with a
> difficult to understand new set of GUC parameters, define them exactly
> and then have them be wrong when databases grow.
>
> The largest queue would be the one reserved for Xid wraparound
> avoidance. No table would be eligible for more than one queue at a time,
> though it might change between queues as it grows.
>
> Alvaro, have you completed your design?
>
> Pavan, what are your thoughts?
>

IMO 2-queue is a good step forward, but in long term we may need to go
for a multiprocess autovacuum where the number and tasks of processes
are either demand based and/or user configurable.

Another idea is to vacuum the tables in round-robin fashion
where the quantum could be either "time" or "number of block". The
autovacuum process would vacuum 'x' blocks of one table and then
schedule next table in the queue. This would avoid starvation of
small tables, though cost of index cleanup might go up because of
increased IO. Any thoughts of this approach ?

Thanks,
Pavan





Re: [GENERAL] Autovacuum Improvements

From
"Pavan Deolasee"
Date:
Simon Riggs wrote:
 > On Fri, 2006-12-29 at 20:25 -0300, Alvaro Herrera wrote:
 >> Christopher Browne wrote:
 >>
 >>> Seems to me that you could get ~80% of the way by having the simplest
 >>> "2 queue" implementation, where tables with size < some threshold get
 >>> thrown at the "little table" queue, and tables above that size go to
 >>> the "big table" queue.
 >>>
 >>> That should keep any small tables from getting "vacuum-starved."
 >

This is exectly what I am trying, two process autovacuum and a GUC to
seperate small tables.

In this case, one process takes up vacuuming of the small tables and
other process vacuuming of the remaining tables as well as Xid
avoidance related vacuuming. The goal is to avoid starvation of small
tables when a large table is being vacuumed (which may take
several hours) without adding too much complexity to the code.

 >
 > Some feedback from initial testing is that 2 queues probably isn't
 > enough. If you have tables with 100s of blocks and tables with millions
 > of blocks, the tables in the mid-range still lose out. So I'm thinking
 > that a design with 3 queues based upon size ranges, plus the idea that
 > when a queue is empty it will scan for tables slightly above/below its
 > normal range. That way we wouldn't need to specify the cut-offs with a
 > difficult to understand new set of GUC parameters, define them exactly
 > and then have them be wrong when databases grow.
 >
 > The largest queue would be the one reserved for Xid wraparound
 > avoidance. No table would be eligible for more than one queue at a time,
 > though it might change between queues as it grows.
 >
 > Alvaro, have you completed your design?
 >
 > Pavan, what are your thoughts?
 >

IMO 2-queue is a good step forward, but in long term we may need to go
for a multiprocess autovacuum where the number and tasks of processes
are either demand based and/or user configurable.

Another idea is to vacuum the tables in round-robin fashion
where the quantum could be either "time" or "number of block". The
autovacuum process would vacuum 'x' blocks of one table and then
schedule next table in the queue. This would avoid starvation of
small tables, though cost of index cleanup might go up because of
increased IO. Any thoughts of this approach ?

Thanks,
Pavan




Re: [GENERAL] Autovacuum Improvements

From
Alban Hertroys
Date:
Pavan Deolasee wrote:
> Simon Riggs wrote:
>> On Fri, 2006-12-29 at 20:25 -0300, Alvaro Herrera wrote:
>>> Christopher Browne wrote:
>>>
>>>> Seems to me that you could get ~80% of the way by having the simplest
>>>> "2 queue" implementation, where tables with size < some threshold get
>>>> thrown at the "little table" queue, and tables above that size go to
>>>> the "big table" queue.
>>>>
>>>> That should keep any small tables from getting "vacuum-starved."
>>
>
> This is exectly what I am trying, two process autovacuum and a GUC to
> seperate small tables.
>
> In this case, one process takes up vacuuming of the small tables and
> other process vacuuming of the remaining tables as well as Xid
> avoidance related vacuuming. The goal is to avoid starvation of small
> tables when a large table is being vacuumed (which may take
> several hours) without adding too much complexity to the code.

Would it work to make the queues push the treshold into the direction of
the still running queue if the other queue finishes before the still
running one? This would achieve some kind of auto-tuning, but that is
usually tricky.

For example, what if one of the queues got stuck on a lock?

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: [GENERAL] Autovacuum Improvements

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> Perhaps we should focus on the issues that might result, so that we
> address those before we spend time on the details of the user interface.
> Can we deadlock or hang from running multiple autovacuums?

If you were to run multiple autovacuum processes the way they are today,
maybe.  But that's not my intention -- the launcher would be the only
one to read the catalogs; the workers would be started only to do a
single VACUUM job.  This reduces the risk of this kind of problems.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [GENERAL] Autovacuum Improvements

From
Alvaro Herrera
Date:
elein wrote:

> Have you made any consideration of providing feedback on autovacuum to users?
> Right now we don't even know what tables were vacuumed when and what was
> reaped.  This might actually be another topic.

I'd like to hear other people's opinions on Darcy Buskermolen proposal
to have a log table, on which we'd register what did we run, at what
time, how long did it last, how many tuples did it clean, etc.  I feel
having it on the regular text log is useful but it's not good enough.
Keep in mind that in the future we may want to peek at that collected
information to be able to take better scheduling decisions (or at least
inform the DBA that he sucks).

Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
that manually-run vacuums would be logged as well.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [GENERAL] Autovacuum Improvements

From
Darcy Buskermolen
Date:
On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
> elein wrote:
> > Have you made any consideration of providing feedback on autovacuum to
> > users? Right now we don't even know what tables were vacuumed when and
> > what was reaped.  This might actually be another topic.
>
> I'd like to hear other people's opinions on Darcy Buskermolen proposal
> to have a log table, on which we'd register what did we run, at what
> time, how long did it last, how many tuples did it clean, etc.  I feel
> having it on the regular text log is useful but it's not good enough.
> Keep in mind that in the future we may want to peek at that collected
> information to be able to take better scheduling decisions (or at least
> inform the DBA that he sucks).
>
> Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
> that manually-run vacuums would be logged as well.

Yes I did intend this thought for vacuum, not strictly autovacuum.

Re: [GENERAL] Autovacuum Improvements

From
"Matthew T. O'Connor"
Date:
Alvaro Herrera wrote:
> I'd like to hear other people's opinions on Darcy Buskermolen proposal
> to have a log table, on which we'd register what did we run, at what
> time, how long did it last, how many tuples did it clean, etc.  I feel
> having it on the regular text log is useful but it's not good enough.
> Keep in mind that in the future we may want to peek at that collected
> information to be able to take better scheduling decisions (or at least
> inform the DBA that he sucks).

I'm not familiar with his proposal, but I'm not sure what I think of
logging vacuum (and perhaps analyze) commands to a table.  We have never
logged anything to tables inside PG.  I would be worried about this
eating a lot of space in some situations.

I think most people would just be happy if we could get autovacuum to
log it's actions at a much higher log level.  I think that "autovacuum
vacuumed table x" is important and shouldn't be all the way down at the
debug level.

The other (more involved) solution to this problem was proposed which
was create a separate set of logging control params for autovacuum so
that you can turn it up or down independent of the general server logging.

> Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
> that manually-run vacuums would be logged as well.

+1


Re: [GENERAL] Autovacuum Improvements

From
Ron Mayer
Date:
Matthew T. O'Connor wrote:
> Alvaro Herrera wrote:
>> I'd like to hear other people's opinions on Darcy Buskermolen proposal
>> to have a log table, on which we'd register what did we run, at what
>> time, how long did it last, [...] 
> 
> I think most people would just be happy if we could get autovacuum to
> log it's actions at a much higher log level.  I think that "autovacuum
> vacuumed table x" is important and shouldn't be all the way down at the
> debug level.

+1 here.  Even more than "autovacuum vacuumed table x", I'd like to see
"vacuum starting table x" and "vacuum done table x".  The reason I say
that is because the speculation "autovacuum might have been running"
is now a frequent phrase I hear when performance questions are asked.
If vacuum start and end times were logged at a much earlier level,
that feature plus log_min_duration_statement could easily disprove
the "vacuum might have been running" hypothesis.


Re: [GENERAL] Autovacuum Improvements

From
"Simon Riggs"
Date:
On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
> On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
> > elein wrote:
> > > Have you made any consideration of providing feedback on autovacuum to
> > > users? Right now we don't even know what tables were vacuumed when and
> > > what was reaped.  This might actually be another topic.
> >
> > I'd like to hear other people's opinions on Darcy Buskermolen proposal
> > to have a log table, on which we'd register what did we run, at what
> > time, how long did it last, how many tuples did it clean, etc.  I feel
> > having it on the regular text log is useful but it's not good enough.
> > Keep in mind that in the future we may want to peek at that collected
> > information to be able to take better scheduling decisions (or at least
> > inform the DBA that he sucks).
> >
> > Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
> > that manually-run vacuums would be logged as well.
>
> Yes I did intend this thought for vacuum, not strictly autovacuum.

I agree, for all VACUUMs: we need a log table.

The only way we can get a feedback loop on what has come before is by
remembering what happened. Simply logging it is interesting, but not
enough.

There is some complexity there, because with many applications a small
table gets VACUUMed every few minutes, so the log table would become a
frequently updated table itself. I'd also suggest that we might want to
take account of the number of tuples removed by btree pre-split VACUUMs
also.

I also like the idea of a single scheduler and multiple child workers.

The basic architecture is clear and obviously beneficial. What worries
me is how the scheduler will work; there seems to be as many ideas as we
have hackers. I'm wondering if we should provide the facility of a
pluggable scheduler? That way you'd be able to fine tune the schedule to
both the application and to the business requirements. That would allow
integration with external workflow engines and job schedulers, for when
VACUUMs need to not-conflict with external events.

If no scheduler has been defined, just use a fairly simple default.

The three main questions are
- what is the maximum size of VACUUM that can start *now*
- can *this* VACUUM start now?
- which is the next VACUUM to run?

If we have an API that allows those 3 questions to be asked, then a
scheduler plug-in could supply the answers. That way any complex
application rules (table A is available for VACUUM now for next 60 mins,
table B is in constant use so we must use vacuum_delay), external events
(long running reports have now finished, OK to VACUUM), time-based rules
(e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime,
first 3 days of the each month is financial accounting close) can be
specified.


--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: [GENERAL] Autovacuum Improvements

From
Darcy Buskermolen
Date:
On Friday 19 January 2007 01:47, Simon Riggs wrote:
> On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
> > On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
> > > elein wrote:
> > > > Have you made any consideration of providing feedback on autovacuum
> > > > to users? Right now we don't even know what tables were vacuumed when
> > > > and what was reaped.  This might actually be another topic.
> > >
> > > I'd like to hear other people's opinions on Darcy Buskermolen proposal
> > > to have a log table, on which we'd register what did we run, at what
> > > time, how long did it last, how many tuples did it clean, etc.  I feel
> > > having it on the regular text log is useful but it's not good enough.
> > > Keep in mind that in the future we may want to peek at that collected
> > > information to be able to take better scheduling decisions (or at least
> > > inform the DBA that he sucks).
> > >
> > > Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
> > > that manually-run vacuums would be logged as well.
> >
> > Yes I did intend this thought for vacuum, not strictly autovacuum.
>
> I agree, for all VACUUMs: we need a log table.
>
> The only way we can get a feedback loop on what has come before is by
> remembering what happened. Simply logging it is interesting, but not
> enough.

Correct, I think we are all saying the same thing that is this log table is
purely inserts so that we can see trends over time.

>
> There is some complexity there, because with many applications a small
> table gets VACUUMed every few minutes, so the log table would become a
> frequently updated table itself. I'd also suggest that we might want to
> take account of the number of tuples removed by btree pre-split VACUUMs
> also.

Thinking on this a bit more, I suppose that this table really should allow for
user defined triggers on it, so that a DBA can create partioning for it, not
to mention being able to move it off into it's own tablespace.


>
> I also like the idea of a single scheduler and multiple child workers.
>
> The basic architecture is clear and obviously beneficial. What worries
> me is how the scheduler will work; there seems to be as many ideas as we
> have hackers. I'm wondering if we should provide the facility of a
> pluggable scheduler? That way you'd be able to fine tune the schedule to
> both the application and to the business requirements. That would allow
> integration with external workflow engines and job schedulers, for when
> VACUUMs need to not-conflict with external events.
>
> If no scheduler has been defined, just use a fairly simple default.
>
> The three main questions are
> - what is the maximum size of VACUUM that can start *now*

How can we determine this given we have no real knowledge of the upcoming
adverse IO conditions ?

> - can *this* VACUUM start now?
> - which is the next VACUUM to run?
>
> If we have an API that allows those 3 questions to be asked, then a
> scheduler plug-in could supply the answers. That way any complex
> application rules (table A is available for VACUUM now for next 60 mins,
> table B is in constant use so we must use vacuum_delay), external events
> (long running reports have now finished, OK to VACUUM), time-based rules
> (e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime,
> first 3 days of the each month is financial accounting close) can be
> specified.

Another thought, is it at all possible to do a partial vacuum?  ie spend the
next 30 minutes vacuuming foo table, and update the fsm with what hew have
learned over the 30 mins, even if we have not done a full table scan ?


--


Darcy Buskermolen
The PostgreSQL company, Command Prompt Inc.

Re: [GENERAL] Autovacuum Improvements

From
Bruce Momjian
Date:
Added to TODO:

>       o Allow multiple vacuums so large tables do not starve small
>         tables
>
>         http://archives.postgresql.org/pgsql-general/2007-01/msg00031.php
>
>       o Improve control of auto-vacuum
>
>         http://archives.postgresql.org/pgsql-hackers/2006-12/msg00876.php


---------------------------------------------------------------------------

Darcy Buskermolen wrote:
> On Friday 19 January 2007 01:47, Simon Riggs wrote:
> > On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
> > > On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
> > > > elein wrote:
> > > > > Have you made any consideration of providing feedback on autovacuum
> > > > > to users? Right now we don't even know what tables were vacuumed when
> > > > > and what was reaped.  This might actually be another topic.
> > > >
> > > > I'd like to hear other people's opinions on Darcy Buskermolen proposal
> > > > to have a log table, on which we'd register what did we run, at what
> > > > time, how long did it last, how many tuples did it clean, etc.  I feel
> > > > having it on the regular text log is useful but it's not good enough.
> > > > Keep in mind that in the future we may want to peek at that collected
> > > > information to be able to take better scheduling decisions (or at least
> > > > inform the DBA that he sucks).
> > > >
> > > > Now, I'd like this to be a VACUUM thing, not autovacuum.  That means
> > > > that manually-run vacuums would be logged as well.
> > >
> > > Yes I did intend this thought for vacuum, not strictly autovacuum.
> >
> > I agree, for all VACUUMs: we need a log table.
> >
> > The only way we can get a feedback loop on what has come before is by
> > remembering what happened. Simply logging it is interesting, but not
> > enough.
>
> Correct, I think we are all saying the same thing that is this log table is
> purely inserts so that we can see trends over time.
>
> >
> > There is some complexity there, because with many applications a small
> > table gets VACUUMed every few minutes, so the log table would become a
> > frequently updated table itself. I'd also suggest that we might want to
> > take account of the number of tuples removed by btree pre-split VACUUMs
> > also.
>
> Thinking on this a bit more, I suppose that this table really should allow for
> user defined triggers on it, so that a DBA can create partioning for it, not
> to mention being able to move it off into it's own tablespace.
>
>
> >
> > I also like the idea of a single scheduler and multiple child workers.
> >
> > The basic architecture is clear and obviously beneficial. What worries
> > me is how the scheduler will work; there seems to be as many ideas as we
> > have hackers. I'm wondering if we should provide the facility of a
> > pluggable scheduler? That way you'd be able to fine tune the schedule to
> > both the application and to the business requirements. That would allow
> > integration with external workflow engines and job schedulers, for when
> > VACUUMs need to not-conflict with external events.
> >
> > If no scheduler has been defined, just use a fairly simple default.
> >
> > The three main questions are
> > - what is the maximum size of VACUUM that can start *now*
>
> How can we determine this given we have no real knowledge of the upcoming
> adverse IO conditions ?
>
> > - can *this* VACUUM start now?
> > - which is the next VACUUM to run?
> >
> > If we have an API that allows those 3 questions to be asked, then a
> > scheduler plug-in could supply the answers. That way any complex
> > application rules (table A is available for VACUUM now for next 60 mins,
> > table B is in constant use so we must use vacuum_delay), external events
> > (long running reports have now finished, OK to VACUUM), time-based rules
> > (e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime,
> > first 3 days of the each month is financial accounting close) can be
> > specified.
>
> Another thought, is it at all possible to do a partial vacuum?  ie spend the
> next 30 minutes vacuuming foo table, and update the fsm with what hew have
> learned over the 30 mins, even if we have not done a full table scan ?
>
>
> --
>
>
> Darcy Buskermolen
> The PostgreSQL company, Command Prompt Inc.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [GENERAL] Autovacuum Improvements

From
Russell Smith
Date:
Darcy Buskermolen wrote:
[snip]
Another thought, is it at all possible to do a partial vacuum?  ie spend the 
next 30 minutes vacuuming foo table, and update the fsm with what hew have 
learned over the 30 mins, even if we have not done a full table scan ? 
There was a proposal for this, but it was dropped on 2 grounds.
1. partial vacuum would mean that parts of the table are missed, the user could never vacuum certain parts and transaction wraparound would get you.  You may also have other performance issues as you forgot certian parts of the table

2. Index cleanup is the most expensive part of vacuum.  So doing a partial vacuum actually means more I/O as you have to do index cleanup more often.

If we are talking about autovacuum, 1 doesn't become so much of an issue as you just make the autovacuum remember what parts of the table it's vacuumed.  This really has great power when you have a dead space map.

Item 2 will still be an issue.  But if you define "partial" as either fill maintenance_work_mem, or finish the table, you are not increasing I/O at all.  As when maintenance work mem is full, you have to cleanup all the indexes anyway.  This is really more like VACUUM SINGLE, but the same principal applies.

I believe all planning really needs to think about how a dead space map will effect what vacuum is going to be doing in the future.


Strange idea that I haven't researched,  Given Vacuum can't be run in a transaction, it is possible at a certain point to quit the current transaction and start another one.  There has been much chat and now a TODO item about allowing multiple vacuums to not starve small tables.  But if a big table has a long running vacuum the vacuum of the small table won't be effective anyway will it?  If vacuum of a big table was done in multiple transactions you could reduce the effect of long running vacuum.  I'm not sure how this effects the rest of the system thought.

Russell Smith
 

Re: [GENERAL] Autovacuum Improvements

From
Heikki Linnakangas
Date:
Russell Smith wrote:
> Strange idea that I haven't researched,  Given Vacuum can't be run in a
> transaction, it is possible at a certain point to quit the current
> transaction and start another one.  There has been much chat and now a
> TODO item about allowing multiple vacuums to not starve small tables.
> But if a big table has a long running vacuum the vacuum of the small
> table won't be effective anyway will it?  If vacuum of a big table was
> done in multiple transactions you could reduce the effect of long
> running vacuum.  I'm not sure how this effects the rest of the system
> thought.

That was fixed by Hannu Krosing's patch in 8.2 that made vacuum to
ignore other vacuums in the oldest xmin calculation.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [GENERAL] Autovacuum Improvements

From
"Simon Riggs"
Date:
On Sat, 2007-01-20 at 09:41 +1100, Russell Smith wrote:
> Darcy Buskermolen wrote:
> > [snip]
> >
> > Another thought, is it at all possible to do a partial vacuum?  ie spend the
> > next 30 minutes vacuuming foo table, and update the fsm with what hew have
> > learned over the 30 mins, even if we have not done a full table scan ?
> >
> There was a proposal for this, but it was dropped on 2 grounds.
> 1. partial vacuum would mean that parts of the table are missed, the
> user could never vacuum certain parts and transaction wraparound would
> get you.  You may also have other performance issues as you forgot
> certian parts of the table

Partial vacuum would still be possible if you remembered where you got
to in the VACUUM and then started from that same point next time. It
could then go to the end of the table and wrap back around.

> 2. Index cleanup is the most expensive part of vacuum.  So doing a
> partial vacuum actually means more I/O as you have to do index cleanup
> more often.

Again, not necessarily. A large VACUUM can currently perform more than
one set of index scans, so if you chose the right stopping place for a
partial VACUUM you need never incur any additional work. It might even
save effort in the long run.

I'm not necessarily advocating partial VACUUM, just pointing out that
the problems you raise need not be barriers to implementation, should
that be considered worthwhile.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: [GENERAL] Autovacuum Improvements

From
Martijn van Oosterhout
Date:
On Sun, Jan 21, 2007 at 12:24:38PM +0000, Simon Riggs wrote:
> Partial vacuum would still be possible if you remembered where you got
> to in the VACUUM and then started from that same point next time. It
> could then go to the end of the table and wrap back around.

ISTM the Dead Space Map would give you this automatically, since that's
your memory... Once you have the DSM to track where the dead pages are,
you can set it up to target clusters first, thus giving maximum bang
for buck.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: [GENERAL] Autovacuum Improvements

From
"Jim C. Nasby"
Date:
On Sun, Jan 21, 2007 at 11:39:45AM +0000, Heikki Linnakangas wrote:
> Russell Smith wrote:
> >Strange idea that I haven't researched,  Given Vacuum can't be run in a
> >transaction, it is possible at a certain point to quit the current
> >transaction and start another one.  There has been much chat and now a
> >TODO item about allowing multiple vacuums to not starve small tables.
> >But if a big table has a long running vacuum the vacuum of the small
> >table won't be effective anyway will it?  If vacuum of a big table was
> >done in multiple transactions you could reduce the effect of long
> >running vacuum.  I'm not sure how this effects the rest of the system
> >thought.
>
> That was fixed by Hannu Krosing's patch in 8.2 that made vacuum to
> ignore other vacuums in the oldest xmin calculation.

And IIRC in 8.1 every time vacuum finishes a pass over the indexes it
will commit and start a new transaction. That's still useful even with
Hannu's patch in case you start a vacuum with maintenance_work_mem too
small; you can abort the vacuum some time later and at least some of the
work it's done will get committed.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: [GENERAL] Autovacuum Improvements

From
"Simon Riggs"
Date:
On Sun, 2007-01-21 at 14:26 -0600, Jim C. Nasby wrote:
> On Sun, Jan 21, 2007 at 11:39:45AM +0000, Heikki Linnakangas wrote:
> > Russell Smith wrote:
> > >Strange idea that I haven't researched,  Given Vacuum can't be run in a
> > >transaction, it is possible at a certain point to quit the current
> > >transaction and start another one.  There has been much chat and now a
> > >TODO item about allowing multiple vacuums to not starve small tables.
> > >But if a big table has a long running vacuum the vacuum of the small
> > >table won't be effective anyway will it?  If vacuum of a big table was
> > >done in multiple transactions you could reduce the effect of long
> > >running vacuum.  I'm not sure how this effects the rest of the system
> > >thought.
> >
> > That was fixed by Hannu Krosing's patch in 8.2 that made vacuum to
> > ignore other vacuums in the oldest xmin calculation.
>
> And IIRC in 8.1 every time vacuum finishes a pass over the indexes it
> will commit and start a new transaction.

err...It doesn't do this now and IIRC didn't do that in 8.1 either.

> That's still useful even with
> Hannu's patch in case you start a vacuum with maintenance_work_mem too
> small; you can abort the vacuum some time later and at least some of the
> work it's done will get committed.

True, but not recommended, though for a variety of reasons.

The reason is not intermediate commits, but just that the work of VACUUM
is mostly non-transactional in nature, apart from the various catalog
entries when it completes.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



STOP all user access except for admin for a few minutes?

From
Date:
Hi there,

Is is possible to stop all user access to postgres, but still give access to 
admin?
Just temporarily, not a security setup.

Something like, stop all users but allow user x and y.

thx 



Re: STOP all user access except for admin for a few minutes?

From
Russell Smith
Date:
org@kewlstuff.co.za wrote:
> Hi there,
>
> Is is possible to stop all user access to postgres, but still give 
> access to admin?
> Just temporarily, not a security setup.
>
> Something like, stop all users but allow user x and y.
You could restart in single user mode, or alter pg_hba.conf to allow the 
users you want and disallow all other users.

single user mode will require you have direct access to the machine to 
do the alterations.

using pg_hba.conf will not disconnect existing users as far as I'm aware.

That's the best advice I can offer, maybe somebody else will be able to 
give you more
>
> thx
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>
>



Re: STOP all user access except for admin for a few minutes?

From
Peter Eisentraut
Date:
Am Montag, 22. Januar 2007 10:32 schrieb org@kewlstuff.co.za:
> Is is possible to stop all user access to postgres, but still give access
> to admin?

Make the appropriate adjustments to pg_hba.conf.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: [GENERAL] Autovacuum Improvements

From
Heikki Linnakangas
Date:
Russell Smith wrote:
> 2. Index cleanup is the most expensive part of vacuum.  So doing a
> partial vacuum actually means more I/O as you have to do index cleanup
> more often.

I don't think that's usually the case. Index(es) are typically only a
fraction of the size of the table, and since 8.2 we do index vacuums in
a single scan in physical order. In fact, in many applications the index
is be mostly cached and the index scan doesn't generate any I/O at all.

I believe the heap scans are the biggest issue at the moment.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [GENERAL] Autovacuum Improvements

From
Bruce Momjian
Date:
Heikki Linnakangas wrote:
> Russell Smith wrote:
> > 2. Index cleanup is the most expensive part of vacuum.  So doing a
> > partial vacuum actually means more I/O as you have to do index cleanup
> > more often.
>
> I don't think that's usually the case. Index(es) are typically only a
> fraction of the size of the table, and since 8.2 we do index vacuums in
> a single scan in physical order. In fact, in many applications the index
> is be mostly cached and the index scan doesn't generate any I/O at all.

Are _all_ the indexes cached?  I would doubt that.  Also, for typical
table, what percentage is the size of all indexes combined?

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [GENERAL] Autovacuum Improvements

From
Heikki Linnakangas
Date:
Bruce Momjian wrote:
> Heikki Linnakangas wrote:
>> Russell Smith wrote:
>>> 2. Index cleanup is the most expensive part of vacuum.  So doing a
>>> partial vacuum actually means more I/O as you have to do index cleanup
>>> more often.
>> I don't think that's usually the case. Index(es) are typically only a
>> fraction of the size of the table, and since 8.2 we do index vacuums in
>> a single scan in physical order. In fact, in many applications the index
>> is be mostly cached and the index scan doesn't generate any I/O at all.
>
> Are _all_ the indexes cached?  I would doubt that.

Well, depends on your schema, of course. In many applications, yes.

>  Also, for typical
> table, what percentage is the size of all indexes combined?

Well, there's no such thing as a typical table. As an anecdote here's
the ratios (total size of all indexes of a table)/(size of corresponding
heap) for the bigger tables for a DBT-2 run I have at hand:

Stock:        1190470/68550 = 6%
Order_line:    950103/274372 = 29%
Customer:    629011 /(5711+20567) = 8%

In any case, for the statement "Index cleanup is the most expensive part
of vacuum" to be true, you're indexes would have to take up 2x as much
space as the heap, since the heap is scanned twice. I'm sure there's
databases like that out there, but I don't think it's the common case.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [GENERAL] Autovacuum Improvements

From
Bruce Momjian
Date:
Heikki Linnakangas wrote:
> Bruce Momjian wrote:
> > Heikki Linnakangas wrote:
> >> Russell Smith wrote:
> >>> 2. Index cleanup is the most expensive part of vacuum.  So doing a
> >>> partial vacuum actually means more I/O as you have to do index cleanup
> >>> more often.
> >> I don't think that's usually the case. Index(es) are typically only a
> >> fraction of the size of the table, and since 8.2 we do index vacuums in
> >> a single scan in physical order. In fact, in many applications the index
> >> is be mostly cached and the index scan doesn't generate any I/O at all.
> >
> > Are _all_ the indexes cached?  I would doubt that.
>
> Well, depends on your schema, of course. In many applications, yes.
>
> >  Also, for typical
> > table, what percentage is the size of all indexes combined?
>
> Well, there's no such thing as a typical table. As an anecdote here's
> the ratios (total size of all indexes of a table)/(size of corresponding
> heap) for the bigger tables for a DBT-2 run I have at hand:
>
> Stock:        1190470/68550 = 6%
> Order_line:    950103/274372 = 29%
> Customer:    629011 /(5711+20567) = 8%
>
> In any case, for the statement "Index cleanup is the most expensive part
> of vacuum" to be true, you're indexes would have to take up 2x as much
> space as the heap, since the heap is scanned twice. I'm sure there's
> databases like that out there, but I don't think it's the common case.

I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
out how small, and it seems about 15% of the total table, which means if
we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
80%, assuming 5% of the table is scanned.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [GENERAL] Autovacuum Improvements

From
"Simon Riggs"
Date:
On Mon, 2007-01-22 at 12:18 -0500, Bruce Momjian wrote:
> Heikki Linnakangas wrote:
> >
> > In any case, for the statement "Index cleanup is the most expensive part
> > of vacuum" to be true, you're indexes would have to take up 2x as much
> > space as the heap, since the heap is scanned twice. I'm sure there's
> > databases like that out there, but I don't think it's the common case.
>
> I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
> out how small, and it seems about 15% of the total table, which means if
> we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
> 80%, assuming 5% of the table is scanned.

Clearly keeping track of what needs vacuuming will lead to a more
efficient VACUUM. Your math applies to *any* design that uses some form
of book-keeping to focus in on the hot spots.

On a separate thread, Heikki has raised a different idea for VACUUM.

Heikki's idea asks an important question: where and how should DSM
information be maintained? Up to now everybody has assumed that it would
be maintained when DML took place and that the DSM would be a
transactional data structure (i.e. on-disk). Heikki's idea requires
similar bookkeeping requirements to the original DSM concept, but the
interesting aspect is that the DSM information is collected off-line,
rather than being an overhead on every statement's response time.

That idea seems extremely valuable to me.

One of the main challenges is how we cope with large tables that have a
very fine spray of updates against them. A DSM bitmap won't help with
that situation, regrettably.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: [GENERAL] Autovacuum Improvements

From
Gregory Stark
Date:
"Bruce Momjian" <bruce@momjian.us> writes:

> I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
> out how small, and it seems about 15% of the total table, which means if
> we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
> 80%, assuming 5% of the table is scanned.

Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.

That is, if you assum that only 5% of the table will be scanned and you
arrange to do it sequentially then you should expect the i/o to be marginally
faster than just reading the entire table. Vacuum does do some cpu work and
wouldn't have to consult the clog as often, so it would still be somewhat
faster.

The theory online was that as long as you're reading one page from each disk
track you're going to pay the same seek overhead as reading the entire track.
I also had some theories involving linux being confused by the seeks and
turning off read-ahead but I could never prove them.

In short, to see big benefits you would have to have a much smaller percentage
of the table being read. That shouldn't be taken to mean that the DSM is a
loser. There are plenty of use cases where tables can be extremely large and
have only very small percentages that are busy. The big advantage of the DSM
is that it takes the size of the table out of the equation and replaces it
with the size of the busy portion of the table. So updating a single record in
a terabyte table has the same costs as updating a single record in a kilobyte
table.

Sadly that's not quite true due to indexes, and due to the size of the bitmap
itself. But going back to your numbers it does mean that if you update a
single row out of a terabyte table then we'll be removing about 85% of the i/o
(minus the i/o needed to read the DSM, about .025%). If you update about 1%
then you would be removing substantially less, and once you get to about 10%
then you're back where you started.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: [GENERAL] Autovacuum Improvements

From
Bruce Momjian
Date:
Yep, agreed on the random I/O issue.  The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
to take a lot of time, but vacuuming to relaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once.  And
vacuuming to reclaim 1% ten times seems even more expensive.  The
partial vacuum idea is starting to look like a loser to me again.

---------------------------------------------------------------------------

Gregory Stark wrote:
> "Bruce Momjian" <bruce@momjian.us> writes:
>
> > I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
> > out how small, and it seems about 15% of the total table, which means if
> > we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
> > 80%, assuming 5% of the table is scanned.
>
> Actually no. A while back I did experiments to see how fast reading a file
> sequentially was compared to reading the same file sequentially but skipping
> x% of the blocks randomly. The results were surprising (to me) and depressing.
> The breakeven point was about 7%.
>
> That is, if you assum that only 5% of the table will be scanned and you
> arrange to do it sequentially then you should expect the i/o to be marginally
> faster than just reading the entire table. Vacuum does do some cpu work and
> wouldn't have to consult the clog as often, so it would still be somewhat
> faster.
>
> The theory online was that as long as you're reading one page from each disk
> track you're going to pay the same seek overhead as reading the entire track.
> I also had some theories involving linux being confused by the seeks and
> turning off read-ahead but I could never prove them.
>
> In short, to see big benefits you would have to have a much smaller percentage
> of the table being read. That shouldn't be taken to mean that the DSM is a
> loser. There are plenty of use cases where tables can be extremely large and
> have only very small percentages that are busy. The big advantage of the DSM
> is that it takes the size of the table out of the equation and replaces it
> with the size of the busy portion of the table. So updating a single record in
> a terabyte table has the same costs as updating a single record in a kilobyte
> table.
>
> Sadly that's not quite true due to indexes, and due to the size of the bitmap
> itself. But going back to your numbers it does mean that if you update a
> single row out of a terabyte table then we'll be removing about 85% of the i/o
> (minus the i/o needed to read the DSM, about .025%). If you update about 1%
> then you would be removing substantially less, and once you get to about 10%
> then you're back where you started.
>
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [GENERAL] Autovacuum Improvements

From
Gregory Stark
Date:
"Bruce Momjian" <bruce@momjian.us> writes:

> Yep, agreed on the random I/O issue.  The larger question is if you have
> a huge table, do you care to reclaim 3% of the table size, rather than
> just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> to take a lot of time, but vacuuming to relaim 3% three times seems like
> it is going to be more expensive than just vacuuming the 10% once.  And
> vacuuming to reclaim 1% ten times seems even more expensive.  The
> partial vacuum idea is starting to look like a loser to me again.

Well the answer is of course "that depends".

If you maintain the dead space at a steady state averaging 1.5% instead of 5%
your table is 3.33% smaller on average. If this is a DSS system that will
translate into running your queries 3.33% faster. It will take a lot of
vacuums before they hurt more than a 3%+ performance drop.

If it's an OLTP system the it's harder to figure. a 3.33% increase in data
density will translate to a higher cache hit rate but how much higher depends
on a lot of factors. In our experiments we actually got bigger boost in these
kinds of situations than the I expected (I expected comparable to the 3.33%
improvement). So it could be even more than 3.33%. But like said it depends.
If you already have the whole database cache you won't see any improvement. If
you are right on the cusp you could see a huge benefit.

It sounds like you're underestimating the performance drain 10% wasted space
has. If we found out that one routine was unnecessarily taking 10% of the cpu
time it would be an obvious focus of attention. 10% wasted space is going to
work out to about 10% of the i/o time.

It also sounds like we're still focused on the performance impact in absolute
terms. I'm much more interested in changing the performance characteristics so
they're predictable and scalable. It doesn't matter much if your 1kb table is
100% slower than necessary but it does matter if your 1TB table needs 1,000x
as much vacuuming as your 1GB table even if it's getting the same update
traffic.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: [GENERAL] Autovacuum Improvements

From
"Simon Riggs"
Date:
On Mon, 2007-01-22 at 13:27 -0500, Bruce Momjian wrote:
> Yep, agreed on the random I/O issue.  The larger question is if you have
> a huge table, do you care to reclaim 3% of the table size, rather than
> just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> to take a lot of time, but vacuuming to relaim 3% three times seems like
> it is going to be more expensive than just vacuuming the 10% once.  And
> vacuuming to reclaim 1% ten times seems even more expensive.  The
> partial vacuum idea is starting to look like a loser to me again.

Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: [GENERAL] Autovacuum Improvements

From
Alvaro Herrera
Date:
Bruce Momjian wrote:
>
> Yep, agreed on the random I/O issue.  The larger question is if you have
> a huge table, do you care to reclaim 3% of the table size, rather than
> just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> to take a lot of time, but vacuuming to relaim 3% three times seems like
> it is going to be more expensive than just vacuuming the 10% once.  And
> vacuuming to reclaim 1% ten times seems even more expensive.  The
> partial vacuum idea is starting to look like a loser to me again.

But if the partial vacuum is able to clean the busiest pages and reclaim
useful space, currently-running transactions will be able to use that
space and thus not have to extend the table.  Not that extension is a
problem on itself, but it'll keep your working set smaller.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [GENERAL] Autovacuum Improvements

From
Richard Huxton
Date:
Bruce Momjian wrote:
> Yep, agreed on the random I/O issue.  The larger question is if you have
> a huge table, do you care to reclaim 3% of the table size, rather than
> just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> to take a lot of time, but vacuuming to relaim 3% three times seems like
> it is going to be more expensive than just vacuuming the 10% once.  And
> vacuuming to reclaim 1% ten times seems even more expensive.  The
> partial vacuum idea is starting to look like a loser to me again.

Buying a house with a 25-year mortgage is much more expensive than just
paying cash too, but you don't always have a choice.

Surely the key benefit of the partial vacuuming thing is that you can at
least do something useful with a large table if a full vacuum takes 24
hours and you only have 4 hours of idle I/O.

It's also occurred to me that all the discussion of scheduling way back
when isn't directly addressing the issue. What most people want (I'm
guessing) is to vacuum *when the user-workload allows* and the
time-tabling is just a sysadmin first-approximation at that.

With partial vacuuming possible, we can arrange things with just three
thresholds and two measurements:
   Measurement 1 = system workload
   Measurement 2 = a per-table "requires vacuuming" value
   Threshold 1 = workload at which we do more vacuuming
   Threshold 2 = workload at which we do less vacuuming
   Threshold 3 = point at which a table is considered worth vacuuming.
Once every 10 seconds, the manager compares the current workload to the
thresholds and starts a new vacuum, kills one or does nothing. New
vacuum processes keep getting started as long as there is workload spare
and tables that need vacuuming.

Now the trick of course is how you measure system workload in a
meaningful manner.

--
   Richard Huxton
   Archonet Ltd

Re: [GENERAL] Autovacuum Improvements

From
Heikki Linnakangas
Date:
Gregory Stark wrote:
> "Bruce Momjian" <bruce@momjian.us> writes:
>
>> I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
>> out how small, and it seems about 15% of the total table, which means if
>> we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
>> 80%, assuming 5% of the table is scanned.
>
> Actually no. A while back I did experiments to see how fast reading a file
> sequentially was compared to reading the same file sequentially but skipping
> x% of the blocks randomly. The results were surprising (to me) and depressing.
> The breakeven point was about 7%.

Note that with uniformly random updates, you have dirtied every page of
the table until you get anywhere near 5% of dead space. So we have to
assume non-uniform distribution of update for the DSM to be of any help.

And if we assume non-uniform distribution, it's a good bet that the
blocks that need vacuuming are also not randomly distributed. In fact,
they might very well all be in one cluster, so that scanning that
cluster is indeed sequential I/O.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [GENERAL] Autovacuum Improvements

From
Heikki Linnakangas
Date:
Kenneth Marshall wrote:
> On Mon, Jan 22, 2007 at 06:42:09PM +0000, Simon Riggs wrote:
>> Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...
>
> There may be other functions that could leverage a similar sort of
> infrastructure. For example, a long DB mining query could be registered
> with the system. Then as the pieces of the table/database are brought in
> to shared memory during the normal daily DB activity they can be acquired
> without forcing the DB to run a very I/O expensive query when waiting a
> bit for the results would be acceptable. As long as we are thinking
> piggyback.

Yeah, I had the same idea when we discussed synchronizing sequential
scans. The biggest difference is that with queries, there's often a user
waiting for the query to finish, but with vacuum we don't care so much
how long it takes.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [GENERAL] Autovacuum Improvements

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> >
> > Yep, agreed on the random I/O issue.  The larger question is if you have
> > a huge table, do you care to reclaim 3% of the table size, rather than
> > just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> > to take a lot of time, but vacuuming to relaim 3% three times seems like
> > it is going to be more expensive than just vacuuming the 10% once.  And
> > vacuuming to reclaim 1% ten times seems even more expensive.  The
> > partial vacuum idea is starting to look like a loser to me again.
>
> But if the partial vacuum is able to clean the busiest pages and reclaim
> useful space, currently-running transactions will be able to use that
> space and thus not have to extend the table.  Not that extension is a
> problem on itself, but it'll keep your working set smaller.

Yes, but my point is that if you are trying to avoid vacuuming the
table, I am afraid the full index scan is going to be painful too.  I
can see corner cases where partial vacuum is a win (I only have 4 hours
of idle I/O), but for the general case I am still worried that partial
vacuum will not be that useful as long as we have to scan the indexes.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [GENERAL] Autovacuum Improvements

From
Ron Mayer
Date:
Gregory Stark wrote:
> 
> Actually no. A while back I did experiments to see how fast reading a file
> sequentially was compared to reading the same file sequentially but skipping
> x% of the blocks randomly. The results were surprising (to me) and depressing.
> The breakeven point was about 7%. [...]
> 
> The theory online was that as long as you're reading one page from each disk
> track you're going to pay the same seek overhead as reading the entire track.

Could one take advantage of this observation in designing the DSM?

Instead of a separate bit representing every page, having each bit
represent 20 or so pages might be a more useful unit.  It sounds
like the time spent reading would be similar; while the bitmap
would be significantly smaller.


Re: [GENERAL] Autovacuum Improvements

From
"Joris Dobbelsteen"
Date:
>-----Original Message-----
>From: pgsql-general-owner@postgresql.org
>[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Gregory Stark
>Sent: maandag 22 januari 2007 19:41
>To: Bruce Momjian
>Cc: Heikki Linnakangas; Russell Smith; Darcy Buskermolen;
>Simon Riggs; Alvaro Herrera; Matthew T. O'Connor; Pavan
>Deolasee; Christopher Browne; pgsql-general@postgresql.org;
>pgsql-hackers@postgresql.org
>Subject: Re: [HACKERS] [GENERAL] Autovacuum Improvements
>
>"Bruce Momjian" <bruce@momjian.us> writes:
>
>> Yep, agreed on the random I/O issue.  The larger question is if you
>> have a huge table, do you care to reclaim 3% of the table
>size, rather
>> than just vacuum it when it gets to 10% dirty?  I realize the vacuum
>> is going to take a lot of time, but vacuuming to relaim 3%
>three times
>> seems like it is going to be more expensive than just vacuuming the
>> 10% once.  And vacuuming to reclaim 1% ten times seems even more
>> expensive.  The partial vacuum idea is starting to look like
>a loser to me again.
>
>Well the answer is of course "that depends".
>
>If you maintain the dead space at a steady state averaging
>1.5% instead of 5% your table is 3.33% smaller on average. If
>this is a DSS system that will translate into running your
>queries 3.33% faster. It will take a lot of vacuums before
>they hurt more than a 3%+ performance drop.

Good, this means a DSS system will mostly do table scans (right?). So
probably you should witness the 'table scan' statistic and rows fetched
aproaching the end of the universe (at least compared to
inserts/updates/deletes)?

>If it's an OLTP system the it's harder to figure. a 3.33%
>increase in data density will translate to a higher cache hit
>rate but how much higher depends on a lot of factors. In our
>experiments we actually got bigger boost in these kinds of
>situations than the I expected (I expected comparable to the
>3.33% improvement). So it could be even more than 3.33%. But
>like said it depends.
>If you already have the whole database cache you won't see any
>improvement. If you are right on the cusp you could see a huge benefit.

These tables have high insert, update and delete rates, probably a lot
of index scans? I believe the workload on table scans should be (close
to) none.

Are you willing to share some of this measured data? I'm quite
interested in such figures.

>It sounds like you're underestimating the performance drain
>10% wasted space has. If we found out that one routine was
>unnecessarily taking 10% of the cpu time it would be an
>obvious focus of attention. 10% wasted space is going to work
>out to about 10% of the i/o time.
>
>It also sounds like we're still focused on the performance
>impact in absolute terms. I'm much more interested in changing
>the performance characteristics so they're predictable and
>scalable. It doesn't matter much if your 1kb table is 100%
>slower than necessary but it does matter if your 1TB table
>needs 1,000x as much vacuuming as your 1GB table even if it's
>getting the same update traffic.

Or rather, the vacuuming should pay back.
A nice metric might be: cost_of_not_vacuuming / cost_of_vacuuming.
Obviously, the higher the better.

- Joris Dobbelsteen

Re: [GENERAL] Autovacuum Improvements

From
Ron Mayer
Date:
Gregory Stark wrote:
> 
> Actually no. A while back I did experiments to see how fast reading a file
> sequentially was compared to reading the same file sequentially but skipping
> x% of the blocks randomly. The results were surprising (to me) and depressing.
> The breakeven point was about 7%. [...]
> 
> The theory online was that as long as you're reading one page from each disk
> track you're going to pay the same seek overhead as reading the entire track.

Could one take advantage of this observation in designing the DSM?

Instead of a separate bit representing every page, having each bit
represent 20 or so pages might be a more useful unit.  It sounds
like the time spent reading would be similar; while the bitmap
would be significantly smaller.


Re: [GENERAL] Autovacuum Improvements

From
Kenneth Marshall
Date:
On Mon, Jan 22, 2007 at 07:24:20PM +0000, Heikki Linnakangas wrote:
> Kenneth Marshall wrote:
> >On Mon, Jan 22, 2007 at 06:42:09PM +0000, Simon Riggs wrote:
> >>Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...
> >
> >There may be other functions that could leverage a similar sort of
> >infrastructure. For example, a long DB mining query could be registered
> >with the system. Then as the pieces of the table/database are brought in
> >to shared memory during the normal daily DB activity they can be acquired
> >without forcing the DB to run a very I/O expensive query when waiting a
> >bit for the results would be acceptable. As long as we are thinking
> >piggyback.
>
> Yeah, I had the same idea when we discussed synchronizing sequential
> scans. The biggest difference is that with queries, there's often a user
> waiting for the query to finish, but with vacuum we don't care so much
> how long it takes.
>
Yes, but with trending and statistical analysis you may not need the
exact answer ASAP. An approximate answer based on a fraction of the
information would be useful. Also, "what if" queries could be run without
impacting the production uses of a database. One might imagine having a
query with results that "converge" as the table is processed during normal
use.

Ken

Re: [GENERAL] Autovacuum Improvements

From
Kenneth Marshall
Date:
On Mon, Jan 22, 2007 at 06:42:09PM +0000, Simon Riggs wrote:
> On Mon, 2007-01-22 at 13:27 -0500, Bruce Momjian wrote:
> > Yep, agreed on the random I/O issue.  The larger question is if you have
> > a huge table, do you care to reclaim 3% of the table size, rather than
> > just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> > to take a lot of time, but vacuuming to relaim 3% three times seems like
> > it is going to be more expensive than just vacuuming the 10% once.  And
> > vacuuming to reclaim 1% ten times seems even more expensive.  The
> > partial vacuum idea is starting to look like a loser to me again.
>
> Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...
>
> --
>   Simon Riggs
>   EnterpriseDB   http://www.enterprisedb.com
>

There may be other functions that could leverage a similar sort of
infrastructure. For example, a long DB mining query could be registered
with the system. Then as the pieces of the table/database are brought in
to shared memory during the normal daily DB activity they can be acquired
without forcing the DB to run a very I/O expensive query when waiting a
bit for the results would be acceptable. As long as we are thinking
piggyback.

Ken


Re: [GENERAL] Autovacuum Improvements

From
"Jim C. Nasby"
Date:
On Mon, Jan 22, 2007 at 12:17:39PM -0800, Ron Mayer wrote:
> Gregory Stark wrote:
> > 
> > Actually no. A while back I did experiments to see how fast reading a file
> > sequentially was compared to reading the same file sequentially but skipping
> > x% of the blocks randomly. The results were surprising (to me) and depressing.
> > The breakeven point was about 7%. [...]
> > 
> > The theory online was that as long as you're reading one page from each disk
> > track you're going to pay the same seek overhead as reading the entire track.
> 
> Could one take advantage of this observation in designing the DSM?
> 
> Instead of a separate bit representing every page, having each bit
> represent 20 or so pages might be a more useful unit.  It sounds
> like the time spent reading would be similar; while the bitmap
> would be significantly smaller.

If we extended relations by more than one page at a time we'd probably
have a better shot at the blocks on disk being contiguous and all read
at the same time by the OS.
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: [GENERAL] Autovacuum Improvements

From
Steve Atkins
Date:
On Jan 22, 2007, at 11:16 AM, Richard Huxton wrote:

> Bruce Momjian wrote:
>> Yep, agreed on the random I/O issue.  The larger question is if
>> you have
>> a huge table, do you care to reclaim 3% of the table size, rather
>> than
>> just vacuum it when it gets to 10% dirty?  I realize the vacuum is
>> going
>> to take a lot of time, but vacuuming to relaim 3% three times
>> seems like
>> it is going to be more expensive than just vacuuming the 10%
>> once.  And
>> vacuuming to reclaim 1% ten times seems even more expensive.  The
>> partial vacuum idea is starting to look like a loser to me again.
>
> Buying a house with a 25-year mortgage is much more expensive than
> just paying cash too, but you don't always have a choice.
>
> Surely the key benefit of the partial vacuuming thing is that you
> can at least do something useful with a large table if a full
> vacuum takes 24 hours and you only have 4 hours of idle I/O.
>
> It's also occurred to me that all the discussion of scheduling way
> back when isn't directly addressing the issue. What most people
> want (I'm guessing) is to vacuum *when the user-workload allows*
> and the time-tabling is just a sysadmin first-approximation at that.

Yup. I'd really like for my app to be able to say "Hmm. No
interactive users at the moment, no critical background tasks. Now
would be a really good time for the DB to do some maintenance." but
also to be able to interrupt the maintenance process if some new
users or other system load show up.

> With partial vacuuming possible, we can arrange things with just
> three thresholds and two measurements:
>   Measurement 1 = system workload
>   Measurement 2 = a per-table "requires vacuuming" value
>   Threshold 1 = workload at which we do more vacuuming
>   Threshold 2 = workload at which we do less vacuuming
>   Threshold 3 = point at which a table is considered worth vacuuming.
> Once every 10 seconds, the manager compares the current workload to
> the thresholds and starts a new vacuum, kills one or does nothing.
> New vacuum processes keep getting started as long as there is
> workload spare and tables that need vacuuming.
>
> Now the trick of course is how you measure system workload in a
> meaningful manner.

I'd settle for a "start maintenance", "stop maintenance" API.
Anything else (for instance the heuristics you suggest above) would
definitely be gravy.

It's not going to be simple to do, though, I don't think.

Cheers,
   Steve


Re: [GENERAL] Autovacuum Improvements

From
Martijn van Oosterhout
Date:
On Mon, Jan 22, 2007 at 05:51:53PM +0000, Gregory Stark wrote:
> Actually no. A while back I did experiments to see how fast reading a file
> sequentially was compared to reading the same file sequentially but skipping
> x% of the blocks randomly. The results were surprising (to me) and depressing.
> The breakeven point was about 7%.

I asusume this means you were reading 7% of the blocks, not skipping 7%
of the blocks when you broke even?

I presume by break-even you mean it took just as long, time-wise. But
did it have the same effect on system load? If reading only 7% of the
blocks allows the drive to complete other requests more quickly then
it's beneficial, even if the vacuum takes longer.

This may be a silly thought, I'm not sure how drives handle multiple
requests...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: STOP all user access except for admin for a few minutes?

From
Date:
Thx Russel,
I want to control it from software, changing network access via pg_hba with 
software doesnt feel right.

================ possible case================
Say I have a Group called Normal_Rights and one called Zero_Rights.

So dB runs as... Normal_Rights(User A, User B, User C, User D)
Then via sql, superuser REVOKEs those user rights and GRANTs them
Zero_Rights(User A, User B, User C, User D)... ie make users a member of the 
ZERO rights group.

Then hopefully Postgres kicks them out gracefully?????

Then software make changes and switch's them back to their Normal_Rights 
group.

================ or  more general case================
RECORD all the SQL for all user rights...
REVOKE everything except needed software superusers (postgres, and program 
superuser).
make changes via software.
PLAY BACK all the rights SQL script.

What do you think, will PG kill connections, let them go gracefully, stop 
after current transaction????

================ maybe I'm in the wrong tree================
Is it possible to make quick structural changes to postgres, with user 
activety?
Maybe start a transaction that changes structure... wonder if that will stop 
or hold user activity???

Thx
.



Re: STOP all user access except for admin for a few minutes?

From
Tino Wildenhain
Date:
org@kewlstuff.co.za schrieb:
> Thx Russel,
> I want to control it from software, changing network access via pg_hba 
> with software doesnt feel right.
> 
> ================ possible case================
> Say I have a Group called Normal_Rights and one called Zero_Rights.
> 
> So dB runs as... Normal_Rights(User A, User B, User C, User D)
> Then via sql, superuser REVOKEs those user rights and GRANTs them
> Zero_Rights(User A, User B, User C, User D)... ie make users a member of 
> the ZERO rights group.
> 
> Then hopefully Postgres kicks them out gracefully?????
> 
> Then software make changes and switch's them back to their Normal_Rights 
> group.
> 
> ================ or  more general case================
> RECORD all the SQL for all user rights...
> REVOKE everything except needed software superusers (postgres, and 
> program superuser).
> make changes via software.
> PLAY BACK all the rights SQL script.
> 
> What do you think, will PG kill connections, let them go gracefully, 
> stop after current transaction????
> 
> ================ maybe I'm in the wrong tree================

Yes I'm thinking that too:

> Is it possible to make quick structural changes to postgres, with user 
> activety?

of course.

> Maybe start a transaction that changes structure... wonder if that will 
> stop or hold user activity???

Usually not - all your DDL is done in a transaction just like any other
access users would make. So it only fails (but as a whole) if you want
to modify locked tables and such. But you would not end up w/ a partly
changed database in any case. Just make sure you do everything in
a transaction. No need to suspend user accounts for that.


Regards
Tino




Re: [GENERAL] Autovacuum Improvements

From
Jim Nasby
Date:
On Jan 22, 2007, at 6:53 PM, Kenneth Marshall wrote:
> The default should
> be approximately the OS standard read-ahead amount.

Is there anything resembling a standard across the OSes we support?  
Better yet, is there a standard call that allows you to find out what  
the read-ahead setting is?
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)




Re: STOP all user access except for admin for a few minutes?

From
Date:
Ha ha... thx Tino
Yes, I think this is way to go, strange how my mind climbs the wrong tree 
sometimes :)
I actually need to aquire a transaction across several dB's, check if the 
conditions are right, and then modify some tables, write and remove some 
triggers.
Transactions in postgres are 2 sophisticated, I dont think they will hold 
the locks at the level I need them.
But I was thinking (climbing out of the wrong tree;)... I can just aquire 
exclusive locks on the tables, and hey presto, users are on hold while the 
software checks the dB's.
Effectively creating a very rough transaction, with the lock scope needed? 
... ie it will keep users out long enough to align several dB's... I'm 
hoping?

From: "Tino Wildenhain" <tino@wildenhain.de>
>> ================ maybe I'm in the wrong tree================
>
> Yes I'm thinking that too:
>
>> Is it possible to make quick structural changes to postgres, with user 
>> activety?
>
> of course.
>
>> Maybe start a transaction that changes structure... wonder if that will 
>> stop or hold user activity???
>
> Usually not - all your DDL is done in a transaction just like any other
> access users would make. So it only fails (but as a whole) if you want
> to modify locked tables and such. But you would not end up w/ a partly
> changed database in any case. Just make sure you do everything in
> a transaction. No need to suspend user accounts for that.
>
>
> Regards
> Tino
>
>
> 



Re: STOP all user access except for admin for a few minutes?

From
Markus Schiltknecht
Date:
Hi,

org@kewlstuff.co.za wrote:
> Ha ha... thx Tino
> Yes, I think this is way to go, strange how my mind climbs the wrong 
> tree sometimes :)
> I actually need to aquire a transaction across several dB's, check if 
> the conditions are right, and then modify some tables, write and remove 
> some triggers.
> Transactions in postgres are 2 sophisticated, I dont think they will 
> hold the locks at the level I need them.

You want to read about explicit locking:
http://www.postgresql.org/docs/8.2/static/explicit-locking.html

> But I was thinking (climbing out of the wrong tree;)... I can just 
> aquire exclusive locks on the tables, and hey presto, users are on hold 
> while the software checks the dB's.

I'm sure, that's possible. However, I remember you were talking about 
replication, thus I have to add a warning: please keep in mind that this 
does not scale. You're most probably better using two phase commit, 
aren't you?

Regards

Markus


Re: [GENERAL] Autovacuum Improvements

From
Kenneth Marshall
Date:
On Mon, Jan 22, 2007 at 05:11:03PM -0600, Jim C. Nasby wrote:
> On Mon, Jan 22, 2007 at 12:17:39PM -0800, Ron Mayer wrote:
> > Gregory Stark wrote:
> > > 
> > > Actually no. A while back I did experiments to see how fast reading a file
> > > sequentially was compared to reading the same file sequentially but skipping
> > > x% of the blocks randomly. The results were surprising (to me) and depressing.
> > > The breakeven point was about 7%. [...]
> > > 
> > > The theory online was that as long as you're reading one page from each disk
> > > track you're going to pay the same seek overhead as reading the entire track.
> > 
> > Could one take advantage of this observation in designing the DSM?
> > 
> > Instead of a separate bit representing every page, having each bit
> > represent 20 or so pages might be a more useful unit.  It sounds
> > like the time spent reading would be similar; while the bitmap
> > would be significantly smaller.
> 
> If we extended relations by more than one page at a time we'd probably
> have a better shot at the blocks on disk being contiguous and all read
> at the same time by the OS.
> -- 
> Jim Nasby                                            jim@nasby.net
> EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
> 
Yes, most OS have some read-ahead  when reading a file from disk. Any
increment over 1 would be an improvement. If you used a counter with
a time-based decrement function, you could increase the amount that
the relation is extended based on temporal proximity. If you have
extended it several times recently, increase the size of the new
extension to reduce the overhead even further. The default should
be approximately the OS standard read-ahead amount.

Ken



Re: [GENERAL] Autovacuum Improvements

From
Kenneth Marshall
Date:
On Tue, Jan 23, 2007 at 09:01:41PM -0600, Jim Nasby wrote:
> On Jan 22, 2007, at 6:53 PM, Kenneth Marshall wrote:
> >The default should
> >be approximately the OS standard read-ahead amount.
> 
> Is there anything resembling a standard across the OSes we support?  
> Better yet, is there a standard call that allows you to find out what  
> the read-ahead setting is?
> --
> Jim Nasby                                            jim@nasby.net
> EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
> 
Not that I am aware of. Even extending the relation by one additional
block can make a big difference in performance and should easily fall
within every read-ahead in use today. Or a GUC variable, that defaults
to a small power of 2 number of PostgreSQL blocks, with a default arrived
at by testing.

Ken


Re: [GENERAL] Autovacuum Improvements

From
Tom Lane
Date:
Kenneth Marshall <ktm@is.rice.edu> writes:
> Not that I am aware of. Even extending the relation by one additional
> block can make a big difference in performance

Do you have any evidence to back up that assertion?

It seems a bit nontrivial to me --- not the extension part exactly, but
making sure that the space will get used promptly.  With the current
code the backend extending a relation will do subsequent inserts into
the block it just got, which is fine, but there's no mechanism for
remembering that any other newly-added blocks are available --- unless
you wanted to push them into the FSM, which could work but the current
FSM code doesn't support piecemeal addition of space, and in any case
there's some question in my mind about the concurrency cost of increasing
FSM traffic even more.

In short, it's hardly an unquestionable improvement, so we need some
evidence.
        regards, tom lane


Re: STOP all user access except for admin for a few minutes?

From
Date:
Good memory you have and you exactly right.
Yes... the replication is using posgres's normal transactions ie 2 phase 
commits.... and it works like a dream.
When moving data during replication, the locks are happening at record 
level, and its intrinsic to the postgres transaction machinery.
ie postgres is deciding how 'fine grained' the locks should be, and doing 
all that other amazing MVCC stuff.

The part I'm toying with and struggling with is the start and stop, or the 
admin side of the replication. As it stands now, one has to start with 
identicle databases, then setup replication, and then the users come on. But 
now say I want to make a structural change... as it stands I have to claim 
the dB's back, fix them all, make sure they identicle, re-set up the 
replication, and then the users can come back on. Its that.... the dB's must 
be identicle on setting up replication, that I'm trying to get around. I 
think that when it comes to the structural side, I have to hold(LOCK) those 
dB's, while the software removes the replication, changes the structures of 
all the dB's, reinstalls the scripts and triggers... and I want to make that 
invisible to a system thats already active. Ideally the user software just 
delays for say 10 seconds, and in that time, 6 dB's have been restructured, 
checked and the replication restarted.

In terms of the set up I want to get it to... make those 5 dB's the same as 
this template and start or continue replicating.... becomes a mind twister.
Thats the idea anyway... current version is at http://coolese.100free.com/ 
it works great, but you'll see it has a setup, breakdown problem on an 
active system.
Thx  4 the help Johnny

From: "Markus Schiltknecht" <markus@bluegap.ch>
>> But I was thinking (climbing out of the wrong tree;)... I can just aquire 
>> exclusive locks on the tables, and hey presto, users are on hold while 
>> the software checks the dB's.
>
> I'm sure, that's possible. However, I remember you were talking about 
> replication, thus I have to add a warning: please keep in mind that this 
> does not scale. You're most probably better using two phase commit, aren't 
> you?
>
> Regards
>
> Markus
> 



Re: [GENERAL] Autovacuum Improvements

From
Bruce Momjian
Date:
Jim C. Nasby wrote:
> On Mon, Jan 22, 2007 at 12:17:39PM -0800, Ron Mayer wrote:
> > Gregory Stark wrote:
> > > 
> > > Actually no. A while back I did experiments to see how fast reading a file
> > > sequentially was compared to reading the same file sequentially but skipping
> > > x% of the blocks randomly. The results were surprising (to me) and depressing.
> > > The breakeven point was about 7%. [...]
> > > 
> > > The theory online was that as long as you're reading one page from each disk
> > > track you're going to pay the same seek overhead as reading the entire track.
> > 
> > Could one take advantage of this observation in designing the DSM?
> > 
> > Instead of a separate bit representing every page, having each bit
> > represent 20 or so pages might be a more useful unit.  It sounds
> > like the time spent reading would be similar; while the bitmap
> > would be significantly smaller.
> 
> If we extended relations by more than one page at a time we'd probably
> have a better shot at the blocks on disk being contiguous and all read
> at the same time by the OS.

Actually, there is evidence that adding only a single page to the end
causes a lot of contention for that last page, and that adding a few
might be better.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [GENERAL] Autovacuum Improvements

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Jim C. Nasby wrote:
>> If we extended relations by more than one page at a time we'd probably
>> have a better shot at the blocks on disk being contiguous and all read
>> at the same time by the OS.

> Actually, there is evidence that adding only a single page to the end
> causes a lot of contention for that last page, and that adding a few
> might be better.

Evidence where?  The code is designed so that the last page *isn't*
shared --- go read the comments in hio.c sometime.
        regards, tom lane


Re: [GENERAL] Autovacuum Improvements

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Jim C. Nasby wrote:
> >> If we extended relations by more than one page at a time we'd probably
> >> have a better shot at the blocks on disk being contiguous and all read
> >> at the same time by the OS.
> 
> > Actually, there is evidence that adding only a single page to the end
> > causes a lot of contention for that last page, and that adding a few
> > might be better.
> 
> Evidence where?  The code is designed so that the last page *isn't*
> shared --- go read the comments in hio.c sometime.

I was talking about the last page of a table, where INSERTs all cluster
on that last page and cause lots of page locking.  hio.c does look like
it avoids that problem.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [GENERAL] Autovacuum Improvements

From
Kenneth Marshall
Date:
On Wed, Jan 24, 2007 at 07:30:05PM -0500, Tom Lane wrote:
> Kenneth Marshall <ktm@is.rice.edu> writes:
> > Not that I am aware of. Even extending the relation by one additional
> > block can make a big difference in performance
> 
> Do you have any evidence to back up that assertion?
> 
> It seems a bit nontrivial to me --- not the extension part exactly, but
> making sure that the space will get used promptly.  With the current
> code the backend extending a relation will do subsequent inserts into
> the block it just got, which is fine, but there's no mechanism for
> remembering that any other newly-added blocks are available --- unless
> you wanted to push them into the FSM, which could work but the current
> FSM code doesn't support piecemeal addition of space, and in any case
> there's some question in my mind about the concurrency cost of increasing
> FSM traffic even more.
> 
> In short, it's hardly an unquestionable improvement, so we need some
> evidence.
> 
>             regards, tom lane
> 

My comment was purely based on the reduction in fragmentation of the
file behind the relation. A result that I have seen repeatedly in file
related data processing. It does sound much more complicated to make the
additional space available to other backends. If one backend was doing
many inserts, it might still be of value even for just that backend. As
you mention, testing is needed to see if there is enough value in this
process.

Ken