Thread: Some thoughts about i/o priorities and throttling vacuum

Some thoughts about i/o priorities and throttling vacuum

From
Greg Stark
Date:
So when I suggested on linux-kernel that vacuum could benefit from some way to
prioritize i/o resources, someone suggested vacuum could just throttle its own
disk accesses.

While I think their conception of vacuum is still broken and the
throttling methods they described are the wrong direction, on further thought
I think they actually have the right idea.

pg_autovacuum knows at what rate free space has been accumulating. It knows
how large the available fsm is. It can therefore calculate exactly how much
time it has available to complete the next vacuum run before the fsm runs out
(assuming the free space continues accumulating at a constant rate). 

If it passed that information on to vacuum then vacuum could throttle its own
disk accesses by, say, reading 64k at a time then sleeping for a fraction of a
second. The time spent sleeping would be calculated to have the vacuum take
the required total time.

This would produce a more even and less resource hogging duty cycle where
vacuum would be continuously running at low levels, rather than a duty cycle
where it doesn't run at all until it's needed, but then floods the disk
controllers with continuous sequential reads.

(There are a few details of course. You would need to leave a safety margin in
case free space accumulation speeds up. And accounting for the actual time
spent doing the vacuum would make calculating the sleep time tricky. But they
seem fairly tractable problems.)
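
A minimal sketch of the calculation described above, in Python. The function
names, the units, and the 0.5 safety margin are assumptions made for
illustration; nothing here is existing pg_autovacuum or VACUUM code. It derives
a time budget from the fsm headroom and the accumulation rate, then spreads
that budget over 64k chunks, subtracting the time each chunk's i/o is expected
to take.

```python
def vacuum_time_budget(fsm_headroom, accumulation_per_sec, safety_margin=0.5):
    """Seconds available before the fsm runs out, scaled by a safety margin.

    fsm_headroom and accumulation_per_sec use the same (hypothetical) unit,
    for example pages and pages per second.
    """
    if accumulation_per_sec <= 0:
        return float("inf")  # nothing is accumulating, so there is no deadline
    return safety_margin * fsm_headroom / accumulation_per_sec


def sleep_per_chunk(table_bytes, budget_sec, chunk_bytes=64 * 1024,
                    io_sec_per_chunk=0.0):
    """Sleep time after each chunk so the whole scan takes about budget_sec."""
    chunks = -(-table_bytes // chunk_bytes)  # ceiling division
    if chunks == 0:
        return 0.0
    # Never return a negative sleep if the i/o alone already exceeds the budget.
    return max(0.0, budget_sec / chunks - io_sec_per_chunk)
```

For example, with 1000 units of headroom filling at 10 units per second and a
0.5 margin, the budget is 50 seconds; a table of 100 chunks would then sleep
about 0.5 seconds per chunk, less whatever the reads themselves cost.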

Personally I think i/o priorities give much better leverage. It would let
vacuum run as fast as the disk subsystems can handle during idle times, and
then fade away as soon as any heavy transaction load appears. But the flip
side is that with i/o prioritization vacuum might not actually finish in time.


-- 
greg



Re: Some thoughts about i/o priorities and throttling vacuum

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> ... vacuum could throttle
> its own disk accesses by, say, reading 64k at a time then sleeping for
> a fraction of a second.
> ...
> Personally I think i/o priorities give much better leverage.

Pie in the sky is great too ;-).  But there is no such thing as i/o
priorities, at least not in any portable sense.

OTOH I was just musing to myself earlier today that putting a tunable
delay into VACUUM's per-page loop might make it more friendly to
competing processes.  I dunno if it'd work or just be a waste of time,
but it does seem worth experimenting with.

Want to try it out and report back?
        regards, tom lane


Re: Some thoughts about i/o priorities and throttling vacuum

From
"Stephen"
Date:
I think adding a tunable delay to VACUUM's per-page loop will help keep the
system responsive at all times. In many cases, especially for mostly read-only
tables, plain VACUUM does not need to complete immediately (VACUUM FULL
should complete immediately). I prefer that VACUUM takes its sweet time to
run as long as it doesn't disrupt other queries. See my other post on
"VACUUM degrades performance significantly. Database becomes unusable!" on
the pgsql-general mailing list.

Regards,

Stephen


"Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
news:16818.1066282922@sss.pgh.pa.us...
> Greg Stark <gsstark@mit.edu> writes:
> > ... vacuum could throttle
> > its own disk accesses by, say, reading 64k at a time then sleeping for
> > a fraction of a second.
> > ...
> > Personally I think i/o priorities give much better leverage.
>
> Pie in the sky is great too ;-).  But there is no such thing as i/o
> priorities, at least not in any portable sense.
>
> OTOH I was just musing to myself earlier today that putting a tunable
> delay into VACUUM's per-page loop might make it more friendly to
> competing processes.  I dunno if it'd work or just be a waste of time,
> but it does seem worth experimenting with.
>
> Want to try it out and report back?
>
> regards, tom lane




Re: Some thoughts about i/o priorities and throttling vacuum

From
Bruce Momjian
Date:
Stephen wrote:
> I think adding a tunable delay to VACUUM's per-page loop will help keep the
> system responsive at all times. In many cases, especially for mostly read-only
> tables, plain VACUUM does not need to complete immediately (VACUUM FULL
> should complete immediately). I prefer that VACUUM takes its sweet time to
> run as long as it doesn't disrupt other queries. See my other post on
> "VACUUM degrades performance significantly. Database becomes unusable!" on
> the pgsql-general mailing list.

Of course, this makes VACUUM run longer, and if you are waiting for it
to finish, it would be worse, like if you are running it at night or
something.

I think the delay has to take into account the number of active
transactions or something.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Some thoughts about i/o priorities and throttling vacuum

From
Greg Stark
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> Of course, this makes VACUUM run longer, and if you are waiting for it
> to finish, it would be worse, like if you are running it at night or
> something.

My plan was that the time delay would be a parameter and pg_autovacuum would
set it based on the observed rate at which free space is accumulating.

Someone could manually specify a delay, but by default it would run with no
delay when run on the command line.

> I think the delay has to take into account the number of active
> transactions or something.

That's a possibility. That's actually what the linux-kernel folk suggested.
Someone there suggested using aio to carefully schedule i/o only when no
i/o was pending from transactions.

But vacuum has no way to judge whether those transactions are really doing
much disk i/o or only reading cached blocks, or even whether the disk i/o
they're doing is on the same disk. They could also be waiting on the client or
on locks from other transactions.

-- 
greg



Re: Some thoughts about i/o priorities and throttling vacuum

From
"Matthew T. O'Connor"
Date:
On Thu, 2003-10-16 at 16:16, Greg Stark wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Of course, this makes VACUUM run longer, and if you are waiting for it
> > to finish, it would be worse, like if you are running it at night or
> > something.
> 
> My plan was that the time delay would be a parameter and pg_autovacuum would
> set it based on the observed rate at which free space is accumulating.

I don't know that pg_autovacuum is smart enough to make a good guess as
to an appropriate parameter.

> > I think the delay has to take into account the number of active
> > transactions or something.

I think this is a better plan than pg_autovacuum. This would also allow
vacuum to have a different delay for each loop depending on the current
number of transactions.

> But vacuum has no way to judge whether those transactions are really doing
> much disk i/o or only reading cached blocks, or even whether the disk i/o
> they're doing is on the same disk. They could also be waiting on the client or
> on locks from other transactions.

True, it would be a rough estimate, but at least one based on something
representative of system I/O load at that moment.



Re: Some thoughts about i/o priorities and throttling vacuum

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Of course, this makes VACUUM run longer, and if you are waiting for it
> to finish, it would be worse, like if you are running it at night or
> something.
> I think the delay has to take into account the number of active
> transactions or something.

I was just thinking of a GUC parameter: wait N milliseconds between
pages, where N defaults to zero probably.  A user who wants to run his
vacuum as a background process could set N larger than zero.  I don't
believe we are anywhere near being able to automatically adjust the
delay based on load, and even if we could, this would ignore the point
you make above --- the user's intent has to matter as much as anything
else.
        regards, tom lane
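
A minimal sketch of the "wait N milliseconds between pages" idea, in Python
rather than backend C. The names throttled_scan, process_page and delay_ms are
hypothetical stand-ins; this is not the actual VACUUM loop. The delay defaults
to zero, so an unconfigured run behaves exactly as it would without throttling.

```python
import time


def throttled_scan(pages, process_page, delay_ms=0, sleep=time.sleep):
    """Process every page, pausing delay_ms milliseconds after each one.

    delay_ms plays the role of the proposed tunable: 0 means no throttling.
    The sleep argument is injectable so the loop can be exercised without
    actually waiting.
    """
    processed = 0
    for page in pages:
        process_page(page)
        processed += 1
        if delay_ms > 0:
            sleep(delay_ms / 1000.0)
    return processed
```

A user who wants a background-style vacuum would call it with, say,
delay_ms=10; a user waiting for the result would leave the default of 0.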


Re: Some thoughts about i/o priorities and throttling vacuum

From
"Stephen"
Date:
Is it possible to have an optional delay in plain VACUUM for each invocation
rather than database wide? Something along the lines of an optional THROTTLE
or DELAY parameter for the VACUUM command. The THROTTLE would be ignored when
FULL or FREEZE is selected.

VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ THROTTLE ] ANALYZE [ table [ (column [, ...] ) ] ]

This way autovacuum can still throttle VACUUM as needed in future (either in
contrib or backend) and administrators can decide to apply different delays
for different tables depending on the usage.

Regards, Stephen

"Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
news:16916.1066349859@sss.pgh.pa.us...
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Of course, this makes VACUUM run longer, and if you are waiting for it
> > to finish, it would be worse, like if you are running it at night or
> > something.
> > I think the delay has to take into account the number of active
> > transactions or something.
>
> I was just thinking of a GUC parameter: wait N milliseconds between
> pages, where N defaults to zero probably.  A user who wants to run his
> vacuum as a background process could set N larger than zero.  I don't
> believe we are anywhere near being able to automatically adjust the
> delay based on load, and even if we could, this would ignore the point
> you make above --- the user's intent has to matter as much as anything
> else.
>
> regards, tom lane




Re: Some thoughts about i/o priorities and throttling vacuum

From
Shridhar Daithankar
Date:
Tom Lane wrote:
> I was just thinking of a GUC parameter: wait N milliseconds between
> pages, where N defaults to zero probably.  A user who wants to run his
> vacuum as a background process could set N larger than zero.  I don't
> believe we are anywhere near being able to automatically adjust the
> delay based on load, and even if we could, this would ignore the point
> you make above --- the user's intent has to matter as much as anything
> else.

I am slightly confused here. IIRC pg_autovacuum never did a vacuum full. At the 
most it does vacuum / vacuum analyse, neither of which chews disk bandwidth. And if 
pg_autovacuum is running along with postmaster all the time, with aggressive 
polling like every 5 sec, the database should not accumulate any dead tuples, nor 
would it suffer xid wraparound, as there are vacuums happening constantly.

What's left in the above scenario? As long as all the requirements for pg_autovacuum 
are met, namely setting it up, setting it up aggressively and tuning 
postgresql.conf correctly, vacuum and related problems should be a thing of the 
past, at least as far as 7.4 and onwards are concerned.

Of course an RSM implementation for vacuum would still be much needed, but right 
now it does not affect disk IO directly (except for throwing the buffer cache off 
track, that is).

What am I missing?
 Shridhar



Re: Some thoughts about i/o priorities and throttling vacuum

From
Andrew Sullivan
Date:
On Fri, Oct 17, 2003 at 07:04:45PM +0530, Shridhar Daithankar wrote:
> I am slightly confused here. IIRC pg_autovacuum never did a vacuum full. At 
> the most it does vacuum /vacuum analyse, none of which chew disk bandwidth. 

The latter is false.  VACUUM FULL certainly uses _more_ disk
bandwidth than VACUUM, but it's just false that plain VACUUM doesn't
contend for disk.  And if you're already maxed, then that extra
bandwidth you cannot afford.

A


-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110



Re: Some thoughts about i/o priorities and throttling vacuum

From
Shridhar Daithankar
Date:
Andrew Sullivan wrote:

> On Fri, Oct 17, 2003 at 07:04:45PM +0530, Shridhar Daithankar wrote:
> 
>>I am slightly confused here. IIRC pg_autovacuum never did a vacuum full. At 
>>the most it does vacuum /vacuum analyse, none of which chew disk bandwidth. 
> 
> 
> The latter is false.  VACUUM FULL certainly uses _more_ disk
> bandwidth than VACUUM, but it's just false that plain VACUUM doesn't
> contend for disk.  And if you're already maxed, then that extra
> bandwidth you cannot afford.

What part of plain vacuum takes disk bandwidth? WAL? Clog? Certainly not data 
files themselves, right?

OK, I understand some systems can be saturated enough to feel the additional WAL/Clog 
burden, but I am genuinely curious: how much disk bandwidth is required for plain 
vacuum, and what factors does it depend upon?
 Shridhar



Re: Some thoughts about i/o priorities and throttling vacuum

From
Alvaro Herrera
Date:
On Fri, Oct 17, 2003 at 07:04:45PM +0530, Shridhar Daithankar wrote:

> And if pg_autovacuum is running along with postmaster all the time, with 
> aggressive polling like 5 sec, the database should not accumulate any dead 
> tuples nor it would suffer xid wraparound as there are vacuum happening 
> constantly.

The database can suffer XID wraparound anyway if there's at least one
table without updates, because the autovacuum daemon will never vacuum
it (correct me if I'm wrong).

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Tiene valor aquel que admite que es un cobarde" (Fernandel)


Re: Some thoughts about i/o priorities and throttling vacuum

From
Shridhar Daithankar
Date:
Alvaro Herrera wrote:

> On Fri, Oct 17, 2003 at 07:04:45PM +0530, Shridhar Daithankar wrote:
> 
> 
>>And if pg_autovacuum is running along with postmaster all the time, with 
>>aggressive polling like 5 sec, the database should not accumulate any dead 
>>tuples nor it would suffer xid wraparound as there are vacuum happening 
>>constantly.
> 
> 
> The database can suffer XID wraparound anyway if there's at least one
> table without updates, because the autovacuum daemon will never vacuum
> it (correct me if I'm wrong).
> 

If a table is never updated and hence not vacuumed at all, why would it be 
involved in a transaction that would have xid wrap around?

pg_autovacuum takes care of inserts/updates/deletes. If a table never 
participates in the above three and hence escapes pg_autovacuum, it also escapes 
xid wraparound, doesn't it?
 Shridhar



Re: Some thoughts about i/o priorities and throttling vacuum

From
Alvaro Herrera
Date:
On Fri, Oct 17, 2003 at 07:41:38PM +0530, Shridhar Daithankar wrote:
> Alvaro Herrera wrote:
> 
> >On Fri, Oct 17, 2003 at 07:04:45PM +0530, Shridhar Daithankar wrote:

> >The database can suffer XID wraparound anyway if there's at least one
> >table without updates, because the autovacuum daemon will never vacuum
> >it (correct me if I'm wrong).
> 
> If a table is never updated and hence not vacuumed at all, why would it be 
> involved in a transaction that would have xid wrap around?

Because the tuples on it were involved in some insert operation at some
time (else the table would not have any tuples).  So it _has_ to be
vacuumed, else you run the risk of losing the tuples when the wraparound
happens.  (Sorry, I don't know how to explain this better.)

Maybe in this case it's best to do a VACUUM FREEZE; that'd ensure that
the table would never ever need a vacuum again until it suffers
an insert, delete or update.  Perhaps the autovacuum daemon could detect
the case where a table has only very old tuples and freeze it.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"El número de instalaciones de UNIX se ha elevado a 10,
y se espera que este número aumente" (UPM, 1972)


Re: Some thoughts about i/o priorities and throttling vacuum

From
Tom Lane
Date:
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> What part of plain vacuum takes disk bandwidth?

Reading (and possibly rewriting) all the pages.
        regards, tom lane


Re: Some thoughts about i/o priorities and throttling vacuum

From
Shridhar Daithankar
Date:
Alvaro Herrera wrote:

> On Fri, Oct 17, 2003 at 07:41:38PM +0530, Shridhar Daithankar wrote:
> 
>>Alvaro Herrera wrote:
>>
>>
>>>On Fri, Oct 17, 2003 at 07:04:45PM +0530, Shridhar Daithankar wrote:
> 
> 
>>>The database can suffer XID wraparound anyway if there's at least one
>>>table without updates, because the autovacuum daemon will never vacuum
>>>it (correct me if I'm wrong).
>>
>>If a table is never updated and hence not vacuumed at all, why would it be 
>>involved in a transaction that would have xid wrap around?
> 
> 
> Because the tuples on it were involved in some insert operation at some
> time (else the table would not have any tuples).  So it _has_ to be
> vacuumed, else you run the risk of losing the tuples when the wraparound
> happens.  (Sorry, I don't know how to explain this better.)

OK. So here is what I understand. I have a table which contains 100 rows which 
appeared there due to some insert operation. Then I vacuum it. And sit there for 
eternity for the rest of the database to approach the singularity (the xid 
wraparound..:-) Nice term, isn't it?).

So this static table is vulnerable to xid wraparound? I doubt it.

Did I miss something?
 Shridhar



Re: Some thoughts about i/o priorities and throttling vacuum

From
Shridhar Daithankar
Date:
Tom Lane wrote:

> Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> 
>>What part of plain vacuum takes disk bandwidth?
> 
> 
> Reading (and possibly rewriting) all the pages.

I was under the impression that was for shared memory pages only and not for disk pages.

OK.  I can see a difference of understanding here.

Plain vacuum goes around the table/database and makes space, in shared buffers and 
on disk, reusable whenever possible but *does not* free any space.

Would it be possible to have a vacuum variant that would just shuffle through the 
shared buffers and not touch disk at all?  pg_autovacuum could probably be ultra 
aggressive with such a shared-buffers-only scan. Is it possible or feasible?

IMO that could be a clever solution rather than throttling IO for vacuum. For 
one thing, getting that throttling right would be extremely difficult and would 
vary from site to site. If it is going to be tough to tune, then it will be 
underutilised and will lose its value rather rapidly.
 Just a thought..
 Shridhar









Re: Some thoughts about i/o priorities and throttling vacuum

From
Tom Lane
Date:
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> Would it be possible to have a vacuum variant that would just shuffle through the 
> shared buffers and not touch disk at all?

What would be the use of that?  You couldn't predict *anything* about
the coverage.  Maybe you find all the free space in a particular table,
but most likely you don't.

In any case an I/O-free vacuum is impossible since once you have decided
to recycle a particular tuple, you don't have any option about removing
the corresponding index entries first.  So unless both the table and all
its indexes are in RAM, you will be incurring I/O.
        regards, tom lane


Re: Some thoughts about i/o priorities and throttling vacuum

From
Alvaro Herrera
Date:
On Fri, Oct 17, 2003 at 07:55:44PM +0530, Shridhar Daithankar wrote:

> OK. So here is what I understand. I have a table which contains 100 rows 
> which appeared there due to some insert operation. Then I vacuum it. And 
> sit there for eternity for the rest of the database to approach the 
> singularity (the xid wraparound..:-) Nice term, isn't it?).
> 
> So this static table is vulnerable to xid wraparound? I doubt it.
> 
> Did I miss something?

You are missing the part when the XID that was formerly a "committed
transaction" becomes an uncommitted transaction when the wraparound
occurs... so the tuples will have creation XID by an uncommitted
transaction, and current transactions will not see them.  Voila, your
table is empty.

The trick to keep in mind is that the XID comparison functions use
"modulo" operations, _but_ there are special "frozen" XIDs that are
always "committed" -- that's why a VACUUM FREEZE would relieve the table
forever from this problem.

(At least this is how I understand it -- I could be totally wrong here)
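
A simplified sketch of the comparison described above, in Python. The constant
values and the function name xid_precedes are assumptions for illustration and
this is not the backend's actual code. Normal XIDs are compared modulo 2^32,
so an XID more than about two billion transactions old flips from "past" to
"future", while a frozen XID is compared as a plain integer and always stays
in the past.

```python
FROZEN_XID = 2          # assumed reserved id that is always "committed, in the past"
FIRST_NORMAL_XID = 3    # assumed first id that takes part in wraparound
MODULUS = 2 ** 32


def xid_precedes(a, b):
    """Return True if xid a is logically older than xid b."""
    if a < FIRST_NORMAL_XID or b < FIRST_NORMAL_XID:
        # Special ids are compared as plain integers, with no modulo arithmetic.
        return a < b
    # Modulo comparison: a precedes b when the 32-bit signed difference
    # (a - b) is negative, i.e. the unsigned difference is at least 2^31.
    return (a - b) % MODULUS >= MODULUS // 2


inserting_xid = 100
far_future_xid = inserting_xid + 2 ** 31 + 5   # over two billion transactions later

# The old tuple's xid no longer looks like it is in the past...
old_tuple_still_visible = xid_precedes(inserting_xid, far_future_xid)
# ...but a frozen tuple still does.
frozen_tuple_still_visible = xid_precedes(FROZEN_XID, far_future_xid)
```

Here old_tuple_still_visible ends up False and frozen_tuple_still_visible ends
up True, which is the "Voila, your table is empty" case and its VACUUM FREEZE
remedy.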

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Los dioses no protegen a los insensatos.  Éstos reciben protección de
otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)


Re: Some thoughts about i/o priorities and throttling vacuum

From
"Matthew T. O'Connor"
Date:
On Fri, 2003-10-17 at 09:34, Shridhar Daithankar wrote:
> I am slightly confused here. IIRC pg_autovacuum never did a vacuum full. 

Correct.

> At the 
> most it does vacuum /vacuum analyse, 

Incorrect, it either does vacuum analyse, or just analyse

> none of which chew disk bandwidth. 

Incorrect, vacuum can have lots of disk I/O, analyze has considerably
less, but still some.

> And if 
> pg_autovacuum is running along with postmaster all the time, with aggressive 
> polling like 5 sec, the database should not accumulate any dead tuples

True, however, I think such aggressive polling will be a net loss in
efficiency.

>  nor it 
> would suffer xid wraparound as there are vacuum happening constantly.

Wrong, pg_autovacuum typically just does vacuum [table name], which does
not affect the xid wraparound issue; one has to issue a vacuum against
an entire database to affect that.

> What's left in above scenario? As long as all the requirements for pg_autovacuum 
> are met, namely setting it up, setting it up aggressively and tuning 
> postgresql.conf correctly, vacuum and related problems should be a thing in 
> past, at least as far as 7.4 and onwards is considered.

Well it still remains to be seen if the client side implementation of
pg_autovacuum is sufficient.  Also, we will see if index bloat is
handled (less an autovac issue, but semi-related).  Ideally, autovac
should make better decisions based on FSM and perhaps even the RSM (is
that what it was called?) that people have talked about setting up.

With all that said, hopefully pg_autovacuum proves to be a successful
experiment, and if so, then it needs to be integrated into core somehow.

Matthew



Re: Some thoughts about i/o priorities and throttling vacuum

From
"Matthew T. O'Connor"
Date:
On Fri, 2003-10-17 at 10:25, Shridhar Daithankar wrote:
> OK. So here is what I understand. I have a table which contains 100 rows which 
> appeared there due to some insert operation. Then I vacuum it. And sit there for 
> eternity for the rest of the database to approach the singularity (the xid 
> wraparound..:-) Nice term, isn't it?).
> 
> So this static table is vulnerable to xid wraparound? I doubt it.

No that table would probably be ok, because you did a vacuum on it after
the inserts.  The problem is that pg_autovacuum may choose not to do a
vacuum if you didn't cross a threshold, or someone outside of
pg_autovacuum may have done the vacuum and autovac doesn't know about
it, so it can't guarantee that all tables in the database are safe from
xid wraparound.  

One additional thing, some of this might be possible if pg_autovacuum
saved its data between restarts.  Right now it restarts with no memory
of what happened before.  



Re: Some thoughts about i/o priorities and throttling vacuum

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Maybe in this case it's best to do a VACUUM FREEZE; that'd ensure that
> the table would never ever need a vacuum again until it suffers
> an insert, delete or update.

But how would you keep track of that?  Certainly an external autovacuum
daemon couldn't know for sure that the table had never been modified
since it was frozen.  I suppose you could think about altering the
backend to mark a table "dirty" whenever an insert/update/delete is
done, but I'd have to think this would be a net waste of cycles in the
vast majority of cases.  How many people have tables that are *really*
read-only over the long haul (billions of transactions)?

I think the existing approach of forcing a database-wide vacuum every
billion or so transactions is probably the most efficient way of dealing
with the issue.  It's almost certainly cheaper, net, than any scheme
that adds even a tiny overhead to each individual insert/update/delete.
        regards, tom lane


Re: Some thoughts about i/o priorities and throttling vacuum

From
Shridhar Daithankar
Date:
Matthew T. O'Connor wrote:

> On Fri, 2003-10-17 at 10:25, Shridhar Daithankar wrote:
> 
>>OK. So here is what I understand. I have a table which contains 100 rows which 
>>appeared there due to some insert operation. Then I vacuum it. And sit there for 
>>eternity for the rest of the database to approach the singularity (the xid 
>>wraparound..:-) Nice term, isn't it?).
>>
>>So this static table is vulnerable to xid wraparound? I doubt it.
> 
> 
> No that table would probably be ok, because you did a vacuum on it after
> the inserts.  The problem is that pg_autovacuum may choose not to do a
> vacuum if you didn't cross a threshold, or someone outside of
> pg_autovacuum may have done the vacuum and autovac doesn't know about
> it, so it can't guarantee that all tables in the database are safe from
> xid wraparound.  
> 
> One additional thing, some of this might be possible if pg_autovacuum
> saved its data between restarts.  Right now it restarts with no memory
> of what happened before.  

Well, the unmaintained gborg version adopted the approach of storing such info in a 
table, so that it survives a postgresql or pg_autovacuum restart, or both.

That was considered tablespace pollution back then. But personally I think it 
should be ok. If it ever goes into the catalogues, I would rather add a few columns to 
pg_class for such a stat. But again, that's not my call to make.
 Shridhar



Re: Some thoughts about i/o priorities and throttling vacuum

From
Shridhar Daithankar
Date:
Tom Lane wrote:

> Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> 
>>Would it be possible to have a vacuum variant that would just shuffle through the 
>>shared buffers and not touch disk at all?
> 
> 
> What would be the use of that?  You couldn't predict *anything* about
> the coverage.  Maybe you find all the free space in a particular table,
> but most likely you don't.
> 
> In any case an I/O-free vacuum is impossible since once you have decided
> to recycle a particular tuple, you don't have any option about removing
> the corresponding index entries first.  So unless both the table and all
> its indexes are in RAM, you will be incurring I/O.

I am just suggesting it as a variant and not a replacement for the existing vacuum 
options. Knowing that it does not do any IO, it could be triggered a lot more 
aggressively. Furthermore, if we assume pg_autovacuum is an integral part of 
database operation, right from before a single database object is created, I 
think it could cover many/most database usage patterns barring multiple indexes, 
for which normal vacuum variants could be used.

Furthermore, when a tuple is updated, all the relevant indexes are updated, 
right? So if such a vacuum is aggressive enough, it could catch the index 
entries as well, in the RAM.

Think of it like catching hens. Easier to do in a cage rather than over a farm. 
So catch as many of them in cage. If they escape or spill out of cage due to 
over-population, you have to tread the farm anyways...
 Just a thought.
 Shridhar



Re: Some thoughts about i/o priorities and throttling vacuum

From
Andrew Sullivan
Date:
On Fri, Oct 17, 2003 at 07:25:13PM +0530, Shridhar Daithankar wrote:
> What part of plain vacuum takes disk bandwidth? WAL? Clog? Certainly not 
> data files themselves, right?

Sure, the data files.  The data files still have to be completely
read from beginning to end by VACUUM.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110



Re: Some thoughts about i/o priorities and throttling vacuum

From
Rod Taylor
Date:
On Fri, 2003-10-17 at 10:22, Tom Lane wrote:
> Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> > What part of plain vacuum takes disk bandwidth?
>
> Reading (and possibly rewriting) all the pages.

Would it be possible for the backend to keep a list of the first N (N
being a large number but not significant in memory usage) pages it has
deleted tuples out of and a second list of N pages it has inserted
tuples into?

After the transaction has completed and there is an idle period (say 1/4
second between transaction) it can pass the insert information on a
rollback and delete information on a commit to a separate backend.

This 'vacuum' backend could then prioritize garbage collection for the
pages it knows have been changed performing a single page vacuum when a
specific page has seen a high level of reported activity.

If this daemon could also get hold of information about the idleness of IO
in general, the decision about what to vacuum and when may be better
(heavily hit pages during peak periods, all reported pages under medium
load). When completely idle, run through the entire system to get back
as much as possible.
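
A rough sketch of the bookkeeping this proposal implies, in Python. The class
PageActivityTracker and its parameters are hypothetical; no such structure
exists in the backend. It remembers only the first N distinct pages reported,
counts reports per page, and hands back the pages whose count has reached a
threshold, most active first.

```python
from collections import Counter


class PageActivityTracker:
    """Bounded record of pages that backends report as changed."""

    def __init__(self, max_pages, hot_threshold):
        self.max_pages = max_pages          # the "first N pages" limit
        self.hot_threshold = hot_threshold  # reports needed before a page is "hot"
        self.reports = Counter()

    def report(self, page):
        # Track a page if it is already known or there is still room;
        # pages reported after the list is full are simply dropped.
        if page in self.reports or len(self.reports) < self.max_pages:
            self.reports[page] += 1

    def hot_pages(self):
        # Candidates for a single-page vacuum, most reported first.
        return [page for page, count in self.reports.most_common()
                if count >= self.hot_threshold]
```

Pages that fall outside the first N are lost to this structure, so a full pass
during idle time would still be needed to find them.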

Re: Some thoughts about i/o priorities and throttling vacuum

From
Shridhar Daithankar
Date:
Rod Taylor wrote:

> On Fri, 2003-10-17 at 10:22, Tom Lane wrote:
> 
>>Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
>>
>>>What part of plain vacuum takes disk bandwidth?
>>
>>Reading (and possibly rewriting) all the pages.
> 
> 
> Would it be possible for the backend to keep a list of the first N (N
> being a large number but not significant in memory usage) pages it has
> deleted tuples out of and a second list of N pages it has inserted
> tuples into.

That is RSM, reclaimable space map. It is on TODO.

> After the transaction has completed and there is an idle period (say 1/4
> second between transaction) it can pass the insert information on a
> rollback and delete information on a commit to a separate backend.
> 
> This 'vacuum' backend could then prioritize garbage collection for the
> pages it knows have been changed performing a single page vacuum when a
> specific page has seen a high level of reported activity.
> 
> If this daemon could also get a hold of information about idleness of IO
> in general the decision about what to vacuum and when may be better
> (heavily hit pages during peak periods, all reports pages on medium
> load). When completely idle, run through the entire system to get back
> as much as possible.

I agree. This seems to be the best way of dealing with things. Of course, 
there are probably details we are missing here, but in general it's good.
 Shridhar



Re: Some thoughts about i/o priorities and throttling vacuum

From
Tom Lane
Date:
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> I agree. This seems to be the best way of dealing with things. Of course, 
> there are probably details we are missing here, but in general it's good.

Actually, this is all pure handwaving, because you are ignoring the need
to remove index tuples.  The existing VACUUM code amortizes index
cleanup over as many tuples as it can.  If you do partial vacuuming of
tables then you are necessarily going to be expending more cycles (and
I/O) per tuple, on average, to get rid of the index entries.  It's not
at all clear that there's any real win to be had in that direction.
Perhaps it's a win, but you have no evidence on which to assert so.
        regards, tom lane
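
A rough way to see the amortization argument, as a sketch only: assume (this is an assumption, not something stated above) that each index cleanup pass has to read every page of the index once, however many dead tuples it removes. The function name and the numbers are hypothetical.

```python
def index_pages_read_per_tuple(index_pages, dead_tuples_per_pass):
    """Index pages read per dead tuple removed.

    Assumes one full scan of the index per cleanup pass, so the scan cost
    is spread over however many dead tuples that pass removes.
    """
    return index_pages / dead_tuples_per_pass


# Hypothetical 10,000-page index.
whole_table = index_pages_read_per_tuple(10_000, 100_000)  # one big pass
single_page = index_pages_read_per_tuple(10_000, 50)       # one heap page at a time
print(whole_table, single_page)
```

Under that assumption the per-tuple index cost grows as the batch shrinks, which is the concern raised above. Whether the saved heap I/O outweighs it is exactly the open question.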


Re: Some thoughts about i/o priorities and throttling vacuum

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> The trick to keep in mind is that the XID comparison functions use
> "modulo" operations, _but_ there are special "frozen" XIDs that are
> always "committed" -- that's why a VACUUM FREEZE would relieve the table
> forever from this problem.

> (At least this is how I understand it -- I could be totally wrong here)

No, that's exactly correct.
        regards, tom lane
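
For readers who want to see the "modulo" comparison spelled out, here is a minimal sketch in Python. The constants (frozen XID 2, first normal XID 3, 32-bit XIDs) reflect my understanding of the PostgreSQL sources and should be treated as assumptions. The function name is hypothetical.

```python
FROZEN_XID = 2            # assumed reserved value meaning "frozen"
FIRST_NORMAL_XID = 3      # assumed first ordinary transaction ID
MASK = 0xFFFFFFFF         # XIDs are treated as 32-bit unsigned values


def xid_precedes(a, b):
    """True if XID a is logically older than XID b."""
    if a < FIRST_NORMAL_XID or b < FIRST_NORMAL_XID:
        # Special XIDs are compared as plain integers, so a frozen XID
        # is older than every normal XID, no matter how far the counter wraps.
        return a < b
    # Normal XIDs: compare modulo 2^32. a is older when the 32-bit
    # difference (a - b), read as a signed number, is negative.
    diff = (a - b) & MASK
    return diff >= 0x80000000
```

With this, `xid_precedes(0xFFFFFFF0, 10)` is true across the wraparound point, while `xid_precedes(100, 100 + 2**31 + 5)` is false: an old normal XID that falls more than about two billion transactions behind starts to look like the future. A frozen XID never has that problem.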


Re: Some thoughts about i/o priorities and throttling vacuum

From
"Matthew T. O'Connor"
Date:
On Fri, 2003-10-17 at 10:53, Shridhar Daithankar wrote:
> Matthew T. O'Connor wrote:
> > One additional thing, some of this might be possible if pg_autovacuum
> > saved its data between restarts.  Right now it restarts with no memory
> > of what happened before.  
> 
> Well, the unmaintained gborg version adopted the approach of storing such info in a 
> table, so that it survives a postgresql or pg_autovacuum restart, or both.
> 
> That was considered tablespace pollution back then. But personally I think it 
> should be ok. If it ever goes into the catalogues, I would rather add a few columns to 
> pg_class for such a stat. But again, that's not my call to make.

I still consider it tablespace pollution.  When / if it gets integrated
into the backend and uses system tables, that is a different story:
you are not modifying a user's database.  What should happen is that on
exit pg_autovacuum writes its data to a file that it rereads on
startup, or something like that....



Re: Some thoughts about i/o priorities and throttling vacuum

From
Christopher Browne
Date:
shridhar_daithankar@persistent.co.in (Shridhar Daithankar) writes:
> Tom Lane wrote:
>> I was just thinking of a GUC parameter: wait N milliseconds between
>> pages, where N defaults to zero probably.  A user who wants to run his
>> vacuum as a background process could set N larger than zero.  I don't
>> believe we are anywhere near being able to automatically adjust the
>> delay based on load, and even if we could, this would ignore the point
>> you make above --- the user's intent has to matter as much as anything
>> else.
>
> I am slightly confused here. IIRC pg_autovacuum never did a vacuum
> full. At the most it does vacuum /vacuum analyse, none of which chew
> disk bandwidth. 

[remainder elided; your second sentence is the vital bit...]

> What am I missing?

You are missing that VACUUM most certainly _does_ chew up disk
bandwidth, because it must load the pages of the table into memory.

If the system is busy doing other I/O, then the other I/O has to
compete with the I/O initiated by VACUUM.

VACUUM FULL is certainly more expensive than VACUUM/VACUUM ANALYZE;
the point is that even the latter is NOT free on big tables when there
is a lot of "traffic."

VACUUM is like putting an extra few transport trucks onto the highway.
It may only go from one highway junction to the next, and be fairly
brief, if traffic is moving well.  But if traffic is heavy, it adds to
the congestion.  (And that's as far as the analogy can go; I can't
imagine a way of drawing the GUC parameter into this...)
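
The "wait N milliseconds between pages" idea quoted above can be sketched as follows. This is only an illustration: `scan_pages`, `process_page`, and `delay_ms` are hypothetical names, and the real setting would be a GUC parameter read by the backend, not a function argument.

```python
import time


def scan_pages(pages, process_page, delay_ms=0, sleep=time.sleep):
    """Process each page in order, pausing delay_ms milliseconds between pages.

    delay_ms defaults to zero, meaning run flat out.
    The sleep argument exists only so the pause can be replaced in tests.
    """
    processed = 0
    for index, page in enumerate(pages):
        if index > 0 and delay_ms > 0:
            sleep(delay_ms / 1000.0)  # yield the disk to other I/O
        process_page(page)
        processed += 1
    return processed
```

A user who wants a background vacuum would pass something like `delay_ms=10`. The delay is fixed; nothing here adapts to load.
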
-- 
(format nil "~S@~S" "cbbrowne" "libertyrms.info")
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)


Re: Some thoughts about i/o priorities and throttling vacuum

From
Christopher Browne
Date:
shridhar_daithankar@persistent.co.in (Shridhar Daithankar) writes:

> Andrew Sullivan wrote:
>
>> On Fri, Oct 17, 2003 at 07:04:45PM +0530, Shridhar Daithankar wrote:
>>
>>> I am slightly confused here. IIRC pg_autovacuum never did a vacuum
>>> full. At the most it does vacuum /vacuum analyse, none of which
>>> chew disk bandwidth.
>> The latter is false.  VACUUM FULL certainly uses _more_ disk
>> bandwidth than VACUUM, but it's just false that plain VACUUM doesn't
>> contend for disk.  And if you're already maxed, then that extra
>> bandwidth you cannot afford.
>
> What part of plain vacuum takes disk bandwidth? WAL? Clog? Certainly
> not data files themselves, right?

Certainly YES, the data files themselves.

VACUUM has to read through the pages to assess which tuples are to
expire.  So if the data file is 8GB long, VACUUM has to read through
8GB of data.

As compared to VACUUM FULL, it is certainly cheaper, as it is not
rummaging around to reorder pages, but rather walking through, single
page by single page.  Thus, where VACUUM FULL might involve (in
effect) reading through the file several times (as it shifts data
between pages), VACUUM only reads through it once.  

In this example, that's 8GB of reads.
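
To put numbers on that, here is a small sketch that counts the pages in such a file and the total pause a per-page delay would add. It assumes 8 KiB pages and reads "8GB" as 8 GiB; both are assumptions, and `vacuum_read_estimate` is a hypothetical helper.

```python
def vacuum_read_estimate(table_bytes, page_bytes=8192, delay_ms=0):
    """Return (pages to read, seconds spent pausing between pages).

    page_bytes=8192 is an assumed page size. The pause total counts one
    delay between each pair of consecutive pages and ignores read time.
    """
    pages = -(-table_bytes // page_bytes)  # ceiling division
    pause_seconds = max(pages - 1, 0) * delay_ms / 1000.0
    return pages, pause_seconds


pages, pause = vacuum_read_estimate(8 * 1024**3, delay_ms=1)
print(pages, pause)
```

Under those assumptions that is a little over a million page reads, and a 1 ms pause between pages adds roughly 17 minutes on top of the read time itself.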
-- 
"cbbrowne","@","libertyrms.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)


Re: Some thoughts about i/o priorities and throttling vacuum

From
Greg Stark
Date:
Christopher Browne <cbbrowne@libertyrms.info> writes:

> VACUUM is like putting an extra few transport trucks onto the highway.
> It may only go from one highway junction to the next, and be fairly
> brief, if traffic is moving well.  But if traffic is heavy, it adds to
> the congestion.  (And that's as far as the analogy can go; I can't
> imagine a way of drawing the GUC parameter into this...)

Ooh strained metaphors. This game is always fun.

So I think of it the other way around. A busy database is like downtown
traffic with everyone going every which way for short trips. Running vacuum is
like having a few trucks driving through your city streets for through
traffic. 

Having a parameter to slow down the through traffic is like, uh, having
express lanes for local traffic. Er, yeah, that's the ticket. Except who ever
heard of having express lanes for local traffic? Hm.

-- 
greg



Re: Some thoughts about i/o priorities and throttling vacuum

From
Mike Mascari
Date:
Greg Stark wrote:

> Christopher Browne <cbbrowne@libertyrms.info> writes:
> 
>>VACUUM is like putting an extra few transport trucks onto the highway.
>>It may only go from one highway junction to the next, and be fairly
>>brief, if traffic is moving well.  But if traffic is heavy, it adds to
>>the congestion.  (And that's as far as the analogy can go; I can't
>>imagine a way of drawing the GUC parameter into this...)
> 
> Ooh strained metaphors. This game is always fun.
> 
> So I think of it the other way around. A busy database is like downtown
> traffic with everyone going every which way for short trips. Running vacuum is
> like having a few trucks driving through your city streets for through
> traffic. 
> 
> Having a parameter to slow down the through traffic is like, uh, having
> express lanes for local traffic. er, yeah, that's the ticket. Except who ever
> heard of having express lanes for local traffic. Hm.

All I know is that Jan Wieck would have each car filled to the brim
with spikes....

Mike Mascari
mascarm@mascari.com





Re: Some thoughts about i/o priorities and throttling vacuum

From
"Matthew T. O'Connor"
Date:
On Fri, 2003-10-17 at 14:56, Mike Mascari wrote:
> Greg Stark wrote:
> > Ooh strained metaphors. This game is always fun.
> > 
> > So I think of it the other way around. A busy database is like downtown
> > traffic with everyone going every which way for short trips. Running vacuum is
> > like having a few trucks driving through your city streets for through
> > traffic. 
> > 
> > Having a parameter to slow down the through traffic is like, uh, having
> > express lanes for local traffic. er, yeah, that's the ticket. Except who ever
> > heard of having express lanes for local traffic. Hm.
> 
> All I know is that Jan Wieck would have each car filled to the brim
> with spikes....

ROTFLMAO




Re: Some thoughts about i/o priorities and throttling vacuum

From
Christopher Browne
Date:
mascarm@mascari.com (Mike Mascari) writes:
> Greg Stark wrote:
>> Christopher Browne <cbbrowne@libertyrms.info> writes:
>>>VACUUM is like putting an extra few transport trucks onto the
>>>highway.  It may only go from one highway junction to the next, and
>>>be fairly brief, if traffic is moving well.  But if traffic is
>>>heavy, it adds to the congestion.  (And that's as far as the
>>>analogy can go; I can't imagine a way of drawing the GUC parameter
>>>into this...)
>> 
>> Ooh strained metaphors. This game is always fun.
>> 
>> So I think of it the other way around. A busy database is like
>> downtown traffic with everyone going every which way for short
>> trips. Running vacuum is like having a few trucks driving through
>> your city streets for through traffic.
>> 
>> Having a parameter to slow down the through traffic is like, uh,
>> having express lanes for local traffic. er, yeah, that's the
>> ticket. Except who ever heard of having express lanes for local
>> traffic. Hm.
>
> All I know is that Jan Wieck would have each car filled to the brim
> with spikes....

No, you just need _one_ spike.

_One_ spike in the centre of the steering wheel.

There would be _so_ much less tailgating if they had those spikes...
-- 
"cbbrowne","@","libertyrms.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)