Thread: possible vacuum improvement?
I know everyone is busy with the 7.3 beta, but maybe this is something to think of before releasing it. Currently VACUUM will vacuum every table, but sometimes it's desirable to leave tables untouched because they're mostly static or protocol tables. In my case this would be pg_largeobject, which is around 4GB of data, while the other tables are ~40MB. Vacuuming the data is important; the large object table, however, rarely changes. The same goes for a protocol table which is around 1GB and is never changed besides INSERTs, so it's just growing but never needs vacuum. VACUUM on the 4GB table takes a long, long time and brings no improvement; it just hurts performance and fills OS buffers.

If pg_class had a field for storing misc flags (e.g. a bitfield), it would be possible to set a flag like NO_AUTO_VACUUM and modify the vacuum code to leave those tables untouched unless specified by hand. Maybe there are other uses for such a bitfield too, and it would help avoid an initdb for simple improvements.

Any comments?

Best regards,
Mario Weilguni
On 3 Sep 2002 at 8:55, Mario Weilguni wrote:

> I know everyone is busy with the 7.3 beta, but maybe this is something to
> think of before releasing it. Currently VACUUM will vacuum every table, but
> sometimes it's desirable to leave tables untouched because they're mostly
> static or protocol tables. In my case this would be pg_largeobject, which
> is around 4GB of data, while the other tables are ~40MB. Vacuuming the data
> is important; the large object table, however, rarely changes. The same
> goes for a protocol table which is around 1GB and is never changed besides
> INSERTs, so it's just growing but never needs vacuum. VACUUM on the 4GB
> table takes a long, long time and brings no improvement; it just hurts
> performance and fills OS buffers.
>
> If pg_class had a field for storing misc flags (e.g. a bitfield), it would
> be possible to set a flag like NO_AUTO_VACUUM and modify the vacuum code to
> leave those tables untouched unless specified by hand. Maybe there are
> other uses for such a bitfield too, and it would help avoid an initdb for
> simple improvements.
>
> Any comments?

I suggest vacuuming only the tables that change. Further, I believe updates/deletes should be watched for performance, as they cause dead tuples. Of course inserts impact statistics and should be monitored, but something like a log table does not need vacuuming that often. In short, knowing the application load can help a lot in tuning the DB.

I was running a banking simulation for benchmarking. I know that the accounts table gets updated for each transaction but the log table only gets inserts. So rather than vacuuming the entire DB, just doing 'vacuum analyze accounts' gives me almost the same results, and performance was far better with the targeted vacuum. Without any vacuum I got something like 50 tps for 80K transactions. With 'vacuum analyze accounts' every 5K transactions I got 200 tps.
Personally I would prefer to have a trigger on a metadata table where I could trigger vacuuming a particular table every n transactions (oh, it would be great if that vacuum ran in the background without blocking the metadata table... just a wishlist...). Can anybody tell me on which table I could write such a trigger? I went through pg_* for some time but didn't find what I was looking for.

Bye,
Shridhar

--
Reisner's Rule of Conceptual Inertia: If you think big enough, you'll never have to do it.
> Personally I would prefer to have a trigger on a metadata table where I
> could trigger vacuuming a particular table every n transactions (oh, it
> would be great if that vacuum ran in the background without blocking the
> metadata table... just a wishlist...). Can anybody tell me on which table
> I could write such a trigger? I went through pg_* for some time but didn't
> find what I was looking for.

Actually, if you wrote it in C and kept some static data on each table, you could probably write a vacuum trigger pretty easily. You could even keep the info in a table.

Chris
On 3 Sep 2002 at 15:14, Christopher Kings-Lynne wrote:

> Actually, if you wrote it in C and kept some static data on each table,
> you could probably write a vacuum trigger pretty easily. You could even
> keep the info in a table.

Actually that's what I did: update a global transaction counter, then trigger the vacuum from a spare thread. But having it in the DB has the advantage of centralisation. It's just a good-to-have kind of thing.

Bye,
Shridhar

--
"I don't know why, but first C programs tend to look a lot worse than first programs in any other language (maybe except for Fortran, but then I suspect all Fortran programs look like `firsts')" (By Olaf Kirch)
> gets updated for each transaction but the log table only gets inserts. So
> rather than vacuuming the entire DB, just doing 'vacuum analyze accounts'
> gives me almost the same results.

That is not really practicable; one database has 107 tables, and making a cron job with 107 vacuum calls is completely out of the question and very error prone anyway.

Regards,
Mario Weilguni
> Actually that's what I did: update a global transaction counter, then
> trigger the vacuum from a spare thread. But having it in the DB has the
> advantage of centralisation. It's just a good-to-have kind of thing.

Care to submit it as a BSD licensed contrib module then? Or at least create a project for it on http://gborg.postgresql.org/ ?

Chris
On 3 Sep 2002 at 9:36, Mario Weilguni wrote:

> That is not really practicable; one database has 107 tables, and making a
> cron job with 107 vacuum calls is completely out of the question and very
> error prone anyway.

That's correct. What are the possible alternatives? Either the backend has to support something or the DBA has to script something.

1) If the number of tables that need vacuum is far larger than the number that don't, then a simple all-table vacuum would do. But again, the sizes of the individual tables will affect that judgement as well.

2) As the OP suggested, vacuum could pick up only those tables marked by bitfields, say via an additional option like 'vacuum analyze frequent_ones'. This is going to need a backend change.

3) I guess scripting a cron job for vacuum is a one-time job. If it's desperately needed, say 60 tables out of 107 require vacuum, personally I would spend some time making that script. Depends upon the requirement, actually.

On a side note, does anybody have some statistics, from a benchmark maybe, as to what's a rule of thumb for vacuuming? I found that a vacuum every 5K-10K transactions increases the tps dramatically, but below 1K transactions it's not as effective. Maybe one should consider this factor as well.

Bye,
Shridhar

--
Pascal: A programming language named after a man who would turn over in his grave if he knew about it. -- Datamation, January 15, 1984
On 3 Sep 2002 at 15:39, Christopher Kings-Lynne wrote:

> Care to submit it as a BSD licensed contrib module then? Or at least
> create a project for it on http://gborg.postgresql.org/ ?

Sounds like a nice idea. I would do that by this weekend, once I finalise the details. Give me a couple of days to finish it. Will come back soon with that.

Bye,
Shridhar

--
Reporter, n.: A writer who guesses his way to the truth and dispels it with a tempest of words. -- Ambrose Bierce, "The Devil's Dictionary"
On Tue, 2002-09-03 at 03:36, Mario Weilguni wrote:

> That is not really practicable; one database has 107 tables, and making a
> cron job with 107 vacuum calls is completely out of the question and very
> error prone anyway.

So... write a script which does something like:

    skiptables="'skipme', 'andme'"
    tables=$(psql -At -c "SELECT relname FROM pg_class
                          WHERE relkind = 'r'
                            AND relname NOT IN (${skiptables})" template1)
    for tab in ${tables}; do
        vacuumdb -t "${tab}"
    done

Fill in the holes and you're done -- restrict to the right pg_class relkind, handle schemas appropriately, etc.
Mario Weilguni <mweilguni@sime.com> writes:

> I know everyone is busy with the 7.3 beta, but maybe this is something to
> think of before releasing it.

We are already in feature freeze.

In terms of what might happen for 7.4 or beyond, what I'd personally like to see is some "auto vacuum" facility that would launch background vacuums automatically every so often. This could (eventually) be made self-tuning so that it would vacuum heavily-updated tables more often than seldom-updated ones --- while not forgetting the every-billion-transactions rule...

regards, tom lane
On 3 Sep 2002 at 9:49, Tom Lane wrote:

> In terms of what might happen for 7.4 or beyond, what I'd personally
> like to see is some "auto vacuum" facility that would launch background
> vacuums automatically every so often. This could (eventually) be made
> self-tuning so that it would vacuum heavily-updated tables more often
> than seldom-updated ones --- while not forgetting the
> every-billion-transactions rule...

OK, I plan to work on this. Here is my brief idea:

1) Create a table (vacuum_info) that stores table names and auto vacuum defaults. Since I am planning this in contrib, I would not touch pg_class. The table will store:
   - table names
   - number of transactions to trigger vacuum analyze (default 1K)
   - number of transactions to trigger full vacuum (default 10K)
   A trigger on pg_class, i.e. on table creation, should add a row to this table as well.

2) Write a trigger on tables that updates statistics on table activity. I see pg_stat_all_tables, pg_stat_sys_tables and pg_stat_user_tables, with the columns n_tup_ins, n_tup_upd and n_tup_del. Of course it will ignore its own updates and inserts to avoid infinite loops. This will update the pseudo-statistics in the vacuum_info table. Another trigger on vacuum_info will trigger vacuum if required. Ideally I would write it in an external multithreaded library so the vacuum runs in the background without blocking operations on the vacuum_info table.

I need to know the following:

1) Does this sound like a workable solution?
2) Is this as simple as I have put it here, or am I missing some vital components?
3) Is there some kind of rework involved?
4) Does the use of threads sound portable enough? I just need to start a thread in the background and return. No locking, nothing is required. Will there be any problem for postgres invoking such an external trigger?
5) When I create a function in a .so, is it possible to invoke init/startup routines? I can create and destroy the thread in these routines to avoid thread creation overhead.
If postgres is using dlopen, I can use _init/_fini.

6) Such a 'daemon' would be per-backend, if I am guessing correctly. Would locking things in transactions for vacuum_info be sufficient?

I hope I am making a sensible proposal/design (my first attempt to contribute to postgres). Please let me know your comments.

Bye,
Shridhar

--
Blast medicine anyway! We've learned to tie into every organ in the human body but one. The brain! The brain is what life is all about. -- McCoy, "The Menagerie", stardate 3012.4
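[For readers following along: the bookkeeping in the proposal above can be sketched outside SQL. This is only an illustration of the counting logic, not PostgreSQL code; the class name, method names, and the 1K/10K thresholds are taken from the proposal's stated defaults, everything else is made up.]

```python
# Sketch of the proposed vacuum_info bookkeeping: per-table modification
# counters, with VACUUM ANALYZE due every 1K changes and VACUUM FULL at 10K.

class VacuumInfo:
    def __init__(self, analyze_threshold=1000, full_threshold=10000):
        self.analyze_threshold = analyze_threshold
        self.full_threshold = full_threshold
        self.activity = {}   # table name -> modifications since last full vacuum

    def record(self, table, n=1):
        """Called by the per-table trigger: count inserts/updates/deletes."""
        self.activity[table] = self.activity.get(table, 0) + n
        return self.maybe_vacuum(table)

    def maybe_vacuum(self, table):
        """Decide which maintenance command (if any) is due for the table."""
        count = self.activity[table]
        if count >= self.full_threshold:
            self.activity[table] = 0          # reset after a full vacuum
            return "VACUUM FULL " + table
        if count >= self.analyze_threshold and count % self.analyze_threshold == 0:
            return "VACUUM ANALYZE " + table
        return None

info = VacuumInfo()
cmds = []
for _ in range(10000):
    cmd = info.record("accounts")
    if cmd:
        cmds.append(cmd)
print(cmds[0], "...", cmds[-1])   # VACUUM ANALYZE accounts ... VACUUM FULL accounts
```

The real version would of course live in triggers and a contrib .so; the sketch only shows when the two thresholds fire.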
"Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:

> 1) Does this sound like a workable solution?

Adding a trigger to every tuple update won't do at all. Storing the counts in a table won't do either, as the updates on that table will generate a huge amount of wasted space themselves (not to mention enough contention to destroy concurrent performance).

> 4) Does the use of threads sound portable enough?

Threads are completely out of the question, at least if you have any hope of seeing this code get accepted into the core distro.

For vacuum's purposes all that we really care to know about is the number of obsoleted tuples in each table: committed deletes and updates, and aborted inserts and updates all count. Furthermore, we do not need or want a 100% reliable solution; approximate counts would be plenty good enough.

What I had in the back of my mind was: each backend counts attempted insertions and deletions in its relcache entries (an update adds to both counts). At transaction commit or abort, we know which of these two counts represents the number of dead tuples added to each relation, so while we scan the relcache for post-xact cleanup (which we will be doing anyway) we can transfer the correct count into the shared FSM entry for the relation. This gives us a reasonably accurate count in shared memory of all the tuple obsoletions since bootup, at least for heavily-used tables. (The FSM might choose to forget about lightly-used tables.) The auto vacuumer could look at the FSM numbers to decide which tables are highest priority to vacuum.

This scheme would lose the count info on a database restart, but that doesn't bother me. In typical scenarios the same tables will soon get enough new counts to be highly ranked for vacuuming. In any case the auto vacuumer must be designed so that it vacuums every table every so often anyhow, so the possibility of forgetting that there were some dead tuples in a given table isn't catastrophic.
I do not think we need or want a control table for this; certainly I see no need for per-table manual control over this process. There should probably be a few knobs in the form of GUC parameters so that the admin can control how much overall work the auto-vacuumer does. For instance you'd probably like to turn it off when under peak interactive load. regards, tom lane
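[The counting scheme sketched above can be illustrated in a few lines. This is a toy model, not PostgreSQL internals: the Backend class and the shared_dead_tuples dict merely stand in for relcache entries and shared FSM entries.]

```python
# Each backend counts attempted inserts and deletes per relation; at
# commit the deletes became dead tuples, at abort the inserts did, and
# the winning count is transferred to a shared per-relation tally.

shared_dead_tuples = {}       # stands in for the shared FSM entries

class Backend:
    def __init__(self):
        self.pending = {}     # rel -> [attempted_inserts, attempted_deletes]

    def insert(self, rel):
        self.pending.setdefault(rel, [0, 0])[0] += 1

    def delete(self, rel):
        self.pending.setdefault(rel, [0, 0])[1] += 1

    def update(self, rel):
        # an update adds to both counts: a new version plus an obsoleted one
        self.insert(rel)
        self.delete(rel)

    def end_xact(self, committed):
        # On commit the deletes are the dead tuples; on abort the inserts are.
        for rel, (ins, dels) in self.pending.items():
            dead = dels if committed else ins
            shared_dead_tuples[rel] = shared_dead_tuples.get(rel, 0) + dead
        self.pending.clear()

b = Backend()
for _ in range(5):
    b.update("accounts")      # 5 committed updates -> 5 dead tuples
b.end_xact(committed=True)

b.insert("log")               # an aborted insert also leaves a dead tuple
b.end_xact(committed=False)

# the table the auto vacuumer would rank highest
print(max(shared_dead_tuples, key=shared_dead_tuples.get))   # accounts
```

Note how both branches of the commit/abort distinction produce dead tuples, matching the "committed deletes and updates, and aborted inserts and updates all count" rule.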
On Tue, 2002-09-03 at 11:01, Tom Lane wrote:

> For vacuum's purposes all that we really care to know about is the
> number of obsoleted tuples in each table: committed deletes and updates,
> and aborted inserts and updates all count. Furthermore, we do not need
> or want a 100% reliable solution; approximate counts would be plenty
> good enough.

It would be nice if it could track successful inserts, and fire off an analyze run when it changes more than 20% from what stats says.
Rod Taylor <rbt@zort.ca> writes:

> It would be nice if it could track successful inserts, and fire off an
> analyze run when it changes more than 20% from what stats says.

That's a thought too. I was only thinking of space reclamation, but it'd be easy to extend the scheme to keep track of the number of tuples successfully inserted, changed, or deleted (all three events would affect stats) as well as the number of dead tuples. Then you could fire auto-analyze every so often, along with auto-vacuum.

Auto-analyze might need more tuning controls than auto-vacuum, though. Vacuum doesn't have any question about when it needs to run: a dead tuple is a dead tuple. But for analyze you might have plenty of update traffic and yet no meaningful change in the interesting stats for a table. An admin who knows the behavior of his tables would like to be able to configure the frequency of analyze runs, rather than trust to a necessarily-not-too-bright auto-analyze routine. (Not sure whether this is important enough to warrant the complications of making it configurable though. You can always do it the old-fashioned way with cron scripts if you want that kind of control, I suppose.)

regards, tom lane
> I do not think we need or want a control table for this; certainly I see
> no need for per-table manual control over this process. There should
> probably be a few knobs in the form of GUC parameters so that the admin
> can control how much overall work the auto-vacuumer does. For instance
> you'd probably like to turn it off when under peak interactive load.

If (auto)vacuum is clever enough to check that some tables do not need vacuum, there's really no need for that.

That brings me to another point: can't the statistics collector be used for this? For my database I wrote a statistics display program for web access, and all the info autovacuum would need is there: http://mw.sime.com/pgsql.htm

And another point: is there interest in this web statistics frontend, maybe for /contrib? I found it extremely useful because it showed up the weak points in my applications.

Best regards,
Mario Weilguni
Wouldn't it make sense to implement autovacuum information in a structure like the FSM, a Dirty Space Map (DSM)? As blocks are dirtied by transactions they can be added to the DSM. Then vacuum can give priority processing to those blocks only.

The reason I suggest this is that in many usage scenarios it will be more efficient to vacuum only part of a table than the entire table. Given a large table that grows over time, it tends to be the case that older data in the table becomes more static as it ages (a lot of financial data is like this: when it is initially created it may get a lot of updates early in its life and may even be deleted, but once the data gets older (for example a year old), it is unlikely to change). This would imply that over time the first blocks in a table will change less and most activity will occur towards the end of the table. If you have a multi-gig table where most of the activity occurs near the end, a lot of CPU cycles can be wasted going over the mostly static beginning of the table.

thanks,
--Barry

Tom Lane wrote:

> Adding a trigger to every tuple update won't do at all. Storing the
> counts in a table won't do either, as the updates on that table will
> generate a huge amount of wasted space themselves (not to mention
> enough contention to destroy concurrent performance).
>
> Threads are completely out of the question, at least if you have any
> hope of seeing this code get accepted into the core distro.
>
> For vacuum's purposes all that we really care to know about is the
> number of obsoleted tuples in each table: committed deletes and updates,
> and aborted inserts and updates all count. Furthermore, we do not need
> or want a 100% reliable solution; approximate counts would be plenty
> good enough.
> What I had in the back of my mind was: each backend counts attempted
> insertions and deletions in its relcache entries (an update adds to both
> counts). At transaction commit or abort, we know which of these two
> counts represents the number of dead tuples added to each relation, so
> while we scan the relcache for post-xact cleanup (which we will be doing
> anyway) we can transfer the correct count into the shared FSM entry for
> the relation. This gives us a reasonably accurate count in shared
> memory of all the tuple obsoletions since bootup, at least for
> heavily-used tables. (The FSM might choose to forget about lightly-used
> tables.) The auto vacuumer could look at the FSM numbers to decide
> which tables are highest priority to vacuum.
>
> This scheme would lose the count info on a database restart, but that
> doesn't bother me. In typical scenarios the same tables will soon get
> enough new counts to be highly ranked for vacuuming. In any case the
> auto vacuumer must be designed so that it vacuums every table every so
> often anyhow, so the possibility of forgetting that there were some dead
> tuples in a given table isn't catastrophic.
>
> I do not think we need or want a control table for this; certainly I see
> no need for per-table manual control over this process. There should
> probably be a few knobs in the form of GUC parameters so that the admin
> can control how much overall work the auto-vacuumer does. For instance
> you'd probably like to turn it off when under peak interactive load.
>
> regards, tom lane
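[The Dirty Space Map idea can be made concrete with a small model. Everything here is hypothetical — the DSM was only a proposal in this thread, and the class, block size constant, and table names are illustrative, not PostgreSQL code.]

```python
# Toy Dirty Space Map: remember which blocks of each table were dirtied
# since the last vacuum, so a partial vacuum can visit only those blocks
# instead of scanning the whole (mostly static) table.

BLOCK_SIZE = 8192  # bytes per heap page, matching PostgreSQL's default

class DirtySpaceMap:
    def __init__(self):
        self.dirty = {}   # table -> set of dirtied block numbers

    def mark(self, table, byte_offset):
        """Called when a tuple at byte_offset is modified."""
        self.dirty.setdefault(table, set()).add(byte_offset // BLOCK_SIZE)

    def blocks_to_vacuum(self, table):
        """Blocks a partial vacuum should visit; cleared once handed out."""
        return sorted(self.dirty.pop(table, set()))

dsm = DirtySpaceMap()
# a roughly million-block table where only the tail gets updated:
for off in (8_191_990_000, 8_191_995_000, 8_191_999_000):
    dsm.mark("big_table", off)
print(dsm.blocks_to_vacuum("big_table"))   # [999998, 999999]
```

The win is exactly the one described above: for the financial-data pattern, vacuum touches the two hot blocks at the end rather than all million.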
"Mario Weilguni" <mario.weilguni@icomedias.com> writes:

> That brings me to another point: can't the statistics collector be used
> for this?

Hmm, that would be a different way of attacking the problem. Not sure offhand which is better, but it'd surely be worth considering both.

Note that collecting of dead-tuple counts requires input from aborted transactions as well as successful ones. I don't recall whether the stats collector currently collects anything from aborted xacts; that might or might not be a sticky point.

regards, tom lane
> That brings me to another point: is there interest in this web statistics
> frontend, maybe for /contrib? I found it extremely useful because it
> showed up the weak points in my applications.

Why not create a project for it here: http://gborg.postgresql.org/

Chris
On Tuesday 03 September 2002 16:24, Tom Lane wrote:

> Hmm, that would be a different way of attacking the problem. Not sure
> offhand which is better, but it'd surely be worth considering both.
>
> Note that collecting of dead-tuple counts requires input from aborted
> transactions as well as successful ones. I don't recall whether the
> stats collector currently collects anything from aborted xacts; that
> might or might not be a sticky point.

I have been doing some poking around with this item, and I was planning on using the stats collector to do "intelligent" auto-vacuuming. I was planning on adding some new columns that account for activity that has taken place since the last vacuum. The current stats collector shows n_tup_ins, n_tup_upd and n_tup_del for any given rel, but those numbers have nothing to do with what has happened since the last vacuum, hence nothing to do with current status or need for vacuum.

I hope to have something worth showing soon (a week or two). I know that is a bit slow, but I am new at pg internals, and since we are in beta I know this is a 7.4 item.
FYI, the current stats collector does keep track of inserts, updates and deletes that are part of a rolled back transaction, as shown in the example below:

matthew=# create TABLE foo (id serial, name text);
NOTICE:  CREATE TABLE will create implicit sequence 'foo_id_seq' for SERIAL column 'foo.id'
CREATE TABLE
matthew=# select relname, n_tup_ins, n_tup_upd, n_tup_del from pg_stat_all_tables where relname = 'foo';
 relname | n_tup_ins | n_tup_upd | n_tup_del
---------+-----------+-----------+-----------
 foo     |         0 |         0 |         0
(1 row)

matthew=# INSERT INTO foo (name) VALUES ('asdf');
INSERT 17075 1
matthew=# UPDATE foo SET name='qwert';
UPDATE 1
matthew=# DELETE FROM foo;
DELETE 1
matthew=# select relname, n_tup_ins, n_tup_upd, n_tup_del from pg_stat_all_tables where relname = 'foo';
 relname | n_tup_ins | n_tup_upd | n_tup_del
---------+-----------+-----------+-----------
 foo     |         1 |         1 |         1
(1 row)

matthew=# begin;
BEGIN
matthew=# INSERT INTO foo (name) VALUES ('asdf');
INSERT 17076 1
matthew=# UPDATE foo SET name='qwert';
UPDATE 1
matthew=# DELETE FROM foo;
DELETE 1
matthew=# rollback;
ROLLBACK
matthew=# select relname, n_tup_ins, n_tup_upd, n_tup_del from pg_stat_all_tables where relname = 'foo';
 relname | n_tup_ins | n_tup_upd | n_tup_del
---------+-----------+-----------+-----------
 foo     |         2 |         2 |         2
(1 row)
> I have been doing some poking around with this item, and I was planning on
> using the stats collector to do "intelligent" auto-vacuuming. I was
> planning on adding some new columns that account for activity that has
> taken place since the last vacuum. The current stats collector shows
> n_tup_ins, n_tup_upd and n_tup_del for any given rel, but those numbers
> have nothing to do with what has happened since the last vacuum, hence
> nothing to do with current status or need for vacuum.

Postgres 7.3-beta has a new function 'pg_stat_reset()' that you can call to reset the stats collector after a vacuum...

Chris
On Tuesday 03 September 2002 23:47, Christopher Kings-Lynne wrote:

> Postgres 7.3-beta has a new function 'pg_stat_reset()' that you can call
> to reset the stats collector after a vacuum...

Just my opinion here, but I don't think having autovac constantly resetting the stats is a good idea. It means that you lose the current stats functionality when using autovacuum, and it also implies that the stats mean different things depending on whether autovac is turned on or off.
On Wednesday, 4 September 2002 at 05:44, Matthew T. OConnor wrote:

> I have been doing some poking around with this item, and I was planning on
> using the stats collector to do "intelligent" auto-vacuuming. I was
> planning on adding some new columns that account for activity that has
> taken place since the last vacuum. The current stats collector shows
> n_tup_ins, n_tup_upd and n_tup_del for any given rel, but those numbers
> have nothing to do with what has happened since the last vacuum, hence
> nothing to do with current status or need for vacuum.

This should be no real problem; extending pg_stat_all_tables with three fields "av_n_tup_ins", "av_n_tup_upd", "av_n_tup_del" should do it, IMO.
How about counting the number of dead tuples examined and the number of live tuples returned? As the ratio of dead tuples to live tuples visited increases, the table becomes a candidate for vacuuming.

-regards
richt

> -----Original Message-----
> From: Tom Lane
> Sent: Tuesday, September 03, 2002 4:25 PM
> To: Mario Weilguni
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] possible vacuum improvement?
>
> "Mario Weilguni" <mario.weilguni@icomedias.com> writes:
> > That brings me to another point: can't the statistics collector be used
> > for this?
>
> Hmm, that would be a different way of attacking the problem. Not sure
> offhand which is better, but it'd surely be worth considering both.
>
> Note that collecting of dead-tuple counts requires input from aborted
> transactions as well as successful ones. I don't recall whether the
> stats collector currently collects anything from aborted xacts; that
> might or might not be a sticky point.
>
> regards, tom lane
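[The dead/live ratio heuristic proposed above is easy to sketch. A minimal illustration, assuming a hypothetical `vacuum_candidates` helper and an arbitrary 0.2 cutoff — the threshold and all names are made up:]

```python
# Rank tables as vacuum candidates once the ratio of dead tuples examined
# to live tuples returned by scans crosses a threshold.

def vacuum_candidates(scan_stats, ratio_threshold=0.2):
    """scan_stats maps table -> (dead_tuples_examined, live_tuples_returned).
    Returns candidate tables, worst ratio first."""
    scored = []
    for table, (dead, live) in scan_stats.items():
        ratio = dead / max(live, 1)      # avoid division by zero
        if ratio > ratio_threshold:
            scored.append((ratio, table))
    return [table for _, table in sorted(scored, reverse=True)]

stats = {
    "accounts": (5000, 10000),   # heavily updated: ratio 0.5
    "log":      (10, 100000),    # insert-only: ratio ~0.0001
    "branches": (300, 1000),     # ratio 0.3
}
print(vacuum_candidates(stats))   # ['accounts', 'branches']
```

Note how this naturally skips the insert-only log table from earlier in the thread: its scans see almost no dead tuples, so it never becomes a candidate.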