Re: Some vacuum & tuning help - Mailing list pgsql-performance

From Matthew T. O'Connor
Subject Re: Some vacuum & tuning help
Date
Msg-id 001101c35b66$76ba0480$c202a8c0@hplaptop
Whole thread Raw
In response to Re: Some vacuum & tuning help  ("Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>)
List pgsql-performance
From: "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>
> On 5 Aug 2003 at 10:29, Christopher Browne wrote:
>
> > Shridhar Daithankar wrote:
> > There would be some merit to having pg_autovacuum throw in some tables
> > in which to store persistent information,
>
> Well, the C++ version I wrote quite a while back, which resides on gborg
and
> unmaintained, did that. It was considered as table pollution. However
whenever
> autovacuum stuff goes in backend as such, it is going to need a catalogue.

right, I think there is a distinction between adding a system catalogue
needed for core functionality, and requiring a user to put a table in the
users name space of their database just to run a utility.  As I mentioned in
my other email, I do think that a non-required helper table could be a good
idea, the question is should we do this considering autovacuum should be
integrated into the backend at which point pg_autovacuum will be scrapped.

> >  -> Some tables should _never_ be touched;
>
> That can be determined runtime from stats. Not required as a special
feature
> IMHO..
>
> >
> >  -> Some tables might get "reset" to indicate that they should be
> >     considered as having been recently vacuumed, or perhaps that they
> >     badly need vacuuming;
>
> Well, stats collector takes care of that. Autovacuum daemon reads that
> statistics, maintain  a periodic snapshot of the same to determine whether
or
> not it needs to vacuum.

Actually I don't think that pg_autovacuum will notice.  The stats that it
watches no nothing about when a table is vacuumed outside of pg_autovacuum.
I agree this is a deficiency, but I don't know how to get that information
without being part of the backend.

> Why it crawls for a dirty database is as follows. Autovauum daemon starts,
read
> statistics, sets it as base level and let a cycle pass, which is typically
few
> minutes. When it goes again, it finds that lots of things are modified and
need
> vacuum and so it triggers vacuum.
>
> Now vacuum goes on cleaning entire table which might be days job
continously
> postponed some one reason or another. Oops.. your database is on it's
knees..

If one table takes days (or even hours) to vacuum, then most probably it
requires *a lot* of activity before pg_autovacuum will try to vacuum the
table.  The thresholds are based on two factors, a base value (default =
1,000), and a multiplier (default = 2) of the total number of tuples.  So
using the default pg_autovacuum settings, a table with 1,000,000 rows will
not be vacuumed until the number of rows updated or deleted = 2,001,000.
So, a table shouldn't be vacuumed until it really needs it.  This setup
works well since a small table of say 100 rows, will be updated every 1,200
(updates or deletes).
>
> >  -> As you suggest, per-table thresholds;
>
> I would rather put it in terms of pages. If any table wastes 100 pages
each, it
> deserves a vacuum..

unfortunately I don't know of an efficient method of looking at how many
pages have free space without running vacuum or without using the
pgstattuple contrib module which in my testing took about 90% as long to run
as vacuum.

> >  -> pg_autovacuum would know when tables were last vacuumed by
> >     it...

pg_autovacuum already does this, but the data does not persist through
pg_autovacuum restarts.

> If you maintain a table in database, there are lot of things you can
maintain.
> And you need to connect to database anyway to fire vacuum..
>
> >  -> You could record vacuum times to tell pg_autovacuum that you
> >     vacuumed something "behind its back."
>
> It should notice..

I don't think it does.

> >  -> If the system queued up proposed vacuums by having a "queue"
> >     table, you could request that pg_autovacuum do a vacuum on a
> >     particular table at the next opportunity.

That would be a design changes as right now pg_autovacuum doesn't keep a
list of tables to vacuum at all, it just decides to vacuum or not vacuum a
table as it loops through the database.

> > Unfortunately, the "integrate into the backend" thing has long seemed
> > "just around the corner."  I think we should either:
> >  a) Decide to enhance pg_autovacuum, or
> >  b) Not.

I have been of the opinion to not enhance pg_autovacuum because it needs to
be intgrated, and enhancing it will only put that off.  Also, I think many
of the real enhancements can only come from being integrated (using the FSM
to make decisions, keepting track of external vacuums, modifying system
catalogs to keep autovacuum information etc...)

> In fact, I would say that after we have autovacuum, we should not
integrate it.
> It is a very handy tool of tighting a database. Other database go other
way
> round. They develop maintance functionality built in and then create tool
on
> top of it. Here we have it already done.

I'm not sure I understand your point.

> It's just that it should be triggered by default. That would rock..


I agree that if pg_autovacuum becomes a core tool (not contrib and not
integrated into backend) then pg_ctl should fire it up and kill it
automatically.


pgsql-performance by date:

Previous
From: Joe Conway
Date:
Subject: Re: [SQL] EXTERNAL storage and substring on long strings
Next
From: Tom Lane
Date:
Subject: Re: Some vacuum & tuning help