Re: Some vacuum & tuning help - Mailing list pgsql-performance
From | Matthew T. O'Connor |
---|---|
Subject | Re: Some vacuum & tuning help |
Date | |
Msg-id | 001101c35b66$76ba0480$c202a8c0@hplaptop Whole thread Raw |
In response to | Re: Some vacuum & tuning help ("Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>) |
List | pgsql-performance |
From: "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> > On 5 Aug 2003 at 10:29, Christopher Browne wrote: > > > Shridhar Daithankar wrote: > > There would be some merit to having pg_autovacuum throw in some tables > > in which to store persistent information, > > Well, the C++ version I wrote quite a while back, which resides on gborg and > unmaintained, did that. It was considered as table pollution. However whenever > autovacuum stuff goes in backend as such, it is going to need a catalogue. right, I think there is a distinction between adding a system catalogue needed for core functionality, and requiring a user to put a table in the users name space of their database just to run a utility. As I mentioned in my other email, I do think that a non-required helper table could be a good idea, the question is should we do this considering autovacuum should be integrated into the backend at which point pg_autovacuum will be scrapped. > > -> Some tables should _never_ be touched; > > That can be determined runtime from stats. Not required as a special feature > IMHO.. > > > > > -> Some tables might get "reset" to indicate that they should be > > considered as having been recently vacuumed, or perhaps that they > > badly need vacuuming; > > Well, stats collector takes care of that. Autovacuum daemon reads that > statistics, maintain a periodic snapshot of the same to determine whether or > not it needs to vacuum. Actually I don't think that pg_autovacuum will notice. The stats that it watches no nothing about when a table is vacuumed outside of pg_autovacuum. I agree this is a deficiency, but I don't know how to get that information without being part of the backend. > Why it crawls for a dirty database is as follows. Autovauum daemon starts, read > statistics, sets it as base level and let a cycle pass, which is typically few > minutes. When it goes again, it finds that lots of things are modified and need > vacuum and so it triggers vacuum. > > Now vacuum goes on cleaning entire table which might be days job continously > postponed some one reason or another. Oops.. your database is on it's knees.. If one table takes days (or even hours) to vacuum, then most probably it requires *a lot* of activity before pg_autovacuum will try to vacuum the table. The thresholds are based on two factors, a base value (default = 1,000), and a multiplier (default = 2) of the total number of tuples. So using the default pg_autovacuum settings, a table with 1,000,000 rows will not be vacuumed until the number of rows updated or deleted = 2,001,000. So, a table shouldn't be vacuumed until it really needs it. This setup works well since a small table of say 100 rows, will be updated every 1,200 (updates or deletes). > > > -> As you suggest, per-table thresholds; > > I would rather put it in terms of pages. If any table wastes 100 pages each, it > deserves a vacuum.. unfortunately I don't know of an efficient method of looking at how many pages have free space without running vacuum or without using the pgstattuple contrib module which in my testing took about 90% as long to run as vacuum. > > -> pg_autovacuum would know when tables were last vacuumed by > > it... pg_autovacuum already does this, but the data does not persist through pg_autovacuum restarts. > If you maintain a table in database, there are lot of things you can maintain. > And you need to connect to database anyway to fire vacuum.. > > > -> You could record vacuum times to tell pg_autovacuum that you > > vacuumed something "behind its back." > > It should notice.. I don't think it does. > > -> If the system queued up proposed vacuums by having a "queue" > > table, you could request that pg_autovacuum do a vacuum on a > > particular table at the next opportunity. That would be a design changes as right now pg_autovacuum doesn't keep a list of tables to vacuum at all, it just decides to vacuum or not vacuum a table as it loops through the database. > > Unfortunately, the "integrate into the backend" thing has long seemed > > "just around the corner." I think we should either: > > a) Decide to enhance pg_autovacuum, or > > b) Not. I have been of the opinion to not enhance pg_autovacuum because it needs to be intgrated, and enhancing it will only put that off. Also, I think many of the real enhancements can only come from being integrated (using the FSM to make decisions, keepting track of external vacuums, modifying system catalogs to keep autovacuum information etc...) > In fact, I would say that after we have autovacuum, we should not integrate it. > It is a very handy tool of tighting a database. Other database go other way > round. They develop maintance functionality built in and then create tool on > top of it. Here we have it already done. I'm not sure I understand your point. > It's just that it should be triggered by default. That would rock.. I agree that if pg_autovacuum becomes a core tool (not contrib and not integrated into backend) then pg_ctl should fire it up and kill it automatically.
pgsql-performance by date: