Thread: Re: [DEFAULT] Daily digest v1.4346 (20 messages)
Matthew, I am replying to the below as a pg_autovacuum user for multiple client databases. My thoughts: > Inability to customize thresholds on a per table basis This hasn't been a big problem for me. I would judge that 80% of my clients would make no use of this feature. > Inability to set default thresholds on a per database basis This would be much more useful to us. > Inability to exclude specific databases / tables from pg_autovacuum monitoring Same as above -- exclusion is much more needed than incremental raising/ lowering. Of course, if one can set levels, one can set them to zero, so perhaps it is the same thing. > Inability to schedule vacuums during off-peak times I don't think that this is the job of pg_autovacuum. If a database requires bulk loads and other burst activity, the DBA should schedule manual vacuums around those and not use pg_autovacuum. Also, bgwriter and slow vacuum should make this less of an issue for 7.5. > Lack of integration related to startup and shutdown Yes, this is a pain, especially from a security standpoint. > Ignorance of VACUUM and ANALYZE operations performed outside pg_autovacuum (requires backend integration? or can listen / notify can be used?) Again, I think this is not crucial, personally. Nice if there's some easy way to do it, of course. > Lack of logging options / syslog integration / log rotation options Yep, this is a biggie. Now, let me add my comments as to what my clients have complained about: -- Lack of integrated security with the Postmaster -- Inability to detect VACUUMs "backing up" due to too low vacuum mem or too much activity and warn the DBA -- Inability to Vacuum in parallel on high-capacity machines -- No "timeout" for locked vacuums. > I'm not sure how to address all of these concerns, or that they all should be addressed right now. One of my big questions is backend integration. I concur with the other commentors; backend integration would be nice if pg_autovacuum is not to be permanently a seperate script/process. It would eliminate several of the above issues. > Since many people do not like tools that clutter their databases by adding tables, I think option 1 (adding a pg_autovacuum table to existing databases) is right out. Personally, I like the idea of a pg_autovacuum table, and would supporrt it. However, I have no strong objections to the other approaches. > Right now pg_autovacuum has no memory of what was going on the last time it was run. So if significant changes have happened while pg_autovacuum is not running, they will not be counted in the analysis of when to perform a vacuum or analyze operation which can result in under vacuuming. So, pg_autovacuum should occasionally write down it's numbers to the database. The data will be stored in an additional table called table_data I think we've already had feedback about this. If it's system information, it should go in one of the existing tables, or it should be called something more descriptive than "table_data", and should begin with pg_ Some consideraiton should also be given to the frequency of updating the persistent data. I would favor an asynchnous, infrequent updating that would permit some loss of information over a synchrnous lossless approach. The latter, while more accurate, would detract from server performance on high-volume transction databases. > 3.Single-Pass Mode (External Scheduling): > > I have received requests to be able to run pg_autovacuum only on request (not as a daemon) making only one pass over all the tables (not looping indefinately). The advantage being that it will operate more like the current vacuum command except that it will only vacuum tables that need to be vacuumed. I think this is a completely different utility from pg_autovacuum, and this line of development need not be pursued unless it's easy to do. I certainly don't need it .... > Syslog support. I'm not sure this is really needed, but a simple patch was submitted by one user and perhaps that can be reviewed / improved and applied. > I need it, and am glad to hear there is a patch. Several of my clients use centralized syslog servers, and do *everything* through syslog. -- -Josh BerkusAglio Database SolutionsSan Francisco
Josh Berkus wrote: >>Inability to customize thresholds on a per table basis >> >> > >This hasn't been a big problem for me. I would judge that 80% of my clients >would make no use of this feature. > > > Ok. >>Inability to set default thresholds on a per database basis >> >> > >This would be much more useful to us. > > > So interesting, most uses request the per table settings, guess there is sufficient demand for both. >>Inability to schedule vacuums during off-peak times >> >> > >I don't think that this is the job of pg_autovacuum. If a database requires >bulk loads and other burst activity, the DBA should schedule manual vacuums >around those and not use pg_autovacuum. > > > You might be missing the point, the advantage of using pg_autovacuum is that it wouldn't waste cycles doing vacuums on tables that don't need it. If we have persistent data (saving state information periodically) then this is a very easy feature to add. >>Lack of integration related to startup and shutdown >> >> > >Yes, this is a pain, especially from a security standpoint. > > > Yes, backend integration will make this go away. >>Ignorance of VACUUM and ANALYZE operations performed outside pg_autovacuum >> >> >(requires backend integration? or can listen / notify can be used?) > >Again, I think this is not crucial, personally. Nice if there's some easy >way to do it, of course. > > > What I'm thinking is that the VACUUM command could be modified to write down some data from the stats system at vacuum time. Once the VACUUM command writes this down for itself then pg_autovacuum just uses that number to make its decision. Again, we are trying to reduce as much as possible superfluous vacuums. If an admin vacuums his whole cluster every Sunday night that may prevent lots of vacuums occurring during business hours that effect processing. >>Lack of logging options / syslog integration / log rotation options >> >> > >Yep, this is a biggie. > > > Agreed. This is another issue that could be solved with backend integration. >Now, let me add my comments as to what my clients have complained about: > >-- Lack of integrated security with the Postmaster >-- Inability to detect VACUUMs "backing up" due to too low vacuum mem or too >much activity and warn the DBA >-- Inability to Vacuum in parallel on high-capacity machines >-- No "timeout" for locked vacuums. > > > Backend integration should solve the 1st issue. Parallel vacuums is something that could be worked on at some point. Would it make sense to incorporate this with tablespaces? The vacuum daemon would only issue one vacuum command per tablespace, but could issue as many parallel vacuums as you have independent tablespaces. I think timeout issue would need to be a part of vacuum proper, and I'm not sure about the "backing up" issue. >>Since many people do not like tools that clutter their databases by adding >> >> >tables, I think option 1 (adding a pg_autovacuum table to existing databases) >is right out. > >Personally, I like the idea of a pg_autovacuum table, and would supporrt it. >However, I have no strong objections to the other approaches. > > > I think I was unclear, I agree the creation of a pg_autovacuum system table is fine (if we really need it), but my initial post was talking about keeping pg_autovacuum as a client app, hence the autovacuum table would be added into (clutter up) the users table space, not the systems. >I think we've already had feedback about this. If it's system information, it >should go in one of the existing tables, or it should be called something >more descriptive than "table_data", and should begin with pg_ > > > I wasn't really suggesting table_data as the real name, but again this will be more straightforward once integrated in. >Some consideraiton should also be given to the frequency of updating the >persistent data. I would favor an asynchnous, infrequent updating that >would permit some loss of information over a synchrnous lossless approach. >The latter, while more accurate, would detract from server performance on >high-volume transction databases. > > > Agreed, the performance impact of this should be negligible. >>3.Single-Pass Mode (External Scheduling): >> >>I have received requests to be able to run pg_autovacuum only on request >> >> >I think this is a completely different utility from pg_autovacuum, and this >line of development need not be pursued unless it's easy to do. I >certainly don't need it .... > > > The reason it's similar is that once pg_autovacuum data is persistent, it would be trivial to implement this feature, and the data that any tool would need to make these decisions is the same as what pg_autovacuum is already tracking. >>Syslog support. I'm not sure this is really needed, but a simple patch was >> >> >I need it, and am glad to hear there is a patch. Several of my clients use >centralized syslog servers, and do *everything* through syslog. > > > I think the patch was submitted to either the hackers or patches list. If you can't find it, I'll look around and see if I still have a copy. The person who submitted said it was simple, but was working for him in production.
Josh Berkus wrote: >>Syslog support. I'm not sure this is really needed, but a simple patch was >> >> >submitted by one user and perhaps that can be reviewed / improved and >applied. > > > >I need it, and am glad to hear there is a patch. Several of my clients use >centralized syslog servers, and do *everything* through syslog. > > > FYI, I just checked, I think it was sent to the hackers lists on Jan 21, 2004 by Mark Hollow.
Matt, > So interesting, most uses request the per table settings, guess there > is sufficient demand for both. The reason for us is that multi-database installations generally have significantly different purposes for each database; for example, with a "reporting" database on the same server that needs no vacuuming at all. > You might be missing the point, the advantage of using pg_autovacuum is > that it wouldn't waste cycles doing vacuums on tables that don't need > it. If we have persistent data (saving state information periodically) > then this is a very easy feature to add. OK, I can see that. > What I'm thinking is that the VACUUM command could be modified to write > down some data from the stats system at vacuum time. Once the VACUUM > command writes this down for itself then pg_autovacuum just uses that > number to make its decision. Again, we are trying to reduce as much as > possible superfluous vacuums. If an admin vacuums his whole cluster > every Sunday night that may prevent lots of vacuums occurring during > business hours that effect processing. Be nice, yes. However, my experience is that mixing manual and autovacuums is bound to lead to endless support requests, because conflicts *will* arise. So in some ways you'd be working to please those who can't be pleased. > Backend integration should solve the 1st issue. Parallel vacuums is > something that could be worked on at some point. Would it make sense > to incorporate this with tablespaces? The vacuum daemon would only > issue one vacuum command per tablespace, but could issue as many > parallel vacuums as you have independent tablespaces. Hmmm ... that's an interesting idea. I'd more been thinking about vacuums of small tables, where a high-end server under low load could vacuum several tables in parallel, one per CPU. However, working through tablespaces would make a lot of sense. > I think timeout issue would need to be a part of vacuum proper, and I'm > not sure about the "backing up" issue. Well, we've discussed timeout for vacuum. Thing is, autovacuum changes the equation somewhat. Imagine that the transaction rate of your tables accelerates so that autovacuum with a 0.3 scale setting is triggered every 23 minutes. But say that it takes 29 minutes to vacuum through all of your tables ... or even 49 minutes if you have "slow vacuum" turned on! You would get into a cycle where you are running vacuum continuously, all the time. This is a very bad situation and the admin should be warned about it via the logs. Hmmm ... thinkiing about that, are we changing the defaults for threshold and scale? You and I have discussed this, yes? > The reason it's similar is that once pg_autovacuum data is persistent, > it would be trivial to implement this feature, and the data that any > tool would need to make these decisions is the same as what > pg_autovacuum is already tracking. Well, if it's easy to do, then go for it. I can see how some would find it useful. Once it's sufficently bulletproof, it could replace the standard VACUUM (whole db). > I think the patch was submitted to either the hackers or patches list. > If you can't find it, I'll look around and see if I still have a copy. > The person who submitted said it was simple, but was working for him in > production. Thanks for the forward. -- -Josh BerkusAglio Database SolutionsSan Francisco
On Tuesday 23 March 2004 12:32 am, Josh Berkus wrote: > Matt, > > > What I'm thinking is that the VACUUM command could be modified to write > > down some data from the stats system at vacuum time. Once the VACUUM > > command writes this down for itself then pg_autovacuum just uses that > > number to make its decision. Again, we are trying to reduce as much as > > possible superfluous vacuums. If an admin vacuums his whole cluster > > every Sunday night that may prevent lots of vacuums occurring during > > business hours that effect processing. > > Be nice, yes. However, my experience is that mixing manual and > autovacuums is bound to lead to endless support requests, because conflicts > *will* arise. So in some ways you'd be working to please those who can't be > pleased. Could you please expound on this. What types of conflicts in what situations? Maybe I'm missing something, but I really don't see how this could cause any problems, and would only help prevent unneeded vacuums since pg_autovacuum would always know the stats as of the last vacuum no matter who issued it, because the vacuum itself would write it down. > Hmmm ... that's an interesting idea. I'd more been thinking about > vacuums of small tables, where a high-end server under low load could > vacuum several tables in parallel, one per CPU. However, working through > tablespaces would make a lot of sense. The reason tablespaces might make sense is that vacuum is I/O bound, not CPU bound. So having 4 processors isn't going to make multiple vacuums go any faster. Now you might have enough I/O bandwidth such that multiple simultaneous vacuums are actually faster that serialized. You could probably test this yourself, I would be curious to see some data on this. > You would get into a cycle where you are running vacuum continuously, all > the time. This is a very bad situation and the admin should be warned > about it via the logs. Finding a way to log this would be good. > Hmmm ... thinkiing about that, are we changing the defaults for threshold > and scale? You and I have discussed this, yes? Right now pg_autovacuum doesn't change the thresholds, but it does change the sleep time. Just like the vacuum and analyze command options there is a sleep base value and a sleep scaling factor. The purpose was to make sure that pg_autovacuum was never continuously issuing vacuum commands. I think that if pg_autovacuum is vacuuming too often then the admin needs to tune and turn it down. Matthew