Re: [PERFORM] More detail on settings for pgavd? - Mailing list pgsql-hackers
From | Josh Berkus |
---|---|
Subject | Re: [PERFORM] More detail on settings for pgavd? |
Date | |
Msg-id | 200311202224.45065.josh@agliodbs.com Whole thread Raw |
In response to | Re: [PERFORM] More detail on settings for pgavd? ("Matthew T. O'Connor" <matthew@zeut.net>) |
Responses |
Re: [PERFORM] More detail on settings for pgavd?
|
List | pgsql-hackers |
Matthew, > > 110% of a 1.1 million row table is updated, though, that vaccuum will > > take an hour or more. > > True, but I think it would be one hour once, rather than 30 minutes 4 > times. Well, generally it would be about 6-8 times at 2-4 minutes each. > This is one of the things I had hoped to add to pg_autovacuum, but never > got to. In addition to just the information from the stats collector on > inserts updates and deletes, pg_autovacuum should also look at the FSM, > and make decisions based on it. Anyone looking for a project? Hmmm ... I think that's the wrong approach. Once your database is populated, it's very easy to determine how to set the FSM for a given pg_avd level. If you're vacuuming after 20% updates, for example, just set fsm_pages to 20% of the total database pages plus growth & safety margins. I'd be really reluctant to base pv-avd frequency on the fsm settings instead. What if the user loads 8GB of data but leaves fsm_pages at the default of 10,000? You can't do much with that; you'd have to vacuum if even 1% of the data changed. The other problem is that calculating data pages from a count of updates+deletes would require pg_avd to keep more statistics and do more math for every table. Do we want to do this? > But I can't imagine that 2% makes any difference on a large table. In > fact I would think that 10-15% would hardly be noticable, beyond that > I'm not sure. I've seen performance lag at 10% of records, especially in tables where both update and select activity focus on one subset of the table (calendar tables, for example). > Valid points, and again I think this points to the fact that > pg_autovacuum needs to be more configurable. Being able to set > different thresholds for different tables will help considerably. In > fact, you may find that some tables should have a vac threshold much > larger than the analyze thresold, while other tables might want the > opposite. Sure. Though I think we can make the present configuration work with a little adjustment of the numbers. I'll have a chance to test on production databases soon. > I would be surprized if you can notice the difference between a vacuum > analyze and a vacuum, especially on large tables. It's substantial for tables with high statistics settings. A 1,000,000 row table with 5 columns set to statistics=250 can take 3 minutes to analyze on a medium-grade server. > I think you need two separate schedules. There are lots of times where > a vacuum doesn't help, and an analyze is all that is needed Agreed. And I've just talked to a client who may want to use pg_avd's ANALYZE scheduling but not use vacuum at all. BTW, I think we should have a setting for this; for example, if -V is -1, don't vacuum. > I'm open to discussion on changing the defaults. Perhaps what it would > be better to use some non-linear (perhaps logorithmic) scaling factor. > So that you wound up with something roughly like this: > > #tuples activity% for vacuum > 1k 100% > 10k 70% > 100k 45% > 1M 20% > 10M 10% > 100M 8% That would be cool, too. Though a count of data pages would be a better scale than a count of rows, and equally obtainable from pg_class. > Thanks for the lucid feedback / discussion. autovacuum is a feature > that, despite it's simple implementation, has generated a lot of > feedback from users, and I would really like to see it become something > closer to what it should be. Well, I hope to help now. Until very recently, I've not had a chance to seriously look at pg_avd and test it in production. Now that I do, I'm interested in improving it. -- Josh Berkus Aglio Database Solutions San Francisco
pgsql-hackers by date: