Re: [HACKERS] More detail on settings for pgavd? - Mailing list pgsql-performance
From: Matthew T. O'Connor
Subject: Re: [HACKERS] More detail on settings for pgavd?
Date:
Msg-id: 3FBE2791.8040404@zeut.net
In response to: Re: [HACKERS] More detail on settings for pgavd? (Josh Berkus <josh@agliodbs.com>)
Responses: Re: [HACKERS] More detail on settings for pgavd?
List: pgsql-performance
Josh Berkus wrote:

>Matthew,
>
>>True, but I think it would be one hour once, rather than 30 minutes 4
>>times.
>>
>Well, generally it would be about 6-8 times at 2-4 minutes each.
>
Are you saying that you can vacuum a 1 million row table in 2-4 minutes,
while a vacuum of the same table with an additional 1 million dead tuples
would take an hour?

>>This is one of the things I had hoped to add to pg_autovacuum, but never
>>got to.  In addition to just the information from the stats collector on
>>inserts, updates, and deletes, pg_autovacuum should also look at the FSM,
>>and make decisions based on it.  Anyone looking for a project?
>>
>Hmmm ... I think that's the wrong approach.  Once your database is
>populated, it's very easy to determine how to set the FSM for a given
>pg_avd level.  If you're vacuuming after 20% updates, for example, just
>set fsm_pages to 20% of the total database pages plus growth & safety
>margins.
>
Ok.

>I'd be really reluctant to base pg_avd frequency on the fsm settings
>instead.  What if the user loads 8GB of data but leaves fsm_pages at the
>default of 10,000?  You can't do much with that; you'd have to vacuum if
>even 1% of the data changed.
>
Ok, but as you said above, it's very easy to set the FSM once you know
your db size.

>The other problem is that calculating data pages from a count of
>updates+deletes would require pg_avd to keep more statistics and do more
>math for every table.  Do we want to do this?
>
I would think the math is simple enough not to be a big problem.  Also, I
did not recommend looking blindly at the FSM as our guide, but rather
consulting it as another source of information about when it would be
useful to vacuum.  I don't have a good plan for how to incorporate this
data, but to a large extent the FSM already tracks table activity and
gives us the most accurate answer about storage growth (short of using
something like contrib/pgstattuple, which takes nearly the same amount of
time as an actual vacuum).

>>But I can't imagine that 2% makes any difference on a large table.  In
>>fact I would think that 10-15% would hardly be noticeable; beyond that
>>I'm not sure.
>>
>I've seen performance lag at 10% of records, especially in tables where
>both update and select activity focus on one subset of the table
>(calendar tables, for example).
>
Ok.

>>Valid points, and again I think this points to the fact that
>>pg_autovacuum needs to be more configurable.  Being able to set
>>different thresholds for different tables will help considerably.  In
>>fact, you may find that some tables should have a vac threshold much
>>larger than the analyze threshold, while other tables might want the
>>opposite.
>>
>Sure.  Though I think we can make the present configuration work with a
>little adjustment of the numbers.  I'll have a chance to test on
>production databases soon.
>
I look forward to hearing results from your testing.

>>I would be surprised if you can notice the difference between a vacuum
>>analyze and a vacuum, especially on large tables.
>>
>It's substantial for tables with high statistics settings.  A 1,000,000
>row table with 5 columns set to statistics=250 can take 3 minutes to
>analyze on a medium-grade server.
>
In my testing, I never changed the default statistics settings.
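For reference, the per-column statistics target is set with ALTER TABLE and
takes effect the next time the table is analyzed.  A quick sketch, with
placeholder table and column names:

    -- Raise the statistics target on one column, then re-analyze.
    -- Higher targets make ANALYZE sample and sort more rows per column,
    -- which is why several columns at 250 slow it down noticeably.
    ALTER TABLE big_table ALTER COLUMN some_col SET STATISTICS 250;
    ANALYZE big_table;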
>>I think you need two separate schedules.  There are lots of times where
>>a vacuum doesn't help, and an analyze is all that is needed.
>>
>Agreed.  And I've just talked to a client who may want to use pg_avd's
>ANALYZE scheduling but not use vacuum at all.  BTW, I think we should have
>a setting for this; for example, if -V is -1, don't vacuum.
>
That would be nice.  Easy to add, and something I never thought of....

>>I'm open to discussion on changing the defaults.  Perhaps it would be
>>better to use some non-linear (perhaps logarithmic) scaling factor.
>>
>That would be cool, too.  Though a count of data pages would be a better
>scale than a count of rows, and equally obtainable from pg_class.
>
But we track tuples because we can compare against the count given by the
stats system.  I don't know of a way (other than looking at the FSM, or
contrib/pgstattuple) to see how many dead pages exist.
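For what it's worth, the page counts are easy to pull from pg_class, and
contrib/pgstattuple is the only thing I know of that reports dead space
directly, though at roughly the cost of a vacuum.  A rough sketch, with a
placeholder table name (pgstattuple's exact output depends on the contrib
version installed):

    -- Pages used by a table, as recorded at its last VACUUM / ANALYZE.
    SELECT relname, relpages FROM pg_class WHERE relname = 'big_table';

    -- contrib/pgstattuple scans the whole table to count dead tuples and
    -- free space, so it costs nearly as much as a plain vacuum.
    SELECT pgstattuple('big_table');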