Re: [HACKERS] More detail on settings for pgavd? - Mailing list pgsql-performance
From | Chester Kustarz |
---|---|
Subject | Re: [HACKERS] More detail on settings for pgavd? |
Date | |
Msg-id | Pine.BSO.4.44.0311201311200.19584-100000@detroit.arbor.net Whole thread Raw |
In response to | Re: [HACKERS] More detail on settings for pgavd? (Josh Berkus <josh@agliodbs.com>) |
Responses |
Re: [HACKERS] More detail on settings for pgavd?
|
List | pgsql-performance |
On Thu, 20 Nov 2003, Josh Berkus wrote: > Additionally, you are not thinking of this in terms of an overall database > maintanence strategy. Lazy Vacuum needs to stay below the threshold of the > Free Space Map (max_fsm_pages) to prevent creeping bloat from setting in to > your databases. With proper configuration of pg_avd, vacuum_mem and FSM > values, it should be possible to never run a VACUUM FULL again, and as of 7.4 > never run an REINDEX again either. is there any command you can run to see how much of the FSM is filled? is there any way to tell which tables are filling it? > Analyze is needed only as often as the *aggregate distribution* of data in the > tables changes. Depending on the application, this could be frequently, but > far more often (in my experience running multiple databases for several > clients) the data distribution of very large tables changes very slowly over > time. analyze does 2 things for me: 1. gets reasonable aggregate statistics 2. generates STATISTICS # of bins for the most frequent hitters (2) is very important for me. my values typically seem to have power-law like distributions. i need enough bins to reach a "cross-over" point where the last bin is frequent enough to make an index scan useful. also, i want enough bins so that the planner can choose index a or b for: select * from foo where a=n and b=m; the selectivity of either index depends not only on the average selectivity of index a or index b, but on n and m as well. for example, 1M row table: value % of rows v1 23 v2 12 v3 4.5 v4 4 v5 3.5 ... you can see that picking an index for =v1 would be poor. picking the 20th most common value would be 0.5% selective. much better. of course this breaks down for more complex operators, but = is fairly common. > So if you're going to have a seperate ANALYZE schedule at all, it should be > slightly less frequent than VACUUM for large tables. Either that, or drop > the idea, and simplify pg_avd by running VACUUM ANALYZE all the time instead > of having 2 seperate schedules. i have some tables which are insert only. i do not want to vacuum them because there are never any dead tuples in them and the vacuum grows the indexes. plus it is very expensive (they tables grow rather large.) after they expire i drop the whole table to make room for a newer one (making sort of a rolling log with many large tables.) i need to analyze them every so often so that the planner knows that there is 1 row, 100 rows, 100k rows, 1M. the funny thing is that because i never vacuum the tables, the relpages on the index never grows. don't know if this affects anything (this is on 7.2.3). vacuum is to reclaim dead tuples. this means it depends on update and delete. analyze depends on data values/distribution. this means it depends on insert, update, and delete. thus the dependencies are slightly different between the 2 operations, an so you can come up with use-cases that justify running either more frequently. i am not sure how failed transactions fit into this though, not that i think anybody ever has very many. maybe big rollbacks during testing?
pgsql-performance by date: