Thread: DB Tuning Notes for comment...
The notes below are the results of various tuning issues experienced recently on a large database (several GB) that has many tables and a high transient data flow (ie. thousands of records added, updated, and deleted every hour) on a few tables. This kind of data flow is not at all well handled by the default postgresql settings. Experiments have also been conducted using a much smaller test database with a text field written to a TOAST relation (which is what the large table contains). I think this example is useful because it encapsulates in several hours the level of updates that most of us see in several weeks, so the rules below should apply equally well but in different time frames, with provisos as noted. The database in question is subject to periodic bulk deletes where up to 50% of the rows in the large table are deleted. It is also has to run 24x7. Any comments or suggestions would be welcome. Tuning ====== 1. max_fsm_relations -------------------- First of all, the free space manager is useless at managing free space if it can not map all relations (including system relations and toast relations). The following query should give the correct ballpark: select count(*) from pg_class where not relkind in ('i','v'); Set max_fsm_relations to a number greater than this. Add extra to deal with any tables you will create etc. It costs 40 bytes per table, so be generous - if it is set too low, you will get bizarre space usage. [Note: the FSM is so bad at reclaiming space when this value is too low that I believe it should be overridden at startup if it is not at least equal to the result of the above query. Similarly, I think a warning should be given at startup and/or runtime when it is exceeded, or work should be done to make it dynamic - and it should then not be a config item]. 2. VACUUM Frequency ------------------- Ideally VACUUM should run constantly; a future version will support something like it. But for now, vacuum should be run when a significant amount of data has been inserted, updated or deleted. The definition of 'significant' is not immediately obvious. Most tables will *not* be updated frequently in most databases; such tables can be vacuumed irregularly, or vacuumed when the more frequently updated tables are vacuumed. In our specific case we have one table that has a few rows (< 1000), but it is updated as many as 3 times per second. In this case, we chose a 5 minute interval, which results in at worst 1000 'dead' rows in the table as a result of the updates. Since it was such a small table, we saw no reason to vacuum every minute, or even constantly. For larger or more complex tables, the output of VACUUM ANALYZE must be used. The following is an extract of the output from a VACUUM VERBOSE of a simple test database - the table is the TOAST table of a large text column, where the table has been constructed to be 75% empty. The output is after deleting some rows. 1 INFO: --Relation pg_toast.pg_toast_16979-- 2 INFO: Index pg_toast_16979_index: Pages 575; Tuples 16384: Deleted 25984. 3 CPU 0.05s/0.16u sec elapsed 7.41 sec. 4 INFO: Removed 25984 tuples in 6496 pages. 5 CPU 0.75s/0.79u sec elapsed 14.17 sec. 6 INFO: Pages 22480: Changed 6496, Empty 0; Tup 16384: Vac 25984, Keep 0, UnUsed 47552. 7 Total CPU 1.98s/1.05u sec elapsed 23.30 sec. Line 6 shows that there are 22480 pages, and 6496 (roughly 25%) were changed since the last vacuum. Line 4 indicates that these were all removed. Note that when tuples are updated, a new copy of the record is written and the old one deleted, so updates will also result in tuples being 'removed'. A more complex example follows; this was after deleting 512 rows and adding 256: 1 INFO: --Relation pg_toast.pg_toast_16979-- 2 INFO: Index pg_toast_16979_index: Pages 667; Tuples 24576: Deleted 16384. 3 CPU 0.02s/0.10u sec elapsed 4.73 sec. 4 INFO: Removed 16384 tuples in 4096 pages. 5 CPU 0.52s/0.48u sec elapsed 9.38 sec. 6 INFO: Pages 20528: Changed 6144, Empty 0; Tup 24576: Vac 16384, Keep 0, UnUsed 41152. 7 Total CPU 1.81s/0.64u sec elapsed 22.51 sec. note that line 6 has a 'changed' value, and line 4 has a 'removed' value. This gives some indication of the pages consumed and released in any period. The final example is for 512 inserts, 512 updates (of different records) and 512 deletes. 1 INFO: --Relation pg_toast.pg_toast_16979-- 2 INFO: Index pg_toast_16979_index: Pages 854; Tuples 32768: Deleted 32768. 3 CPU 0.05s/0.20u sec elapsed 8.41 sec. 4 INFO: Removed 32768 tuples in 8192 pages. 5 CPU 1.01s/0.91u sec elapsed 13.52 sec. 6 INFO: Pages 26672: Changed 12288, Empty 0; Tup 32768: Vac 32768, Keep 0, UnUsed 41152. 7 Total CPU 2.92s/1.25u sec elapsed 30.01 sec. again it shows the effects of UPDATE/DELETE vs. INSERT. In each case the 'Changed' value indicates the maximum number of pages required between vaccuums; and the 'removed' values indicates that some pages will be added to the FSM when vacuum is run. The high 'unused' value shows the results of an earleir bulk delete. Note that 'Changed' seems to be 'updates+deletes+inserts' whereas 'removed' is 'deletes+updates', so it is not possible to determine 'updates+inserts', which would be the best indicator of the required number of new pages. If necessary, it *could* be derived by looking at tuple counts across vacuums, but using the 'changed' figure will give a good upper limit, since in most meaningful cases deletes will be lower than inserts -- hence it will be out by at worst a factor of two. We have chosen (arbitrarily) to keep the number of pages 'changed' below 25% of the total number of pages; in the above case that would involve running VACUUM twice as often. 3. max_fsm_pages ---------------- Contrary to other advice, FOR TABLES THAT ARE SUBJECT PERIODIC TO BULK DELETES, this figure should not be set based on how many pages you delete between vacuums, but how many pages you will consume between vacuums (the 'changed' figure, above). This difference is important for sites that purge data periodically: it does not need to be set high enough to hold all deleted pages for one bulk delete -- it should be set high enough to have sufficient pages to manage the consumption between vacuums. If the pool is exhausted, then vacuum will find more 'unused' pages the next time it is run. UPDATE & INSERT will consume free pages, DELETE will create free pages. For tables that are more typical (insert-intensive, or a general mix of updates, deletes and inserts), the 'removed' value should be used. ==== NOTE: max_fsm_pages is a cumulative value. It should be set to the sum of these values for all tables. ==== Based on the example above, VACUUM should probably be run more frequently. However, if the above was typical of the chosen VACUUM frequency, then max_fsm_pages should be set to at least 12288 for this table, despite the fact that only 8192 pages were released (since we assume there may be bulk deletes freeing up many pages). If VACUUM were run twice as often, the max_fsm_pages should be set to at least 6144 for this table. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
On Sun, 2002-12-08 at 09:41, Philip Warner wrote: > Any comments or suggestions would be welcome. > first and foremost, this is really excellent work! We need to look into getting this info into the standard documentation and/or Bruce's tuning guide. > > Tuning > ====== > > 1. max_fsm_relations > -------------------- > > First of all, the free space manager is useless at managing free space if > it can not map all relations (including system relations and toast > relations). The following query should give the correct ballpark: > > select count(*) from pg_class where not relkind in ('i','v'); > It should be noted that if you have multiple databases, you'll need to get the number of relations minus the system tables for each database, then add in the number of system tables. > Set max_fsm_relations to a number greater than this. Add extra to deal with > any tables you will create etc. It costs 40 bytes per table, so be > generous - if it is set too low, you will get bizarre space usage. > > > 2. VACUUM Frequency > ------------------- > > Ideally VACUUM should run constantly; a future version will support > something like it. But for now, vacuum should be run when a significant > amount of data has been inserted, updated or deleted. The definition of > 'significant' is not immediately obvious. > I don't think this is entirely true. On tables that have large numbers of inserts, but no updates or deletes, you do not need to run vacuum. It might be helpful to run analyze on these tables if your inserting enough data to change the statistical relationships, but vacuum itself is not needed. > Most tables will *not* be updated frequently in most databases; such tables > can be vacuumed irregularly, or vacuumed when the more frequently updated > tables are vacuumed. > > In our specific case we have one table that has a few rows (< 1000), but it > is updated as many as 3 times per second. In this case, we chose a 5 minute > interval, which results in at worst 1000 'dead' rows in the table as a > result of the updates. Since it was such a small table, we saw no reason to > vacuum every minute, or even constantly. I have some similar tables in my system, with between 250 and 3500 rows. These tables turn over at least every 15 minutes, so I have decided on a 10 minute vacuum interval. As with Phillip's, since they are small tables, more frequent vacuuming seemed excessive. > > For larger or more complex tables, the output of VACUUM ANALYZE must be used. > <snip> again, great work Philip. Robert Treat
Robert Treat wrote: > On Sun, 2002-12-08 at 09:41, Philip Warner wrote: > >>Any comments or suggestions would be welcome. >> > > > first and foremost, this is really excellent work! We need to look into > getting this info into the standard documentation and/or Bruce's tuning > guide. > Seconded! > >>Tuning >>====== >> >>1. max_fsm_relations >>-------------------- >> >>First of all, the free space manager is useless at managing free space if >>it can not map all relations (including system relations and toast >>relations). The following query should give the correct ballpark: >> >> select count(*) from pg_class where not relkind in ('i','v'); >> > > > It should be noted that if you have multiple databases, you'll need to > get the number of relations minus the system tables for each database, > then add in the number of system tables. > > >>Set max_fsm_relations to a number greater than this. Add extra to deal with >>any tables you will create etc. It costs 40 bytes per table, so be >>generous - if it is set too low, you will get bizarre space usage. >> >> >>2. VACUUM Frequency >>------------------- >> >>Ideally VACUUM should run constantly; a future version will support >>something like it. But for now, vacuum should be run when a significant >>amount of data has been inserted, updated or deleted. The definition of >>'significant' is not immediately obvious. >> > > > I don't think this is entirely true. On tables that have large numbers > of inserts, but no updates or deletes, you do not need to run vacuum. It > might be helpful to run analyze on these tables if your inserting enough > data to change the statistical relationships, but vacuum itself is not > needed. > In my experience I've seen tables with numerous indexes continue to benefit greatly from vacuum/vacuum full operations when large volumes of inserts are performed. This is true even when the update/delete activity on the base table itself is manageable. While dropping and recreating the index after loading is possible in some cases, my general comment is that index maintenance is an issue you should keep in mind when designing your vacuum strategy. > >>Most tables will *not* be updated frequently in most databases; such tables >>can be vacuumed irregularly, or vacuumed when the more frequently updated >>tables are vacuumed. >> >>In our specific case we have one table that has a few rows (< 1000), but it >>is updated as many as 3 times per second. In this case, we chose a 5 minute >>interval, which results in at worst 1000 'dead' rows in the table as a >>result of the updates. Since it was such a small table, we saw no reason to >>vacuum every minute, or even constantly. > > > I have some similar tables in my system, with between 250 and 3500 rows. > These tables turn over at least every 15 minutes, so I have decided on a > 10 minute vacuum interval. As with Phillip's, since they are small > tables, more frequent vacuuming seemed excessive. > > >>For larger or more complex tables, the output of VACUUM ANALYZE must be used. >> > > <snip> > > again, great work Philip. > > Robert Treat > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html > ss Scott Shattuck Technical Pursuit Inc.
Robert Treat <rtreat@webmd.net> writes: > On Sun, 2002-12-08 at 09:41, Philip Warner wrote: >> First of all, the free space manager is useless at managing free space if >> it can not map all relations (including system relations and toast >> relations). The following query should give the correct ballpark: >> >> select count(*) from pg_class where not relkind in ('i','v'); FSM entries aren't needed for sequences either, so more correct is select count(*) from pg_class where relkind in ('r', 't'); > It should be noted that if you have multiple databases, you'll need to > get the number of relations minus the system tables for each database, > then add in the number of system tables. You're assuming that system tables are shared, which they mostly aren't. Summing the pg_class count over all databases (or all that get vacuumed, anyway; you can exclude template0) will be close enough. BTW, this neglects what seems possibly an important factor: you don't need FSM entries for tables that are effectively read-only or insert-only (no deletes or updates). At least in some database designs, that's a significant number of tables. However, I suspect that the present FSM code is not very effective at deciding *which* tables to track if it has too few slots, so Philip's advice of "make sure there's one for every table" may be the best in the near term. But we need to work at improving that logic. I have some uncommitted patches concerning the FSM management heuristics from Stephen Marshall, which I deemed too late/risky for 7.3, but we should get something done for 7.4. Anyone interested in playing around in this area? regards, tom lane
Scott Shattuck <ss@technicalpursuit.com> writes: > Robert Treat wrote: >> I don't think this is entirely true. On tables that have large numbers >> of inserts, but no updates or deletes, you do not need to run vacuum. > In my experience I've seen tables with numerous indexes continue to > benefit greatly from vacuum/vacuum full operations when large volumes of > inserts are performed. This is true even when the update/delete activity > on the base table itself is manageable. This is hard to believe, as VACUUM does not even touch the indexes unless it has found deletable tuples --- and I am quite certain that btree indexes, at least, do not do any VACUUM-time reorganization beyond deleting deletable entries. (I wouldn't swear to it one way or the other for GiST though.) Robert's opinion coincides with what I know of the code. regards, tom lane
Tom Lane wrote: > Scott Shattuck <ss@technicalpursuit.com> writes: > >>Robert Treat wrote: >> >>>I don't think this is entirely true. On tables that have large numbers >>>of inserts, but no updates or deletes, you do not need to run vacuum. >> > >>In my experience I've seen tables with numerous indexes continue to >>benefit greatly from vacuum/vacuum full operations when large volumes of >>inserts are performed. This is true even when the update/delete activity >>on the base table itself is manageable. > > > This is hard to believe, as VACUUM does not even touch the indexes > unless it has found deletable tuples --- and I am quite certain that > btree indexes, at least, do not do any VACUUM-time reorganization beyond > deleting deletable entries. (I wouldn't swear to it one way or the > other for GiST though.) Robert's opinion coincides with what I know of > the code. > Willing to learn here but skipping a vacuum full has caused some issues for us. Here's some data from a recent 3 day test run that was done with regular vacuums but not vacuum fulls. When running with vacuum full the indexes remain in line: nsuite-10=# select relname, relpages, reltuples from pg_class where relname not like 'pg_%' order by reltuples desc; -[ RECORD 1 ]------------------------------ relname | directory_fullpath_ix relpages | 96012 reltuples | 1.38114e+06 -[ RECORD 2 ]------------------------------ relname | directory_pkey relpages | 16618 reltuples | 1.38114e+06 -[ RECORD 3 ]------------------------------ relname | directory relpages | 23924 reltuples | 59578 <snip> Needless to say, the system performance was pathetic but the test did serve to highlight this index issue. Anyone want to give a quick summary of index maintenance or give me a pointer into the codebase where someone who's not a C expert might still get a sense of what's being done? I'd really like to understand how an index can get so completely out of whack after a weekend of testing. It seems you're telling me that the data here "proves" there's an update or delete going on somewhere in the system, even though this test is of a database initialization driven by a stored procedure with no update or delete operations targeting the directory table. There may be some operations being done external to that process that I've not been made aware of but I'm still curious to learn more about indexing behavior so I know why something like this happens in the first place. ss
Scott Shattuck <ss@technicalpursuit.com> writes: > Willing to learn here but skipping a vacuum full has caused some issues > for us. Here's some data from a recent 3 day test run that was done with > regular vacuums but not vacuum fulls. When running with vacuum full the > indexes remain in line: > nsuite-10=# select relname, relpages, reltuples from pg_class where > relname not like 'pg_%' order by reltuples desc; > -[ RECORD 1 ]------------------------------ > relname | directory_fullpath_ix > relpages | 96012 > reltuples | 1.38114e+06 > -[ RECORD 2 ]------------------------------ > relname | directory_pkey > relpages | 16618 > reltuples | 1.38114e+06 > -[ RECORD 3 ]------------------------------ > relname | directory > relpages | 23924 > reltuples | 59578 > <snip> <<blink>> There's no way that the index and table tuple counts should get that far out of line; in the absence of any concurrent updates, they should be *equal* (or index < table, if you have a partial index, which I assume these are not). I would credit the recorded index count exceeding the recorded table count by the number of tuples inserted/ updated while a (plain) VACUUM is in process on that table --- but this doesn't look like it meets that situation. There was a bug a long time ago wherein vacuum would forget to update pg_class.reltuples for indexes in some cases, but according to the CVS logs that was fixed before 7.2 release. What version are you running exactly? In any case, you seem to be dealing with some kind of bug here. It might be helpful to look at the output of "vacuum verbose directory" if you still have it available. regards, tom lane
At 03:54 PM 9/12/2002 -0500, Tom Lane wrote: >However, I suspect that the present FSM code is not very effective at >deciding *which* tables to track if it has too few slots, You are definitely right there. I think it would be worth looking at removing max_fsm_tables as a tuning option, and adding a 'relhasfsm' flag to pg_class for those tables that should not be mapped. Default to 't'. Then, make the table grow dynamically as tables are added, or when a VACUUM occurs... AFAICT, the only justification for a smaller list of relations is for those that are *almost never* subject to deletes or updates. They are certainly common in DB design, but I'd let the DBA designate them. Does this sound reasonable? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 02:46 PM 9/12/2002 -0500, Robert Treat wrote: >getting this info into the standard documentation and/or Bruce's tuning >guide. I'd vote for the standard docs since it is sufficiently basic as to be needed by most users. We either need a tuning chapter or a new section in runtime configuration. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > I think it would be worth looking at removing max_fsm_tables as a tuning > option, and adding a 'relhasfsm' flag to pg_class for those tables that > should not be mapped. Default to 't'. Then, make the table grow dynamically > as tables are added, or when a VACUUM occurs... If we could "make the table grow dynamically" then there'd not be much need for the config parameters at all. The real problem is to fit into a shmem segment whose size has to be frozen at postmaster start (which, not incidentally, is before we've ever looked at the database...). We could make the constraint be on total space for relation entries + page entries rather than either individually, but I think that'd mostly make it harder to interpret the config setting rather than offer any real ease of administration. > AFAICT, the only justification for a smaller list of relations is for those > that are *almost never* subject to deletes or updates. They are certainly > common in DB design, but I'd let the DBA designate them. It doesn't seem to me to be that hard for the system to recognize them automatically. Basically, if there are no holes of useful size in the table, there's no need to create an FSM entry for it. The trick is "useful size" here --- but VACUUM already does the work needed to estimate an average tuple size, so I'd think it could do a reasonably good job of realizing that all the available holes are just leftover space. (The relation's very last page is also a special case that's likely not special-cased correctly at the moment: perhaps it should never be entered in FSM at all, certainly not if it's the only page that would be entered.) regards, tom lane
At 03:54 PM 9/12/2002 -0500, Tom Lane wrote: >FSM entries aren't needed for sequences either, so more correct is > > select count(*) from pg_class where relkind in ('r', 't'); presumably: select count(*) from pg_class where relkind in ('r', 't', 'i'); ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > At 03:54 PM 9/12/2002 -0500, Tom Lane wrote: >> FSM entries aren't needed for sequences either, so more correct is >> >> select count(*) from pg_class where relkind in ('r', 't'); > presumably: > select count(*) from pg_class where relkind in ('r', 't', 'i'); No, I meant what I said. Indexes don't use the FSM. (The premise of the FSM is that one bit of free space in a table is as good as any other bit; a premise quite incorrect for indexes.) regards, tom lane
At 07:01 PM 9/12/2002 -0500, Tom Lane wrote: >We >could make the constraint be on total space for relation entries + page >entries rather than either individually, but I think that'd mostly make >it harder to interpret the config setting rather than offer any real >ease of administration. Perhaps doing both? And issue a warning to the logs when max_fsm_relations is borrowing from max_fsm_pages. It might be that the outstanding patches address the problem, but at the moment the choice of which relations to include is not well made when max_fsm_relations of much too small. We should at least issue a warning; but allowing max_fsm_relations to borrow from max_fsm_pages seems like a good idea, since having the number too low (with 161 relations setting it to the default of 100) is useless. Secondly, an empty database contains 98 tables, so the default setting of max_fsm_pages to 100 is way too low. The tradeoff of losing 7 pages from the map to include another relation is worth it, especially if the logs contain a warning. But perhaps the test itself is flawed and there is another problem resulting in this behaviour (doing vacuums twice in a row seems to make it use the free space, but I'd guess this is just edge behaviour of the FSM heuristics): Create Table t(i serial, t text); insert into t(t) .... 47K of UUEncoded jpeg file -> ~47K of toast. insert into t(t) select t from t; ...repeat 9 times... create table t1 as select * from t limit 1; ... create table t19 as select * from t limit 1; create table t20(i serial, t text); insert into t20(t) select t from t; ie. build a lot of tables, with two big ones separated by OID (not sure if the last part is relevant). select count(*) from pg_class where relkind in ('t','r'); in my case this resulted in 161, so I set max_fsm_relations to 100 (ie. not a borderline case, but the default setting). I also left max_fsm_pages at 10000 so that we should have space for several thousand rows. Stop & start postmaster, then vacuum full to be comfortable no other problems occur, an look at file sizes of relation file and toast file. Now: delete from t where i <= 128; delete from t20 where i <= 128; vacuum; check file sizes - no surprises, they should be unchanged. Tue Dec 10 12:03:53 EST 2002 -rw------- 1 pjw users 65536 2002-12-10 12:03 16979 -rw------- 1 pjw users 65536 2002-12-10 12:03 33432 -rw------- 1 pjw users 67108864 2002-12-10 12:03 16982 -rw------- 1 pjw users 67108864 2002-12-10 12:03 33435 then do: insert into t(t) select t from t20 limit 10; insert into t20(t) select t from t limit 10; and both files have grown: Tue Dec 10 12:08:20 EST 2002 -rw------- 1 pjw users 65536 2002-12-10 12:08 33432 -rw------- 1 pjw users 67764224 2002-12-10 12:08 33435 -rw------- 1 pjw users 67764224 2002-12-10 12:08 16982 -rw------- 1 pjw users 65536 2002-12-10 12:08 16979 oddly (bug? edge behaviour?) doing two vacuums in a row results in the free space being used. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 12:17 PM 10/12/2002 +1100, Philip Warner wrote: >Secondly, an empty database contains 98 tables, Corrected based on Tom's later mail; from the FSM PoV, it contains 37 (indices don't count). So it is exhausted when more than two DBs are created. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > Secondly, an empty database contains 98 tables, so the default setting of > max_fsm_pages to 100 is way too low. Only 37 of them need FSM entries, but still a good point; we should probably bump it up to 1000 to be more realistic. > oddly (bug? edge behaviour?) doing two vacuums in a row results in the free > space being used. I'm on my way out the door, so no time to think about what's actually happening in the current code, but ideally I would think that when the FSM doesn't have enough space, it should prefer to remember info about rels with heavy update activity (which might be approximated by "rels with lots of free space", but isn't really the same thing). A VACUUM done just after startup does not have any historical info to base this decision on. So it's not unreasonable for the system to make better choices after it's been running awhile than when it's freshly booted. I'm not sure that this is actually what's happening today, just pointing out that I don't consider it a bug per se if the code behaves that way. (The existing code does have some LRU effects, IIRC, but not sure if they account for what you see.) regards, tom lane
Tom Lane wrote: > Scott Shattuck <ss@technicalpursuit.com> writes: > >>Willing to learn here but skipping a vacuum full has caused some issues >>for us. Here's some data from a recent 3 day test run that was done with >>regular vacuums but not vacuum fulls. When running with vacuum full the >>indexes remain in line: > > >>nsuite-10=# select relname, relpages, reltuples from pg_class where >>relname not like 'pg_%' order by reltuples desc; >>-[ RECORD 1 ]------------------------------ >>relname | directory_fullpath_ix >>relpages | 96012 >>reltuples | 1.38114e+06 >>-[ RECORD 2 ]------------------------------ >>relname | directory_pkey >>relpages | 16618 >>reltuples | 1.38114e+06 >>-[ RECORD 3 ]------------------------------ >>relname | directory >>relpages | 23924 >>reltuples | 59578 >><snip> > > > <<blink>> There's no way that the index and table tuple counts should > get that far out of line; in the absence of any concurrent updates, > they should be *equal* (or index < table, if you have a partial index, > which I assume these are not). I would credit the recorded index count > exceeding the recorded table count by the number of tuples inserted/ > updated while a (plain) VACUUM is in process on that table --- but this > doesn't look like it meets that situation. > > There was a bug a long time ago wherein vacuum would forget to update > pg_class.reltuples for indexes in some cases, but according to the CVS > logs that was fixed before 7.2 release. What version are you running > exactly? test=# select version(); version ------------------------------------------------------------- PostgreSQL 7.2.2 on i686-pc-linux-gnu, compiled by GCC 2.96 (1 row) test=# > > In any case, you seem to be dealing with some kind of bug here. It > might be helpful to look at the output of "vacuum verbose directory" > if you still have it available. > NOTICE: --Relation directory-- NOTICE: Index directory_pkey: Pages 15628; Tuples 4988848: Deleted 35407. CPU 0.73s/3.00u sec elapsed 40.53 sec. NOTICE: Index directory_fullpath_ix: Pages 80808; Tuples 4989317: Deleted 35407. CPU 4.84s/3.91u sec elapsed 275.66 sec. NOTICE: Removed 35407 tuples in 786 pages. CPU 0.13s/0.11u sec elapsed 1.80 sec. NOTICE: Pages 80156: Changed 18, Empty 0; Tup 4988787: Vac 35407, Keep 4977704, UnUsed 348422. Total CPU 7.85s/7.58u sec elapsed 343.84 sec. > regards, tom lane > Thanks for any insight you can offer here. ss
At 08:39 PM 9/12/2002 -0500, Tom Lane wrote: >A VACUUM >done just after startup does not have any historical info to base this >decision on. The actual order is: start delete vacuum; insert <- does not use free space vacuum; insert <- does not use free space vacuum; vacuum; insert <- uses free space my guess is the fact that the second vacuum has no useful statistics means that (somehow) it is choosing the tables with free space -- perhaps because they are the largest? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
On Mon, 09 Dec 2002 19:10:23 -0500, Tom Lane wrote: > Philip Warner <pjw@rhyme.com.au> writes: >> I think it would be worth looking at removing max_fsm_tables as a >> tuning option, and adding a 'relhasfsm' flag to pg_class for those >> tables that should not be mapped. Default to 't'. Then, make the table >> grow dynamically as tables are added, or when a VACUUM occurs... > > If we could "make the table grow dynamically" then there'd not be much > need for the config parameters at all. The real problem is to fit into > a shmem segment whose size has to be frozen at postmaster start (which, > not incidentally, is before we've ever looked at the database...). We > could make the constraint be on total space for relation entries + page > entries rather than either individually, but I think that'd mostly make > it harder to interpret the config setting rather than offer any real > ease of administration. > Can we not just have vacuum of a database return a total # of pages modified and relations modified, and then report suggested free space map settings? Even this little bit would be a step in the right direction. Robert Treat
At 09:10 PM 9/12/2002 -0500, Robert Treat wrote: >Even this little bit would be a step in the right direction. What I would find really useful is a 'VACUUM...WITH HISORY' which wrote the underlying details of VACUUM VERBOSE to a 'pg_vacuum_history' table. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 03:54 PM 9/12/2002 -0500, Tom Lane wrote: >I have some uncommitted patches concerning the FSM management heuristics >from Stephen Marshall, which I deemed too late/risky for 7.3, but we >should get something done for 7.4. Anyone interested in playing around >in this area? I'd be interested in seeing the patches, but can't commit to doing anything with them at this point. I would like to get to the bottom of the weird behaviour, however. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/