Thread: How to avoid database bloat
Hello,

Our database increases in size 2.5 times during the day. What can we do
to avoid this? Autovacuum is running with quite aggressive settings, and
the FSM settings are high enough.

The database size should be more or less constant, but the data has a
high turnover rate (100+ inserts/updates/deletes per second).

Below is "du -sk" of the database dir during the day. At 4:05 a full
vacuum + reindex runs, and the database size is once again reduced.

Thanks,

Mindaugas

Tue May 31 11:00:01 EEST 2005    533808 /ora/pgsql/base/465436/
Tue May 31 11:30:01 EEST 2005    567344 /ora/pgsql/base/465436/
Tue May 31 12:00:01 EEST 2005    578632 /ora/pgsql/base/465436/
Tue May 31 12:30:01 EEST 2005    586336 /ora/pgsql/base/465436/
Tue May 31 13:00:01 EEST 2005    594716 /ora/pgsql/base/465436/
Tue May 31 13:30:01 EEST 2005    604932 /ora/pgsql/base/465436/
Tue May 31 14:00:01 EEST 2005    613668 /ora/pgsql/base/465436/
Tue May 31 14:30:01 EEST 2005    625752 /ora/pgsql/base/465436/
Tue May 31 15:00:01 EEST 2005    637704 /ora/pgsql/base/465436/
Tue May 31 15:30:01 EEST 2005    649700 /ora/pgsql/base/465436/
Tue May 31 16:00:01 EEST 2005    657392 /ora/pgsql/base/465436/
Tue May 31 16:30:02 EEST 2005    668228 /ora/pgsql/base/465436/
Tue May 31 17:00:01 EEST 2005    676332 /ora/pgsql/base/465436/
Tue May 31 17:30:01 EEST 2005    686376 /ora/pgsql/base/465436/
Tue May 31 18:00:01 EEST 2005    694080 /ora/pgsql/base/465436/
Tue May 31 18:30:02 EEST 2005    705876 /ora/pgsql/base/465436/
Tue May 31 19:00:01 EEST 2005    713916 /ora/pgsql/base/465436/
Tue May 31 19:30:01 EEST 2005    725460 /ora/pgsql/base/465436/
Tue May 31 20:00:01 EEST 2005    733892 /ora/pgsql/base/465436/
Tue May 31 20:30:01 EEST 2005    745344 /ora/pgsql/base/465436/
Tue May 31 21:00:01 EEST 2005    753048 /ora/pgsql/base/465436/
Tue May 31 21:30:02 EEST 2005    768228 /ora/pgsql/base/465436/
Tue May 31 22:00:01 EEST 2005    804796 /ora/pgsql/base/465436/
Tue May 31 22:30:01 EEST 2005    858840 /ora/pgsql/base/465436/
Tue May 31 23:00:02 EEST 2005    902684 /ora/pgsql/base/465436/
Tue May 31 23:30:01 EEST 2005    939796 /ora/pgsql/base/465436/
Wed Jun  1 00:00:02 EEST 2005    990840 /ora/pgsql/base/465436/
Wed Jun  1 00:30:11 EEST 2005   1005316 /ora/pgsql/base/465436/
Wed Jun  1 01:00:02 EEST 2005   1011408 /ora/pgsql/base/465436/
Wed Jun  1 01:30:01 EEST 2005   1010888 /ora/pgsql/base/465436/
Wed Jun  1 02:00:01 EEST 2005   1010872 /ora/pgsql/base/465436/
Wed Jun  1 02:30:01 EEST 2005   1010784 /ora/pgsql/base/465436/
Wed Jun  1 03:00:02 EEST 2005   1003260 /ora/pgsql/base/465436/
Wed Jun  1 03:30:02 EEST 2005   1003372 /ora/pgsql/base/465436/
Wed Jun  1 04:00:01 EEST 2005   1003380 /ora/pgsql/base/465436/
Wed Jun  1 04:30:01 EEST 2005    426508 /ora/pgsql/base/465436/
Wed Jun  1 05:00:01 EEST 2005    429036 /ora/pgsql/base/465436/
Wed Jun  1 05:30:01 EEST 2005    432156 /ora/pgsql/base/465436/
Wed Jun  1 06:00:01 EEST 2005    433332 /ora/pgsql/base/465436/
Wed Jun  1 06:30:01 EEST 2005    435052 /ora/pgsql/base/465436/
Wed Jun  1 07:00:02 EEST 2005    439908 /ora/pgsql/base/465436/
Wed Jun  1 07:30:01 EEST 2005    450144 /ora/pgsql/base/465436/
Wed Jun  1 08:00:01 EEST 2005    471120 /ora/pgsql/base/465436/
Wed Jun  1 08:30:02 EEST 2005    490712 /ora/pgsql/base/465436/
Wed Jun  1 09:00:01 EEST 2005    501652 /ora/pgsql/base/465436/
Wed Jun  1 09:30:01 EEST 2005    530128 /ora/pgsql/base/465436/
Wed Jun  1 10:00:01 EEST 2005    541580 /ora/pgsql/base/465436/
Wed Jun  1 10:30:01 EEST 2005    571204 /ora/pgsql/base/465436/
"Mindaugas Riauba" <mind@bi.lt> writes: > Our database increases in size 2.5 times during the day. > What to do to avoid this? Autovacuum running with quite > aggressive settings, FSM settings are high enough. First thing I'd suggest is to get a more detailed idea of exactly what is bloating --- which tables/indexes are the problem? regards, tom lane
> > Our database increases in size 2.5 times during the day. What can
> > we do to avoid this? Autovacuum is running with quite aggressive
> > settings, and the FSM settings are high enough.
>
> First thing I'd suggest is to get a more detailed idea of exactly
> what is bloating --- which tables/indexes are the problem?

I think the most problematic table is this one. After a vacuum
full/reindex it was 20MB in size; now (after 6 hours) it is already 70MB
and counting. vacuum verbose output is below.

msg_id is an integer, next_retry a timestamp, recipient a varchar(20).
max_fsm_pages = 200000. Another table has a foreign key which references
msg_id in this one.

Thanks,

Mindaugas

$ vacuumdb -v -z -U postgres -t queue database
INFO:  vacuuming "queue"
INFO:  index "queue_msg_id_pk" now contains 110531 row versions in 5304 pages
DETAIL:  31454 index row versions were removed.
95 index pages have been deleted, 63 are currently reusable.
CPU 0.03s/0.07u sec elapsed 2.50 sec.
INFO:  index "queue_next_retry" now contains 110743 row versions in 3551 pages
DETAIL:  31454 index row versions were removed.
1163 index pages have been deleted, 560 are currently reusable.
CPU 0.04s/0.06u sec elapsed 4.93 sec.
INFO:  index "queue_recipient_idx" now contains 111596 row versions in 1802 pages
DETAIL:  31454 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.05u sec elapsed 0.16 sec.
INFO:  "queue": removed 31454 row versions in 1832 pages
DETAIL:  CPU 0.00s/0.01u sec elapsed 0.27 sec.
INFO:  "queue": found 31454 removable, 110096 nonremovable row versions in 9133 pages
DETAIL:  119 dead row versions cannot be removed yet.
There were 258407 unused item pointers.
0 pages are entirely empty.
CPU 0.12s/0.25u sec elapsed 8.20 sec.
INFO:  analyzing "queue"
INFO:  "queue": scanned 3000 of 9133 pages, containing 34585 live rows and 1808 dead rows; 3000 rows in sample, 105288 estimated total rows
VACUUM
"Mindaugas Riauba" <mind@bi.lt> writes: >> First thing I'd suggest is to get a more detailed idea of exactly >> what is bloating --- which tables/indexes are the problem? > I think the most problematic table is this one. After vacuum full/reindex > it was 20MB in size now (after 6 hours) it is already 70MB and counting. AFAICT the vacuum is doing what it is supposed to, and the problem has to be just that it's not being done often enough. Which suggests either an autovacuum bug or your autovacuum settings aren't aggressive enough. Which PG version is this exactly? Some of the earlier autovacuum releases do have known bugs, so it'd be worth your while to update if you're not on the latest point release of your series. I don't know much about autovacuum settings, but if you'll show what you're using someone can probably comment on them. regards, tom lane
>>> First thing I'd suggest is to get a more detailed idea of exactly
>>> what is bloating --- which tables/indexes are the problem?

>> I think the most problematic table is this one. After a vacuum
>> full/reindex it was 20MB in size; now (after 6 hours) it is already
>> 70MB and counting.

> AFAICT the vacuum is doing what it is supposed to, and the problem has
> to be just that it's not being done often enough. Which suggests
> either an autovacuum bug or your autovacuum settings aren't aggressive
> enough.

-D -d 1 -v 1000 -V 0.5 -a 1000 -A 0.1 -s 10

Those are the autovacuum settings. Should be aggressive enough, I think?

> Which PG version is this exactly? Some of the earlier autovacuum
> releases do have known bugs, so it'd be worth your while to update if
> you're not on the latest point release of your series.

8.0.3

> I don't know much about autovacuum settings, but if you'll show what
> you're using someone can probably comment on them.

And what in the vacuum verbose output suggests that vacuum is not done
often enough? Current output (the table is 100MB already) is below.

Thanks,

Mindaugas

$ vacuumdb -v -z -U postgres -t queue database
INFO:  vacuuming "queue"
INFO:  index "queue_msg_id_pk" now contains 302993 row versions in 18129 pages
DETAIL:  102763 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU 0.87s/0.46u sec elapsed 76.40 sec.
INFO:  index "queue_next_retry" now contains 310080 row versions in 9092 pages
DETAIL:  102763 index row versions were removed.
675 index pages have been deleted, 658 are currently reusable.
CPU 0.38s/0.31u sec elapsed 79.47 sec.
INFO:  index "queue_recipient_idx" now contains 323740 row versions in 2900 pages
DETAIL:  102763 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.07s/0.27u sec elapsed 9.06 sec.
INFO:  "queue": removed 102763 row versions in 9623 pages
DETAIL:  CPU 0.16s/0.39u sec elapsed 29.26 sec.
INFO:  "queue": found 102763 removable, 292342 nonremovable row versions in 12452 pages
DETAIL:  14 dead row versions cannot be removed yet.
There were 183945 unused item pointers.
0 pages are entirely empty.
CPU 1.56s/1.51u sec elapsed 194.39 sec.
INFO:  analyzing "queue"
INFO:  "queue": scanned 3000 of 12452 pages, containing 72850 live rows and 7537 dead rows; 3000 rows in sample, 302376 estimated total rows
VACUUM
Mindaugas Riauba wrote:
>> AFAICT the vacuum is doing what it is supposed to, and the problem
>> has to be just that it's not being done often enough. Which suggests
>> either an autovacuum bug or your autovacuum settings aren't
>> aggressive enough.
>
> -D -d 1 -v 1000 -V 0.5 -a 1000 -A 0.1 -s 10
>
> Those are the autovacuum settings. Should be aggressive enough, I think?

Might be aggressive enough, but might not. I have seen some people run
-V 0.1. Also, you probably don't need -A that low. This could be an
issue where analyze results in an inaccurate reltuples value which is
preventing autovacuum from doing its job. Could you please run it with
-d 2 and show us the relevant log output.

>> Which PG version is this exactly? Some of the earlier autovacuum
>> releases do have known bugs, so it'd be worth your while to update
>> if you're not on the latest point release of your series.
>
> 8.0.3

That should be fine.
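Concretely, a more aggressive invocation along those lines might look
like the sketch below. The flag meanings follow the contrib/pg_autovacuum
README (-v/-V are the vacuum base threshold and scaling factor, -a/-A the
analyze equivalents, -s the sleep between loops); the particular values
are illustrative, not a tested recommendation:

    # vacuum once ~10% of a table has changed instead of ~50%,
    # keep analyze frequent but not extreme, log at debug level 2
    pg_autovacuum -D -d 2 -v 1000 -V 0.1 -a 1000 -A 0.5 -s 10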
"Mindaugas Riauba" <mind@bi.lt> writes: > And what in vacuum verbose output suggests that vacuum is not done > often enough? Current output (table is 100MB already) is below. The output shows vacuum cleaning up about a third of the table. Usually people like to keep the overhead down to 10% or so ... regards, tom lane
>>> AFAICT the vacuum is doing what it is supposed to, and the problem
>>> has to be just that it's not being done often enough. Which suggests
>>> either an autovacuum bug or your autovacuum settings aren't
>>> aggressive enough.
>>
>> -D -d 1 -v 1000 -V 0.5 -a 1000 -A 0.1 -s 10
>>
>> Those are the autovacuum settings. Should be aggressive enough, I think?
>
> Might be aggressive enough, but might not. I have seen some people run
> -V 0.1. Also, you probably don't need -A that low. This could be an
> issue where analyze results in an inaccurate reltuples value which is
> preventing autovacuum from doing its job. Could you please run it with
> -d 2 and show us the relevant log output.

The relevant parts are below. We had to set analyze so aggressively
because otherwise the planner statistics were getting old too fast. As I
said, the table has very high turnover; most of the records live here
only for a few seconds.

And one more question - why does the table keep growing anyway? It is
shown to occupy <10000 pages, and max_fsm_pages = 200000, so vacuum
should keep up with the changes? Or is that too low according to the
pg_class system table? What would a reasonable value be?

select sum(relpages) from pg_class;
  sum
-------
 77994
(1 row)

Thanks,

Mindaugas

[2005-06-03 09:30:31 EEST] DEBUG:   Performing: ANALYZE "queue"
[2005-06-03 09:30:31 EEST] INFO:    table name: database."queue"
[2005-06-03 09:30:31 EEST] INFO:    relid: 465440; relisshared: 0
[2005-06-03 09:30:31 EEST] INFO:    reltuples: 98615.000000; relpages: 6447
[2005-06-03 09:30:31 EEST] INFO:    curr_analyze_count: 39475111; curr_vacuum_count: 30953987
[2005-06-03 09:30:31 EEST] INFO:    last_analyze_count: 39475111; last_vacuum_count: 30913733
[2005-06-03 09:30:31 EEST] INFO:    analyze_threshold: 10861; vacuum_threshold: 43700
[2005-06-03 09:31:11 EEST] DEBUG:   Performing: VACUUM ANALYZE "queue"
[2005-06-03 09:31:12 EEST] INFO:    table name: database."queue"
[2005-06-03 09:31:12 EEST] INFO:    relid: 465440; relisshared: 0
[2005-06-03 09:31:12 EEST] INFO:    reltuples: 99355.000000; relpages: 6447
[2005-06-03 09:31:12 EEST] INFO:    curr_analyze_count: 39480332; curr_vacuum_count: 30957872
[2005-06-03 09:31:12 EEST] INFO:    last_analyze_count: 39480332; last_vacuum_count: 30957872
[2005-06-03 09:31:12 EEST] INFO:    analyze_threshold: 10935; vacuum_threshold: 50677
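On 8.0, the easiest cross-check of max_fsm_pages against actual demand
is the free-space-map summary that a database-wide VACUUM VERBOSE prints
at the very end; a sketch, with the figures made up for illustration:

    $ vacuumdb -a -v -U postgres 2>&1 | tail -2
    INFO:  free space map: 80 relations, 150000 pages stored; 180000 total pages needed
    DETAIL:  Allocated FSM size: 1000 relations + 200000 pages = 1278 kB shared memory.

If "total pages needed" ever exceeds max_fsm_pages, free space is being
forgotten between vacuums and tables will grow even though vacuum runs.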
Mindaugas Riauba wrote:
>> Might be aggressive enough, but might not. I have seen some people
>> run -V 0.1. Also, you probably don't need -A that low. This could be
>> an issue where analyze results in an inaccurate reltuples value which
>> is preventing autovacuum from doing its job. Could you please run it
>> with -d 2 and show us the relevant log output.
>
> The relevant parts are below. We had to set analyze so aggressively
> because otherwise the planner statistics were getting old too fast. As
> I said, the table has very high turnover; most of the records live
> here only for a few seconds.

It looks like pg_autovacuum is operating as expected. One of the
annoying limitations of pg_autovacuum in current releases is that you
can't set thresholds on a per-table basis. It looks like this table
might require an even more aggressive vacuum threshold. A couple of
thoughts: are you sure it's the table that is growing and not the
indexes? (Assuming this table has indexes on it.)

> And one more question - why does the table keep growing anyway? It is
> shown to occupy <10000 pages, and max_fsm_pages = 200000, so vacuum
> should keep up with the changes? Or is that too low according to the
> pg_class system table? What would a reasonable value be?

Does the table keep growing? Or does it grow to a point and then stop
growing? It's normal for a table to operate at a steady-state size that
is bigger than its freshly "vacuum full"'d size. And with -V set at 0.5
it should be at minimum 50% larger than its minimum size. Your earlier
email said that this table went from 20M to 70M, but does it keep going?
Perhaps it would start leveling off at this point, or some point shortly
thereafter.

Anyway, I'm not sure if there is something else going on here, but from
the log it looks as though pg_autovacuum is working as advertised.
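Until per-table thresholds exist, a common workaround (a sketch; the
five-minute schedule is an assumption to adapt) is to give the one hot
table its own plain vacuum from cron, in between pg_autovacuum's passes:

    # crontab for the postgres user: quiet, non-full vacuum of just the
    # queue table every five minutes; cheap, and it does not take the
    # exclusive lock that VACUUM FULL does
    */5 * * * * vacuumdb -q -U postgres -t queue database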
> "Mindaugas Riauba" <mind@bi.lt> writes: >> And what in vacuum verbose output suggests that vacuum is not done >> often enough? Current output (table is 100MB already) is below. > > The output shows vacuum cleaning up about a third of the table. Usually > people like to keep the overhead down to 10% or so ... He was running with -V 0.5 which should transalate to roughly 50% of the table being touched before a vacuum is issued.j Matt
Mindaugas Riauba wrote:
> Hello,
>
> Our database increases in size 2.5 times during the day. What can we
> do to avoid this? Autovacuum is running with quite aggressive
> settings, and the FSM settings are high enough.
>
> [full "du -sk" listing snipped]

Run autovacuum more often.
> It looks like pg_autovacuum is operating as expected. One of the
> annoying limitations of pg_autovacuum in current releases is that you
> can't set thresholds on a per-table basis. It looks like this table
> might require an even more aggressive vacuum threshold. A couple of
> thoughts: are you sure it's the table that is growing and not the
> indexes? (Assuming this table has indexes on it.)

Yes, I am sure (oid2name :) ).

> Does the table keep growing? Or does it grow to a point and then stop
> growing? It's normal for a table to operate at a steady-state size
> that is bigger than its freshly "vacuum full"'d size. And with -V set
> at 0.5 it should be at minimum 50% larger than its minimum size. Your
> earlier email said that this table went from 20M to 70M, but does it
> keep going? Perhaps it would start leveling off at this point, or some
> point shortly thereafter.

Yes, it keeps growing. And the main problem is that performance starts
to suffer from that. Do not forget that we are talking about 100+
insert/update/select/delete cycles per second.

> Anyway, I'm not sure if there is something else going on here, but
> from the log it looks as though pg_autovacuum is working as
> advertised.

Something is out there :). But how to fix that bloat? More aggressive
autovacuum settings? An even larger FSM?

I do not know if it matters, but the database has very many connections
to it (400-600) and the clients are doing mostly asynchronous
operations. How can we find out where this extra space has gone?

Thanks,

Mindaugas
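For reference, the oid2name check mentioned above goes roughly like this
(a sketch; the exact contrib/oid2name flags vary a little between
releases, so treat the options as an assumption). It maps the largest
files under base/465436/ back to table vs. index names:

    $ cd /ora/pgsql/base/465436
    $ ls -lS | head -5                  # largest relation files first
    $ oid2name -U postgres -d database  # list relations with their OIDs

Matching the biggest filenames against the oid2name listing shows
whether the heap (the table itself) or one of its indexes is doing the
growing, which decides between more frequent vacuums and a REINDEX.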