Thread: Vacuuming on heavily changed databases
Hello, I would like to ask an opinion on vacuuming general. Imagine situation that you have single table with 5 fields (one varchar). This table has during the day - cca 620 000 inserts - 0 updates - cca 620 000 deletes The table is vacuumed daily, but somehow after several months I got to size of ~50GB Result of VACUUM FULL VERBOSE ANALYZE is: Nonremovable row versions range from 102 to 315 bytes long. There were 218253801 unused item pointers. Total free space (including removable row versions) is 40627058888 bytes. 4850610 pages are or will become empty, including 0 at the end of the table. 5121624 pages containing 40625563500 free bytes are potential move destinations. CPU 161.85s/35.51u sec elapsed 1191.17 sec. This means 80% wasted space that could be reused. Right now, I am doing vacuum full but this requires exclusive lock. During that time the database is locked so I am missing "inserts and deletes" ;-) I would like to avoid this in future, so I would like to prepare strategy how to do it next time or avoid. Basically I have the follwing limitations: 1) sometimes deletes vs vacuum analyze does not help, extra space is not relcaimed. Do not know why this is happening, but maybe vacuum cannot get lock 2) manualy evoked vacuum full requires bringing database long-time offline 3) There were suggestions (in archives) doing dump and then restore on dropped database, but still requires downtime. What would be your strategy for the database maintenance like this? What tweaking of vacuuming can I make, so I do not get those "forgotten" records? Thank you, Bohdan
Hello, > I would like to ask an opinion on vacuuming general. Imagine situation > that you have single table with 5 fields (one varchar). This table has > during the day > > - cca 620 000 inserts > - 0 updates > - cca 620 000 deletes > > The table is vacuumed daily, but somehow after several months I got to > size of ~50GB do not vacuum DAILY. set up autovacuum to run AT LEAST every minute. autovacuum will flag the "deleted" rows as to be reusable by next insert. Make sure to use 8.3.<latest>, it's much more easy to setup autovacuum then before. best wishes, Harald -- GHUM Harald Massa persuadere et programmare Harald Armin Massa Spielberger Straße 49 70435 Stuttgart 0173/9409607 no fx, no carrier pidgeon - EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!
Apart from reinterating what someone else: you're not vacuuming anywhere near often often. Normal vacuum takes no locks. On Mon, May 19, 2008 at 04:50:18PM +0200, Bohdan Linda wrote: > 3) There were suggestions (in archives) doing dump and then restore on > dropped database, but still requires downtime. CLUSTER will rebuild the table with only the actually used records in it. It also requires an exclusive lock so try to keep it small :) Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
On Mon, May 19, 2008 at 04:59:42PM +0200, Harald Armin Massa wrote: > do not vacuum DAILY. set up autovacuum to run AT LEAST every minute. > autovacuum will flag the "deleted" rows as to be reusable by next > insert. Make sure to use 8.3.<latest>, it's much more easy to setup > autovacuum then before. Hello Harald, Thank you, will look at that. My problem is I have to use 8.0.x, but it should be supported. Regards, Bohdan
On Mon, May 19, 2008 at 04:59:42PM +0200, Harald Armin Massa wrote: > On Mon, May 19, 2008 at 04:50:18PM +0200, Bohdan Linda wrote: > > I would like to ask an opinion on vacuuming general. Imagine situation > > that you have single table with 5 fields (one varchar). This table has > > during the day > > > > - cca 620 000 inserts > > - 0 updates > > - cca 620 000 deletes > > > > The table is vacuumed daily, but somehow after several months I got to > > size of ~50GB > > do not vacuum DAILY. set up autovacuum to run AT LEAST every minute. This will help if the changes to the database are evenly distributed throughout the day, if they're very spiky then you may want to run a vacuum after one of these bulk changes. I believe that on larger databases for those tables that are incurring heavy modification the admin would configure an associated session whose sole responsibility would be to issue a never ending stream of VACUUMs. This tends to imply larger RAID arrays that can tolerate multiple concurrent read/write requests. I.e. something like the following, but with some error checking: for t in foo bar baz do ( while echo "VACUUM $t;" && false ; do true ; done | psql ) & done As an ongoing thing, PG records partially filled pages in its so-called "Free Space Map". The FSM gets updated by VACUUM and is used by the data modification statements to find places where new data can be written. If the FSM is too small then PG will grow the table even though there is free space in the table (because it doesn't know it exists). VACUUM will normally give error messages about the FSM map being too small, that and the fact that you didn't mentioned this makes me think you have already found an optimum value here. Sam
On Mon, May 19, 2008 at 06:21:18PM +0100, Sam Mason wrote: > for t in foo bar baz > do ( while echo "VACUUM $t;" && false ; do true ; done | psql ) & > done oops, that "&& false" shouldn't be there! like like this: for t in foo bar baz do ( while echo "VACUUM $t;" ; do true ; done | psql ) & done Sam
On Mon, May 19, 2008 at 8:50 AM, Bohdan Linda <bohdan.linda@seznam.cz> wrote: > Hello, > > I would like to ask an opinion on vacuuming general. Imagine situation > that you have single table with 5 fields (one varchar). This table has > during the day > > - cca 620 000 inserts > - 0 updates > - cca 620 000 deletes > > The table is vacuumed daily, but somehow after several months I got to > size of ~50GB OK. Assuming that the 50G is mostly dead space, there are a few possibilities that could be biting you here, but the most likely one is that your Free Space Map settings aren't high enough to include all the rows that have been deleted since the last vacuum was run. If you can't take down the server to change those settings, then running vacuum more often will help. The autovacuum daemon is your friend. Even with the default non aggresive settings it comes with, it would have caught this long before now. Note that I no longer have 8.0 installed anywhere, only 8.1. But I know that 8.1 supported vacuum costing features to let you slow down vacuum so it doesn't stomp on your I/O to the detriment of all the other pgsql processes. Look for those settings. You can adjust those and just reload.
Hello I have some similar situation like Yours,we're using at the moment PG 8.2.0. At the moment we do manually vacuum (one or more times to minimize 'dead' data/tuples),and if necessary we do 'full' vacuum. On heavy-updated PG,one surely must think of this procedures because they are considered to be 'daily maintenance routine'. Still haven't migrated to PG 8.3,which is planned,but one way to look at this problem is to have 'timeout aware applications',meaning when You fire up vacuum or some other command that will lock some (or all data), You application does not stop operating but put itself into a little 'sleep' until data became available again. Therefore Your frontend (apps,clients,whatsoever) will observe only a small glitch and not a corruption in database connectivity,operations,etc. Also bear in mind that more TPS,more 'dead' data/tuples You will have,meaning the following: 1. Ordinary inserts insert into foo (column1,column2) values (val1,val2); insert into foo (column1,column2) values (val3,val4); insert into foo (column1,column2) values (val5,val6); 3 separated transaction,guess it means 3 'dead' tuples ? 2. Multi-insert command insert into foo (column1,column2) values (val1,val2),(val3,val4),(val5,val6); 1 transaction,guess it means 1 'dead' tuples ? I'm not sure about this,but guess somebody will correct me if I'm wrong :) The questions are: Is number of transactions related to the number of 'dead' rows in PG ? Meaning less transactions,means less frequently vacuum needed for same amount of data ? Sincerely Dragan Bohdan Linda wrote: > Hello, > > I would like to ask an opinion on vacuuming general. Imagine situation > that you have single table with 5 fields (one varchar). This table has > during the day > > - cca 620 000 inserts > - 0 updates > - cca 620 000 deletes >
Sorry for top-posting -- challenged reader, but less challenged than running 8.2.0 -- upgrade to latest release -- 8.2.7! There were many fixes after the initial release and you're risking some bad mojo.
Greg Williamson
Senior DBA
DigitalGlobe
Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.
(My corporate masters made me say this.)
-----Original Message-----
From: pgsql-general-owner@postgresql.org on behalf of Dragan Zubac
Sent: Mon 5/19/2008 4:44 PM
To: Bohdan Linda
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Vacuuming on heavily changed databases
> Hello
>
> I have some similar situation like Yours,we're using at the moment PG
> 8.2.0.
<...>
On Mon, May 19, 2008 at 5:44 PM, Dragan Zubac <zubac@vlayko.tv> wrote: > Hello > > I have some similar situation like Yours,we're using at the moment PG 8.2.0. As Gregory above mentioned, update NOW to 8.2.7. It only takes minutes to do. > At the moment we do manually vacuum (one or more times to minimize 'dead' > data/tuples),and if necessary we do 'full' vacuum. Think of vacuum as regular maintenance, and full vacuum as emergency fixing because regular vacuum wasn't run often enough. Also, look at running the autovacuum daemon. It does a great job for most people. > Still haven't migrated to PG 8.3,which is > planned,but one way to look at this problem is to have 'timeout aware > applications',meaning when You fire up vacuum or some other command that > will lock some (or all data), Regular vacuum does not lock the table (well for more than a split second anyway). However, vacuum without a cost delay may chew up so much of your I/O that performance suffers . Which is why you've got cost delay settings in postgresql.conf to adjust it. > Also bear in mind that more TPS,more 'dead' data/tuples You will > have,meaning the following: > > 1. Ordinary inserts > > insert into foo (column1,column2) values (val1,val2); > insert into foo (column1,column2) values (val3,val4); > insert into foo (column1,column2) values (val5,val6); > > 3 separated transaction,guess it means 3 'dead' tuples ? No, inserts (at least the ones that don't fail) don't create dead tuples. Updates and deletes do. > 2. Multi-insert command > > insert into foo (column1,column2) values > (val1,val2),(val3,val4),(val5,val6); > > 1 transaction,guess it means 1 'dead' tuples ? No, for the reasons above. However, if they were updates, like this: begin; update... update... update... commit; and each one updated one row, you'd have three dead tuples. > Is number of transactions related to the number of 'dead' rows in PG ? > Meaning less transactions,means less frequently vacuum needed for same > amount of data ? each tuple replaced by an update, or deleted by a delete, or created by a failed insert will be a single dead tuple. The number of transactions means nothing. But the point about vacuuming stands. More frequent updates / deletes / failed inserts require more frequent vacuums. The autovacuum daemon can handle this for ya for the most part, but some busy systems will outrun it sometimes. Also, make sure you have enough free space map entries to cover all your dead tuples or they can't be reclaimed.
On Mon, May 19, 2008 at 08:38:09PM +0200, Scott Marlowe wrote: > OK. Assuming that the 50G is mostly dead space, there are a few > possibilities that could be biting you here, but the most likely one > is that your Free Space Map settings aren't high enough to include all > the rows that have been deleted since the last vacuum was run. If you > can't take down the server to change those settings, then running > vacuum more often will help. > > The autovacuum daemon is your friend. Even with the default non > aggresive settings it comes with, it would have caught this long > before now. I can bring down the DB for short time, but I am stuct with 8.0. Found that autovacuum is part of contrib, thus will try Thank you all for the opinion Regards, Bohdan