Thread: Bloated Table
Hello everybody. I'm using a view (http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html) to display the bloat (unused disk space) ofthe tables in a PostgreSQL database. Using this data I want to implement a database maintenance script automatically exectutinga VACUUM FULL on these tables. Unfortunately I am finding a table to have bloat which can't be reclaimed. I have tried VACUUM, REINDEX, VACUUM FULL ANALYZEwith REINDEX, and even dump and restore. The view always shows 375MB of bloat for the table. Is this normal? Here's the table structure: Table "public.foobar_log" Column | Type | Modifiers ------------+--------------------------+-------------------------------- ------------+--------------------------+--------------------------- foorbarid | integer | not null default nextval('foobar_log_id_seq'::regclass) created_at | timestamp with time zone | not null foo | character varying(50) | not null bar | character varying(16) | not null chit | integer | not null chat | boolean | not null default false Indexes: "bar_index" btree (bar) "foobarid_foobar_log_key" btree (foobarid) "chit_foobar_log_key" btree (chit) The table consists of approximately 2.4 million entries. Any help is appreciated. Kind regards, Alex
On Wed, May 27, 2009 at 3:54 PM, Alexander Schöcke <asc@turtle-entertainment.de> wrote: > Hello everybody. > > I'm using a view (http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html) to display the bloat (unused disk space)of the tables in a PostgreSQL database. Using this data I want to implement a database maintenance script automaticallyexectuting a VACUUM FULL on these tables. > Unfortunately I am finding a table to have bloat which can't be reclaimed. I have tried VACUUM, REINDEX, VACUUM FULL ANALYZEwith REINDEX, and even dump and restore. The view always shows 375MB of bloat for the table. > > Is this normal? yes it is normal, if you do updates on it, and it doesn't use HOT (for instance if you update indexed column), now, in real life you shouldn't really ever need to vacuum full. Reindex - yes, but not vacuum full. Make sure, that you adjust fsm_* settings in your config, best thing is tu run vacuum analyze verbose, and see if it warns you about it being too small. -- GJ
In response to Alexander Schöcke <asc@turtle-entertainment.de>: > Hello everybody. > > I'm using a view (http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html) to display the bloat (unused disk space)of the tables in a PostgreSQL database. Using this data I want to implement a database maintenance script automaticallyexectuting a VACUUM FULL on these tables. > Unfortunately I am finding a table to have bloat which can't be reclaimed. I have tried VACUUM, REINDEX, VACUUM FULL ANALYZEwith REINDEX, and even dump and restore. The view always shows 375MB of bloat for the table. > > Is this normal? Here's the table structure: > > Table "public.foobar_log" > Column | Type | Modifiers > ------------+--------------------------+-------------------------------- > ------------+--------------------------+--------------------------- > foorbarid | integer | not null default nextval('foobar_log_id_seq'::regclass) > created_at | timestamp with time zone | not null > foo | character varying(50) | not null > bar | character varying(16) | not null > chit | integer | not null > chat | boolean | not null default false > Indexes: > "bar_index" btree (bar) > "foobarid_foobar_log_key" btree (foobarid) > "chit_foobar_log_key" btree (chit) > > > The table consists of approximately 2.4 million entries. > > Any help is appreciated. What is the output of VACUUM VERBOSE foobar_log? -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
=?iso-8859-1?Q?Alexander_Sch=F6cke?= <asc@turtle-entertainment.de> writes: > I'm using a view > (http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html) to > display the bloat (unused disk space) of the tables in a PostgreSQL > database. I wouldn't trust the calculations that view does in the least. You might look at contrib/pgstattuple if you want numbers that have some relationship to reality (and are correspondingly more expensive to get :-() regards, tom lane
>What is the output of VACUUM VERBOSE foobar_log? INFO: vacuuming "public.foobar_log" INFO: index "bar_index" now contains 23832007 row versions in 118151 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.02s/0.00u sec elapsed 64.10 sec. INFO: index "foobarid_foobar_log_key" now contains 23832007 row versions in 65347 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.02s/0.00u sec elapsed 32.60 sec. INFO: index "foo_foobar_log_key" now contains 23832007 row versions in 65347 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.03s/0.00u sec elapsed 25.14 sec. INFO: "foobar_log": found 0 removable, 23832007 nonremovable row versions in 269835 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 1 pages contain useful free space. 0 pages are entirely empty. CPU 0.16s/0.09u sec elapsed 299.55 sec. VACUUM
On Wed, 2009-05-27 at 11:15 -0400, Tom Lane wrote: > =?iso-8859-1?Q?Alexander_Sch=F6cke?= <asc@turtle-entertainment.de> writes: > > I'm using a view > > (http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html) to > > display the bloat (unused disk space) of the tables in a PostgreSQL > > database. > > I wouldn't trust the calculations that view does in the least. > You might look at contrib/pgstattuple if you want numbers that > have some relationship to reality (and are correspondingly more > expensive to get :-() > > regards, tom lane Is the referenced query reliable for even estimating, or is it flat our wrong? Co-workers that were PGCon are saying that this is becoming a popular/accepted way to check for bloated tables. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Brad Nicholson wrote: > On Wed, 2009-05-27 at 11:15 -0400, Tom Lane wrote: > > =?iso-8859-1?Q?Alexander_Sch=F6cke?= <asc@turtle-entertainment.de> writes: > > > I'm using a view > > > (http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html) to > > > display the bloat (unused disk space) of the tables in a PostgreSQL > > > database. > > > > I wouldn't trust the calculations that view does in the least. > > You might look at contrib/pgstattuple if you want numbers that > > have some relationship to reality (and are correspondingly more > > expensive to get :-() > > Is the referenced query reliable for even estimating, or is it flat our > wrong? > > Co-workers that were PGCon are saying that this is becoming a > popular/accepted way to check for bloated tables. If "ma" is supposed to be "maxalign", then this code is broken because it only reports mingw32 as 8, all others as 4, which is wrong. However I think the big problem is that it relies on pg_class.relpages and reltuples which are only accurate just after VACUUM, only a sample-based estimate just after ANALYZE, and wrong at any other time (assuming the table has any movement). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Brad Nicholson wrote: >> On Wed, 2009-05-27 at 11:15 -0400, Tom Lane wrote: >>> I wouldn't trust the calculations that view does in the least. > If "ma" is supposed to be "maxalign", then this code is broken because > it only reports mingw32 as 8, all others as 4, which is wrong. I didn't bother to go through every detail of the calculations, but there are multiple small errors there. (Handling alignment for the tuple header and not anyplace else is pretty pointless, for instance, even if you had the correct alignment number for the machine.) However ... > However I think the big problem is that it relies on pg_class.relpages > and reltuples which are only accurate just after VACUUM, only a > sample-based estimate just after ANALYZE, and wrong at any other time > (assuming the table has any movement). That's big problem number one, and big problem number two is that it has no good idea of the width of variable-width fields. (Should I even mention TOAST?) It's an interesting exercise in trying to estimate bloat without groveling through the whole relation, but I seriously doubt you could ever get numbers this way that are trustworthy enough to drive maintenance decisions. regards, tom lane
Tom Lane wrote: > It's an interesting exercise in trying to estimate bloat without > groveling through the whole relation, but I seriously doubt you could > ever get numbers this way that are trustworthy enough to drive > maintenance decisions. Well, pg_stattuple is way too expensive to be running every minute ... -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> It's an interesting exercise in trying to estimate bloat without >> groveling through the whole relation, but I seriously doubt you could >> ever get numbers this way that are trustworthy enough to drive >> maintenance decisions. > Well, pg_stattuple is way too expensive to be running every minute ... Agreed. I was idly wondering whether it'd be useful to have an option in pg_stattuple to sample some small fraction of the pages in a relation, rather than exhaustively examining each one. It'd be subject to all the normal ills of sampling, of course, but I'd still trust it way more than this view. (Especially noting that the view is already dependent on sampling anyway, since it's looking at pg_stats...) regards, tom lane
On Wed, 27 May 2009, Tom Lane wrote: > It's an interesting exercise in trying to estimate bloat without > groveling through the whole relation Interesting and an extremely common request. I just added an item to the Vacuum section of the TODO list while you were listing issues and potential solutions here: "Provide more information in order to improve user-side estimates of dead space bloat in relations" links to this thread. As pointed out upthread, there's a fair number of people who have picked up on this particular query for bloat estimation. It seemed accurate enough for gross maintenance use when I did a quick check of its results before, but those were tables without variable widths, TOAST, etc. This is one of those boring tasks that DBAs really want more monitoring-friendly visibility into. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On May 27, 2009, at 3:25 PM, Greg Smith wrote: > Interesting and an extremely common request. I just added an item > to the Vacuum section of the TODO list while you were listing issues > and potential solutions here: "Provide more information in order to > improve user-side estimates of dead space bloat in relations" links > to this thread. Given this, is there is a recommended way of detecting bloat in indices automatically, rather than just looking at them once in a while and saying, "Hm, that looks pretty big"?
Christophe <xof@thebuild.com> writes: > Given this, is there is a recommended way of detecting bloat in > indices automatically, rather than just looking at them once in a > while and saying, "Hm, that looks pretty big"? contrib/pgstattuple works; it's just too expensive to run every few minutes ... regards, tom lane