Thread: Identifying bloated tables
I just put together a view which helps us identify which database tables are suffering from space bloat, i.e. they take up much more space than they actually should. I thought this might be useful for some folks here, because questions about bloat-related performance degradation are quite common.

When using this view, you are interested in tables that have a "bloat" column higher than, say, 2.0 (in a freshly dumped/restored/analyzed database they should all be around 1.0). The bloat problem can be fixed as a one-time measure by either VACUUM FULL or CLUSTER, but if the problem comes back after a while, you should consider running VACUUM more often or increasing your FSM settings in postgresql.conf.

I hope I did the view right; it is more or less accurate for our purposes (for tables of just a few pages the numbers may be off, but then again, you are usually not much concerned about these tiny 5-page tables performance-wise). Hope this helps someone. Here comes the view.

CREATE OR REPLACE VIEW "public"."relbloat" (
    nspname,
    relname,
    reltuples,
    relpages,
    avgwidth,
    expectedpages,
    bloat,
    wastedspace
) AS
SELECT
    pg_namespace.nspname,
    pg_class.relname,
    pg_class.reltuples,
    pg_class.relpages,
    rowwidths.avgwidth,
    -- pages the table would need if all live rows were packed tightly
    ceil((pg_class.reltuples * rowwidths.avgwidth::double precision)
         / current_setting('block_size'::text)::double precision) AS expectedpages,
    -- actual pages divided by expected pages; ~1.0 means no bloat
    pg_class.relpages::double precision
        / ceil((pg_class.reltuples * rowwidths.avgwidth::double precision)
               / current_setting('block_size'::text)::double precision) AS bloat,
    -- estimated wasted space, in kB
    ceil(((pg_class.relpages::double precision
           * current_setting('block_size'::text)::double precision)
          - ceil(pg_class.reltuples * rowwidths.avgwidth::double precision))
         / 1024::double precision) AS wastedspace
FROM (
    -- estimated row width per relation, summed from per-column statistics
    SELECT pg_statistic.starelid, sum(pg_statistic.stawidth) AS avgwidth
    FROM pg_statistic
    GROUP BY pg_statistic.starelid
) rowwidths
JOIN pg_class ON rowwidths.starelid = pg_class.oid
JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
WHERE pg_class.relpages > 1;

Bye.

-- 
Michal Táborský
IT operations chief
Internet Mall, a.s.
<http://www.MALL.cz>
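P.S. To pick out the worst offenders, a query along these lines should do (just an illustrative use of the view above; the 2.0 threshold is the rule of thumb mentioned earlier):

SELECT nspname, relname, bloat, wastedspace
FROM public.relbloat
WHERE bloat > 2.0
ORDER BY wastedspace DESC
LIMIT 10;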
On Mon, 2006-08-28 at 16:39 +0200, Michal Taborsky - Internet Mall wrote:
> I just put together a view which helps us identify which database
> tables are suffering from space bloat, i.e. they take up much more
> space than they actually should. I thought this might be useful for
> some folks here, because questions about bloat-related performance
> degradation are quite common.

Are you sure you haven't reinvented the wheel? Have you checked out contrib/pgstattuple?

Brad.
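P.S. For anyone who hasn't used it: once the contrib module is installed (by running the SQL script shipped with it; the exact path depends on your installation), pgstattuple is called per table and returns exact figures such as dead_tuple_percent and free_percent. A minimal example, where "mytable" is just a placeholder:

SELECT * FROM pgstattuple('public.mytable');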
Brad Nicholson wrote:
>> I just put together a view which helps us identify which database
>> tables are suffering from space bloat, i.e. they take up much
> Are you sure you haven't reinvented the wheel? Have you checked out
> contrib/pgstattuple?

Well, I wasn't aware of it, so I guess I did reinvent the wheel. I Googled for a solution to this problem, but Googled poorly, I suppose.

On the other hand, pgstattuple might be a bit difficult for not-so-experienced users to apply to the question "Which table should I shrink?", as you first have to install it from contrib and then come up with a select to pick the "worst" relations (something like the sketch below). Anyway, if someone finds this view useful, good. If not, ignore it.

Bye.

-- 
Michal Táborský
IT operations chief
Internet Mall, a.s.
<http://www.MALL.cz>
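P.S. Such a select might look roughly like this (an untested sketch: it assumes the contrib module is installed, and since pgstattuple reads every page of every table it is pointed at, it can take a long time on a big database):

SELECT n.nspname,
       c.relname,
       (pgstattuple(n.nspname || '.' || c.relname)).dead_tuple_percent AS dead_pct
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'  -- ordinary tables only
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY dead_pct DESC
LIMIT 10;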
Brad Nicholson wrote:
> On Mon, 2006-08-28 at 16:39 +0200, Michal Taborsky - Internet Mall
> wrote:
>> I just put together a view which helps us identify which database
>> tables are suffering from space bloat, i.e. they take up much more
>> space than they actually should. I thought this might be useful for
>> some folks here, because questions about bloat-related performance
>> degradation are quite common.
>
> Are you sure you haven't reinvented the wheel? Have you checked out
> contrib/pgstattuple?

Actually, pgstattuple needs to scan the whole table, so I think having a cheap workaround that gives approximate figures is a good idea anyway.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Hi, Michal,

Michal Taborsky - Internet Mall wrote:
> When using this view, you are interested in tables that have a "bloat"
> column higher than, say, 2.0 (in a freshly dumped/restored/analyzed
> database they should all be around 1.0).

I just noticed some tables in pg_catalog with a bloat value <1 and a negative "wasted space" - is this due to the pseudo nature of them?

Markus

-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Markus Schaber wrote:
> Hi, Michal,
>
> Michal Taborsky - Internet Mall wrote:
>
>> When using this view, you are interested in tables that have a "bloat"
>> column higher than, say, 2.0 (in a freshly dumped/restored/analyzed
>> database they should all be around 1.0).
>
> I just noticed some tables in pg_catalog with a bloat value <1 and a
> negative "wasted space" - is this due to the pseudo nature of them?

It is more likely due to the fact that these numbers are just estimates, based on collected table statistics, so for small or non-standard tables the statistical error is greater than the actual value. You are usually not interested in tables which have wasted space of 1000kB or -1000kB. Also, the database must be ANALYZEd properly for these numbers to carry any significance.

-- 
Michal Táborský
IT operations chief
Internet Mall, a.s.

Internet Mall - shops you will love
<http://www.MALL.cz>
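P.S. For example, to make sure the numbers for one table are based on fresh statistics before you trust them (a minimal illustration; "mytable" is a placeholder):

ANALYZE public.mytable;

SELECT relname, relpages, expectedpages, bloat, wastedspace
FROM public.relbloat
WHERE relname = 'mytable';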
On 28/08/06, Michal Taborsky - Internet Mall <michal.taborsky@mall.cz> wrote:
> Markus Schaber wrote:
>> Hi, Michal,
>>
>> Michal Taborsky - Internet Mall wrote:
>>
>>> When using this view, you are interested in tables that have a "bloat"
>>> column higher than, say, 2.0 (in a freshly dumped/restored/analyzed
>>> database they should all be around 1.0).
>>
>> I just noticed some tables in pg_catalog with a bloat value <1 and a
>> negative "wasted space" - is this due to the pseudo nature of them?
>
> It is more likely due to the fact that these numbers are just
> estimates, based on collected table statistics, so for small or
> non-standard tables the statistical error is greater than the actual
> value. You are usually not interested in tables which have wasted
> space of 1000kB or -1000kB. Also, the database must be ANALYZEd
> properly for these numbers to carry any significance.

I was just playing around with this view and noticed it performs badly on tables with very small record sizes. This seems to be because it ignores the per-row system overhead (oid, xmin, ctid, etc.), which seems to be about 28 bytes per record. This can be quite significant for tables with small records and can cause trouble even with a small number of records. Hence I've got a table that's static and freshly "vacuum full"ed which reads with a bloat of 4.

The problem is easy to recreate:

create table regionpostcode (area varchar(4), regionid int);

then insert 120000 records.

Peter.
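P.S. One way to reproduce this end to end (a sketch; it assumes PostgreSQL 8.0+ for generate_series, md5() is used only to produce some short strings, and the exact bloat figure will vary with your block size and statistics):

create table regionpostcode (area varchar(4), regionid int);

-- fill it with 120000 small records
insert into regionpostcode (area, regionid)
select substr(md5(i::text), 1, 4), i
from generate_series(1, 120000) AS i;

-- compact the table and refresh statistics, so any remaining "bloat"
-- is purely an artifact of the view's formula
vacuum full analyze regionpostcode;

select relname, relpages, expectedpages, bloat
from public.relbloat
where relname = 'regionpostcode';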