Thread: Identifying diskspace leakage

Identifying diskspace leakage

From

"Ed L."

Date:

14 May 2004, 14:13:40

I am trying to identify tables with significant diskspace "leakage" due to
in appropriately low max_fsm_pages settings.  I can see the results of
VACUUM ANALYZE VERBOSE output counts of tuples and unused tuples, and
understand that (1 - (tuples/unused)) is the amount of diskspace available
to be reclaimed with a VACUUM FULL or dump/reload.

Is there a way to identify the numbers of unused tuples without performing a
VACUUM?  Is it stored in a system table anywhere?  Other ideas on how to
identify disk bloat short of forcing downtime?

TIA.

Re: Identifying diskspace leakage

From

"Jeffrey W. Baker"

Date:

14 May 2004, 14:47:47

On Fri, 2004-05-14 at 10:10, Ed L. wrote:
> I am trying to identify tables with significant diskspace "leakage" due to
> in appropriately low max_fsm_pages settings.  I can see the results of
> VACUUM ANALYZE VERBOSE output counts of tuples and unused tuples, and
> understand that (1 - (tuples/unused)) is the amount of diskspace available
> to be reclaimed with a VACUUM FULL or dump/reload.
>
> Is there a way to identify the numbers of unused tuples without performing a
> VACUUM?  Is it stored in a system table anywhere?  Other ideas on how to
> identify disk bloat short of forcing downtime?

You can calculate the number of bytes per row, multiply by the number of
live tuples (count(1) from table), and subtract that from the actual #
of bytes in the on-disk representation.  The difference is wasted space.

-jwb

Re: Identifying diskspace leakage

From

"Ed L."

Date:

14 May 2004, 19:54:37

On Friday May 14 2004 11:47, Jeffrey W. Baker wrote:
> > Is there a way to identify the numbers of unused tuples without
> > performing a VACUUM?  Is it stored in a system table anywhere?  Other
> > ideas on how to identify disk bloat short of forcing downtime?
>
> You can calculate the number of bytes per row, multiply by the number of
> live tuples (count(1) from table), and subtract that from the actual #
> of bytes in the on-disk representation.  The difference is wasted space.

That works, but with umpteen clusters to manage, I'm really hoping for a
SQL-based check so it can be done remotely and non-interactively.  Maybe it
is too much to keep track of, but it would be cool if VACUUM updated a
system table with the same info it spits out during verbose mode.  That
would be very helpful in auto-identifying leakage and also a recent case
where the cpu:real time ratio during vacuum went thru the roof due to I/O
overload from leakage.

Re: Identifying diskspace leakage

From

"Ed L."

Date:

16 May 2004, 14:42:09

Here's an attempt at a query to estimate diskspace leakage.  This
leakage might occur when max_fsm_pages and/or max_fsm_relations are
set too low.  Not sure which of the two approaches below (leak1 or
leak2) is more accurate?  Is there a better way via SQL?

The query uses the 'dbsize' project from contrib.  Dbsize has a
function called relation_size() which performs a 'stat' to get
actual disk usage for a database and/or table.  I use the column
pg_class.reltuples instead of actually counting rows because I
suspect that would essentially flush our OS cache of useful pages,
degrading performance.  This query assumes you're keeping stats
updated.

SELECT c.relname,
       SUM(s.avg_width) as width,
       CAST(c.reltuples as BIGINT) AS tuples,
       CAST(SUM(s.avg_width) * c.reltuples/1048576 AS INTEGER) AS tupdu,
       c.relpages AS pages,
       CAST(c.relpages * 8192 / 1048576 AS INTEGER) AS pgdu,
       relation_size(s.tablename)/1048576 AS reldu,
       CAST((relation_size(s.tablename)
           - SUM(s.avg_width) * c.reltuples)/1048576 AS INTEGER) AS leak1,
       CAST((relation_size(s.tablename)
           - c.relpages * 8192) / 1048576 AS INTEGER) AS leak2
FROM pg_stats s, pg_class c
WHERE c.relname NOT LIKE 'pg_%'
  AND c.relname = s.tablename
GROUP BY c.oid, s.tablename, c.reltuples, c.relpages, pgdu
ORDER BY tupdu;

    relname    | width | tuples  | tupdu | pages  | pgdu | reldu | leak1 | leak2
---------------+-------+---------+-------+--------+------+-------+-------+-------
 table_1766485 |    27 |     198 |     0 |     12 |    0 |     0 |     0 |     0
 table_1766443 |   186 |       0 |     0 |   9317 |   72 |    72 |    73 |     0
 table_1766439 |    83 |       0 |     0 |     10 |    0 |     0 |     0 |     0
 table_1766435 |    27 |       0 |     0 |      0 |    0 |     0 |     0 |     0
 table_1766437 |    30 |       0 |     0 |      0 |    0 |     0 |     0 |     0
 table_1766421 |    23 |       2 |     0 |      1 |    0 |     0 |     0 |     0
 table_1766451 |    30 |  189822 |     5 |   1754 |   13 |    13 |     8 |     0
 table_1766396 |    48 |  278781 |    13 |   3185 |   24 |    24 |    12 |     0
 table_1766391 |    74 |  200826 |    14 |   3271 |   25 |    25 |    11 |     0
 table_1766446 |    36 |  504594 |    17 |   4881 |   38 |    38 |    21 |     0
 table_1766426 |   149 | 2241719 |   319 |  55555 |  434 |   434 |   116 |     0
 table_1766456 |   888 |  390657 |   331 | 637949 |  887 |  4983 |  4653 |  4096
 table_1766399 |   596 |  732708 |   416 |  41876 |  327 |   327 |   -89 |     0
(13 rows)

The basic column definitions are:

    tupdu(MB) = avg_width * reltuples
    pgdu(MB) = relpages * 8K/page
    reldu(MB) = relation_size(tablename) (src/contrib/dbsize)
    leak1 = reldu - tupdu
    leak2 = reldu - pgdu

Not sure how we ended up with a couple of cases where the number of
mb on disk was less than the estimated size; maybe we had some
deletions after the last update of pg_stats?

Re: Identifying diskspace leakage

From

"Ed L."

Date:

17 May 2004, 15:17:32

[reposting...original seems to have been lost in ether...]

Here's an attempt at a query to estimate diskspace leakage.  This
leakage might occur when max_fsm_pages and/or max_fsm_relations are
set too low.  Not sure which of the two approaches below (leak1 or
leak2) is more accurate?  Is there a better way via SQL?

The query uses the 'dbsize' project from contrib.  Dbsize has a
function called relation_size() which performs a 'stat' to get
actual disk usage for a database and/or table.  I use the column
pg_class.reltuples instead of actually counting rows because I
suspect that would essentially flush our OS cache of useful pages,
degrading performance.  This query assumes you're keeping stats
updated.

SELECT c.relname,
       SUM(s.avg_width) as width,
       CAST(c.reltuples as BIGINT) AS tuples,
       CAST(SUM(s.avg_width) * c.reltuples/1048576 AS INTEGER) AS tupdu,
       c.relpages AS pages,
       CAST(c.relpages * 8192 / 1048576 AS INTEGER) AS pgdu,
       relation_size(s.tablename)/1048576 AS reldu,
       CAST((relation_size(s.tablename)
           - SUM(s.avg_width) * c.reltuples)/1048576 AS INTEGER) AS leak1,
       CAST((relation_size(s.tablename)
           - c.relpages * 8192) / 1048576 AS INTEGER) AS leak2
FROM pg_stats s, pg_class c
WHERE c.relname NOT LIKE 'pg_%'
  AND c.relname = s.tablename
GROUP BY c.oid, s.tablename, c.reltuples, c.relpages, pgdu
ORDER BY tupdu;

    relname    | width | tuples  | tupdu | pages  | pgdu | reldu | leak1 |
leak2
---------------+-------+---------+-------+--------+------+-------+-------+-------
 table_1766485 |    27 |     198 |     0 |     12 |    0 |     0 |     0 |
0
 table_1766443 |   186 |       0 |     0 |   9317 |   72 |    72 |    73 |
0
 table_1766439 |    83 |       0 |     0 |     10 |    0 |     0 |     0 |
0
 table_1766435 |    27 |       0 |     0 |      0 |    0 |     0 |     0 |
0
 table_1766437 |    30 |       0 |     0 |      0 |    0 |     0 |     0 |
0
 table_1766421 |    23 |       2 |     0 |      1 |    0 |     0 |     0 |
0
 table_1766451 |    30 |  189822 |     5 |   1754 |   13 |    13 |     8 |
0
 table_1766396 |    48 |  278781 |    13 |   3185 |   24 |    24 |    12 |
0
 table_1766391 |    74 |  200826 |    14 |   3271 |   25 |    25 |    11 |
0
 table_1766446 |    36 |  504594 |    17 |   4881 |   38 |    38 |    21 |
0
 table_1766426 |   149 | 2241719 |   319 |  55555 |  434 |   434 |   116 |
0
 table_1766456 |   888 |  390657 |   331 | 637949 |  887 |  4983 |  4653 |
4096
 table_1766399 |   596 |  732708 |   416 |  41876 |  327 |   327 |   -89 |
0
(13 rows)

The basic column definitions are:

    tupdu(MB) = avg_width * reltuples
    pgdu(MB) = relpages * 8K/page
    reldu(MB) = relation_size(tablename) (src/contrib/dbsize)
    leak1 = reldu - tupdu
    leak2 = reldu - pgdu

Not sure how we ended up with a couple of cases where the number of
mb on disk was less than the estimated size; maybe we had some
deletions after the last update of pg_stats?