Thread: Table using more disk space than expected

Table using more disk space than expected

From
Steve Pritchard
Date:
I have a large table in Postgres 9.4.4 called 'observation', which is by far the largest item in the database. When I pg_dump this table the resulting file is about 9GB. However I've noticed that the pg data directory uses about 30GB (excluding pg_xlog).

Looking at the space usage:

-- Size of 'warehouse' database:
select pg_size_pretty(pg_database_size('warehouse'));
-- 29GB

-- Total space used by observation table including indexes:
select pg_size_pretty(pg_total_relation_size('observation'));
-- 29GB

-- Excluding indexes:
select pg_size_pretty(pg_relation_size('observation'));
-- 20GB

-- Percentage of dead tuples:
select pg_stat_get_dead_tuples('observation'::regclass)::numeric * 100 / pg_stat_get_live_tuples('observation'::regclass);
-- 13%

-- Average length of a row in bytes:
select avg(octet_length(t.*::text)) FROM observation t;
-- 287 bytes

-- Number of rows * average size of row:
select pg_size_pretty(count(obs_id) * 287) from observation;
-- 9.4 GB

If the live tuples take 9.4 GB, then the dead ones (13% of that) take about 1.2 GB, for 10.6 GB in total.

What accounts for the remaining 9.4 GB (20 GB - 10.6 GB)?

Steve Pritchard
British Trust for Ornithology

Re: Table using more disk space than expected

From
Tom Lane
Date:
Steve Pritchard <steve.pritchard@bto.org> writes:
> -- Average length of a row in bytes:
> select avg(octet_length(t.*::text)) FROM observation t;
> -- 287 bytes

That measurement technique doesn't have a lot to do with reality,
I'm afraid.

The contrib/pgstattuple module could give you a more reliable idea of
how much space is in use or not in the table.
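On 9.4 the module ships with the contrib packages and just needs to be enabled per database. A minimal sketch, using the table name from the original post (note that pgstattuple reads every page of the table, so it can take a while on a 20 GB relation):

```sql
-- pgstattuple is a contrib extension; install it once per database
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Scans the whole table and reports exact live, dead, and free space
SELECT * FROM pgstattuple('observation');
```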

            regards, tom lane


Re: Table using more disk space than expected

From
Jimit Amin
Date:

Could you please check this after running VACUUM ANALYZE? There may not be a big difference.

As per my analysis, this is free space available within the table, but it is not free from the server's point of view. A table contains three kinds of space:

1. Live row space
2. Dead row space
3. Free space available to that table (before a vacuum it is dead rows; after a vacuum it is free for reuse by new inserts or updates)

VACUUM FULL will give this space back to the server (it takes an exclusive lock on the table).
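A minimal sketch of that, using the table name from the original post. Be aware that VACUUM FULL takes an ACCESS EXCLUSIVE lock and needs enough free disk to hold a complete copy of the table while it rewrites it:

```sql
SELECT pg_size_pretty(pg_relation_size('observation'));  -- size before

VACUUM FULL observation;  -- rewrites the table compactly; blocks all access until done

SELECT pg_size_pretty(pg_relation_size('observation'));  -- size after
```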

On 23 Sep 2015 22:07, "Steve Pritchard" <steve.pritchard@bto.org> wrote:
> [quoted original message trimmed]

Re: Table using more disk space than expected

From
Steve Pritchard
Date:
Tom,

Thanks for the tip about pgstattuple - I hadn't discovered that (and I hadn't realised that it's bundled in the 9.4.4 distribution).

This is what I get:

SELECT * FROM pgstattuple('observation');

table_len           21,954,740,224
tuple_count             34,373,274
tuple_len            9,307,650,026
tuple_percent                   42
dead_tuple_count           198,281
dead_tuple_len          52,524,135
dead_tuple_percent             0.2
free_space          12,093,978,284
free_percent                    55

So I think my estimate of the space used by live rows wasn't far out (albeit I accept the method I used may not be reliable).
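The figures do roughly reconcile: subtracting live, dead, and free space from table_len leaves only page and tuple header overhead. A back-of-envelope check using the numbers above (the exact pretty-printed result may differ slightly):

```sql
-- table_len - tuple_len - dead_tuple_len - free_space ≈ header/alignment overhead
SELECT pg_size_pretty(
  21954740224::bigint - 9307650026 - 52524135 - 12093978284
);
-- roughly 477 MB, i.e. about 2% of the table
```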

I thought that autovacuum should recover the free space, however I see now from the documentation that it doesn't (and that this is deliberate):

"The standard form of VACUUM removes dead row versions in tables and indexes and marks the space available for future reuse. However, it will not return the space to the operating system, except .... In contrast, VACUUM FULL actively compacts tables by writing a complete new version of the table file with no dead space. This minimizes the size of the table, but can take a long time...
The usual goal of routine vacuuming is to do standard VACUUMs often enough to avoid needing VACUUM FULL. The autovacuum daemon attempts to work this way, and in fact will never issue VACUUM FULL."

I'll do a VACUUM FULL, which I expect to reduce table_len.

Steve

On 23 September 2015 at 17:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [quoted reply trimmed]

Re: Table using more disk space than expected

From
Francisco Olarte
Date:
Hi Steve:

On Wed, Sep 23, 2015 at 7:25 PM, Steve Pritchard
<steve.pritchard@bto.org> wrote:
> I thought that autovacuum should recover the free space, however I see now
> from the documentation that it doesn't (and that this is deliberate):
....
> I'll do a VACUUM FULL, which I expect to reduce table_len.

VACUUM FULL is for when you've made a drastic reduction in a table's size. Some tables oscillate in size: they grow, shrink, and regrow. Those do not benefit from VACUUM FULL in the long run, because if a table oscillates between, say, 10 GB and 20 GB, you need 20 GB of disk space anyway; if you shrink it and the reclaimed space gets used for something else, the server will run out of disk on the next growth. (Some very special cases may differ, but in general, if you have free space it is because you create and delete rows, directly or via MVCC updates, so keeping that space around for the next use is not so bad.)
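For an oscillating table, plain VACUUM (which the autovacuum daemon already runs) is usually sufficient. A sketch of checking that freed space is being reused rather than growing without bound, assuming the pgstattuple extension discussed earlier in the thread is installed:

```sql
VACUUM (VERBOSE, ANALYZE) observation;  -- marks dead space reusable; no exclusive lock

-- If free_percent stays roughly stable across vacuum cycles, the table has
-- reached a steady state and VACUUM FULL is unnecessary
SELECT free_percent FROM pgstattuple('observation');
```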

Francisco Olarte.