Thread: Table using more disk space than expected
I have a large table in Postgres 9.4.4 called 'observation', which is by far the largest item in the database. When I pg_dump this table, the resulting file is about 9GB. However, I've noticed that the pg data directory uses about 30GB (excluding pg_xlog).
Looking at the space usage:
-- Size of 'warehouse' database:
select pg_size_pretty(pg_database_size('warehouse'));
-- 29GB
-- Total space used by observation table including indexes:
select pg_size_pretty(pg_total_relation_size('observation'));
-- 29GB
-- Excluding indexes:
select pg_size_pretty(pg_relation_size('observation'));
-- 20GB
-- Percentage of dead tuples:
select pg_stat_get_dead_tuples('observation'::regclass)::numeric * 100 / pg_stat_get_live_tuples('observation'::regclass);
-- 13%
-- Average length of a row in bytes:
select avg(octet_length(t.*::text)) FROM observation t;
-- 287 bytes
-- Number of rows * average size of row:
select pg_size_pretty(count(obs_id) * 287) from observation;
-- 9.4 GB
If the live tuples take 9.4GB, then dead ones take 1.2GB = 10.6GB total.
What accounts for the remaining 9.4GB? (20GB - 10.6GB)
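As a rough cross-check including per-row overhead (each heap tuple has a header of about 23 bytes plus a 4-byte line pointer, before any alignment padding or page-level overhead):

-- Number of rows * (average row size + tuple header + line pointer):
select pg_size_pretty(count(obs_id) * (287 + 23 + 4)) from observation;
-- roughly 10.3 GB - still well short of 20GB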
Steve Pritchard
British Trust for Ornithology
Steve Pritchard <steve.pritchard@bto.org> writes:
> -- Average length of a row in bytes:
> select avg(octet_length(t.*::text)) FROM observation t;
> -- 287 bytes

That measurement technique doesn't have a lot to do with reality,
I'm afraid. The contrib/pgstattuple module could give you a more
reliable idea of how much space is in use or not in the table.

regards, tom lane
Could you please check this after running VACUUM ANALYZE? I know there may not be a big difference.

As per my analysis, this is free space available within the table, but it is not free from the server's point of view. A table contains three types of space:

1. Live row space
2. Dead row space
3. Free space available for reuse by that table (before a vacuum it is dead row space; after a vacuum it is free for new inserts or updates)

VACUUM FULL will give this space back to the server (it takes an exclusive lock on the table), as sketched below.
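A rough sketch of the difference, using the table from this thread:

-- Plain vacuum: dead row space becomes free space that this table can
-- reuse, but the file on disk does not shrink:
VACUUM ANALYZE observation;

-- Full vacuum: rewrites the table and returns the free space to the
-- server, locking the table exclusively while it runs:
VACUUM FULL observation;

-- Check the effect:
select pg_size_pretty(pg_relation_size('observation'));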
On 23 Sep 2015 22:07, "Steve Pritchard" <steve.pritchard@bto.org> wrote:
> I have a large table in Postgres 9.4.4 called 'observation', which is by
> far the largest item in the database. [...]
> If the live tuples take 9.4GB, then dead ones take 1.2GB = 10.6GB total.
> What accounts for the remaining 9.4GB? (20GB - 10.6GB)
Tom,
Thanks for the tip about pgstattuple - I hadn't discovered that (and I hadn't realised that it's bundled in the 9.4.4 distribution).
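Installing it was just a single command (run as a superuser):

CREATE EXTENSION pgstattuple;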
This is what I get:
SELECT * FROM pgstattuple('observation');
table_len 21,954,740,224
tuple_count 34,373,274
tuple_len 9,307,650,026
tuple_percent 42
dead_tuple_count 198,281
dead_tuple_len 52,524,135
dead_tuple_percent 0.2
free_space 12,093,978,284
free_percent 55
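These figures very nearly add up: tuple_len + dead_tuple_len + free_space = 9,307,650,026 + 52,524,135 + 12,093,978,284 ≈ 21.45GB, so the remaining ~0.5GB of table_len is presumably per-page overhead (page headers, line pointers and alignment padding).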
So I think my estimate of the space used by live rows wasn't far out (although I accept that the method I used may not be reliable).
I thought that autovacuum should recover the free space; however, I see now from the documentation that it doesn't (and that this is deliberate):
"The standard form of VACUUM removes dead row versions in tables and indexes and marks the space available for future reuse. However, it will not return the space to the operating system, except .... In contrast, VACUUM FULL actively compacts tables by writing a complete new version of the table file with no dead space. This minimizes the size of the table, but can take a long time...
The usual goal of routine vacuuming is to do standard VACUUMs often enough to avoid needing VACUUM FULL. The autovacuum daemon attempts to work this way, and in fact will never issue VACUUM FULL."
I'll do a VACUUM FULL, which I expect to reduce table_len.
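For reference, the commands involved (VACUUM FULL takes an ACCESS EXCLUSIVE lock, so the table is unavailable while it runs):

VACUUM FULL observation;

-- Afterwards table_len should be close to tuple_len, with free_percent
-- near zero:
SELECT * FROM pgstattuple('observation');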
Steve
On 23 September 2015 at 17:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Steve Pritchard <steve.pritchard@bto.org> writes:
>> -- Average length of a row in bytes:
>> select avg(octet_length(t.*::text)) FROM observation t;
>> -- 287 bytes
>
> That measurement technique doesn't have a lot to do with reality,
> I'm afraid.
> The contrib/pgstattuple module could give you a more reliable idea of
> how much space is in use or not in the table.
>
> regards, tom lane
Hi Steve:

On Wed, Sep 23, 2015 at 7:25 PM, Steve Pritchard <steve.pritchard@bto.org> wrote:
> I thought that autovacuum should recover the free space; however, I see now
> from the documentation that it doesn't (and that this is deliberate): ....
> I'll do a VACUUM FULL, which I expect to reduce table_len.

VACUUM FULL is for when you've made a drastic, permanent reduction in a table. Some tables oscillate in size: they grow, shrink, and regrow. Those do not benefit from VACUUM FULL in the long run, because if a table oscillates between, say, 10G and 20G, you still need 20G of disk space for it. If you shrink it and fill the reclaimed space with other uses, the server will crash on the next growth. (Some very special cases may be different, but in general, if you have free space it is because you create and delete rows, be it directly or via MVCC updates, so having it there for the next usage is not so bad.)

Francisco Olarte.