Thread: Why is tuple_percent so low?
I am trying to refactor a table on disk so it consumes less space.

Original is:

    create table post_timings(
        topic_id int not null,
        post_number int not null,
        user_id int not null,
        msecs int not null
    )

Target is:

    create table post_timings(
        post_id int not null,
        user_id int not null,
        dsecs smallint not null
    )

Before I have:

    select * from pgstattuple('post_timings2');

     table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
    ------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
     5146427392 |   116221695 | 4648867800 |         90.33 |                0 |              0 |                  0 |   15082484 |         0.29

After I have:

     table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
    ------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
     5142036480 |   116122544 | 3948166496 |         76.78 |                0 |              0 |                  0 |   15069224 |         0.29

What I find striking is that the table size on disk remains almost unchanged despite each tuple taking 6 fewer bytes. All the "missing space" is in overhead that pgstattuple does not account for; in particular, tuple_percent drops from 90.33 to 76.78.

I was wondering:

1. Where is all my missing space? Is this page-alignment stuff and per-page overhead?

2. Are there any other schemes I can look at for storing this data to have a more space-efficient yet easily queryable / updatable table?

Keep in mind these tables get huge and in many of our cases will span 10-20GB just to store this information.

Sam
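The unchanged table size can be reproduced with a back-of-envelope calculation. This is a sketch, assuming a 23-byte heap tuple header padded to an 8-byte MAXALIGN boundary and a 4-byte line pointer per row, as on a typical 64-bit PostgreSQL build:

    # Sketch: estimate the per-row on-page cost of both schemas.
    # Assumptions (typical 64-bit build): 23-byte heap tuple header,
    # 8-byte MAXALIGN, 4-byte line pointer per row.

    MAXALIGN = 8
    HEADER = 23          # heap tuple header (no nulls, no OIDs)
    LINE_POINTER = 4     # ItemId in the page's line pointer array

    def align(n, a=MAXALIGN):
        # Round n up to the next multiple of a.
        return -(-n // a) * a

    def row_cost(data_bytes):
        tuple_len = align(HEADER) + data_bytes      # what pgstattuple reports
        on_page = align(tuple_len) + LINE_POINTER   # what the page actually spends
        return tuple_len, on_page

    print(row_cost(4 * 4))      # four int4 columns  -> (40, 44)
    print(row_cost(4 + 4 + 2))  # int4 + int4 + int2 -> (34, 44)

The reported tuple_len shrinks from 40 to 34 bytes (matching 4648867800 / 116221695 and 3948166496 / 116122544 exactly), but after MAXALIGN padding both layouts still consume 44 bytes of page space per row, which is why table_len barely moves.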
On 27 February 2018 at 18:03, Sam Saffron <sam.saffron@gmail.com> wrote:
> 1. Where is all my missing space, is this in page alignment stuff and
> per-page overhead?

Yes, tuples are MAXALIGNed when copied onto the page. That appears to be 8 bytes on your system. You're just 2 bytes over the 8-byte boundary, so each narrower tuple still occupies the same amount of page space. You'd notice the table becoming more compact if you could somehow do away with the dsecs column.

> 2. Is there any other schemes I can look at for storing this data to
> have a more efficient yet easily queryable / updateable table.

Hard to say without learning more about the use case. Some sort of de-normalisation may help to allow fewer tuples, e.g. storing one column in an array, but that may be more painful from a design point of view. It may also cause pain from a dead-tuple point of view if you have to UPDATE large tuples frequently.

> Keep in mind these tables get huge and in many of our cases will span
> 10-20GB just to store this information.

I guess it's all relative, but that does not seem that large. Is this causing some sort of performance problem you're trying to solve, or is this purely down to disk space consumption? If it's down to performance, then you might be able to solve that problem with an index.

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
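For illustration, the array-based de-normalisation suggested above could look something like the following. This is a hypothetical sketch, not something from the thread; the table name, column names, and the bucketing of readers by dsecs are all illustrative assumptions:

    -- Hypothetical sketch: one row per (post, reading-time bucket),
    -- with the readers packed into an int[] to cut per-row overhead.
    create table post_timing_groups (
        post_id  int not null,
        dsecs    smallint not null,
        user_ids int[] not null,
        primary key (post_id, dsecs)
    );

    -- Appending a reader. Note the trade-off mentioned above: frequent
    -- UPDATEs of large array tuples generate dead tuples and bloat.
    update post_timing_groups
       set user_ids = user_ids || 12345
     where post_id = 1 and dsecs = 30;

With many user ids per row, the 23-byte tuple header and alignment padding are amortised across the whole array instead of being paid once per reader, at the cost of more awkward per-user lookups and updates.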