Thread: Why is tuple_percent so low?

Why is tuple_percent so low?

From
Sam Saffron
Date:
I am trying to refactor a table on disk so it consumes less space:

Original is:

create table post_timings(
   topic_id int not null,
   post_number int not null,
   user_id int not null,
   msecs int not null
)


Target is:

create table post_timings(
  post_id int not null,
  user_id int not null,
  dsecs smallint not null
)


Before I have:

select * from pgstattuple('post_timings2');

 table_len  | tuple_count | tuple_len  | tuple_percent |
dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space |
free_percent

------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 5146427392 |   116221695 | 4648867800 |         90.33 |
 0 |              0 |                  0 |   15082484 |         0.29


After I have:

 table_len  | tuple_count | tuple_len  | tuple_percent |
dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space |
free_percent

------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 5142036480 |   116122544 | 3948166496 |         76.78 |
 0 |              0 |                  0 |   15069224 |         0.29


What I find striking is that the table size on disk remains almost
unchanged despite tuples taking 6 less bytes per tuple.

All the "missing space" is in overhead that is missing from
pgstattuple, in particular tuple percent moves from 90 to 76.7

I was wondering:

1. Where is all my missing space, is this in page alignment stuff and
per-page overhead?

2. Is there any other schemes I can look at for storing this data to
have a more efficient yet easily queryable / updateable table.

Keep in mind these tables get huge and in many of our cases will span
10-20GB just to store this information.

Sam


Re: Why is tuple_percent so low?

From
David Rowley
Date:
On 27 February 2018 at 18:03, Sam Saffron <sam.saffron@gmail.com> wrote:
> 1. Where is all my missing space, is this in page alignment stuff and
> per-page overhead?

Yes, tuples are MAXALIGNed when copied onto the page. That appears to
be 8 bytes on your system. You're just 2 bytes over the 8-byte
boundary. You'll notice the table becoming more compact if you somehow
could do away with the dsecs column.

> 2. Is there any other schemes I can look at for storing this data to
> have a more efficient yet easily queryable / updateable table.

Hard to say without learning about the use case.  Some sort of
de-normalisation may help to allow fewer tuples, e.g storing one
column in an array, but may be more painful from a design point of
view. It may also cause pain from a dead-tuple point of view if you
have to UPDATE large tuples frequently.

> Keep in mind these tables get huge and in many of our cases will span
> 10-20GB just to store this information.

I guess it's all relative, but that does not seem that large. Is this
causing some sort of performance problems you're trying to solve? or
is this completely down to disk space consumption?  If it's down to
performance then you might be able to solve that problem with an
index.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services