Thread: [GENERAL] tuple data size and compression

[GENERAL] tuple data size and compression

From
Tom DalPozzo
Date:
Hi,
it's not clear to me when tuple data (max 1000 bytes total) gets compressed on disk and when it doesn't.
I tried pg_column_size to find the tuple's size, but I get ambiguous results.
It seems to work, but sometimes the tuples seem compressed and sometimes not.
I tried with both constant data and random bytes.

Thanks
Pupillo


Re: [GENERAL] tuple data size and compression

From
Adrian Klaver
Date:
On 12/15/2016 07:17 AM, Tom DalPozzo wrote:
> Hi,
> it's not clear to me when tuple data (max 1000 bytes total) gets
> compressed on disk and when it doesn't.
> I tried pg_column_size to find the tuple's size, but I get ambiguous
> results.

What is the table definition?

Maybe this will help?:

https://www.postgresql.org/docs/9.5/static/storage-toast.html

> It seems to work, but sometimes the tuples seem compressed and sometimes not.
> I tried with both constant data and random bytes.
>
> Thanks
> Pupillo


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] tuple data size and compression

From
Tom DalPozzo
Date:
2016-12-15 16:23 GMT+01:00 Adrian Klaver <adrian.klaver@aklaver.com>:
> What is the table definition?
>
> Maybe this will help?:
>
> https://www.postgresql.org/docs/9.5/static/storage-toast.html
>
> --
> Adrian Klaver

Hi,
definition:   create table stato (id bigint, dati bytea);
then I populate the dati field with  100 bytes.
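Roughly like this (just a sketch; repeat()/decode() is only one way to build a constant 100-byte value):

create table stato (id bigint, dati bytea);
-- one row whose dati field is 100 identical bytes (0x7b)
insert into stato values (1, decode(repeat('7b', 100), 'hex'));
-- compare the raw length with the stored (possibly compressed) size
select id, octet_length(dati) as raw_bytes, pg_column_size(dati) as stored_bytes
from stato;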
Pupillo

Re: [GENERAL] tuple data size and compression

From
Adrian Klaver
Date:
On 12/15/2016 07:48 AM, Tom DalPozzo wrote:
> Hi,
> definition:   create table stato (id bigint, dati bytea);
> then I populate the dati field with  100 bytes.

https://www.postgresql.org/docs/9.5/static/storage-toast.html

"The TOAST management code is triggered only when a row value to be
stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2
kB). The TOAST code will compress and/or move field values out-of-line
until the row value is shorter than TOAST_TUPLE_TARGET bytes (also
normally 2 kB) or no more gains can be had. During an UPDATE operation,
values of unchanged fields are normally preserved as-is; so an UPDATE of
a row with out-of-line values incurs no TOAST costs if none of the
out-of-line values change."
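So with 100-byte values the row never gets near that threshold and nothing is compressed; pg_column_size() should only drop below octet_length() once a value pushes the row past roughly 2 kB. A quick sketch against your table (the repeat()/decode() values are purely illustrative):

-- ~100 bytes: below TOAST_TUPLE_THRESHOLD, stored as-is
insert into stato values (1, decode(repeat('7b', 100), 'hex'));
-- ~3000 highly compressible bytes: past the threshold, so the pglz code compresses it in-line
insert into stato values (2, decode(repeat('7b', 3000), 'hex'));

select id, octet_length(dati) as raw_bytes, pg_column_size(dati) as stored_bytes
from stato;
-- expect stored_bytes close to raw_bytes for id 1, and far below 3000 for id 2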

> Pupillo




--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] tuple data size and compression

From
Tom DalPozzo
Date:
> https://www.postgresql.org/docs/9.5/static/storage-toast.html
>
> "The TOAST management code is triggered only when a row value to be stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will compress and/or move field values out-of-line until the row value is shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB) or no more gains can be had. During an UPDATE operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if none of the out-of-line values change."
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
I see. But in my case the rows don't reach that threshold (I didn't check whether it's 2 kB, but I didn't change anything). So I'm wondering whether there is any other mechanism besides TOAST that could get the rows compressed.
I noticed that, when I use constant data, the total IO writes (by iostat) are more or less half of the total IO writes when using random data or other data that is hard to compress.
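The only per-column knob I've found so far is the STORAGE setting, and if I read the docs right it only changes what the TOAST code does once that threshold is reached (EXTERNAL would rule compression out entirely). A sketch of how I'm inspecting it:

-- current strategy per column: 'x' = extended, 'e' = external, 'm' = main, 'p' = plain
select attname, attstorage
from pg_attribute
where attrelid = 'stato'::regclass and attnum > 0;

-- e.g. to disable compression for the bytea column altogether
alter table stato alter column dati set storage external;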


Re: [GENERAL] tuple data size and compression

From
Adrian Klaver
Date:
On 12/15/2016 08:26 AM, Tom DalPozzo wrote:
> I see. But in my case the rows don't reach that threshold (I didn't check
> whether it's 2 kB, but I didn't change anything). So I'm wondering whether
> there is any other mechanism besides TOAST that could get the rows compressed.

Are you really sure you want that? For small files the overhead of
compression tends to outweigh the benefits. A contrived example biased
toward making my point:

aklaver@killi:~> dd if=/dev/urandom of=file.txt bs=10 count=10
10+0 records in
10+0 records out
100 bytes (100 B) copied, 0.253617 s, 0.4 kB/s
aklaver@killi:~> l -h file.txt
-rw-r--r-- 1 aklaver users 100 Dec 15 13:07 file.txt
aklaver@killi:~> gzip file.txt
aklaver@killi:~> l -h file.txt.gz
-rw-r--r-- 1 aklaver users 132 Dec 15 13:07 file.txt.gz


> I noticed that, when I use constant data, the total IO writes (by
> iostat) are more or less half of the total IO writes when using
> random data or other data that is hard to compress.

Define constant data?

I thought the data you are inputting is below the compression threshold?

Is I/O causing a problem? Or, to put it another way, what is the problem
you are trying to solve?


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] tuple data size and compression

From
Tom DalPozzo
Date:
> Are you really sure you want that? For small files the overhead of
> compression tends to outweigh the benefits.
>
> Define constant data?
>
> I thought the data you are inputting is below the compression threshold?
>
> Is I/O causing a problem? Or, to put it another way, what is the problem
> you are trying to solve?
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
Hi Adrian,
I haven't got a problem. I'm just trying to estimate a worst case for the total IO writes of my DB in the field, over the years.
If tuples get compressed, then everything depends on the data's compressibility.
So, either 1) I understand if/when they get compressed, or 2) I must simulate with different kinds of data (compressible and not).
Furthermore, in general I like, if possible, to be aware of what is going on behind the scenes.
As for 2), I ran 3 tests, each inserting 1 million rows with 100 bytes in my 'dati' field. First I tried 100 constant bytes (\\x7b7b7b7b...), then 100 random bytes, and finally 100 bytes taken from random offsets of a compressed file.
It seems that the IO writes strongly depend on the data, even though the 'dati' field length is just 100 bytes.
wal_compression=off.
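Something like this reproduces the first two patterns (just a sketch; gen_random_bytes() comes from the pgcrypto extension, and the third pattern, bytes from a compressed file, isn't easily done in plain SQL, so it's not shown):

-- constant, highly compressible data: 1 million rows of 100 identical bytes
insert into stato
select g, decode(repeat('7b', 100), 'hex') from generate_series(1, 1000000) g;

-- incompressible data: 1 million rows of 100 random bytes each
create extension if not exists pgcrypto;
insert into stato
select g, gen_random_bytes(100) from generate_series(1, 1000000) g;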
Regards
Pupillo







  

Re: [GENERAL] tuple data size and compression

From
Adrian Klaver
Date:
On 12/16/2016 12:23 AM, Tom DalPozzo wrote:
> Hi Adrian,
> I haven't got a problem. I'm just trying to estimate a worst case for the
> total IO writes of my DB in the field, over the years.

Well, worst case is some program goes haywire and saturates your I/O with
writes. I would think a more reasonable estimate would be the average
write load with a +/- standard deviation. Though that would entail
guessing more about the future than I would be willing to commit to.

> If tuples get compressed, then everything depends on the data's compressibility.
> So, either 1) I understand if/when they get compressed, or 2) I must
> simulate with different kinds of data (compressible and not).
> Furthermore, in general I like, if possible, to be aware of what is
> going on behind the scenes.
> As for 2), I ran 3 tests, each inserting 1 million rows with 100 bytes in
> my 'dati' field. First I tried 100 constant bytes (\\x7b7b7b7b...), then
> 100 random bytes, and finally 100 bytes taken from random offsets of a
> compressed file.
> It seems that the IO writes strongly depend on the data, even though the
> 'dati' field length is just 100 bytes.

That goes without saying.

Two questions:

1) Do you know what your data in the future is going to be?

2) Is a 100 byte bytea a realistic approximation of that data?

> wal_compression=off.
> Regards
> Pupillo


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] tuple data size and compression

From
Tom DalPozzo
Date:

> Two questions:
>
> 1) Do you know what your data in the future is going to be?
>
> 2) Is a 100 byte bytea a realistic approximation of that data?
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com

Hi,
1) Not precisely, but 10-100 million insertions per day. I can't know; it depends on the success of the whole system, of which the DB is only a part.
2) It will depend on my implementation which, in turn, depends on these tests... It could be between 100 and 1000 bytes.
If 2) is 1000 random bytes, I saw that, with continuous archiving (I'm still evaluating whether to keep it), I get about 4 KB of IO writes per row, which in one year could become 70 TB...
Anyway, I will perform my tests with realistic patterns.
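As a rough sanity check of the 70 TB figure (assuming about 50 million rows/day, the middle of the range above, and 4 kB of writes per row):

select pg_size_pretty((50e6 * 4096 * 365)::bigint);  -- about 68 TB of writes per year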
Regards
Pupillo