Re: [GENERAL] tuple data size and compression - Mailing list pgsql-general

From Adrian Klaver
Subject Re: [GENERAL] tuple data size and compression
Date
Msg-id 38569cb0-7de0-4015-feac-755950955ec6@aklaver.com
In response to Re: [GENERAL] tuple data size and compression  (Tom DalPozzo <t.dalpozzo@gmail.com>)
Responses Re: [GENERAL] tuple data size and compression
List pgsql-general
On 12/16/2016 12:23 AM, Tom DalPozzo wrote:
>         I see. But in my case rows don't reach that threshold (I didn't
>         check whether it is 2K, but I didn't change anything). So I'm
>         wondering if there is any other way, apart from TOAST, to get
>         the rows compressed or not.
>
>
>     Are you really sure you want that? For small files the overhead of
>     compression tends to outweigh the benefits. A contrived example
>     biased to making my point:
>
>     aklaver@killi:~> dd if=/dev/urandom of=file.txt bs=10 count=10
>     10+0 records in
>     10+0 records out
>     100 bytes (100 B) copied, 0.253617 s, 0.4 kB/s
>     aklaver@killi:~> l -h file.txt
>     -rw-r--r-- 1 aklaver users 100 Dec 15 13:07 file.txt
>     aklaver@killi:~> gzip file.txt
>     aklaver@killi:~> l -h file.txt.gz
>     -rw-r--r-- 1 aklaver users 132 Dec 15 13:07 file.txt.gz
>
>
>         I noticed that, when I use constant data, the total IO writes (by
>         iostat) are more or less half of the total IO writes when using
>         random or other data that is hard to compress.
>
>
>     Define constant data?
>
>     I thought the data you are inputting is below the compression threshold?
>
>     Is I/O causing a problem? Or, to put it another way, what is the
>     problem you are trying to solve?
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com
>
> Hi Adrian,
> I haven't got a problem. I'm just trying to estimate the worst case
> of total IO writes of my DB in the field, over the years.

Well, the worst case is some program going haywire and saturating your
I/O with writes. I would think a more reasonable estimate would be the
average write load plus or minus a standard deviation. Though that would
entail guessing more about the future than I would be willing to commit
to.
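
On the compression question itself: a quick way to check whether stored
values actually got compressed is to compare the on-disk size of a
datum with its raw length. A minimal sketch (the table name is
hypothetical; 'dati' is the bytea column you mention):

SELECT pg_column_size(dati) AS stored_bytes,
       octet_length(dati)  AS raw_bytes
FROM   dati_test
LIMIT  5;

If stored_bytes is smaller than raw_bytes, the value was compressed. For
a 100-byte field the two should roughly match (pg_column_size also
counts the few bytes of varlena header), since the value is below the
compression threshold.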

> If tuples get compressed, then everything depends on the data's
> compressibility. So, either 1) I understand if/when they get
> compressed, or 2) I must simulate with different kinds of data
> (compressible and not). Furthermore, in general I like, if possible,
> to be aware of what is going on behind the scenes.
> As for 2), I ran 3 tests, each inserting 1 million rows of 100 bytes
> into my 'dati' bytea field. First I tried 100 bytes of constant data
> (\\x7b7b7b7b....), then 100 random bytes, and finally 100 bytes taken
> from random offsets of a compressed file.
> It seems that IO writes strongly depend on the data, even if the dati
> field length is just 100.

That goes without saying.

Two questions:

1) Do you know what your data is going to be in the future?

2) Is a 100-byte bytea a realistic approximation of that data?
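
For what it is worth, the first two of those test loads could be
reproduced along these lines. This is a sketch under assumptions: the
table name is made up, and gen_random_bytes() needs the pgcrypto
extension:

CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TABLE dati_test (id serial PRIMARY KEY, dati bytea);

-- Test 1: 100 bytes of constant, highly compressible data (0x7b repeated).
INSERT INTO dati_test (dati)
SELECT decode(repeat('7b', 100), 'hex')
FROM generate_series(1, 1000000);

-- Test 2: 100 random, effectively incompressible bytes per row.
INSERT INTO dati_test (dati)
SELECT gen_random_bytes(100)
FROM generate_series(1, 1000000);

Running each INSERT against a freshly truncated table, with iostat
sampled in between, keeps the measurements comparable.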

> wal_compression=off.
> Regards
> Pupillo
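
As an aside, that setting can be inspected and changed without a
restart (ALTER SYSTEM needs superuser privileges; the change takes
effect on configuration reload):

SHOW wal_compression;                    -- 'off' in the tests above
ALTER SYSTEM SET wal_compression = on;   -- compress full-page images in WAL
SELECT pg_reload_conf();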


--
Adrian Klaver
adrian.klaver@aklaver.com

