Thread: Compressed binary field

Compressed binary field

From
Edson Richter
Date:
Hi,

My application has a few binary fields that accept files. Most of them
are XML files archived for reference only.
I would like to know if there is any plan to implement compressed fields
(just a "flag" in the field definition, like "not null") on the database
side (these fields are not and will never be indexed nor used for searches).
I know there are other options for file storage (filesystem, etc.), but
the database is so convenient and I really don't want to change the
application architecture now (it replicates automatically, is available
from any device I want without sharing folders, etc.).

Thanks,

Edson.


Re: Compressed binary field

From
Tom Lane
Date:
Edson Richter <edsonrichter@hotmail.com> writes:
> I would like to know if there is any plan to implement compressed fields
> (just a "flag" in the field definition, like "not null") on the database
> side (these fields are not and will never be indexed nor used for searches).

Any field value over a couple kilobytes is compressed automatically;
this has been true for more than a dozen years.  You can turn that
off if you want, but it's done by default.

http://www.postgresql.org/docs/9.1/static/storage-toast.html
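For example, the storage strategy is a per-column setting; a minimal sketch
(assuming a hypothetical table "docs" with a bytea column "body"):

-- see how each column is stored: 'x' = extended (compress + out-of-line),
-- 'e' = external (out-of-line, no compression), 'm' = main, 'p' = plain
SELECT attname, attstorage
  FROM pg_attribute
 WHERE attrelid = 'docs'::regclass AND attnum > 0;

-- turn compression off for that column while still allowing TOAST
ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTERNAL;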

            regards, tom lane


Re: Compressed binary field

From
Edson Richter
Date:
On 10/09/2012 15:35, Tom Lane wrote:
> Edson Richter <edsonrichter@hotmail.com> writes:
>> I would like to know if there is any plan to implement compressed fields
>> (just a "flag" in the field definition, like "not null") on the database
>> side (these fields are not and will never be indexed nor used for searches).
> Any field value over a couple kilobytes is compressed automatically;
> this has been true for more than a dozen years.  You can turn that
> off if you want, but it's done by default.
>
> http://www.postgresql.org/docs/9.1/static/storage-toast.html
>
>             regards, tom lane
>
Hi, Tom!

I've read this section of the manual, but I was wondering (probably I did
not fully understand it, maybe it's too technical or just my poor
English) if there is a way to force all these fields to be
compressed regardless of size. I currently have more than 250,000 files in
the database, about 7 GB in size (roughly this amount is added every 6 months).

Thanks,

Edson


Re: Compressed binary field

From
Edson Richter
Date:
On 10/09/2012 16:09, Edson Richter wrote:
> On 10/09/2012 15:35, Tom Lane wrote:
>> Edson Richter <edsonrichter@hotmail.com> writes:
>>> I would like to know if there is any plan to implement compressed
>>> fields (just a "flag" in the field definition, like "not null") on the
>>> database side (these fields are not and will never be indexed nor used
>>> for searches).
>> Any field value over a couple kilobytes is compressed automatically;
>> this has been true for more than a dozen years.  You can turn that
>> off if you want, but it's done by default.
>>
>> http://www.postgresql.org/docs/9.1/static/storage-toast.html
>>
>>             regards, tom lane
>>
> Hi, Tom!
>
> I've read this section of the manual, but I was wondering (probably I did
> not fully understand it, maybe it's too technical or just my poor
> English) if there is a way to force all these fields to
> be compressed regardless of size. I currently have more than 250,000 files
> in the database, about 7 GB in size (roughly this amount is added every 6 months).
>
> Thanks,
>
> Edson
>
>
Also, does this automatic compression apply to bytea fields?

Edson


Re: Compressed binary field

From
"Kevin Grittner"
Date:
Edson Richter <edsonrichter@hotmail.com> wrote:

> does this automatic compression apply to bytea fields?

Yes, but keep in mind that anything which is already compressed or
encrypted will probably not compress much if at all.  Many of the
binary objects you might want to store in the database probably
already use compression internally.
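A quick way to spot-check that (a sketch; "mytable" and "datafile" stand in
for your table and bytea column) is to compare the uncompressed length with
what is actually stored:

-- octet_length() returns the full data length, pg_column_size() the
-- on-disk size of the datum after any TOAST compression
SELECT id,
       octet_length(datafile)   AS uncompressed_bytes,
       pg_column_size(datafile) AS stored_bytes
  FROM mytable
 ORDER BY octet_length(datafile) DESC
 LIMIT 10;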

-Kevin


Re: Compressed binary field

From
Edson Richter
Date:
On 10/09/2012 19:06, Kevin Grittner wrote:
> Edson Richter <edsonrichter@hotmail.com> wrote:
>
>> does this automatic compression apply to bytea fields?
>
> Yes, but keep in mind that anything which is already compressed or
> encrypted will probably not compress much if at all.  Many of the
> binary objects you might want to store in the database probably
> already use compression internally.
>
> -Kevin
My files are mostly XML, TXT and PDF files.
I already have separate tables for storing files, and avoid querying
them together with other stuff.
So, should I use

alter table MYTABLE set storage EXTENDED

?

Thanks,

Edson




Re: Compressed binary field

From
"Kevin Grittner"
Date:
Edson Richter  wrote:

> So, should I use
>
> alter table MYTABLE set storage EXTENDED

Don't bother; that is the default.

This should already be happening automatically.  Is there some
problem you're seeing that you want to fix?  If so, you should
probably describe that.
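For the record, SET STORAGE is a per-column setting rather than a per-table
one; a sketch with placeholder names:

ALTER TABLE mytable ALTER COLUMN datafile SET STORAGE EXTENDED;

EXTENDED, which allows both compression and out-of-line storage, is already
what a bytea column gets by default.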

-Kevin


Re: Compressed binary field

From
Edson Richter
Date:
On 11/09/2012 09:40, Kevin Grittner wrote:
> Edson Richter  wrote:
>
>> So, should I use
>>
>> alter table MYTABLE set storage EXTENDED
>
> Don't bother; that is the default.
>
> This should already be happening automatically.  Is there some
> problem you're seeing that you want to fix?  If so, you should
> probably describe that.
>
> -Kevin
No, there is no problem. I'm just trying to reduce database size by forcing
these fields to compress.
Current database size = 8 GB
Backup size = 1.6 GB (5x smaller)

It seems to me (IMHO) that there is room for improvement in database
storage (we don't have many indexes, and the biggest tables are just the
ones with bytea fields). That's why I've asked for expert advice.

Regards,

Edson.


Re: Compressed binary field

From
"Kevin Grittner"
Date:
Edson Richter <edsonrichter@hotmail.com> wrote:

> there is no problem. I'm just trying to reduce database size

> Current database size = 8 GB
> Backup size = 1.6 GB (5x smaller)
>
> It seems to me (IMHO) that there is room for improvement in database
> storage (we don't have many indexes, and the biggest tables are just
> the ones with bytea fields). That's why I've asked for expert
> advice.

What version of PostgreSQL is this?

How are you measuring the size?

Where is the space going?  (Heap files?  TOAST files?  Index files?
WAL files?  Free space maps?  Visibility maps?  Server logs?
Temporary files?)
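One rough way to break that down, as a sketch, is to total the relation
sizes by kind (r = ordinary table, t = TOAST, i = index); it won't show
WAL, server logs or temporary files, but it covers the relation data:

SELECT relkind,
       pg_size_pretty(sum(pg_relation_size(oid))::bigint) AS total_size
  FROM pg_class
 WHERE relkind IN ('r', 't', 'i')
 GROUP BY relkind
 ORDER BY sum(pg_relation_size(oid)) DESC;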

You aren't creating a separate table with one row for each binary
object, are you?  I only ask this because in an earlier post you
mentioned having a quarter million files in the database, and in a
production database which has been running for years with over 400
user tables and lots of indexes I only have about 4000 files in the
whole database cluster.  A separate table for each object would be
disastrous for both performance and space usage.

-Kevin


Re: Compressed binary field

From
Edson Richter
Date:
On 11/09/2012 14:00, Kevin Grittner wrote:
> Edson Richter <edsonrichter@hotmail.com> wrote:
>
>> there is no problem. I'm just trying to reduce database size
>
>> Current database size = 8 GB
>> Backup size = 1.6 GB (5x smaller)
>>
>> It seems to me (IMHO) that there is room for improvement in database
>> storage (we don't have many indexes, and the biggest tables are just
>> the ones with bytea fields). That's why I've asked for expert
>> advice.
>
> What version of PostgreSQL is this?
9.1.5 on Linux x64 (CentOS 5)
>
> How are you measuring the size?
For storage, du -h --max-depth 1 on data directory gives me the amount
of data.
> Where is the space going?  (Heap files?  TOAST files?  Index files?
> WAL files?  Free space maps?  Visibility maps?  Server logs?
> Temporary files?)
Biggest objects are just the tables with files.
> You aren't creating a separate table with one row for each binary
> object, are you?  I only ask this because in an earlier post you
> mentioned having a quarter million files in the database, and in a
> production database which has been running for years with over 400
> user tables and lots of indexes I only have about 4000 files in the
> whole database cluster.  A separate table for each object would be
> disastrous for both performance and space usage.
I have 2 tables that hold all these objects. The structure is:

create table MYTABLE (id bigint not null primary key, mimetype
varchar(100) null, datafile bytea null)


Regards,

Edson.

>
> -Kevin
>
>



Re: Compressed binary field

From
"Kevin Grittner"
Date:
Edson Richter <edsonrichter@hotmail.com> wrote:

> For storage, du -h --max-depth 1 on data directory gives me the
> amount of data.

> Biggest objects are just the tables with files.

> I have 2 tables that hold all these objects. The structure is:
>
> create table MYTABLE (id bigint not null primary key, mimetype
> varchar(100) null, datafile bytea null)

Could you show the results of this query?:

SELECT relkind, oid, relfilenode, reltoastrelid,
       relpages, reltuples
  FROM pg_class
  ORDER BY relpages DESC
  LIMIT 10;

Also, just to be sure that all calculations are based on your actual
build, can you show the results of?:

SHOW block_size;

Have you checked the level of bloat yet?  (Perhaps autovacuum needs
to be made more aggressive.)
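One direct way to measure bloat, as a sketch (pgstattuple is a contrib
module, and "mytable" is just a placeholder):

CREATE EXTENSION pgstattuple;
-- dead_tuple_percent and free_percent are the interesting columns here
SELECT * FROM pgstattuple('mytable');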

-Kevin


Re: Compressed binary field

From
Edson Richter
Date:
On 11/09/2012 14:34, Kevin Grittner wrote:
> Edson Richter <edsonrichter@hotmail.com> wrote:
>
>> For storage, du -h --max-depth 1 on data directory gives me the
>> amount of data.
>
>> Biggest objects are just the tables with files.
>
>> I have 2 tables that hold all these objects. The structure is:
>>
>> create table MYTABLE (id bigint not null primary key, mimetype
>> varchar(100) null, datafile bytea null)
>
> Could you show the results of this query?:
>
> SELECT relkind, oid, relfilenode, reltoastrelid,
>        relpages, reltuples
>   FROM pg_class
>   ORDER BY relpages DESC
>   LIMIT 10;

 relkind |   oid   | relfilenode | reltoastrelid | relpages |   reltuples
---------+---------+-------------+---------------+----------+-------------
 r       |  312470 |     1043546 |        312492 |    29321 |      639571
 r       |  312585 |     1043643 |             0 |    22732 | 1.80617e+06
 r       |  312522 |     1043578 |        312527 |    19769 |      724210
 r       |  312749 |     1043773 |        312753 |    14307 |      928538
 r       |  312758 |     1043763 |             0 |    10488 |      917134
 r       |  312498 |     1043525 |             0 |     7689 |      640572
 r       |  312802 |     1043804 |        312810 |     7670 |      172789
 r       |  312964 |     1044076 |             0 |     7586 |      385833
 i       | 1041923 |     1043648 |             0 |     6958 | 1.80617e+06
 r       |  312815 |     1043825 |        312819 |     6684 |      715081

> Also, just to be sure that all calculations are based on your actual
> build, can you show the results of?:
>
> SHOW block_size;

8192

> Have you checked the level of bloat yet?  (Perhaps autovacuum needs
> to be made more aggressive.)

Besides autovacuum, I usually run VACUUM FULL on a weekly basis. My
calculations came after the VACUUM FULL.

Edson.

> -Kevin

Re: Compressed binary field

From
"Kevin Grittner"
Date:
Edson Richter <edsonrichter@hotmail.com> wrote:
> On 11/09/2012 14:34, Kevin Grittner wrote:
>> Edson Richter <edsonrichter@hotmail.com> wrote:
>>
>>> For storage, du -h --max-depth 1 on data directory gives me the
>>> amount of data.
>>
>>> Biggest objects are just the tables with files.
>>
>>> I have 2 tables that hold all these objects. The structure is:
>>>
>>> create table MYTABLE (id bigint not null primary key, mimetype
>>> varchar(100) null, datafile bytea null)
>>
>> Could you show the results of this query?:
>>
>> SELECT relkind, oid, relfilenode, reltoastrelid,
>>         relpages, reltuples
>>    FROM pg_class
>>    ORDER BY relpages DESC
>>    LIMIT 10;

> [biggest relation was a table heap with 29321 pages]

>> Also, just to be sure that all calculations are based on your
>> actual build, can you show the results of?:
>>
>> SHOW block_size;
> 8192

So your biggest table is actually 229 MB.  Something is not adding
up.  I can't see any way to reconcile your previous statements with
this number.  There also hasn't been any real explanation for the
statement that you have 250000 files.  There must be something which
matters here which hasn't yet been mentioned.  Any ideas?

-Kevin


Re: Compressed binary field

From
Edson Richter
Date:
On 11/09/2012 14:59, Kevin Grittner wrote:
> Edson Richter <edsonrichter@hotmail.com> wrote:
>> On 11/09/2012 14:34, Kevin Grittner wrote:
>>> Edson Richter <edsonrichter@hotmail.com> wrote:
>>>
>>>> For storage, du -h --max-depth 1 on data directory gives me the
>>>> amount of data.
>>>
>>>> Biggest objects are just the tables with files.
>>>
>>>> I have 2 tables that hold all these objects. The structure is:
>>>>
>>>> create table MYTABLE (id bigint not null primary key, mimetype
>>>> varchar(100) null, datafile bytea null)
>>>
>>> Could you show the results of this query?:
>>>
>>> SELECT relkind, oid, relfilenode, reltoastrelid,
>>>          relpages, reltuples
>>>     FROM pg_class
>>>     ORDER BY relpages DESC
>>>     LIMIT 10;
>
>> [biggest relation was a table heap with 29321 pages]
>
>>> Also, just to be sure that all calculations are based on your
>>> actual build, can you show the results of?:
>>>
>>> SHOW block_size;
>> 8192
>
> So your biggest table is actually 229 MB.  Something is not adding
> up.  I can't see any way to reconcile your previous statements with
> this number.  There also hasn't been any real explanation for the
> statement that you have 250000 files.  There must be something which
> matters here which hasn't yet been mentioned.  Any ideas?
>
> -Kevin
I don't know why; look at the result of the following query (arquivo is
the bytea field):

select count(*) from notafiscalarq where arquivo is not null;
  count
--------
  715084


Strange, huh?

Edson.
>
>



Re: Compressed binary field

From
Edson Richter
Date:
On 12/09/2012 00:37, Edson Richter wrote:
> On 11/09/2012 14:59, Kevin Grittner wrote:
>> Edson Richter <edsonrichter@hotmail.com> wrote:
>>> On 11/09/2012 14:34, Kevin Grittner wrote:
>>>> Edson Richter <edsonrichter@hotmail.com> wrote:
>>>>
>>>>> For storage, du -h --max-depth 1 on data directory gives me the
>>>>> amount of data.
>>>>
>>>>> Biggest objects are just the tables with files.
>>>>
>>>>> I have 2 tables that hold all these objects. The structure is:
>>>>>
>>>>> create table MYTABLE (id bigint not null primary key, mimetype
>>>>> varchar(100) null, datafile bytea null)
>>>>
>>>> Could you show the results of this query?:
>>>>
>>>> SELECT relkind, oid, relfilenode, reltoastrelid,
>>>>        relpages, reltuples
>>>>   FROM pg_class
>>>>   ORDER BY relpages DESC
>>>>   LIMIT 10;
>>>
>>> [biggest relation was a table heap with 29321 pages]
>>>
>>>> Also, just to be sure that all calculations are based on your
>>>> actual build, can you show the results of?:
>>>>
>>>> SHOW block_size;
>>> 8192
>>
>> So your biggest table is actually 229 MB.  Something is not adding
>> up.  I can't see any way to reconcile your previous statements with
>> this number.  There also hasn't been any real explanation for the
>> statement that you have 250000 files.  There must be something which
>> matters here which hasn't yet been mentioned.  Any ideas?
>>
>> -Kevin
> I don't know why; look at the result of the following query (arquivo is
> the bytea field):
>
> select count(*) from notafiscalarq where arquivo is not null;
>  count
> --------
>  715084
>
>
> Strange, huh?
>
> Edson.

Look at the size (5100 MB) of this table alone (obtained after a vacuum,
with pgAdmin 14):

[Attachment: pgAdmin screenshot]

This table has no deletes or updates, only inserts. It relates to the one
above (notafiscal) in a 1:1 relationship.

Regards,

Edson.

Re: Compressed binary field

From
"Kevin Grittner"
Date:
Edson Richter  wrote:
> On 12/09/2012 00:37, Edson Richter wrote:
>> On 11/09/2012 14:59, Kevin Grittner wrote:
>>> Edson Richter  wrote:

>>>> [biggest relation was a table heap with 29321 pages]

>>>> [block size is 8 KB]

>>> So your biggest table is actually 229 MB. Something is not adding
>>> up. I can't see any way to reconcile your previous statements
>>> with this number. There also hasn't been any real explanation for
>>> the statement that you have 250000 files. There must be something
>>> which matters here which hasn't yet been mentioned. Any ideas?

>> I don't know why; look at the result of the following query (arquivo is
>> the bytea field):
>>
>> select count(*) from notafiscalarq where arquivo is not null;
>> count
>> --------
>> 715084

What is a count of active rows in that table supposed to show me?

> Look at the size (5100 MB) of this table alone (obtained after a vacuum,
> with pgAdmin 14):

Please show (copy/paste if possible) *exactly* how you arrived at
that number and *exactly* how you determined that this number
represented the size of a table and how you determined which table.
If the results you previously posted are from the same database, you
simply don't have a table heap that large.

-Kevin


Re: Compressed binary field

From
Edson Richter
Date:
On 12/09/2012 09:16, Kevin Grittner wrote:
> Edson Richter  wrote:
>> On 12/09/2012 00:37, Edson Richter wrote:
>>> On 11/09/2012 14:59, Kevin Grittner wrote:
>>>> Edson Richter  wrote:
>
>>>>> [biggest relation was a table heap with 29321 pages]
>>>>> [block size is 8 KB]
>
>>>> So your biggest table is actually 229 MB. Something is not adding
>>>> up. I can't see any way to reconcile your previous statements
>>>> with this number. There also hasn't been any real explanation for
>>>> the statement that you have 250000 files. There must be something
>>>> which matters here which hasn't yet been mentioned. Any ideas?
>
>>> I don't know why; look at the result of the following query (arquivo is
>>> the bytea field):
>>>
>>> select count(*) from notafiscalarq where arquivo is not null;
>>> count
>>> --------
>>> 715084
>
> What is a count of active rows in that table supposed to show me?
>
>> Look at the size (5100 MB) of this table alone (obtained after a vacuum,
>> with pgAdmin 14):
>
> Please show (copy/paste if possible) *exactly* how you arrived at
> that number and *exactly* how you determined that this number
> represented the size of a table and how you determined which table.
> If the results you previously posted are from the same database, you
> simply don't have a table heap that large.
>
> -Kevin
OK, maybe I used the wrong database by mistake (I have dozens of databases
here, so it's easy to do that across such different timeframes). Let's
repeat all the operations in one session:

In the pgsql directory, executing "du -h --max-depth 1" results in:

8,6G    ./data
0       ./backups
8,6G    .

Executing query
SELECT pg_size_pretty(pg_database_size('mydatabase'));
  pg_size_pretty
----------------
  7234 MB

Executing query
SELECT pg_size_pretty(pg_relation_size('notafiscalarq'));
  pg_size_pretty
----------------
  52 MB

Executing query
SELECT nspname || '.' || relname AS "relation",
     pg_size_pretty(pg_relation_size(C.oid)) AS "size"
   FROM pg_class C
   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
   WHERE nspname NOT IN ('pg_catalog', 'information_schema')
   ORDER BY pg_relation_size(C.oid) DESC
   LIMIT 10;
             relation             |  size
---------------------------------+---------
  pg_toast.pg_toast_18409         | 4976 MB
  pg_toast.pg_toast_18146         | 290 MB
  public.cotacao                  | 229 MB
  public.elementocalculado        | 179 MB
  public.cotacaotransicaosituacao | 155 MB
  public.log                      | 112 MB
  public.logradouro               | 82 MB
  public.cotacaonf                | 60 MB
  public.notafiscal               | 60 MB
  public.tabelacalculada          | 60 MB
(10 rows)


SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 10;
             relation             | total_size
---------------------------------+------------
  public.notafiscalarq            | 5102 MB
  public.cotacao                  | 331 MB
  public.elementocalculado        | 312 MB
  public.documentotransportearq   | 294 MB
  public.cotacaotransicaosituacao | 233 MB
  public.log                      | 196 MB
  public.logradouro               | 149 MB
  public.cotacaonf                | 118 MB
  public.tabelacalculada          | 116 MB
  public.notafiscal               | 94 MB
(10 rows)

Looking at PgAdmin 14, I get the following data for table notafiscalarq:
Table Size    52 MB
Toast Table Size    5033 MB
Indexes Size    15 MB
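The same breakdown can be reproduced in SQL (a rough sketch; the "toast"
figure below also folds in the TOAST index and the free space/visibility maps):

SELECT pg_size_pretty(pg_relation_size('notafiscalarq'))   AS heap,
       pg_size_pretty(pg_total_relation_size('notafiscalarq')
                      - pg_relation_size('notafiscalarq')
                      - pg_indexes_size('notafiscalarq'))   AS toast_and_maps,
       pg_size_pretty(pg_indexes_size('notafiscalarq'))     AS indexes;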


Executing query
SELECT relkind, oid, relfilenode, reltoastrelid,
        relpages, reltuples
   FROM pg_class
   ORDER BY relpages DESC
   LIMIT 10;

results
  relkind |  oid   | relfilenode | reltoastrelid | relpages | reltuples
---------+--------+-------------+---------------+----------+-----------
  r       |  18064 |       18064 |         18086 |    29332 | 639639
  r       |  18179 |       18179 |             0 |    22797 | 1.811e+06
  r       |  18116 |       18116 |         18121 |    19772 | 724370
  r       |  18343 |       18343 |         18347 |    14311 | 928633
  r       |  18352 |       18352 |             0 |    10488 | 917134
  r       |  18092 |       18092 |             0 |     7691 | 640709
  r       |  18396 |       18396 |         18404 |     7670 | 172791
  r       |  18558 |       18558 |             0 |     7608 | 386907
  i       | 747805 |      747805 |             0 |     6976 | 1.811e+06
  r       |  18409 |       18409 |         18413 |     6684 | 715084

Executing query
show block_size

  block_size
------------
  8192


So, all of this information was obtained in a single database session, so
it must all relate to the same database and data files.

Regards,

Edson





Re: Compressed binary field

From
"Kevin Grittner"
Date:
Edson Richter <edsonrichter@hotmail.com> wrote:

> SELECT pg_size_pretty(pg_database_size('mydatabase'));
>   pg_size_pretty
> ----------------
>   7234 MB

> SELECT nspname || '.' || relname AS "relation",
>      pg_size_pretty(pg_relation_size(C.oid)) AS "size"
>    FROM pg_class C
>    LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
>    WHERE nspname NOT IN ('pg_catalog', 'information_schema')
>    ORDER BY pg_relation_size(C.oid) DESC
>    LIMIT 10;
>              relation             |  size
> ---------------------------------+---------
>   pg_toast.pg_toast_18409         | 4976 MB
>   pg_toast.pg_toast_18146         | 290 MB
> [ ... ]

> SELECT relkind, oid, relfilenode, reltoastrelid,
>         relpages, reltuples
>    FROM pg_class
>    ORDER BY relpages DESC
>    LIMIT 10;
>
> results
>   relkind |  oid   | relfilenode | reltoastrelid | relpages | reltuples
> ---------+--------+-------------+---------------+----------+-----------
>   r       |  18064 |       18064 |         18086 |    29332 |    639639
>   r       |  18179 |       18179 |             0 |    22797 | 1.811e+06
>   r       |  18116 |       18116 |         18121 |    19772 |    724370
>   r       |  18343 |       18343 |         18347 |    14311 |    928633
>   r       |  18352 |       18352 |             0 |    10488 |    917134
>   r       |  18092 |       18092 |             0 |     7691 |    640709
>   r       |  18396 |       18396 |         18404 |     7670 |    172791
>   r       |  18558 |       18558 |             0 |     7608 |    386907
>   i       | 747805 |      747805 |             0 |     6976 | 1.811e+06
>   r       |  18409 |       18409 |         18413 |     6684 |    715084

When I run that query on a big database here, my top three entries
are for relkind of 't' (starting with a 2TB TOAST table for our
document images) and number four is a TOAST index.  It's hard to see
why you don't have TOAST entries at the top of your list.  Instead
of a VACUUM FULL, could you try a VACUUM FREEZE VERBOSE ANALYZE
against the full database (using a database superuser login) and
capture the output?  Please post the portion of the output for the
big table and its TOAST table, and see whether the numbers
(pg_class.relpages * 8KB versus pg_relation_size()) start to match
up.
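Something along these lines should do for the comparison (just a sketch):

SELECT relname,
       pg_size_pretty(relpages::bigint * 8192) AS size_per_relpages,
       pg_size_pretty(pg_relation_size(oid))   AS size_on_disk
  FROM pg_class
  ORDER BY pg_relation_size(oid) DESC
  LIMIT 10;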

You might also want to confirm that neither pg_stat_activity nor
pg_prepared_xacts shows any lingering transactions started more than
a few minutes ago.
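For example (a sketch using the 9.1 column names):

SELECT procpid, xact_start, current_query
  FROM pg_stat_activity
  WHERE xact_start < now() - interval '5 minutes';

SELECT * FROM pg_prepared_xacts;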

> So, all of this information was obtained in a single database session,
> so it must all relate to the same database and data files.

Thanks, that helps suggest where to look next.

-Kevin


Re: Compressed binary field

From
Edson Richter
Date:
On 12/09/2012 15:09, Kevin Grittner wrote:
> VACUUM FREEZE VERBOSE ANALYZE
Sorry, most of the server messages here come out in Portuguese; a
translation follows, but I guess the numbers are more important, right?

INFO:  index "pk_notafiscalarq" now contains 715084 row versions in
1963 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.12 sec.
INFO:  "notafiscalarq": found 0 removable, 715084 nonremovable row
versions in 6684 out of 6684 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.03s/0.04u sec elapsed 0.38 sec.
INFO:  vacuuming "pg_toast.pg_toast_18409"
INFO:  index "pg_toast_18409_index" now contains 2643728 row versions
in 7251 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.10 sec.
INFO:  "pg_toast_18409": found 0 removable, 2643728 nonremovable row
versions in 636949 out of 636949 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 2.45s/1.30u sec elapsed 30.16 sec.
INFO:  analyzing "public.notafiscalarq"
INFO:  "notafiscalarq": scanned 6684 of 6684 pages, containing 715084
live rows and 0 dead rows; 30000 rows in sample, 715084 estimated
total rows


Thanks,

Edson


Re: Compressed binary field

From
Edson Richter
Date:
On 12/09/2012 21:50, Edson Richter wrote:
> On 12/09/2012 15:09, Kevin Grittner wrote:
>> VACUUM FREEZE VERBOSE ANALYZE
> Sorry, most of the server messages here come out in Portuguese; a
> translation follows, but I guess the numbers are more important, right?
>
> INFO:  index "pk_notafiscalarq" now contains 715084 row versions in
> 1963 pages
> DETAIL:  0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.12 sec.
> INFO:  "notafiscalarq": found 0 removable, 715084 nonremovable row
> versions in 6684 out of 6684 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.03s/0.04u sec elapsed 0.38 sec.
> INFO:  vacuuming "pg_toast.pg_toast_18409"
> INFO:  index "pg_toast_18409_index" now contains 2643728 row versions
> in 7251 pages
> DETAIL:  0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.10 sec.
> INFO:  "pg_toast_18409": found 0 removable, 2643728 nonremovable row
> versions in 636949 out of 636949 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 2.45s/1.30u sec elapsed 30.16 sec.
> INFO:  analyzing "public.notafiscalarq"
> INFO:  "notafiscalarq": scanned 6684 of 6684 pages, containing 715084
> live rows and 0 dead rows; 30000 rows in sample, 715084 estimated
> total rows
>
>
> Thanks,
>
> Edson
>
>
Anything else I can do from here?

Edson


Re: Compressed binary field

From
"Kevin Grittner"
Date:
Edson Richter <edsonrichter@hotmail.com> wrote:

> Anything else I can do from here?

Did that result in more accurate numbers for pg_class.reltuples?

-Kevin


Re: Compressed binary field

From
Edson Richter
Date:
On 13/09/2012 16:12, Kevin Grittner wrote:
> Edson Richter <edsonrichter@hotmail.com> wrote:
>
>> Anything else I can do from here?
>
> Did that result in more accurate numbers for pg_class.reltuples?
>
> -Kevin
>
I don't see how the numbers were inaccurate - to me they always seemed
consistent with what I knew about the data...
Let's repeat all the tests again (note that data grows on a daily basis,
so the numbers will be a bit different - and yes, I've run the vacuum again):

SELECT pg_size_pretty(pg_database_size('mydatabase'));
  pg_size_pretty
----------------
  7238 MB

  SELECT pg_size_pretty(pg_relation_size('notafiscalarq'));
  pg_size_pretty
----------------
  52 MB

SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 10;
             relation            |  size
---------------------------------+---------
 pg_toast.pg_toast_18409         | 4976 MB
 pg_toast.pg_toast_18146         | 290 MB
 public.cotacao                  | 229 MB
 public.elementocalculado        | 179 MB
 public.cotacaotransicaosituacao | 155 MB
 public.log                      | 112 MB
 public.logradouro               | 82 MB
 public.cotacaonf                | 60 MB
 public.notafiscal               | 60 MB
 public.tabelacalculada          | 60 MB


SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 10;
             relation            | total_size
---------------------------------+------------
 public.notafiscalarq            | 5102 MB
 public.cotacao                  | 331 MB
 public.elementocalculado        | 313 MB
 public.documentotransportearq   | 294 MB
 public.cotacaotransicaosituacao | 233 MB
 public.log                      | 196 MB
 public.logradouro               | 149 MB
 public.cotacaonf                | 118 MB
 public.tabelacalculada          | 116 MB
 public.notafiscal               | 94 MB


SELECT relkind, oid, relfilenode, reltoastrelid,
relpages, reltuples
FROM pg_class
ORDER BY relpages DESC
LIMIT 10;
 relkind |  oid  | relfilenode | reltoastrelid | relpages |  reltuples
---------+-------+-------------+---------------+----------+-------------
 t       | 18413 |       18413 |             0 |   636949 | 2.64373e+06
 t       | 18150 |       18150 |             0 |    37086 |      149502
 r       | 18064 |       18064 |         18086 |    29347 |      639695
 r       | 18179 |       18179 |             0 |    22901 |  1.8172e+06
 r       | 18116 |       18116 |         18121 |    19779 |      724619
 r       | 18343 |       18343 |         18347 |    14325 |      928805
 r       | 18352 |       18352 |             0 |    10488 |      917134
 r       | 18092 |       18092 |             0 |     7695 |      640804
 r       | 18396 |       18396 |         18404 |     7671 |      172792
 r       | 18558 |       18558 |             0 |     7644 |      388332


show block_size;
  block_size
------------
  8192


Regards,

Edson


Re: Compressed binary field

From
Jeff Janes
Date:
On Tue, Sep 11, 2012 at 9:34 AM, Edson Richter <edsonrichter@hotmail.com> wrote:
>
> No, there is no problem. I'm just trying to reduce database size by forcing
> these fields to compress.
> Current database size = 8 GB
> Backup size = 1.6 GB (5x smaller)
>
> It seems to me (IMHO) that there is room for improvement in database storage
> (we don't have many indexes, and the biggest tables are just the ones with
> bytea fields). That's why I've asked for expert advice.

There are two things to keep in mind.  One is that each datum is
compressed separately, so that a lot of redundancy that occurs between
fields of different tuples, but not within any given tuple, will not
be available to TOAST, but will be available to the compression of a
dump file.

Another thing is that PG's TOAST compression was designed to be simple
and fast and patent free, and often it is not all that good.  It is
quite good if you have long stretches of repeats of a single
character, or exact densely spaced repeats of a sequence of characters
("123123123123123..."), but when the redundancy is less simple it does
a much worse job than gzip, for example, does.

It is possible but unlikely there is a bug somewhere, but most likely
your documents just aren't very compressible using pglz_compress.
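One way to see how much pglz is actually saving on your documents (a sketch
using the table and column names you posted; note that octet_length() has to
decompress every value, so this scans and detoasts the whole table):

SELECT count(*)                                     AS docs,
       pg_size_pretty(sum(octet_length(arquivo)))   AS uncompressed,
       pg_size_pretty(sum(pg_column_size(arquivo))) AS stored
  FROM notafiscalarq
 WHERE arquivo IS NOT NULL;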

Cheers,

Jeff


Re: Compressed binary field

From
Edson Richter
Date:
On 17/09/2012 00:17, Jeff Janes wrote:
> On Tue, Sep 11, 2012 at 9:34 AM, Edson Richter <edsonrichter@hotmail.com> wrote:
>> No, there is no problem. I'm just trying to reduce database size by forcing
>> these fields to compress.
>> Current database size = 8 GB
>> Backup size = 1.6 GB (5x smaller)
>>
>> It seems to me (IMHO) that there is room for improvement in database storage
>> (we don't have many indexes, and the biggest tables are just the ones with
>> bytea fields). That's why I've asked for expert advice.
> There are two things to keep in mind.  One is that each datum is
> compressed separately, so that a lot of redundancy that occurs between
> fields of different tuples, but not within any given tuple, will not
> be available to TOAST, but will be available to the compression of a
> dump file.
>
> Another thing is that PG's TOAST compression was designed to be simple
> and fast and patent free, and often it is not all that good.  It is
> quite good if you have long stretches of repeats of a single
> character, or exact densely spaced repeats of a sequence of characters
> ("123123123123123..."), but when the redundancy is less simple it does
> a much worse job than gzip, for example, does.
>
> It is possible but unlikely there is a bug somewhere, but most likely
> your documents just aren't very compressible using pglz_compress.
>
> Cheers,
>
> Jeff
Most of the data is XML (a few files are PDF).
Probably the best solution for me is to compress before sending to the
database.
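If I do that, I suppose I should also tell PostgreSQL not to try to
compress those values again; something like this (just a sketch):

ALTER TABLE notafiscalarq ALTER COLUMN arquivo SET STORAGE EXTERNAL;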

Thanks for the info.

Regards,

Edson.