Re: Doubt with physical storage being used by postgres when storing LOBs - Mailing list pgsql-general
From: Víctor Cosqui
Subject: Re: Doubt with physical storage being used by postgres when storing LOBs
Msg-id: CACg6vWXy_84ShCCXzNCsz9xLfWnx5sZvQRU6aNcrR0c3XW1Bxg@mail.gmail.com
In response to: Re: Doubt with physical storage being used by postgres when storing LOBs (Merlin Moncure <mmoncure@gmail.com>)
List: pgsql-general
Hello again,
I have been digging a bit more into this and I have seen that the increase in table size is not caused by the index, but by the on-disk data storage implementation. As explained in the documentation of pg_largeobject (http://www.postgresql.org/docs/9.2/static/catalog-pg-largeobject.html):
"Each row of pg_largeobject holds data for one page of a large object, beginning at byte offset (pageno * LOBLKSIZE) within the object. The implementation allows sparse storage: pages might be missing, and might be shorter than LOBLKSIZE bytes even if they are not the last page of the object. Missing regions within a large object read as zeroes."
I could confirm this by reading the on-disk file where the table is stored: when I put in a file whose content looks like

[aaa…abbb…bccc…c]

what I can read from the file on disk looks like

[aaa…000bbb…000ccc…0000]

So pages are not completely filled with object data; they are padded with zeroes up to LOBLKSIZE.
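As an aside, the sparse-page scheme the docs describe can be sketched like this (my own illustration in Python, not Postgres internals; `reassemble` and its arguments are invented for the example). Missing pages simply read back as zeroes:

```python
LOBLKSIZE = 2048  # compile-time default page size for large objects

def reassemble(pages, total_len):
    """Rebuild a large object's byte stream from its stored pages.

    pages: dict mapping pageno -> bytes (pages may be missing or short).
    Each page's data starts at byte offset pageno * LOBLKSIZE; any region
    with no stored data reads as zeroes.
    """
    out = bytearray(total_len)  # missing regions stay zero
    for pageno, data in pages.items():
        off = pageno * LOBLKSIZE
        out[off:off + len(data)] = data
    return bytes(out)
```

For example, an object with pages 0 and 2 stored but page 1 missing reads back with a zero-filled gap in the middle.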
I have also been playing with different LOBLKSIZE values to see what happens with smaller and bigger ones, and the conclusion is that the bigger LOBLKSIZE is, the more space is filled with zeroes.
This is what happens when you store a 17MB compressed file under different LOBLKSIZE settings (the default LOBLKSIZE is 2KB):
LOBLKSIZE [B]                     |   512 |  2048 |  8000
----------------------------------+-------+-------+------
Rows in pg_largeobject table      | 34000 |  8712 |  2231
Size of pg_largeobject table [MB] |    21 |    24 |    28
Overhead [%]                      | 23.53 | 41.18 | 64.71
IMHO the overhead added by these zeroes is too high.
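For reference, the overhead row works out from the measured table sizes like this (a quick sanity-check sketch of mine, not part of the original measurements):

```python
# Overhead = (pg_largeobject size - source file size) / source file size,
# using the 17MB compressed source file from the test above.
FILE_MB = 17
table_sizes = {512: 21, 2048: 24, 8000: 28}  # LOBLKSIZE -> table size [MB]

for loblksize, table_mb in table_sizes.items():
    overhead = (table_mb - FILE_MB) / FILE_MB * 100
    print(f"LOBLKSIZE={loblksize}: {overhead:.2f}% overhead")
```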
Thanks for your support and regards!
On Wed, Oct 2, 2013 at 9:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Oct 2, 2013 at 11:08 AM, Víctor Cosqui <victor.cosqui@gmail.com> wrote:
> Hi all
>
>
> I am developing an application which uses postgres 9.2 to store binaries as
> oid objects.
>
> CREATE TABLE content (contentname text, contentoid oid);
>
> I am making some tests to evaluate how much HD space I will need to allocate
> for these objects.
>
> To measure the space used by postgres I have used two different tools, both
> with the same results:
>
> 1.- Checking physical HD space by running "sudo du -sb
> /opt/PostgreSQL/9.2/data/base/" before and after inserting the data
>
> 2.- Asking postgres directly for the table size estimation: "select
> pg_size_pretty(pg_relation_size('pg_largeobject'))"
>
> I have tested with different binaries and I am getting different results.
> For example, when I put in the content of a zipped file of 17MB, the
> disk space increases by 24MB. The reason for this increment seems
> to be an index created on the table "pg_largeobject", namely
> "pg_largeobject_loid_pn_index".
>
> On the other hand, when I put in, let's say, many zeroes (the same 17MB),
> the increase in HD usage is much smaller.
>
> I think this is because TOAST compresses the stored content, so
> it can compress the zeroes but not the previously compressed zip content.
>
> My question is: Is this increase of ~40% normal? Has someone else
> experienced this?

TOAST will compress data if it thinks it can (you can disable this
behavior, and arguably should if your data is pre-compressed). 40% for
the index seems high, but it may be accurate. Personally, I prefer
bytea storage to LOB, although LOB is a little bit faster.
merlin