Thread: pg_dump with compressible and non-compressible tables

pg_dump with compressible and non-compressible tables

From
Ron
Date:
Hi,

v9.6

We've got big databases where some of the tables are highly compressible, 
but some have many bytea fields containing PDF files.

When the data format is custom, directory or tar, how feasible would a 
"--no-blob-compression" option be (where pg_dump just tells the zlib library 
to just Store tables with bytea columns, while compressing all other tables 
at the specified -Z level)?

Thanks

-- 
Angular momentum makes the world go 'round.


Re: pg_dump with compressible and non-compressible tables

From
Adrian Klaver
Date:
On 05/05/2018 07:14 AM, Ron wrote:
> Hi,
> 
> v9.6
> 
> We've got big databases where some of the tables are highly 
> compressible, but some have many bytea fields containing PDF files.

Can you see a demonstrable difference?

> 
> When the data format is custom, directory or tar, how feasible would a 
> "--no-blob-compression" option be (where pg_dump just tells the zlib 
> library to just Store tables with bytea columns, while compressing all 
> other tables at the specified -Z level)?

In pg_dump blob refers to large objects:

https://www.postgresql.org/docs/10/static/app-pgdump.html
"
-b
--blobs

     Include large objects in the dump. This is the default behavior 
except when --schema, --table, or --schema-only is specified. The -b 
switch is therefore only useful to add large objects to dumps where a 
specific schema or table has been requested. Note that blobs are 
considered data and therefore will be included when --data-only is used, 
but not when --schema-only is.
"

These are different critters then bytea.

> 
> Thanks
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: pg_dump with compressible and non-compressible tables

From
Ron
Date:
On 05/05/2018 12:13 PM, Adrian Klaver wrote:
On 05/05/2018 07:14 AM, Ron wrote:
Hi,

v9.6

We've got big databases where some of the tables are highly compressible, but some have many bytea fields containing PDF files.

Can you see a demonstrable difference?

Very much so.  The ASCII hex representations of the PDF files are compressible, but take a long time to compress.  Uncompressed backups are 50% faster.


When the data format is custom, directory or tar, how feasible would a "--no-blob-compression" option be (where pg_dump just tells the zlib library to just Store tables with bytea columns, while compressing all other tables at the specified -Z level)?

In pg_dump blob refers to large objects:

https://www.postgresql.org/docs/10/static/app-pgdump.html
"
-b
--blobs

    Include large objects in the dump. This is the default behavior except when --schema, --table, or --schema-only is specified. The -b switch is therefore only useful to add large objects to dumps where a specific schema or table has been requested. Note that blobs are considered data and therefore will be included when --data-only is used, but not when --schema-only is.
"

These are different critters then bytea.

Ok.  I need the data in my backups anyway, so excluding them is 100% contrary to what I need.

--
Angular momentum makes the world go 'round.

Re: pg_dump with compressible and non-compressible tables

From
Adrian Klaver
Date:
On 05/05/2018 12:41 PM, Ron wrote:
> On 05/05/2018 12:13 PM, Adrian Klaver wrote:
>> On 05/05/2018 07:14 AM, Ron wrote:
>>> Hi,
>>>
>>> v9.6
>>>
>>> We've got big databases where some of the tables are highly 
>>> compressible, but some have many bytea fields containing PDF files.
>>
>> Can you see a demonstrable difference?
> 
> Very much so.  The ASCII hex representations of the PDF files are 
> compressible, but take a *long* time to compress. Uncompressed backups 
> are 50% faster.

Got it. The developers will need to comment on whether this is doable or 
not. The thing is that this would be a new feature. At this point 
version 11 is closed to new features, so you are looking at version 12 
which means 1.5-2 years out. If it where me I would try piping a pg_dump 
plain text output to a compression program other then zlib(used in 
pg_dump compression) and see if you can get better performance.


>> These are different critters then bytea.
> 
> Ok.  I need the data in my backups anyway, so excluding them is 100% 
> contrary to what I need.

I understand. What I was trying to say was that the blob you are 
referring to, bytea in a field, is not the same as what pg_dump is 
referring to, a large object stored in the  pg_largeobject table:

https://www.postgresql.org/docs/10/static/lo-intro.html

So if you want to pursue this feature I think you need to come up with 
another name for it to avoid the confusion I mentioned above.

> 
> -- 
> Angular momentum makes the world go 'round.


-- 
Adrian Klaver
adrian.klaver@aklaver.com