Thread: Strange result using pg_dump gzip or split.

Strange result using pg_dump gzip or split.

From: Condor
Hello,

I found a strange result when using pg_dump as described on the PostgreSQL site:
http://www.postgresql.org/docs/9.3/static/backup-dump.html

I have a database with 30 GB of data and decided to archive it. PostgreSQL
is 9.3.5 x86_64, on an ext4 file system, kernel 3.14.18, Slackware 14.2
(current).


First I used gzip: pg_dump logdb | gzip > log.sql.gz

After a few minutes I had log.sql.gz with a size of 2 170 016 226 bytes.
Well, that seemed strange, so I dumped the database again with:

pg_dump logdb | split -b 1024m - log.sql

20 files were generated, and I zipped them with:

zip -r log.sql.zip logdir (because I moved them into logdir)

The file size is: 2 170 020 867 bytes.

Almost the same, but if I check the sizes recorded inside the archives there
is a huge difference.


$ gzip -l log.sql.gz
          compressed        uncompressed  ratio uncompressed_name
          2170016226          3060688725  29.1% log_to.sql

and


$ unzip -v log.sql.zip
*** snip ***
--------          -------  ---                            -------
20240557909         2170020867  89%                            20 files


Here is the difference: with gzip I have a 29.1% compression ratio and an
uncompressed size of 3 060 688 725 bytes, which is about 3 GB; with zip I
have an 89% compression ratio and an uncompressed size of 20 240 557 909
bytes, which is about 20 GB. That is almost 7 times bigger.

My question is: are there some special config parameters that are not
described in the documentation here:
http://www.postgresql.org/docs/9.3/static/backup-dump.html
Or does something need to be configured on my Linux system?

And the most important question for me is: is the database dump corrupt
or not?



Regards,

Hristo Simeonov


Re: Strange result using pg_dump gzip or split.

From: John R Pierce
On 11/10/2014 3:34 AM, Condor wrote:
> Is the database dump corrupt or not?

try restoring them to a new database....
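
for example, something along these lines (just a sketch; logdb_test is a
scratch database name used for illustration):

$ createdb logdb_test
$ gunzip -c log.sql.gz | psql logdb_test

or, for the split pieces:

$ cat logdir/log.sql* | psql logdb_test

if the dump is truncated or corrupt, psql should start throwing errors
partway through.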



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: Strange result using pg_dump gzip or split.

From: Francisco Olarte
Hi Condor.

On Mon, Nov 10, 2014 at 12:34 PM, Condor <condor@stz-bg.com> wrote:
> I have a database with 30 GB of data and decided to archive it. PostgreSQL is 9.3.5 x86_64, on an ext4 file system, kernel 3.14.18, Slackware 14.2 (current).

You should have a look at your tools; it seems you have a file size problem...
 
> First I used gzip: pg_dump logdb | gzip > log.sql.gz
> ...
> $ gzip -l log.sql.gz
>          compressed        uncompressed  ratio uncompressed_name
>          2170016226          3060688725  29.1% log_to.sql
> $ unzip -v log.sql.zip
> *** snip ***
> --------          -------  ---                            -------
> 20240557909         2170020867  89%                            20 files

When you have this kind of problem, the first thing to do is convert everything to hex:

2170016226 = 0x8157D1E2
2170020867 = 0x8157E403

Not a great difference there; this is normal. But on the uncompressed side:

20240557909 = 0x4B66E6755
 3060688725 =  0xB66E6755

Mmmm, something fishy here: the low 32 bits of the two values are identical, which suggests gzip is keeping only 32 bits of the size and so gets it wrong. You can investigate more from there. If you can spare the disk space (which it seems you can, since you had it for the split/zip) you should try to gunzip it and see how big it comes out (I would recommend running 'gzip -tv' once to see what it prints, and then 'gunzip -cv > xxx' to preserve the input AND get verbose output). The problem seems to be with gzip.
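
A quick way to check both points from the shell (just a sketch; the first
command is the shell's printf doing the hex conversion, the second counts
the real uncompressed bytes without keeping a second copy on disk):

$ printf '0x%X\n' 20240557909 3060688725
0x4B66E6755
0xB66E6755

$ gunzip -c log.sql.gz | wc -c    # expect roughly 20 GB here, not 3 GB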

Francisco Olarte.



Re: Strange result using pg_dump gzip or split.

From: Francisco Olarte
Hi Condor.

Followup: I did not spot it at first, but looking at http://www.gzip.org/zlib/rfc-gzip.html#file-format I see:

On Mon, Nov 10, 2014 at 12:34 PM, Condor <condor@stz-bg.com> wrote:
> *** snip ***

Re: Strange result using pg_dump gzip or split.

From: Francisco Olarte
Followup, second try.

First of all, I'd like to apologize to the list for my previous message; I borked some finger gymnastics when switching tabs and sent an incomplete one. My fault. Sorry.

Now what I tried to say was:

I did not spot it at first, but looking at http://www.gzip.org/zlib/rfc-gzip.html#file-format I see:

    ISIZE (Input SIZE)
        This contains the size of the original (uncompressed) input data modulo 2^32.

And given that gzip -l is usually much faster than gzip -tv, I suspect -l is just reporting this field instead of decompressing the data. The numbers match exactly: 20240557909 modulo 2^32 is 3060688725.
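
You can also read the ISIZE field straight off the end of the file (a sketch,
assuming a single-member gzip file on a little-endian machine; ISIZE is the
last four bytes, least significant byte first):

$ tail -c 4 log.sql.gz | od -An -t u4    # should print 3060688725, the same number gzip -l shows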

Francisco Olarte.

Re: Strange result using pg_dump gzip or split.

From: Adrian Klaver
On 11/10/2014 03:34 AM, Condor wrote:
>
> Hello,
>
> I found a strange result when using pg_dump as described on the PostgreSQL site:
> http://www.postgresql.org/docs/9.3/static/backup-dump.html
>
> I have a database with 30 GB of data and decided to archive it. PostgreSQL
> is 9.3.5 x86_64, on an ext4 file system, kernel 3.14.18, Slackware 14.2 (current).

How did you determine that there is 30 GB of data?

>
>
> First I used gzip: pg_dump logdb | gzip > log.sql.gz
>
> After a few minutes I had log.sql.gz with a size of 2 170 016 226 bytes.
> Well, that seemed strange, so I dumped the database again with:
>
> pg_dump logdb | split -b 1024m - log.sql
>
> 20 files were generated, and I zipped them with:
>
> zip -r log.sql.zip logdir (because I moved them into logdir)
>
> The file size is: 2 170 020 867 bytes.
>
> Almost the same, but if I check the sizes recorded inside the archives there
> is a huge difference.

Any reason for not using pg_dump -Fc to get the built-in compression?
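
For example (a sketch; -Fc writes a custom-format archive that is compressed
by default and is restored with pg_restore rather than psql):

$ pg_dump -Fc logdb > log.dump
$ pg_restore -l log.dump             # list the archive contents as a sanity check
$ pg_restore -d logdb_test log.dump  # restore into a scratch database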

>
> *** snip ***


--
Adrian Klaver
adrian.klaver@aklaver.com