Thread: Strange result using pg_dump gzip or split.
Hello,

I found a strange result when I use pg_dump as described on the postgresql site: http://www.postgresql.org/docs/9.3/static/backup-dump.html

I have a database with 30 gb data and decided to archive it. postgresql is 9.3.5 x64_86, ext4 file system, kernel 3.14.18 Slackware 14.2 (current)

First I use gzip with : pg_dump logdb | gzip > log.sql.gz

After a few minutes I have log.sql.gz with size 2 170 016 226

Well, that is strange, so I dump the database again with:

pg_dump logdb | split -b 1024m - log.sql

20 files are generated and I zip them with:

zip -r log.sql.zip logdir (because I moved them into logdir)

file size is : 2 170 020 867

Almost the same, but if I check the sizes inside the archives there is a huge difference.

$ gzip -l log.sql.gz
compressed uncompressed ratio uncompressed_name
2170016226 3060688725 29.1% log_to.sql

and

$ unzip -v log.sql.zip
*** snip ***
-------- ------- --- -------
20240557909 2170020867 89% 20 files

Here is the difference: with gzip I have a 29.1% compression ratio and the uncompressed size is 3 060 688 725, which means 3 GB, and with zip I have an 89% compression ratio and the uncompressed size is 20 240 557 909, which means 20 GB. That is 7 times bigger.

My question is: Are there some special config params that are not described in the documentation here: http://www.postgresql.org/docs/9.3/static/backup-dump.html
Or does something need to be configured on my linux?

And the most important question for me is: Is the database dump corrupt or not?

Regards,
Hristo Simeonov
On 11/10/2014 3:34 AM, Condor wrote:
> Did the database dump is corrupt or not ?

try restoring them to a new database....

--
john r pierce 37N 122W
somewhere on the middle of the left coast
Hi Condor.
On Mon, Nov 10, 2014 at 12:34 PM, Condor <condor@stz-bg.com> wrote:
> I have a database with 30 gb data and decide to archive it, postgresql is 9.3.5 x64_86, ext4 file system, kernel 3.14.18 Slackware 14.2 (current)

You should have a look at your tools, it seems you have a file size problem....

> First I use gzip with : pg_dump logdb | gzip > log.sql.gz
...
> $ gzip -l log.sql.gz
> compressed uncompressed ratio uncompressed_name
> 2170016226 3060688725 29.1% log_to.sql
> $ unzip -v log.sql.zip
> *** snip ***
> -------- ------- --- -------
> 20240557909 2170020867 89% 20 files
When you have this kind of problem, the first thing to do is convert everything to hex:
2170016226=0x8157D1E2
2170020867=0x8157E403
Not a great difference there, this is normal, but on the uncompressed side:
20240557909=0x4B66E6755
3060688725=0xB66E6755
Mmmm, something fishy here, it seems gzip is using 32 bits only, so it gets things wrong. You can investigate more from there. If you can spare the disk space (which it seems you can, since you had it for the split/zip) you should try to gunzip it and see how big it comes out (I would recommend 'gzip -tv' once to see what it prints, and then 'gunzip -cv > xxx' to preserve the input AND get verbose output). The problem seems to be with gzip.
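The hex comparison above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the two uncompressed sizes reported in this thread; the helper name low_32_bits is made up for illustration.

```python
# Uncompressed sizes as reported in this thread.
gzip_uncompressed = 3060688725    # from `gzip -l log.sql.gz`
zip_uncompressed = 20240557909    # from `unzip -v log.sql.zip`


def low_32_bits(n: int) -> int:
    """Keep only the lowest 32 bits of n (hypothetical helper name)."""
    return n & 0xFFFFFFFF


print(hex(gzip_uncompressed))   # 0xb66e6755
print(hex(zip_uncompressed))    # 0x4b66e6755

# If gzip only keeps 32 bits of the size, truncating the zip figure
# should give exactly the gzip figure.
print(low_32_bits(zip_uncompressed) == gzip_uncompressed)   # True
```

The two values differ only in the digit above the 32-bit boundary, which is consistent with both archives holding the same 20 GB of data.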
Francisco Olarte.
Hi Condor.
Followup, I did not spot it at first, looking at http://www.gzip.org/zlib/rfc-gzip.html#file-format I see:
Followup, second try.
First of all, I'd like to apologize to the list for my previous message, I borked some finger gymnastics when switching tabs and sent an incomplete one. My fault. Sorry.

Now what I tried to say was:
I did not spot it at first, looking at http://www.gzip.org/zlib/rfc-gzip.html#file-format I see:
- ISIZE (Input SIZE)
- This contains the size of the original (uncompressed) input data modulo 2^32.
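
To see the ISIZE field for yourself, here is a minimal Python sketch. It assumes a gzip file with a single member, where the last 4 bytes of the file are ISIZE stored little-endian, as the format description says. The function name read_isize and the sample file are made up for illustration, and I have not run this against the actual dump.

```python
import gzip
import os
import struct
import tempfile


def read_isize(path: str) -> int:
    """Return the ISIZE trailer field of a single-member gzip file.

    ISIZE is the last 4 bytes of the file, little-endian, and holds the
    uncompressed size modulo 2**32 (hypothetical helper name).
    """
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)
        return struct.unpack("<I", f.read(4))[0]


# Small demonstration on a throwaway file (not the real dump).
data = b"x" * 100000
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.gz")
    with gzip.open(path, "wb") as f:
        f.write(data)
    isize = read_isize(path)

print(isize == len(data) % 2**32)   # True

# The same arithmetic applied to the 20 GB figure from unzip -v:
print(20240557909 % 2**32)          # 3060688725
```

The last line gives exactly the "uncompressed" number shown by gzip -l in the original post, so that figure says nothing about whether the dump is damaged; gzip -tv or a test restore is the way to check.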
Francisco Olarte.
On 11/10/2014 03:34 AM, Condor wrote:
>
> Hello,
>
> I found strange result when I use pg_dump described on postgresql site:
> http://www.postgresql.org/docs/9.3/static/backup-dump.html
>
> I have a database with 30 gb data and decide to archive it, postgresql
> is 9.3.5 x64_86, ext4 file system, kernel 3.14.18 Slackware 14.2 (current)

How did you determine there is 30GB of data?

>
>
> First I use gzip with : pg_dump logdb | gzip > log.sql.gz
>
> After a few minute I have log.sql.gz with size 2 170 016 226
> Well, that is strange and I dump database again with:
>
> pg_dump logdb | split -b 1024m - log.sql
>
> 20 files is generated and I zip them with:
>
> zip -r log.sql.zip logdir (because I move them in logdir)
>
> file size is : 2 170 020 867
>
> Almost the same, but if I check size in archives there is a huge
> difference.

Any reason for not using pg_dump -Fc and get the built in compression?

>
>
> $ gzip -l log.sql.gz
> compressed uncompressed ratio uncompressed_name
> 2170016226 3060688725 29.1% log_to.sql
>
> and
>
>
> $ unzip -v log.sql.zip
> *** snip ***
> -------- ------- --- -------
> 20240557909 2170020867 89% 20 files
>
>
> Here is difference: with gzip I have 29.1% compress ratio and
> uncompressed size is 3 060 688 725 which means 3 GB
> and with zip I have 89% compress ratio and uncompressed size is 20 240
> 557 909 witch mean 20 GB. That is 7 times bigger.
>
> My question is: Is there some special config params that is not
> described in documentation here:
> http://www.postgresql.org/docs/9.3/static/backup-dump.html
> Or something need to be configured on my linux.
>
> And most important question for me is: Did the database dump is corrupt
> or not ?
>
>
>
> Regards,
>
> Hristo Simeonov
>
>

--
Adrian Klaver
adrian.klaver@aklaver.com