Thread: Verify Option with pg_dump

Verify Option with pg_dump

From
Howard News
Date:

Hi,

Recently I had problems with a corrupt pg_dump file. The corruption was caused by a faulty disk. I was unaware of both the disk problem and the corrupted dump file, so I did not have a full valid backup. To reduce the chance of this happening again, I was hoping there could be a verify option for backups, as in SQL Server. This could be as simple as computing a CRC/MD5 checksum as the stream is created, for example: pg_dump | crc_save

The idea is that the pg_dump output is checksummed before it is streamed to disk, and the file is then re-read from disk to check the CRC.

Is there a Linux utility to do this, or would it be simple to modify pg_dump to do this?

Thanks

Howard.

www.selestial.com


Re: Verify Option with pg_dump

From
Karsten Hilbert
Date:
On Wed, Nov 30, 2016 at 12:00:07PM +0000, Howard News wrote:

> recently I had problems with a corrupt pg_dump file. The problem with the
> file was due to a faulty disk. The trouble with this is that I was unaware
> of the disk problem and the pg_dump file corruption so I did not have a full
> valid backup. In order to reduce the chances of this I was hoping that there
> could be a verify option as in SQL Server for the backups. This could be as
> simple as checking the CRC/MD5 as the stream is created. So pg_dump |
> crc_save
>
> The idea being that the pg_dump is crc'd before it is streamed to disk, and
> then the file re-read from disk to check the CRC.
>
> Is there a linux utility to do this or would it be simple to modify pg_dump
> to do this?

You can try to suitably combine "pg_dump --format=plain" with
"tee" and "md5sum" such that the output stream is diverted to
both a file and a pipe-into-CRC-algorithm and eventually
compare the pipe's sum with the sum generated from the file.
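
A minimal sketch of that tee/md5sum combination, assuming a POSIX shell and GNU coreutils. The function name save_and_verify and the database name mydb in the usage note are made up for illustration; neither comes from PostgreSQL. The stream is checksummed in flight, then the file is read back and checksummed again.

```shell
#!/bin/sh
# save_and_verify is a hypothetical helper, not part of PostgreSQL.
# It reads a stream on stdin, writes it to the file named in $1, and
# checksums the stream in flight. Afterwards it re-reads the file and
# compares the two sums.
save_and_verify() {
    target=$1
    # tee writes the stream to the file and passes a copy on to md5sum.
    stream_sum=$(tee "$target" | md5sum | cut -d' ' -f1)
    # Second pass: checksum what can be read back from the file.
    file_sum=$(md5sum < "$target" | cut -d' ' -f1)
    if [ "$stream_sum" = "$file_sum" ]; then
        echo "OK $file_sum"
    else
        echo "MISMATCH stream=$stream_sum file=$file_sum" >&2
        return 1
    fi
}
```

Possible usage: pg_dump --format=custom mydb | save_and_verify mydb.dump

Two caveats: the read-back may be served from the operating system's cache rather than from the disk itself, and the pipeline hides pg_dump's own exit status, which would need to be checked separately.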

But the better solution might be to stream to a filesystem
that verifies disk writes immediately. Or to a suitable RAID
array.

Regards,
Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


Re: Verify Option with pg_dump

From
Howard News
Date:
On 30/11/2016 12:27, Karsten Hilbert wrote:
>
> You can try to suitably combine "pg_dump --format=plain" with
> "tee" and "md5sum" such that the output stream is diverted to
> both a file and a pipe-into-CRC-algorithm and eventually
> compare the pipe's sum with the sum generated from the file.
>
> But the better solution might be to stream to a filesystem
> that verifies disk writes immediately. Or to a suitable RAID
> array.
>
> Regards,
> Karsten

Thanks for this info, Karsten. I will look into using "tee". As a matter
of interest, why does the format need to be plain?

Regarding the filesystem solution, the dump is currently written to an HP
RAID 10 array with an NTFS partition. What filesystems / RAID arrays
have this ability?

Thanks.


Re: Verify Option with pg_dump

From
Karsten Hilbert
Date:
On Wed, Nov 30, 2016 at 01:11:58PM +0000, Howard News wrote:

> > You can try to suitably combine "pg_dump --format=plain" with
> > "tee" and "md5sum" such that the output stream is diverted to
> > both a file and a pipe-into-CRC-algorithm and eventually
> > compare the pipe's sum with the sum generated from the file.
> >
> > But the better solution might be to stream to a filesystem
> > that verifies disk writes immediately. Or to a suitable RAID
> > array.
> Thanks for this info Karsten. I will look into using "tee". As a matter of
> interest, why does the format need to be plain?

Actually, any of the formats producing a _single_ file right
away are likely to work. So, any but "directory", I guess.
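
For the "directory" format, which writes several files, the single-stream approach does not apply directly. One possible workaround, sketched here assuming GNU md5sum and a POSIX find, is a checksum manifest written once the dump has finished. It is computed from files already on disk, so it can only detect corruption that happens later, not a bad initial write. The names write_manifest and verify_manifest and the paths in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not part of PostgreSQL.
# $1 = dump directory, $2 = manifest file (kept outside the dump directory).

write_manifest() {
    # Record one md5 line per file, with paths relative to the dump directory.
    (cd "$1" && find . -type f -exec md5sum {} +) > "$2"
}

verify_manifest() {
    # Re-read every listed file; exits non-zero if any checksum differs.
    (cd "$1" && md5sum -c --quiet -) < "$2"
}
```

Possible usage: pg_dump --format=directory --file=mydb.dir mydb && write_manifest mydb.dir mydb.md5, followed later by verify_manifest mydb.dir mydb.md5 before relying on the backup.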

> Regarding the filesystem solution, the dump is currently written to a HP
> RAID 10 array with an NTFS partition. What filesystems / raid arrays have
> this ability?

If you can't trust your RAID 10 (1 meaning mirrored) to
actually store what you told it to, you've got problems beyond
somehow verifying a pg_dump.

Regards,
Karsten


Re: Verify Option with pg_dump

From
Howard News
Date:
>> Regarding the filesystem solution, the dump is currently written to a HP
>> RAID 10 array with an NTFS partition. What filesystems / raid arrays have
>> this ability?
> If you can't trust your RAID 10 (1 meaning mirrored) to
> actually store what you told it to you've got problems beyond
> somehow verifying a pg_dump.
>
> Regards,
> Karsten

I am told RAID can only protect you against disk failure. File writes to
one or more of the disks in an array are not typically compared, so a
RAID array carries on until a disk fails or the error count gets to a
certain level. So RAID does not fully protect you from data corruption.

So you can't trust RAID!



Re: Verify Option with pg_dump

From
Karsten Hilbert
Date:
On Wed, Nov 30, 2016 at 01:53:21PM +0000, Howard News wrote:

> > > Regarding the filesystem solution, the dump is currently written to a HP
> > > RAID 10 array with an NTFS partition. What filesystems / raid arrays have
> > > this ability?
> > If you can't trust your RAID 10 (1 meaning mirrored) to
> > actually store what you told it to you've got problems beyond
> > somehow verifying a pg_dump.
> >
> > Regards,
> > Karsten
> I am told RAID can only protect you against disk failure. File writes to one
> or more of the disks in an array are not typically compared so a RAID array
> carrys on until the disk failure, or error count get to a certain level. So
> RAID does not fully protect you from data corruption.

True enough. So it seems you are referring to "silent data
corruption". Does this link help?

    http://www.raidix.com/knowledge-base/silent-data-corruption/

This link also seems relevant:

    http://stackoverflow.com/questions/13107783/pipe-output-to-two-different-commands

Regards,
Karsten


Re: Verify Option with pg_dump

From
Karsten Hilbert
Date:
Also this:

    https://en.wikipedia.org/wiki/Silent_data_corruption#Countermeasures
