Home > mailing lists

Re: block corruption on slave db. - Mailing list pgsql-admin

From	Keith
Subject	Re: block corruption on slave db.
Date	June 16, 2020 07:54:20
Msg-id	CAHw75vuQ0zOPAp2iqjwg7O7qjY49R=SNqc=yorsYn0AE2nK-7w@mail.gmail.com Whole thread Raw
In response to	block corruption on slave db. (vinod kale <vinod16895@gmail.com>)
List	pgsql-admin

Tree view

On Mon, Jun 15, 2020 at 9:45 AM vinod kale <vinod16895@gmail.com> wrote:

Hello Team,

I have to clear some doughts regarding block corruption on slave db.

If m having streaming replication in my environment. And in some moments corruption happened in my slave server's data then how will i come to know the corruption occurred.

Bcoz if same happened on primary db then i wll immediately come to know bcoz there is daily dump backup we take on primary and in dump block corruption detects.

Bt my dought is how we can identify on slave db.

Kindly help me in above.

Thanks in advance.

--
Thanks & regards,
Vinod Kale

There's many different kinds of corruption, but one fairly effective way you can watch for some of it is by ensuring that page level checksums are enabled. If you're on a version of PG prior to 12, this can only be done during database initialization (when you run initdb), so if it wasn't enabled before, you'd have to dump and restore your database into a newly initialized instance.

If you are on PG12+, or you can upgrade to it, you can use the pg_checksum tool to enable page level checksums on an offline database

https://www.postgresql.org/docs/12/app-pgchecksums.html

Also on PG12+, you have the ability to monitor if a page level checksum is encountered from within the database by querying the pg_stat_database catalog. This is useful for monitoring tools such as check_postgres(https://bucardo.org/check_postgres/) or pgMonitor (https://github.com/CrunchyData/pgmonitor) An example query from pgMonitor to pull out the count and time since the last failure is as follows.

SELECT datname AS dbname
, checksum_failures AS count
, coalesce(extract(epoch from (now() - checksum_last_failure)), 0) AS time_since_last_failure_seconds
FROM pg_catalog.pg_stat_database;

Note that if statistics are reset, the errors detected and recorded here are also reset. That is also how you'd reset it if you encountered corruption and fixed it. Prior to PG12, page level checksum errors were only reported via the logs.

Keith

pgsql-admin by date:

From: Wells Oliver
Date: 16 June 2020, 04:39:02
Subject: Deleting more efficiently from large partitions

From: "Michaeldba@sqlexec.com"
Date: 16 June 2020, 13:48:43
Subject: Re: Deleting more efficiently from large partitions

Re: block corruption on slave db. - Mailing list pgsql-admin

Previous

Next