Re: block corruption on slave db. - Mailing list pgsql-admin

From Keith
Subject Re: block corruption on slave db.
Date
Msg-id CAHw75vuQ0zOPAp2iqjwg7O7qjY49R=SNqc=yorsYn0AE2nK-7w@mail.gmail.com
Whole thread Raw
In response to block corruption on slave db.  (vinod kale <vinod16895@gmail.com>)
List pgsql-admin


On Mon, Jun 15, 2020 at 9:45 AM vinod kale <vinod16895@gmail.com> wrote:
Hello Team,

I have to clear some doughts regarding block corruption on slave db.

If m having streaming replication in my environment. And in some moments corruption happened in my slave server's data then how will i come to know the corruption occurred.

Bcoz if same happened on primary db then i wll immediately come to know bcoz there is daily dump backup we take on primary and in dump block corruption detects.

Bt my dought is how we can identify on slave db.

Kindly help me in above.

Thanks in advance.

--
Thanks & regards,
Vinod  Kale

There's many different kinds of corruption, but one fairly effective way you can watch for some of it is by ensuring that page level checksums are enabled. If you're on a version of PG prior to 12, this can only be done during database initialization (when you run initdb), so if it wasn't enabled before, you'd have to dump and restore your database into a newly initialized instance.

If you are on PG12+, or you can upgrade to it, you can use the pg_checksum tool to enable page level checksums on an offline database


Also on PG12+, you have the ability to monitor if a page level checksum is encountered from within the database by querying the pg_stat_database catalog. This is useful for monitoring tools such as check_postgres(https://bucardo.org/check_postgres/) or pgMonitor (https://github.com/CrunchyData/pgmonitor)  An example query from pgMonitor to pull out the count and time since the last failure is as follows.

SELECT datname AS dbname
    , checksum_failures AS count
    , coalesce(extract(epoch from (now() - checksum_last_failure)), 0) AS time_since_last_failure_seconds
FROM pg_catalog.pg_stat_database;

Note that if statistics are reset, the errors detected and recorded here are also reset. That is also how you'd reset it if you encountered corruption and fixed it. Prior to PG12, page level checksum errors were only reported via the logs.

Keith

pgsql-admin by date:

Previous
From: Wells Oliver
Date:
Subject: Deleting more efficiently from large partitions
Next
From: "Michaeldba@sqlexec.com"
Date:
Subject: Re: Deleting more efficiently from large partitions