Re: Does PostgreSQL check database integrity at startup? - Mailing list pgsql-general

From Edson Carlos Ericksson Richter
Subject Re: Does PostgreSQL check database integrity at startup?
Date
Msg-id 2b0bd4bc-232a-ada6-0c13-a9863f6dd98b@simkorp.com.br
Whole thread Raw
In response to Re: Does PostgreSQL check database integrity at startup?  (Brent Wood <pcreso@yahoo.com>)
List pgsql-general
Em 28/12/2017 16:06, Brent Wood escreveu:
> Some thoughts....
>
> A tool to calculate a checksum of sorts based on the table (file) 
> content would provide a better surety of duplication than simply 
> checking file size - like differently vacuumed tables in each copy 
> could have the same content but be different file sizes.
>
> Something like these could be adapted to compare database content by 
> filesystem checks rather than db queries. Following tablespaces, etc 
> as well.
>
> http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash
> or other similar tools
>
> Yes, there is some overhead, especially for large databases but it 
> would be worth that to robustly ensure genuine and complete duplication.
>
> I do wonder though - given the order of records in a table (file) is 
> not necessarily identical (or is it?) event this may be problematic. 
> Perhaps a checksum based on the result of a query output ordered by 
> primary key could work?
>
> Brent Wood
>
> ------------------------------------------------------------------------
> *From:* Edson Carlos Ericksson Richter <richter@simkorp.com.br>
> *To:* pgsql-general@lists.postgresql.org
> *Sent:* Friday, December 29, 2017 6:47 AM
> *Subject:* Re: Does PostgreSQL check database integrity at startup?
>
> Em 28/12/2017 10:16, Stephen Frost escreveu:
> > Alvaro,
> >
> > * Alvaro Herrera (alvherre@alvh.no-ip.org 
> <mailto:alvherre@alvh.no-ip.org>) wrote:
> >> For context: this was first reported in the Barman forum here:
> >> 
> https://groups.google.com/forum/#!msg/pgbarman/3aXWpaKWRFI/weUIZxspDAAJ 
> <https://groups.google.com/forum/#%21msg/pgbarman/3aXWpaKWRFI/weUIZxspDAAJ>
> >> They are using Barman for the backups.
> > Ahhhh, I see.  I wasn't aware of that history.
> >
> >> Stephen Frost wrote:
> >>
> >>>> But at some point in time, slave became corrupt (one of the base
> >>>> files are zero size where it should be 16Mb in size), and IMHO a
> >>>> "red alert" should arise - Slave server shall not even startup at
> >>>> all.
> >>> How do you know it should be 16Mb in size...?  That sounds like you're
> >>> describing a WAL file, but you should be archiving your WAL files 
> during
> >>> a backup, not just using whatever is in pg_xlog/pg_wal..
> >> It's not a WAL file -- it's a file backing a table.
> > Interesting.
> >
> >>>> Since backups are taken from slave server, all backups are also 
> corrupt.
> >>> If you aren't following the appropriate process to perform a backup
> >>> then, yes, you're going to end up with corrupt and useless/bad 
> backups.
> >> A few guys went over the backup-taking protocol upthread already.
> >>
> >> But anyway the backup tool is a moot point. The problem doesn't
> >> originate in the backup -- it originates in the standby, from where the
> >> backup is taken.  The file can be seen as size 0 in the standby.
> >> Edson's question is: why wasn't the problem detected in the standby?
> >> It seems a valid question to me, to which we currently we don't 
> have any
> >> good answer.
> > The last message on that thread seems pretty clear to me- the comment is
> > "I think this is a failure in standby build." It's not clear what that
> > failure was but I agree it doesn't appear related to the backup tool
> > (the comment there is "I'm using rsync"), or, really, PostgreSQL at all
> > (a failure during the build of the replica isn't something we're
> > necessairly going to pick up on..).
> >
> > As discussed on this thread, zero-byte files are entirely valid to
> > appear in the PostgreSQL data directory.
> >
> > To try and dig into what happened, I'd probably look at what forks there
> > are of that relation, the entry in pg_class, and try to figure out how
> > it is that replication isn't complaining when the file on the primary
> > appeared to be modified well after the last modify timestamp on the
> > replica.  If it's possible to replica this into a test environment,
> > maybe even do a no-op update of a row of that table and see what happens
> > with replication.  One thing I wonder is if this table used to be
> > unlogged and it was later turned into a logged table but something
> > didn't quite happen correctly with that.  I'd also suggest looking for
> > other file size mismatches between the primary and the replica.
> >
> > Thanks!
> >
> > Stephen
> The table was never unlogged. From very beginning, it was always logged.
> I've dozens of databases with exactly same setup - and right now, I'm
> rebuilding the slave server. Instead of investigating something probably
> I will not find the cause, I would like to have a alert for the future.
>
> Would be possible to include in future versions:
> 1) After start standby, standby run all WAL files until it is
> synchronized with master (current behavior)
> 3) Before getting into "accept read only queries", check if all base
> files have same size as master server (new behavior). In case something
> is different, throw an error and stop database startup?
> 4) Then start "accept read only queries" (current behavior)
> ???
>
> Thanks,
>
>
> Edson

Thanks, Brent. I'll think about it.

Regards,

Edson


pgsql-general by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Does PostgreSQL check database integrity at startup?
Next
From: Edson Carlos Ericksson Richter
Date:
Subject: Re: Does PostgreSQL check database integrity at startup?