Re: Does PostgreSQL check database integrity at startup? - Mailing list pgsql-general
From | Edson Carlos Ericksson Richter |
---|---|
Subject | Re: Does PostgreSQL check database integrity at startup? |
Date | |
Msg-id | 2b0bd4bc-232a-ada6-0c13-a9863f6dd98b@simkorp.com.br Whole thread Raw |
In response to | Re: Does PostgreSQL check database integrity at startup? (Brent Wood <pcreso@yahoo.com>) |
List | pgsql-general |
Em 28/12/2017 16:06, Brent Wood escreveu: > Some thoughts.... > > A tool to calculate a checksum of sorts based on the table (file) > content would provide a better surety of duplication than simply > checking file size - like differently vacuumed tables in each copy > could have the same content but be different file sizes. > > Something like these could be adapted to compare database content by > filesystem checks rather than db queries. Following tablespaces, etc > as well. > > http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash > or other similar tools > > Yes, there is some overhead, especially for large databases but it > would be worth that to robustly ensure genuine and complete duplication. > > I do wonder though - given the order of records in a table (file) is > not necessarily identical (or is it?) event this may be problematic. > Perhaps a checksum based on the result of a query output ordered by > primary key could work? > > Brent Wood > > ------------------------------------------------------------------------ > *From:* Edson Carlos Ericksson Richter <richter@simkorp.com.br> > *To:* pgsql-general@lists.postgresql.org > *Sent:* Friday, December 29, 2017 6:47 AM > *Subject:* Re: Does PostgreSQL check database integrity at startup? > > Em 28/12/2017 10:16, Stephen Frost escreveu: > > Alvaro, > > > > * Alvaro Herrera (alvherre@alvh.no-ip.org > <mailto:alvherre@alvh.no-ip.org>) wrote: > >> For context: this was first reported in the Barman forum here: > >> > https://groups.google.com/forum/#!msg/pgbarman/3aXWpaKWRFI/weUIZxspDAAJ > <https://groups.google.com/forum/#%21msg/pgbarman/3aXWpaKWRFI/weUIZxspDAAJ> > >> They are using Barman for the backups. > > Ahhhh, I see. I wasn't aware of that history. > > > >> Stephen Frost wrote: > >> > >>>> But at some point in time, slave became corrupt (one of the base > >>>> files are zero size where it should be 16Mb in size), and IMHO a > >>>> "red alert" should arise - Slave server shall not even startup at > >>>> all. > >>> How do you know it should be 16Mb in size...? That sounds like you're > >>> describing a WAL file, but you should be archiving your WAL files > during > >>> a backup, not just using whatever is in pg_xlog/pg_wal.. > >> It's not a WAL file -- it's a file backing a table. > > Interesting. > > > >>>> Since backups are taken from slave server, all backups are also > corrupt. > >>> If you aren't following the appropriate process to perform a backup > >>> then, yes, you're going to end up with corrupt and useless/bad > backups. > >> A few guys went over the backup-taking protocol upthread already. > >> > >> But anyway the backup tool is a moot point. The problem doesn't > >> originate in the backup -- it originates in the standby, from where the > >> backup is taken. The file can be seen as size 0 in the standby. > >> Edson's question is: why wasn't the problem detected in the standby? > >> It seems a valid question to me, to which we currently we don't > have any > >> good answer. > > The last message on that thread seems pretty clear to me- the comment is > > "I think this is a failure in standby build." It's not clear what that > > failure was but I agree it doesn't appear related to the backup tool > > (the comment there is "I'm using rsync"), or, really, PostgreSQL at all > > (a failure during the build of the replica isn't something we're > > necessairly going to pick up on..). > > > > As discussed on this thread, zero-byte files are entirely valid to > > appear in the PostgreSQL data directory. > > > > To try and dig into what happened, I'd probably look at what forks there > > are of that relation, the entry in pg_class, and try to figure out how > > it is that replication isn't complaining when the file on the primary > > appeared to be modified well after the last modify timestamp on the > > replica. If it's possible to replica this into a test environment, > > maybe even do a no-op update of a row of that table and see what happens > > with replication. One thing I wonder is if this table used to be > > unlogged and it was later turned into a logged table but something > > didn't quite happen correctly with that. I'd also suggest looking for > > other file size mismatches between the primary and the replica. > > > > Thanks! > > > > Stephen > The table was never unlogged. From very beginning, it was always logged. > I've dozens of databases with exactly same setup - and right now, I'm > rebuilding the slave server. Instead of investigating something probably > I will not find the cause, I would like to have a alert for the future. > > Would be possible to include in future versions: > 1) After start standby, standby run all WAL files until it is > synchronized with master (current behavior) > 3) Before getting into "accept read only queries", check if all base > files have same size as master server (new behavior). In case something > is different, throw an error and stop database startup? > 4) Then start "accept read only queries" (current behavior) > ??? > > Thanks, > > > Edson Thanks, Brent. I'll think about it. Regards, Edson
pgsql-general by date: