
Re: index and table corruption - Mailing list pgsql-general

From	Jerry Sievers
Subject	Re: index and table corruption
Date	December 19, 2013 20:43:06
Msg-id	864n64373q.fsf@jerry.enova.com
In response to	Re: index and table corruption ("Anand Kumar, Karthik" <Karthik.AnandKumar@classmates.com>)
Responses	Re: index and table corruption
List	pgsql-general


"Anand Kumar, Karthik" <Karthik.AnandKumar@classmates.com> writes:

> Thanks Shaun!
>
> Yes, we're getting synchronous_commit on right now.
>
> The log_min_duration_statement was briefly set to 0 at the time I sent out the post,
> just to see what statements were logged right before everything went to
> hell. Didn't yield much since we very quickly realized we couldn't cope
> with the volume of logs.
>
> We also noticed that when trying to recover from a snapshot and replay
> archived wal logs, it would corrupt right away, in under an hour. When
> recovering from snapshots *without* replaying wal logs, we go on for a day
> or two without the problem, so it does seem like wal logs are probably not
> being flushed to disk as expected.

Make sure your snapshots really are atomic, as you probably assume they
are. They must be if you expect a consistent cluster after startup and
crash recovery.

That applies if you are taking snaps at arbitrary times and not wrapping
them with pg_start_backup()/pg_stop_backup() *and* replaying WAL up to
the consistent recovery point.
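
For illustration, here is a minimal Python sketch of the wrapping step.
It assumes a DB-API cursor (for example from psycopg2) on a superuser
connection, and a server old enough to still have pg_start_backup() and
pg_stop_backup() (they were renamed in PostgreSQL 15). The helper name
backup_window is made up for this example.

```python
import contextlib


@contextlib.contextmanager
def backup_window(cursor, label):
    """Bracket a filesystem/SAN snapshot with pg_start_backup()/pg_stop_backup().

    `cursor` is assumed to be a DB-API cursor on a superuser connection.
    `backup_window` is a hypothetical helper, not part of any library.
    """
    # Forces a checkpoint and records the backup start location in WAL.
    cursor.execute("SELECT pg_start_backup(%s)", (label,))
    try:
        yield  # take the snapshot while inside this block
    finally:
        # Always leave backup mode, even if the snapshot step raised.
        cursor.execute("SELECT pg_stop_backup()")
```

You would take the snapshot inside a `with backup_window(cur, "nightly-snap"):`
block. Wrapping alone is not enough: the WAL generated between the start
and stop calls still has to be archived and replayed when you restore.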

If you're snapping something like a remote-site mirror fed by SAN
block-level replication, then unless the snap is taken right after all
blocks changed since the last tick have been flushed, the image you're
snapping may not be consistent.

I say that because I came into a company that had been doing snaps this
way for ages and assumed that, since the clusters would start up and
pass trivial checks, things were OK.

However, as soon as you subjected an instance derived this way to
something wide-ranging, such as an all-table vac/analyze or a dumpall,
corruption was observed soon after launching it.
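
A rough sketch of that kind of wide-ranging pass against a restored
instance, in Python. The function names, host and port are placeholders
I made up; vacuumdb and pg_dumpall are the stock client tools. A clean
run does not prove the cluster is sound; it only forces every table to
be read, which is what tended to surface the damage.

```python
import os
import subprocess


def smoke_test_commands(host, port):
    """Build argv lists that touch every table in every database."""
    port = str(port)
    return [
        # Vacuum and analyze all databases in the cluster.
        ["vacuumdb", "--all", "--analyze", "--host", host, "--port", port],
        # Dump the whole cluster and discard the output; only the reads matter.
        ["pg_dumpall", "--host", host, "--port", port, "--file", os.devnull],
    ]


def run_smoke_test(commands, runner=subprocess.run):
    """Run each command in order, raising on the first non-zero exit."""
    for argv in commands:
        runner(argv, check=True)
```

Something like run_smoke_test(smoke_test_commands("localhost", 5433))
would run both against a scratch instance started from the snapshot.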

FWIW

>
> Will update once we get onto the new h/w to see if that fixes it.
>
> Thanks,
> Karthik

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800
