Re: index and table corruption - Mailing list pgsql-general

From Jerry Sievers
Subject Re: index and table corruption
Msg-id 864n64373q.fsf@jerry.enova.com
In response to Re: index and table corruption  ("Anand Kumar, Karthik" <Karthik.AnandKumar@classmates.com>)
Responses Re: index and table corruption  ("Anand Kumar, Karthik" <Karthik.AnandKumar@classmates.com>)
List pgsql-general
"Anand Kumar, Karthik" <Karthik.AnandKumar@classmates.com> writes:

> Thanks Shaun!
>
> Yes, we're getting synchronous_commit on right now.
>
> The log_min_duration was briefly set to 0 at the time I sent out the post,
> just to see what statements were logged right before everything went to
> hell. Didn't yield much since we very quickly realized we couldn't cope
> with the volume of logs.
>
> We also noticed that when trying to recover from a snapshot and replay
> archived wal logs, it would corrupt right away, in under an hour. When
> recovering from snapshots *without* replaying wal logs, we go on for a day
> or two without the problem, so it does seem like wal logs are probably not
> being flushed to disk as expected.

Make sure your snapshots are truly atomic, as you probably assume they
are. They must be if you expect a consistent cluster after startup and
crash recovery.

That is, atomicity is required whenever you take snaps at arbitrary
times without wrapping them in pg_start_backup()/pg_stop_backup() *and*
replaying WAL up to the consistent recovery point.
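A minimal Python sketch of the wrapping described above, assuming a 9.x-era server using exclusive backup mode. `execute` stands for any function that runs one parameterized SQL statement on a single open session (for example a psycopg2 cursor's `execute`), and `take_snapshot` is a hypothetical hook that triggers the SAN or LVM snapshot and returns its identifier; both names are illustrative assumptions, not something from the original post.

```python
from typing import Callable


def snapshot_with_backup_mode(
    execute: Callable[[str, tuple], object],
    take_snapshot: Callable[[], str],
    label: str = "storage_snapshot",
) -> str:
    """Bracket a storage-level snapshot with pg_start_backup()/pg_stop_backup().

    Assumptions (hypothetical, for illustration only):
      * `execute(sql, params)` runs one statement on a single open session.
      * `take_snapshot()` triggers the SAN/LVM snapshot and returns its id.
      * The server is pre-15 and uses exclusive backup mode.
    """
    # Enter backup mode; fast=true requests an immediate checkpoint.
    execute("SELECT pg_start_backup(%s, true)", (label,))
    try:
        snap_id = take_snapshot()
    finally:
        # Always leave backup mode, even if the snapshot itself failed,
        # so the server is not left stuck in backup mode.
        execute("SELECT pg_stop_backup()", ())
    return snap_id
```

The `finally` clause guarantees backup mode is ended even when the snapshot step raises. The snapshot alone is still not sufficient: on restore, the archived WAL generated between the start and stop calls must be replayed until recovery reaches a consistent point. On PostgreSQL 15 and later these functions are named pg_backup_start()/pg_backup_stop(), exclusive mode no longer exists, and the backup_label contents returned by pg_backup_stop() must be saved alongside the snapshot.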

If you're snapping something like a remote-site mirror fed by SAN
block-level replication, the image you're snapping may not be consistent
unless the snap is taken only after all blocks changed since the last
replication tick have been flushed.

I say that because I came into a company that had been doing snaps this
way since eons ago and assumed that, since the clusters would start up
and pass trivial checks, things were OK.

As soon as you subjected an instance derived this way to something
wide-ranging, however, such as an all-table vacuum/analyze or a dumpall,
corruption was observed soon after launching it.

FWIW

>
> Will update once we get onto the new h/w to see if that fixes it.
>
> Thanks,
> Karthik

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800
