Re: DATA corruption after promoting slave to master - Mailing list pgsql-general

From Karthik Iyer
Subject Re: DATA corruption after promoting slave to master
Date
Msg-id 53AC31DE.7030902@directi.com
Whole thread Raw
In response to Re: DATA corruption after promoting slave to master  (Shaun Thomas <sthomas@optionshouse.com>)
List pgsql-general
Thanks Shaun.

We reindexed  all the primary and unique keys of all the tables, But we
did not reindex the tables. You think we should do that also ?

Also, do you think we should do a clean dump restore to eliminate all
data inconsistencies.

One more query :

We managed to get the old server up. We are trying to play the
difference in data by checking the log files(statement logs). You think
there is any other easy alternatives ?

Thanks in advance,

- Karthik


On Thursday 26 June 2014 01:09 AM, Shaun Thomas wrote:
> On 06/25/2014 06:29 AM, Karthik Iyer wrote:
>
>> [2]  We also have a daemon process which copies the latest partial WAL
>> log file (which postgres is currently writing to, under pg_xlog/) every
>> 3 secs to a different location.
>
> No. No, no, no, no no. No.
>
> Also, no. Partial WAL files are not valid for recovery. In fact, I'm
> surprised the standby node even applied it at all.
>
>> We are seeing these problems in the newly promoted master now:
>>
>>      1. when we run queries on primary key, we don't get the rows even
>> if it exist in db. However if we force query not to use index, we get
>> those entries.
>>      2. there are duplicate values for primary keys
>
> This is no surprise. Your slave has partial data commits, which means
> your table, index, or possibly both, are corrupt.
>
> The first thing you need to do is back up any tables you've noticed
> are having this behavior. Second, try to reindex the tables that are
> having problems. The errors you are seeing are due to the data and
> indexes being out of sync. If you get an error that says the reindex
> fails due to duplicate values, you can do this:
>
> SET enable_indexscan TO false;
> SET enable_bitmapscan TO false;
>
> SELECT primary_key, count(1)
>   FROM broken_table
>  GROUP BY 1
> HAVING count(1) > 1;
>
> For any ID that comes back, do this:
>
> SELECT ctid, *
>   FROM broken_table
>  WHERE primary_key = [value(s) from above];
>
> Then you need to delete one of the bad rows after deciding which. Use
> the CTID of the row you want to delete:
>
> DELETE FROM broken_table
>  WHERE ctid = 'whatever';
>
> Then reindex the table so the correct values are properly covered.
> Doing this for all of your corrupt tables may take a while depending
> on how many there are.
>



pgsql-general by date:

Previous
From: Dennis Ryan
Date:
Subject: Function Syntax Help
Next
From: Karthik Iyer
Date:
Subject: Re: DATA corruption after promoting slave to master