Home > mailing lists

Re: DATA corruption after promoting slave to master - Mailing list pgsql-general

From	Karthik Iyer
Subject	Re: DATA corruption after promoting slave to master
Date	June 26, 2014 16:22:05
Msg-id	53AC31DE.7030902@directi.com Whole thread Raw
In response to	Re: DATA corruption after promoting slave to master (Shaun Thomas <sthomas@optionshouse.com>)
List	pgsql-general

Tree view

Thanks Shaun.

We reindexed  all the primary and unique keys of all the tables, But we
did not reindex the tables. You think we should do that also ?

Also, do you think we should do a clean dump restore to eliminate all
data inconsistencies.

One more query :

We managed to get the old server up. We are trying to play the
difference in data by checking the log files(statement logs). You think
there is any other easy alternatives ?

Thanks in advance,

- Karthik


On Thursday 26 June 2014 01:09 AM, Shaun Thomas wrote:
> On 06/25/2014 06:29 AM, Karthik Iyer wrote:
>
>> [2]  We also have a daemon process which copies the latest partial WAL
>> log file (which postgres is currently writing to, under pg_xlog/) every
>> 3 secs to a different location.
>
> No. No, no, no, no no. No.
>
> Also, no. Partial WAL files are not valid for recovery. In fact, I'm
> surprised the standby node even applied it at all.
>
>> We are seeing these problems in the newly promoted master now:
>>
>>      1. when we run queries on primary key, we don't get the rows even
>> if it exist in db. However if we force query not to use index, we get
>> those entries.
>>      2. there are duplicate values for primary keys
>
> This is no surprise. Your slave has partial data commits, which means
> your table, index, or possibly both, are corrupt.
>
> The first thing you need to do is back up any tables you've noticed
> are having this behavior. Second, try to reindex the tables that are
> having problems. The errors you are seeing are due to the data and
> indexes being out of sync. If you get an error that says the reindex
> fails due to duplicate values, you can do this:
>
> SET enable_indexscan TO false;
> SET enable_bitmapscan TO false;
>
> SELECT primary_key, count(1)
>   FROM broken_table
>  GROUP BY 1
> HAVING count(1) > 1;
>
> For any ID that comes back, do this:
>
> SELECT ctid, *
>   FROM broken_table
>  WHERE primary_key = [value(s) from above];
>
> Then you need to delete one of the bad rows after deciding which. Use
> the CTID of the row you want to delete:
>
> DELETE FROM broken_table
>  WHERE ctid = 'whatever';
>
> Then reindex the table so the correct values are properly covered.
> Doing this for all of your corrupt tables may take a while depending
> on how many there are.
>

pgsql-general by date:

From: Dennis Ryan
Date: 26 June 2014, 16:21:35
Subject: Function Syntax Help

From: Karthik Iyer
Date: 26 June 2014, 16:22:14
Subject: Re: DATA corruption after promoting slave to master

Re: DATA corruption after promoting slave to master - Mailing list pgsql-general

Previous

Next