Re: Corrupted Data ? - Mailing list pgsql-general

From Ioana Danes
Subject Re: Corrupted Data ?
Date
Msg-id CAPg0s+4_EkQH3pSRvDLqZFNpEACMEWrSbYMPgbShLjgMVs1Kgw@mail.gmail.com
Whole thread Raw
In response to Re: Corrupted Data ?  (Melvin Davidson <melvin6925@gmail.com>)
List pgsql-general
Hi Melvin,

On Fri, Aug 12, 2016 at 9:36 AM, Melvin Davidson <melvin6925@gmail.com> wrote:


On Fri, Aug 12, 2016 at 9:09 AM, Ioana Danes <ioanadanes@gmail.com> wrote:
Hello Everyone,

I have new information on this case.  I also open a post for Bucardo because I am still not sure what triggers this problem.

The problem happened again on the same table but on another field. Few days ago I started a fourth database called drdb that is a PITR slave from db3.

DB1, DB2, DRDB:

-[ RECORD 1 ]-------+--------------------------
gameplayid           | 324455566
systemuserid        | 515151
gameid                 | 41
transactionid         | 666556533
drawid                  | 318220
....

DB3:
-[ RECORD 1 ]-------+--------------------------
gameplayid           | 32445556
systemuserid        | 515151
gameid                 | 41
transactionid         | 666556533
drawid                  | 318216



Here are the facts I know:

August 10 @ 11:10
    - The record was created on db1 and replicated to db2 and db3
August 11 @ 2:30
    - db1, db2 and db3 are in sync (I have a script that compares the data for all 3 dbs every night @ 2:30 am)
August 12 @ 2:30
    - db3 is out of sync because of this field (drawid)
    - drdb (which is PITRed from db3) is in sync with db1 and db2?????


Because drdb (PITR slave from db3) is in sync with db1 and db2 and because the base backup was taken before the record in case was created, I believe that the xlogs are fine and I have a data kind of corruption on db3 on the data file for that table that happened after August 11 @ 2:30 (because the compare script found the dbs in sync)...

Also the index is correct on db3 as the record in case (with drawid = 318216) is retrieved if I filter by drawid = 318220

Any help is greatly appreciated,

Thank you



Just out of curiosity, have you tried to REINDEX any of the indexes that might be involved on db3?
Even though PostgreSQL considers them valid, it's possible that their pointers could be incorrect.

Additionally, I would compare the EXPLAIN query on all 3 db's and see if there is any difference between the three.


The (drawid) index is correct:

There are too many records for that draw so I will create a temp table with all the records for that draw and then retrieve the faulty one:

# create temporary table tmp_test as select * from abrazo.matchgameplay where drawid = 318220 order by datemodified;
SELECT 16611

# select * from tmp_test where drawid = 318216; -- this query should return nothing
-[ RECORD 1 ]-------+--------------------------
gameplayid           | 159329792
systemuserid        | 12797
gameid                 | 4
transactionid         | 75006684
drawid                  | 318216

# explain select * from abrazo.matchgameplay where drawid = 318220 order by datemodified;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Sort  (cost=1039.41..1058.65 rows=7696 width=103)
   Sort Key: datemodified
   ->  Index Scan using ix_matchgame_drawid on matchgameplay  (cost=0.44..542.64 rows=7696 width=103)
         Index Cond: (drawid = 318220)
(4 rows)


I will probably end up rebuilding the table from the other good dbs...

The only problem I have is that the values I see in these cases don't look like corruption to be, they are valid information (for example this is an id of another draw in the same period from another game... )

Thanks,
ioana


--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


pgsql-general by date:

Previous
From: Francisco Olarte
Date:
Subject: Re: Corrupted Data ?
Next
From: Adrian Klaver
Date:
Subject: Re: Standby crash