Re: Corrupted Data ? - Mailing list pgsql-general

From Melvin Davidson
Subject Re: Corrupted Data ?
Date
Msg-id CANu8FiwzJhZVWL-6R6WyVAeO6JVAv=dVEaavEO8oGsJP6NYCrw@mail.gmail.com
Whole thread Raw
In response to Re: Corrupted Data ?  (Ioana Danes <ioanadanes@gmail.com>)
List pgsql-general


On Mon, Aug 8, 2016 at 12:47 PM, Ioana Danes <ioanadanes@gmail.com> wrote:


On Mon, Aug 8, 2016 at 12:37 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 08/08/2016 09:28 AM, Ioana Danes wrote:


On Mon, Aug 8, 2016 at 12:19 PM, Adrian Klaver
<adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:

    On 08/08/2016 09:11 AM, Ioana Danes wrote:

        Hi,

        I suspect I am having a case of data corruption. Here are the
        details:

        I am running postgres 9.4.8:

        postgresql94-9.4.8-1PGDG.rhel7.x86_64
        postgresql94-contrib-9.4.8-1PGDG.rhel7.x86_64
        postgresql94-libs-9.4.8-1PGDG.rhel7.x86_64
        postgresql94-server-9.4.8-1PGDG.rhel7.x86_64

        on CentOS Linux release 7.2.1511 (Core)

        This is happening in a production environment but luckily on the
        reporting database.
        I have a cluster of 3 databases, db1 and db2 are masters and
        replicate
        between each other and also replicate to db3 (db1 <-> db2, db1
        -> db3,
        db2 -> db3).
        For replication I am using Bucardo.


    I would say this is more a question for the Burcardo list:

    https://mail.endcrypt.com/mailman/listinfo/bucardo-general
    <https://mail.endcrypt.com/mailman/listinfo/bucardo-general>

    I am just not seeing that replicating two masters on to a single
    database is going to end well.


Only one master is active at one time the other one is in stand by that
is a topic for another discussion but in our case that works well.

That was my first assumption, that it is a kind of a race condition or a
bug on replication but I quickly ruled that out because that does not
explain why when I filtered the table by transactionid = 75315815 it
shows one record with transactionid 75315811...

select gameplayid, transactionid, encodedplay from abrazo.matchgameplay
where transactionid in (75315815) order by transactionid;;
 gameplayid | transactionid | encodedplay
------------+---------------+--------------
  160019271 |      75315815 | mix:9,0,9
  160019269 |      75315815 | mix:9,8,9
  160019267 |      75315815 | mix:9,2,2
  160019265 |      75315815 | mix:2,2,8
  160019263 |      *75315811 *| backup:1,9,1
  160019261 |      75315815 | backup:2,0,9

So I don't think it is a replication issue...


Other that, if I am following correctly, it is on the database(db3) being replicated to. The only way db3 is getting its data is through replication, is that correct?. On the master databases the data is correct.

OK, let's assume that what you're saying is correct and the replication has a bug, or corruption or whatever that is and the record gets created with transactionid = 75315811. Bucardo replication is trigger based and it is using a copy command to insert the new records into the replicated database.

Then how can I explain that my query select gameplayid, transactionid, encodedplay from abrazo.matchgameplay where transactionid in (75315815) order by transactionid; returns me a record with transactionid 75315811???

Thanks,
ioana



--
Adrian Klaver
adrian.klaver@aklaver.com


You did provide the full table structure, but this might possibly be a case of index corruption.
Try executing the following query and if any show up as invalid, just drop and rebuild.

SELECT n.nspname,
               i.relname,
               i.indexrelname,
               CASE WHEN idx.indisprimary
                         THEN 'pkey'
                          WHEN idx.indisunique
                          THEN 'uidx'
                          ELSE 'idx'
                  END AS type,
                  'INVALID'
  FROM pg_stat_all_indexes i
    JOIN pg_class c     ON (c.oid = i.relid)
    JOIN pg_namespace n ON (n.oid = c.relnamespace)
    JOIN pg_index idx   ON (idx.indexrelid =  i.indexrelid )
WHERE idx.indisvalid = FALSE  
ORDER BY 1, 2;

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

pgsql-general by date:

Previous
From: Ioana Danes
Date:
Subject: Re: Corrupted Data ?
Next
From: Adrian Klaver
Date:
Subject: Re: Corrupted Data ?