Re: Corrupted Data ? - Mailing list pgsql-general

From Ioana Danes
Subject Re: Corrupted Data ?
Date
Msg-id CAPg0s+4WpL0W2mbvRbkt=67bRWttSUiJVJPwcTEVT_5Okw5jyg@mail.gmail.com
Whole thread Raw
In response to Re: Corrupted Data ?  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: Corrupted Data ?  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general


On Mon, Aug 8, 2016 at 12:55 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 08/08/2016 09:47 AM, Ioana Danes wrote:


On Mon, Aug 8, 2016 at 12:37 PM, Adrian Klaver
<adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:

    On 08/08/2016 09:28 AM, Ioana Danes wrote:



        On Mon, Aug 8, 2016 at 12:19 PM, Adrian Klaver
        <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
        <mailto:adrian.klaver@aklaver.com
        <mailto:adrian.klaver@aklaver.com>>> wrote:

            On 08/08/2016 09:11 AM, Ioana Danes wrote:

                Hi,

                I suspect I am having a case of data corruption. Here
        are the
                details:

                I am running postgres 9.4.8:

                postgresql94-9.4.8-1PGDG.rhel7.x86_64
                postgresql94-contrib-9.4.8-1PGDG.rhel7.x86_64
                postgresql94-libs-9.4.8-1PGDG.rhel7.x86_64
                postgresql94-server-9.4.8-1PGDG.rhel7.x86_64

                on CentOS Linux release 7.2.1511 (Core)

                This is happening in a production environment but
        luckily on the
                reporting database.
                I have a cluster of 3 databases, db1 and db2 are masters and
                replicate
                between each other and also replicate to db3 (db1 <->
        db2, db1
                -> db3,
                db2 -> db3).
                For replication I am using Bucardo.


            I would say this is more a question for the Burcardo list:

            https://mail.endcrypt.com/mailman/listinfo/bucardo-general
        <https://mail.endcrypt.com/mailman/listinfo/bucardo-general>
            <https://mail.endcrypt.com/mailman/listinfo/bucardo-general
        <https://mail.endcrypt.com/mailman/listinfo/bucardo-general>>

            I am just not seeing that replicating two masters on to a single
            database is going to end well.


        Only one master is active at one time the other one is in stand
        by that
        is a topic for another discussion but in our case that works well.

        That was my first assumption, that it is a kind of a race
        condition or a
        bug on replication but I quickly ruled that out because that
        does not
        explain why when I filtered the table by transactionid = 75315815 it
        shows one record with transactionid 75315811...

        select gameplayid, transactionid, encodedplay from
        abrazo.matchgameplay
        where transactionid in (75315815) order by transactionid;;
         gameplayid | transactionid | encodedplay
        ------------+---------------+--------------
          160019271 |      75315815 | mix:9,0,9
          160019269 |      75315815 | mix:9,8,9
          160019267 |      75315815 | mix:9,2,2
          160019265 |      75315815 | mix:2,2,8
          160019263 |      *75315811 *| backup:1,9,1
          160019261 |      75315815 | backup:2,0,9

        So I don't think it is a replication issue...


    Other that, if I am following correctly, it is on the database(db3)
    being replicated to. The only way db3 is getting its data is through
    replication, is that correct?. On the master databases the data is
    correct.

OK, let's assume that what you're saying is correct and the replication
has a bug, or corruption or whatever that is and the record gets created
with transactionid = 75315811. Bucardo replication is trigger based and
it is using a copy command to insert the new records into the replicated
database.

Then how can I explain that my query select gameplayid, transactionid,
encodedplay from abrazo.matchgameplay where transactionid in (75315815)
order by transactionid; returns me a record with transactionid 75315811???

Corrupted index on db3?

yes
 
Might want to look in the db3 logs to see if anything pops out.

I checked the logs, no traces of errors or corruption.
 
I just do not know enough about Burcardo to be of much help beyond that.
 

it is trigger based, it saves the ids of the inserted record in a delta table and then on sync it creates copy commands to the slave. Even if there is a bug or corruption in that process I don't see how that corrupts the index on db3...


So it is either replication bug + index corruption on db3 or data corruption on db3...


In response to Melvin, the query returns no rows:


SELECT n.nspname,
               i.relname,
               i.indexrelname,
               CASE WHEN idx.indisprimary
                         THEN 'pkey'
                          WHEN idx.indisunique
                          THEN 'uidx'
                          ELSE 'idx'
                  END AS type,
                  'INVALID'
  FROM pg_stat_all_indexes i
    JOIN pg_class c     ON (c.oid = i.relid)
    JOIN pg_namespace n ON (n.oid = c.relnamespace)
    JOIN pg_index idx   ON (idx.indexrelid =  i.indexrelid )
WHERE idx.indisvalid = FALSE  
ORDER BY 1, 2;

 nspname | relname | indexrelname | type | ?column?
---------+---------+--------------+------+----------
(0 rows)


Thank you for your thoughts,
ioana


Thanks,
ioana



    --
    Adrian Klaver
    adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>




--
Adrian Klaver
adrian.klaver@aklaver.com

pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Corrupted Data ?
Next
From: Adrian Klaver
Date:
Subject: Re: Corrupted Data ?