Re: Corrupted Data ? - Mailing list pgsql-general

From Ioana Danes
Subject Re: Corrupted Data ?
Date
Msg-id CAPg0s+7HTxoTKtTtfRdVtjqLwACdH+-8MY+3tCwAWJVncSw0Og@mail.gmail.com
In response to Re: Corrupted Data ?  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: Corrupted Data ?  (Adrian Klaver <adrian.klaver@aklaver.com>)
Re: Corrupted Data ?  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general


On Fri, Aug 12, 2016 at 11:26 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 08/12/2016 08:10 AM, Ioana Danes wrote:


On Fri, Aug 12, 2016 at 10:47 AM, Francisco Olarte
<folarte@peoplecall.com> wrote:

    CCing to the list...

Thanks


    On Fri, Aug 12, 2016 at 4:10 PM, Ioana Danes <ioanadanes@gmail.com> wrote:
    >> given 318220 and 318216 are just a bit away ( 4db0c/4db08 ), and it
    >> repeats sporadically, have you ruled out ( by having page checksums or
    >> other mechanism ) a potential disk read/write error ?
    >>
    >>
    >> > Also the index is correct on db3 as the record in case (with drawid =
    >> > 318216) is retrieved if I filter by drawid = 318220
    >>
    >> Especially if this happens, you may have some slightly bad disks/RAM
    >> leading to this kind of problem.
    >>
    >
    > Could be. I also had some issues with an rsync between db3 and drdb a week
    > ago that did not complete for bigger files (> 200MB) and gave me some
    > corruption messages. Then the system was rebooted and everything seemed
    > fine, but apparently it is not.
    > I am planning to drop & create the table from a good backup, and if that does
    > not fix the issue then I will rebuild the server.

    I would check whatever logs you can ( syslog or eventlog, smart log,
    etc.. ) hunting for disk errors ( sometimes they are reported ). This
    kind of problem, with programs as well tested as postgres and rsync,
    tends to indicate a controller/RAM/disk going bad ( in your case it could
    be caused by a single bit getting flipped in a sector for the data
    portion of the table, and not being propagated either because it
    happened after your sync of drdb or because it was synced from the WAL
    and not the table, or because it was read from the disk cache ).
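
    The "single bit flipped" reading of the two drawid values can be
    checked with a little arithmetic. The following is only a minimal
    sketch: the helper name differing_bits is made up for illustration,
    and it says nothing about where on disk or in RAM the flip happened.
    It just shows that 318216 and 318220 differ in exactly one binary
    digit.

```python
def differing_bits(a: int, b: int) -> list[int]:
    """Return the positions (0 = least significant) of the bits where a and b differ."""
    diff = a ^ b  # XOR leaves a 1 only where the two values disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# The two drawid values discussed in the thread.
heap_value = 318216   # value seen in the table row
index_value = 318220  # value the index lookup matches on

print(hex(heap_value), hex(index_value))        # 0x4db08 0x4db0c
print(differing_bits(heap_value, index_value))  # [2]
```

    A one-element result means the two values are a single bit apart
    (here bit 2, worth 4), which fits a hardware bit flip better than an
    application bug, although it does not prove one.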

I agree, unfortunately I did not find any clues about corruption or any
anomalies in the logs.
I will work tonight to rebuild that table and see where I go from there.

The db3 database is on a different machine from all the other databases you set up, correct?

Yes, they are all different VMs. The first 3 dbs are on the same cluster, but drdb is a remote machine.

Thank you
 

Thanks,
ioana

    Francisco Olarte.




--
Adrian Klaver
adrian.klaver@aklaver.com
