Re: Corrupted Data ? - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Corrupted Data ?
Date
Msg-id 0eef7011-cff2-7144-a78e-bf5f00e53d40@aklaver.com
In response to Re: Corrupted Data ?  (Ioana Danes <ioanadanes@gmail.com>)
Responses Re: Corrupted Data ?  (Ioana Danes <ioanadanes@gmail.com>)
List pgsql-general
On 08/12/2016 08:10 AM, Ioana Danes wrote:
>
>
> On Fri, Aug 12, 2016 at 10:47 AM, Francisco Olarte
> <folarte@peoplecall.com <mailto:folarte@peoplecall.com>> wrote:
>
>     CCing to the list...
>
> Thanks
>
>
>     On Fri, Aug 12, 2016 at 4:10 PM, Ioana Danes <ioanadanes@gmail.com
>     <mailto:ioanadanes@gmail.com>> wrote:
>     >> given 318220 and 318216 are just a bit away ( 4db08/4db0c ), and it
>     >> repeats sporadically, have you ruled out ( by having page
>     checksums or
>     >> other mechanism ) a potential disk read/write error ?
>     >>
>     >>
>     >> > Also the index is correct on db3 as the record in case (with
>     drawid =
>     >> > 318216) is retrieved if I filter by drawid = 318220
>     >>
>     >> Especially if this happens, you may have some slightly bad disks/RAM
>     >> leading to this kind of problem.
>     >>
>     >
>     > Could be. I also had some issues with an rsync between db3 and
>     drdb a week
>     > ago that did not complete for bigger files (> 200MB) and gave me some
>     > corruption messages. Then the system was rebooted and everything
>     seemed
>     > fine but apparently it is not.
>     > I am planning to drop & create the table from a good backup and if
>     that does
>     > not fix the issue then I will rebuild the server.
>
>     I would check whatever logs you can ( syslog or eventlog, smart log,
>     etc.. ) hunting for disk errors ( sometimes they are reported ). This
>     kind of problem, with programs as tested as postgres and rsync, tends
>     to indicate controller/RAM/disk going bad ( in your case it could be
>     caused by a single bit getting flipped in a sector for the data
>     portion of the table, and not being propagated either because it
>     happened after your sync of drdb or because it was synced from the WAL
>     and not the table, or because it was read from the disk cache ).
>
> I agree, unfortunately I did not find any clues about corruption or any
> anomalies in the logs.
> I will work tonight to rebuild that table and see where I go from there.

The db3 database is on a different machine from all the other databases
you set up, correct?
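
As a side note on the single-bit-flip theory quoted above: the two drawid values from the thread (318220 and 318216, i.e. 4db0c and 4db08) can be compared bit by bit. The snippet below is only an illustrative sketch; the helper name differing_bits is made up for this example and is not part of PostgreSQL or any tool mentioned in the thread.

```python
def differing_bits(a: int, b: int) -> list[int]:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b  # XOR leaves a 1 only where the two values disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# drawid values taken from the thread: the value used in the filter and
# the value actually stored in the returned row.
expected, found = 318220, 318216

print(hex(expected), hex(found))        # 0x4db0c 0x4db08
print(differing_bits(expected, found))  # [2]
```

The two values differ in exactly one bit, which is consistent with (but does not prove) a flipped bit in the stored row rather than a logic error.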

>
> Thanks,
> ioana
>
>     Francisco Olarte.
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

