Re: Corrupted Data ? - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Corrupted Data ?
Date
Msg-id 6499cfc7-2c89-4d3b-905d-18ceac71440d@aklaver.com
In response to Re: Corrupted Data ?  (Ioana Danes <ioanadanes@gmail.com>)
Responses Re: Corrupted Data ?  (Ioana Danes <ioanadanes@gmail.com>)
List pgsql-general
On 08/12/2016 08:30 AM, Ioana Danes wrote:
>
>
> On Fri, Aug 12, 2016 at 11:26 AM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>
>     On 08/12/2016 08:10 AM, Ioana Danes wrote:
>
>
>
>         On Fri, Aug 12, 2016 at 10:47 AM, Francisco Olarte
>         <folarte@peoplecall.com> wrote:
>
>             CCing to the list...
>
>         Thanks
>
>
>             On Fri, Aug 12, 2016 at 4:10 PM, Ioana Danes
>             <ioanadanes@gmail.com> wrote:
>             >> given 318220 and 318216 are just a bit away (4db08/4db0c),
>             >> and it repeats sporadically, have you ruled out (by having
>             >> page checksums or another mechanism) a potential disk
>             >> read/write error?
>             >>
>             >>
>             >> > Also the index is correct on db3 as the record in case
>             >> > (with drawid = 318216) is retrieved if I filter by
>             >> > drawid = 318220
>             >>
>             >> Especially if this happens, you may have slightly bad
>             >> disks/RAM leading to this kind of problem.
>             >>
>             >
>             > Could be. I also had some issues with an rsync between db3 and
>             > drdb a week ago that did not complete for bigger files
>             > (> 200MB) and gave me some corruption messages. Then the
>             > system was rebooted and everything seemed fine, but
>             > apparently it is not.
>             > I am planning to drop & create the table from a good backup,
>             > and if that does not fix the issue then I will rebuild the
>             > server.
>
>             I would check whatever logs you can (syslog or eventlog,
>             SMART log, etc.), hunting for disk errors (sometimes they are
>             reported). This kind of problem, with programs as well tested
>             as postgres and rsync, tends to indicate a controller/RAM/disk
>             going bad (in your case it could be caused by a single bit
>             getting flipped in a sector for the data portion of the table,
>             and not being propagated either because it happened after your
>             sync of drdb, or because it was synced from the WAL and not the
>             table, or because it was read from the disk cache).
>
>         I agree, unfortunately I did not find any clues about corruption
>         or any anomalies in the logs.
>         I will work tonight to rebuild that table and see where I go
>         from there.
>
>
>     The db3 database is on a different machine from all the other
>     databases you set up, correct?
>
> Yes, they are all different VMs. The first 3 dbs are on the same cluster,
> but drdb is a remote machine.

Aah, another player in the mix.

What virtualization technology are you using?

>
> Thank you
>
>
>
>         Thanks,
>         ioana
>
>             Francisco Olarte.
>
>
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com
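
The quoted observation that 318220 and 318216 are "just a bit away" (4db0c/4db08) can be checked with a short sketch. This is only an illustration in Python; the helper name `differing_bits` is hypothetical and not something used in the thread. It XORs the two drawid values and lists the bit positions where they differ. A single differing position is consistent with, but does not prove, a one-bit flip in storage or memory.

```python
def differing_bits(a: int, b: int) -> list[int]:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b  # XOR leaves a 1 only where the two values disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# The two drawid values mentioned in the thread.
print(hex(318220), hex(318216))        # 0x4db0c 0x4db08
print(differing_bits(318220, 318216))  # [2]
```

The values differ only in bit 2 (value 4), which matches the hex pair quoted above.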

