Re: Corruption of files in PostgreSQL - Mailing list pgsql-general

From Franz.Rasper@izb.de
Subject Re: Corruption of files in PostgreSQL
Date
Msg-id 11EC9A592C31034C88965C87AF18C2A702B834BA@m0000s61
In response to Corruption of files in PostgreSQL  ("Paolo Bizzarri" <pibizza@gmail.com>)
List pgsql-general
Upgrading to a new kernel would be a very good idea, but you will have to
test the new kernel too.

In my opinion, the cause of your problem is either the Linux kernel or
your hardware.

I guess you will have to update your kernel anyway because of security bugs.
Changing the kernel is probably easier than changing the hardware.
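
Before and after the upgrade it may help to check which kernel each server
is actually running. The following is only a minimal sketch: it assumes
that the affected range is "2.5 up to and including 2.6.19", which is my
reading of the quoted message below, and the function names are made up
for illustration. It looks only at the version number, so it cannot tell
whether a distribution kernel carries a backported fix.

```python
import platform
import re

# Assumption: the range is taken from the quoted discussion ("back to the
# 2.5 kernels", fixed in "post 2.6.19 kernels"). It is not an authoritative
# list of affected versions.
FIRST_SUSPECT = (2, 5, 0)
LAST_SUSPECT = (2, 6, 19)


def parse_release(release):
    """Return the leading numeric part of a kernel release as a 3-tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    # A missing third component (e.g. "5.15") is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def may_be_affected(release):
    """True if the version number falls inside the assumed suspect range.

    Vendor kernels with a backported fix are still reported as suspect,
    because the backport is not visible in the version number.
    """
    return FIRST_SUSPECT <= parse_release(release) <= LAST_SUSPECT


def running_kernel_may_be_affected():
    """Apply the same check to the kernel this process is running on."""
    return may_be_affected(platform.release())
```

With these assumptions, may_be_affected("2.6.12") is true and
may_be_affected("2.6.20") is false. A true result only means "look more
closely", not that the kernel is the cause.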

Greetings,

-Franz

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Paolo Bizzarri
Sent: Wednesday, June 6, 2007 10:18
To: Scott Marlowe
Cc: Greg Smith; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Corruption of files in PostgreSQL


On 6/5/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> Greg Smith wrote:
> > On Tue, 5 Jun 2007, Paolo Bizzarri wrote:
> >
> >> On 6/4/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> >>> http://lwn.net/Articles/215868/
> >>> documents a bug in the 2.6 linux kernel that can result in corrupted
> >>> files if there are a lot of processes accessing it at once.
> >>
> >> in fact, we were using a 2.6.12 kernel. Can this be a problem?
> >
> > That particular problem appears to be specific to newer kernels so I
> > wouldn't think it's related to your issue.
>
> That is not entirely correct.  The problem was present all the way back
> to the 2.5 kernels, before the 2.6 kernels were released.  However,
> there was an update to the 2.6.18/19 kernels that made this problem much
> more likely to bite.  There were reports of data loss for many people
> running on older 2.6 kernels that mysteriously went away after updating
> to post 2.6.19 kernels (or in the case of redhat, the updated 2.6.9-44
> or so kernels, which backported the fix.)
>

I understand this. At the same time, the system was under quite heavy
load, so it is possible that some peculiar, rather subtle bug was
biting us. Many files were manipulated in exactly the same way, but
only a very small number of them were truncated.

I would like to rule out every known bug that could be involved.

BTW, as our PostgreSQL was compiled from source, do you suggest
recompiling it after upgrading the kernel?

> So, it IS possible that it's the kernel, but not likely.  I'm still
> betting on a bad RAID controller or something like that.  But updating
> the kernel probably wouldn't be a bad idea.
>

The deployed configuration is quite large (two servers using a shared
SCSI-to-IDE large disk array), and it would be quite difficult to
replicate a different configuration.

At the same time, problems were visible only under heavy load, so
using a simpler system would not really help.

Ciao

Paolo Bizzarri
Icube S.r.l.

