Corruption of files in PostgreSQL

From
"Paolo Bizzarri"
Date:
Hi everyone,

my name is Paolo Bizzarri and I am a developer of PAFlow, a document
tracking and management system for public administrations.

We use Postgres as a backend, and we are experiencing corruption
problems with OpenOffice files.

As our application is rather complex (it includes Zope as an
application server and OpenOffice as both a document server and a
client), we need some information on how to check that we are
interacting correctly with Postgres.

Do you have any hints on what is useful to check?

We are currently using:

- PostgreSQL 7.4.8;
- psycopg 1.1.11;
- Zope 2.7.x;
- OpenOffice 2.2.

Best regards.

Paolo Bizzarri

Re: Corruption of files in PostgreSQL

From
Richard Huxton
Date:
Paolo Bizzarri wrote:
> We use postgres as a backend, and we are experimenting some corruption
> problems on openoffice files.

1. How are you storing these files?
2. What is the nature of the corruption?

> As our application is rather complex (it includes Zope as an
> application server, OpenOffice as a document server and as a client)
> we need some info on how to check that we are interacting correctly
> with Postgres.

Shouldn't matter.

> We are currently using:
>
> - PostgreSQL 7.4.8;

Well, you need to upgrade this - version 7.4.17 is the latest in the 7.4
series. You are missing 9 separate batches of bug and security fixes.

> - pyscopg 1.1.11;
> - Zope 2.7.x;
> - Openoffice 2.2.

None of this should matter really, unless there's some subtle bug in
psycopg causing corruption of data in-transit.

Let's get some details on the two questions above and see if there's a
pattern to your problems.

--
   Richard Huxton
   Archonet Ltd

Re: Corruption of files in PostgreSQL

From
"Paolo Bizzarri"
Date:
On 5/30/07, Richard Huxton <dev@archonet.com> wrote:
> Paolo Bizzarri wrote:
> > We use postgres as a backend, and we are experimenting some corruption
> > problems on openoffice files.
>
> 1. How are you storing these files?

Files are stored as large objects. They are written with lo_write,
and their contents are passed as a Binary object.
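For illustration, that write path looks roughly like this (a sketch with hypothetical helper names; it uses the psycopg2 lobject API, since psycopg 1.x exposed large objects differently):

```python
import hashlib


def store_file(conn, path):
    """Write a file into a new large object via the psycopg2 lobject API.

    Returns (oid, sha256-of-bytes); keeping the hash at write time lets us
    compare against a later read-back to pin down when truncation happens.
    """
    with open(path, "rb") as f:
        data = f.read()
    lob = conn.lobject(0, "wb")   # oid 0 => server assigns a fresh OID
    lob.write(data)
    lob.close()
    conn.commit()                 # large-object writes are transactional too
    return lob.oid, hashlib.sha256(data).hexdigest()


def fetch_file(conn, oid):
    """Read the large object back in one piece."""
    lob = conn.lobject(oid, "rb")
    try:
        return lob.read()
    finally:
        lob.close()
```

Recording the hash alongside the OID at write time is what later lets you tell whether a file was already short when it went in, or was damaged afterwards.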

> 2. What is the nature of the corruption?

Apparently, files get truncated.

> > As our application is rather complex (it includes Zope as an
> > application server, OpenOffice as a document server and as a client)
> > we need some info on how to check that we are interacting correctly
> > with Postgres.
>
> Shouldn't matter.

I hope so...

> > We are currently using:
> >
> > - PostgreSQL 7.4.8;
>
> Well, you need to upgrade this - version 7.4.17 is the latest in the 7.4
> series. You are missing 9 separate batches of bug and security fixes.

Ok. We will upgrade and see if this can help solve the problem.

>
> > - pyscopg 1.1.11;
> > - Zope 2.7.x;
> > - Openoffice 2.2.
>
> None of this should matter really, unless there's some subtle bug in
> psycopg causing corruption of data in-transit.
>
> Let's get some details on the two questions above and see if there's a
> pattern to your problems.

Ok. Thank you.

Paolo Bizzarri
Icube S.r.l.

Re: Corruption of files in PostgreSQL

From
Richard Huxton
Date:
Paolo Bizzarri wrote:
> On 5/30/07, Richard Huxton <dev@archonet.com> wrote:
>> Paolo Bizzarri wrote:
>> > We use postgres as a backend, and we are experimenting some corruption
>> > problems on openoffice files.
>>
>> 1. How are you storing these files?
>
> Files are stored as large objects. They are written with an lo_write
> and its contents is passed as a Binary object.
>
>> 2. What is the nature of the corruption?
>
> Apparently, files get truncated.

Interesting. You might want to read this recent thread (ongoing):

http://archives.postgresql.org/pgsql-general/2007-05/msg00734.php

--
   Richard Huxton
   Archonet Ltd

Re: Corruption of files in PostgreSQL

From
"Purusothaman A"
Date:
Paolo Bizzarri,

I am also using PostgreSQL in my application, and I am also facing a file object corruption problem.

I have already discussed this several times with Richard Huxton, and we ended without any clue.

Here I am briefly describing my problem; see if you find any clue about it.
I am storing/retrieving my files in PostgreSQL using the lo_export() and lo_import() APIs.

After a few weeks (as the application is used, the number of file objects in the database grows), my file objects get corrupted. And I have no clue about what causes this problem.

I confirmed the file corruption with the following query:

sfrs2=> select loid, pageno, length(data) from pg_largeobject where loid = 101177 and pageno = 630;
  loid  | pageno | length
--------+--------+--------
 101177 |    630 |    181
(1 row)

But here is the result of the same query before corruption (i.e., immediately after the file object was added to the table):

fasp_test=> select loid, pageno, length(data) from pg_largeobject where loid = 106310 and pageno = 630;
  loid  | pageno | length
--------+--------+--------
 106310 |    630 |    205
(1 row)

I uploaded the same file into both databases (sfrs2, fasp_test). The first result is from after the corruption, and the latter from before.

You can confirm your problem in the same way. And I strongly believe that there is some bug in PostgreSQL.
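One way to make that check systematic rather than spot-checking a single pageno: every page of a healthy large object except the last should be exactly LOBLKSIZE bytes (2048 by default, i.e. BLCKSZ/4), and the page numbers should be gapless. A sketch (hypothetical helper; feed it the rows of `SELECT pageno, length(data) FROM pg_largeobject WHERE loid = ... ORDER BY pageno`):

```python
LOBLKSIZE = 2048  # default BLCKSZ/4; adjust if the server was built differently


def check_pages(rows):
    """rows: list of (pageno, length) tuples ordered by pageno.

    Returns a list of human-readable problems; an empty list means the
    object looks intact (no holes, no short page before the last one).
    """
    problems = []
    for i, (pageno, length) in enumerate(rows):
        if pageno != i:
            problems.append("hole: expected pageno %d, found %d" % (i, pageno))
        last = i == len(rows) - 1
        if not last and length != LOBLKSIZE:
            problems.append("short page %d: %d bytes" % (pageno, length))
        if length > LOBLKSIZE:
            problems.append("oversized page %d: %d bytes" % (pageno, length))
    return problems
```

Run over cursor.fetchall() output, this flags both truncation (missing trailing pages) and mid-object damage (a short or missing interior page).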

Kindly don't forget to alert me once you find the solution/cause.

Regards,
Purusothaman A

On 5/30/07, Paolo Bizzarri <pibizza@gmail.com> wrote:
[quoted message snipped]
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings



--
http://PurusothamanA.wordpress.com/

Re: Corruption of files in PostgreSQL

From
"Paolo Bizzarri"
Date:
Hi everyone,

a little update.

We have upgraded our system to 7.4.17. The problem of truncated files
now seems less frequent, but it is still present. We have not found a
clearly understandable pattern for when this happens.

Just to provide some further information:

- we create a file and store it in the DB;

- we give the file to the user, who can modify it as they wish;

- we store the modified file back in the DB;

- the last two steps can happen several times.
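A sketch of how each of those round trips could be verified (hypothetical helper; it assumes the original bytes are kept around until the read-back is checked):

```python
import hashlib


def diff_report(original, readback):
    """Compare a file's bytes before storage and after read-back.

    Returns None when identical, otherwise a short description of how the
    copies diverge -- distinguishing truncation from in-place corruption.
    """
    if original == readback:
        return None
    if readback == original[:len(readback)]:
        return "truncated: %d of %d bytes survived" % (len(readback), len(original))
    prefix = 0
    for a, b in zip(original, readback):
        if a != b:
            break
        prefix += 1
    return "corrupted at byte %d (sha256 %s vs %s)" % (
        prefix,
        hashlib.sha256(original).hexdigest()[:12],
        hashlib.sha256(readback).hexdigest()[:12],
    )
```

Logging this report on every store/fetch cycle would narrow down whether the damage happens on write, on read, or somewhere between.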

Any hint?

Best regards.

Paolo Bizzarri
Icube S.r.l.



On 5/30/07, Purusothaman A <purusothaman.a@gmail.com> wrote:
> [quoted message snipped]

Re: Corruption of files in PostgreSQL

From
Tom Lane
Date:
"Paolo Bizzarri" <pibizza@gmail.com> writes:
> Any hint?

Please provide a reproducible test case ...

            regards, tom lane

Re: Corruption of files in PostgreSQL

From
"Paolo Bizzarri"
Date:
Hi Tom,

as explained above, the problem seems quite random. So I need to
understand what we have to check.

Best regards.

Paolo Bizzarri
Icube S.r.l.

On 6/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Paolo Bizzarri" <pibizza@gmail.com> writes:
> > Any hint?
>
> Please provide a reproducible test case ...
>
>                         regards, tom lane
>

Re: Corruption of files in PostgreSQL

From
"Purusothaman A"
Date:
Hi Tom Lane,

In my case, we upload/download files to/from PostgreSQL, and we don't change the content of a file once it is loaded into PostgreSQL.

But as the days go by, more files are stored in PostgreSQL, and their content is never changed after that. The stored files are, however, downloaded many times, as needed.

What I am guessing from my situation is that PostgreSQL is crossing the boundaries of the file objects while accessing them, because we simply use two APIs, lo_export() and lo_import(), for storing files and retrieving them, and never attempt to alter their contents.

And I have never been able to find any pattern to the file object corruption.

Regards,
Purusothaman A


On 6/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Paolo Bizzarri" <pibizza@gmail.com> writes:
> Any hint?

Please provide a reproducible test case ...

                        regards, tom lane

Re: Corruption of files in PostgreSQL

From
Tom Lane
Date:
"Paolo Bizzarri" <pibizza@gmail.com> writes:
> On 6/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Please provide a reproducible test case ...

> as explained above, the problem seems quite random. So I need to
> understand what we have to check.

In this context "reproducible" means that the failure happens
eventually.  I don't care if the test program only fails once in
thousands of tries --- I just want a complete self-contained example
that produces a failure.  I don't have the time to try to
reverse-engineer a test case from your rather vague description, whereas
I suppose you can make one by stripping down code you've already got.

The sub-text here is that I don't really believe that lo_import and
lo_export in themselves are broken.  There must be some extra factor ---
something else you are doing, or something in your environment ---
contributing to the bug.  Thus, the odds of someone else building a
usable test case from scratch aren't that good, and being able to
reproduce the failure outside your environment is an essential step.

            regards, tom lane

Re: Corruption of files in PostgreSQL

From
"Paolo Bizzarri"
Date:
On 6/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Paolo Bizzarri" <pibizza@gmail.com> writes:
> > On 6/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Please provide a reproducible test case ...
>
> > as explained above, the problem seems quite random. So I need to
> > understand what we have to check.
>
> In this context "reproducible" means that the failure happens
> eventually.  I don't care if the test program only fails once in
> thousands of tries --- I just want a complete self-contained example
> that produces a failure.

As said above, our application is rather complex and involves several
different pieces of software, including Zope, OpenOffice both as
server and client, and PostgreSQL. We are absolutely NOT sure that the
problem is inside PostgreSQL.

What we are trying to understand is, first and foremost, if there are
known cases under which PostgreSQL can truncate a file.

> I don't have the time to try to
> reverse-engineer a test case from your rather vague description, whereas
> I suppose you can make one by stripping down code you've already got.

I was not asking for reverse engineering of a test case. I will try
to provide an example, but the problem is that, without knowing what
to look for, I could omit fundamental details.

> The sub-text here is that I don't really believe that lo_import and
> lo_export in themselves are broken.  There must be some extra factor ---
> something else you are doing, or something in your environment ---
> contributing to the bug.

I certainly agree with you. I was asking what to look at and what to check.

> Thus, the odds of someone else building a
> usable test case from scratch aren't that good, and being able to
> reproduce the failure outside your environment is an essential step.

I agree with you; I was not hoping for that. At the same time, I was
asking for help on what to look at, so that I can produce a test case.

As an alternative, I can suggest downloading and installing PAFlow, but I
understand it is a rather large application....

Best regards.

Paolo Bizzarri
Icube S.r.l.

Re: Corruption of files in PostgreSQL

From
"Michael Nolan"
Date:


On 6/2/07, Paolo Bizzarri <pibizza@gmail.com> wrote:

> What we are trying to understand is, first and foremost, if there are
> known cases under which PostgreSQL can truncate a file.


I think it's somewhat more likely that whatever is sending the file to PG is the cause, either in how it handles the file or due to communication issues.

This sounds similar to a problem I experienced with an application I wrote that takes files extracted from email (using MHonarc) and stores them in a PG database so that I can render them using a web browser later on.  I wound up having to store the files in BASE64 encoding to keep them from getting corrupted.
--
Mike Nolan

Re: Corruption of files in PostgreSQL

From
Scott Ribe
Date:
I don't use lo_import and lo_export myself, but is there any way to log
their usage? It certainly sounds as though step 1 for this user is to keep
track of how much data is handed to PG for each file, how much data is
returned by PG for each file, and how much data is in the file at the time
of the request.
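A sketch of that bookkeeping (hypothetical helper; the stored size can come from `SELECT sum(length(data)) FROM pg_largeobject WHERE loid = ...`):

```python
def audit(entries):
    """entries: list of (name, bytes_handed_to_pg, bytes_stored, bytes_read_back).

    Returns names whose three sizes disagree, pinpointing whether the loss
    happened on the way in (stored < handed) or the way out (read < stored).
    """
    suspects = []
    for name, handed, stored, read_back in entries:
        if stored != handed:
            suspects.append("%s: lost on write (%d -> %d)" % (name, handed, stored))
        elif read_back != stored:
            suspects.append("%s: lost on read (%d -> %d)" % (name, stored, read_back))
    return suspects
```

Kept over weeks of operation, such a log would show whether the truncation correlates with writes, reads, or elapsed time.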

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: Corruption of files in PostgreSQL

From
Franz.Rasper@izb.de
Date:
What OS are you running?

Linux (32- or 64-bit)? Ext3 filesystem? Which kernel version?
A bug in ext3, the Linux kernel, or the hardware (RAID controller)?

Does the error only happen under heavy load?

regards,

-Franz

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On behalf of Paolo Bizzarri
Sent: Saturday, 2 June 2007 07:46
To: Purusothaman A
Cc: Richard Huxton; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Corruption of files in PostgreSQL


[quoted message snipped]

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: Corruption of files in PostgreSQL

From
Franz.Rasper@izb.de
Date:
If any database driver was built against the old PostgreSQL
sources/libs, (re)build that driver with the new PostgreSQL
sources/libs.

Greetings,

-Franz

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On behalf of Paolo Bizzarri
Sent: Saturday, 2 June 2007 07:46
To: Purusothaman A
Cc: Richard Huxton; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Corruption of files in PostgreSQL


[quoted message snipped]

Re: Corruption of files in PostgreSQL

From
Scott Marlowe
Date:
Paolo Bizzarri wrote:
> On 6/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Paolo Bizzarri" <pibizza@gmail.com> writes:
>> > On 6/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> Please provide a reproducible test case ...
>>
>> > as explained above, the problem seems quite random. So I need to
>> > understand what we have to check.
>>
>> In this context "reproducible" means that the failure happens
>> eventually.  I don't care if the test program only fails once in
>> thousands of tries --- I just want a complete self-contained example
>> that produces a failure.
>
> As said above, our application is rather complex and involves several
> different pieces of software, including Zope, OpenOffice both as
> server and client, and PostgreSQL. We are absolutely NOT sure that the
> problem is inside PostgreSQL.
>
> What we are trying to understand is, first and foremost, if there are
> known cases under which PostgreSQL can truncate a file.

I would suspect either your hardware (RAID controller, hard drive, cache,
etc.) or your OS (a kernel bug, file system bug, etc.).

For instance:

http://lwn.net/Articles/215868/

documents a bug in the 2.6 linux kernel that can result in corrupted
files if there are a lot of processes accessing it at once.


Re: Corruption of files in PostgreSQL

From
"Paolo Bizzarri"
Date:
Hi Scott,

in fact, we were using a 2.6.12 kernel. Can this be a problem?

Best regards.

Paolo Bizzarri

On 6/4/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> Paolo Bizzarri wrote:
> > On 6/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> "Paolo Bizzarri" <pibizza@gmail.com> writes:
> >> > On 6/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> >> Please provide a reproducible test case ...
> >>
> >> > as explained above, the problem seems quite random. So I need to
> >> > understand what we have to check.
> >>
> >> In this context "reproducible" means that the failure happens
> >> eventually.  I don't care if the test program only fails once in
> >> thousands of tries --- I just want a complete self-contained example
> >> that produces a failure.
> >
> > As said above, our application is rather complex and involves several
> > different pieces of software, including Zope, OpenOffice both as
> > server and client, and PostgreSQL. We are absolutely NOT sure that the
> > problem is inside PostgreSQL.
> >
> > What we are trying to understand is, first and foremost, if there are
> > known cases under which PostgreSQL can truncate a file.
>
> I would suspect either your hardware (RAID controller, hard drive, cache
> etc) or your OS (kernel bug, file system bug, etc)
>
> For instance:
>
> http://lwn.net/Articles/215868/
>
> documents a bug in the 2.6 linux kernel that can result in corrupted
> files if there are a lot of processes accessing it at once.
>
>

Re: Corruption of files in PostgreSQL

From
Greg Smith
Date:
On Tue, 5 Jun 2007, Paolo Bizzarri wrote:

> On 6/4/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
>> http://lwn.net/Articles/215868/
>> documents a bug in the 2.6 linux kernel that can result in corrupted
>> files if there are a lot of processes accessing it at once.
>
> in fact, we were using a 2.6.12 kernel. Can this be a problem?

That particular problem appears to be specific to newer kernels so I
wouldn't think it's related to your issue.

Tracking down random crashes of the sort you're reporting is hard.  As
Scott rightly suggested, the source of the problem could easily be any
number of hardware components or low-level software like the kernel.  The
tests required to really certify that a server is suitable for production
use can take several days worth of testing.  The normal approach here
would be to move this application+data to another system and see if the
problem is still there; that lets you rule out all the hardware at once.
That would do something else you should be thinking about--making
absolutely sure you can backup and restore your data, and that the
corruption you're seeing isn't causing information to be lost in your
database.

The general flow of figuring out the cause for random problems goes
something like this:

1) Check for memory errors.  http://www.memtest86.com/ is a good tool for
PCs.  That will need to run for many hours.

2) Run the manufacturer's disk utilities to see if any of your disks are
going bad.  You might be able to do this using Linux's SMART tools instead
without even taking the server down; if you're not using those already you
should look into that.  http://www.linuxjournal.com/article/6983 is a good
intro here.

3) Boot another version of Linux and run some low-level disk tests there.
A live CD/DVD like Knoppix or Ubuntu is the easiest way to do that.

4) If everything above passes, upgrade to the kernel version used on the
live CD/DVD and see if the problem goes away.

You can try skipping right to #4 here and playing with the kernel first,
but understand that if your underlying hardware has issues, that may cause
more corruption (with possible data loss) rather than less.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Corruption of files in PostgreSQL

From
Scott Marlowe
Date:
Greg Smith wrote:
> On Tue, 5 Jun 2007, Paolo Bizzarri wrote:
>
>> On 6/4/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
>>> http://lwn.net/Articles/215868/
>>> documents a bug in the 2.6 linux kernel that can result in corrupted
>>> files if there are a lot of processes accessing it at once.
>>
>> in fact, we were using a 2.6.12 kernel. Can this be a problem?
>
> That particular problem appears to be specific to newer kernels so I
> wouldn't think it's related to your issue.

That is not entirely correct.  The problem was present all the way back
to the 2.5 kernels, before the 2.6 kernels were released.  However,
there was an update to the 2.6.18/19 kernels that made this problem much
more likely to bite.  There were reports of data loss for many people
running on older 2.6 kernels that mysteriously went away after updating
to post 2.6.19 kernels (or in the case of redhat, the updated 2.6.9-44
or so kernels, which backported the fix.)

So, it IS possible that it's the kernel, but not likely.  I'm still
betting on a bad RAID controller or something like that.  But updating
the kernel probably wouldn't be a bad idea.
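As a quick sanity check, you can compare the running kernel against the
2.6.19 fix point mentioned above. This is only a rough sketch; the exact
set of affected versions is discussed in the LWN article, and vendor
kernels (like the Red Hat 2.6.9-44 series) may have the fix backported
despite an older version string:

```python
import platform

def kernel_tuple(release: str) -> tuple:
    # "2.6.12-1-686" -> (2, 6, 12): keep only the numeric dotted prefix
    parts = []
    for piece in release.split("-")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def predates_fix(release: str, fix=(2, 6, 19)) -> bool:
    # True if this kernel version predates the mainline fix point
    return kernel_tuple(release) < fix

# e.g. predates_fix(platform.release()) on the affected server
```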

Re: Corruption of files in PostgreSQL

From
"Paolo Bizzarri"
Date:
On 6/5/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> Greg Smith wrote:
> > On Tue, 5 Jun 2007, Paolo Bizzarri wrote:
> >
> >> On 6/4/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> >>> http://lwn.net/Articles/215868/
> >>> documents a bug in the 2.6 linux kernel that can result in corrupted
> >>> files if there are a lot of processes accessing it at once.
> >>
> >> in fact, we were using a 2.6.12 kernel. Can this be a problem?
> >
> > That particular problem appears to be specific to newer kernels so I
> > wouldn't think it's related to your issue.
>
> That is not entirely correct.  The problem was present all the way back
> to the 2.5 kernels, before the 2.6 kernels were released.  However,
> there was an update to the 2.6.18/19 kernels that made this problem much
> more likely to bite.  There were reports of data loss for many people
> running on older 2.6 kernels that mysteriously went away after updating
> to post 2.6.19 kernels (or in the case of redhat, the updated 2.6.9-44
> or so kernels, which backported the fix.)
>

I understand this. At the same time, the system was under quite heavy
load, so it is possible that some peculiar, rather subtle bug was
biting us. Many files were manipulated in exactly the same way, but
only some (very few of them) were truncated.

I would like to remove all possible known cases of bugs.
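Since OpenOffice documents are ZIP archives, truncation is easy to
detect mechanically: a truncated file fails the archive's integrity
check. A small Python sketch that could scan a directory of exported
documents (the directory path and extension list are assumptions, not
something from our setup):

```python
import zipfile
from pathlib import Path

def is_truncated(path):
    """Return True if an OpenOffice document (a ZIP archive) is damaged."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() re-reads every member and checks its CRC;
            # it returns the first bad member name, or None if all are intact
            return zf.testzip() is not None
    except zipfile.BadZipFile:
        # A truncated file usually loses the central directory entirely
        return True

def scan(directory):
    # Collect every damaged OpenOffice document under the given directory
    exts = {".odt", ".ods", ".sxw", ".sxc"}
    return [p for p in Path(directory).rglob("*")
            if p.suffix in exts and is_truncated(p)]
```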

BTW, as our PostgreSQL was compiled from source, do you suggest
recompiling it after upgrading the kernel?

> So, it IS possible that it's the kernel, but not likely.  I'm still
> betting on a bad RAID controller or something like that.  But updating
> the kernel probably wouldn't be a bad idea.
>

The deployed configuration is quite large (two servers sharing a large
SCSI-to-IDE disk array), and it would be quite difficult to replicate
the problem on a different configuration.

At the same time, the problems were visible only under heavy load, so
using a simpler system would not really help.

Ciao

Paolo Bizzarri
Icube S.r.l.

Re: Corruption of files in PostgreSQL

From
Franz.Rasper@izb.de
Date:
Upgrading to a new kernel would be a very good idea, but you will have
to test the new kernel too.

In my opinion, the reason for your problem is either the Linux kernel
or your hardware.

I guess you have to update your kernel for the security fixes anyway.
Changing the kernel is probably easier than changing the hardware.

Greetings,

-Franz

