Thread: linux bug and lost rows

linux bug and lost rows

From
Jaime Silvela

A long time ago I wrote to the list about a problem I was having with
COPY losing rows from an import file: the number of imported rows was
not equal to the number of rows in the file, and two consecutive imports
from the same file would get different row counts. Several people tried
to reproduce it unsuccessfully. Reference:
http://archives.postgresql.org/pgsql-general/2006-07/msg00925.php
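One way to catch this kind of loss is to compare the number of data rows in the import file with the table's row count after COPY. Below is a minimal Python sketch. It assumes the file uses COPY's default text format, where each row is one physical line because embedded newlines are escaped as \n, and an optional line containing only \. marks the end of data. CSV files with quoted newlines would need a real CSV parser instead. The function names are hypothetical. Getting the loaded count, for example from SELECT count(*) on the target table, is left out.

```python
import io
from typing import TextIO


def count_copy_text_rows(stream: TextIO) -> int:
    """Count data rows in a file intended for COPY ... FROM (text format).

    Assumes the default text format, not CSV: every row is one physical line,
    because newlines inside values are written as the escape sequence \\n.
    A line containing only \\. marks end of data; it and anything after it
    are not counted.
    """
    rows = 0
    for line in stream:
        if line.rstrip("\r\n") == "\\.":
            break  # end-of-data marker
        rows += 1
    return rows


def check_import(rows_in_file: int, rows_in_table: int) -> None:
    """Fail loudly instead of silently accepting a short import."""
    if rows_in_file != rows_in_table:
        raise RuntimeError(
            f"row count mismatch: file has {rows_in_file} rows, "
            f"table has {rows_in_table}"
        )


# Usage with an in-memory sample instead of a real import file.
sample = io.StringIO("1\tfoo\n2\tbar\n3\tbaz\n\\.\n")
rows_in_file = count_copy_text_rows(sample)   # 3
check_import(rows_in_file, 3)                 # passes; a 2 here would raise
```

Running the same check after each of two consecutive imports of the same file would make a varying row count visible immediately, instead of leaving it to be noticed later.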

More recently, as I was practicing a database upgrade to 8.2.3, I
captured an "unexpected data beyond EOF" in the log, which led to
missing tables in the upgraded db. I opened a thread, and it turned out
someone had previously had the same problem, and it was due to the Linux
kernel version: 2.6.5-7.244
Reference: http://archives.postgresql.org/pgsql-general/2007-03/msg01543.php

Now my server has been upgraded to 2.6.5-7.282, and I'm happy to report
that BOTH problems have disappeared. The first problem, that of lost
rows for COPY, tended to present itself for large import files, nearing
1GB, but I was never able to get reproducible results. As I understand it,
the Linux bug responsible for the "unexpected data beyond EOF" had to do
with faulty disk reads. Probably this was also affecting the COPY
command, only failing silently?


***********************************************************************
Bear Stearns is not responsible for any recommendation, solicitation,
offer or agreement or any information about any transaction, customer
account or account activity contained in this communication.

Bear Stearns does not provide tax, legal or accounting advice.  You
should consult your own tax, legal and accounting advisors before
engaging in any transaction. In order for Bear Stearns to comply with
Internal Revenue Service Circular 230 (if applicable), you are notified
that any discussion of U.S. federal tax issues contained or referred to
herein is not intended or written to be used, and cannot be used, for
the purpose of:  (A) avoiding penalties that may be imposed under the
Internal Revenue Code; nor (B) promoting, marketing or recommending to
another party any transaction or matter addressed herein.
***********************************************************************

Re: linux bug and lost rows

From
Tom Lane

Jaime Silvela <JSilvela@Bear.com> writes:
> Now my server has been upgraded to 2.6.5-7.282, and I'm happy to report
> that BOTH problems have disappeared. The first problem, that of lost
> rows for COPY, tended to present itself for large import files, nearing
> 1GB, but I was never able to get reproducible results. As I understand it,
> the Linux bug responsible for the "unexpected data beyond EOF" had to do
> with faulty disk reads. Probably this was also affecting the COPY
> command, only failing silently?

Your COPY problems were all on PG 8.1.x, right?  The "unexpected data
beyond EOF" check was added in 8.2.0 specifically because we realized we
were getting bit by a Linux kernel bug.  In 8.1, manifestations of that
same bug would have just led to silent data loss.  The cases that we
identified before all seemed to involve concurrent insertions by
different backends, but I don't think anyone has hard proof that it
couldn't happen for successive insertions by a single backend.  So yeah,
it now seems highly likely that that bug explains the COPY problem.
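The failure mode and the 8.2 check can be illustrated with a deliberately simplified Python model. This is not PostgreSQL's actual code. The model assumes the kernel bug sometimes reported a stale file length, meaning an lseek(SEEK_END) result that missed a recent write. That is my understanding of the bug and is not stated in this thread. Under that assumption, a backend extending the table picks a block number that already holds rows. Without a check, the "new" empty page overwrites those rows, which matches the silent loss described for 8.1. With an 8.2-style check, the backend refuses instead. ToyRelation, extend, and demo are hypothetical names, and the error text is only modeled on the real message.

```python
from dataclasses import dataclass, field
from typing import List


class UnexpectedDataBeyondEOF(Exception):
    """Raised when the page we are about to create already holds rows."""


@dataclass
class ToyRelation:
    """Hypothetical in-memory model of a table file made of pages.

    `pages` is what is really on disk; `reported_pages` is the file length,
    in pages, that the kernel reports. A correct kernel keeps the two equal.
    """
    pages: List[List[str]] = field(default_factory=list)
    reported_pages: int = 0

    def extend(self, rows: List[str], check_eof: bool) -> int:
        """Add a new page holding `rows` at the block number the kernel reports."""
        blkno = self.reported_pages
        if blkno > len(self.pages):
            raise ValueError("model only supports a stale (too small) length")
        if blkno < len(self.pages):
            # The "new" block already exists: the reported length was stale.
            if check_eof and self.pages[blkno]:
                # 8.2-style guard: refuse instead of overwriting live data.
                raise UnexpectedDataBeyondEOF(
                    f"unexpected data beyond EOF in block {blkno}"
                )
            # 8.1-style behaviour: overwrite, silently losing the old rows.
            self.pages[blkno] = list(rows)
        else:
            self.pages.append(list(rows))
        self.reported_pages = len(self.pages)  # kernel catches up afterwards
        return blkno

    def row_count(self) -> int:
        return sum(len(page) for page in self.pages)


def demo(check_eof: bool) -> int:
    rel = ToyRelation()
    rel.extend(["r1", "r2"], check_eof)  # block 0
    rel.extend(["r3"], check_eof)        # block 1
    rel.reported_pages = 1               # simulate a stale lseek(SEEK_END) result
    rel.extend(["r4"], check_eof)        # lands on block 1 again
    return rel.row_count()


print(demo(check_eof=False))  # 3: four rows inserted, r3 silently lost
try:
    demo(check_eof=True)
except UnexpectedDataBeyondEOF as exc:
    print("error:", exc)      # error: unexpected data beyond EOF in block 1
```

The model makes the point concrete. The check cannot prevent the kernel from misreporting the length. It only turns an invisible overwrite into a visible error, which is why the 8.2.3 upgrade failed loudly where 8.1 had lost rows quietly.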

Out of sheer conservatism, I didn't backpatch the "unexpected data
beyond EOF" check into pre-8.2 stable branches, but I wonder if we
shouldn't do that now.

            regards, tom lane

Re: linux bug and lost rows

From
Jaime Silvela

Tom Lane wrote:
> Your COPY problems were all on PG 8.1.x, right?

Right, the problems showed up on 8.1.3.

Thanks
Jaime

