Thread: could not read transaction log directory ...?

From

Michael Brusser

Date:

07 May 2003, 15:30:18

Running a data-loading process on Postgres 7.3.2, I consistently end up with a crash.
This happens on Red Hat Linux 6.2, kernel 2.2.
Here's the relevant fragment from the database log.

2003-05-07 14:38:48 LOG:  recycled transaction log file 0000000000000005
2003-05-07 14:48:56 LOG:  recycled transaction log file 0000000000000006
2003-05-07 15:04:10 LOG:  recycled transaction log file 0000000000000007
2003-05-07 15:04:10 PANIC:  could not read transaction log directory -
(<my_dir_path>/pg_xlog):Unknown error 523
 
2003-05-07 15:04:11 LOG:  server process (pid 449) was terminated by signal 6
2003-05-07 15:04:11 LOG:  terminating any other active server processes
2003-05-07 15:04:11 WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.

2003-05-07 15:04:11 WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
 

The process is loading the database with a large number of records;
it runs for about 20-30 minutes before it crashes.
The problem apparently originates in the function MoveOfflineLogs (xlog.c).
At this point there are two files in the transaction log directory:
16777216 May  7 15:04 0000000000000008
16777216 May  7 14:55 0000000000000009

Does anyone have an idea why this could happen?

Thanks,
Mike.
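
[Editor's note] The PANIC above is reported while reading the pg_xlog directory. As a rough illustration of how such a failure can surface, here is a minimal C sketch of the usual way to tell "end of directory" apart from "read error" with readdir(). It is not the code from xlog.c; the function name count_dir_entries is made up for this example.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the entries in a directory (hypothetical helper, not PostgreSQL code).
 * Returns the count, or -1 on failure with errno set.
 *
 * readdir() returns NULL both at end-of-directory and on error, so errno
 * is cleared before each call and inspected once the loop ends.
 */
static int
count_dir_entries(const char *path)
{
    DIR           *dir = opendir(path);
    struct dirent *de;
    int            count = 0;
    int            saved_errno;

    if (dir == NULL)
        return -1;              /* errno was set by opendir() */

    errno = 0;
    while ((de = readdir(dir)) != NULL)
    {
        count++;
        errno = 0;              /* reset so a later NULL is unambiguous */
    }
    saved_errno = errno;        /* nonzero here means readdir() failed */
    closedir(dir);

    if (saved_errno != 0)
    {
        /* An errno unknown to the C library prints as "Unknown error N". */
        fprintf(stderr, "could not read directory %s: %s\n",
                path, strerror(saved_errno));
        errno = saved_errno;
        return -1;
    }
    return count;
}
```

Calling count_dir_entries("pg_xlog") from a test program would return the number of entries, or -1 if the filesystem reported an error partway through the scan.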

Re: could not read transaction log directory ...?

From

Tom Lane

Date:

08 May 2003, 00:25:09

Michael Brusser <michael@synchronicity.com> writes:
> 2003-05-07 15:04:10 PANIC:  could not read transaction log directory -
>                            (<my_dir_path>/pg_xlog): Unknown error 523

Bizarre.  Can you dig around in your kernel sources and see what errno
523 might mean?
        regards, tom lane

Re: could not read transaction log directory ...?

From

Michael Brusser

Date:

08 May 2003, 09:02:25

From errno.h :
... ...
/* Defined for the NFSv3 protocol */
#define EBADHANDLE      521     /* Illegal NFS file handle */
#define ENOTSYNC        522     /* Update synchronization mismatch */
#define EBADCOOKIE      523     /* Cookie is stale */
... ...

"Cookie is stale" - ..? 
Should I consider some problems with the file server?
The strange thing is that the process always crashes some 30 minutes
after start. Another point is that it works fine on another machine -
Red Hat 7.2 with 2.4.9 kernel.

I'm not sure what to make out of this.
Thanks,
Mike.
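
[Editor's note] The log shows "Unknown error 523" rather than a message, presumably because the C library has no text for these NFSv3 codes. A small lookup built only from the three errno.h definitions quoted above can make such a log line readable. This is a sketch; the function name nfs_errno_name is hypothetical.

```c
#include <stddef.h>

/*
 * Map the NFSv3 errno values quoted from errno.h to a readable string.
 * Hypothetical helper for illustration; returns NULL for any other value.
 */
static const char *
nfs_errno_name(int err)
{
    switch (err)
    {
        case 521:
            return "EBADHANDLE: Illegal NFS file handle";
        case 522:
            return "ENOTSYNC: Update synchronization mismatch";
        case 523:
            return "EBADCOOKIE: Cookie is stale";
        default:
            return NULL;        /* not one of the three codes listed above */
    }
}
```

A caller could fall back to strerror() whenever nfs_errno_name() returns NULL.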


Re: could not read transaction log directory ...?

From

Tom Lane

Date:

08 May 2003, 09:55:09

Michael Brusser <michael@synchronicity.com> writes:
> "Cookie is stale" - ..? 
> Should I consider some problems with the file server?

Running a database over an NFS mount is widely considered foolish.
It exposes you to all sorts of failure modes that don't exist with
a local filesystem.

> The strange thing is that the process always crashes some 30 minutes
> after start.

Well, that would fit fine with the notion that there's some kind of
30-minute timeout on open files in your NFS stack.

> Another point is that it works fine on another machine -
> Red Hat 7.2 with 2.4.9 kernel.
> I'm not sure what to make out of this.

I'd call it a kernel bug or NFS server bug.
        regards, tom lane

Re: could not read transaction log directory ...?

From

Christopher Browne

Date:

08 May 2003, 12:08:22

> 
> From errno.h :
> ... ...
> /* Defined for the NFSv3 protocol */
> #define EBADHANDLE      521     /* Illegal NFS file handle */
> #define ENOTSYNC        522     /* Update synchronization mismatch */
> #define EBADCOOKIE      523     /* Cookie is stale */
> ... ...
> 
> "Cookie is stale" - ..? 
> Should I consider some problems with the file server?
> The strange thing is that the process always crashes some 30 minutes
> after start. Another point is that it works fine on another machine -
> Red Hat 7.2 with 2.4.9 kernel.
> 
> I'm not sure what to make out of this.

Actually, that is somewhat suggestive...

chvatal:/usr/src/linux/fs/nfs# grep EBADCOOKIE *.c
dir.c:                 if (res == -EBADCOOKIE) {
nfs2xdr.c:                return ERR_PTR(-EBADCOOKIE);
nfs2xdr.c:                { NFSERR_BAD_COOKIE,    EBADCOOKIE    },
nfs3xdr.c:        return ERR_PTR(-EBADCOOKIE);

Does this mean that you are storing your filesystems on NFS?

That could well be the root of the problem; NFS has been somewhat in
flux, and is usually not a highly recommended way of storing PG data.

I hear D'Arcy Cain uses NFS fairly successfully for the purpose, but I
believe he's using NetApp boxes, which are _quite_ different from the
norm, and likely aren't what you are using.

My first suggestion would be Stop Using NFS (unless you are really quite
certain of what you're doing).
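
[Editor's note] Whether a data directory actually sits on NFS can be checked from a program. The following is a minimal, Linux-only sketch using statfs(2); the helper name is_on_nfs is hypothetical, and the magic number is defined locally in case the system headers do not provide it.

```c
#include <sys/vfs.h>            /* statfs(), struct statfs (Linux) */

#ifndef NFS_SUPER_MAGIC
#define NFS_SUPER_MAGIC 0x6969  /* filesystem type reported for NFS mounts */
#endif

/*
 * Report whether the filesystem holding `path` is NFS.
 * Returns 1 if it is, 0 if it is not, and -1 if statfs() fails
 * (errno is then set by statfs()).
 */
static int
is_on_nfs(const char *path)
{
    struct statfs fs;

    if (statfs(path, &fs) != 0)
        return -1;

    return ((unsigned long) fs.f_type == NFS_SUPER_MAGIC) ? 1 : 0;
}
```

Pointing it at the database directory (the <my_dir_path> from the log) would answer the question above without consulting the mount table.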
--
(reverse (concatenate 'string "gro.mca@" "enworbbc"))
http://cbbrowne.com/info/x.html
Oh,  boy, virtual memory!  Now I'm  gonna make  myself a  really *big*
RAMdisk!

Re: could not read transaction log directory ...?

From

"Christopher Kings-Lynne"

Date:

08 May 2003, 22:16:51

> Does this mean that you are storing your filesystems on NFS?
>
> That could well be the root of the problem; NFS has been somewhat in
> flux, and is usually not a highly recommended way of storing PG data.
>
> I hear D'Arcy Cain uses NFS fairly successfully for the purpose, but I
> believe he's using NetApp boxes, which are _quite_ different from the
> norm, and likely aren't what you are using.
>
> My first suggestion would be Stop Using NFS (unless you are really quite
> certain of what you're doing).

Or switch to FreeBSD - ever since the 'fsx' NFS automatic testing tool came
out from Apple, FreeBSD has had an excellent NFS implementation.

Chris

Re: could not read transaction log directory ...?

From

Michael Brusser

Date:

08 May 2003, 23:19:21

I don't have much choice here - these are development and
test machines, a few different platforms but all on NFS.
Testing is very intensive and Postgres takes a lot of beating.
I think this is the first time we have run into this kind of problem.

I just want to thank everyone for help.
Mike.


Re: could not read transaction log directory ...?

From

"scott.marlowe"

Date:

09 May 2003, 11:56:32

Do you have any options for NFS set in your fstab for this share?  We had
to make a few changes to get NFS to work reliably (we still don't use it
for the database, just raw text files and such).  My options are:

timeo=10,retry=1,bg,soft,intr,rsize=8192,wsize=8192

If anyone has suggestions for improving mine, feel free; those are the
settings I got from one of the networking guys, and they may well be
non-optimal.  But NFS doesn't just disappear for seconds at a time anymore
like it used to during snapshots.
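
[Editor's note] For readers unfamiliar with the file, this is roughly how those options would sit in an /etc/fstab entry. The server name, export path, and mount point below are placeholders, not values from this thread.

```
# <server>:<export>       <mount point>  <type>  <options>                                            <dump> <pass>
fileserver:/export/share  /mnt/share     nfs     timeo=10,retry=1,bg,soft,intr,rsize=8192,wsize=8192  0      0
```

Note that "soft" makes the client give up and return an error to the application once its retries are exhausted, so it is a setting to weigh carefully for anything other than disposable files.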

On Thu, 8 May 2003, Michael Brusser wrote:

> I don't have much choice here - these are development and
> test machines, few different platforms but all on NFS.
> Testing is very intensive and Postgres takes up a lot of beating.
> I think this is the first time we ran into this kind of problem.
> 
> I just want to thank everyone for help.
> Mike.