Thread: could not read transaction log directory ...?
Running some process on Postgres 7.3.2, I consistently end up with a crash.
This happens on Linux Red Hat 6.2, kernel 2.2.
Here's the fragment from the database log:

2003-05-07 14:38:48 LOG: recycled transaction log file 0000000000000005
2003-05-07 14:48:56 LOG: recycled transaction log file 0000000000000006
2003-05-07 15:04:10 LOG: recycled transaction log file 0000000000000007
2003-05-07 15:04:10 PANIC: could not read transaction log directory - (<my_dir_path>/pg_xlog): Unknown error 523
2003-05-07 15:04:11 LOG: server process (pid 449) was terminated by signal 6
2003-05-07 15:04:11 LOG: terminating any other active server processes
2003-05-07 15:04:11 WARNING: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2003-05-07 15:04:11 WARNING: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.

The process is loading a database with a large number of records; it runs for about 20-30 minutes before it crashes.
The problem apparently originated in function MoveOfflineLogs (xlog.c).
At this point there are two files in the transaction log directory:

16777216 May 7 15:04 0000000000000008
16777216 May 7 14:55 0000000000000009

Does anyone have an idea why this could happen?

Thanks,
Mike.
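For readers following along: the PANIC above is the kind of message a directory scan produces when readdir() fails partway through. The following is a minimal sketch of that pattern, not PostgreSQL's actual code; count_dir_entries is a hypothetical helper invented for illustration. The point it shows is that readdir() returns NULL both at end of directory and on error, so errno has to be cleared before each call and checked afterwards.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from xlog.c): count the entries in a directory.
 * Returns the number of entries other than "." and "..", or -1 with errno
 * set if the directory could not be opened or read.
 */
int
count_dir_entries(const char *path)
{
    DIR *dir;
    struct dirent *entry;
    int count = 0;
    int saved_errno;

    assert(path != NULL);

    dir = opendir(path);
    if (dir == NULL)
        return -1;

    for (;;)
    {
        /* readdir() returns NULL for both end-of-directory and error,
         * so reset errno first to tell the two cases apart. */
        errno = 0;
        entry = readdir(dir);
        if (entry == NULL)
            break;
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;
        count++;
    }

    /* Preserve the readdir() result across closedir(). */
    saved_errno = errno;
    closedir(dir);

    if (saved_errno != 0)
    {
        fprintf(stderr, "could not read directory (%s): %s\n",
                path, strerror(saved_errno));
        errno = saved_errno;
        return -1;
    }
    return count;
}
```

If the error code is one the C library has no message for, strerror() on glibc falls back to text of the form "Unknown error N", which matches what the log shows.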
Michael Brusser <michael@synchronicity.com> writes:
> 2003-05-07 15:04:10 PANIC: could not read transaction log directory -
> (<my_dir_path>/pg_xlog): Unknown error 523

Bizarre. Can you dig around in your kernel sources and see what errno
523 might mean?

regards, tom lane
From errno.h :
... ...
/* Defined for the NFSv3 protocol */
#define EBADHANDLE 521 /* Illegal NFS file handle */
#define ENOTSYNC 522 /* Update synchronization mismatch */
#define EBADCOOKIE 523 /* Cookie is stale */
... ...

"Cookie is stale" - ..?
Should I consider some problems with the file server?
The strange thing is that process always crashes some 30 minutes
after start. Another point is that it works fine on another machine -
Red Hat 7.2 with 2.4.9 kernel.

I'm not sure what to make out of this.

Thanks,
Mike.

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, May 08, 2003 12:25 AM
> To: michael@synchronicity.com
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] could not read transaction log directory ...?
>
>
> Michael Brusser <michael@synchronicity.com> writes:
> > 2003-05-07 15:04:10 PANIC: could not read transaction log directory -
> > (<my_dir_path>/pg_xlog): Unknown error 523
>
> Bizarre. Can you dig around in your kernel sources and see what errno
> 523 might mean?
>
> regards, tom lane
>
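The three definitions quoted from errno.h can be turned into a small lookup for log messages. This is only a sketch: describe_errno and the NFS_-prefixed constant names are made up for illustration, and the numeric values are copied from the excerpt above rather than taken from any userspace header. The C library has no message text for these kernel NFS codes, so anything else is handed to strerror().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values copied from the kernel errno.h excerpt quoted above.
 * The NFS_ prefix is an illustrative choice to avoid name clashes. */
#define NFS_EBADHANDLE 521  /* Illegal NFS file handle */
#define NFS_ENOTSYNC   522  /* Update synchronization mismatch */
#define NFS_EBADCOOKIE 523  /* Cookie is stale */

/*
 * Hypothetical helper: return a readable description of err.
 * The three NFSv3 codes get fixed strings; everything else is
 * formatted into buf using strerror().
 */
const char *
describe_errno(int err, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    switch (err)
    {
        case NFS_EBADHANDLE:
            return "EBADHANDLE: Illegal NFS file handle";
        case NFS_ENOTSYNC:
            return "ENOTSYNC: Update synchronization mismatch";
        case NFS_EBADCOOKIE:
            return "EBADCOOKIE: Cookie is stale";
        default:
            snprintf(buf, buflen, "%s", strerror(err));
            return buf;
    }
}
```

With a buffer such as `char buf[128]`, calling `describe_errno(523, buf, sizeof buf)` gives the "Cookie is stale" text instead of "Unknown error 523".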
Michael Brusser <michael@synchronicity.com> writes:
> "Cookie is stale" - ..?
> Should I consider some problems with the file server?

Running a database over an NFS mount is widely considered foolish.
It exposes you to all sorts of failure modes that don't exist with
a local filesystem.

> The strange thing is that process always crashes some 30 minutes
> after start.

Well, that would fit fine with the notion that there's some kind of
30-minute timeout on open files in your NFS stack.

> Another point is that it works fine on another machine -
> Red Hat 7.2 with 2.4.9 kernel.
> I'm not sure what to make out of this.

I'd call it a kernel bug or NFS server bug.

regards, tom lane
> From errno.h :
> ... ...
> /* Defined for the NFSv3 protocol */
> #define EBADHANDLE 521 /* Illegal NFS file handle */
> #define ENOTSYNC 522 /* Update synchronization mismatch */
> #define EBADCOOKIE 523 /* Cookie is stale */
> ... ...
>
> "Cookie is stale" - ..?
> Should I consider some problems with the file server?
> The strange thing is that process always crashes some 30 minutes
> after start. Another point is that it works fine on another machine -
> Red Hat 7.2 with 2.4.9 kernel.
>
> I'm not sure what to make out of this.

Actually, that is somewhat suggestive...

chvatal:/usr/src/linux/fs/nfs# grep EBADCOOKIE *.c
dir.c: if (res == -EBADCOOKIE) {
nfs2xdr.c: return ERR_PTR(-EBADCOOKIE);
nfs2xdr.c: { NFSERR_BAD_COOKIE, EBADCOOKIE },
nfs3xdr.c: return ERR_PTR(-EBADCOOKIE);
chvatal:/usr/src/linux/fs/nfs# grep EBADCOOKIE *.c

Does this mean that you are storing your filesystems on NFS?

That could well be the root of the problem; NFS has been somewhat in
flux, and is usually not a highly recommended way of storing PG data.

I hear D'Arcy Cain uses NFS fairly successfully for the purpose, but I
believe he's using NetApp boxes, which are _quite_ different from the
norm, and likely aren't what you are using.

My first suggestion would be Stop Using NFS (unless you are really quite
certain of what you're doing).
--
(reverse (concatenate 'string "gro.mca@" "enworbbc"))
http://cbbrowne.com/info/x.html
Oh, boy, virtual memory! Now I'm gonna make myself a really *big* RAMdisk!
> Does this mean that you are storing your filesystems on NFS?
>
> That could well be the root of the problem; NFS has been somewhat in
> flux, and is usually not a highly recommended way of storing PG data.
>
> I hear D'Arcy Cain uses NFS fairly successfully for the purpose, but I
> believe he's using NetApp boxes, which are _quite_ different from the
> norm, and likely aren't what you are using.
>
> My first suggestion would be Stop Using NFS (unless you are really quite
> certain of what you're doing).

Or switch to FreeBSD - ever since the 'fsx' NFS automatic testing tool came
out from Apple, FreeBSD has had an excellent NFS implementation.

Chris
I don't have much choice here - these are development and
test machines, a few different platforms but all on NFS.
Testing is very intensive and Postgres takes a lot of beating.
I think this is the first time we ran into this kind of problem.

I just want to thank everyone for help.
Mike.

> -----Original Message-----
> From: Christopher Kings-Lynne [mailto:chriskl@familyhealth.com.au]
> Sent: Thursday, May 08, 2003 10:16 PM
> To: michael@synchronicity.com; Christopher Browne
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] could not read transaction log directory ...?
>
>
> > Does this mean that you are storing your filesystems on NFS?
> >
> > That could well be the root of the problem; NFS has been somewhat in
> > flux, and is usually not a highly recommended way of storing PG data.
> >
> > I hear D'Arcy Cain uses NFS fairly successfully for the purpose, but I
> > believe he's using NetApp boxes, which are _quite_ different from the
> > norm, and likely aren't what you are using.
> >
> > My first suggestion would be Stop Using NFS (unless you are really quite
> > certain of what you're doing).
>
> Or switch to FreeBSD - ever since the 'fsx' NFS automatic testing tool came
> out from Apple, FreeBSD has had an excellent NFS implementation.
>
> Chris
>
Do you have any options for NFS set in your fstab for this share? We had
to make a few changes to get NFS to work reliably (we still don't use it
for the database, just raw text files and such).

My options are:
timeo=10,retry=1,bg,soft,intr,rsize=8192,wsize=8192

If anyone has suggestions for improving mine, feel free; those are the
settings I got from one of the networking guys, and they may well be
non-optimal. But NFS doesn't just disappear for seconds at a time anymore
like it used to during snapshots.

On Thu, 8 May 2003, Michael Brusser wrote:

> I don't have much choice here - these are development and
> test machines, a few different platforms but all on NFS.
> Testing is very intensive and Postgres takes a lot of beating.
> I think this is the first time we ran into this kind of problem.
>
> I just want to thank everyone for help.
> Mike.
>
>
> > -----Original Message-----
> > From: Christopher Kings-Lynne [mailto:chriskl@familyhealth.com.au]
> > Sent: Thursday, May 08, 2003 10:16 PM
> > To: michael@synchronicity.com; Christopher Browne
> > Cc: pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] could not read transaction log directory ...?
> >
> >
> > > Does this mean that you are storing your filesystems on NFS?
> > >
> > > That could well be the root of the problem; NFS has been somewhat in
> > > flux, and is usually not a highly recommended way of storing PG data.
> > >
> > > I hear D'Arcy Cain uses NFS fairly successfully for the purpose, but I
> > > believe he's using NetApp boxes, which are _quite_ different from the
> > > norm, and likely aren't what you are using.
> > >
> > > My first suggestion would be Stop Using NFS (unless you are really quite
> > > certain of what you're doing).
> >
> > Or switch to FreeBSD - ever since the 'fsx' NFS automatic testing tool came
> > out from Apple, FreeBSD has had an excellent NFS implementation.
> >
> > Chris
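To compare mount options across several test machines, the fstab question above can also be answered programmatically. The sketch below uses the glibc mount-table routines from <mntent.h> (setmntent, getmntent, hasmntopt, endmntent); the function names nfs_entry_has_option and report_nfs_mounts are hypothetical, and the choice of "soft" as the option to look for is just an example taken from the option string quoted above.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the mount entry is an NFS mount
 * (type starting with "nfs") that carries the given option, else 0.
 */
int
nfs_entry_has_option(const struct mntent *ent, const char *opt)
{
    assert(ent != NULL && opt != NULL);

    if (strncmp(ent->mnt_type, "nfs", 3) != 0)
        return 0;

    /* hasmntopt() matches whole options, with or without "=value". */
    return hasmntopt(ent, opt) != NULL;
}

/*
 * Hypothetical helper: print every NFS entry in a mount table file
 * (for example "/etc/fstab") together with its option string.
 * Returns the number of NFS entries found, or -1 if the file
 * could not be opened.
 */
int
report_nfs_mounts(const char *table)
{
    FILE *fp;
    struct mntent *ent;
    int found = 0;

    assert(table != NULL);

    fp = setmntent(table, "r");
    if (fp == NULL)
        return -1;

    while ((ent = getmntent(fp)) != NULL)
    {
        if (strncmp(ent->mnt_type, "nfs", 3) != 0)
            continue;
        printf("%s on %s: %s%s\n", ent->mnt_fsname, ent->mnt_dir,
               ent->mnt_opts,
               nfs_entry_has_option(ent, "soft") ? " [soft]" : "");
        found++;
    }

    endmntent(fp);
    return found;
}
```

Calling `report_nfs_mounts("/etc/fstab")` on each machine would give a quick side-by-side of the options in use. A soft mount is worth flagging because it lets the client give up and return an I/O error to the application after its retries are exhausted.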