Thread: pg_standby could not open wal file after selecting new timeline

pg_standby could not open wal file after selecting new timeline

From
Dave Cramer
Date:
I am trying to move a db from one machine to another.

pg_standby applies all the logs fine, but when I trigger it, this
happens:

2008-11-05 11:43:45 EST [14853]  LOG:  restored log file "00000003000016ED0000007E" from archive
2008-11-05 11:43:45 EST [14853]  LOG:  selected new timeline ID: 4
2008-11-05 11:43:45 EST [14853]  LOG:  archive recovery complete
2008-11-05 11:43:48 EST [14853]  FATAL:  could not open file "pg_xlog/00000004000016ED0000007F" (log file 5869, segment 127): Invalid argument
2008-11-05 11:43:48 EST [14846]  LOG:  startup process (PID 14853) exited with exit code 1
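
For readers matching the file names in this log to the "log file 5869, segment 127" part of the FATAL message: a WAL segment name is three 8-digit hexadecimal fields (timeline ID, log file number, segment number). The sketch below only decodes such a name; the function name parse_wal_segment_name is hypothetical and is not part of PostgreSQL or pg_standby.

```python
def parse_wal_segment_name(name):
    """Split a 24-character WAL segment file name into its three fields.

    Hypothetical helper for illustration only; returns a tuple of
    (timeline_id, log_file_number, segment_number) as integers.
    """
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %d characters" % len(name))
    timeline_id = int(name[0:8], 16)       # first 8 hex digits
    log_file_number = int(name[8:16], 16)  # middle 8 hex digits
    segment_number = int(name[16:24], 16)  # last 8 hex digits
    return timeline_id, log_file_number, segment_number


# The file the startup process failed to open after the timeline switch.
print(parse_wal_segment_name("00000004000016ED0000007F"))  # (4, 5869, 127)
```

The decoded values agree with the log: timeline 4 was just selected, and the server reports log file 5869, segment 127, which is the segment immediately after the last one restored on timeline 3.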


Re: pg_standby could not open wal file after selecting new timeline

From
Tom Lane
Date:
Dave Cramer <pg@fastcrypt.com> writes:
> 2008-11-05 11:43:45 EST [14853]  LOG:  selected new timeline ID: 4
> 2008-11-05 11:43:45 EST [14853]  LOG:  archive recovery complete
> 2008-11-05 11:43:48 EST [14853]  FATAL:  could not open file "pg_xlog/00000004000016ED0000007F" (log file 5869, segment 127): Invalid argument

"Invalid argument"??  On the platforms I have handy, the only documented
reason for open(2) to fail with EINVAL is illegal value of the flags
argument, which should be impossible.  What platform is this and what
wal_sync_method are you using?
        regards, tom lane


Re: pg_standby could not open wal file after selecting new timeline

From
Dave Cramer
Date:
Tom,


On 5-Nov-08, at 12:21 PM, Tom Lane wrote:

> "Invalid argument"??  On the platforms I have handy, the only documented
> reason for open(2) to fail with EINVAL is illegal value of the flags
> argument, which should be impossible.  What platform is this and what
> wal_sync_method are you using?

Red Hat Enterprise Linux Server release 5.2 (Tikanga)

wal_sync_method is open_sync.
Thing is, the server is running off of a ramdisk for the move (very
temporarily).

Dave



Re: pg_standby could not open wal file after selecting new timeline

From
Tom Lane
Date:
Dave Cramer <pg@fastcrypt.com> writes:
> On 5-Nov-08, at 12:21 PM, Tom Lane wrote:
>> "Invalid argument"??  On the platforms I have handy, the only documented
>> reason for open(2) to fail with EINVAL is illegal value of the flags
>> argument, which should be impossible.  What platform is this and what
>> wal_sync_method are you using?

> Red Hat Enterprise Linux Server release 5.2 (Tikanga)

> wal_sync_method is open_sync.
> Thing is, the server is running off of a ramdisk for the move (very
> temporarily).

Huh, is it possible that Linux rejects O_SYNC for a file on ramdisk?
I guess I could see an argument for doing that but it seems a tad
anal-retentive.  Try setting fsync = off and see what happens.
        regards, tom lane


Re: pg_standby could not open wal file after selecting new timeline

From
Tom Lane
Date:
I wrote:
> Huh, is it possible that Linux rejects O_SYNC for a file on ramdisk?

I found this in the Fedora 9 manpage for open(2):
      O_DIRECT support was added under Linux in kernel version 2.4.10.  Older
      Linux kernels simply ignore this flag.  Some filesystems may not implement
      the flag and open() will fail with EINVAL if it is used.

so it may not be ramdisk per se that's the issue, but the filesystem
you're using on it.

We set O_DIRECT along with O_SYNC whenever O_DIRECT is defined.  I
wonder whether there's a need to make that decision more configurable.
        regards, tom lane
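
The failure mode quoted from the manpage can be reproduced outside the server. The following is a minimal sketch, assuming Linux and a Python build that exposes os.O_DIRECT: it tries to open a file with O_SYNC | O_DIRECT and retries with O_SYNC alone if open() reports EINVAL. The helper name open_sync_with_fallback and the retry policy are assumptions made for illustration; this is not how PostgreSQL opens WAL files.

```python
import errno
import os
import tempfile

# O_DIRECT is Linux-specific; use 0 where Python does not expose it.
O_DIRECT = getattr(os, "O_DIRECT", 0)


def open_sync_with_fallback(path):
    """Open path for synchronous writes, preferring O_SYNC | O_DIRECT.

    Hypothetical helper. Returns (fd, used_direct), where used_direct
    says whether the O_DIRECT open succeeded.
    """
    base_flags = os.O_RDWR | os.O_CREAT | os.O_SYNC
    if O_DIRECT:
        try:
            return os.open(path, base_flags | O_DIRECT, 0o600), True
        except OSError as exc:
            # EINVAL is what open(2) reports when the filesystem does not
            # implement O_DIRECT; any other error is passed through.
            if exc.errno != errno.EINVAL:
                raise
    return os.open(path, base_flags, 0o600), False


if __name__ == "__main__":
    # The result depends on the filesystem backing the temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        fd, used_direct = open_sync_with_fallback(os.path.join(tmp, "segment"))
        print("O_DIRECT accepted:", used_direct)
        os.close(fd)
```

Pointing the path at a file on the ramdisk would show whether that filesystem accepts O_DIRECT, which is one way to check the theory above without touching the server configuration.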


Re: pg_standby could not open wal file after selecting new timeline

From
Dave Cramer
Date:
On 5-Nov-08, at 1:00 PM, Tom Lane wrote:

> I wrote:
>> Huh, is it possible that Linux rejects O_SYNC for a file on ramdisk?
>
> I found this in the Fedora 9 manpage for open(2):
>
>       O_DIRECT support was added under Linux in kernel version 2.4.10.  Older
>       Linux kernels simply ignore this flag.  Some filesystems may not implement
>       the flag and open() will fail with EINVAL if it is used.
>
> so it may not be ramdisk per se that's the issue, but the filesystem
> you're using on it.
>
> We set O_DIRECT along with O_SYNC whenever O_DIRECT is defined.  I
> wonder whether there's a need to make that decision more configurable.
>

fsync = off works fine, if that helps.


>             regards, tom lane