Thread: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or directory

FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or directory

From
james@unifiedmind.com (James Thornton)
Date:
What does this mean, and what could be causing it?

FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or
directory

That's the second time in as many months that I have received this
error when trying to start postmaster after a crash -- both times a
server reboot remedied the issue.

Thanks.


Just curious ... how often does the server crash? Thanks

"James Thornton" <james@unifiedmind.com> wrote in message
news:cabf0e7b.0206150908.1edab2f8@posting.google.com...
> What does this mean, and what could be causing it?
>
> FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or
> directory
>
> That's the second time in as many months that I have received this
> error when trying to start postmaster after a crash -- both times a
> server reboot remedied the issue.
>
> Thanks.




james@unifiedmind.com (James Thornton) writes:
> What does this mean, and what could be causing it?
> FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or
> directory
> That's the second time in as many months that I have received this
> error when trying to start postmaster after a crash -- both times a
> server reboot remedied the issue.

That really should be impossible --- it says that a rename() failed for
a file we just created.

I judge from the spelling of the error message that you are running 7.1.
I would recommend an update to 7.2, wherein the error message looks
more like this:

    if (rename(tmppath, path) < 0)
        elog(STOP, "rename from %s to %s (initialization of log file %u, segment %u) failed: %m",
             tmppath, path, log, seg);

(Alternatively, you could just edit the message in your existing sources
to include the actual source and destination pathnames given to rename()
--- it's in src/backend/access/transam/xlog.c, line 1396 in 7.1.3.)

That will allow us to eliminate the faint possibility that the code is
somehow miscomputing the pathnames occasionally.
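
A minimal standalone sketch of that kind of message, for illustration only. It is not the xlog.c code: `rename_with_report` is a hypothetical helper, and plain `snprintf`/`strerror` stand in for `elog` and `%m`.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): rename tmppath to path and,
 * on failure, format a message that names both pathnames and the errno
 * text.  Returns 0 on success, -1 on failure; msg is written only on
 * failure.
 */
int
rename_with_report(const char *tmppath, const char *path,
                   char *msg, size_t msglen)
{
    if (rename(tmppath, path) < 0)
    {
        int save_errno = errno;     /* keep errno safe across snprintf */

        snprintf(msg, msglen, "rename from %s to %s failed: %s",
                 tmppath, path, strerror(save_errno));
        errno = save_errno;
        return -1;
    }
    return 0;
}
```

With both pathnames in the message, a miscomputed source or destination would be visible directly in the log rather than having to be inferred from the segment number.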

However, given that you state a system reboot is necessary and
sufficient to make the problem go away, I am going to stick my neck
*way* out and suggest that:

1. You have the $PGDATA directory (or at least its pg_xlog subdirectory) mounted via NFS.

2. This is an NFS problem.

In my book, no adequately-paranoid DBA will trust his database to NFS.
There are some cautionary tales in our mailing list archives...
        regards, tom lane


James Thornton <thornton@cs.ecs.baylor.edu> writes:
> I am not running NFS on this system.

Oh well, scratch that theory.  Perhaps you should tell us what you *are*
running --- what OS, what hardware?  I still believe that this must be
a system-level bug and not directly Postgres' fault.
        regards, tom lane


Re: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or directory

From
nield@usol.com
Date:
6/17/02 10:16:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>james@unifiedmind.com (James Thornton) writes:
>> What does this mean, and what could be causing it?
>> FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or
>> directory
>> That's the second time in as many months that I have received this
>> error when trying to start postmaster after a crash -- both times a
>> server reboot remedied the issue.
>
>That really should be impossible --- it says that a rename() failed for
>a file we just created.
>
>I judge from the spelling of the error message that you are running 7.1.
>I would recommend an update to 7.2, wherein the error message looks
>more like this:
>
>    if (rename(tmppath, path) < 0)
>        elog(STOP, "rename from %s to %s (initialization of log file %u, segment %u) failed: %m",
>             tmppath, path, log, seg);
>
[snip]

From the xlog.c file in 7.3devel in InstallXLogFileSegment(), look at the
code near:

>   while ((fd = BasicOpenFile(path, O_RDWR | PG_BINARY,
>                                       S_IRUSR | S_IWUSR)) >= 0)

It would seem like we assume that ANY failure of BasicOpenFile() implies
that 'path' does not exist. So then we don't handle any other cases, and
rename might fail because 'path' actually exists. 

What if BasicOpenFile() got some other error?

This would seem to be wrong, but it still doesn't explain why 
BasicOpenFile() would be failing when 'path' exists in this 
particular case.
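
The distinction being raised can be sketched outside PostgreSQL roughly as follows. This is an assumption-laden illustration, not the 7.3devel code: `probe_path` and `PathStatus` are hypothetical names, and plain `open()` stands in for `BasicOpenFile()`.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

typedef enum
{
    PATH_EXISTS,    /* open succeeded */
    PATH_MISSING,   /* open failed with ENOENT */
    PATH_UNKNOWN    /* open failed for another reason; file may exist */
} PathStatus;

/*
 * Hypothetical helper (not PostgreSQL code): report whether a failed
 * open really means "no such file" instead of treating every failure
 * as proof that the path is free.
 */
PathStatus
probe_path(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd >= 0)
    {
        close(fd);
        return PATH_EXISTS;
    }
    if (errno == ENOENT)
        return PATH_MISSING;
    return PATH_UNKNOWN;    /* e.g. EACCES, EMFILE, EISDIR, EIO */
}
```

A caller that only tests `fd >= 0` cannot tell `PATH_MISSING` from `PATH_UNKNOWN`, which is the case in question.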

I don't have the 7.1 or 7.2 code around, and I've never looked at it.

J.R. Nield
nield@usol.com






Re: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such

From
James Thornton
Date:
Tom Lane wrote:
>
> That really should be impossible --- it says that a rename() failed for
> a file we just created.
> 
> I judge from the spelling of the error message that you are running 7.1.

7.1.3

> However, given that you state a system reboot is necessary and
> sufficient to make the problem go away, I am going to stick my neck
> *way* out and suggest that:
> 
> 1. You have the $PGDATA directory (or at least its pg_xlog subdirectory)
>    mounted via NFS.
> 
> 2. This is an NFS problem.

I am not running NFS on this system.


Re: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such

From
James Thornton
Date:
Tom Lane wrote:
> 
> James Thornton <thornton@cs.ecs.baylor.edu> writes:
> > I am not running NFS on this system.
> 
> Oh well, scratch that theory.  Perhaps you should tell us what you *are*
> running --- what OS, what hardware?  I still believe that this must be
> a system-level bug and not directly Postgres' fault.

[nsadmin@roam proc]$ cat version cpuinfo meminfo pci 

Linux version 2.4.7-10smp (bhcompile@stripples.devel.redhat.com) (gcc
version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #1 SMP Thu Sep 6
17:09:31 EDT 2001

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 7
model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 548.324
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse
bogomips        : 1094.45
       total:    used:    free:  shared: buffers:  cached:
Mem:  327278592 321400832  5877760   720896 10825728 52867072
Swap: 271392768 13783040 257609728
MemTotal:       319608 kB
MemFree:          5740 kB
MemShared:         704 kB
Buffers:         10572 kB
Cached:          39552 kB
SwapCached:      12076 kB
Active:          21956 kB
Inact_dirty:     40668 kB
Inact_clean:       280 kB
Inact_target:      480 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       319608 kB
LowFree:          5740 kB
SwapTotal:      265032 kB
SwapFree:       251572 kB
NrSwapPages:     62893 pages

PCI devices found:
  Bus  0, device   0, function  0:
    Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 3).
      Master Capable.  Latency=64.
      Prefetchable 32 bit memory at 0xf0000000 [0xf3ffffff].
  Bus  0, device   1, function  0:
    PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 3).
      Master Capable.  Latency=64.  Min Gnt=136.
  Bus  0, device   7, function  0:
    ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 2).
  Bus  0, device   7, function  1:
    IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 1).
      Master Capable.  Latency=32.
      I/O at 0x1000 [0x100f].
  Bus  0, device   7, function  2:
    USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 1).
      IRQ 14.
      Master Capable.  Latency=64.
      I/O at 0xdce0 [0xdcff].
  Bus  0, device   7, function  3:
    Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 2).
      IRQ 9.
  Bus  0, device  14, function  0:
    Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 4).
      IRQ 11.
      Master Capable.  Latency=64.  Min Gnt=8.Max Lat=56.
      Prefetchable 32 bit memory at 0xf7000000 [0xf7000fff].
      I/O at 0xdcc0 [0xdcdf].
      Non-prefetchable 32 bit memory at 0xff000000 [0xff0fffff].
  Bus  0, device  15, function  0:
    PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 3).
      Master Capable.  Latency=64.  Min Gnt=2.
  Bus  0, device  17, function  0:
    Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 36).
      IRQ 14.
      Master Capable.  Latency=64.  Min Gnt=10.Max Lat=10.
      I/O at 0xdc00 [0xdc7f].
      Non-prefetchable 32 bit memory at 0xff100000 [0xff10007f].
  Bus  1, device   0, function  0:
    VGA compatible controller: ATI Technologies Inc 3D Rage Pro AGP 1X/2X (rev 92).
      IRQ 9.
      Master Capable.  Latency=64.  Min Gnt=8.
      Non-prefetchable 32 bit memory at 0xfd000000 [0xfdffffff].
      I/O at 0xfc00 [0xfcff].
      Non-prefetchable 32 bit memory at 0xfcfff000 [0xfcffffff].
  Bus  2, device   9, function  0:
    Unknown mass storage controller: Promise Technology, Inc. 20262 (rev 1).
      IRQ 9.
      Master Capable.  Latency=64.
      I/O at 0xecf8 [0xecff].
      I/O at 0xecf0 [0xecf3].
      I/O at 0xece0 [0xece7].
      I/O at 0xecd8 [0xecdb].
      I/O at 0xec80 [0xecbf].
      Non-prefetchable 32 bit memory at 0xfafe0000 [0xfaffffff].
 


nield@usol.com writes:
> What if BasicOpenFile() got some other error?

Doesn't really matter; anything else would be a problem we can't recover
from anyhow.  Besides, given that rename is failing with ENOENT, a
conflict on the destination name does not appear to be the issue.
        regards, tom lane
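
The ENOENT point can be checked with a small sketch. It relies only on POSIX `rename()` semantics: an existing regular file at the destination is replaced, and ENOENT is reported when the source is missing or a directory component of the destination does not exist. `touch_file` and `rename_errno` are hypothetical helpers, not PostgreSQL functions.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create an empty file; 0 on success, -1 on failure. */
int
touch_file(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0600);

    if (fd < 0)
        return -1;
    return close(fd);
}

/* Hypothetical helper: return 0 if rename() succeeds, else the errno value. */
int
rename_errno(const char *from, const char *to)
{
    return (rename(from, to) < 0) ? errno : 0;
}
```

Renaming onto an existing file is expected to return 0 here. Renaming a source that no longer exists, or renaming into a directory that does not exist, is expected to return ENOENT, which matches the reported failure.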