Re: [HACKERS] 6.5.1, error in pg_dump - Mailing list pgsql-hackers

From Don Baccus
Subject Re: [HACKERS] 6.5.1, error in pg_dump
Date
Msg-id 3.0.1.32.19990807080117.00b0d804@mail.pacifier.com
In response to Re: [HACKERS] 6.5.1, error in pg_dump  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
At 06:48 PM 8/6/99 -0400, Tom Lane wrote:
>Don Baccus <dhogaza@pacifier.com> writes:
>> Can not create pgdump_oid table.  Explanation from backend: 'ERROR:  cannot
>> create pgdump_oid
>> '.
>
>AFAICT, the only way that you'd get that precise wording out of the
>backend is if its attempt to create the physical file for the table
>fails --- that is, open(filename, O_RDWR | O_CREAT | O_EXCL, 0600)
>fails.  The only error message I can find with that wording is in
>smgrcreate(), and it looks like all the other potential causes of
>failure below that point (such as running out of memory) would yield
>different messages.
>
>So, it would seem that the initial cause of the problem is that there
>was already a file by that name --- perhaps because some earlier
>instance of postgres failed to unlink it.

Thanks for tracking this down.  I was busy yesterday; I started to
poke at the sources but didn't really have time to do so in earnest.

This does fit with the fact that the files for the offending
tables did indeed still exist.
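
To make the failure mode Tom describes concrete, here is a minimal
sketch of an exclusive create using the same flags he quotes.  The
helper name create_exclusive() is made up for illustration and is not
taken from the PostgreSQL sources; it only shows how open() behaves
when a file of that name is already present.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: try to create a brand-new
 * file with O_CREAT | O_EXCL.  Returns 0 on success, or the errno value
 * set by open() on failure.
 */
static int
create_exclusive(const char *path)
{
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return errno;   /* EEXIST when a file of that name already exists */

    close(fd);
    return 0;
}
```

Calling create_exclusive() twice on the same path returns 0 the first
time and EEXIST the second; once the leftover file is unlinked, the
create succeeds again.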

I did try deleting the file for my "foo" test case, and could
then create and supposedly drop the table, but in that case
the relation was not removed from pg_class, nor was the
file removed.

I have had occasional filesystem problems on this machine,
and not only in postgres.  Linux flakies?  System flakies?
I've not quite been able to decide, nor do I have a spare
machine to rebuild on.  I keep hearing that Linux is very
solid, particularly the 2.0.36 kernel I'm using, but this
is my first experience running Linux on a steady basis (I
have tons of other Unix experience, of course).

>
>I'm not sure about the bizarrenesses you report later on --- they
>sound like the system tables may have gotten corrupted, or perhaps
>just the relation cache inside the backend.  (Did killing the backend
>and starting a new one help?)

No, unfortunately.

>  But I am thinking the error messages
>must have been different at that point...

Yes, the relation was still in the pg_class table (I think that's
the right one, as opposed to pg_tables), and so was the file.  This
makes more sense, actually, in that it appears that the unlink was
failing and the relation didn't get removed from the system table
either, though it's weird that the drop reported success.
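
One way a caller can end up reporting success while the file stays on
disk is by not looking at the result of unlink().  Whether the backend
does that here is not established; the sketch below only shows what a
checked removal looks like.  The helper name remove_file_checked() is
made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: remove a file and report a
 * failure instead of silently ignoring it.  Returns 0 on success, or
 * the errno value set by unlink() on failure.
 */
static int
remove_file_checked(const char *path)
{
    assert(path != NULL);

    if (unlink(path) != 0)
    {
        int saved_errno = errno;    /* save before fprintf() can change it */

        fprintf(stderr, "unlink(\"%s\") failed: %s\n",
                path, strerror(saved_errno));
        return saved_errno;
    }
    return 0;
}
```

Removing an existing file returns 0; removing it a second time returns
ENOENT, so the caller at least learns that something went wrong.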

One thought crossing my mind was that the Linux filesystem may've
gotten itself into a weird state, in particular the cache.  I
rebooted, initdb'd and rebuilt and everything's OK now.

>We saw vaguely similar behaviors with temp tables when we were still
>flushing the bugs out of temp tables.  I wonder if there are still
>some temp-table-related bugs?

At this point I'm willing to believe it is a Linux or (perhaps
more likely) system problem.   If it happens again, I'm better
prepared as to where to look for more details, at least.




- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.