Thread: Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

From: "Joel Burton"

On 25 Nov 2000, at 17:35, Tom Lane wrote:

> > So, I began restarting pgsql w/ a line like
>
> > rm -f /tmp/.PGSQL.* && postmaster -i >log 2>log &
>
> > Which works great. Except that I *kept* using this for two weeks
> > after the view problem (damn that bash up-arrow laziness!), and
> > yesterday, used it to restart PostgreSQL except (oops!) it was
> > already running.
>
> > Results: no database at all. All classes (tables/views/etc) returned
> > 0 records (meaning that no tables showed up in psql's \d, since
> > pg_class returned nothing.)
>
> Ugh.  The reason that removing the socket file allowed a second
> postmaster to start up is that we use an advisory lock on the socket
> file as the interlock that prevents two PMs on the same port number.
> Remove the socket file, poof no interlock.
>
> *However*, there is a second line of defense to prevent two
> postmasters in the same directory, and I don't understand why that
> didn't trigger. Unless you are running a version old enough to not
> have it.  What PG version is this, anyway?

7.1devel, from about 1 week ago.

> Assuming you got past both interlocks, the second postmaster would
> have reinitialized Postgres' shared memory block for that database,
> which would have been a Bad Thing(tm) ... but it would not have led to
> any immediate damage to your on-disk files, AFAICS.  Was the database
> still hosed after you stopped both postmasters and started a fresh
> one?  (Did you even try that?)

Yes, I stopped both, rebooted the machine, and restarted the postmaster.
Then I rebooted again, used just postgres, tried to vacuum, tried to
dump, etc. Always the same story.


--
Joel Burton, Director of Information Systems -*- jburton@scw.org
Support Center of Washington (www.scw.org)
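
To make the socket-file interlock described above concrete, here is a
minimal sketch of why removing a file that carries an advisory lock
removes the protection with it. It is not the postmaster's code: the use
of flock(), the helper name try_lock, and the file name "interlock" are
all assumptions chosen for illustration. The lock belongs to the file the
first process opened; after "rm -f" a newcomer creates a brand-new file at
the same path and locks that one without conflict.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path (creating it if needed) and try a non-blocking exclusive lock.

    Returns the open file object on success, or None if the lock is held
    through another open of the same file.
    """
    f = open(path, "a+")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def demo():
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical stand-in for the socket file under /tmp.
        path = os.path.join(d, "interlock")

        first = try_lock(path)    # first server takes the lock
        blocked = try_lock(path)  # second attempt on the same file is refused

        os.remove(path)           # the "rm -f" step; first still holds its lock,
                                  # but only on the now-unlinked file

        second = try_lock(path)   # new file at the same path: lock is granted

        results = (first is not None, blocked is None, second is not None)
        for f in (first, second):
            if f is not None:
                f.close()
        return results
```

Calling demo() reports whether the first lock was taken, whether the
second attempt was refused, and whether the attempt after the delete
succeeded. The last of these is the failure mode discussed in this thread.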

Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

From: Tom Lane

"Joel Burton" <jburton@scw.org> writes:
> On 25 Nov 2000, at 17:35, Tom Lane wrote:
>> Ugh.  The reason that removing the socket file allowed a second
>> postmaster to start up is that we use an advisory lock on the socket
>> file as the interlock that prevents two PMs on the same port number.
>> Remove the socket file, poof no interlock.
>> 
>> *However*, there is a second line of defense to prevent two
>> postmasters in the same directory, and I don't understand why that
>> didn't trigger. Unless you are running a version old enough to not
>> have it.  What PG version is this, anyway?

> 7.1devel, from about 1 week ago.

Ah, I see why the data-directory interlock file wasn't helping: it
wasn't checked until *after* shared memory was set up (read clobbered
:-().  This was not a very bright choice.  I'm still surprised that
the shared-memory reset should've trashed your database so thoroughly,
though.

Over the past two days I've committed changes that should make the data
directory, socket file, and shared memory interlocks considerably more
robust.  In particular, mechanically doing "rm -f /tmp/.s.PGSQL.5432"
should never be necessary anymore.

Sorry about your trouble...

BTW, your original message mentioned something about a recursive view
definition that wasn't being recognized as such.  Could you provide
details on that?
        regards, tom lane
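
The ordering problem described in this message (the data-directory
interlock being checked only after shared memory had been set up) can be
sketched as follows. This is an illustration under stated assumptions,
not PostgreSQL source: the lock file name "server.lock", the dict standing
in for shared memory, and the function names are all hypothetical.

```python
import os
import tempfile


class AlreadyRunning(Exception):
    """Raised when the data directory already has a lock file."""


def acquire_lockfile(data_dir):
    """Atomically create a lock file in data_dir; fail if one exists."""
    path = os.path.join(data_dir, "server.lock")  # hypothetical file name
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning(path) from None
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return path


def start_unsafe(data_dir, shared):
    shared.clear()              # shared state is reinitialized first ...
    acquire_lockfile(data_dir)  # ... and the conflict is noticed too late


def start_safe(data_dir, shared):
    acquire_lockfile(data_dir)  # refuse before touching anything shared
    shared.clear()


def demo():
    with tempfile.TemporaryDirectory() as d:
        shared = {"buffers": "live data"}  # stand-in for shared memory
        acquire_lockfile(d)                # a first server is already running

        try:
            start_safe(d, shared)
        except AlreadyRunning:
            pass
        safe_kept = bool(shared)           # state untouched

        try:
            start_unsafe(d, shared)
        except AlreadyRunning:
            pass
        unsafe_kept = bool(shared)         # state was wiped before the check

        return safe_kept, unsafe_kept
```

Both start-up paths end with the same refusal; the only difference is
whether the running server's shared state survives it.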


Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

From: "Joel Burton"

> Ah, I see why the data-directory interlock file wasn't helping: it
> wasn't checked until *after* shared memory was set up (read clobbered
> :-().  This was not a very bright choice.  I'm still surprised that
> the shared-memory reset should've trashed your database so thoroughly,
> though.
> 
> Over the past two days I've committed changes that should make the
> data directory, socket file, and shared memory interlocks considerably
> more robust.  In particular, mechanically doing "rm -f
> /tmp/.s.PGSQL.5432" should never be necessary anymore.

That's fantastic. Thanks for the quick fix. 

> BTW, your original message mentioned something about a recursive view
> definition that wasn't being recognized as such.  Could you provide
> details on that?

I can't. It was a few weeks ago, the database has been in furious
development, and, of course, I didn't bother to save all those views 
that crashed my server. I keep trying to re-create it, but can't 
figure it out. I'm sorry.

I think it wasn't just two views pointing at each other (it would, of
course, be next to impossible to even create those, unless you
hand-tweaked the system tables), but I think it was a
view-relies-on-a-function-relies-on-a-view kind of problem. If I ever
see it again, I'll save it.

Thanks!

--
Joel Burton, Director of Information Systems -*- jburton@scw.org
Support Center of Washington (www.scw.org)


Re: [GENERAL] Warning: Don't delete those /tmp/.PGSQL.* files

From: Tom Lane

"Joel Burton" <jburton@scw.org> writes:
> I think it wasn't just two views pointing at each other (it would, of
> course, be next to impossible to even create those, unless you
> hand-tweaked the system tables), but I think it was a
> view-relies-on-a-function-relies-on-a-view kind of problem.

Oh, OK.  I wouldn't expect the rewriter to realize that that sort of
situation is recursive.  Depending on what your function is doing, it
might or might not be an infinite recursion, so I don't think I'd want
the system arbitrarily preventing you from doing this sort of thing.

Perhaps there should be an upper bound on function-call recursion depth
enforced someplace?  Not sure.
        regards, tom lane
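
One way to picture the upper bound on function-call recursion depth
floated above is a counter that every nested call passes through. The
sketch below is only that, a sketch: the CallStack class, the limit of
100, and the toy view_a/func_b pair are hypothetical and say nothing about
how or where PostgreSQL would enforce such a limit. It shows the property
being asked for: finite recursion still works, while a view/function
cycle is cut off with an error instead of running forever.

```python
class DepthExceeded(Exception):
    """Raised when nested calls go deeper than the configured limit."""


class CallStack:
    def __init__(self, max_depth=100):  # hypothetical default limit
        self.max_depth = max_depth
        self.depth = 0

    def call(self, fn, *args):
        """Invoke fn(self, *args), tracking and bounding the nesting depth."""
        if self.depth >= self.max_depth:
            raise DepthExceeded(f"call depth limit {self.max_depth} reached")
        self.depth += 1
        try:
            return fn(self, *args)
        finally:
            self.depth -= 1  # always unwind, even when an error propagates


def view_a(stack):
    # Hypothetical "view" whose definition calls a function.
    return stack.call(func_b)


def func_b(stack):
    # Hypothetical "function" that selects from the view again.
    return stack.call(view_a)


def countdown(stack, n):
    # Recursion that terminates by itself and should not be blocked.
    return 0 if n == 0 else 1 + stack.call(countdown, n - 1)
```

With a CallStack in hand, stack.call(countdown, 10) completes normally,
whereas stack.call(view_a) raises DepthExceeded once the limit is hit and
leaves the depth counter back at zero.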