Re: Warning: Don't delete those /tmp/.PGSQL.* files - Mailing list pgsql-general

From Tom Lane
Subject Re: Warning: Don't delete those /tmp/.PGSQL.* files
Date
Msg-id 26974.975191713@sss.pgh.pa.us
In response to Warning: Don't delete those /tmp/.PGSQL.* files  ("Joel Burton" <jburton@scw.org>)
List pgsql-general

"Joel Burton" <jburton@scw.org> writes:
> Working on my database, I had a view that would lock up the
> machine (eats all available memory, soon goes belly-up.) Turned out
> to be a recursive view: view A asked a question of view B that
> asked view A. [is it possible for pgsql to detect this?

It should have been detected --- there is a check in the rewriter that's
supposed to error out after ten recursive rewrite calls.  Maybe that
logic is broken, or misses certain cases.  Could you exhibit the views
that caused this behavior for you?

> So, I began restarting pgsql w/a  line like

> rm -f /tmp/.PGSQL.* && postmaster -i >log 2>log &

> Which works great. Except that I *kept* using this for two weeks
> after the view problem (damn that bash up-arrow laziness!), and
> yesterday, used it to restart PostgreSQL except (oops!) it was
> already running.

> Results: no database at all. All classes (tables/views/etc) returned
> 0 records (meaning that no tables showed up in psql's \d, since
> pg_class returned nothing.)

Ugh.  The reason that removing the socket file allowed a second
postmaster to start up is that we use an advisory lock on the socket
file as the interlock that prevents two PMs on the same port number.
Remove the socket file, poof no interlock.
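
To make the failure mode concrete, here is a minimal Python sketch of an advisory lock tied to a file path. It is an illustration only, not the postmaster's code: it assumes a Unix-like system, uses flock() on an ordinary temporary file in place of the real socket file, and the names try_lock and demo are made up for the example. The point is that the lock belongs to the file that was opened, not to the path name, so once the path is removed a newcomer creates a fresh file and locks it without conflict.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open (creating if needed) `path` and try to take a non-blocking
    exclusive advisory lock. Returns the open file on success, else None."""
    f = open(path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Someone else already holds the lock on this file.
        f.close()
        return None
    return f


def demo():
    # Hypothetical stand-in for the socket file; a plain file in a temp dir.
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "fake_socket_file")

    first = try_lock(path)    # first "server" takes the lock
    second = try_lock(path)   # second "server" is refused
    os.remove(path)           # the equivalent of the rm -f in the restart line
    third = try_lock(path)    # a new file appears at the same path, lock is free

    results = (first is not None, second is not None, third is not None)
    for f in (first, second, third):
        if f is not None:
            f.close()
    os.remove(path)
    os.rmdir(workdir)
    return results


if __name__ == "__main__":
    print(demo())
```

The first holder still has its lock after the remove, but on a file that no longer has a name, so nobody else will ever look at it.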

*However*, there is a second line of defense to prevent two postmasters
in the same directory, and I don't understand why that didn't trigger.
Unless you are running a version old enough to not have it.  What PG
version is this, anyway?
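
For readers unfamiliar with per-directory interlocks, the following is a generic Python sketch of one common way to build such a check: a PID file created atomically inside the data directory. It is not a description of how the check mentioned above is implemented; the file name server.pid and the functions pid_alive and acquire_dir_lock are assumptions made for the example, and it assumes a Unix-like system.

```python
import os
import tempfile


def pid_alive(pid):
    """Return True if a process with this PID appears to exist."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def acquire_dir_lock(data_dir, name="server.pid"):
    """Try to claim `data_dir` by atomically creating a PID file in it.
    Returns True if we now own the directory, False if it is in use."""
    path = os.path.join(data_dir, name)
    for _ in range(2):
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            try:
                with open(path) as f:
                    old_pid = int(f.read().strip())
            except (OSError, ValueError):
                return False  # unreadable or half-written: refuse to start
            if pid_alive(old_pid):
                return False  # a live process already owns this directory
            os.remove(path)   # stale file from a dead process; retry once
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
    return False


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    print(acquire_dir_lock(demo_dir))  # first claim
    print(acquire_dir_lock(demo_dir))  # second claim while the first is live
```

Because the file lives in the data directory rather than in /tmp, a cleanup command aimed at /tmp does not touch it. A scheme like this is still only as good as its stale-file check, since a PID can be reused by an unrelated process.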

Assuming you got past both interlocks, the second postmaster would have
reinitialized Postgres' shared memory block for that database, which
would have been a Bad Thing(tm) ... but it would not have led to any
immediate damage to your on-disk files, AFAICS.  Was the database still
hosed after you stopped both postmasters and started a fresh one?  (Did
you even try that?)

This story does indicate that we need a less fragile interlock against
starting two postmasters on one database.  I have to admit that it
hadn't occurred to me that you could break the port-number interlock
so easily as that :-(.  But obviously you can, so we need a different
way of representing the interlock.  Hackers, any thoughts?

Note: I've narrowed followups to just pghackers, since that seems like
the right forum for discussing a better interlock mechanism.

            regards, tom lane
