Wayne Piekarski <wayne@senet.com.au> writes:
>>>> whole thing to fall over, but now we get
>>>> "fmgr_info: function 111257088: cache lookup failed"
>>>> after 64 backends (which is what we compiled postgres for) which I
>>>> assume isn't so fatal and the whole system keeps running.
> ... One question though: is the cache lookup failed message
> really bad or is it a cryptic way of saying that the connection is refused
> but everything else is cool?
I'd put it in the "really bad" category, mainly because I don't see the
cause-and-effect chain. It is *not* anything to do with connection
validation, that's for sure. My guess is that the additional backend
has connected and is trying to make queries, and that queries are now
failing for some resource-exhaustion kind of reason. But I don't know
why that would tend to show up as an fmgr_info failure before anything
else. Do you use user-defined functions especially heavily in this
database? For that matter, does the OID reported by fmgr_info actually
correspond to any row of pg_proc?
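One way to check, as a minimal sketch (this assumes you can still open a
psql session on the affected database; the OID is the one from your error
message):

```sql
-- Look for a pg_proc row whose OID matches the one fmgr_info reported.
SELECT oid, proname FROM pg_proc WHERE oid = 111257088;
```

If that returns no rows, the OID does not name any function, which might
suggest fmgr_info was handed a garbage value rather than that a real
function went missing.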
> As another general question, if I randomly kill postgres backends during
> the middle of transactions, is there a possibility of corruption, or is
> it safe due to the way transactions are committed, etc.?
I'd regard it as very risky: if that backend is in the middle of
modifying shared memory, you could leave shared memory data structures
and/or disk blocks in inconsistent states. You could probably get away
with it for a backend that was blocked waiting for a lock.
regards, tom lane