Thread: 7.4.3 server panic

7.4.3 server panic

From
"Chris Ochs"
Date:
I have been able to crash the server a few times with the following function
when it is called on a user and schema that was recently deleted.  Following
is a log of what happened and also the function that was called.  This is
7.4.3 on Freebsd 5.2.1.  It doesn't always panic when it encounters a
deleted user/schema, it's kind of sporadic but happens often enough that
it's easy to duplicate.

Chris

----------------------------------------------------------------------------
------
CREATE OR REPLACE FUNCTION setuser(varchar) RETURNS integer AS
'
DECLARE
   in_username ALIAS FOR \$1;
BEGIN
   EXECUTE ''SET SESSION AUTHORIZATION '' || in_username;
   EXECUTE ''SET SEARCH_PATH TO '' || in_username || '',PUBLIC'';
   RETURN 1;
END '
LANGUAGE 'plpgsql';

----------------------------------------------------------------------------
-----
LOG:  statement:                                               select * from
setuser('support')
LOG:  statement: SET SESSION AUTHORIZATION support
CONTEXT:  PL/pgSQL function "setuser" line 4 at execute statement
LOG:  statement: SET SEARCH_PATH TO support,PUBLIC
CONTEXT:  PL/pgSQL function "setuser" line 5 at execute statement
ERROR:  schema "support" does not exist
CONTEXT:  PL/pgSQL function "setuser" line 5 at execute statement
STATEMENT:                                                select * from
setuser('support')
ERROR:  invalid user ID: 194
PANIC:  error during error recovery, giving up
LOG:  server process (PID 38302) was terminated by signal 6
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
 abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.



Re: 7.4.3 server panic

From
Tom Lane
Date:
"Chris Ochs" <chris@paymentonline.com> writes:
> ERROR:  invalid user ID: 194
> PANIC:  error during error recovery, giving up
> LOG:  server process (PID 38302) was terminated by signal 6

Can you get a stack traceback from this crash?  The only occurrence of
"invalid user ID:" that I see in the source code is in
GetUserNameFromId(), but there's no visible reason why that would be
called during error recovery.

Also, a recipe for reproducing it would help.  I spent a little time
trying with your function with no success.  You might be able to make
it more reproducible by inserting delays in the setuser() function.
(See the sleep() function in src/test/regress/sql/stats.sql for a quick
and dirty way to delay.)  I didn't have any luck, but since I don't know
what's going on concurrently with this function in your environment,
I was probably trying the wrong things.

            regards, tom lane

Re: 7.4.3 server panic

From
"Chris"
Date:
> "Chris Ochs" <chris@paymentonline.com> writes:
>> ERROR:  invalid user ID: 194
>> PANIC:  error during error recovery, giving up
>> LOG:  server process (PID 38302) was terminated by signal 6
>
> Can you get a stack traceback from this crash?  The only occurrence of
> "invalid user ID:" that I see in the source code is in
> GetUserNameFromId(), but there's no visible reason why that would be
> called during error recovery.

(gdb) bt
#0  0x284f9dcf in kill () from /lib/libc.so.5
#1  0x284ee878 in raise () from /lib/libc.so.5
#2  0x28566f82 in abort () from /lib/libc.so.5
#3  0x08226a6a in errfinish ()
#4  0x08226953 in errfinish ()
#5  0x0822f54d in GetUserNameFromId ()
#6  0x081183a3 in show_session_authorization ()
#7  0x08232fe7 in show_all_settings ()
#8  0x0823196b in AtEOXact_GUC ()
#9  0x0823164f in AtEOXact_GUC ()
#10 0x08094cbd in XactPopRollback ()
#11 0x081a2334 in PostgresMain ()
#12 0x0817576b in PostmasterMain ()
#13 0x08174fa3 in PostmasterMain ()
#14 0x08173436 in PostmasterMain ()
#15 0x08172c1c in PostmasterMain ()
#16 0x0813b7fa in main ()
#17 0x0806d822 in _start ()
(gdb)


>
> Also, a recipe for reproducing it would help.  I spent a little time
> trying with your function with no success.  You might be able to make
> it more reproducible by inserting delays in the setuser() function.
> (See the sleep() function in src/test/regress/sql/stats.sql for a quick
> and dirty way to delay.)  I didn't have any luck, but since I don't know
> what's going on concurrently with this function in your environment,
> I was probably trying the wrong things.

I'm not sure myself exactly what the trigger is.  The database connections
are all through mod perl/Apache::DBI which I suspect have something to do
with the problem.  Maybe when a user is dropped, it changes the state of the
database in some way that an active, open connection doesn't pick up?  Just
a guess.  One thing I noticed is that the panic happens not when setuser is
called on the user that is deleted, but on a user that doesn't, and never
did, exist.  It also keeps panicing on the same user even though I am
calling setuser on several other users that don't exist (or did and have
been deleted).  It does appear though that deleting a user/schema has some
sort of effect, because it always seems to happen when I have deleted a
user/schema.  In other words it's never happened in the absence of a
recently deleted user that setuser was called on, even though it's not that
user that it appears to panic on.  If that makes sense....

Chris

>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>      joining column's datatypes do not match
>
>


Re: 7.4.3 server panic

From
Tom Lane
Date:
"Chris" <chris@paymentonline.com> writes:
>> Can you get a stack traceback from this crash?

> (gdb) bt
> #0  0x284f9dcf in kill () from /lib/libc.so.5
> #1  0x284ee878 in raise () from /lib/libc.so.5
> #2  0x28566f82 in abort () from /lib/libc.so.5
> #3  0x08226a6a in errfinish ()
> #4  0x08226953 in errfinish ()
> #5  0x0822f54d in GetUserNameFromId ()
> #6  0x081183a3 in show_session_authorization ()
> #7  0x08232fe7 in show_all_settings ()
> #8  0x0823196b in AtEOXact_GUC ()
> #9  0x0823164f in AtEOXact_GUC ()
> #10 0x08094cbd in XactPopRollback ()
> #11 0x081a2334 in PostgresMain ()

The trace is a bit bogus, but it did help me figure out what's going on:
I think that we're trying to report the current session authorization to
the client (which is something that happens automatically on any change)
and it's referencing a deleted user.  Is it possible that your setup is
deleting the user that the entire session is running as?

            regards, tom lane

Re: 7.4.3 server panic

From
"Chris Ochs"
Date:
>
> > (gdb) bt
> > #0  0x284f9dcf in kill () from /lib/libc.so.5
> > #1  0x284ee878 in raise () from /lib/libc.so.5
> > #2  0x28566f82 in abort () from /lib/libc.so.5
> > #3  0x08226a6a in errfinish ()
> > #4  0x08226953 in errfinish ()
> > #5  0x0822f54d in GetUserNameFromId ()
> > #6  0x081183a3 in show_session_authorization ()
> > #7  0x08232fe7 in show_all_settings ()
> > #8  0x0823196b in AtEOXact_GUC ()
> > #9  0x0823164f in AtEOXact_GUC ()
> > #10 0x08094cbd in XactPopRollback ()
> > #11 0x081a2334 in PostgresMain ()
>
> The trace is a bit bogus, but it did help me figure out what's going on:
> I think that we're trying to report the current session authorization to
> the client (which is something that happens automatically on any change)
> and it's referencing a deleted user.  Is it possible that your setup is
> deleting the user that the entire session is running as?
>
> regards, tom lane

By entire session do you mean the superuser that initially makes the
connection?  If so, then definitely no that user is not deleted.  It's
possible that a set session authorization was issued, the current session
user was then deleted by another connection, and then the session of the
deleted user tried to report.  I am calling reset session authorization at
the start of every run of our application, but I could have missed something
so that it does end up in a state of trying to report on a user session
where the user no longer exists.

Chris




Re: 7.4.3 server panic

From
Tom Lane
Date:
"Chris Ochs" <chris@paymentonline.com> writes:
>> I think that we're trying to report the current session authorization to
>> the client (which is something that happens automatically on any change)
>> and it's referencing a deleted user.  Is it possible that your setup is
>> deleting the user that the entire session is running as?

> By entire session do you mean the superuser that initially makes the
> connection?  If so, then definitely no that user is not deleted.  It's
> possible that a set session authorization was issued, the current session
> user was then deleted by another connection, and then the session of the
> deleted user tried to report.  I am calling reset session authorization at
> the start of every run of our application, but I could have missed something
> so that it does end up in a state of trying to report on a user session
> where the user no longer exists.

Okay.  The ideal fix would be for the system to refuse to allow
dropping a user that someone is currently connected or authorized as,
but that seems impractical to enforce.  (An authorization could, for
example, be hidden a few levels down a session's call stack if there
are nested SECURITY DEFINER function calls.  It'd be expensive even
to check this for your own session, let alone any other sessions.)

What I think I will do is alter the SET SESSION AUTHORIZATION code so
that it stores both the name and userID of the currently authorized
user, and then it can just report the name part to the client without
having to do a lookup, which is a Good Thing if we're in an aborted
transaction.

The downside of this is that an ALTER USER RENAME won't affect the
stored data and so the reported authorization name won't change in
existing sessions ... but I don't think that works now anyway.

            regards, tom lane