Thread: 7.4.3 server panic
I have been able to crash the server a few times with the following function when it is called on a user and schema that was recently deleted. Following is a log of what happened and also the function that was called. This is 7.4.3 on Freebsd 5.2.1. It doesn't always panic when it encounters a deleted user/schema, it's kind of sporadic but happens often enough that it's easy to duplicate. Chris ---------------------------------------------------------------------------- ------ CREATE OR REPLACE FUNCTION setuser(varchar) RETURNS integer AS ' DECLARE in_username ALIAS FOR \$1; BEGIN EXECUTE ''SET SESSION AUTHORIZATION '' || in_username; EXECUTE ''SET SEARCH_PATH TO '' || in_username || '',PUBLIC''; RETURN 1; END ' LANGUAGE 'plpgsql'; ---------------------------------------------------------------------------- ----- LOG: statement: select * from setuser('support') LOG: statement: SET SESSION AUTHORIZATION support CONTEXT: PL/pgSQL function "setuser" line 4 at execute statement LOG: statement: SET SEARCH_PATH TO support,PUBLIC CONTEXT: PL/pgSQL function "setuser" line 5 at execute statement ERROR: schema "support" does not exist CONTEXT: PL/pgSQL function "setuser" line 5 at execute statement STATEMENT: select * from setuser('support') ERROR: invalid user ID: 194 PANIC: error during error recovery, giving up LOG: server process (PID 38302) was terminated by signal 6 LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command.
"Chris Ochs" <chris@paymentonline.com> writes: > ERROR: invalid user ID: 194 > PANIC: error during error recovery, giving up > LOG: server process (PID 38302) was terminated by signal 6 Can you get a stack traceback from this crash? The only occurrence of "invalid user ID:" that I see in the source code is in GetUserNameFromId(), but there's no visible reason why that would be called during error recovery. Also, a recipe for reproducing it would help. I spent a little time trying with your function with no success. You might be able to make it more reproducible by inserting delays in the setuser() function. (See the sleep() function in src/test/regress/sql/stats.sql for a quick and dirty way to delay.) I didn't have any luck, but since I don't know what's going on concurrently with this function in your environment, I was probably trying the wrong things. regards, tom lane
> "Chris Ochs" <chris@paymentonline.com> writes: >> ERROR: invalid user ID: 194 >> PANIC: error during error recovery, giving up >> LOG: server process (PID 38302) was terminated by signal 6 > > Can you get a stack traceback from this crash? The only occurrence of > "invalid user ID:" that I see in the source code is in > GetUserNameFromId(), but there's no visible reason why that would be > called during error recovery. (gdb) bt #0 0x284f9dcf in kill () from /lib/libc.so.5 #1 0x284ee878 in raise () from /lib/libc.so.5 #2 0x28566f82 in abort () from /lib/libc.so.5 #3 0x08226a6a in errfinish () #4 0x08226953 in errfinish () #5 0x0822f54d in GetUserNameFromId () #6 0x081183a3 in show_session_authorization () #7 0x08232fe7 in show_all_settings () #8 0x0823196b in AtEOXact_GUC () #9 0x0823164f in AtEOXact_GUC () #10 0x08094cbd in XactPopRollback () #11 0x081a2334 in PostgresMain () #12 0x0817576b in PostmasterMain () #13 0x08174fa3 in PostmasterMain () #14 0x08173436 in PostmasterMain () #15 0x08172c1c in PostmasterMain () #16 0x0813b7fa in main () #17 0x0806d822 in _start () (gdb) > > Also, a recipe for reproducing it would help. I spent a little time > trying with your function with no success. You might be able to make > it more reproducible by inserting delays in the setuser() function. > (See the sleep() function in src/test/regress/sql/stats.sql for a quick > and dirty way to delay.) I didn't have any luck, but since I don't know > what's going on concurrently with this function in your environment, > I was probably trying the wrong things. I'm not sure myself exactly what the trigger is. The database connections are all through mod perl/Apache::DBI which I suspect have something to do with the problem. Maybe when a user is dropped, it changes the state of the database in some way that an active, open connection doesn't pick up? Just a guess. One thing I noticed is that the panic happens not when setuser is called on the user that is deleted, but on a user that doesn't, and never did, exist. It also keeps panicing on the same user even though I am calling setuser on several other users that don't exist (or did and have been deleted). It does appear though that deleting a user/schema has some sort of effect, because it always seems to happen when I have deleted a user/schema. In other words it's never happened in the absence of a recently deleted user that setuser was called on, even though it's not that user that it appears to panic on. If that makes sense.... Chris > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > >
"Chris" <chris@paymentonline.com> writes: >> Can you get a stack traceback from this crash? > (gdb) bt > #0 0x284f9dcf in kill () from /lib/libc.so.5 > #1 0x284ee878 in raise () from /lib/libc.so.5 > #2 0x28566f82 in abort () from /lib/libc.so.5 > #3 0x08226a6a in errfinish () > #4 0x08226953 in errfinish () > #5 0x0822f54d in GetUserNameFromId () > #6 0x081183a3 in show_session_authorization () > #7 0x08232fe7 in show_all_settings () > #8 0x0823196b in AtEOXact_GUC () > #9 0x0823164f in AtEOXact_GUC () > #10 0x08094cbd in XactPopRollback () > #11 0x081a2334 in PostgresMain () The trace is a bit bogus, but it did help me figure out what's going on: I think that we're trying to report the current session authorization to the client (which is something that happens automatically on any change) and it's referencing a deleted user. Is it possible that your setup is deleting the user that the entire session is running as? regards, tom lane
> > > (gdb) bt > > #0 0x284f9dcf in kill () from /lib/libc.so.5 > > #1 0x284ee878 in raise () from /lib/libc.so.5 > > #2 0x28566f82 in abort () from /lib/libc.so.5 > > #3 0x08226a6a in errfinish () > > #4 0x08226953 in errfinish () > > #5 0x0822f54d in GetUserNameFromId () > > #6 0x081183a3 in show_session_authorization () > > #7 0x08232fe7 in show_all_settings () > > #8 0x0823196b in AtEOXact_GUC () > > #9 0x0823164f in AtEOXact_GUC () > > #10 0x08094cbd in XactPopRollback () > > #11 0x081a2334 in PostgresMain () > > The trace is a bit bogus, but it did help me figure out what's going on: > I think that we're trying to report the current session authorization to > the client (which is something that happens automatically on any change) > and it's referencing a deleted user. Is it possible that your setup is > deleting the user that the entire session is running as? > > regards, tom lane By entire session do you mean the superuser that initially makes the connection? If so, then definitely no that user is not deleted. It's possible that a set session authorization was issued, the current session user was then deleted by another connection, and then the session of the deleted user tried to report. I am calling reset session authorization at the start of every run of our application, but I could have missed something so that it does end up in a state of trying to report on a user session where the user no longer exists. Chris
"Chris Ochs" <chris@paymentonline.com> writes: >> I think that we're trying to report the current session authorization to >> the client (which is something that happens automatically on any change) >> and it's referencing a deleted user. Is it possible that your setup is >> deleting the user that the entire session is running as? > By entire session do you mean the superuser that initially makes the > connection? If so, then definitely no that user is not deleted. It's > possible that a set session authorization was issued, the current session > user was then deleted by another connection, and then the session of the > deleted user tried to report. I am calling reset session authorization at > the start of every run of our application, but I could have missed something > so that it does end up in a state of trying to report on a user session > where the user no longer exists. Okay. The ideal fix would be for the system to refuse to allow dropping a user that someone is currently connected or authorized as, but that seems impractical to enforce. (An authorization could, for example, be hidden a few levels down a session's call stack if there are nested SECURITY DEFINER function calls. It'd be expensive even to check this for your own session, let alone any other sessions.) What I think I will do is alter the SET SESSION AUTHORIZATION code so that it stores both the name and userID of the currently authorized user, and then it can just report the name part to the client without having to do a lookup, which is a Good Thing if we're in an aborted transaction. The downside of this is that an ALTER USER RENAME won't affect the stored data and so the reported authorization name won't change in existing sessions ... but I don't think that works now anyway. regards, tom lane