Thread: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
The following bug has been logged online: Bug reference: 4680 Logged by: Denis Afonin Email address: vadm@itkm.ru PostgreSQL version: 8.3.6 Operating system: Linux Debian Lenny Description: Server crashed if using wrong (mismatch) conversion functions Details: I do: =cut= postgres@sunset:~$ createdb test -E KOI8 postgres@sunset:~$ psql test Welcome to psql 8.3.6, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit test=# SHOW server_version; server_version ---------------- 8.3.6 (1 row) test=# CREATE DEFAULT CONVERSION test1 FOR 'LATIN1' TO 'KOI8' FROM ascii_to_mic; CREATE CONVERSION test=# CREATE DEFAULT CONVERSION test2 FOR 'KOI8' TO 'LATIN1' FROM mic_to_ascii; CREATE CONVERSION test=# set client_encoding to 'LATIN1'; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. Соединение Ñ ÑеÑвеÑом бÑло поÑеÑÑно. ÐопÑÑка пеÑеÑÑÑановиÑÑ: ÐезÑÑпеÑно. !> \q =end cut= In Logs: =cut= 2009-02-27 10:29:40 UTC LOG: database system was shut down at 2009-02-27 10:29:38 UTC 2009-02-27 10:29:40 UTC LOG: autovacuum launcher started 2009-02-27 10:29:40 UTC LOG: database system is ready to accept connections 2009-02-27 10:29:50 UTC ERROR: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC STATEMENT: set client_encoding to 'LATIN1'; 2009-02-27 10:29:50 UTC ERROR: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC STATEMENT: set client_encoding to 'LATIN1'; 2009-02-27 10:29:50 UTC ERROR: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC ERROR: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC ERROR: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC PANIC: ERRORDATA_STACK_SIZE exceeded 2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC PANIC: ERRORDATA_STACK_SIZE exceeded <<<<many-many-many-of-above-lines-skipped>>>>> 2009-02-27 10:29:50 UTC PANIC: ERRORDATA_STACK_SIZE exceeded 2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC PANIC: ERRORDATA_STACK_SIZE exceeded 2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL", but got "KOI8" 2009-02-27 10:29:50 UTC LOG: server process (PID 4958) was terminated by signal 11: Segmentation fault 2009-02-27 10:29:50 UTC LOG: terminating any other active server processes 2009-02-27 10:29:50 UTC LOG: all server processes terminated; reinitializing 2009-02-27 10:29:50 UTC LOG: database system was interrupted; last known up at 2009-02-27 10:29:40 UTC 2009-02-27 10:29:50 UTC LOG: database system was not properly shut down; automatic recovery in progress =end cut=
Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
From
Heikki Linnakangas
Date:
Denis Afonin wrote: > test=# CREATE DEFAULT CONVERSION test1 FOR 'LATIN1' TO 'KOI8' FROM > ascii_to_mic; > CREATE CONVERSION > test=# CREATE DEFAULT CONVERSION test2 FOR 'KOI8' TO 'LATIN1' FROM > mic_to_ascii; > CREATE CONVERSION > test=# set client_encoding to 'LATIN1'; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > Соединение с сервером было потеряно. > Попытка переустановить: Безуспешно. Hmm, this seems to be another variant of the recursive error issue fixed earlier: http://archives.postgresql.org/message-id/20081027193722.283B67545A4@cvs.postgresql.org Only this time the problem the error doesn't occur in translation, but in encoding conversion. We could doo something similar to what we did before with the translation, and try not to call conversion function in case of a recursive error. However, sending an error message to the client in wrong encoding is not as sane as sending it untranslated. I think we should instead try to break the PANIC cycle. If we exceed ERRORDATA_STACK_SIZE, and we're already PANICing, we should just die immediately instead of throwing another PANIC about exceeding the stack size. The attached patch does that. However, a more serious issue is that a regular user can do that and bring down the whole system. I suggest that we make "CREATE [DEFAULT] CONVERSION" to call the conversion function with a empty string, to check that it is in fact capable of doing the conversion. See 2nd attached patch. This is a usability improvement too, as you -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index a33c94e..24230e5 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -149,6 +149,39 @@ static void write_csvlog(ErrorData *edata); static void setup_formatted_log_time(void); static void setup_formatted_start_time(void); +/* + * Increment errordata stack depth. Return pointer to the new topmost + * entry. The caller should fill it in. + */ +static ErrorData * +push_errordata(void) +{ + if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE) + { + /* + * Wups, stack not big enough. We treat this as a PANIC condition + * because it suggests an infinite loop of errors during error + * recovery. If we had PANIC'd already, just die now, because a new + * ereport(PANIC) might fail again and cause infinite recursion. + */ + if (errordata[ERRORDATA_STACK_SIZE - 1].elevel >= PANIC) + { + ImmediateInterruptOK = false; + fflush(stdout); + fflush(stderr); + abort(); + } + errordata_stack_depth = -1; /* make room on stack */ + + /* + * Note that the message is intentionally not localized, else + * failure to convert it to client encoding could cause further + * recursion. + */ + ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE exceeded"))); + } + return &errordata[errordata_stack_depth]; +} /* * in_error_recursion_trouble --- are we at risk of infinite error recursion? @@ -287,19 +320,9 @@ errstart(int elevel, const char *filename, int lineno, debug_query_string = NULL; } } - if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE) - { - /* - * Wups, stack not big enough. We treat this as a PANIC condition - * because it suggests an infinite loop of errors during error - * recovery. - */ - errordata_stack_depth = -1; /* make room on stack */ - ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE exceeded"))); - } /* Initialize data for this error frame */ - edata = &errordata[errordata_stack_depth]; + edata = push_errordata(); MemSet(edata, 0, sizeof(ErrorData)); edata->elevel = elevel; edata->output_to_server = output_to_server; @@ -971,20 +994,7 @@ elog_start(const char *filename, int lineno, const char *funcname) { ErrorData *edata; - if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE) - { - /* - * Wups, stack not big enough. We treat this as a PANIC condition - * because it suggests an infinite loop of errors during error - * recovery. Note that the message is intentionally not localized, - * else failure to convert it to client encoding could cause further - * recursion. - */ - errordata_stack_depth = -1; /* make room on stack */ - ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE exceeded"))); - } - - edata = &errordata[errordata_stack_depth]; + edata = push_errordata(); edata->filename = filename; edata->lineno = lineno; edata->funcname = funcname; @@ -1165,18 +1175,7 @@ ReThrowError(ErrorData *edata) recursion_depth++; MemoryContextSwitchTo(ErrorContext); - if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE) - { - /* - * Wups, stack not big enough. We treat this as a PANIC condition - * because it suggests an infinite loop of errors during error - * recovery. - */ - errordata_stack_depth = -1; /* make room on stack */ - ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE exceeded"))); - } - - newedata = &errordata[errordata_stack_depth]; + newedata = push_errordata(); memcpy(newedata, edata, sizeof(ErrorData)); /* Make copies of separately-allocated fields */ diff --git a/src/backend/commands/conversioncmds.c b/src/backend/commands/conversioncmds.c index f067da2..ab744e4 100644 --- a/src/backend/commands/conversioncmds.c +++ b/src/backend/commands/conversioncmds.c @@ -49,6 +49,7 @@ CreateConversionCommand(CreateConversionStmt *stmt) const char *to_encoding_name = stmt->to_encoding_name; List *func_name = stmt->func_name; static Oid funcargs[] = {INT4OID, INT4OID, CSTRINGOID, INTERNALOID, INT4OID}; + char result[1]; /* Convert list of names to a name and namespace */ namespaceId = QualifiedNameGetCreationNamespace(stmt->conversion_name, @@ -96,6 +97,19 @@ CreateConversionCommand(CreateConversionStmt *stmt) NameListToString(func_name)); /* + * Check that the conversion function is suitable for the requested + * source and target encodings. We do that by calling the function with + * an empty string; the conversion function should throw an error if it + * can't perform the requested conversion + */ + OidFunctionCall5(funcoid, + Int32GetDatum(from_encoding), + Int32GetDatum(to_encoding), + CStringGetDatum(""), + CStringGetDatum(result), + Int32GetDatum(0)); + + /* * All seem ok, go ahead (possible failure would be a duplicate conversion * name) */
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > I think we should instead try to break the PANIC cycle. If we exceed > ERRORDATA_STACK_SIZE, and we're already PANICing, we should just die > immediately instead of throwing another PANIC about exceeding the stack > size. The attached patch does that. I don't think that's an improvement. I'm not sure exactly why the previous fix for this type of problem failed to cover this case --- did you identify why? > However, a more serious issue is that a regular user can do that and > bring down the whole system. I suggest that we make "CREATE [DEFAULT] > CONVERSION" to call the conversion function with a empty string, to > check that it is in fact capable of doing the conversion. That part seems like a good idea, now that the conversion functions throw errors (rather than Asserts) for wrong calls. regards, tom lane
Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
From
Heikki Linnakangas
Date:
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> I think we should instead try to break the PANIC cycle. If we exceed >> ERRORDATA_STACK_SIZE, and we're already PANICing, we should just die >> immediately instead of throwing another PANIC about exceeding the stack >> size. The attached patch does that. > > I don't think that's an improvement. > > I'm not sure exactly why the previous fix for this type of problem > failed to cover this case --- did you identify why? When the conversion function throws an ERROR, we try to send the error message to the client. As part of that, we try to convert the text "ERROR" to the client encoding (pq_sendstring does that). That calls the conversion function again, which errors again, lathe, rinse, repeat. until ERRORDATA_STACK_SIZE is reached. At that point, the same happens with the text "PANIC", until ERRORDATA_STACK_SIZE is reached again, and then we get into an endless recursion. When the conversion function doesn't work, any attempt to send any text to the client will fail. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > When the conversion function doesn't work, any attempt to send any text > to the client will fail. Ah, now I remember: we arranged to short-circuit translation of the error message when we were in this situation. But it will still get passed through the encoding converter (rather uselessly, if it's all ASCII). I wonder how ugly it would be to try to suppress encoding conversion as well? regards, tom lane
Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
From
Heikki Linnakangas
Date:
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> When the conversion function doesn't work, any attempt to send any text >> to the client will fail. > > Ah, now I remember: we arranged to short-circuit translation of the > error message when we were in this situation. But it will still get > passed through the encoding converter (rather uselessly, if it's all > ASCII). I wonder how ugly it would be to try to suppress encoding > conversion as well? The error message might contain user data, which might contain non-ASCII characters. Or maybe not, but it seems like a shaky assumption to make. Sending invalidly encoded data to the client is a bad idea; who knows how the client will react. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Tom Lane wrote: >> I wonder how ugly it would be to try to suppress encoding >> conversion as well? > The error message might contain user data, which might contain non-ASCII > characters. Or maybe not, but it seems like a shaky assumption to make. The particular cases we are concerned about here will not. In any case it's a better result than dying without sending any message at all, which would be the result of the patch you propose. regards, tom lane
Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
From
Heikki Linnakangas
Date:
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> Tom Lane wrote: >>> I wonder how ugly it would be to try to suppress encoding >>> conversion as well? > >> The error message might contain user data, which might contain non-ASCII >> characters. Or maybe not, but it seems like a shaky assumption to make. > > The particular cases we are concerned about here will not. In any case > it's a better result than dying without sending any message at all, > which would be the result of the patch you propose. Well, you'd still have the message in the log, which is more important. We could bypass the normal encoding conversion and sanitize the message from any non-ASCII characters before sending it. Or just send a hard-coded "we're in big trouble" message that we know to not contain any non-ASCII characters. But seems like a "can't happen" scenario like this doesn't deserve that much extra effort. And it would only handle conversion errors; if there's any other similar scenario involving, um, something else, we'll have the same problem again. We didn't envision this encoding issue when we fixed the translation case, mind you. I think it would be good to have a circuit-breaker to break the infinite recursion in case PANIC fails and recurses, for any reason. (I committed the other patch, BTW, to test that the conversion function specified in CREATE CONVERSION works) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > I think it would be good to have a circuit-breaker to break the infinite > recursion in case PANIC fails and recurses, for any reason. Well, your proposed patch replaces core dump due to stack overflow with core dump due to abort(), which is no improvement at all as far as avoiding a DOS situation goes. The only way we could really improve matters on that scale would be if we were willing to consider this a non-PANIC situation, which is a bit scary. Though I suppose that if the error originally being reported weren't a PANIC, there is no reason we shouldn't try to convert the scenario to a plain FATAL exit. In any case, that's orthogonal to the part that I was focusing on, which was to try to prevent error recursion as a result of trouble in the encoding conversion subsystem. It looks like we could do that with some additional hacking in send_message_to_frontend() to avoid conversion, as well as translation, when in_error_recursion_trouble() is true. Your point about there possibly being non-ASCII user-inserted data in the message is a bit troubling, but for the cases where recursion is actually occurring I don't think that that will happen. regards, tom lane