Thread: BUG #4680: Server crashed if using wrong (mismatch) conversion functions

BUG #4680: Server crashed if using wrong (mismatch) conversion functions

From

"Denis Afonin"

Date:

27 February 2009, 06:55:24

The following bug has been logged online:

Bug reference:      4680
Logged by:          Denis Afonin
Email address:      vadm@itkm.ru
PostgreSQL version: 8.3.6
Operating system:   Linux Debian Lenny
Description:        Server crashed if using wrong (mismatch) conversion
functions
Details:

I do:

=cut=
postgres@sunset:~$ createdb test -E KOI8
postgres@sunset:~$ psql test
Welcome to psql 8.3.6, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

test=# SHOW server_version;
 server_version
----------------
 8.3.6
(1 row)

test=# CREATE DEFAULT CONVERSION test1 FOR 'LATIN1' TO 'KOI8' FROM
ascii_to_mic;
CREATE CONVERSION
test=# CREATE DEFAULT CONVERSION test2 FOR 'KOI8' TO 'LATIN1' FROM
mic_to_ascii;
CREATE CONVERSION
test=# set client_encoding to 'LATIN1';
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
Ð¡Ð¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ðµ Ñ ÑÐµÑÐ²ÐµÑÐ¾Ð¼ Ð±ÑÐ»Ð¾ Ð¿Ð¾ÑÐµÑÑÐ½Ð¾.
ÐÐ¾Ð¿ÑÑÐºÐ° Ð¿ÐµÑÐµÑÑÑÐ°Ð½Ð¾Ð²Ð¸ÑÑ: ÐÐµÐ·ÑÑÐ¿ÐµÑÐ½Ð¾.
!> \q
=end cut=


In Logs:
=cut=
2009-02-27 10:29:40 UTC LOG:  database system was shut down at 2009-02-27
10:29:38 UTC
2009-02-27 10:29:40 UTC LOG:  autovacuum launcher started
2009-02-27 10:29:40 UTC LOG:  database system is ready to accept
connections
2009-02-27 10:29:50 UTC ERROR:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC STATEMENT:  set client_encoding to 'LATIN1';
2009-02-27 10:29:50 UTC ERROR:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC STATEMENT:  set client_encoding to 'LATIN1';
2009-02-27 10:29:50 UTC ERROR:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC ERROR:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC ERROR:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC:  ERRORDATA_STACK_SIZE exceeded
2009-02-27 10:29:50 UTC PANIC:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC:  ERRORDATA_STACK_SIZE exceeded
<<<<many-many-many-of-above-lines-skipped>>>>>
2009-02-27 10:29:50 UTC PANIC:  ERRORDATA_STACK_SIZE exceeded
2009-02-27 10:29:50 UTC PANIC:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC:  ERRORDATA_STACK_SIZE exceeded
2009-02-27 10:29:50 UTC PANIC:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC:  expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC LOG:  server process (PID 4958) was terminated by
signal 11: Segmentation fault
2009-02-27 10:29:50 UTC LOG:  terminating any other active server processes
2009-02-27 10:29:50 UTC LOG:  all server processes terminated;
reinitializing
2009-02-27 10:29:50 UTC LOG:  database system was interrupted; last known up
at 2009-02-27 10:29:40 UTC
2009-02-27 10:29:50 UTC LOG:  database system was not properly shut down;
automatic recovery in progress
=end cut=

Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions

From

Heikki Linnakangas

Date:

27 February 2009, 08:14:47

Denis Afonin wrote:
> test=# CREATE DEFAULT CONVERSION test1 FOR 'LATIN1' TO 'KOI8' FROM
> ascii_to_mic;
> CREATE CONVERSION
> test=# CREATE DEFAULT CONVERSION test2 FOR 'KOI8' TO 'LATIN1' FROM
> mic_to_ascii;
> CREATE CONVERSION
> test=# set client_encoding to 'LATIN1';
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> Соединение с сервером было потеряно.
> Попытка переустановить: Безуспешно.

Hmm, this seems to be another variant of the recursive error issue fixed
earlier:
http://archives.postgresql.org/message-id/20081027193722.283B67545A4@cvs.postgresql.org

Only this time the problem the error doesn't occur in translation, but
in encoding conversion. We could doo something similar to what we did
before with the translation, and try not to call conversion function in
case of a recursive error. However, sending an error message to the
client in wrong encoding is not as sane as sending it untranslated.

I think we should instead try to break the PANIC cycle. If we exceed
ERRORDATA_STACK_SIZE, and we're already PANICing, we should just die
immediately instead of throwing another PANIC about exceeding the stack
size. The attached patch does that.

However, a more serious issue is that a regular user can do that and
bring down the whole system. I suggest that we make "CREATE [DEFAULT]
CONVERSION" to call the conversion function with a empty string, to
check that it is in fact capable of doing the conversion. See 2nd
attached patch. This is a usability improvement too, as you

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index a33c94e..24230e5 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -149,6 +149,39 @@ static void write_csvlog(ErrorData *edata);
 static void setup_formatted_log_time(void);
 static void setup_formatted_start_time(void);

+/*
+ * Increment errordata stack depth. Return pointer to the new topmost
+ * entry. The caller should fill it in.
+ */
+static ErrorData *
+push_errordata(void)
+{
+    if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE)
+    {
+        /*
+         * Wups, stack not big enough.    We treat this as a PANIC condition
+         * because it suggests an infinite loop of errors during error
+         * recovery. If we had PANIC'd already, just die now, because a new
+         * ereport(PANIC) might fail again and cause infinite recursion.
+         */
+        if (errordata[ERRORDATA_STACK_SIZE - 1].elevel >= PANIC)
+        {
+            ImmediateInterruptOK = false;
+            fflush(stdout);
+            fflush(stderr);
+            abort();
+        }
+        errordata_stack_depth = -1;        /* make room on stack */
+
+        /*
+         * Note that the message is intentionally not localized, else
+         * failure to convert it to client encoding could cause further
+         * recursion.
+         */
+        ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE exceeded")));
+    }
+    return &errordata[errordata_stack_depth];
+}

 /*
  * in_error_recursion_trouble --- are we at risk of infinite error recursion?
@@ -287,19 +320,9 @@ errstart(int elevel, const char *filename, int lineno,
             debug_query_string = NULL;
         }
     }
-    if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE)
-    {
-        /*
-         * Wups, stack not big enough.    We treat this as a PANIC condition
-         * because it suggests an infinite loop of errors during error
-         * recovery.
-         */
-        errordata_stack_depth = -1;        /* make room on stack */
-        ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE exceeded")));
-    }

     /* Initialize data for this error frame */
-    edata = &errordata[errordata_stack_depth];
+    edata = push_errordata();
     MemSet(edata, 0, sizeof(ErrorData));
     edata->elevel = elevel;
     edata->output_to_server = output_to_server;
@@ -971,20 +994,7 @@ elog_start(const char *filename, int lineno, const char *funcname)
 {
     ErrorData  *edata;

-    if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE)
-    {
-        /*
-         * Wups, stack not big enough.    We treat this as a PANIC condition
-         * because it suggests an infinite loop of errors during error
-         * recovery.  Note that the message is intentionally not localized,
-         * else failure to convert it to client encoding could cause further
-         * recursion.
-         */
-        errordata_stack_depth = -1;        /* make room on stack */
-        ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE exceeded")));
-    }
-
-    edata = &errordata[errordata_stack_depth];
+    edata = push_errordata();
     edata->filename = filename;
     edata->lineno = lineno;
     edata->funcname = funcname;
@@ -1165,18 +1175,7 @@ ReThrowError(ErrorData *edata)
     recursion_depth++;
     MemoryContextSwitchTo(ErrorContext);

-    if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE)
-    {
-        /*
-         * Wups, stack not big enough.    We treat this as a PANIC condition
-         * because it suggests an infinite loop of errors during error
-         * recovery.
-         */
-        errordata_stack_depth = -1;        /* make room on stack */
-        ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE exceeded")));
-    }
-
-    newedata = &errordata[errordata_stack_depth];
+    newedata = push_errordata();
     memcpy(newedata, edata, sizeof(ErrorData));

     /* Make copies of separately-allocated fields */
diff --git a/src/backend/commands/conversioncmds.c b/src/backend/commands/conversioncmds.c
index f067da2..ab744e4 100644
--- a/src/backend/commands/conversioncmds.c
+++ b/src/backend/commands/conversioncmds.c
@@ -49,6 +49,7 @@ CreateConversionCommand(CreateConversionStmt *stmt)
     const char *to_encoding_name = stmt->to_encoding_name;
     List       *func_name = stmt->func_name;
     static Oid    funcargs[] = {INT4OID, INT4OID, CSTRINGOID, INTERNALOID, INT4OID};
+    char result[1];

     /* Convert list of names to a name and namespace */
     namespaceId = QualifiedNameGetCreationNamespace(stmt->conversion_name,
@@ -96,6 +97,19 @@ CreateConversionCommand(CreateConversionStmt *stmt)
                        NameListToString(func_name));

     /*
+     * Check that the conversion function is suitable for the requested
+     * source and target encodings. We do that by calling the function with
+     * an empty string; the conversion function should throw an error if it
+     * can't perform the requested conversion
+     */
+    OidFunctionCall5(funcoid,
+                     Int32GetDatum(from_encoding),
+                     Int32GetDatum(to_encoding),
+                     CStringGetDatum(""),
+                     CStringGetDatum(result),
+                     Int32GetDatum(0));
+
+    /*
      * All seem ok, go ahead (possible failure would be a duplicate conversion
      * name)
      */

Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions

From

Tom Lane

Date:

27 February 2009, 11:31:48

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I think we should instead try to break the PANIC cycle. If we exceed
> ERRORDATA_STACK_SIZE, and we're already PANICing, we should just die
> immediately instead of throwing another PANIC about exceeding the stack
> size. The attached patch does that.

I don't think that's an improvement.

I'm not sure exactly why the previous fix for this type of problem
failed to cover this case --- did you identify why?

> However, a more serious issue is that a regular user can do that and
> bring down the whole system. I suggest that we make "CREATE [DEFAULT]
> CONVERSION" to call the conversion function with a empty string, to
> check that it is in fact capable of doing the conversion.

That part seems like a good idea, now that the conversion functions
throw errors (rather than Asserts) for wrong calls.

            regards, tom lane

Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions

From

Heikki Linnakangas

Date:

27 February 2009, 11:51:16

Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> I think we should instead try to break the PANIC cycle. If we exceed
>> ERRORDATA_STACK_SIZE, and we're already PANICing, we should just die
>> immediately instead of throwing another PANIC about exceeding the stack
>> size. The attached patch does that.
>
> I don't think that's an improvement.
>
> I'm not sure exactly why the previous fix for this type of problem
> failed to cover this case --- did you identify why?

When the conversion function throws an ERROR, we try to send the error
message to the client. As part of that, we try to convert the text
"ERROR" to the client encoding (pq_sendstring does that). That calls the
conversion function again, which errors again, lathe, rinse, repeat.
until ERRORDATA_STACK_SIZE is reached. At that point, the same happens
with the text "PANIC", until ERRORDATA_STACK_SIZE is reached again, and
then we get into an endless recursion.

When the conversion function doesn't work, any attempt to send any text
to the client will fail.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions

From

Tom Lane

Date:

27 February 2009, 11:56:51

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> When the conversion function doesn't work, any attempt to send any text
> to the client will fail.

Ah, now I remember: we arranged to short-circuit translation of the
error message when we were in this situation.  But it will still get
passed through the encoding converter (rather uselessly, if it's all
ASCII).  I wonder how ugly it would be to try to suppress encoding
conversion as well?

            regards, tom lane

Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions

From

Heikki Linnakangas

Date:

27 February 2009, 12:18:41

Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> When the conversion function doesn't work, any attempt to send any text
>> to the client will fail.
>
> Ah, now I remember: we arranged to short-circuit translation of the
> error message when we were in this situation.  But it will still get
> passed through the encoding converter (rather uselessly, if it's all
> ASCII).  I wonder how ugly it would be to try to suppress encoding
> conversion as well?

The error message might contain user data, which might contain non-ASCII
characters. Or maybe not, but it seems like a shaky assumption to make.
Sending invalidly encoded data to the client is a bad idea; who knows
how the client will react.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions

From

Tom Lane

Date:

27 February 2009, 12:37:42

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> I wonder how ugly it would be to try to suppress encoding
>> conversion as well?

> The error message might contain user data, which might contain non-ASCII
> characters. Or maybe not, but it seems like a shaky assumption to make.

The particular cases we are concerned about here will not.  In any case
it's a better result than dying without sending any message at all,
which would be the result of the patch you propose.

            regards, tom lane

Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions

From

Heikki Linnakangas

Date:

27 February 2009, 12:57:36

Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> I wonder how ugly it would be to try to suppress encoding
>>> conversion as well?
>
>> The error message might contain user data, which might contain non-ASCII
>> characters. Or maybe not, but it seems like a shaky assumption to make.
>
> The particular cases we are concerned about here will not.  In any case
> it's a better result than dying without sending any message at all,
> which would be the result of the patch you propose.

Well, you'd still have the message in the log, which is more important.

We could bypass the normal encoding conversion and sanitize the message
from any non-ASCII characters before sending it. Or just send a
hard-coded "we're in big trouble" message that we know to not contain
any non-ASCII characters. But seems like a "can't happen" scenario like
this doesn't deserve that much extra effort. And it would only handle
conversion errors; if there's any other similar scenario involving, um,
something else, we'll have the same problem again. We didn't envision
this encoding issue when we fixed the translation case, mind you.

I think it would be good to have a circuit-breaker to break the infinite
recursion in case PANIC fails and recurses, for any reason.

(I committed the other patch, BTW, to test that the conversion function
specified in CREATE CONVERSION works)

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions

From

Tom Lane

Date:

27 February 2009, 15:43:36

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I think it would be good to have a circuit-breaker to break the infinite
> recursion in case PANIC fails and recurses, for any reason.

Well, your proposed patch replaces core dump due to stack overflow with
core dump due to abort(), which is no improvement at all as far as
avoiding a DOS situation goes.  The only way we could really improve
matters on that scale would be if we were willing to consider this a
non-PANIC situation, which is a bit scary.  Though I suppose that if the
error originally being reported weren't a PANIC, there is no reason
we shouldn't try to convert the scenario to a plain FATAL exit.

In any case, that's orthogonal to the part that I was focusing on,
which was to try to prevent error recursion as a result of trouble
in the encoding conversion subsystem.  It looks like we could do that
with some additional hacking in send_message_to_frontend() to avoid
conversion, as well as translation, when in_error_recursion_trouble()
is true.  Your point about there possibly being non-ASCII user-inserted
data in the message is a bit troubling, but for the cases where
recursion is actually occurring I don't think that that will happen.

            regards, tom lane