Thread: ERRORDATA_STACK_SIZE exceeded (server crash)

ERRORDATA_STACK_SIZE exceeded (server crash)

From
"Ibrar Ahmed"
Date:
Hi,

I have encountered a server crash while working with different locale
settings. After searching on the internet I have seen a similar issue
with 8.3.1 release and Tom has fixed that issue. That bug was only in
Windows but I am getting same server crash on Linux, although I am
using a later release with the patch applied.

Here is the link of previous bug

http://archives.postgresql.org/pgsql-committers/2008-05/msg00349.php




OS = ubuntu
PG Version = psql (8.4devel)


postgres=# show server_encoding;server_encoding
-----------------UTF8
(1건 있음)

postgres=# show client_encoding;client_encoding
-----------------UTF8
(1건 있음)

postgres=# set client_encoding ='euc-jp';
SET
postgres=# x;
서버가 갑자기 연결을 닫았음       이런 처리는 클라이언트의 요구를 처리하는 동안이나       처리하기 전에 서버가 갑자기 종료되었음을 의미함
서버로부터 연결이 끊어졌습니다. 다시 연결을 시도합니다: 실패.
!>



--   Ibrar Ahmed  EnterpriseDB   http://www.enterprisedb.com

Re: ERRORDATA_STACK_SIZE exceeded (server crash)

From
Heikki Linnakangas
Date:
Ibrar Ahmed wrote:
> I have encountered a server crash while working with different locale
> settings. After searching on the internet I have seen a similar issue
> with 8.3.1 release and Tom has fixed that issue. That bug was only in
> Windows but I am getting same server crash on Linux, although I am
> using a later release with the patch applied.
> 
> Here is the link of previous bug
> 
> http://archives.postgresql.org/pgsql-committers/2008-05/msg00349.php
> 
> OS = ubuntu
> PG Version = psql (8.4devel)
> 
> 
> postgres=# show server_encoding;
>  server_encoding
> -----------------
>  UTF8
> (1건 있음)
> 
> postgres=# show client_encoding;
>  client_encoding
> -----------------
>  UTF8
> (1건 있음)
> 
> postgres=# set client_encoding ='euc-jp';
> SET
> postgres=# x;
> 서버가 갑자기 연결을 닫았음
>         이런 처리는 클라이언트의 요구를 처리하는 동안이나
>         처리하기 전에 서버가 갑자기 종료되었음을 의미함
> 서버로부터 연결이 끊어졌습니다. 다시 연결을 시도합니다: 실패.
> !>

Do you have  core dump, stack trace or any other details?

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: ERRORDATA_STACK_SIZE exceeded (server crash)

From
Tom Lane
Date:
"Ibrar Ahmed" <ibrar.ahmad@gmail.com> writes:
> I have encountered a server crash while working with different locale
> settings.

Are you going to give us a hint what settings those would be?
        regards, tom lane


Re: ERRORDATA_STACK_SIZE exceeded (server crash)

From
"Ibrar Ahmed"
Date:
Sure!

CODE
------
/configure --enable-nls --enable-depend --enable-debug
make
make install

SERVER SIDE
-----------------
1 - export LANG=ko_KR.UTF-8
2 - ./initdb -E UTF8 -D ../data
3 - ./postmaster -D ../data

CLIENT SIDE
---------------
1 - export LANG=ko_KR.UTF-8
2 - psql postgres

postgres=# show server_encoding;server_encoding
-----------------UTF8
(1건 있음)

postgres=# show client_encoding;client_encoding
-----------------UTF8
(1건 있음)



postgres=# set client_encoding ='euc-jp';    --[<<<--Negative test scenario]
SET

postgres=# x;


On Mon, Oct 27, 2008 at 6:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Ibrar Ahmed" <ibrar.ahmad@gmail.com> writes:
>> I have encountered a server crash while working with different locale
>> settings.
>
> Are you going to give us a hint what settings those would be?
>
>                        regards, tom lane
>



--   Ibrar Ahmed  EnterpriseDB   http://www.enterprisedb.com

Re: ERRORDATA_STACK_SIZE exceeded (server crash)

From
Tom Lane
Date:
"Ibrar Ahmed" <ibrar.ahmad@gmail.com> writes:
> 1 - export LANG=ko_KR.UTF-8

Hmph ... I can reproduce that on Fedora 9.  It seems the problem is
that that translation is full of characters that don't exist in EUC-JP;
in particular the translations of both "ERROR" and "PANIC" contain
untranslatable characters.  This means that every time we go to send
a message to the client, we get a recursive error trap.

The "ERRORDATA_STACK_SIZE exceeded" message is intentionally not exposed
to gettext translation, in hopes of stopping this problem, but that
doesn't help much when the PANIC message is exposed :-(.

So one thing we might try to do about it is to intentionally not allow
translation of PANIC (at line 2446 of elog.c).  However, that only gets
us down from a stack-overflow crash to a PANIC, which is just about as
bad from a reliability standpoint.

I think the only permanent solution to this class of problem is going
to be something like this:

* When we hit the stack depth overflow PANIC situation in elog.c,
disable gettext so that the error will always be reported in ASCII.

* Reduce the PANIC to a FATAL so that misconfiguration of this sort
just kills the one session and doesn't cause a database crash.

This is still not very nice because what the user would get is
a complaint about ERRORDATA_STACK_SIZE exceeded with no hint that
he's got an encoding problem.  It'd be better if we could get the
disable-gettext-and-FATAL-out behavior to apply to the "character
has no equivalent" error message, but I'm not sure how we do that
without bollixing up less-critical occurrences of that message.

Thoughts?
        regards, tom lane


Re: ERRORDATA_STACK_SIZE exceeded (server crash)

From
Tom Lane
Date:
I wrote:
> This is still not very nice because what the user would get is
> a complaint about ERRORDATA_STACK_SIZE exceeded with no hint that
> he's got an encoding problem.  It'd be better if we could get the
> disable-gettext-and-FATAL-out behavior to apply to the "character
> has no equivalent" error message, but I'm not sure how we do that
> without bollixing up less-critical occurrences of that message.

After poking around a bit I decided that this could be done in a not
horrendously ugly way if we are willing to make a couple more places
know about escaping from error recursion situations.  Attached is a
proposed patch that prevents the crash shown previously.  BTW, a better
stress test for this is to set LANG = tr_TR.utf8, client_encoding =
latin1, and then try "select E'\305\237';".  That's because the
"character has no equivalent" message isn't itself translated in the
present ko translation, but it is in the tr translation.  My first-cut
patch worked for the ko case and not the tr case :-(

One thing that is still a bit ugly about this patch is the hack in
wchar.c to ensure that the "character has no equivalent" message
doesn't get translated:

     * ...  Note that we have to
     * spell the message slightly differently, which we do by sticking a
     * space on the end --- using errmsg_internal() doesn't actually keep
     * elog.c from calling gettext, it only prevents the string from being
     * entered into the translation lists.

It might be better to modify elog.c so that errmsg_internal really
doesn't call gettext.  This would require kluging up EVALUATE_MESSAGE()
a bit, so I'm not quite sure which is cleaner.  Thoughts?

            regards, tom lane

Index: src/backend/utils/error/elog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/error/elog.c,v
retrieving revision 1.208
diff -c -r1.208 elog.c
*** src/backend/utils/error/elog.c    17 Oct 2008 22:56:16 -0000    1.208
--- src/backend/utils/error/elog.c    27 Oct 2008 16:16:28 -0000
***************
*** 149,154 ****
--- 149,169 ----
  static void setup_formatted_log_time(void);
  static void setup_formatted_start_time(void);

+
+ /*
+  * in_error_recursion_trouble --- are we at risk of infinite error recursion?
+  *
+  * This function exists to provide common control of various fallback steps
+  * that we take if we think we are facing infinite error recursion.  See the
+  * callers for details.
+  */
+ bool
+ in_error_recursion_trouble(void)
+ {
+     /* Pull the plug if recurse more than once */
+     return (recursion_depth > 2);
+ }
+
  /*
   * errstart --- begin an error-reporting cycle
   *
***************
*** 261,272 ****
          MemoryContextReset(ErrorContext);

          /*
!          * If we recurse more than once, the problem might be something broken
           * in a context traceback routine.    Abandon them too.  We also abandon
           * attempting to print the error statement (which, if long, could
           * itself be the source of the recursive failure).
           */
!         if (recursion_depth > 2)
          {
              error_context_stack = NULL;
              debug_query_string = NULL;
--- 276,287 ----
          MemoryContextReset(ErrorContext);

          /*
!          * Infinite error recursion might be due to something broken
           * in a context traceback routine.    Abandon them too.  We also abandon
           * attempting to print the error statement (which, if long, could
           * itself be the source of the recursive failure).
           */
!         if (in_error_recursion_trouble())
          {
              error_context_stack = NULL;
              debug_query_string = NULL;
***************
*** 2408,2413 ****
--- 2423,2432 ----

  /*
   * error_severity --- get localized string representing elevel
+  *
+  * Note: in an error recursion situation, we stop localizing the tags
+  * for ERROR and above.  This is necessary because the problem might be
+  * failure to convert one of these strings to the client encoding.
   */
  static const char *
  error_severity(int elevel)
***************
*** 2437,2449 ****
              prefix = _("WARNING");
              break;
          case ERROR:
!             prefix = _("ERROR");
              break;
          case FATAL:
!             prefix = _("FATAL");
              break;
          case PANIC:
!             prefix = _("PANIC");
              break;
          default:
              prefix = "???";
--- 2456,2477 ----
              prefix = _("WARNING");
              break;
          case ERROR:
!             if (in_error_recursion_trouble())
!                 prefix = "ERROR";
!             else
!                 prefix = _("ERROR");
              break;
          case FATAL:
!             if (in_error_recursion_trouble())
!                 prefix = "FATAL";
!             else
!                 prefix = _("FATAL");
              break;
          case PANIC:
!             if (in_error_recursion_trouble())
!                 prefix = "PANIC";
!             else
!                 prefix = _("PANIC");
              break;
          default:
              prefix = "???";
Index: src/backend/utils/mb/wchar.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/mb/wchar.c,v
retrieving revision 1.66
diff -c -r1.66 wchar.c
*** src/backend/utils/mb/wchar.c    15 Nov 2007 21:14:40 -0000    1.66
--- src/backend/utils/mb/wchar.c    27 Oct 2008 16:16:28 -0000
***************
*** 1567,1578 ****
      for (j = 0; j < jlimit; j++)
          p += sprintf(p, "%02x", (unsigned char) mbstr[j]);

!     ereport(ERROR,
!             (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
!       errmsg("character 0x%s of encoding \"%s\" has no equivalent in \"%s\"",
!              buf,
!              pg_enc2name_tbl[src_encoding].name,
!              pg_enc2name_tbl[dest_encoding].name)));
  }

  #endif
--- 1567,1595 ----
      for (j = 0; j < jlimit; j++)
          p += sprintf(p, "%02x", (unsigned char) mbstr[j]);

!     /*
!      * In an error recursion situation, don't try to translate the message.
!      * This gets us out of trouble if the problem is failure to convert
!      * the translated message to the client encoding.  Note that we have to
!      * spell the message slightly differently, which we do by sticking a
!      * space on the end --- using errmsg_internal() doesn't actually keep
!      * elog.c from calling gettext, it only prevents the string from being
!      * entered into the translation lists.
!      */
!     if (in_error_recursion_trouble())
!         ereport(ERROR,
!                 (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
!                  errmsg_internal("character 0x%s of encoding \"%s\" has no equivalent in \"%s\" ",
!                                  buf,
!                                  pg_enc2name_tbl[src_encoding].name,
!                                  pg_enc2name_tbl[dest_encoding].name)));
!     else
!         ereport(ERROR,
!                 (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
!                  errmsg("character 0x%s of encoding \"%s\" has no equivalent in \"%s\"",
!                         buf,
!                         pg_enc2name_tbl[src_encoding].name,
!                         pg_enc2name_tbl[dest_encoding].name)));
  }

  #endif
Index: src/include/utils/elog.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/utils/elog.h,v
retrieving revision 1.96
diff -c -r1.96 elog.h
*** src/include/utils/elog.h    9 Oct 2008 22:22:31 -0000    1.96
--- src/include/utils/elog.h    27 Oct 2008 16:16:28 -0000
***************
*** 324,329 ****
--- 324,330 ----
  /* Other exported functions */
  extern void DebugFileOpen(void);
  extern char *unpack_sql_state(int sql_state);
+ extern bool in_error_recursion_trouble(void);

  #ifdef HAVE_SYSLOG
  extern void set_syslog_parameters(const char *ident, int facility);

Re: ERRORDATA_STACK_SIZE exceeded (server crash)

From
Alvaro Herrera
Date:
Tom Lane escribió:

> It might be better to modify elog.c so that errmsg_internal really
> doesn't call gettext.  This would require kluging up EVALUATE_MESSAGE()
> a bit, so I'm not quite sure which is cleaner.  Thoughts?

I think we document somewhere that a translator can add errmsg_internal to the
list of gettext triggers in nls.mk, so I agree that we should making sure it
doesn't call gettext at all, just to be sure some overly zealous translator
gets past the barrier.

(Also, actually adding errmsg_internal is pretty pointless, because the
translated messages would be lost when the catalog is passed through our
new translation status web system).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: ERRORDATA_STACK_SIZE exceeded (server crash)

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane escribi�:
>> It might be better to modify elog.c so that errmsg_internal really
>> doesn't call gettext.  This would require kluging up EVALUATE_MESSAGE()
>> a bit, so I'm not quite sure which is cleaner.  Thoughts?

> I think we document somewhere that a translator can add errmsg_internal to the
> list of gettext triggers in nls.mk, so I agree that we should making sure it
> doesn't call gettext at all, just to be sure some overly zealous translator
> gets past the barrier.

Yeah, I agree --- will do it that way.
        regards, tom lane