Thread: trivial DoS on char recoding

trivial DoS on char recoding

From
Alvaro Herrera
Date:
Oswaldo Hernandez just reported this in the pgsql-es-ayuda list.
Basically, a conversion between UTF8 and windows_1250 can crash the
server.

I recall a bug around this general code but I don't recall it being able
to provoke a PANIC.

To reproduce, create a cluster with UTF-8 encoding and locale es_ES (I'm
actually using es_CL but it should be the same).  Note that the es_ES
locale is declared to use Latin1 encoding, not UTF-8.  In a psql
session,

template1=# copy foo from '/tmp/foo' ;
ERROR:  no existe la relación «foo»
template1=# \encoding latin1
template1=# copy foo from '/tmp/foo' ;
ERROR:  could not convert UTF8 character 0x00f3 to ISO8859-1
template1=# \encoding windows_1250
template1=# copy foo from '/tmp/foo' ;
PANIC:  ERRORDATA_STACK_SIZE exceeded

Neither table "foo" nor the /tmp/foo file needs to exist.

In the server logs, I set "log_line_prefix" to %x (Xid) to make it
obvious that these reports come from processing the same message.  When the
PANIC occurs, the server logs this:

574 ERROR:  no existe la relación «foo»
574 WARNING:  ignorando el carácter UTF-8 no convertible 0xf36e20ab
574 WARNING:  ignorando el carácter UTF-8 no convertible 0xe16374
574 WARNING:  ignorando el carácter UTF-8 no convertible 0xe16374
574 WARNING:  ignorando el carácter UTF-8 no convertible 0xe16374
574 PANIC:  ERRORDATA_STACK_SIZE exceeded
574 SENTENCIA:  copy foo from '/tmp/datoscopy' ;


To reproduce, you using a non-C locale is (es_ES works for me).  If I
start the postmaster with -c lc_messages=C, the problem does not occur.
Note that the PO file for the Spanish translation is written in Latin1,
not UTF8.  So I would venture that the server is trying to recode a
string which is originally in Latin1, but assuming it is UTF-8, to
Win1250.
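
A quick check of the bytes seems to support this.  The sketch below
(Python, not part of the original report; the helper names are made up,
and the length rule is the standard UTF-8 lead-byte rule, not a copy of
the server's code) encodes the two Spanish messages in Latin1 and shows
which bytes a UTF-8 reader would take as one character at the first
non-ASCII byte:

```python
def utf8_seq_len(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte (1 if ASCII or not a lead byte)."""
    if (lead & 0x80) == 0x00:
        return 1
    if (lead & 0xE0) == 0xC0:
        return 2
    if (lead & 0xF0) == 0xE0:
        return 3
    if (lead & 0xF8) == 0xF0:
        return 4
    return 1


def first_bad_sequence(raw: bytes) -> str:
    """Hex of the bytes a UTF-8 reader would grab at the first non-ASCII byte."""
    for i, b in enumerate(raw):
        if b >= 0x80:
            return raw[i:i + utf8_seq_len(b)].hex()
    return ""


# Assumption: the message catalog hands these strings over as Latin1 bytes.
error_msg = "no existe la relación «foo»".encode("latin-1")
warning_msg = "ignorando el carácter UTF-8 no convertible".encode("latin-1")

print(first_bad_sequence(error_msg))    # f36e20ab
print(first_bad_sequence(warning_msg))  # e16374
```

Both values match the byte strings in the WARNING lines above.  If that
reading is right, the WARNING text itself ("carácter") contains a byte
that cannot be converted, so each attempt to report the failure would
trigger another one, which would explain the repeated WARNINGs before
the PANIC.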

Now, it can be argued that this is really operator error -- because I
can't crash the server if I correctly initdb with es_CL.UTF8.  Should we
get firmer in rejecting invalid configurations?

I'm not sure to what extent this affects other translations, collations,
or encodings -- right now I only have "es" (Spanish) compiled and my system
is not configured to accept anything else.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: trivial DoS on char recoding

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> To reproduce, you using a non-C locale is (es_ES works for me).

*blush*  Sorry, I rewrote this phrase and obviously didn't reread it
very carefully :-)  It means that you must use a non-C locale.




Re: trivial DoS on char recoding

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Note that the PO file for the Spanish translation is written in Latin1,
> not UTF8.  So I would venture that the server is trying to recode a
> string which is originally in Latin1, but assuming it is UTF-8, to
> Win1250.

Yeah, this is a known problem --- basically it seems a shortcoming of
the gettext() API.  You can find details in the archives.

> Should we get firmer in rejecting invalid configurations?

The question is how sure are we whether a configuration is "invalid".
AFAIK there's not a really portable way to determine which encoding
matches a locale.  initdb has a kluge that seems to work most of the
time, but do we want the database to refuse to start when it doesn't?
        regards, tom lane


Re: trivial DoS on char recoding

From
Martijn van Oosterhout
Date:
On Tue, Jun 20, 2006 at 06:10:38PM -0400, Tom Lane wrote:
> > Should we get firmer in rejecting invalid configurations?
>
> The question is how sure are we whether a configuration is "invalid".
> AFAIK there's not a really portable way to determine which encoding
> matches a locale.  initdb has a kluge that seems to work most of the
> time, but do we want the database to refuse to start when it doesn't?

Well, this "kludge" is the recommended and documented way to do it on
glibc-based systems as well as many others.
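
For reference, a minimal sketch of that kind of query, assuming the
method meant here is nl_langinfo(CODESET) after setlocale() (the thread
does not name the call, so that is my assumption).  It is written in
Python for brevity, and codeset_for_locale is a hypothetical helper
name:

```python
import locale


def codeset_for_locale(name: str) -> str:
    """Return the codeset the C library reports for LC_CTYPE=name."""
    saved = locale.setlocale(locale.LC_CTYPE)      # remember the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)    # raises locale.Error if unknown
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)   # always restore


# "C" exists everywhere; the exact spelling of the result varies by OS.
print(codeset_for_locale("C"))
```

The returned spelling differs between platforms, which is the
portability concern raised above.  nl_langinfo is not available on
every platform (Windows, for one), so this only covers systems that
provide it.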

It turns out however that there is a libcharset[1] for portably
determining the charset for your current locale. What's most
interesting about it is that it has tables for various OSes and
mappings from their names to standard names (the ones used by Glibc).
It's LGPL so we can't include the stuff verbatim, but it's not a lot of
code.
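
To give an idea of how little code is involved, here is a rough sketch
of the name-mapping approach.  The alias table is a tiny illustrative
subset that I wrote by hand, not copied from libcharset, and the
function names are hypothetical:

```python
# Illustrative aliases only; a real table would be larger and per-OS.
ALIASES = {
    "ANSI_X3.4-1968": "ASCII",
    "646": "ASCII",
    "ISO8859-1": "ISO-8859-1",
    "ISO_8859-1": "ISO-8859-1",
    "LATIN1": "ISO-8859-1",
    "UTF8": "UTF-8",
}


def canonical_charset(name: str) -> str:
    """Map an OS- or database-specific charset spelling to one canonical name."""
    key = name.strip().upper()
    return ALIASES.get(key, key)


def locale_matches_encoding(locale_codeset: str, db_encoding: str) -> bool:
    """True if the locale's codeset and the database encoding agree."""
    return canonical_charset(locale_codeset) == canonical_charset(db_encoding)


# The case from this thread: a Latin1 locale with a UTF-8 cluster.
if not locale_matches_encoding("ISO8859-1", "UTF8"):
    print("WARNING: locale codeset does not match database encoding")
```

Whether a mismatch should produce a warning or a refusal to start is
the open question in this thread; the sketch only shows the comparison.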

I'm not sure why we persist in believing this test is so unreliable we
won't even emit a warning...

[1] http://www.haible.de/bruno/packages-libcharset.html

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.