Thread: PQclientEncoding() returns -1, resulting in possible assertion failure in psql

PQclientEncoding() returns -1, resulting in possible assertion failure in psql

From
Peter Geoghegan
Date:
It's possible to get psql to abort() in an assertion failure from
legitimate input. If the server is shutdown, and PQclientEncoding()
returns -1, that value can be passed to a site that expects a valid
encoding.

If the call to PQclientEncoding() from processSQLNamePattern() (plus
probably from other places) gets an encoding of -1, (called when we
"\df+ somefunc" after the server is gone - you may have to do it
twice), that can be passed down to PQmblen(), which will in turn pass
that value to pg_encoding_mblen(), which doesn't expect it. Here is a
backtrace from GDB:

#0  0x00007f0d4a3ba037 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f0d4a3bd698 in __GI_abort () at abort.c:90
#2  0x00007f0d4a3b2e03 in __assert_fail_base (fmt=0x7f0d4a50a158
"%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x7f0d4ac17de8 "((encoding) >= 0 &&
(encoding) < _PG_LAST_ENCODING_)", file=file@entry=0x7f0d4ac17de0
"wchar.c", line=line@entry=1781,
function=function@entry=0x7f0d4ac17e80 <__PRETTY_FUNCTION__.5143>
"pg_encoding_mblen") at assert.c:92
#3  0x00007f0d4a3b2eb2 in __GI___assert_fail (assertion=0x7f0d4ac17de8
"((encoding) >= 0 && (encoding) < _PG_LAST_ENCODING_)",
file=0x7f0d4ac17de0 "wchar.c", line=1781, function=0x7f0d4ac17e80
<__PRETTY_FUNCTION__.5143> "pg_encoding_mblen") at assert.c:101
#4  0x00007f0d4ac13b77 in pg_encoding_mblen (encoding=-1,
mbstr=0xd05d00 "pg_stat_statements") at wchar.c:1781
#5  0x00007f0d4ac01f48 in PQmblen (s=0xd05d00 "pg_stat_statements",
encoding=-1) at fe-misc.c:1143
#6  0x0000000000433ef3 in processSQLNamePattern (conn=0x0,
buf=0x7ffff15dd560, pattern=0xd05d00 "pg_stat_statements",
have_where=0 '\000', force_escape=0 '\000', schemavar=0x44cec1
"n.nspname", namevar=0x44cef9 "p.proname", altnamevar=0x0,
visibilityrule=0x44ced0 "pg_catalog.pg_function_is_visible(p.oid)") at
dumputils.c:1098
#7  0x000000000041e139 in describeFunctions (functypes=0xd036d2 "+",
pattern=0xd05d00 "pg_stat_statements", verbose=1 '\001', showSystem=0
'\000') at describe.c:441
#8  0x0000000000404830 in exec_command (cmd=0xd036d0 "df+",
scan_state=0xcf5550, query_buf=0xcf56c0) at command.c:389
#9  0x0000000000403f7a in HandleSlashCmds (scan_state=0xcf5550,
query_buf=0xcf56c0) at command.c:111
#10 0x000000000040ec66 in MainLoop (source=0x7f0d4a746360
<_IO_2_1_stdin_>) at mainloop.c:301
#11 0x0000000000414b86 in main (argc=1, argv=0x7ffff15ddaf8) at startup.c:329

It's not immediately obvious what the best fix here is - I have some
ideas, but they all seem pretty invasive.

I think it's a bug that processSQLNamePattern() trusts
PQclientEncoding() to return a sane encodingid. Numerous points in
dumputils similarly expect this. The assertion failure isn't spurious,
because the encodingid will subscript an array in pg_encoding_mblen(),
which is undefined here.

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> It's possible to get psql to abort() in an assertion failure from
> legitimate input. If the server is shutdown, and PQclientEncoding()
> returns -1, that value can be passed to a site that expects a valid
> encoding.

> If the call to PQclientEncoding() from processSQLNamePattern() (plus
> probably from other places) gets an encoding of -1, (called when we
> "\df+ somefunc" after the server is gone - you may have to do it
> twice), that can be passed down to PQmblen(), which will in turn pass
> that value to pg_encoding_mblen(), which doesn't expect it.

A simple fix might be to return SQL_ASCII not -1.  I don't think the
-1 behavior is documented or relied on anywhere, is it?

Otherwise, we could check for live connection somewhere much earlier
in the processing of \d and friends.  But it might be difficult to
find a single choke point for doing that, rendering the reliability
of the fix somewhat suspect.

            regards, tom lane
I wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> If the call to PQclientEncoding() from processSQLNamePattern() (plus
>> probably from other places) gets an encoding of -1, (called when we
>> "\df+ somefunc" after the server is gone - you may have to do it
>> twice), that can be passed down to PQmblen(), which will in turn pass
>> that value to pg_encoding_mblen(), which doesn't expect it.

> A simple fix might be to return SQL_ASCII not -1.  I don't think the
> -1 behavior is documented or relied on anywhere, is it?

I looked at this a bit more closely and it seems to be a legitimate
fix.  I would also suggest that we remove the connection status check;
there is no very good reason to forget what we knew about the encoding
the moment the connection drops.  So something like

 int
 PQclientEncoding(const PGconn *conn)
 {
-    if (!conn || conn->status != CONNECTION_OK)
-        return -1;
+    if (!conn)
+        return PG_SQL_ASCII;
     return conn->client_encoding;
 }


A compromise that would probably fix the problem for psql, if not
other places, would be to just remove the status check (ie, first
line of above diff but not second).  That might be the thing to
do if people are afraid to change the behavior this much in back
branches.  I would argue however that the documentation nowhere
suggests that PQclientEncoding can return a bogus encoding ID,
so this is more likely to be a bug fix than a new bug for other
programs as well.  Also, it looks to me like there are probably
other corner cases anyway where you can get PG_SQL_ASCII rather
than the actual encoding, because that's how we initialize
conn->client_encoding ...

            regards, tom lane

Re: PQclientEncoding() returns -1, resulting in possible assertion failure in psql

From
Peter Geoghegan
Date:
On Sun, Dec 8, 2013 at 10:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I looked at this a bit more closely and it seems to be a legitimate
> fix.  I would also suggest that we remove the connection status check;
> there is no very good reason to forget what we knew about the encoding
> the moment the connection drops.

+1 to the idea of teaching PQclientEncoding() to report the last
encodingid, or returning PG_SQL_ASCII when that isn't possible. I was
previously a little unsure about that, simply because that behavior
has existed for many years, but as you say nobody has any right to
rely on it. I'm inclined to agree that it will help more than it will
hurt everyone else.


--
Peter Geoghegan
> branches.  I would argue however that the documentation nowhere
> suggests that PQclientEncoding can return a bogus encoding ID,
> so this is more likely to be a bug fix than a new bug for other
> programs as well.  Also, it looks to me like there are probably

This sounds like a little bit unfair argument. The libpq documentation
is pretty sloppy for the error case for other PQ* as well. For
example, look at the PQdb document:

PQdb
    Returns the database name of the connection.
    char *PQdb(const PGconn *conn);

This says nothing about when the connection is bad. Reality is PQdb
returns NULL in the case. But are we allowed to change PQdb returns
say, "template1" when the connection is bad because the doc says
nothing about error case?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Mon, Dec  9, 2013 at 08:17:35AM +0900, Tatsuo Ishii wrote:
> > branches.  I would argue however that the documentation nowhere
> > suggests that PQclientEncoding can return a bogus encoding ID,
> > so this is more likely to be a bug fix than a new bug for other
> > programs as well.  Also, it looks to me like there are probably
>
> This sounds like a little bit unfair argument. The libpq documentation
> is pretty sloppy for the error case for other PQ* as well. For
> example, look at the PQdb document:
>
> PQdb
>     Returns the database name of the connection.
>     char *PQdb(const PGconn *conn);
>
> This says nothing about when the connection is bad. Reality is PQdb
> returns NULL in the case. But are we allowed to change PQdb returns
> say, "template1" when the connection is bad because the doc says
> nothing about error case?

So, what did we decide on this?  Should we document the -1 return, or
return SQL_ASCII.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
On Fri, Mar 21, 2014 at 09:53:25PM -0400, Bruce Momjian wrote:
> On Mon, Dec  9, 2013 at 08:17:35AM +0900, Tatsuo Ishii wrote:
> > > branches.  I would argue however that the documentation nowhere
> > > suggests that PQclientEncoding can return a bogus encoding ID,
> > > so this is more likely to be a bug fix than a new bug for other
> > > programs as well.  Also, it looks to me like there are probably
> >
> > This sounds like a little bit unfair argument. The libpq documentation
> > is pretty sloppy for the error case for other PQ* as well. For
> > example, look at the PQdb document:
> >
> > PQdb
> >     Returns the database name of the connection.
> >     char *PQdb(const PGconn *conn);
> >
> > This says nothing about when the connection is bad. Reality is PQdb
> > returns NULL in the case. But are we allowed to change PQdb returns
> > say, "template1" when the connection is bad because the doc says
> > nothing about error case?
>
> So, what did we decide on this?  Should we document the -1 return, or
> return SQL_ASCII.

OK, hearing nothing, I dug into this, and I think the solution is
simpler than we thought.  Basically, the Assert is checking for the
encoding value to be in a valid range, but the main code is also
checking for an invalid encoding and returning PG_SQL_ASCII:

    pg_encoding_mblen(int encoding, const char *mbstr)
    {
        Assert(PG_VALID_ENCODING(encoding));

        return ((encoding >= 0 &&
           encoding < sizeof(pg_wchar_table) / sizeof(pg_wchar_tbl)) ?
              ((*pg_wchar_table[encoding].mblen) ((const unsigned char *) mbstr)) :
          ((*pg_wchar_table[PG_SQL_ASCII].mblen) ((const unsigned char *) mbstr)));
    }

I think the Assert can be removed as it is checking for something that
the main code handles just fine.  I assume Asserts are only for checks we
don't want to make in the main code path.  Throwing an error for an
assert build and handling the value just fine in a non-assert build
makes no sense.

I have updated the documentation to mention the libpq's
PQclientEncoding() possible return value of -1, and removed the Asserts
from three functions that are already handling invalid encoding values
just fine.

This does fix the reported failure with Asserts enabled:

BEFORE:

    test=> \df lkjasdf
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.
    !> \df lkjasdf
    psql: wchar.c:1781: pg_encoding_mblen: Assertion `((encoding) >= 0 && (encoding) < _PG_LAST_ENCODING_)' failed.
    /usr/lbin/execargs: line 10: 25883 Aborted                 "$@"

AFTER:

    test=> \df lkjasdf
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.
    !> \df lkjasdf
    You are currently not connected to a database.
    !>

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment
On Sat, Mar 22, 2014 at 11:17:23AM -0400, Bruce Momjian wrote:
> OK, hearing nothing, I dug into this, and I think the solution is
> simpler than we thought.  Basically, the Assert is checking for the
> encoding value to be in a valid range, but the main code is also
> checking for an invalid encoding and returning PG_SQL_ASCII:
>
>     pg_encoding_mblen(int encoding, const char *mbstr)
>     {
>         Assert(PG_VALID_ENCODING(encoding));
>
>         return ((encoding >= 0 &&
>            encoding < sizeof(pg_wchar_table) / sizeof(pg_wchar_tbl)) ?
>               ((*pg_wchar_table[encoding].mblen) ((const unsigned char *) mbstr)) :
>           ((*pg_wchar_table[PG_SQL_ASCII].mblen) ((const unsigned char *) mbstr)));
>     }

Also, what is the difference between _PG_LAST_ENCODING_, which is the
last encoding enum, and the pg_wchar_tbl length calculation?  Are they
the same?  If so, we should just use _PG_LAST_ENCODING_.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
Bruce Momjian <bruce@momjian.us> writes:
> OK, hearing nothing, I dug into this, and I think the solution is
> simpler than we thought.  Basically, the Assert is checking for the
> encoding value to be in a valid range, but the main code is also
> checking for an invalid encoding and returning PG_SQL_ASCII:

Agreed that that's pretty useless.  I wonder though why these functions
are not coded like

       return PG_VALID_ENCODING(encoding) ?
           ((*pg_wchar_table[encoding].mblen) ((const unsigned char *) mbstr)) :
           ((*pg_wchar_table[PG_SQL_ASCII].mblen) ((const unsigned char *) mbstr)));

instead of the hard-to-read explicit range check.

            regards, tom lane
Bruce Momjian <bruce@momjian.us> writes:
> Also, what is the difference between _PG_LAST_ENCODING_, which is the
> last encoding enum, and the pg_wchar_tbl length calculation?  Are they
> the same?

They certainly should be; I don't think it's the responsibility of these
functions to defend against them not being so.

> If so, we should just use _PG_LAST_ENCODING_.

Well, we should use PG_VALID_ENCODING() imo.

            regards, tom lane
On Sat, Mar 22, 2014 at 11:55:56AM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > OK, hearing nothing, I dug into this, and I think the solution is
> > simpler than we thought.  Basically, the Assert is checking for the
> > encoding value to be in a valid range, but the main code is also
> > checking for an invalid encoding and returning PG_SQL_ASCII:
>
> Agreed that that's pretty useless.  I wonder though why these functions
> are not coded like
>
>        return PG_VALID_ENCODING(encoding) ?
>            ((*pg_wchar_table[encoding].mblen) ((const unsigned char *) mbstr)) :
>            ((*pg_wchar_table[PG_SQL_ASCII].mblen) ((const unsigned char *) mbstr)));
>
> instead of the hard-to-read explicit range check.

Agreed.  Modified patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment
On Sat, Mar 22, 2014 at 08:17:11PM -0400, Bruce Momjian wrote:
> On Sat, Mar 22, 2014 at 11:55:56AM -0400, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > OK, hearing nothing, I dug into this, and I think the solution is
> > > simpler than we thought.  Basically, the Assert is checking for the
> > > encoding value to be in a valid range, but the main code is also
> > > checking for an invalid encoding and returning PG_SQL_ASCII:
> >
> > Agreed that that's pretty useless.  I wonder though why these functions
> > are not coded like
> >
> >        return PG_VALID_ENCODING(encoding) ?
> >            ((*pg_wchar_table[encoding].mblen) ((const unsigned char *) mbstr)) :
> >            ((*pg_wchar_table[PG_SQL_ASCII].mblen) ((const unsigned char *) mbstr)));
> >
> > instead of the hard-to-read explicit range check.
>
> Agreed.  Modified patch attached.

Patch applied.  I did not backpatch because it only affects assert
builds, which I assume is only being done in head.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +