Re: [PATCH] Fix out-of-bouds access (src/common/wchar.c) - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: [PATCH] Fix out-of-bouds access (src/common/wchar.c)
Date
Msg-id 20220217075009.73wxg4cetspkgexz@jrouhaud
Whole thread Raw
In response to Re: [PATCH] Fix out-of-bouds access (src/common/wchar.c)  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: [PATCH] Fix out-of-bouds access (src/common/wchar.c)  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Thu, Feb 17, 2022 at 03:51:26PM +0900, Kyotaro Horiguchi wrote:
> At Thu, 17 Feb 2022 14:58:38 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> > (Sorry for the broken mail...)
> > > >
> > > Ok, if -1 is wrong, what should the value of return if
> > > somebody calling this function like:
> > > pg_encoding_max_length(63);
> > 
> > Should result in assertion failure, I think.  If that fails, the
> > caller side is anyhow broken.  On the other hand we haven't had a
> > complain about that, maybe.
> > 
> > > Of course, with patch applied, because with original code
> > > has memory corruption, if built and run without DEBUG.
> > 
> > So we don't assume corruption in production build.  It should be
> > logically guaranteed.
> > 
> > I'll dig into that further.
> 
> The number comes from pg_char_to_encoding, which is the internal
> function of the SQL function PG_char_encoding(). It looks fine but I
> can confirm that for all possible encoding names in pg_encname_tbl[].
> 
> =# select * from (
> select e as name, PG_char_to_encoding(e) enc
> from unnest(array['abc', 'alt', 'big5', 'euccn', 'eucjis2004', 'eucjp',
>     'euckr', 'euctw', 'gb18030', 'gbk', 'iso88591', 'iso885910', 'iso885913',
>     'iso885914', 'iso885915', 'iso885916', 'iso88592', 'iso88593', 'iso88594',
>     'iso88595', 'iso88596', 'iso88597', 'iso88598', 'iso88599', 'johab',
>     'koi8', 'koi8r', 'koi8u', 'latin1', 'latin10', 'latin2', 'latin3',
>     'latin4', 'latin5', 'latin6', 'latin7', 'latin8', 'latin9', 'mskanji',
>     'muleinternal', 'shiftjis', 'shiftjis2004', 'sjis', 'sqlascii', 'tcvn',
>     'tcvn5712', 'uhc', 'unicode', 'utf8', 'vscii', 'win', 'win1250', 'win1251',
>     'win1252', 'win1253', 'win1254', 'win1255', 'win1256', 'win1257',
>     'win1258', 'win866', 'win874', 'win932', 'win936', 'win949', 'win950',
>     'windows1250', 'windows1251', 'windows1252', 'windows1253', 'windows1254',
>     'windows1255', 'windows1256', 'windows1257', 'windows1258', 'windows866',
>     'windows874', 'windows932', 'windows936', 'windows949', 'windows950',
>     'hoge']) as e) as t where enc < 0 or enc > 41;
>  name | enc 
> ------+-----
>  hoge |  -1
> (1 row)
> 
> So, the function doesn't return 63 for all registered names and wrong
> names.
> 
> So other possibilities I can think of are..
> - Someone had broken pg_encname_tbl[]
> - Cosmic ray hit, or ill memory cell.
> - Coverity worked wrong way.
> 
> Could you show the workload for the Coverity warning here?

The 63 upthread was hypothetical right?  pg_encoding_max_length() shouldn't be
called with user-dependent data (unlike pg_encoding_max_length_sql()), so I
also don't see any value spending cycles in release builds.  The error should
only happen with bogus code, and assert builds are there to avoid that, or
corrupted memory and in that case we can't make any promise.



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Make mesage at end-of-recovery less scary.
Next
From: Michael Paquier
Date:
Subject: Re: Time to drop plpython2?