Thread: encoding conversion functions versus zero-length inputs

encoding conversion functions versus zero-length inputs

From
Tom Lane
Date:
The REL7_4 members of the buildfarm are all red this morning,
with this symptom in initdb:

creating template1 database in
/usr/src/pg/build-farm-2.17/build/REL7_4_STABLE/pgsql.18854/src/test/regress/./tmp_check/data/base/1...ok
 
initializing pg_shadow... ok
enabling unlimited row size for system tables... ok
initializing pg_depend... ok
creating system views... ok
loading pg_description... ok
creating conversions... ERROR:  invalid memory alloc request size 0

This is coming from Heikki's patch of yesterday to check whether
a proposed conversion function accepts the indicated encoding IDs.
He has it passing a zero-length input string, which is a case that
the encoding conversion machinery normally short-circuits; and sure
enough some of the converters fall over if they're actually called on
such an input.  The one that dies first here is koi8r_to_win1251,
which contains
buf = palloc(len * ENCODING_GROWTH_RATE);koi8r2mic(src, buf, len);mic2win1251(buf, dest, strlen((char *)
buf));pfree(buf);

Pre-8.0, we had palloc throwing an error for a zero-size request, thus
this doesn't work.  It's worse than that though --- if the combination
of that palloc and the subsequent strlen call hasn't already set off
alarm bells in your mind, then you haven't been writing C long enough.
Sure enough, koi8r2mic null-terminates its result, so the strlen is
correct, but the palloc isn't leaving space for a trailing null.
It manages to fail to fail in normal usage (len > 0) because
ENCODING_GROWTH_RATE is ridiculously overstated, but on zero-length
input there is actually a memory clobber happening here in the 8.x
branches:

regression=# CREATE CONVERSION foo FOR 'KOI8R' TO 'WIN1251' FROM koi8r_to_win1251;
WARNING:  detected write past chunk end in PortalHeapMemory 400f3be0
CREATE CONVERSION

My first thought about fixing this was just to alter the check patch to
pass a length-one instead of length-zero test string, but I now think
that that's just hiding our heads in the sand; the right fix is to go
around and make all these palloc's "len * ENCODING_GROWTH_RATE + 1"
so that they are honestly accounting for the terminating null.  It's
a bit more tedious but it's the right fix.
        regards, tom lane


Re: encoding conversion functions versus zero-length inputs

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> The REL7_4 members of the buildfarm are all red this morning,
> with this symptom in initdb:

Oh dear. I must confess that I didn't test the 7.4 commit, because the 
7.4 branch isn't compiling on my laptop for some reason. Seemed safe 
enough since the changed codepath hadn't been modified between 7.4 and 
later version. I guess I need to fix my 7.4 installation after all...

> My first thought about fixing this was just to alter the check patch to
> pass a length-one instead of length-zero test string, but I now think
> that that's just hiding our heads in the sand; the right fix is to go
> around and make all these palloc's "len * ENCODING_GROWTH_RATE + 1"
> so that they are honestly accounting for the terminating null.  It's
> a bit more tedious but it's the right fix.

Agreed, thanks for the fix.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com