Thread: BUG #16200: returned data from ESQL/C FETCH is trampling outside assigned memory for CHAR column

The following bug has been logged on the website:

Bug reference:      16200
Logged by:          Matthias Apitz
Email address:      guru@unixarea.de
PostgreSQL version: 11.4
Operating system:   SuSE Linux SLES 12 SP4
Description:

We encounter the following problem with ESQL/C: Imagine a table with two
columns: CHAR(16) and DATE

In the database the CHAR column can contain not only 16 bytes, but 16
Unicode chars,
which are longer than 16 bytes if one or more of the chars is an UTF-8
multibyte
encoded char.

If one provides in C a host structure to FETCH the data as:

    EXEC SQL BEGIN DECLARE SECTION;
    struct  r_d02ben_ec {
        char    string[17];
        char    date[11];
    };
    typedef struct r_d02ben_ec t_d02ben_ec;
    t_d02ben_ec *hp_d02ben, hrec_d02ben;
    EXEC SQL END DECLARE SECTION;

and fetches the data with ESQL/C as:

    EXEC SQL FETCH hc_d02ben INTO :hrec_d02ben;

The generated C-code looks like this:

    ...
    ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "fetch hc_d02ben",
ECPGt_EOIT,
        ECPGt_char,&(hrec_d02ben.string),(long)17,(long)1,sizeof( struct
r_d02ben_ec ),
        ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L,
        ECPGt_char,&(hrec_d02ben.date),(long)11,(long)1,sizeof( struct
r_d02ben_ec ),
        ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L,
        ...

As you can see for the first element CHAR the length 17 is given as argument
to ECPGdo()
together with the pointer to where the data should be stored
and for the second element DATE the length 11 is ggiven as argument (which
is big enough to
receive the ASCII string "MM.DD.YYYY" and a trailing \0).

What we now see using GDB is that for the first element all UTF-8 data
is returned, lets asume only one multibyte char, which gives 17 bytes,
not only 16, and the trailing NULL is already placed into the memory for
the DATE. Now the function ECPGdo() places the DATE as "MM.DD.YYYY"
into the area pointed to for the 2nd argument and with this overwrites
the NULL terminator of the string[17] element. Result is later a
SIGSEGV because the expected string in string[17] is not NULL
terminated anymore :-)

I would call it a bug, that ECPGdo() puts more than 17 bytes (16 bytes +
NULL) as return into the place pointed to by the host var pointer when
the column in the database has more (UTF-8) chars as will fit into
16+1 byte.


Hi,

> We encounter the following problem with ESQL/C: Imagine a table with 

I assume you mean ECPG, right? ESQL/C would be the Informix compiler.

> In the database the CHAR column can contain not only 16 bytes, but 16
> Unicode chars,
> which are longer than 16 bytes if one or more of the chars is an UTF-
> 8
> multibyte
> encoded char.
> ...
> the DATE. Now the function ECPGdo() places the DATE as "MM.DD.YYYY"
> into the area pointed to for the 2nd argument and with this
> overwrites
> the NULL terminator of the string[17] element. Result is later a
> SIGSEGV because the expected string in string[17] is not NULL
> terminated anymore :-)
> 
> I would call it a bug, that ECPGdo() puts more than 17 bytes (16
> bytes +
> NULL) as return into the place pointed to by the host var pointer
> when
> the column in the database has more (UTF-8) chars as will fit into
> 16+1 byte.

Actually I am not sure if this is a bug. I do not remember the standard
asking for a null termination at the end of a partial string copy.
Please correct me if I am wrong. What it does ask for is setting the
indicator accordingly. However, you do not mention any indicator, so I
wonder if you checked that one at all. If the string is truncated and
the appropriate error action is not taken, that would definitely
qualify as a bug.

Could you please verify if the indicator is set accordingly? 

If you have a small test case to reproduce the issue, please send that
one. Otherwise I can create my own but probably won't have time before
next week.

Thanks

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL




El día viernes, enero 10, 2020 a las 01:43:38p. m. +0100, Michael Meskes escribió:

> Hi,
>
> > We encounter the following problem with ESQL/C: Imagine a table with
>
> I assume you mean ECPG, right? ESQL/C would be the Informix compiler.

Hi.

I mean with ESQL/C the general term "Embedded SQL in C". Of course I do
know that for PostgreSQL the precompiler is named "ecpg".

> > In the database the CHAR column can contain not only 16 bytes, but 16
> > Unicode chars,
> > which are longer than 16 bytes if one or more of the chars is an UTF-
> > 8
> > multibyte
> > encoded char.
> > ...
> > the DATE. Now the function ECPGdo() places the DATE as "MM.DD.YYYY"
> > into the area pointed to for the 2nd argument and with this
> > overwrites
> > the NULL terminator of the string[17] element. Result is later a
> > SIGSEGV because the expected string in string[17] is not NULL
> > terminated anymore :-)
> >
> > I would call it a bug, that ECPGdo() puts more than 17 bytes (16
> > bytes +
> > NULL) as return into the place pointed to by the host var pointer
> > when
> > the column in the database has more (UTF-8) chars as will fit into
> > 16+1 byte.
>
> Actually I am not sure if this is a bug. I do not remember the standard
> asking for a null termination at the end of a partial string copy.

Neither do I know if there is some written standard, but char strings IMHO
should be NULL terminated. Our application expects in this example no
more than 16 bytes and that's why we provide a host variable as the
struct member of 17 bytes so the NULL fits into.

I digged into the sources of the ecpglib and this small fix solved our problem:

postgresql-11.4/src/interfaces/ecpg> diff ecpglib/data.c ecpglib/data.c.orig
527,531c527
<                                                       /* guru@unixarea.de: strncpy() only varcharsize-1 */
<                                                       strncpy(str, pval, varcharsize-1);
<                                                       /* guru@unixarea.de: and terminate the string */
<                                                       str[varcharsize - 1] = '\0';
<                                                       ecpg_log("DEBUG ESQL/C: result [%s] len %d\n", str,
strlen(str));
---
>                                                       strncpy(str, pval, varcharsize);

If I now store 16 UTF-8 chars German Umlaut 'ä' into the column, the ECPG log is this:

[30481] [10.01.2020 12:29:28:936]: ecpg_get_data on line 3181: RESULT: ääääääääääääääää offset: 1600; array: no
[30481] [10.01.2020 12:29:28:936]: DEBUG ESQL/C: varcharsize 17 size 32
[30481] [10.01.2020 12:29:28:936]: DEBUG ESQL/C: result [ääääääää] len 16

(The two lines with DEBUG are added by me in execute.c to understand
better the problem).

ECPG would return as RESULT as it says 'ääääääääääääääää' which is 32
bytes (16 UTF-8 chars), our varcharsize is only 17 and we now truncate the RESULT
to 16 bytes (8 UTF-8 'ä') and a trailing \0 with my proposed change and all is fine.

Btw: truncating UTF-8 strings in any case brings the risk, that the resulting
string is not longer clean UTF-8 if you hit into the middle of a multibyte
char. But such issue should be handled by the application and not by ECPG.


> Please correct me if I am wrong. What it does ask for is setting the
> indicator accordingly. However, you do not mention any indicator, so I
> wonder if you checked that one at all. If the string is truncated and
> the appropriate error action is not taken, that would definitely
> qualify as a bug.
>
> Could you please verify if the indicator is set accordingly?

If I read the docs here https://www.postgresql.org/docs/11/ecpg-variables.html#ECPG-INDICATORS
indicator vars are meant to show null values in the table and not the
string truncation. Am I wrong? We do not use indicators at all, but
catch the error condition -213.

In any case, thanks for your attention.

    matthias

--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

"Glaube wenig, hinterfrage alles, denke selbst: Wie man Manipulationen durchschaut"
"Believe little, scrutinise all, think by your own: How see through manipulations"
ISBN-10: 386489218X

Attachment
> I mean with ESQL/C the general term "Embedded SQL in C". Of course I
> do
> know that for PostgreSQL the precompiler is named "ecpg".

No worries, I was just making sure we do not talk about different
things as I have never seen anyone use this abbreviation before.

> Neither do I know if there is some written standard, but char strings

There is.

> IMHO
> should be NULL terminated. Our application expects in this example no
> more than 16 bytes and that's why we provide a host variable as the
> struct member of 17 bytes so the NULL fits into.
>
> I digged into the sources of the ecpglib and this small fix solved
> our problem:
> ...

Sure, this fixes your use case, but as I said, I'm not sure it's the
right thing to do for general use.

> > Could you please verify if the indicator is set accordingly?
>
> If I read the docs here
> https://www.postgresql.org/docs/11/ecpg-variables.html#ECPG-INDICATORS
> indicator vars are meant to show null values in the table and not the
> string truncation. Am I wrong? We do not use indicators at all, but
> catch the error condition -213.

Sorry, you're right of course. What I meant was the sqlca structure
which includes:

...
char            sqlwarn[8];
        /* Element 0: set to 'W' if at least one other is 'W'   */
        /* 1: if 'W' at least one character string              */
        /* value was truncated when it was                      */
        /* stored into a host variable.             */
...

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL

Attachment