Thread: ESQL/C FETCH of CHAR data delivers too much data for UTF-8

ESQL/C FETCH of CHAR data delivers too much data for UTF-8

From
Matthias Apitz
Date:
Hello,

We encounter the following problem with ESQL/C. Imagine a table with two
columns, CHAR(16) and DATE.

The CHAR column can hold not just 16 bytes but 16 Unicode characters,
which take more than 16 bytes if one or more of them is encoded as a
UTF-8 multibyte sequence.
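
To make the byte/character difference concrete, here is a minimal sketch in plain C (no ECPG involved). The helper name utf8_char_count is hypothetical, and the code assumes its input is valid UTF-8; it counts characters by skipping continuation bytes.

```c
#include <stddef.h>

/* Count the UTF-8 characters (code points) in a NUL-terminated string.
 * Every byte that is NOT a continuation byte (bit pattern 10xxxxxx)
 * starts a new character. Assumes the input is valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For a value of 15 ASCII letters followed by "ü" (bytes 0xC3 0xBC), this helper reports 16 characters while strlen() reports 17 bytes, one byte more than a char[17] buffer can hold together with its terminator.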

If one provides in C a host structure to FETCH the data as:

EXEC SQL BEGIN DECLARE SECTION;
struct  r_d02ben_ec {
        char    string[17];
        char    date[11];
};
typedef struct r_d02ben_ec t_d02ben_ec;
t_d02ben_ec *hp_d02ben, hrec_d02ben;
EXEC SQL END DECLARE SECTION;

and fetches the data with ESQL/C as:

EXEC SQL FETCH hc_d02ben INTO :hrec_d02ben;

The generated C-code looks like this:

    ...
    ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "fetch hc_d02ben", ECPGt_EOIT,
        ECPGt_char,&(hrec_d02ben.string),(long)17,(long)1,sizeof( struct r_d02ben_ec ),
        ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L,
        ECPGt_char,&(hrec_d02ben.date),(long)11,(long)1,sizeof( struct r_d02ben_ec ),
        ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L,
        ...

As you can see, for the first item the length 17 is passed to ECPGdo()
together with the pointer to where the data should be stored,
and for the second element the length 11 is passed (which is big enough to
receive MM.DD.YYYY in ASCII plus a trailing \0).

What we now see in GDB is that for the first element all of the UTF-8 data
is returned. Let's assume there is only one two-byte char: that gives 17 bytes,
not 16, so the trailing NUL is already placed into the element for
the date. ECPGdo() then writes the date as MM.DD.YYYY
into the area pointed to for the 2nd element and thereby overwrites
the NUL terminator of the string[17] element. The result is a later
SIGSEGV because the string expected in string[17] is no longer NUL
terminated :-)
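
The sequence described above can be replayed without ECPG. The following is only a sketch of the observed effect, not of what ecpglib does internally: the function name replay_fetch is hypothetical, and the two writes go into one flat array laid out like the host struct, so the overrun stays inside a single object and remains well defined in C.

```c
#include <stddef.h>
#include <string.h>

#define STRING_LEN 17   /* sizeof string[17] in the host struct */
#define DATE_LEN   11   /* sizeof date[11] in the host struct */

/* Replay the two writes into a flat buffer that mimics the struct layout
 * (string at offset 0, date at offset STRING_LEN). Returns the length
 * that strlen() reports for the "string" field afterwards, or 0 if the
 * input would not fit into the simulated record at all. */
size_t replay_fetch(const char *name_utf8, const char *date_text)
{
    char rec[STRING_LEN + DATE_LEN] = {0};
    size_t name_bytes = strlen(name_utf8);
    size_t date_bytes = strlen(date_text);

    /* Refuse input that would leave the simulated record. */
    if (name_bytes + 1 > sizeof rec || date_bytes + 1 > DATE_LEN)
        return 0;

    /* First column: all bytes plus NUL, without regard to STRING_LEN. */
    memcpy(rec, name_utf8, name_bytes + 1);

    /* Second column: written at the start of the date field. */
    memcpy(rec + STRING_LEN, date_text, date_bytes + 1);

    return strlen(rec);
}
```

With a 17-byte name, the terminator lands on the first byte of the date field and is then overwritten by the date, so the string appears to run on through the date: 17 + 10 = 27 bytes. With a 16-byte ASCII name the terminator stays inside string[17] and the length is 16.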

I would call it a bug that ECPGdo() writes more than 17 bytes (16 bytes +
NUL) to the location the host variable pointer refers to when
the column in the database holds more UTF-8 data than fits into
16+1 bytes.

Comments?
Proposals for a solution?

Thanks 

    matthias


-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: ESQL/C FETCH of CHAR data delivers too much data for UTF-8

From
Olivier Gautherot
Date:
Hi Matthias,

On Thu, Jan 9, 2020, 20:21 Matthias Apitz <guru@unixarea.de> wrote:
[...]

I would be cautious about calling this a bug, as it is a classic buffer overflow, i.e. a design issue: if your text contains multibyte UTF-8 characters, it is no longer 16 bytes long, and you should plan extra space in your host variables.
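
One way to plan that extra space is to size the buffer for the UTF-8 worst case of 4 bytes per character. The following is a plain-C sketch; the macro UTF8_BUF_SIZE and the struct name r_d02ben_ec_wide are hypothetical. Whether the ecpg preprocessor accepts a macro as an array size inside a DECLARE SECTION is not verified here, so writing the literal 65 there may be the safer choice.

```c
#include <stddef.h>

/* Worst case for UTF-8: 4 bytes per character, plus the terminating NUL. */
#define UTF8_BUF_SIZE(nchars) ((size_t)(nchars) * 4 + 1)

/* Hypothetical widened variant of the host struct from the question. */
struct r_d02ben_ec_wide {
    char string[UTF8_BUF_SIZE(16)];  /* CHAR(16): up to 64 bytes + NUL */
    char date[11];                   /* MM.DD.YYYY + NUL */
};
```

This costs 48 extra bytes per record for the CHAR(16) column, but the buffer can then hold any 16-character UTF-8 value together with its terminator.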