Thread: Weird behaviour of C extension function

Weird behaviour of C extension function

From
Amaury Bouchard
Date:
Hello everybody,

I am seeing really strange behaviour with a C function which takes a text parameter.
Everything works fine when I call the function directly with a string literal as the parameter, but a problem occurs when I pass it data read from a table.

To illustrate the problem, I stripped the function down to the minimum. The source code is below, but first, here is the behaviour:

Direct call
-----------
> select passthru('hello world!'), passthru('utf8 çhàràtérs'), passthru(' h3110 123 456 ');
INFO:  INPUT STRING: 'hello world!' (12)
INFO:  INPUT STRING: 'utf8 çhàràtérs' (18)
INFO:  INPUT STRING: ' h3110 123 456 ' (15)

(as you can see, the log messages show the correct input, with the number of bytes between parentheses)

Reading data from a table
-------------------------
> create table mytable ( str text);
> insert into mytable (str) values ('hello world!'), ('utf8 çhàràtérs'), (' h3110 123 456 ');
> select passthru(str) from mytable;
INFO:  INPUT STRING: 'lo world!' (12)
INFO:  INPUT STRING: '8 çhàràtérs' (18)
INFO:  INPUT STRING: '110 123 456 �
' (15)
INFO:  INPUT STRING: '��' (5)
INFO:  INPUT STRING: '' (3)

Here, you can see that the data pointer seems to be shifted 3 bytes too far: the first 3 characters of each string are missing, and garbage is read past the end.

Do you have any idea what causes this strange behaviour?


The source code
---------------

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

// PG module init
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
void _PG_init(void);
Datum passthru(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(passthru);

void _PG_init() {
}

Datum passthru(PG_FUNCTION_ARGS) {
        // get the input string
        text *input = PG_GETARG_TEXT_PP(0);
        char *input_pt = (char*)VARDATA(input);
        int32 input_len = VARSIZE_ANY_EXHDR(input);
        // create a null terminated copy of the input string
        char *str_copy = calloc(1, input_len + 1);
        memcpy(str_copy, input_pt, input_len);
        // log message
        elog(INFO, "INPUT STRING: '%s' (%d)", str_copy, input_len);
        free(str_copy);
        PG_RETURN_NULL();
}



Thank you.
Best regards,

Amaury Bouchard

Re: Weird behaviour of C extension function

From
Laurenz Albe
Date:
On Fri, 2020-04-24 at 14:53 +0200, Amaury Bouchard wrote:
> I have a really strange behaviour with a C function, which gets a text as parameter.
> Everything works fine when I call the function directly, giving a text string as parameter. But a problem occurs when I try to read data from a table.
> 
> To illustrate the problem, I stripped the function down to the minimum. The source code is below, but first, here is the behaviour:
> 
> Direct call
> -----------
> > select passthru('hello world!'), passthru('utf8 çhàràtérs'), passthru(' h3110 123 456 ');
> INFO:  INPUT STRING: 'hello world!' (12)
> INFO:  INPUT STRING: 'utf8 çhàràtérs' (18)
> INFO:  INPUT STRING: ' h3110 123 456 ' (15)
> 
> (as you can see, the log messages show the correct input, with the number of bytes between parentheses)
> 
> Reading a table data
> --------------------
> > create table mytable ( str text);
> > insert into mytable (str) values ('hello world!'), ('utf8 çhàràtérs'), (' h3110 123 456 ');
> > select passthru(str) from mytable;
> INFO:  INPUT STRING: 'lo world!' (12)
> INFO:  INPUT STRING: '8 çhàràtérs' (18)
> INFO:  INPUT STRING: '110 123 456 �
> ' (15)
> INFO:  INPUT STRING: '��' (5)
> INFO:  INPUT STRING: '' (3)
> 
> There, you can see that the pointer seems to be shifted 3 bytes farther.
> 
> Do you have any clue for this strange behaviour?
> 
> 
> The source code
> ---------------
> 
> #include "postgres.h"
> #include "fmgr.h"
> #include "funcapi.h"
> 
> // PG module init
> #ifdef PG_MODULE_MAGIC
> PG_MODULE_MAGIC;
> #endif
> void _PG_init(void);
> Datum passthru(PG_FUNCTION_ARGS);
> PG_FUNCTION_INFO_V1(passthru);
> 
> void _PG_init() {
> }
> 
> Datum passthru(PG_FUNCTION_ARGS) {
>         // get the input string
>         text *input = PG_GETARG_TEXT_PP(0);
>         char *input_pt = (char*)VARDATA(input);
>         int32 input_len = VARSIZE_ANY_EXHDR(input);
>         // create a null terminated copy of the input string
>         char *str_copy = calloc(1, input_len + 1); 
>         memcpy(str_copy, input_pt, input_len);
>         // log message
>         elog(INFO, "INPUT STRING: '%s' (%d)", str_copy, input_len);
>         free(str_copy);
>         PG_RETURN_NULL();
> }

You can find this comment in "postgres.h":

 * In consumers oblivious to data alignment, call PG_DETOAST_DATUM_PACKED(),
 * VARDATA_ANY(), VARSIZE_ANY() and VARSIZE_ANY_EXHDR().  Elsewhere, call
 * PG_DETOAST_DATUM(), VARDATA() and VARSIZE().  Directly fetching an int16,
 * int32 or wider field in the struct representing the datum layout requires
 * aligned data.  memcpy() is alignment-oblivious, as are most operations on
 * datatypes, such as text, whose layout struct contains only char fields.

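So you should use VARDATA_ANY(input) instead of VARDATA(input) to get the data pointer.
It is the counterpart of the VARSIZE_ANY_EXHDR() call you already use for the length.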

What happens is that these short text values, when read from the table, have a 1-byte header,
but VARDATA() skips the first 4 bytes unconditionally, assuming a regular 4-byte header.
That is why the data appears shifted by 3 bytes.
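
To make the 3-byte shift concrete, here is a small standalone sketch that can be compiled without the PostgreSQL headers. It is only a simplified model of the two header forms, loosely following the little-endian layout; all the toy_* names are hypothetical and are not PostgreSQL APIs. toy_data_4b() plays the role of VARDATA(), toy_data_any() the role of VARDATA_ANY(), and toy_size_any_exhdr() the role of VARSIZE_ANY_EXHDR().

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: every toy_* name is hypothetical, not a PostgreSQL API. */

/* Build a value with a 1-byte header: low bit set, total size in the upper 7 bits.
 * Only valid for strings shorter than 127 bytes. */
static size_t toy_make_short(unsigned char *buf, const char *s)
{
    size_t len = strlen(s);
    buf[0] = (unsigned char)(((len + 1) << 1) | 0x01);
    memcpy(buf + 1, s, len);
    return len + 1;
}

/* Build a value with a 4-byte header: total size shifted left by 2,
 * low two bits clear, stored byte by byte in little-endian order. */
static size_t toy_make_long(unsigned char *buf, const char *s)
{
    size_t len = strlen(s);
    uint32_t hdr = (uint32_t)(len + 4) << 2;
    buf[0] = (unsigned char)(hdr & 0xFF);
    buf[1] = (unsigned char)((hdr >> 8) & 0xFF);
    buf[2] = (unsigned char)((hdr >> 16) & 0xFF);
    buf[3] = (unsigned char)((hdr >> 24) & 0xFF);
    memcpy(buf + 4, s, len);
    return len + 4;
}

/* A set low bit marks the 1-byte header form in this model. */
static int toy_is_short(const unsigned char *v)
{
    return (v[0] & 0x01) == 0x01;
}

/* Counterpart of VARDATA(): always skips 4 bytes, whatever the header form. */
static const unsigned char *toy_data_4b(const unsigned char *v)
{
    return v + 4;
}

/* Counterpart of VARDATA_ANY(): skips 1 or 4 bytes depending on the header. */
static const unsigned char *toy_data_any(const unsigned char *v)
{
    return v + (toy_is_short(v) ? 1 : 4);
}

/* Counterpart of VARSIZE_ANY_EXHDR(): payload size without the header. */
static size_t toy_size_any_exhdr(const unsigned char *v)
{
    if (toy_is_short(v))
        return (size_t)((v[0] >> 1) & 0x7F) - 1;
    uint32_t hdr = (uint32_t)v[0] | ((uint32_t)v[1] << 8) |
                   ((uint32_t)v[2] << 16) | ((uint32_t)v[3] << 24);
    return (size_t)((hdr >> 2) & 0x3FFFFFFF) - 4;
}
```

With a 1-byte header, toy_data_4b() points 3 bytes past the real start of the payload while toy_size_any_exhdr() still reports the full length, so a copy starts at "lo world!" and then runs 3 bytes past the end of the value. With a 4-byte header, both accessors return the same pointer, which matches the direct call working correctly.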

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com