Re: Weird behaviour of C extension function - Mailing list pgsql-general

From Laurenz Albe
Subject Re: Weird behaviour of C extension function
Date
Msg-id 24e2c22c9f1f1bc38c5a88d3e0c8fef5b980cd9a.camel@cybertec.at
In response to Weird behaviour of C extension function  (Amaury Bouchard <amaury.bouchard@anasen.com>)
List pgsql-general
On Fri, 2020-04-24 at 14:53 +0200, Amaury Bouchard wrote:
> I have a really strange behaviour with a C function, which gets a text as parameter.
> Everything works fine when I call the function directly, giving a text string as parameter. But a problem occurs when
> I try to read data from a table.
> 
> To illustrate the problem, I stripped the function down to the minimum. The source code is below, but first, here is
> the behaviour:
> 
> Direct call
> -----------
> > select passthru('hello world!'), passthru('utf8 çhàràtérs'), passthru(' h3110 123 456 ');
> INFO:  INPUT STRING: 'hello world!' (12)
> INFO:  INPUT STRING: 'utf8 çhàràtérs' (18)
> INFO:  INPUT STRING: ' h3110 123 456 ' (15)
> 
> (as you can see, the log messages show the correct input, with the number of bytes between parentheses)
> 
> Reading data from a table
> -------------------------
> > create table mytable ( str text);
> > insert into mytable (str) values ('hello world!'), ('utf8 çhàràtérs'), (' h3110 123 456 ');
> > select passthru(str) from mytable;
> INFO:  INPUT STRING: 'lo world!' (12)
> INFO:  INPUT STRING: '8 çhàràtérs' (18)
> INFO:  INPUT STRING: '110 123 456 �
> ' (15)
> INFO:  INPUT STRING: '��' (5)
> INFO:  INPUT STRING: '' (3)
> 
> There, you can see that the pointer seems to be shifted 3 bytes farther.
> 
> Do you have any clue for this strange behaviour?
> 
> 
> The source code
> ---------------
> 
> #include "postgres.h"
> #include "fmgr.h"
> #include "funcapi.h"
> 
> // PG module init
> #ifdef PG_MODULE_MAGIC
> PG_MODULE_MAGIC;
> #endif
> void _PG_init(void);
> Datum passthru(PG_FUNCTION_ARGS);
> PG_FUNCTION_INFO_V1(passthru);
> 
> void _PG_init() {
> }
> 
> Datum passthru(PG_FUNCTION_ARGS) {
>         // get the input string
>         text *input = PG_GETARG_TEXT_PP(0);
>         char *input_pt = (char*)VARDATA(input);
>         int32 input_len = VARSIZE_ANY_EXHDR(input);
>         // create a null terminated copy of the input string
>         char *str_copy = calloc(1, input_len + 1); 
>         memcpy(str_copy, input_pt, input_len);
>         // log message
>         elog(INFO, "INPUT STRING: '%s' (%d)", str_copy, input_len);
>         free(str_copy);
>         PG_RETURN_NULL();
> }

You will find this in "postgres.h":

 * In consumers oblivious to data alignment, call PG_DETOAST_DATUM_PACKED(),
 * VARDATA_ANY(), VARSIZE_ANY() and VARSIZE_ANY_EXHDR().  Elsewhere, call
 * PG_DETOAST_DATUM(), VARDATA() and VARSIZE().  Directly fetching an int16,
 * int32 or wider field in the struct representing the datum layout requires
 * aligned data.  memcpy() is alignment-oblivious, as are most operations on
 * datatypes, such as text, whose layout struct contains only char fields.

So you should use VARDATA_ANY() instead of VARDATA().

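What happens is that these short text columns have a 1-byte TOAST header
(PG_GETARG_TEXT_PP() does not expand it), but with VARDATA() you skip the
first 4 bytes unconditionally, as if the value had a regular 4-byte header.
That is why the pointer ends up 3 bytes too far.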
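
To make the 3-byte shift concrete, here is a small standalone sketch that
does not need the PostgreSQL headers. It is only a simplified model of the
1-byte and 4-byte header forms as laid out on little-endian builds, and all
names in it (make_short, make_long, data_fixed, data_any, size_any_exhdr)
are made up for illustration; they are not PostgreSQL functions or macros.
data_fixed() plays the role of VARDATA(), data_any() the role of
VARDATA_ANY().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model of the two uncompressed header forms
 * (little-endian layout). These are NOT the real PostgreSQL macros. */
enum { LONG_HDRSZ = 4, SHORT_HDRSZ = 1 };

/* Write a value with a 4-byte header: total length shifted left by two,
 * so the two low bits of the first byte are 00. */
static void make_long(uint8_t *buf, const char *s)
{
    size_t   n = strlen(s);
    uint32_t hdr = (uint32_t) (n + LONG_HDRSZ) << 2;

    buf[0] = (uint8_t) (hdr & 0xFF);
    buf[1] = (uint8_t) ((hdr >> 8) & 0xFF);
    buf[2] = (uint8_t) ((hdr >> 16) & 0xFF);
    buf[3] = (uint8_t) ((hdr >> 24) & 0xFF);
    memcpy(buf + LONG_HDRSZ, s, n);
}

/* Write a value with a 1-byte header: total length shifted left by one,
 * with the low bit set to mark the short form. */
static void make_short(uint8_t *buf, const char *s)
{
    size_t n = strlen(s);

    assert(n + SHORT_HDRSZ <= 127);     /* must fit in 7 bits */
    buf[0] = (uint8_t) (((n + SHORT_HDRSZ) << 1) | 0x01);
    memcpy(buf + SHORT_HDRSZ, s, n);
}

static int is_short(const uint8_t *d)
{
    return (d[0] & 0x01) == 0x01;
}

/* Stand-in for VARDATA(): always assumes a 4-byte header. */
static const char *data_fixed(const uint8_t *d)
{
    return (const char *) (d + LONG_HDRSZ);
}

/* Stand-in for VARDATA_ANY(): checks which header form is present. */
static const char *data_any(const uint8_t *d)
{
    return (const char *) (d + (is_short(d) ? SHORT_HDRSZ : LONG_HDRSZ));
}

/* Stand-in for VARSIZE_ANY_EXHDR(): payload length without the header. */
static size_t size_any_exhdr(const uint8_t *d)
{
    uint32_t hdr;

    if (is_short(d))
        return (size_t) ((d[0] >> 1) & 0x7F) - SHORT_HDRSZ;

    hdr = (uint32_t) d[0] | ((uint32_t) d[1] << 8) |
          ((uint32_t) d[2] << 16) | ((uint32_t) d[3] << 24);
    return (size_t) (hdr >> 2) - LONG_HDRSZ;
}
```

In this model, building 'hello world!' with make_short() and reading it
through data_fixed() yields a pointer to 'lo world!', just like in your log,
while size_any_exhdr() still reports 12. With make_long(), both accessors
return the same pointer, which matches the case where the direct call with a
string literal worked.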

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com



