Thread: Weird problems with C extension and bytea as input type

Weird problems with C extension and bytea as input type

From
Adrian Schreyer
Date:
Hi,

I have a weird problem with my custom functions (written in C,C++)
that use bytea as input type (cstring works fine). The functions will
work as expected if they are the only function that uses the bytea
column in a query; as soon as there is a second function using the
same column, the C function will return one of the following: an empty
cstring, a substring of the bytea or the correct cstring. Based on
these symptoms I assume there is something fundamental that I do wrong
(or that is missing) with handling the bytea pointer.

In one specific example, the bytea contains a binary file format that
the function converts into a string format. I convert the bytea to a
C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ).

bytea *b = PG_GETARG_BYTEA_P(0);
char *ism;

ism = function(b);

PG_RETURN_CSTRING(ism);

Re: Weird problems with C extension and bytea as input type

From
Merlin Moncure
Date:
On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
> Hi,
>
> I have a weird problem with my custom functions (written in C,C++)
> that use bytea as input type (cstring works fine). The functions will
> work as expected if they are the only function that uses the bytea
> column in a query; as soon as there is a second function using the
> same column, the C function will return one of the following: an empty
> cstring, a substring of the bytea or the correct cstring. Based on
> these symptoms I assume there is something fundamental that I do wrong
> (or that is missing) with handling the bytea pointer.
>
> In one specific example, the bytea contains a binary file format that
> the function converts into a string format. I convert the bytea to a
> C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ).
>
> bytea *b = PG_GETARG_BYTEA_P(0);
> char *ism;
>
> ism = function(b);
>
> PG_RETURN_CSTRING(ism);

your problem is probably inside 'function' -- are you properly copying
the data out of the bytea struct?.  also, are you really sure you want
to be returning cstring type, not text?

merlin

Re: Weird problems with C extension and bytea as input type

From
Adrian Schreyer
Date:
On Tue, Mar 22, 2011 at 16:07, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
>> Hi,
>>
>> I have a weird problem with my custom functions (written in C,C++)
>> that use bytea as input type (cstring works fine). The functions will
>> work as expected if they are the only function that uses the bytea
>> column in a query; as soon as there is a second function using the
>> same column, the C function will return one of the following: an empty
>> cstring, a substring of the bytea or the correct cstring. Based on
>> these symptoms I assume there is something fundamental that I do wrong
>> (or that is missing) with handling the bytea pointer.
>>
>> In one specific example, the bytea contains a binary file format that
>> the function converts into a string format. I convert the bytea to a
>> C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ).
>>
>> bytea *b = PG_GETARG_BYTEA_P(0);
>> char *ism;
>>
>> ism = function(b);
>>
>> PG_RETURN_CSTRING(ism);
>
> your problem is probably inside 'function' -- are you properly copying
> the data out of the bytea struct?.  also, are you really sure you want
> to be returning cstring type, not text?
>
> merlin
>
yes, the function in the C++ toolkit I use takes a std::string as
input. So far I have used string(VARDATA(oeb), VARSIZE(oeb)-VARHDRSZ)
to convert it. I want to change both input and output to text later on
once this is working. I suspect I will get similar problems with
pg_getarg_text_p() or any other struct. I think you are right, I am
not coping the data out of the struct correctly; all my functions that
use VARDATA() have these problems. What would be the proper way?

Cheers,

Adrian

Re: Weird problems with C extension and bytea as input type

From
Adrian Schreyer
Date:
On Tue, Mar 22, 2011 at 16:07, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
>> Hi,
>>
>> I have a weird problem with my custom functions (written in C,C++)
>> that use bytea as input type (cstring works fine). The functions will
>> work as expected if they are the only function that uses the bytea
>> column in a query; as soon as there is a second function using the
>> same column, the C function will return one of the following: an empty
>> cstring, a substring of the bytea or the correct cstring. Based on
>> these symptoms I assume there is something fundamental that I do wrong
>> (or that is missing) with handling the bytea pointer.
>>
>> In one specific example, the bytea contains a binary file format that
>> the function converts into a string format. I convert the bytea to a
>> C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ).
>>
>> bytea *b = PG_GETARG_BYTEA_P(0);
>> char *ism;
>>
>> ism = function(b);
>>
>> PG_RETURN_CSTRING(ism);
>
> your problem is probably inside 'function' -- are you properly copying
> the data out of the bytea struct?.  also, are you really sure you want
> to be returning cstring type, not text?
>
> merlin
>

yes, the function in the C++ toolkit I use takes a std::string as
input. So far I have used string(VARDATA(oeb), VARSIZE(oeb)-VARHDRSZ)
to convert it. I want to change both input and output to text later on
once this is working. I suspect I will get similar problems with
pg_getarg_text_p() or any other struct. I think you are right, I am
not coping the data out of the struct correctly; all my functions that
use VARDATA() have these problems. What would be the proper way?

Cheers,

Adrian

Re: Weird problems with C extension and bytea as input type

From
Merlin Moncure
Date:
On Tue, Mar 22, 2011 at 11:57 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
> On Tue, Mar 22, 2011 at 16:07, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
>>> Hi,
>>>
>>> I have a weird problem with my custom functions (written in C,C++)
>>> that use bytea as input type (cstring works fine). The functions will
>>> work as expected if they are the only function that uses the bytea
>>> column in a query; as soon as there is a second function using the
>>> same column, the C function will return one of the following: an empty
>>> cstring, a substring of the bytea or the correct cstring. Based on
>>> these symptoms I assume there is something fundamental that I do wrong
>>> (or that is missing) with handling the bytea pointer.
>>>
>>> In one specific example, the bytea contains a binary file format that
>>> the function converts into a string format. I convert the bytea to a
>>> C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ).
>>>
>>> bytea *b = PG_GETARG_BYTEA_P(0);
>>> char *ism;
>>>
>>> ism = function(b);
>>>
>>> PG_RETURN_CSTRING(ism);
>>
>> your problem is probably inside 'function' -- are you properly copying
>> the data out of the bytea struct?.  also, are you really sure you want
>> to be returning cstring type, not text?
>>
>> merlin
>>
>
> yes, the function in the C++ toolkit I use takes a std::string as
> input. So far I have used string(VARDATA(oeb), VARSIZE(oeb)-VARHDRSZ)
> to convert it. I want to change both input and output to text later on
> once this is working. I suspect I will get similar problems with
> pg_getarg_text_p() or any other struct. I think you are right, I am
> not coping the data out of the struct correctly; all my functions that
> use VARDATA() have these problems. What would be the proper way?

Well, C++ string constructor is proper in the sense it makes copy of
the source data.  however, it's a little weird that you are passing
bytea like this...bytea can contain null and c++ string initialization
stops at any 0 byte.  Maybe you should be encoding the data to text
(say, to hex) first?

merlin

Re: Weird problems with C extension and bytea as input type

From
David W Noon
Date:
On Tue, 22 Mar 2011 16:14:47 -0500, Merlin Moncure wrote about Re:
[GENERAL] Weird problems with C extension and bytea as input type:

[snip]
>>> On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk>
>>> wrote:
[snip]
>>>> bytea *b = PG_GETARG_BYTEA_P(0);
>>>> char *ism;
>>>>
>>>> ism = function(b);
>>>>
>>>> PG_RETURN_CSTRING(ism);

What is the prototype for function()?  If it returns a char * then you
will likely have either scope problems, reentrancy problems or memory
leaks. If you are going to buy the C++ religion then you usually need to
buy it wholesale: do all if your string processing as std::string
objects and only return to char * when you revert to C.

As a rough example:

  bytea *b = PG_GETARG_BYTEA_P(0);
  std::string ism;

  ism = function(std::string(VARDATA(b), VARSIZE(b)-VARHDRSZ));

  PG_RETURN_CSTRING(ism.c_str());

Note that this returns an ASCIIZ string, which is not necessarily the
same as the C++ string.  You would be better off creating a
PostgreSQL text object and then return that.

>Well, C++ string constructor is proper in the sense it makes copy of
>the source data.  however, it's a little weird that you are passing
>bytea like this...bytea can contain null and c++ string initialization
>stops at any 0 byte.

Not so.  If the constructor also specifies a length then the data
pointer's area is not assumed to be NUL-terminated.

>Maybe you should be encoding the data to text (say, to hex) first?

Better to use the supplied length in the varlena descriptor.
--
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
dwnoon@ntlworld.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

Attachment

Re: Weird problems with C extension and bytea as input type

From
Adrian Schreyer
Date:
On Tue, Mar 22, 2011 at 22:21, David W Noon <dwnoon@ntlworld.com> wrote:
> On Tue, 22 Mar 2011 16:14:47 -0500, Merlin Moncure wrote about Re:
> [GENERAL] Weird problems with C extension and bytea as input type:
>
> [snip]
>>>> On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk>
>>>> wrote:
> [snip]
>>>>> bytea *b = PG_GETARG_BYTEA_P(0);
>>>>> char *ism;
>>>>>
>>>>> ism = function(b);
>>>>>
>>>>> PG_RETURN_CSTRING(ism);
>
> What is the prototype for function()?  If it returns a char * then you
> will likely have either scope problems, reentrancy problems or memory
> leaks. If you are going to buy the C++ religion then you usually need to
> buy it wholesale: do all if your string processing as std::string
> objects and only return to char * when you revert to C.

you are right, it returns a char *.

The prototype:

char *function(bytea *b);

The actual C++ function looks roughly like this

extern "C"
char *function(bytea *b)
{
   string ism;
   [...]
   return ism.c_str();
}

The postgres wrapper in C like this:

PG_FUNCTION_INFO_V1(bin_to_string);
Datum bin_to_string(PG_FUNCTION_ARGS)
{
   bytea      *b= PG_GETARG_BYTEA_P(0);

   char *ism = function(b);

   PG_RETURN_CSTRING(ism);
}

I have another function in C++ that parses the binary string (file)
into an object that is then further processed. This works for all
functions returning boolean or numeric values, only the string methods
produce these odd results. So as you said, the way in which strings
are passed between C++ and C in my code must be horribly wrong. What
would be the correct way?

> As a rough example:
>
>  bytea *b = PG_GETARG_BYTEA_P(0);
>  std::string ism;
>
>  ism = function(std::string(VARDATA(b), VARSIZE(b)-VARHDRSZ));
>
>  PG_RETURN_CSTRING(ism.c_str());
>
> Note that this returns an ASCIIZ string, which is not necessarily the
> same as the C++ string.  You would be better off creating a
> PostgreSQL text object and then return that.
>
>>Well, C++ string constructor is proper in the sense it makes copy of
>>the source data.  however, it's a little weird that you are passing
>>bytea like this...bytea can contain null and c++ string initialization
>>stops at any 0 byte.
>
> Not so.  If the constructor also specifies a length then the data
> pointer's area is not assumed to be NUL-terminated.
>
>>Maybe you should be encoding the data to text (say, to hex) first?
>
> Better to use the supplied length in the varlena descriptor.
> --
> Regards,
>
> Dave  [RLU #314465]
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
> dwnoon@ntlworld.com (David W Noon)
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>

Re: Weird problems with C extension and bytea as input type

From
Adrian Schreyer
Date:
On Tue, Mar 22, 2011 at 22:21, David W Noon <dwnoon@ntlworld.com> wrote:
> On Tue, 22 Mar 2011 16:14:47 -0500, Merlin Moncure wrote about Re:
> [GENERAL] Weird problems with C extension and bytea as input type:
>
> [snip]
>>>> On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk>
>>>> wrote:
> [snip]
>>>>> bytea *b = PG_GETARG_BYTEA_P(0);
>>>>> char *ism;
>>>>>
>>>>> ism = function(b);
>>>>>
>>>>> PG_RETURN_CSTRING(ism);
>
> What is the prototype for function()?  If it returns a char * then you
> will likely have either scope problems, reentrancy problems or memory
> leaks. If you are going to buy the C++ religion then you usually need to
> buy it wholesale: do all if your string processing as std::string
> objects and only return to char * when you revert to C.

you are right, it returns a char *.

The prototype:

char *function(bytea *b);

The actual C++ function looks roughly like this

extern "C"
char *function(bytea *b)
{
    string ism;
    [...]
    return ism.c_str();
}

The postgres wrapper in C like this:

PG_FUNCTION_INFO_V1(bin_to_string);
Datum bin_to_string(PG_FUNCTION_ARGS)
{
    bytea      *b= PG_GETARG_BYTEA_P(0);

    char *ism = function(b);

    PG_RETURN_CSTRING(ism);
}

I have another function in C++ that parses the binary string (file)
into an object that is then further processed. This works for all
functions returning boolean or numeric values, only the string methods
produce these odd results. So as you said, the way in which strings
are passed between C++ and C in my code must be horribly wrong. What
would be the correct way?

> As a rough example:
>
>  bytea *b = PG_GETARG_BYTEA_P(0);
>  std::string ism;
>
>  ism = function(std::string(VARDATA(b), VARSIZE(b)-VARHDRSZ));
>
>  PG_RETURN_CSTRING(ism.c_str());
>
> Note that this returns an ASCIIZ string, which is not necessarily the
> same as the C++ string.  You would be better off creating a
> PostgreSQL text object and then return that.
>
>>Well, C++ string constructor is proper in the sense it makes copy of
>>the source data.  however, it's a little weird that you are passing
>>bytea like this...bytea can contain null and c++ string initialization
>>stops at any 0 byte.
>
> Not so.  If the constructor also specifies a length then the data
> pointer's area is not assumed to be NUL-terminated.
>
>>Maybe you should be encoding the data to text (say, to hex) first?
>
> Better to use the supplied length in the varlena descriptor.
> --
> Regards,
>
> Dave  [RLU #314465]
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
> dwnoon@ntlworld.com (David W Noon)
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>

Re: Weird problems with C extension and bytea as input type

From
dennis jenkins
Date:
On Wed, Mar 23, 2011 at 5:08 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
>
> you are right, it returns a char *.
>
> The prototype:
>
> char *function(bytea *b);
>
> The actual C++ function looks roughly like this
>
> extern "C"
> char *function(bytea *b)
> {
>   string ism;
>   [...]
>   return ism.c_str();
> }
>


Don't do that.  You are returning a pointer to an unallocated buffer
(previously held by a local variable).  c_str() is just a const
pointer to a buffer held inside "ism".  When ism goes out of scope,
that buffer if freed.

Either return "std::string", or strdup() the string and have the
caller free that.  (but use the postgresql alloc pool function to
handle the strdup.  I don't recall that function's name off the top of
my head).

Re: Weird problems with C extension and bytea as input type

From
Merlin Moncure
Date:
On Wed, Mar 23, 2011 at 9:04 AM, dennis jenkins
<dennis.jenkins.75@gmail.com> wrote:
> On Wed, Mar 23, 2011 at 5:08 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
>>
>> you are right, it returns a char *.
>>
>> The prototype:
>>
>> char *function(bytea *b);
>>
>> The actual C++ function looks roughly like this
>>
>> extern "C"
>> char *function(bytea *b)
>> {
>>   string ism;
>>   [...]
>>   return ism.c_str();
>> }
>>
>
>
> Don't do that.  You are returning a pointer to an unallocated buffer
> (previously held by a local variable).  c_str() is just a const
> pointer to a buffer held inside "ism".  When ism goes out of scope,
> that buffer if freed.
>
> Either return "std::string", or strdup() the string and have the
> caller free that.  (but use the postgresql alloc pool function to
> handle the strdup.  I don't recall that function's name off the top of
> my head).

that would be pstrdup, and it's the way to go (you don't have to
pfree).  who says C doesn't have garbage collection?

merlin

Re: Weird problems with C extension and bytea as input type

From
Tom Lane
Date:
Adrian Schreyer <ams214@cam.ac.uk> writes:
> The actual C++ function looks roughly like this

> extern "C"
> char *function(bytea *b)
> {
>    string ism;
>    [...]
>    return ism.c_str();
> }

My C++ is pretty rusty, but is the pointer returned by c_str() still
valid after the string variable goes out of scope?  I'm wondering if you
need a pstrdup before returning.

            regards, tom lane

Re: Weird problems with C extension and bytea as input type

From
Adrian Schreyer
Date:
On Wed, Mar 23, 2011 at 14:08, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Mar 23, 2011 at 9:04 AM, dennis jenkins
> <dennis.jenkins.75@gmail.com> wrote:
>> On Wed, Mar 23, 2011 at 5:08 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
>>>
>>> you are right, it returns a char *.
>>>
>>> The prototype:
>>>
>>> char *function(bytea *b);
>>>
>>> The actual C++ function looks roughly like this
>>>
>>> extern "C"
>>> char *function(bytea *b)
>>> {
>>>   string ism;
>>>   [...]
>>>   return ism.c_str();
>>> }
>>>
>>
>>
>> Don't do that.  You are returning a pointer to an unallocated buffer
>> (previously held by a local variable).  c_str() is just a const
>> pointer to a buffer held inside "ism".  When ism goes out of scope,
>> that buffer if freed.
>>
>> Either return "std::string", or strdup() the string and have the
>> caller free that.  (but use the postgresql alloc pool function to
>> handle the strdup.  I don't recall that function's name off the top of
>> my head).
>
> that would be pstrdup, and it's the way to go (you don't have to
> pfree).  who says C doesn't have garbage collection?
>
> merlin
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
I am using pstrdup and it works now as expected. Thank you all for your help,

Adrian

Re: Weird problems with C extension and bytea as input type

From
Adrian Schreyer
Date:
On Wed, Mar 23, 2011 at 14:08, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Mar 23, 2011 at 9:04 AM, dennis jenkins
> <dennis.jenkins.75@gmail.com> wrote:
>> On Wed, Mar 23, 2011 at 5:08 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
>>>
>>> you are right, it returns a char *.
>>>
>>> The prototype:
>>>
>>> char *function(bytea *b);
>>>
>>> The actual C++ function looks roughly like this
>>>
>>> extern "C"
>>> char *function(bytea *b)
>>> {
>>>   string ism;
>>>   [...]
>>>   return ism.c_str();
>>> }
>>>
>>
>>
>> Don't do that.  You are returning a pointer to an unallocated buffer
>> (previously held by a local variable).  c_str() is just a const
>> pointer to a buffer held inside "ism".  When ism goes out of scope,
>> that buffer if freed.
>>
>> Either return "std::string", or strdup() the string and have the
>> caller free that.  (but use the postgresql alloc pool function to
>> handle the strdup.  I don't recall that function's name off the top of
>> my head).
>
> that would be pstrdup, and it's the way to go (you don't have to
> pfree).  who says C doesn't have garbage collection?
>
> merlin
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

I am using pstrdup and it works now as expected. Thank you all for your help,

Adrian