Thread: Weird problems with C extension and bytea as input type
Hi, I have a weird problem with my custom functions (written in C,C++) that use bytea as input type (cstring works fine). The functions will work as expected if they are the only function that uses the bytea column in a query; as soon as there is a second function using the same column, the C function will return one of the following: an empty cstring, a substring of the bytea or the correct cstring. Based on these symptoms I assume there is something fundamental that I do wrong (or that is missing) with handling the bytea pointer. In one specific example, the bytea contains a binary file format that the function converts into a string format. I convert the bytea to a C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ). bytea *b = PG_GETARG_BYTEA_P(0); char *ism; ism = function(b); PG_RETURN_CSTRING(ism);
On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote: > Hi, > > I have a weird problem with my custom functions (written in C,C++) > that use bytea as input type (cstring works fine). The functions will > work as expected if they are the only function that uses the bytea > column in a query; as soon as there is a second function using the > same column, the C function will return one of the following: an empty > cstring, a substring of the bytea or the correct cstring. Based on > these symptoms I assume there is something fundamental that I do wrong > (or that is missing) with handling the bytea pointer. > > In one specific example, the bytea contains a binary file format that > the function converts into a string format. I convert the bytea to a > C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ). > > bytea *b = PG_GETARG_BYTEA_P(0); > char *ism; > > ism = function(b); > > PG_RETURN_CSTRING(ism); your problem is probably inside 'function' -- are you properly copying the data out of the bytea struct?. also, are you really sure you want to be returning cstring type, not text? merlin
On Tue, Mar 22, 2011 at 16:07, Merlin Moncure <mmoncure@gmail.com> wrote: > On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote: >> Hi, >> >> I have a weird problem with my custom functions (written in C,C++) >> that use bytea as input type (cstring works fine). The functions will >> work as expected if they are the only function that uses the bytea >> column in a query; as soon as there is a second function using the >> same column, the C function will return one of the following: an empty >> cstring, a substring of the bytea or the correct cstring. Based on >> these symptoms I assume there is something fundamental that I do wrong >> (or that is missing) with handling the bytea pointer. >> >> In one specific example, the bytea contains a binary file format that >> the function converts into a string format. I convert the bytea to a >> C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ). >> >> bytea *b = PG_GETARG_BYTEA_P(0); >> char *ism; >> >> ism = function(b); >> >> PG_RETURN_CSTRING(ism); > > your problem is probably inside 'function' -- are you properly copying > the data out of the bytea struct?. also, are you really sure you want > to be returning cstring type, not text? > > merlin > yes, the function in the C++ toolkit I use takes a std::string as input. So far I have used string(VARDATA(oeb), VARSIZE(oeb)-VARHDRSZ) to convert it. I want to change both input and output to text later on once this is working. I suspect I will get similar problems with pg_getarg_text_p() or any other struct. I think you are right, I am not coping the data out of the struct correctly; all my functions that use VARDATA() have these problems. What would be the proper way? Cheers, Adrian
On Tue, Mar 22, 2011 at 16:07, Merlin Moncure <mmoncure@gmail.com> wrote: > On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote: >> Hi, >> >> I have a weird problem with my custom functions (written in C,C++) >> that use bytea as input type (cstring works fine). The functions will >> work as expected if they are the only function that uses the bytea >> column in a query; as soon as there is a second function using the >> same column, the C function will return one of the following: an empty >> cstring, a substring of the bytea or the correct cstring. Based on >> these symptoms I assume there is something fundamental that I do wrong >> (or that is missing) with handling the bytea pointer. >> >> In one specific example, the bytea contains a binary file format that >> the function converts into a string format. I convert the bytea to a >> C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ). >> >> bytea *b = PG_GETARG_BYTEA_P(0); >> char *ism; >> >> ism = function(b); >> >> PG_RETURN_CSTRING(ism); > > your problem is probably inside 'function' -- are you properly copying > the data out of the bytea struct?. also, are you really sure you want > to be returning cstring type, not text? > > merlin > yes, the function in the C++ toolkit I use takes a std::string as input. So far I have used string(VARDATA(oeb), VARSIZE(oeb)-VARHDRSZ) to convert it. I want to change both input and output to text later on once this is working. I suspect I will get similar problems with pg_getarg_text_p() or any other struct. I think you are right, I am not coping the data out of the struct correctly; all my functions that use VARDATA() have these problems. What would be the proper way? Cheers, Adrian
On Tue, Mar 22, 2011 at 11:57 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote: > On Tue, Mar 22, 2011 at 16:07, Merlin Moncure <mmoncure@gmail.com> wrote: >> On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote: >>> Hi, >>> >>> I have a weird problem with my custom functions (written in C,C++) >>> that use bytea as input type (cstring works fine). The functions will >>> work as expected if they are the only function that uses the bytea >>> column in a query; as soon as there is a second function using the >>> same column, the C function will return one of the following: an empty >>> cstring, a substring of the bytea or the correct cstring. Based on >>> these symptoms I assume there is something fundamental that I do wrong >>> (or that is missing) with handling the bytea pointer. >>> >>> In one specific example, the bytea contains a binary file format that >>> the function converts into a string format. I convert the bytea to a >>> C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ). >>> >>> bytea *b = PG_GETARG_BYTEA_P(0); >>> char *ism; >>> >>> ism = function(b); >>> >>> PG_RETURN_CSTRING(ism); >> >> your problem is probably inside 'function' -- are you properly copying >> the data out of the bytea struct?. also, are you really sure you want >> to be returning cstring type, not text? >> >> merlin >> > > yes, the function in the C++ toolkit I use takes a std::string as > input. So far I have used string(VARDATA(oeb), VARSIZE(oeb)-VARHDRSZ) > to convert it. I want to change both input and output to text later on > once this is working. I suspect I will get similar problems with > pg_getarg_text_p() or any other struct. I think you are right, I am > not coping the data out of the struct correctly; all my functions that > use VARDATA() have these problems. What would be the proper way? Well, C++ string constructor is proper in the sense it makes copy of the source data. however, it's a little weird that you are passing bytea like this...bytea can contain null and c++ string initialization stops at any 0 byte. Maybe you should be encoding the data to text (say, to hex) first? merlin
On Tue, 22 Mar 2011 16:14:47 -0500, Merlin Moncure wrote about Re: [GENERAL] Weird problems with C extension and bytea as input type: [snip] >>> On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> >>> wrote: [snip] >>>> bytea *b = PG_GETARG_BYTEA_P(0); >>>> char *ism; >>>> >>>> ism = function(b); >>>> >>>> PG_RETURN_CSTRING(ism); What is the prototype for function()? If it returns a char * then you will likely have either scope problems, reentrancy problems or memory leaks. If you are going to buy the C++ religion then you usually need to buy it wholesale: do all if your string processing as std::string objects and only return to char * when you revert to C. As a rough example: bytea *b = PG_GETARG_BYTEA_P(0); std::string ism; ism = function(std::string(VARDATA(b), VARSIZE(b)-VARHDRSZ)); PG_RETURN_CSTRING(ism.c_str()); Note that this returns an ASCIIZ string, which is not necessarily the same as the C++ string. You would be better off creating a PostgreSQL text object and then return that. >Well, C++ string constructor is proper in the sense it makes copy of >the source data. however, it's a little weird that you are passing >bytea like this...bytea can contain null and c++ string initialization >stops at any 0 byte. Not so. If the constructor also specifies a length then the data pointer's area is not assumed to be NUL-terminated. >Maybe you should be encoding the data to text (say, to hex) first? Better to use the supplied length in the varlena descriptor. -- Regards, Dave [RLU #314465] *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* dwnoon@ntlworld.com (David W Noon) *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
Attachment
On Tue, Mar 22, 2011 at 22:21, David W Noon <dwnoon@ntlworld.com> wrote: > On Tue, 22 Mar 2011 16:14:47 -0500, Merlin Moncure wrote about Re: > [GENERAL] Weird problems with C extension and bytea as input type: > > [snip] >>>> On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> >>>> wrote: > [snip] >>>>> bytea *b = PG_GETARG_BYTEA_P(0); >>>>> char *ism; >>>>> >>>>> ism = function(b); >>>>> >>>>> PG_RETURN_CSTRING(ism); > > What is the prototype for function()? If it returns a char * then you > will likely have either scope problems, reentrancy problems or memory > leaks. If you are going to buy the C++ religion then you usually need to > buy it wholesale: do all if your string processing as std::string > objects and only return to char * when you revert to C. you are right, it returns a char *. The prototype: char *function(bytea *b); The actual C++ function looks roughly like this extern "C" char *function(bytea *b) { string ism; [...] return ism.c_str(); } The postgres wrapper in C like this: PG_FUNCTION_INFO_V1(bin_to_string); Datum bin_to_string(PG_FUNCTION_ARGS) { bytea *b= PG_GETARG_BYTEA_P(0); char *ism = function(b); PG_RETURN_CSTRING(ism); } I have another function in C++ that parses the binary string (file) into an object that is then further processed. This works for all functions returning boolean or numeric values, only the string methods produce these odd results. So as you said, the way in which strings are passed between C++ and C in my code must be horribly wrong. What would be the correct way? > As a rough example: > > bytea *b = PG_GETARG_BYTEA_P(0); > std::string ism; > > ism = function(std::string(VARDATA(b), VARSIZE(b)-VARHDRSZ)); > > PG_RETURN_CSTRING(ism.c_str()); > > Note that this returns an ASCIIZ string, which is not necessarily the > same as the C++ string. You would be better off creating a > PostgreSQL text object and then return that. > >>Well, C++ string constructor is proper in the sense it makes copy of >>the source data. however, it's a little weird that you are passing >>bytea like this...bytea can contain null and c++ string initialization >>stops at any 0 byte. > > Not so. If the constructor also specifies a length then the data > pointer's area is not assumed to be NUL-terminated. > >>Maybe you should be encoding the data to text (say, to hex) first? > > Better to use the supplied length in the varlena descriptor. > -- > Regards, > > Dave [RLU #314465] > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > dwnoon@ntlworld.com (David W Noon) > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* >
On Tue, Mar 22, 2011 at 22:21, David W Noon <dwnoon@ntlworld.com> wrote: > On Tue, 22 Mar 2011 16:14:47 -0500, Merlin Moncure wrote about Re: > [GENERAL] Weird problems with C extension and bytea as input type: > > [snip] >>>> On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> >>>> wrote: > [snip] >>>>> bytea *b = PG_GETARG_BYTEA_P(0); >>>>> char *ism; >>>>> >>>>> ism = function(b); >>>>> >>>>> PG_RETURN_CSTRING(ism); > > What is the prototype for function()? If it returns a char * then you > will likely have either scope problems, reentrancy problems or memory > leaks. If you are going to buy the C++ religion then you usually need to > buy it wholesale: do all if your string processing as std::string > objects and only return to char * when you revert to C. you are right, it returns a char *. The prototype: char *function(bytea *b); The actual C++ function looks roughly like this extern "C" char *function(bytea *b) { string ism; [...] return ism.c_str(); } The postgres wrapper in C like this: PG_FUNCTION_INFO_V1(bin_to_string); Datum bin_to_string(PG_FUNCTION_ARGS) { bytea *b= PG_GETARG_BYTEA_P(0); char *ism = function(b); PG_RETURN_CSTRING(ism); } I have another function in C++ that parses the binary string (file) into an object that is then further processed. This works for all functions returning boolean or numeric values, only the string methods produce these odd results. So as you said, the way in which strings are passed between C++ and C in my code must be horribly wrong. What would be the correct way? > As a rough example: > > bytea *b = PG_GETARG_BYTEA_P(0); > std::string ism; > > ism = function(std::string(VARDATA(b), VARSIZE(b)-VARHDRSZ)); > > PG_RETURN_CSTRING(ism.c_str()); > > Note that this returns an ASCIIZ string, which is not necessarily the > same as the C++ string. You would be better off creating a > PostgreSQL text object and then return that. > >>Well, C++ string constructor is proper in the sense it makes copy of >>the source data. however, it's a little weird that you are passing >>bytea like this...bytea can contain null and c++ string initialization >>stops at any 0 byte. > > Not so. If the constructor also specifies a length then the data > pointer's area is not assumed to be NUL-terminated. > >>Maybe you should be encoding the data to text (say, to hex) first? > > Better to use the supplied length in the varlena descriptor. > -- > Regards, > > Dave [RLU #314465] > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > dwnoon@ntlworld.com (David W Noon) > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* >
On Wed, Mar 23, 2011 at 5:08 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote: > > you are right, it returns a char *. > > The prototype: > > char *function(bytea *b); > > The actual C++ function looks roughly like this > > extern "C" > char *function(bytea *b) > { > string ism; > [...] > return ism.c_str(); > } > Don't do that. You are returning a pointer to an unallocated buffer (previously held by a local variable). c_str() is just a const pointer to a buffer held inside "ism". When ism goes out of scope, that buffer if freed. Either return "std::string", or strdup() the string and have the caller free that. (but use the postgresql alloc pool function to handle the strdup. I don't recall that function's name off the top of my head).
On Wed, Mar 23, 2011 at 9:04 AM, dennis jenkins <dennis.jenkins.75@gmail.com> wrote: > On Wed, Mar 23, 2011 at 5:08 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote: >> >> you are right, it returns a char *. >> >> The prototype: >> >> char *function(bytea *b); >> >> The actual C++ function looks roughly like this >> >> extern "C" >> char *function(bytea *b) >> { >> string ism; >> [...] >> return ism.c_str(); >> } >> > > > Don't do that. You are returning a pointer to an unallocated buffer > (previously held by a local variable). c_str() is just a const > pointer to a buffer held inside "ism". When ism goes out of scope, > that buffer if freed. > > Either return "std::string", or strdup() the string and have the > caller free that. (but use the postgresql alloc pool function to > handle the strdup. I don't recall that function's name off the top of > my head). that would be pstrdup, and it's the way to go (you don't have to pfree). who says C doesn't have garbage collection? merlin
Adrian Schreyer <ams214@cam.ac.uk> writes: > The actual C++ function looks roughly like this > extern "C" > char *function(bytea *b) > { > string ism; > [...] > return ism.c_str(); > } My C++ is pretty rusty, but is the pointer returned by c_str() still valid after the string variable goes out of scope? I'm wondering if you need a pstrdup before returning. regards, tom lane
On Wed, Mar 23, 2011 at 14:08, Merlin Moncure <mmoncure@gmail.com> wrote: > On Wed, Mar 23, 2011 at 9:04 AM, dennis jenkins > <dennis.jenkins.75@gmail.com> wrote: >> On Wed, Mar 23, 2011 at 5:08 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote: >>> >>> you are right, it returns a char *. >>> >>> The prototype: >>> >>> char *function(bytea *b); >>> >>> The actual C++ function looks roughly like this >>> >>> extern "C" >>> char *function(bytea *b) >>> { >>> string ism; >>> [...] >>> return ism.c_str(); >>> } >>> >> >> >> Don't do that. You are returning a pointer to an unallocated buffer >> (previously held by a local variable). c_str() is just a const >> pointer to a buffer held inside "ism". When ism goes out of scope, >> that buffer if freed. >> >> Either return "std::string", or strdup() the string and have the >> caller free that. (but use the postgresql alloc pool function to >> handle the strdup. I don't recall that function's name off the top of >> my head). > > that would be pstrdup, and it's the way to go (you don't have to > pfree). who says C doesn't have garbage collection? > > merlin > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > I am using pstrdup and it works now as expected. Thank you all for your help, Adrian
On Wed, Mar 23, 2011 at 14:08, Merlin Moncure <mmoncure@gmail.com> wrote: > On Wed, Mar 23, 2011 at 9:04 AM, dennis jenkins > <dennis.jenkins.75@gmail.com> wrote: >> On Wed, Mar 23, 2011 at 5:08 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote: >>> >>> you are right, it returns a char *. >>> >>> The prototype: >>> >>> char *function(bytea *b); >>> >>> The actual C++ function looks roughly like this >>> >>> extern "C" >>> char *function(bytea *b) >>> { >>> string ism; >>> [...] >>> return ism.c_str(); >>> } >>> >> >> >> Don't do that. You are returning a pointer to an unallocated buffer >> (previously held by a local variable). c_str() is just a const >> pointer to a buffer held inside "ism". When ism goes out of scope, >> that buffer if freed. >> >> Either return "std::string", or strdup() the string and have the >> caller free that. (but use the postgresql alloc pool function to >> handle the strdup. I don't recall that function's name off the top of >> my head). > > that would be pstrdup, and it's the way to go (you don't have to > pfree). who says C doesn't have garbage collection? > > merlin > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > I am using pstrdup and it works now as expected. Thank you all for your help, Adrian