Thread: libpq, ecpg and the bytea data type

libpq, ecpg and the bytea data type

From
Mark Richardson
Date:
I can't get the bytea type to behave as I would expect it to.  I've tried a number of things and this is what I've found (I should also say that I'm still using 7.4 - I plan on switching to 8.0.3 soon).
 
If I have a table defined...
CREATE TABLE myTable (myByteaField bytea NOT NULL);
 
then populate it...
COPY myTable (myByteaField) FROM stdin;

\000\000\000\000\000\000\000\000

\011\101\206\155\136\035\071\135

\012\000\162\047\105\223\322\121

\.

 
Then query (using psql)
SELECT length(myByteaField) FROM stdin;
Length
--------
0
8
1
 
I'm assuming that this is because of the length function (seeing the '\0' causes strlen to be shorter than the actual data).
 
So I try to use the libpq interface, and when I execute the code
PQgetLength(pgresult, 0, 0)
the result is 0 (for record 0, field )
 
PQgetLength(pgresult, 2, 0)
The result is 1 (for record 2, field 0).  So this leads me to believe that the values are stored in the database as a string, and they are being terminated during the copy (I also tried inserts, with the same results).
 
On a last ditch effort, I used ECPG with the following code...
EXEC SQL BEGIN DECLARE SECTION;
char myByteaField_[3][8];
EXEC SQL END DECLARE SECTION;
 
EXEC SQL SELECT myByteaField INTO :myByteaField_ FROM myTable;
for (int i=0; i<3; i++)
  for (int j=0; j<8; j++)
    std::cout << myByteaField_[i][j] << "(" << int(myByteaField_[i][j] << ").";
std::cout << std::endl;
 
I get the following....
(0).(0).(0).(0).(0).(0).(0).(0).                            <- this is expected
\(92).0(48).1(49).1(49).A(65).\(92).2(50).0(48)  <- why escape sequence?
\(92).0(48).1(49).2(50).(0).(0).(0).(0)                <- dropped the last 6 bytes
 
So I'm pretty sure that the data is stored as a string, and it is being terminated at the '\0'.  ECPG not only has the same behavior, but the conversion of the data to a char array seems incorrect (maybe the ECPG pre-compiler is thinking that I want to_char() of the value of the byte array).
 
So my question is, how would you be able to store any  data in a bytea if it contained any bytes with the value of 0 (I'm sure there are images stored as bytea and they have 0 byte values).  I don't need to store an image, just these 8 bytes.
 
Thanks in advance for any help that you can offer (I've run out of things to try).


Yahoo! for Good
Click here to donate to the Hurricane Katrina relief effort.

Re: libpq, ecpg and the bytea data type

From
Michael Fuhr
Date:
On Wed, Sep 14, 2005 at 04:20:13PM -0700, Mark Richardson wrote:
> If I have a table defined...
> CREATE TABLE myTable (myByteaField bytea NOT NULL);
>  
> then populate it...
> COPY myTable (myByteaField) FROM stdin;
> \000\000\000\000\000\000\000\000
> \011\101\206\155\136\035\071\135
> \012\000\162\047\105\223\322\121
> \.

Bytea values should be escaped with two backslashes, not one.  See
"Binary Data Types" in the documentation:

http://www.postgresql.org/docs/7.4/interactive/datatype-binary.html

> SELECT length(myByteaField) FROM stdin;

Please post the actual command you ran; the above fails with 'relation
"stdin" does not exist.'  Presumably you really queried myTable (this
might seem like nitpicking, but sometimes little differences matter).

> Length
> --------
> 0
> 8
> 1
>  
> I'm assuming that this is because of the length function (seeing
> the '\0' causes strlen to be shorter than the actual data).

Actually it's because the data wasn't fully loaded due to the
incorrectly-escaped \000 characters:

SELECT '\000\001'::bytea;bytea 
-------
(1 row)

SELECT '\\000\\001'::bytea; bytea   
----------\000\001
(1 row)

The length function doesn't have a problem with \000 characters:

SELECT length('\\000\\001'::bytea);length 
--------     2
(1 row)

If you look at the implementation of the length(bytea) function
(byteaoctetlen() in src/backend/utils/adt/varlena.c) you'll see
that it doesn't read the data at all; it simply reads a size
attribute that's stored with the data.

> So I try to use the libpq interface, and when I execute the code
> PQgetLength(pgresult, 0, 0)
> the result is 0 (for record 0, field )

Again, please post what you actually did; the above should have
failed to link with an error like "undefined reference to `PQgetLength'"
(the actual function name is PQgetlength).

> PQgetLength(pgresult, 2, 0)
> The result is 1 (for record 2, field 0).  So this leads me to
> believe that the values are stored in the database as a string, and
> they are being terminated during the copy (I also tried inserts,
> with the same results).

The values are stored as a sequence of bytes, and they are indeed
being truncated during the COPY because the data was incorrectly
escaped.

> On a last ditch effort, I used ECPG with the following code...
> EXEC SQL BEGIN DECLARE SECTION;
> char myByteaField_[3][8];
> EXEC SQL END DECLARE SECTION;
>  
> EXEC SQL SELECT myByteaField INTO :myByteaField_ FROM myTable;
> for (int i=0; i<3; i++)
>   for (int j=0; j<8; j++)
>     std::cout << myByteaField_[i][j] << "(" << int(myByteaField_[i][j] << ").";
> std::cout << std::endl;
>  
> I get the following....
> (0).(0).(0).(0).(0).(0).(0).(0).                            <- this is expected

The myByteaField_ array might have been zeroed to begin with, and
you're probably seeing that data, not data filled in from the query.
What happens if you fill the array with a character like '*' before
doing the SELECT?  If I do that with the data input as in your
example, I see the first character as \0 and the rest of the data
as the garbage in the array:

(0).*(42).*(42).*(42).*(42).*(42).*(42).*(42).

> \(92).0(48).1(49).1(49).A(65).\(92).2(50).0(48)  <- why escape sequence?

Non-printable characters are escaped unless you request the results
in binary; see the documentation link posted earlier.

> \(92).0(48).1(49).2(50).(0).(0).(0).(0)                <- dropped the last 6 bytes
>  
> So I'm pretty sure that the data is stored as a string, and it
> is being terminated at the '\0'.

Bytea and other variable-length types are stored as a length attribute
and then the data itself.  See the discussion of TOAST in the
documentation:

http://www.postgresql.org/docs/8.0/interactive/storage-toast.html

>  ECPG not only has the same behavior, but the conversion of the
> data to a char array seems incorrect (maybe the ECPG pre-compiler
> is thinking that I want to_char() of the value of the byte array).

Unless you request binary results, queries return text representations
of the data; for bytea that means that non-printable characters will
be escaped.

> So my question is, how would you be able to store any  data in a
> bytea if it contained any bytes with the value of 0 (I'm sure there
> are images stored as bytea and they have 0 byte values).  I don't
> need to store an image, just these 8 bytes.

Escape the data correctly with two backslashes; the "Binary Data
Types" documentation explains why this is necessary.

-- 
Michael Fuhr


Re: libpq, ecpg and the bytea data type

From
Mark Richardson
Date:
Thanks Michael, I appreciate your help.
I could have sworn that I tried the COPY from stdin with the double backslash and ended up with the same result.  But I
triedthat and it seems to work. 

I also mistyped a couple of things, like the
SELECT length(myByteField) FROM stdin;
which should have been
SELECT length(myByteField) FROM myTable;

You are also correct that I ran the function PQgetlength and not PQgetLength. I tried that after I used the double
slashwith the COPY function and that worked as expected (my bad on the double slash). 

But the ECPG part still doesn't seem to have the correct behavior.  With the following in a .pgc file...
EXEC SQL DECLARE SECTION;
char myByteField_[3][8];
int fetchRows=3;
EXEC SQL END DECLARE SECTION;
for (int i=0; i<3; i++)
  for (int j=0; j<8; j++)
    myByteField[i][j]='*';
EXEC SQL CONNECT ...
EXEC SQL DECLARE myCursor BINARY CURSOR FOR SELECT myByteaField FROM myTable FOR READ ONLY;
EXEC SQL OPEN myCursor;
EXEC SQL FETCH :fetchRows myCursor INTO :myByteField_;
EXEC SQL CLOSE myCursor;
for (int i=0;i<3;i++)
  for (int j=0;j<8;j++)
    std::cout << myByteaField_[i][j] << "(" << int(myByteaField_[i][j] << ").";
std::cout << std::endl;

Shows the following output...
(0).(0).(0).(0).(0).(0).(0).(0).                               <- correct
 (9).A(65).?(-122).m(109).^(94). (29).9(57).](93).  <- correct
 (10).(0).(0).(0).(0).(0).(0).(0).                             <- not correct (stopped at '\0')

So all is as it should be with only the ECPG part eluding me.  Thank you again for showing me the error of my ways (I
swearI did see the double slash for populating the bytea field, but it didn't seem to make a difference - now I know
better).

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: libpq, ecpg and the bytea data type

From
Michael Fuhr
Date:
On Thu, Sep 15, 2005 at 12:18:05PM -0700, Mark Richardson wrote:
> But the ECPG part still doesn't seem to have the correct behavior.
> With the following in a .pgc file...

The code you posted has several syntax errors so it can't be what
you're running.  When posting examples, please show exactly what
you're doing so people don't have to guess at what corrections to
make.

In any case, others have reported trouble handling bytea with ECPG;
see the list archives for previous discussion.  I see a lot of calls
to strlen() in the ECPG source code, so it might not be designed
to handle binary data.  You might need to fetch the data in non-binary
mode and use PQunescapeBytea() to convert the escape sequences to
binary.

-- 
Michael Fuhr


Re: libpq, ecpg and the bytea data type

From
Mark Richardson
Date:
Thanks Michael, I appreciate your help.
I could have sworn that I tried the COPY from stdin with the double backslash and ended up with the same result.  But I tried that and it seems to work.
 
I also mistyped a couple of things, like the
SELECT length(myByteField) FROM stdin;
which should have been
SELECT length(myByteField) FROM myTable;
 
You are also correct that I ran the function PQgetlength and not PQgetLength. I tried that after I used the double slash with the COPY function and that worked as expected (my bad on the double slash).
 
But the ECPG part still doesn't seem to have the correct behavior.  With the following in a .pgc file...
EXEC SQL DECLARE SECTION;
char myByteField_[3][8];
int fetchRows=3;
EXEC SQL END DECLARE SECTION;
for (int i=0; i<3; i++)
  for (int j=0; j<8; j++)
    myByteField[i][j]='*';
EXEC SQL CONNECT ...
EXEC SQL DECLARE myCursor BINARY CURSOR FOR SELECT myByteaField FROM myTable FOR READ ONLY;
EXEC SQL OPEN myCursor;
EXEC SQL FETCH :fetchRows myCursor INTO :myByteField_;
EXEC SQL CLOSE myCursor;
for (int i=0;i<3;i++)
  for (int j=0;j<8;j++)
    std::cout << myByteaField_[i][j] << "(" << int(myByteaField_[i][j] << ").";
std::cout << std::endl;
 
Shows the following output...
(0).(0).(0).(0).(0).(0).(0).(0).                               <- correct
 (9).A(65).?(-122).m(109).^(94). (29).9(57).](93).  <- correct
 (10).(0).(0).(0).(0).(0).(0).(0).                             <- not correct (stopped at '\0')
 
So all is as it should be with only the ECPG part eluding me.  Thank you again for showing me the error of my ways (I swear I did see the double slash for populating the bytea field, but it didn't seem to make a difference - now I know better).

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: libpq, ecpg and the bytea data type

From
Michael Meskes
Date:
On Thu, Sep 15, 2005 at 06:22:22PM -0600, Michael Fuhr wrote:
> In any case, others have reported trouble handling bytea with ECPG;
> see the list archives for previous discussion.  I see a lot of calls
> to strlen() in the ECPG source code, so it might not be designed
> to handle binary data.  You might need to fetch the data in non-binary

Your right, it is not. ECPG at the moment expects data to be ascii and
translates it into the type of the variable. 

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!