Thread: libpq, ecpg and the bytea data type
I can't get the bytea type to behave as I would expect it to. I've tried a number of things and this is what I've found (I should also say that I'm still using 7.4 - I plan on switching to 8.0.3 soon).
If I have a table defined...
CREATE TABLE myTable (myByteaField bytea NOT NULL);
then populate it...
COPY myTable (myByteaField) FROM stdin;
\000\000\000\000\000\000\000\000
\011\101\206\155\136\035\071\135
\012\000\162\047\105\223\322\121
\.
Then query (using psql)
SELECT length(myByteaField) FROM stdin;
Length
--------
0
8
1
I'm assuming that this is because of the length function (seeing the '\0' causes strlen to be shorter than the actual data).
So I try to use the libpq interface, and when I execute the code
PQgetLength(pgresult, 0, 0)
the result is 0 (for record 0, field )
PQgetLength(pgresult, 2, 0)
The result is 1 (for record 2, field 0). So this leads me to believe that the values are stored in the database as a string, and they are being terminated during the copy (I also tried inserts, with the same results).
On a last ditch effort, I used ECPG with the following code...
EXEC SQL BEGIN DECLARE SECTION;
char myByteaField_[3][8];
EXEC SQL END DECLARE SECTION;
EXEC SQL SELECT myByteaField INTO :myByteaField_ FROM myTable;
for (int i=0; i<3; i++)
for (int j=0; j<8; j++)
std::cout << myByteaField_[i][j] << "(" << int(myByteaField_[i][j] << ").";
std::cout << std::endl;
I get the following....
(0).(0).(0).(0).(0).(0).(0).(0). <- this is expected
\(92).0(48).1(49).1(49).A(65).\(92).2(50).0(48) <- why escape sequence?
\(92).0(48).1(49).2(50).(0).(0).(0).(0) <- dropped the last 6 bytes
So I'm pretty sure that the data is stored as a string, and it is being terminated at the '\0'. ECPG not only has the same behavior, but the conversion of the data to a char array seems incorrect (maybe the ECPG pre-compiler is thinking that I want to_char() of the value of the byte array).
So my question is, how would you be able to store any data in a bytea if it contained any bytes with the value of 0 (I'm sure there are images stored as bytea and they have 0 byte values). I don't need to store an image, just these 8 bytes.
Thanks in advance for any help that you can offer (I've run out of things to try).
Yahoo! for Good
Click here to donate to the Hurricane Katrina relief effort.
On Wed, Sep 14, 2005 at 04:20:13PM -0700, Mark Richardson wrote: > If I have a table defined... > CREATE TABLE myTable (myByteaField bytea NOT NULL); > > then populate it... > COPY myTable (myByteaField) FROM stdin; > \000\000\000\000\000\000\000\000 > \011\101\206\155\136\035\071\135 > \012\000\162\047\105\223\322\121 > \. Bytea values should be escaped with two backslashes, not one. See "Binary Data Types" in the documentation: http://www.postgresql.org/docs/7.4/interactive/datatype-binary.html > SELECT length(myByteaField) FROM stdin; Please post the actual command you ran; the above fails with 'relation "stdin" does not exist.' Presumably you really queried myTable (this might seem like nitpicking, but sometimes little differences matter). > Length > -------- > 0 > 8 > 1 > > I'm assuming that this is because of the length function (seeing > the '\0' causes strlen to be shorter than the actual data). Actually it's because the data wasn't fully loaded due to the incorrectly-escaped \000 characters: SELECT '\000\001'::bytea;bytea ------- (1 row) SELECT '\\000\\001'::bytea; bytea ----------\000\001 (1 row) The length function doesn't have a problem with \000 characters: SELECT length('\\000\\001'::bytea);length -------- 2 (1 row) If you look at the implementation of the length(bytea) function (byteaoctetlen() in src/backend/utils/adt/varlena.c) you'll see that it doesn't read the data at all; it simply reads a size attribute that's stored with the data. > So I try to use the libpq interface, and when I execute the code > PQgetLength(pgresult, 0, 0) > the result is 0 (for record 0, field ) Again, please post what you actually did; the above should have failed to link with an error like "undefined reference to `PQgetLength'" (the actual function name is PQgetlength). > PQgetLength(pgresult, 2, 0) > The result is 1 (for record 2, field 0). So this leads me to > believe that the values are stored in the database as a string, and > they are being terminated during the copy (I also tried inserts, > with the same results). The values are stored as a sequence of bytes, and they are indeed being truncated during the COPY because the data was incorrectly escaped. > On a last ditch effort, I used ECPG with the following code... > EXEC SQL BEGIN DECLARE SECTION; > char myByteaField_[3][8]; > EXEC SQL END DECLARE SECTION; > > EXEC SQL SELECT myByteaField INTO :myByteaField_ FROM myTable; > for (int i=0; i<3; i++) > for (int j=0; j<8; j++) > std::cout << myByteaField_[i][j] << "(" << int(myByteaField_[i][j] << ")."; > std::cout << std::endl; > > I get the following.... > (0).(0).(0).(0).(0).(0).(0).(0). <- this is expected The myByteaField_ array might have been zeroed to begin with, and you're probably seeing that data, not data filled in from the query. What happens if you fill the array with a character like '*' before doing the SELECT? If I do that with the data input as in your example, I see the first character as \0 and the rest of the data as the garbage in the array: (0).*(42).*(42).*(42).*(42).*(42).*(42).*(42). > \(92).0(48).1(49).1(49).A(65).\(92).2(50).0(48) <- why escape sequence? Non-printable characters are escaped unless you request the results in binary; see the documentation link posted earlier. > \(92).0(48).1(49).2(50).(0).(0).(0).(0) <- dropped the last 6 bytes > > So I'm pretty sure that the data is stored as a string, and it > is being terminated at the '\0'. Bytea and other variable-length types are stored as a length attribute and then the data itself. See the discussion of TOAST in the documentation: http://www.postgresql.org/docs/8.0/interactive/storage-toast.html > ECPG not only has the same behavior, but the conversion of the > data to a char array seems incorrect (maybe the ECPG pre-compiler > is thinking that I want to_char() of the value of the byte array). Unless you request binary results, queries return text representations of the data; for bytea that means that non-printable characters will be escaped. > So my question is, how would you be able to store any data in a > bytea if it contained any bytes with the value of 0 (I'm sure there > are images stored as bytea and they have 0 byte values). I don't > need to store an image, just these 8 bytes. Escape the data correctly with two backslashes; the "Binary Data Types" documentation explains why this is necessary. -- Michael Fuhr
Thanks Michael, I appreciate your help. I could have sworn that I tried the COPY from stdin with the double backslash and ended up with the same result. But I triedthat and it seems to work. I also mistyped a couple of things, like the SELECT length(myByteField) FROM stdin; which should have been SELECT length(myByteField) FROM myTable; You are also correct that I ran the function PQgetlength and not PQgetLength. I tried that after I used the double slashwith the COPY function and that worked as expected (my bad on the double slash). But the ECPG part still doesn't seem to have the correct behavior. With the following in a .pgc file... EXEC SQL DECLARE SECTION; char myByteField_[3][8]; int fetchRows=3; EXEC SQL END DECLARE SECTION; for (int i=0; i<3; i++) for (int j=0; j<8; j++) myByteField[i][j]='*'; EXEC SQL CONNECT ... EXEC SQL DECLARE myCursor BINARY CURSOR FOR SELECT myByteaField FROM myTable FOR READ ONLY; EXEC SQL OPEN myCursor; EXEC SQL FETCH :fetchRows myCursor INTO :myByteField_; EXEC SQL CLOSE myCursor; for (int i=0;i<3;i++) for (int j=0;j<8;j++) std::cout << myByteaField_[i][j] << "(" << int(myByteaField_[i][j] << ")."; std::cout << std::endl; Shows the following output... (0).(0).(0).(0).(0).(0).(0).(0). <- correct (9).A(65).?(-122).m(109).^(94). (29).9(57).](93). <- correct (10).(0).(0).(0).(0).(0).(0).(0). <- not correct (stopped at '\0') So all is as it should be with only the ECPG part eluding me. Thank you again for showing me the error of my ways (I swearI did see the double slash for populating the bytea field, but it didn't seem to make a difference - now I know better). __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
On Thu, Sep 15, 2005 at 12:18:05PM -0700, Mark Richardson wrote: > But the ECPG part still doesn't seem to have the correct behavior. > With the following in a .pgc file... The code you posted has several syntax errors so it can't be what you're running. When posting examples, please show exactly what you're doing so people don't have to guess at what corrections to make. In any case, others have reported trouble handling bytea with ECPG; see the list archives for previous discussion. I see a lot of calls to strlen() in the ECPG source code, so it might not be designed to handle binary data. You might need to fetch the data in non-binary mode and use PQunescapeBytea() to convert the escape sequences to binary. -- Michael Fuhr
Thanks Michael, I appreciate your help.
I could have sworn that I tried the COPY from stdin with the double backslash and ended up with the same result. But I tried that and it seems to work.
I also mistyped a couple of things, like the
SELECT length(myByteField) FROM stdin;
which should have been
SELECT length(myByteField) FROM myTable;
You are also correct that I ran the function PQgetlength and not PQgetLength. I tried that after I used the double slash with the COPY function and that worked as expected (my bad on the double slash).
But the ECPG part still doesn't seem to have the correct behavior. With the following in a .pgc file...
EXEC SQL DECLARE SECTION;
char myByteField_[3][8];
int fetchRows=3;
EXEC SQL END DECLARE SECTION;
for (int i=0; i<3; i++)
for (int j=0; j<8; j++)
myByteField[i][j]='*';
EXEC SQL CONNECT ...
EXEC SQL DECLARE myCursor BINARY CURSOR FOR SELECT myByteaField FROM myTable FOR READ ONLY;
EXEC SQL OPEN myCursor;
EXEC SQL FETCH :fetchRows myCursor INTO :myByteField_;
EXEC SQL CLOSE myCursor;
for (int i=0;i<3;i++)
for (int j=0;j<8;j++)
std::cout << myByteaField_[i][j] << "(" << int(myByteaField_[i][j] << ").";
std::cout << std::endl;
Shows the following output...
(0).(0).(0).(0).(0).(0).(0).(0). <- correct
(9).A(65).?(-122).m(109).^(94). (29).9(57).](93). <- correct
(10).(0).(0).(0).(0).(0).(0).(0). <- not correct (stopped at '\0')
So all is as it should be with only the ECPG part eluding me. Thank you again for showing me the error of my ways (I swear I did see the double slash for populating the bytea field, but it didn't seem to make a difference - now I know better).
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
On Thu, Sep 15, 2005 at 06:22:22PM -0600, Michael Fuhr wrote: > In any case, others have reported trouble handling bytea with ECPG; > see the list archives for previous discussion. I see a lot of calls > to strlen() in the ECPG source code, so it might not be designed > to handle binary data. You might need to fetch the data in non-binary Your right, it is not. ECPG at the moment expects data to be ascii and translates it into the type of the variable. Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!