pgtcl large object read/write corrupts binary data - Mailing list pgsql-bugs

From ljb
Subject pgtcl large object read/write corrupts binary data
Date
Msg-id bni06l$2jh$2@news.hub.org
List pgsql-bugs
[Using PostgreSQL-7.3.4 and -7.4beta5, Tcl-8.4.x.]

Binary data written to a Large Object with libpgtcl's pg_lo_write is
corrupted.  Tcl is mangling the data, apparently through UTF-8 conversion:
each byte of 0x80 or above comes out as a two-byte sequence.
Example: 0x80 becomes 0xc2 0x80, and 0xff becomes 0xc3 0xbf.
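
To see where those byte values come from, here is a minimal sketch of
standard UTF-8 encoding for code points up to U+00FF, with each input byte
treated as one code point. The function names byte_to_utf8 and
buffer_to_utf8 are made up for this illustration; this is not the code in
Tcl or libpgtcl, only the arithmetic that would produce the bytes reported
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Tcl or libpgtcl): encode one byte,
 * treated as a code point in U+0000..U+00FF, as UTF-8.
 * Writes 1 or 2 bytes to out and returns how many were written.
 */
static size_t byte_to_utf8(unsigned char b, unsigned char out[2])
{
    assert(out != NULL);
    if (b < 0x80) {
        /* 7-bit values are stored unchanged. */
        out[0] = b;
        return 1;
    }
    /* 0x80..0xFF: lead byte 110xxxxx, continuation byte 10xxxxxx. */
    out[0] = (unsigned char)(0xC0 | (b >> 6));
    out[1] = (unsigned char)(0x80 | (b & 0x3F));
    return 2;
}

/*
 * Hypothetical helper: expand a whole buffer.
 * dst must have room for 2 * n bytes. Returns the number of bytes used.
 */
static size_t buffer_to_utf8(const unsigned char *src, size_t n,
                             unsigned char *dst)
{
    size_t i, used = 0;

    for (i = 0; i < n; i++)
        used += byte_to_utf8(src[i], dst + used);
    return used;
}
```

Applied to the six test bytes 0x80 0xff 0x7a 0x7a 0x7a 0x7a, this expands
to eight bytes: 0xc2 0x80 0xc3 0xbf 0x7a 0x7a 0x7a 0x7a. Keeping only the
first six of them, the length passed to pg_lo_write, gives exactly the
contents observed in "lo.out" in the first test script.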

The problem with pg_lo_read is more subtle. If you compare the expected and
actual data with == or [string equal], they do not match, but if you check
byte by byte, or write the two values to files, they do match. I believe
this is happening because pg_lo_read is returning an object which is
inconsistent between its Tcl "string rep" and internal byte array.

Here are 2 test scripts to show the problem. They assume your environment
variables are set up to allow a connection to PostgreSQL with an empty
'conninfo' string.

Quick test script for pg_lo_write problem:
========================
# Write to large object with pg_lo_write, export with pg_lo_export:
set data "\x80\xffzzzz"
set datalen 6
set conn [pg_connect -conninfo ""]
pg_execute $conn begin
set loid [pg_lo_creat $conn INV_READ|INV_WRITE]
set lofd [pg_lo_open $conn $loid w]
pg_lo_write $conn $lofd $data $datalen
pg_lo_close $conn $lofd
pg_lo_export $conn $loid lo.out
pg_lo_unlink $conn $loid
pg_execute $conn commit
pg_disconnect $conn
========================
Run this script with pgtclsh, then hexdump the file "lo.out".
Expected result: file contains "0x80 0xff 0x7a 0x7a 0x7a 0x7a"
Observed result: file contains "0xc2 0x80 0xc3 0xbf 0x7a 0x7a"


Quick test script for pg_lo_read problem:
========================
# Import large object with pg_lo_import, read back with pg_lo_read:
set data "\x80\xffzzzz"
set datalen 6
set f [open lo.in w]
fconfigure $f -translation binary
puts -nonewline $f $data
close $f
set conn [pg_connect -conninfo ""]
pg_execute $conn begin
set loid [pg_lo_import $conn lo.in]
set lofd [pg_lo_open $conn $loid r]
pg_lo_read $conn $lofd buf $datalen
pg_lo_close $conn $lofd
pg_lo_unlink $conn $loid
pg_execute $conn commit
pg_disconnect $conn
if {[string equal $buf $data]} { puts Match } else { puts Differ }
set f [open lo.in2 w]
fconfigure $f -translation binary
puts -nonewline $f $buf
close $f
========================
Run this script with pgtclsh.
Expected result: prints "Match"
Observed result: prints "Differ"
However, a hexdump of the files "lo.in" and "lo.in2" shows identical contents.


Proposed Patch:  (I think this requires Tcl >= 8.1)
===================
--- src/interfaces/libpgtcl/pgtclCmds.c.orig    2003-08-03 22:40:16.000000000 -0400
+++ src/interfaces/libpgtcl/pgtclCmds.c    2003-10-25 20:36:58.000000000 -0400
@@ -1215,7 +1215,7 @@
     buf = ckalloc(len + 1);

     nbytes = lo_read(conn, fd, buf, len);
-    bufObj = Tcl_NewStringObj(buf, nbytes);
+    bufObj = Tcl_NewByteArrayObj(buf, nbytes);

     if (Tcl_ObjSetVar2(interp, bufVar, NULL, bufObj,
                        TCL_LEAVE_ERR_MSG | TCL_PARSE_PART1) == NULL)
@@ -1307,7 +1307,7 @@
     if (Tcl_GetIntFromObj(interp, objv[2], &fd) != TCL_OK)
         return TCL_ERROR;

-    buf = Tcl_GetStringFromObj(objv[3], &nbytes);
+    buf = Tcl_GetByteArrayFromObj(objv[3], &nbytes);

     if (Tcl_GetIntFromObj(interp, objv[4], &len) != TCL_OK)
         return TCL_ERROR;
===================
