Thread: pgtcl large object read/write corrupts binary data

From: ljb
[Using PostgreSQL-7.3.4 and -7.4beta5, Tcl-8.4.x.]

Binary data written to a Large Object with libpgtcl's pg_lo_write is
corrupted.  Tcl is mangling the data - something to do with UTF-8
conversion.  Example: 0x80 becomes 0xc2 0x80, and 0xff becomes 0xc3 0xbf.

The problem with pg_lo_read is more subtle. If you compare the expected and
actual data with == or [string equal], they do not match, but if you check
byte by byte, or write the two values to files, they do match. I believe
this is happening because pg_lo_read is returning an object which is
inconsistent between its Tcl "string rep" and internal byte array.

Here are 2 test scripts to show the problem. They assume your environment
variables are set up to allow a connection to PostgreSQL with an empty
'conninfo' string.

Quick test script for pg_lo_write problem:
========================
# Write to large object with pg_lo_write, export with pg_lo_export:
set data "\x80\xffzzzz"
set datalen 6
set conn [pg_connect -conninfo ""]
pg_execute $conn begin
set loid [pg_lo_creat $conn INV_READ|INV_WRITE]
set lofd [pg_lo_open $conn $loid w]
pg_lo_write $conn $lofd $data $datalen
pg_lo_close $conn $lofd
pg_lo_export $conn $loid lo.out
pg_lo_unlink $conn $loid
pg_execute $conn commit
pg_disconnect $conn
========================
Run this script with pgtclsh, then hexdump the file "lo.out".
Expected result: file contains "0x80 0xff 0x7a 0x7a 0x7a 0x7a"
Observed result: file contains "0xc2 0x80 0xc3 0xbf 0x7a 0x7a"


Quick test script for pg_lo_read problem:
========================
# Import large object with pg_lo_import, read back with pg_lo_read:
set data "\x80\xffzzzz"
set datalen 6
set f [open lo.in w]
fconfigure $f -translation binary
puts -nonewline $f $data
close $f
set conn [pg_connect -conninfo ""]
pg_execute $conn begin
set loid [pg_lo_import $conn lo.in]
set lofd [pg_lo_open $conn $loid r]
pg_lo_read $conn $lofd buf $datalen
pg_lo_close $conn $lofd
pg_lo_unlink $conn $loid
pg_execute $conn commit
pg_disconnect $conn
if {[string equal $buf $data]} { puts Match } else { puts Differ }
set f [open lo.in2 w]
fconfigure $f -translation binary
puts -nonewline $f $buf
close $f
========================
Run this script with pgtclsh.
Expected result: prints "Match"
Observed result: prints "Differ"
But hexdump the files "lo.in" and "lo.in2" to see that their contents are identical.


Proposed Patch:  (I think this requires Tcl >= 8.1)
===================
--- src/interfaces/libpgtcl/pgtclCmds.c.orig    2003-08-03 22:40:16.000000000 -0400
+++ src/interfaces/libpgtcl/pgtclCmds.c    2003-10-25 20:36:58.000000000 -0400
@@ -1215,7 +1215,7 @@
     buf = ckalloc(len + 1);

     nbytes = lo_read(conn, fd, buf, len);
-    bufObj = Tcl_NewStringObj(buf, nbytes);
+    bufObj = Tcl_NewByteArrayObj(buf, nbytes);

     if (Tcl_ObjSetVar2(interp, bufVar, NULL, bufObj,
                        TCL_LEAVE_ERR_MSG | TCL_PARSE_PART1) == NULL)
@@ -1307,7 +1307,7 @@
     if (Tcl_GetIntFromObj(interp, objv[2], &fd) != TCL_OK)
         return TCL_ERROR;

-    buf = Tcl_GetStringFromObj(objv[3], &nbytes);
+    buf = Tcl_GetByteArrayFromObj(objv[3], &nbytes);

     if (Tcl_GetIntFromObj(interp, objv[4], &len) != TCL_OK)
         return TCL_ERROR;
===================

Re: pgtcl large object read/write corrupts binary data

From: Tom Lane
ljb <ljb220@mindspring.com> writes:
> [Using PostgreSQL-7.3.4 and -7.4beta5, Tcl-8.4.x.]
> Binary data written to a Large Object with libpgtcl's pg_lo_write is
> corrupted.  Tcl is mangling the data - something to do with UTF-8
> conversion.  Example: 0x80 becomes 0xc2 0x80, and 0xff becomes 0xc3 0xbf.
> ...
> Proposed Patch:  (I think this requires Tcl >= 8.1)

Yeah, it appears that ByteArrayObj was added in Tcl 8.1.  We were
already requiring Tcl 8.0 of course.  Does anyone think that it's
important to continue support for Tcl 8.0.* in libpgtcl?
        regards, tom lane


Re: pgtcl large object read/write corrupts binary data

From: Tom Lane
ljb <ljb220@mindspring.com> writes:
> [Using PostgreSQL-7.3.4 and -7.4beta5, Tcl-8.4.x.]
> Binary data written to a Large Object with libpgtcl's pg_lo_write is
> corrupted.  Tcl is mangling the data - something to do with UTF-8
> conversion.  Example: 0x80 becomes 0xc2 0x80, and 0xff becomes 0xc3 0xbf.

Patch applied to 7.4, along with a further fix to handle a negative return
value from lo_read properly.  Thanks for the report and patch!

            regards, tom lane

Re: [INTERFACES] pgtcl large object read/write corrupts binary data

From: Brett Schwarz
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ljb <ljb220@mindspring.com> writes:
> > [Using PostgreSQL-7.3.4 and -7.4beta5, Tcl-8.4.x.]
> > Binary data written to a Large Object with libpgtcl's pg_lo_write is
> > corrupted.  Tcl is mangling the data - something to do with UTF-8
> > conversion.  Example: 0x80 becomes 0xc2 0x80, and 0xff becomes 0xc3 0xbf.
> > ...
> > Proposed Patch:  (I think this requires Tcl >= 8.1)
> 
> Yeah, it appears that ByteArrayObj was added in Tcl 8.1.  We were
> already requiring Tcl 8.0 of course.  Does anyone think that it's
> important to continue support for Tcl 8.0.* in libpgtcl?
> 

I don't think it is important to keep supporting Tcl 8.0
anymore; Tcl 8.0 came out in 1998. We might also be able
to get rid of some of the version checks in the code.
    --brett

