Thread: bytea corruption?
got some binary data that changes when i insert and retrieve it later from bytea column: http://nate.quandra.org/data.bin.0.702601051229191 running 8.3 on debian 5.0. example: root=# create database testdb; CREATE DATABASE root=# \c testdb You are now connected to database "testdb". testdb=# create table testtable (id serial, data bytea); NOTICE: CREATE TABLE will create implicit sequence "testtable_id_seq" for serial column "testtable.id" CREATE TABLE testdb=# -- #!/usr/bin/perl -w use DBI; use Digest::MD5 qw(md5 md5_hex md5_base64); my $fh; open( $fh, '/tmp/data.bin.0.702601051229191' ) or die $!; binmode $fh; my $data = do { local( $/ ) ; <$fh> } ; close($fh); #$data = '123abc'; my $encodeddata = $data; $encodeddata =~ s!(\\|[^ -~])!sprintf("\\%03o",ord($1))!ge; #prepare data for bytea column storage my $connection = DBI->connect_cached("dbi:Pg:dbname=testdb;port=5432", "root", "", {RaiseError=>1}); my $insert_sth = $connection->prepare('insert into testtable (data) values (?) returning id'); $insert_sth->execute($encodeddata); my $ref = $insert_sth->fetchrow_hashref; my $id = $ref->{id}; my $getall_sth = $connection->prepare('select * from testtable where id=?'); $getall_sth->execute($id); my $newref = $getall_sth->fetchrow_hashref; my $newdata = $newref->{data}; $newdata =~ s!\\(?:\\|(\d{3}))!$1 ? chr(oct($1)) : "\\"!ge; #decode bytea column storage format print md5_hex($data).' '; print '!' if md5_hex($data) ne md5_hex($newdata); print '= '.md5_hex($newdata); print "\n"; __END__ hash of data changes ... if you uncomment the $data = '123abc' line you can see that it works with those six bytes fine, and it also works with most other binary data, just not this binary data. any insight would be appreciated. thanks. nathan
Nathan Jahnke <njahnke@gmail.com> wrote: > [...] > my $encodeddata = $data; > $encodeddata =~ s!(\\|[^ -~])!sprintf("\\%03o",ord($1))!ge; #prepare > data for bytea column storage > [...] > my $insert_sth = $connection->prepare('insert into testtable (data) > values (?) returning id'); > $insert_sth->execute($encodeddata); > my $ref = $insert_sth->fetchrow_hashref; > my $id = $ref->{id}; > my $getall_sth = $connection->prepare('select * from testtable where id=?'); > $getall_sth->execute($id); > my $newref = $getall_sth->fetchrow_hashref; > my $newdata = $newref->{data}; > $newdata =~ s!\\(?:\\|(\d{3}))!$1 ? chr(oct($1)) : "\\"!ge; #decode > bytea column storage format > [...] > hash of data changes ... if you uncomment the $data = '123abc' line > you can see that it works with those six bytes fine, and it also works > with most other binary data, just not this binary data. any insight > would be appreciated. thanks. Why do you encode/decode the data in your own application a second time? It is already encoded by DBD::Pg. Tim
good catch - it's because i'm used to working in plperlu. unfortunately commenting out those lines makes no difference for this particular data (that i linked in my original email); it's still corrupted: # ./bytea.pl 37652cf91fb8d5e41d3a90ea3a22ea61 != ce3fc63b88993af73fb360c70b7ec965 nathan On Fri, Aug 21, 2009 at 6:11 PM, Tim Landscheidt<tim@tim-landscheidt.de> wrote: > Nathan Jahnke <njahnke@gmail.com> wrote: > >> [...] >> my $encodeddata = $data; >> $encodeddata =~ s!(\\|[^ -~])!sprintf("\\%03o",ord($1))!ge; #prepare >> data for bytea column storage > >> [...] > >> my $insert_sth = $connection->prepare('insert into testtable (data) >> values (?) returning id'); >> $insert_sth->execute($encodeddata); >> my $ref = $insert_sth->fetchrow_hashref; >> my $id = $ref->{id}; > >> my $getall_sth = $connection->prepare('select * from testtable where id=?'); >> $getall_sth->execute($id); >> my $newref = $getall_sth->fetchrow_hashref; >> my $newdata = $newref->{data}; >> $newdata =~ s!\\(?:\\|(\d{3}))!$1 ? chr(oct($1)) : "\\"!ge; #decode >> bytea column storage format >> [...] > >> hash of data changes ... if you uncomment the $data = '123abc' line >> you can see that it works with those six bytes fine, and it also works >> with most other binary data, just not this binary data. any insight >> would be appreciated. thanks. > > Why do you encode/decode the data in your own application a > second time? It is already encoded by DBD::Pg. > > Tim > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Nathan Jahnke wrote: > good catch - it's because i'm used to working in plperlu. > unfortunately commenting out those lines makes no difference for this > particular data (that i linked in my original email); it's still > corrupted: Don't remove both: remove only the custom decoding. It's different for the encoding step. It can also be removed, but in this case you need to tell DBD::Pg that your data is binary, like this: $insert_sth->bind_param(1, $data, { pg_type => DBD::Pg::PG_BYTEA }); $insert_sth->execute(); (and have $data be raw binary, no custom encoding). -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
wrong reply-address; please disregard the last message from me. thanks for your help. unfortunately i'm still getting corruption on this particular data (available at http://nate.quandra.org/data.bin.0.702601051229191 ) even with these changes: # ./bytea.pl Argument "DBD::Pg::PG_BYTEA" isn't numeric in subroutine entry at ./bytea.pl line 18. 37652cf91fb8d5e41d3a90ea3a22ea61 != ce3fc63b88993af73fb360c70b7ec965 things work fine if i make the data "123abc": # ./bytea.pl Argument "DBD::Pg::PG_BYTEA" isn't numeric in subroutine entry at ./bytea.pl line 18. a906449d5769fa7361d7ecc6aa3f6d28 = a906449d5769fa7361d7ecc6aa3f6d28 below is my script as it stands now: #!/usr/bin/perl -w use DBI; use Digest::MD5 qw(md5 md5_hex md5_base64); my $fh; open( $fh, '/tmp/data.bin.0.702601051229191' ) or die $!; binmode $fh; my $data = do { local( $/ ) ; <$fh> } ; close($fh); #$data = '123abc'; my $connection = DBI->connect_cached("dbi:Pg:dbname=testdb;port=5432", "root", "", {RaiseError=>1}); my $insert_sth = $connection->prepare('insert into testtable (data) values (?) returning id'); $insert_sth->bind_param(1, $data, { pg_type => DBD::Pg::PG_BYTEA }); $insert_sth->execute(); my $ref = $insert_sth->fetchrow_hashref; my $id = $ref->{id}; my $getall_sth = $connection->prepare('select * from testtable where id=?'); $getall_sth->execute($id); my $newref = $getall_sth->fetchrow_hashref; my $newdata = $newref->{data}; print md5_hex($data).' '; print '!' if md5_hex($data) ne md5_hex($newdata); print '= '.md5_hex($newdata); print "\n"; -- nathan On Sat, Aug 22, 2009 at 9:17 AM, Daniel Verite<daniel@manitou-mail.org> wrote: > Nathan Jahnke wrote: > >> good catch - it's because i'm used to working in plperlu. >> unfortunately commenting out those lines makes no difference for this >> particular data (that i linked in my original email); it's still >> corrupted: > > Don't remove both: remove only the custom decoding. > > It's different for the encoding step. It can also be removed, but in this > case you need to tell DBD::Pg that your data is binary, like this: > > $insert_sth->bind_param(1, $data, { pg_type => DBD::Pg::PG_BYTEA }); > $insert_sth->execute(); > > (and have $data be raw binary, no custom encoding). > > -- > Daniel > PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org >
Nathan Jahnke wrote: > thanks for your help. unfortunately i'm still getting corruption on > this particular data (available at > http://nate.quandra.org/data.bin.0.702601051229191 ) even with these > changes: > > # ./bytea.pl > Argument "DBD::Pg::PG_BYTEA" isn't numeric in subroutine entry at > ./bytea.pl line 18. > 37652cf91fb8d5e41d3a90ea3a22ea61 != ce3fc63b88993af73fb360c70b7ec965 Ah, you also need to add use DBD::Pg; at the beginning of the script for DBD::Pg::PG_BYTEA to be properly evaluated. Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
I'm probably a little late to this discussion, but I have had issues before with BYTEA in postgres before as well, this is what I found worked. use Digest::MD5; use DBI qw(:sql_types); use DBD::Pg qw(:pg_types); .... .... sub InsertBin($$$) { my ( $dbh, $md5sum, $filename ) = @_; open BIN,"<$filename" || die "Unable to open"; my $bin; while (<BIN>){ $bin .= $_; } my $insertsql = qq( INSERT INTO enc_virus VALUES( ?,?,? )); # filename - md5sum - binary my $sth = $dbh->prepare($insertsql); $sth->bind_param(1,$filename); $sth->bind_param(2,$md5sum); $sth->bind_param(3,$bin, { pg_type => PG_BYTEA }); $sth->execute(); } I hope this helps if you haven't figured it out yet On Saturday 22 August 2009 03:48:25 pm Daniel Verite wrote: > Nathan Jahnke wrote: > > thanks for your help. unfortunately i'm still getting corruption on > > this particular data (available at > > http://nate.quandra.org/data.bin.0.702601051229191 ) even with these > > changes: > > > > # ./bytea.pl > > Argument "DBD::Pg::PG_BYTEA" isn't numeric in subroutine entry at > > ./bytea.pl line 18. > > 37652cf91fb8d5e41d3a90ea3a22ea61 != ce3fc63b88993af73fb360c70b7ec965 > > Ah, you also need to add > use DBD::Pg; > at the beginning of the script for DBD::Pg::PG_BYTEA to be properly > evaluated. > > Best regards,
thank you very much, all. i was able to insert my data and get it back out with a matching hash. my problems were caused by confusion going between plperlu - which has the bytea storage explicit custom encoding requirement - and regular perl using dbd::pg - which does not as long as the data type is specified. so, for reference: plperlu: explicitly encode bytea before insert via spi: yes explicitly decode bytea after select via spi: yes perl (dbd::pg): explicitly encode bytea before insert via sth: no if using { pg_type => DBD::Pg::PG_BYTEA } as third arg in bind_param() (thanks daniel verite) explicitly decode bytea after select via sth: no nathan On Sun, Aug 23, 2009 at 9:20 AM, Colin Streicher<colin@obviouslymalicious.com> wrote: > I'm probably a little late to this discussion, but I have had issues before > with BYTEA in postgres before as well, this is what I found worked. > > use Digest::MD5; > use DBI qw(:sql_types); > use DBD::Pg qw(:pg_types); > > .... > .... > > sub InsertBin($$$) > { > my ( $dbh, $md5sum, $filename ) = @_; > open BIN,"<$filename" || die "Unable to open"; > my $bin; > while (<BIN>){ > $bin .= $_; > } > my $insertsql = qq( INSERT INTO enc_virus VALUES( ?,?,? )); # filename - > md5sum - binary > my $sth = $dbh->prepare($insertsql); > $sth->bind_param(1,$filename); > $sth->bind_param(2,$md5sum); > $sth->bind_param(3,$bin, { pg_type => PG_BYTEA }); > $sth->execute(); > } > > > I hope this helps if you haven't figured it out yet > > > On Saturday 22 August 2009 03:48:25 pm Daniel Verite wrote: >> Nathan Jahnke wrote: >> > thanks for your help. unfortunately i'm still getting corruption on >> > this particular data (available at >> > http://nate.quandra.org/data.bin.0.702601051229191 ) even with these >> > changes: >> > >> > # ./bytea.pl >> > Argument "DBD::Pg::PG_BYTEA" isn't numeric in subroutine entry at >> > ./bytea.pl line 18. >> > 37652cf91fb8d5e41d3a90ea3a22ea61 != ce3fc63b88993af73fb360c70b7ec965 >> >> Ah, you also need to add >> use DBD::Pg; >> at the beginning of the script for DBD::Pg::PG_BYTEA to be properly >> evaluated. >> >> Best regards, > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >