Thread: bytea corruption?

bytea corruption?

From
Nathan Jahnke
Date:
got some binary data that changes when i insert and retrieve it later
from bytea column:

http://nate.quandra.org/data.bin.0.702601051229191

running 8.3 on debian 5.0.

example:

root=# create database testdb;
CREATE DATABASE
root=# \c testdb
You are now connected to database "testdb".
testdb=# create table testtable (id serial, data bytea);
NOTICE:  CREATE TABLE will create implicit sequence "testtable_id_seq"
for serial column "testtable.id"
CREATE TABLE
testdb=#

--

#!/usr/bin/perl -w

use DBI;
use Digest::MD5 qw(md5 md5_hex md5_base64);


my $fh;
open( $fh, '/tmp/data.bin.0.702601051229191' ) or die $!;
binmode $fh;
my $data = do { local( $/ ) ; <$fh> } ;
close($fh);

#$data = '123abc';

my $encodeddata = $data;
$encodeddata =~ s!(\\|[^ -~])!sprintf("\\%03o",ord($1))!ge; #prepare
data for bytea column storage

my $connection = DBI->connect_cached("dbi:Pg:dbname=testdb;port=5432",
"root", "", {RaiseError=>1});

my $insert_sth = $connection->prepare('insert into testtable (data)
values (?) returning id');
$insert_sth->execute($encodeddata);
my $ref = $insert_sth->fetchrow_hashref;
my $id = $ref->{id};

my $getall_sth = $connection->prepare('select * from testtable where id=?');
$getall_sth->execute($id);
my $newref = $getall_sth->fetchrow_hashref;
my $newdata = $newref->{data};
$newdata =~ s!\\(?:\\|(\d{3}))!$1 ? chr(oct($1)) : "\\"!ge; #decode
bytea column storage format

print md5_hex($data).' ';
print '!' if md5_hex($data) ne md5_hex($newdata);
print '= '.md5_hex($newdata);

print "\n";


__END__


hash of data changes ... if you uncomment the $data = '123abc' line
you can see that it works with those six bytes fine, and it also works
with most other binary data, just not this binary data. any insight
would be appreciated. thanks.


nathan

Re: bytea corruption?

From
Tim Landscheidt
Date:
Nathan Jahnke <njahnke@gmail.com> wrote:

> [...]
> my $encodeddata = $data;
> $encodeddata =~ s!(\\|[^ -~])!sprintf("\\%03o",ord($1))!ge; #prepare
> data for bytea column storage

> [...]

> my $insert_sth = $connection->prepare('insert into testtable (data)
> values (?) returning id');
> $insert_sth->execute($encodeddata);
> my $ref = $insert_sth->fetchrow_hashref;
> my $id = $ref->{id};

> my $getall_sth = $connection->prepare('select * from testtable where id=?');
> $getall_sth->execute($id);
> my $newref = $getall_sth->fetchrow_hashref;
> my $newdata = $newref->{data};
> $newdata =~ s!\\(?:\\|(\d{3}))!$1 ? chr(oct($1)) : "\\"!ge; #decode
> bytea column storage format
> [...]

> hash of data changes ... if you uncomment the $data = '123abc' line
> you can see that it works with those six bytes fine, and it also works
> with most other binary data, just not this binary data. any insight
> would be appreciated. thanks.

Why do you encode/decode the data in your own application a
second time? It is already encoded by DBD::Pg.

Tim

Re: bytea corruption?

From
Nathan Jahnke
Date:
good catch - it's because i'm used to working in plperlu.
unfortunately commenting out those lines makes no difference for this
particular data (that i linked in my original email); it's still
corrupted:

# ./bytea.pl
37652cf91fb8d5e41d3a90ea3a22ea61 != ce3fc63b88993af73fb360c70b7ec965


nathan


On Fri, Aug 21, 2009 at 6:11 PM, Tim Landscheidt<tim@tim-landscheidt.de> wrote:
> Nathan Jahnke <njahnke@gmail.com> wrote:
>
>> [...]
>> my $encodeddata = $data;
>> $encodeddata =~ s!(\\|[^ -~])!sprintf("\\%03o",ord($1))!ge; #prepare
>> data for bytea column storage
>
>> [...]
>
>> my $insert_sth = $connection->prepare('insert into testtable (data)
>> values (?) returning id');
>> $insert_sth->execute($encodeddata);
>> my $ref = $insert_sth->fetchrow_hashref;
>> my $id = $ref->{id};
>
>> my $getall_sth = $connection->prepare('select * from testtable where id=?');
>> $getall_sth->execute($id);
>> my $newref = $getall_sth->fetchrow_hashref;
>> my $newdata = $newref->{data};
>> $newdata =~ s!\\(?:\\|(\d{3}))!$1 ? chr(oct($1)) : "\\"!ge; #decode
>> bytea column storage format
>> [...]
>
>> hash of data changes ... if you uncomment the $data = '123abc' line
>> you can see that it works with those six bytes fine, and it also works
>> with most other binary data, just not this binary data. any insight
>> would be appreciated. thanks.
>
> Why do you encode/decode the data in your own application a
> second time? It is already encoded by DBD::Pg.
>
> Tim
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: bytea corruption?

From
"Daniel Verite"
Date:
    Nathan Jahnke wrote:

> good catch - it's because i'm used to working in plperlu.
> unfortunately commenting out those lines makes no difference for this
> particular data (that i linked in my original email); it's still
> corrupted:

Don't remove both: remove only the custom decoding.

It's different for the encoding step. It can also be removed, but in this
case you need to tell DBD::Pg that your data is binary, like this:

$insert_sth->bind_param(1, $data, { pg_type => DBD::Pg::PG_BYTEA });
$insert_sth->execute();

(and have $data be raw binary, no custom encoding).

--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org

Re: bytea corruption?

From
Nathan Jahnke
Date:
wrong reply-address; please disregard the last message from me.


thanks for your help. unfortunately i'm still getting corruption on
this particular data (available at
http://nate.quandra.org/data.bin.0.702601051229191 ) even with these
changes:

# ./bytea.pl
Argument "DBD::Pg::PG_BYTEA" isn't numeric in subroutine entry at
./bytea.pl line 18.
37652cf91fb8d5e41d3a90ea3a22ea61 != ce3fc63b88993af73fb360c70b7ec965

things work fine if i make the data "123abc":

# ./bytea.pl
Argument "DBD::Pg::PG_BYTEA" isn't numeric in subroutine entry at
./bytea.pl line 18.
a906449d5769fa7361d7ecc6aa3f6d28 = a906449d5769fa7361d7ecc6aa3f6d28

below is my script as it stands now:

#!/usr/bin/perl -w

use DBI;
use Digest::MD5 qw(md5 md5_hex md5_base64);


my $fh;
open( $fh, '/tmp/data.bin.0.702601051229191' ) or die $!;
binmode $fh;
my $data = do { local( $/ ) ; <$fh> } ;
close($fh);

#$data = '123abc';

my $connection = DBI->connect_cached("dbi:Pg:dbname=testdb;port=5432",
"root", "", {RaiseError=>1});

my $insert_sth = $connection->prepare('insert into testtable (data)
values (?) returning id');
$insert_sth->bind_param(1, $data, { pg_type => DBD::Pg::PG_BYTEA });
$insert_sth->execute();
my $ref = $insert_sth->fetchrow_hashref;
my $id = $ref->{id};

my $getall_sth = $connection->prepare('select * from testtable where id=?');
$getall_sth->execute($id);
my $newref = $getall_sth->fetchrow_hashref;
my $newdata = $newref->{data};

print md5_hex($data).' ';
print '!' if md5_hex($data) ne md5_hex($newdata);
print '= '.md5_hex($newdata);

print "\n";

--


nathan


On Sat, Aug 22, 2009 at 9:17 AM, Daniel Verite<daniel@manitou-mail.org> wrote:
>        Nathan Jahnke wrote:
>
>> good catch - it's because i'm used to working in plperlu.
>> unfortunately commenting out those lines makes no difference for this
>> particular data (that i linked in my original email); it's still
>> corrupted:
>
> Don't remove both: remove only the custom decoding.
>
> It's different for the encoding step. It can also be removed, but in this
> case you need to tell DBD::Pg that your data is binary, like this:
>
> $insert_sth->bind_param(1, $data, { pg_type => DBD::Pg::PG_BYTEA });
> $insert_sth->execute();
>
> (and have $data be raw binary, no custom encoding).
>
> --
> Daniel
> PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
>

Re: bytea corruption?

From
"Daniel Verite"
Date:
    Nathan Jahnke wrote:

> thanks for your help. unfortunately i'm still getting corruption on
> this particular data (available at
> http://nate.quandra.org/data.bin.0.702601051229191 ) even with these
> changes:
>
> # ./bytea.pl
> Argument "DBD::Pg::PG_BYTEA" isn't numeric in subroutine entry at
> ./bytea.pl line 18.
> 37652cf91fb8d5e41d3a90ea3a22ea61 != ce3fc63b88993af73fb360c70b7ec965

Ah, you also need to add
use DBD::Pg;
at the beginning of the script for DBD::Pg::PG_BYTEA to be properly
evaluated.

Best regards,
--
 Daniel
 PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org

Re: bytea corruption?

From
Colin Streicher
Date:
I'm probably a little late to this discussion, but I have had issues before
with BYTEA in postgres before as well, this is what I found worked.

use Digest::MD5;
use DBI qw(:sql_types);
use DBD::Pg qw(:pg_types);

....
....

sub InsertBin($$$)
{
    my ( $dbh, $md5sum, $filename ) = @_;
    open BIN,"<$filename" || die "Unable to open";
    my $bin;
    while (<BIN>){
        $bin .= $_;
        }
    my $insertsql = qq( INSERT INTO enc_virus VALUES( ?,?,? )); #  filename -
md5sum - binary
    my $sth = $dbh->prepare($insertsql);
    $sth->bind_param(1,$filename);
    $sth->bind_param(2,$md5sum);
    $sth->bind_param(3,$bin, { pg_type => PG_BYTEA });
    $sth->execute();
    }


I hope this helps if you haven't figured it out yet


On Saturday 22 August 2009 03:48:25 pm Daniel Verite wrote:
>     Nathan Jahnke wrote:
> > thanks for your help. unfortunately i'm still getting corruption on
> > this particular data (available at
> > http://nate.quandra.org/data.bin.0.702601051229191 ) even with these
> > changes:
> >
> > # ./bytea.pl
> > Argument "DBD::Pg::PG_BYTEA" isn't numeric in subroutine entry at
> > ./bytea.pl line 18.
> > 37652cf91fb8d5e41d3a90ea3a22ea61 != ce3fc63b88993af73fb360c70b7ec965
>
> Ah, you also need to add
> use DBD::Pg;
> at the beginning of the script for DBD::Pg::PG_BYTEA to be properly
> evaluated.
>
> Best regards,

Re: bytea corruption?

From
Nathan Jahnke
Date:
thank you very much, all. i was able to insert my data and get it back
out with a matching hash. my problems were caused by confusion going
between plperlu - which has the bytea storage explicit custom encoding
requirement - and regular perl using dbd::pg - which does not as long
as the data type is specified.

so, for reference:

plperlu:
explicitly encode bytea before insert via spi: yes
explicitly decode bytea after select via spi: yes

perl (dbd::pg):
explicitly encode bytea before insert via sth: no if using { pg_type
=> DBD::Pg::PG_BYTEA } as third arg in bind_param() (thanks daniel
verite)
explicitly decode bytea after select via sth: no


nathan


On Sun, Aug 23, 2009 at 9:20 AM, Colin
Streicher<colin@obviouslymalicious.com> wrote:
> I'm probably a little late to this discussion, but I have had issues before
> with BYTEA in postgres before as well, this is what I found worked.
>
> use Digest::MD5;
> use DBI qw(:sql_types);
> use DBD::Pg qw(:pg_types);
>
> ....
> ....
>
> sub InsertBin($$$)
> {
>        my ( $dbh, $md5sum, $filename ) = @_;
>        open BIN,"<$filename" || die "Unable to open";
>        my $bin;
>        while (<BIN>){
>                $bin .= $_;
>                }
>        my $insertsql = qq( INSERT INTO enc_virus VALUES( ?,?,? )); #  filename -
> md5sum - binary
>        my $sth = $dbh->prepare($insertsql);
>        $sth->bind_param(1,$filename);
>        $sth->bind_param(2,$md5sum);
>        $sth->bind_param(3,$bin, { pg_type => PG_BYTEA });
>        $sth->execute();
>        }
>
>
> I hope this helps if you haven't figured it out yet
>
>
> On Saturday 22 August 2009 03:48:25 pm Daniel Verite wrote:
>>       Nathan Jahnke wrote:
>> > thanks for your help. unfortunately i'm still getting corruption on
>> > this particular data (available at
>> > http://nate.quandra.org/data.bin.0.702601051229191 ) even with these
>> > changes:
>> >
>> > # ./bytea.pl
>> > Argument "DBD::Pg::PG_BYTEA" isn't numeric in subroutine entry at
>> > ./bytea.pl line 18.
>> > 37652cf91fb8d5e41d3a90ea3a22ea61 != ce3fc63b88993af73fb360c70b7ec965
>>
>> Ah, you also need to add
>> use DBD::Pg;
>> at the beginning of the script for DBD::Pg::PG_BYTEA to be properly
>> evaluated.
>>
>> Best regards,
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>