Thread: Binary data migration from MSSQL

Binary data migration from MSSQL

From
"Roman Fail"
Date:
Getting the management to let go of their death grip on MSSQL was the first (big!) hurdle, but I succeeded. 
 
Now I need to prove that PostgreSQL will work for us, but I can't seem to find a painless way to migrate my binary
data. I've searched the MS docs, Postgres docs, pg TechDocs, List archives, Google, you name it.  I have a table in
MSSQL2000 with 20 million rows, about 60 columns.  One of the columns is of type VARBINARY.  
 
 
** pgAdminII Migration Wizard (1.4.12):  ignores the binary column when importing.  To be fair, this possibility is
mentionedin the pgAdminII docs.  However, it is totally painless for all my other non-binary tables and I love it!
 
 
** MS DTS Transformation: Migrates the data (including varbinary) perfectly.  However, the migration fails at about
400,000rows because it runs out of memory (1.5 GB physical memory on the MSSQL machine).  I'm trying to just import
smallchunks of the table using a query for the source, but it is very slow and tedious even over Gigabit Ethernet.
 
 
** bcp: I'd like to use bcp to do a text import, but I can't figure out how to make the binary data work on either end.
It appears to me that Postgres COPY requires literal binary data to be in octal format (e.g. '\\047').   bcp outputs
thebinary data as an non-escaped ASCII string of hex values (e.g. DF9B52A3). I guess I could write a Java program to
convertthe hex to escaped octal, then run COPY. This would probably be pretty slow (both from a 20-million row
performancestandpoint and my personal productivity).
 
 
Is there a better way to do this?
 
Source machine:
Dual P3 1.1 GHz, 1.5 GB RAM, single 18GB SCSI disk
Win2k Server SP3, MS SQL 2000 SP3, pgAdmin 1.4.12, pgODBC 7.02.00.05
 
Destination machine:
Dual Xeon 2.0 GHz, 4.0 GB RAM, 104GB 6-disk RAID-10 dedicated to /usr/local/pgsql/data
RedHat 8.0
PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 20020903 
 
 
Thanks!!!
 
Roman Fail
Sr. Web Application Developer
POS Portal, Inc.

Re: Binary data migration from MSSQL

From
Joe Conway
Date:
Roman Fail wrote:
> ** bcp: I'd like to use bcp to do a text import, but I can't figure out how
> to make the binary data work on either end.  It appears to me that Postgres
> COPY requires literal binary data to be in octal format (e.g. '\\047').
> bcp outputs the binary data as an non-escaped ASCII string of hex values
> (e.g. DF9B52A3). I guess I could write a Java program to convert the hex to
> escaped octal, then run COPY. This would probably be pretty slow (both from
> a 20-million row performance standpoint and my personal productivity).
>
> Is there a better way to do this?
>

I'd lean toward bcp as the fastest method, even if you need to do some data
massaging. In MSSQL 2000 it appears that bcp can bulk copy a query in similar
fashion to a table. Perhaps you could do the binary-to-octal transformation in
the export query from MSSQL?

Joe


Re: Binary data migration from MSSQL

From
"Roman Fail"
Date:
>> ** bcp: I'd like to use bcp to do a text import, but I can't figure out how
>> to make the binary data work on either end.  It appears to me that Postgres
>> COPY requires literal binary data to be in octal format (e.g. '\\047').
>> bcp outputs the binary data as an non-escaped ASCII string of hex values
>> (e.g. DF9B52A3). I guess I could write a Java program to convert the hex to
>> escaped octal, then run COPY. This would probably be pretty slow (both from
>> a 20-million row performance standpoint and my personal productivity).
>>
>> Is there a better way to do this?

>I'd lean toward bcp as the fastest method, even if you need to do some data
>massaging. In MSSQL 2000 it appears that bcp can bulk copy a query in similar
>fashion to a table. Perhaps you could do the binary-to-octal transformation in
>the export query from MSSQL?
>
>Joe
 
Thanks for the idea, Joe.  I wrote a 'varbinary-to-octal_string' UDF on the MSSQL Server that I could call within a
queryon the bcp command line.  After some tweaking, it worked great and takes about an hour to dump all 20 million rows
totext (which I think is pretty fast!).
 
 
Wes Sheldah suggested modifying the bcp output with a perl script, but I opted not to do it that way.  Although I don't
knowperl, I could have just as easily used Java.  The drawback of doing it this way is that it's harder for others to
followif there is another language involved.  I know everyone who wants to do this will already have a MSSQL server,
andthat's all you need to create the modified file.
 
 
I'd be happy to write up what I did and contribute it to techdocs, if someone would let me know where to send the
document. I looked around on the techdocs site and didn't see any instructions.   I'm still curious why the binary data
won'tmigrate using pgAdminII, but I guess I'll ask that question on another list.
 
 
Roman Fail
Sr. Web Application Developer
POS Portal, Inc.
 

Re: Binary data migration from MSSQL

From
Joe Conway
Date:
Roman Fail wrote:
> Thanks for the idea, Joe.  I wrote a 'varbinary-to-octal_string' UDF on the
> MSSQL Server that I could call within a query on the bcp command line.
> After some tweaking, it worked great and takes about an hour to dump all 20
> million rows to text (which I think is pretty fast!).
>
> Wes Sheldah suggested modifying the bcp output with a perl script, but I
> opted not to do it that way.  Although I don't know perl, I could have just
> as easily used Java.  The drawback of doing it this way is that it's harder
> for others to follow if there is another language involved.  I know
> everyone who wants to do this will already have a MSSQL server, and that's
> all you need to create the modified file.
>
> I'd be happy to write up what I did and contribute it to techdocs, if
> someone would let me know where to send the document.  I looked around on
> the techdocs site and didn't see any instructions.   I'm still curious why
> the binary data won't migrate using pgAdminII, but I guess I'll ask that
> question on another list.

I think Justin Clift is the right person to help you there -- Justin?

Joe