Thread: Binary data migration from MSSQL
Getting management to let go of their death grip on MSSQL was the first (big!) hurdle, but I succeeded. Now I need to prove that PostgreSQL will work for us, but I can't find a painless way to migrate my binary data. I've searched the MS docs, the Postgres docs, pg TechDocs, the list archives, Google, you name it.

I have a table in MSSQL 2000 with 20 million rows and about 60 columns. One of the columns is of type VARBINARY.

** pgAdminII Migration Wizard (1.4.12): ignores the binary column when importing. To be fair, this possibility is mentioned in the pgAdminII docs. However, it is totally painless for all my other non-binary tables and I love it!

** MS DTS Transformation: migrates the data (including varbinary) perfectly. However, the migration fails at about 400,000 rows because it runs out of memory (1.5 GB physical memory on the MSSQL machine). I'm trying to import small chunks of the table using a query for the source, but it is very slow and tedious even over Gigabit Ethernet.

** bcp: I'd like to use bcp to do a text import, but I can't figure out how to make the binary data work on either end. It appears to me that Postgres COPY requires literal binary data to be in octal format (e.g. '\\047'), while bcp outputs the binary data as a non-escaped ASCII string of hex values (e.g. DF9B52A3). I suppose I could write a Java program to convert the hex to escaped octal, then run COPY, but that would probably be pretty slow (both from a 20-million-row performance standpoint and for my personal productivity).

Is there a better way to do this?

Source machine: Dual P3 1.1 GHz, 1.5 GB RAM, single 18 GB SCSI disk; Win2k Server SP3, MS SQL 2000 SP3, pgAdmin 1.4.12, pgODBC 7.02.00.05

Destination machine: Dual Xeon 2.0 GHz, 4.0 GB RAM, 104 GB 6-disk RAID-10 dedicated to /usr/local/pgsql/data; RedHat 8.0, PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 20020903

Thanks!!!

Roman Fail
Sr. Web Application Developer
POS Portal, Inc.
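[Editorial note: the hex-to-escaped-octal conversion the poster describes can be sketched in a few lines. This is an illustration in Python rather than the Java program he considered; it assumes bcp emits each varbinary value as a plain hex string, and that the COPY data file needs a doubled backslash per byte (one escape level for COPY itself, one for bytea input, as in PostgreSQL 7.3).]

```python
def hex_to_octal_escapes(hex_str: str) -> str:
    """Turn bcp's hex dump of a varbinary value (e.g. 'DF9B52A3') into
    the backslash-octal form PostgreSQL COPY accepts for a bytea column."""
    escaped = []
    for i in range(0, len(hex_str), 2):
        byte = int(hex_str[i:i + 2], 16)   # one byte per hex digit pair
        # two literal backslashes + three octal digits, e.g. 0x27 -> \\047
        escaped.append('\\\\%03o' % byte)
    return ''.join(escaped)

print(hex_to_octal_escapes('DF9B52A3'))  # prints \\337\\233\\122\\243
```

Note how the '\\047' example from the post falls out of the same rule: 0x27 is 39 decimal, i.e. 047 octal.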
Roman Fail wrote:
> ** bcp: I'd like to use bcp to do a text import, but I can't figure out how
> to make the binary data work on either end. It appears to me that Postgres
> COPY requires literal binary data to be in octal format (e.g. '\\047').
> bcp outputs the binary data as a non-escaped ASCII string of hex values
> (e.g. DF9B52A3). I guess I could write a Java program to convert the hex to
> escaped octal, then run COPY. This would probably be pretty slow (both from
> a 20-million row performance standpoint and my personal productivity).
>
> Is there a better way to do this?

I'd lean toward bcp as the fastest method, even if you need to do some data massaging. In MSSQL 2000 it appears that bcp can bulk copy a query in similar fashion to a table. Perhaps you could do the binary-to-octal transformation in the export query from MSSQL?

Joe
>> ** bcp: I'd like to use bcp to do a text import, but I can't figure out how
>> to make the binary data work on either end. It appears to me that Postgres
>> COPY requires literal binary data to be in octal format (e.g. '\\047').
>> bcp outputs the binary data as a non-escaped ASCII string of hex values
>> (e.g. DF9B52A3). I guess I could write a Java program to convert the hex to
>> escaped octal, then run COPY. This would probably be pretty slow (both from
>> a 20-million row performance standpoint and my personal productivity).
>>
>> Is there a better way to do this?

> I'd lean toward bcp as the fastest method, even if you need to do some data
> massaging. In MSSQL 2000 it appears that bcp can bulk copy a query in similar
> fashion to a table. Perhaps you could do the binary-to-octal transformation in
> the export query from MSSQL?
>
> Joe

Thanks for the idea, Joe. I wrote a 'varbinary-to-octal_string' UDF on the MSSQL Server that I could call within a query on the bcp command line. After some tweaking, it worked great and takes about an hour to dump all 20 million rows to text (which I think is pretty fast!).

Wes Sheldah suggested modifying the bcp output with a Perl script, but I opted not to do it that way. Although I don't know Perl, I could have just as easily used Java. The drawback of doing it that way is that it's harder for others to follow if another language is involved. Everyone who wants to do this will already have an MSSQL server, and that's all you need to create the modified file.

I'd be happy to write up what I did and contribute it to techdocs, if someone would let me know where to send the document. I looked around on the techdocs site and didn't see any instructions. I'm still curious why the binary data won't migrate using pgAdminII, but I guess I'll ask that question on another list.

Roman Fail
Sr. Web Application Developer
POS Portal, Inc.
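[Editorial note: the alternative Wes suggested, post-processing the bcp dump with a script, can also be sketched. This is a hypothetical Python filter rather than Perl; it assumes a tab-delimited bcp dump and a made-up column index (HEX_COL) for the varbinary field, and streams stdin to stdout so a 20-million-row file never sits in memory.]

```python
import sys

HEX_COL = 3  # hypothetical: zero-based index of the varbinary column in the dump

def fix_line(line: str) -> str:
    """Rewrite one tab-delimited bcp row, converting the hex field to
    backslash-octal so PostgreSQL COPY can load it into a bytea column."""
    fields = line.rstrip('\n').split('\t')
    hex_str = fields[HEX_COL]
    fields[HEX_COL] = ''.join('\\\\%03o' % int(hex_str[i:i + 2], 16)
                              for i in range(0, len(hex_str), 2))
    return '\t'.join(fields) + '\n'

if __name__ == '__main__':
    # filter the dump as a stream: bcp output in, COPY-ready text out
    for row in sys.stdin:
        sys.stdout.write(fix_line(row))
```

Piped between the bcp output file and psql's COPY FROM STDIN, a filter like this avoids materializing a second converted copy of the dump, at the cost of involving a second language, which is exactly the trade-off the poster weighed.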
Roman Fail wrote:
> Thanks for the idea, Joe. I wrote a 'varbinary-to-octal_string' UDF on the
> MSSQL Server that I could call within a query on the bcp command line.
> After some tweaking, it worked great and takes about an hour to dump all 20
> million rows to text (which I think is pretty fast!).
>
> Wes Sheldah suggested modifying the bcp output with a perl script, but I
> opted not to do it that way. Although I don't know perl, I could have just
> as easily used Java. The drawback of doing it this way is that it's harder
> for others to follow if there is another language involved. I know
> everyone who wants to do this will already have a MSSQL server, and that's
> all you need to create the modified file.
>
> I'd be happy to write up what I did and contribute it to techdocs, if
> someone would let me know where to send the document. I looked around on
> the techdocs site and didn't see any instructions. I'm still curious why
> the binary data won't migrate using pgAdminII, but I guess I'll ask that
> question on another list.

I think Justin Clift is the right person to help you there -- Justin?

Joe