Re: [GENERAL] SQL Server to PostgreSQL HOWTO - Mailing list pgsql-docs

From Justin Clift
Subject Re: [GENERAL] SQL Server to PostgreSQL HOWTO
Date
Msg-id 3B99C828.80630A3B@postgresql.org
In response to Re: [GENERAL] SQL Server to PostgreSQL HOWTO  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-docs
Hi Bruce,

Sure can.

It's now been added to my TODO list.  Could be a week or so until it
shows up though.

:-/

Regards and best wishes,

Justin Clift


Bruce Momjian wrote:
>
> Justin, can you add this to techdocs?
>
> > Someone asked me to do this, and I have a draft now.  While it is clearly lacking in some areas, it might be of help generally.
> >
> > Ian
> >
> > Ian A. Harding
> > Programmer/Analyst II
> > Tacoma-Pierce County Health Department
> > (253) 798-3549
> > mailto: ianh@tpchd.org
> >
>
> [ Attachment, skipping... ]
>
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
>
>   ------------------------------------------------------------------------
> The Microsoft SQL Server to PostgreSQL Migration HOWTO
> Ian A. Harding <ianh@tpchd.org>
> v1.00, 23 July 2000
>
> How to move a database from a popular proprietary database to the world's most powerful open source database.
> ______________________________________________________________________
>
> Table of Contents
>
> 1. Disclaimer
>
> 2. Introduction
>
> 3. Considerations
>
> 4. Tables
>
> 5. Data
>
> 6. Views
>
> 7. Summary
>
>  ______________________________________________________________________
>
> 1.  Disclaimer
>
> The following document is offered in good faith as comprising only safe programming and procedures. No responsibility is accepted by the author for any loss or damage caused in any way to any person or equipment, as a direct or indirect consequence of following these instructions.
>
> 2.  Introduction
>
> The most recent version of this document can always be found at http://www.tpchd.org/????.html
>
> Microsoft SQL Server is a very popular relational database management system (RDBMS) with highly restrictive licensing and a high cost of ownership if the database is of significant size, or is used by a significant number of clients.  It does, however, provide a very user-friendly interface, is easy to learn and use, and has low-cost entry-level configurations.  This has resulted in a very large installed user base.
>
> PostgreSQL now challenges MS SQL Server in basic feature set, reliability and performance, has a much less restrictive license, and is open source.  As a matter of course, users are migrating to PostgreSQL from MS SQL Server as the cost of ownership becomes an issue, and as their knowledge of relational database systems increases.
>
> This HOW-TO is intended for the MS SQL Server user who is now ready to migrate databases to PostgreSQL.
>
> 3.  Considerations
>
> RDBMS features are implemented differently and to different degrees by programmers.  Some applications rely heavily on so-called middleware, or on the client application, to handle business logic.  Others attempt to put as much logic as possible in the database.  Your migration will be far more difficult if your application is in the latter group.  While it is a sound design choice to put logic in the database server, it will require programming in a vendor-specific Structured Query Language (SQL) extension such as Microsoft's Transact-SQL (T-SQL).  This is also the case with PostgreSQL.  There is no easy way to migrate stored procedures, triggers, or rules.  On the bright side, PostgreSQL provides several language options, all of which are more graceful than T-SQL.
>
> All RDBMS provide built-in functions.  However, like procedural extensions to SQL, they are not portable.  Fortunately, there is some overlap, and the simple syntax makes migration relatively easy.
>
> Finally, the programmer's choice of SQL syntax can affect this process.  Most RDBMS are approaching the evolving SQL standards.  That is, they are leaning away from vendor-specific syntax such as the '*=' syntax for a left outer join.  This syntax is still supported in MS SQL Server as of version 7.0, but was never supported in PostgreSQL.
>
> This process will require either a mind-numbing amount of hand editing of script and data files, or the use of a scripting language to programmatically modify these files, followed by a somewhat less enormous amount of editing.  I am not smart enough to identify every possible option for the migration, or to accommodate them in a script.  I have done this migration on a relatively complex database application in a reasonable amount of time.  This, rather than a technically flawless script, should be your goal.
>
> I use Tool Command Language (TCL) for almost everything, so I use it here.  You can use whatever language you like.
>
> 4.  Tables
>
> Dump the table definitions with the MS SQL Server scripting tool.  From the Enterprise Manager, right click on your database and select 'All Tasks', then 'Generate SQL Scripts' from the context menu.  Uncheck 'Script All Objects', and select 'All Tables'.  On the 'Formatting' tab, de-select 'Generate DROP...'.  On the 'Options' tab, select 'Script indexes' and 'Script PRIMARY KEYS...'.  Select the 'MS-DOS' file format, and make sure 'Create one file' is checked.  Click OK, give it a name, and put it somewhere you can find it.
>
> A brief look at this file will show you what we are up against.  MS uses square brackets around all identifiers, to protect you from poor design choices such as using reserved keywords, so crazy things like:
>
> CREATE TABLE [dbo].[Select] ([Union] [int])
>
> are possible.  PostgreSQL uses double quotes instead.  MS uses the object owner qualification for all objects, 'dbo' in this case.  PostgreSQL has no such qualifications in object names.
>
> Another thing to note is that MS SQL identifiers are case-preserved, but in practice most installations are case insensitive.  PostgreSQL is case agnostic for SQL keywords and unquoted identifiers, folding them to lower case.  This is not the same as being case insensitive, in that you can create tables using the double quote protection mentioned above, such that they can only be accessed using the same double quoting method.  I find it is best to abandon case in object identifiers when migrating to PostgreSQL.  Also, it is safest to avoid any identifiers that require quoting, to avoid problems down the road.
>
> It is worth noting that for data comparisons, PostgreSQL is case sensitive and there is no option to change this behaviour.  You will have to force data to upper or lower case on both sides of text comparisons if case is not important to the operation and there is a chance of it being different.  This conversion might be a good time to force data used in joins and comparisons to all upper or lower case.  You will also need to look at the application for code that does comparisons of user-entered information taking advantage of MS SQL Server's typical case insensitivity.
>
> Another issue that is not immediately evident is that MS SQL Server supports ALTER TABLE statements that contain comma-separated lists of alterations.  PostgreSQL currently does not.  This is important since the MS SQL Server scripting program creates all constraints in ALTER TABLE statements.  If any tables have more than one constraint, you will have to surgically separate them into their own ALTER TABLE statements, or move them into the CREATE TABLE statement.
>
> Indexes are a bright spot, mostly.  The CLUSTER keyword in PostgreSQL is not the same as the CLUSTERED keyword in an MS SQL Server index creation.  PostgreSQL will allow you to 'cluster' a table, that is, rearrange the tuples in the table in order by that field.  This sounds good, except that the cluster is not maintained for updates and inserts, and that it will break all your other indexes whenever you regenerate the clustering.
>
> Having said all that, here is a partial list of things to correct:
>
>   1.  Force to lower case.
>
>   2.  Remove all square brackets.
>
>   3.  Remove all object owner prefixes (e.g. "dbo.")
>
>   4.  Remove all references to filegroups (e.g. "ON PRIMARY")
>
>   5.  Remove all non-supported optional keywords (e.g. "WITH NOCHECK", "CLUSTERED")
>
>   6.  Update all non-supported data types (e.g. "DATETIME" becomes "TIMESTAMP").  Also, this is a good time to get away from MONEY.  It is supported in PostgreSQL, but is on its way out.  Use NUMERIC(19,4).
>
>   7.  Replace the T-SQL batch terminator "GO" with the PostgreSQL statement terminator ";"
>
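For those taking the scripting route, most of the mechanical corrections above can be roughed out with sed.  This is a sketch only: it assumes GNU sed, and the sample CREATE TABLE line is a made-up stand-in for real scripter output.

```shell
# Rough pass at corrections 1-7 on one sample line.  Assumes GNU sed
# and tr; the input line is hypothetical MS SQL Server scripter output.
echo 'CREATE TABLE [dbo].[Orders] ([OrderID] [DATETIME] NOT NULL)' |
sed -e 's/\[//g; s/\]//g'       \
    -e 's/dbo\.//g'             \
    -e 's/ ON PRIMARY//g'       \
    -e 's/WITH NOCHECK //g'     \
    -e 's/CLUSTERED //g'        \
    -e 's/DATETIME/TIMESTAMP/g' \
    -e 's/^GO$/;/' |
tr 'A-Z' 'a-z'
# -> create table orders (orderid timestamp not null)
```

Beware that the blanket lower-casing in the last step also hits quoted string literals (in DEFAULT clauses, for example), so review the result by hand before running it.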
> Put this file somewhere safe, and now let's get the data.
>
> 5.  Data
>
> Data is data.  It is brought over in text form and cast into its proper form by the database according to the data types you used in creating your tables.  If you have binary data, I am the wrong guy to ask.
>
> There are a couple of gotchas here too, of course.  Since we use the COPY command, and it interprets a newline as the end of a tuple, you need to clean out all those newlines lurking in your text fields in MS SQL Server.  This is easy enough to do.  Also, the data dump from MS SQL Server will use the standard cr/lf line terminator, which needs to be changed to lf or it will cause havoc in comparisons of strings, among other problems.  I did this the easy way, downloading the dumps to my machine running my favorite Unix-like operating system via ftp, which does this translation for you.
>
> The first step in dumping the data out of MS SQL Server is to type all the names of your tables into a text file on the Win32 machine.  You can cheat and issue:
>
> "select name from sysobjects where type = 'U'"
>
> in Query Analyzer (ISQL-W) to get the list, then save the results to a file.  Then, write a handy little script to call bcp, the Bulk Copy Program.  Mine looks like this:
>
> set file [open "C:\\inetpub\\ftproot\\tablelist.txt" r]
> while {[gets $file table] >= 0} {
>     exec bcp <database>..$table out $table -c -k -S192.168.100.1 -Usa -Ppassword -r ~
> }
> close $file
>
> This will dump all the listed tables into files of the same name in the current directory.  The -c flag means to use plain character format.  The -k flag tells bcp to "keep nulls".  This is important later when we import the data.  The -r flag is the "row terminator".  To make cleaning up the carriage returns easier, I use this to signal the end of a row.  I put this script in the C:\InetPub\ftproot directory, so I can go right to the next step.
>
> From the Unix-like machine, start ftp and get the file listing you created earlier.  Put it in a work directory.  Change to the new work directory and get the files:
>
> ftp> lcd /home/homer/workdir
> Local directory now /home/homer/workdir
> ftp> fget tablelist.txt
>
> This should download all of the data files to the work directory, magically converting line terminators to Unix-compatible format.  If you can't use FTP, there are other ways to get files from here to there.  Just be advised that you may need a little sed script to fix the cr/lf problem.
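If the files do arrive with their DOS line endings intact (after a binary-mode transfer, for example), a full sed script is overkill; deleting the carriage returns is enough.  A minimal sketch, with a made-up sample file standing in for a bcp dump:

```shell
# Strip the cr from cr/lf line endings so COPY sees plain lf.
# orders.dump is a hypothetical bcp output file.
printf 'first row\r\nsecond row\r\n' > orders.dump
tr -d '\r' < orders.dump > orders.unix
```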
>
> Now, let's fix the embedded line feed issue.
>
> #!/usr/pkg/bin/tclsh
> set file [open tablelist.txt r]
> set flist [read -nonewline $file]
> close $file
> set flist [split $flist \n]
> foreach f $flist {
>     set file [open $f r]
>     set data [read -nonewline $file]
>     close $file
>     regsub -all {\000} $data {} data
>     regsub -all {\n} $data \\\n data
>     regsub -all {~} $data \n data
>     set file [open $f w]
>     puts -nonewline $file $data
>     close $file
> }
>
> The regsub lines are where the work gets done.  They replace all nulls (\000) with an empty string, then all linefeeds with a literal "\n", which tells COPY what to do when we import the file; then my line terminators get replaced with a linefeed, which is what COPY is expecting.  There are cleaner and easier ways to do this, but you get the point.
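One of those other ways, for anyone who would rather not use TCL: the same three substitutions can be approximated with tr and GNU sed.  A sketch only, assuming the '~' row terminator used above; the sample input is a made-up stand-in for a real dump file.

```shell
# Delete nulls, turn embedded newlines into the literal two characters
# \n (which COPY understands), then turn each '~' row terminator back
# into a real newline.  Assumes GNU sed; input is a hypothetical sample.
printf 'field one\nstill field one~row two~' |
tr -d '\000' |
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/\\n/g' -e 's/~/\n/g'
```

The `:a`/`N`/`$!ba` idiom slurps the whole input into sed's pattern space so the embedded newlines can be rewritten in one pass.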
>
> Now, go back to the sql file you edited to create your database objects.  I assume it is on the Unix-like box at this point.  It should have a series of CREATE TABLE statements, followed by ALTER TABLE and CREATE INDEX, etc. statements.  What we need to do now is tell it we want to load data after the tables are created, but before anything else.
>
> For each CREATE TABLE statement, follow it with a COPY statement.  Something like
>
> COPY tablename FROM '/home/homer/workdir/tablename' with null as '';
>
> Once you have this done, execute it against your PostgreSQL database, something like
>
> $ psql newdb < modifiedscript.sql &> outfile
>
> should work.  The output file is good to have for looking for problems.  It gets messy, so
>
> $ cat outfile | grep ERROR
>
> can give you an idea how things went.  I guarantee you have some troubleshooting to do.
>
> 6.  Views
>
> Views are pretty easy, as long as you didn't use too many functions in them.  A favorite of mine is isnull().  Like most functions, it has a PostgreSQL counterpart, coalesce().  A surprising number of functions will work just fine.  For example, round() is exactly the same.  datepart() becomes date_part(), and the arguments are the same, although PostgreSQL may be more particular about format strings.  For example, SQL Server will accept datepart(yyyy, mydatefield) as well as datepart(year, mydatefield).  PostgreSQL wants to see date_part('year', mydatefield) (note single quotes).
>
> Generating sql for views is pretty much the same as for tables.  From the Enterprise Manager, right click on your database and select 'All Tasks', then 'Generate SQL Scripts' from the context menu.  Uncheck 'Script All Objects', and select 'All Views'.  On the 'Formatting' tab, de-select 'Generate DROP...'.  On the 'Options' tab, select the 'MS-DOS' file format, and make sure 'Create one file' is checked.  Click OK, give it a name, and put it somewhere you can find it.
>
> Run this file through the same script you created to clean the sql for your tables, and see if it will work on PostgreSQL.  If not, you will have to do some fixing of functions.
>
> 7.  Summary
>
> Converting a database from MS SQL Server is not always easy.  It is, however, always worth it.  You will find PostgreSQL to be an extremely powerful and flexible product, with the best tech support in the world: the actual developers and users of the product.  If you spent days trying to get xp_sendmail to work on SQL Server version 7.0, or wondered what was in those enormous "Service Packs", then you will appreciate this.

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
   - Indira Gandhi
