Re: Postgres SQL Syntax - Mailing list pgsql-general
From | Ted Byers |
---|---|
Subject | Re: Postgres SQL Syntax |
Date | |
Msg-id | 045801c74c91$a1911fc0$6701a8c0@RnDworkstation |
In response to | Postgres SQL Syntax ("Jim C." <jlistnews@gmail.com>) |
Responses | Re: Postgres SQL Syntax |
List | pgsql-general |
----- Original Message -----
From: "Merlin Moncure" <mmoncure@gmail.com>

> the open standard to convert data from one database to another,
> unfortunately, is SQL. SQL is incomplete, illogical, and obscure, so
> here we are.

The same can be said about any programming language, can it not? Even languages as powerful and expressive as C++ and Perl have elements that seem illogical. That they are incomplete is substantially demonstrated by the existence of the Boost library, most of which ought to be added to the C++ standard library, and by CPAN; though for Perl there is no standard, so it is hard to say definitively where the language ends and developers' libraries begin. I am sure that all of the programming students I have taught have found C++ template metaprogramming obscure and very difficult to fathom, but I don't think it is possible at this time to beat Perl, and especially its object model, for obscurity. I will concede, though, that this impression may be an artifact of my being used to the object models in C++ and Java, and that the Perl object model therefore just looks a bit weird to me.

For the same reason, I find some aspects of SQL difficult to fathom. That may be because I am not as familiar with set theory as I am with algebra, calculus, and analytic geometry, or it may be an artifact of the languages with which I am most comfortable. I don't know whether you can do statistical analyses, such as nonlinear least squares, time series analysis, or chi-squared tests, within SQL, but when faced with such a problem I tend to follow the path of least resistance and export the data from the database into my client code, doing whatever analysis I need in C++ or Java. But if there is a lot of data, I suspect there would be much less impact on network traffic, and probably better performance, if the analysis could be done in an SQL stored procedure. One of the more common analyses I need to do involves time series analysis, sometimes with some kind of smoothing (such as a moving average) applied before the analysis proper.

> moving data from mysql to postgresql is easy...its the table schemas
> that are tough. If you have the table schemas done, you can

When I design my databases, I normally assume that I may have to migrate the data from one RDBMS product to another, for whatever reason. Therefore, I have the ones I am most likely to have to support running here. I create an SQL script to create my databases, aiming to use the highest common factor across the RDBMS' SQL, and test it on each to verify that I succeeded in finding that highest common factor. I then have a simple Perl script to execute the script; it "knows" which tools to use based on a configuration file holding the specifics for MySQL, PostgreSQL, or MS SQL Server, plus a command-line parameter that specifies which DB to use.

> mysqldump --compatible=postgresql | psql
>
> which should work for 90% of tables, because mysql supports only a
> subset of the types postgresql supports.
>
> the schemas are a bit trickier...you have to do them by hand or use a
> conversion tool. one such tool is DTS. if you look around you might
> find something else though.

There's more than one way to skin a cat. My inclination is to take a brute force, albeit brain-dead, approach. One of my irritants is that none of the RDBMS products appear to support the same ways of importing data, e.g. from a flat file (CSV files, for instance).
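To make that difference concrete, here is roughly what loading the same file looks like on two of the products I mentioned. The table, column, and file names are invented for the example, so treat this as a sketch rather than something to paste in verbatim:

    -- PostgreSQL: COPY reads a server-side file
    -- (psql's \copy reads a client-side one instead)
    COPY prices (trade_date, symbol, close_price)
        FROM '/tmp/prices.csv' WITH CSV HEADER;

    -- MySQL: the rough equivalent is LOAD DATA INFILE,
    -- with its own set of options
    LOAD DATA INFILE '/tmp/prices.csv'
        INTO TABLE prices
        FIELDS TERMINATED BY ','
        IGNORE 1 LINES;

Same job, two incompatible spellings.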
But at least they all do it in some way, and they are all able to export a table to, e.g., a CSV file. Therefore it becomes trivially easy to move data, even in the absence of a tool like DTS, by exporting the data to a CSV file and then reading that file into the target RDBMS. This is something I've done countless times. But, not having found a way to easily determine the schema programmatically, so that my Perl or Java or C++ code could be generic enough to apply to any of my databases, I find myself taking an hour or two to write a simple script, usually in Perl, to handle each table in sequence.

There are, of course, things to watch, such as keeping the table creation statements in a different script from the one that creates indices and especially foreign keys, so that the tables are ready when the data is to be loaded but the constraints are created only after the data is loaded, purely for performance reasons. I learned the hard way that loading data slows down dramatically if the database has to continually check constraints, so I create constraints after loading data I already know to be valid, and leave them in place for when new data is to be stored. (I sketch what I mean in the postscript below.)

I guess I am recommending that the schemas be done by hand, regardless of whether you're creating a new database or repairing or migrating an old one, even if the original developer wasn't considerate enough to create, or provide, an appropriate SQL script to create the database de novo. It seems to me to be risky to assume that the tools we like now will still exist ten years from now. Anyone remember IBM's OS/2? That was my favourite OS, and it was arguably better than any flavour of Windows available at the time. But even though I used it exclusively ten years ago, it is no longer a viable option for any machine I now use. I am sure anyone reading this who is of my generation could produce a long list of products that they used and liked which no longer exist or are no longer commercially viable. C++ is still a good option for many kinds of application development, but I don't think there are very many people using Watcom's compiler to produce commercial applications. SQL will likely exist ten years from now, but will our favourite RDBMS? I won't make that prediction, but I think I can minimize the risk by creating SQL scripts that use the highest common factor across the RDBMS products I have at my disposal. Then I should be able to use my scripts, with minimal pain, regardless of what RDBMS products are available at that time.

I do much the same in my C++ coding. I try to use the highest common factor in the language as implemented by the suite of compilers I use, and keep the implementation-specific stuff to a minimum, in completely separate compilation units (and even in different directories). I just see it as inevitable that significant refactoring will be required, especially when migrating from one suite of tools to another, so I plan an architecture for my code that should make that as easy as possible; and similarly, for moving data around, I developed my approach to be as easy and reliable as possible, even if it is not the most elegant or efficient.

Cheers,

Ted
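P.S. For anyone following the thread, here is a rough sketch of what I mean by loading first and constraining afterwards. The table, column, constraint, and file names are invented for the example:

    -- 1. Create the bare table first: no indices, no foreign keys
    CREATE TABLE prices (
        trade_date  date,
        symbol      varchar(10),
        close_price numeric(12,2)
    );

    -- 2. Bulk-load the exported CSV while the table is unconstrained,
    --    so each row is not checked individually during the load
    COPY prices FROM '/tmp/prices.csv' WITH CSV HEADER;

    -- 3. Only now create the constraints and indices, once the data
    --    is known to be valid
    ALTER TABLE prices
        ADD CONSTRAINT prices_pk PRIMARY KEY (trade_date, symbol);
    CREATE INDEX prices_symbol_idx ON prices (symbol);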