Postgres vr.s MySQL- style differences? - Mailing list pgsql-advocacy
From | Brian Hurt |
---|---|
Subject | Postgres vr.s MySQL- style differences? |
Date | |
Msg-id | 465AE767.7030809@janestcapital.com Whole thread Raw |
Responses |
Re: Postgres vr.s MySQL- style differences?
(Andrew Sullivan <ajs@crankycanuck.ca>)
Re: Postgres vr.s MySQL- style differences? (Lukas Kahwe Smith <smith@pooteeweet.org>) Re: Postgres vr.s MySQL- style differences? (Jim Nasby <decibel@decibel.org>) |
List | pgsql-advocacy |
I'm still new to databases (though not to computing in general), and I'll admit that I haven't ever really used MySQL. But listening in to the various conversations comparing MySQL to Postgres, one recurring theme I've seen is that accessing the different databases requires different styles. And that many of the problems in switching from one database to the other is mainly a matter of problems with the style. Especially when the style is encoded in software. Take, for example, inserting large numbers of records into a table. As someone who was raised on Postgres (so to speak), when loading large numbers (more than 10 or so) records into a table I immediate start reaching for a copy command. And our code base is littered with copies- from simple programs that given a CSV file and a table name burst-load a table, to complicated libraries that use multithreaded queues and switch between copies and transactional inserts depending upon how much data needs to be copied. But someone used to MySQL would just start firing off lone, non-transactional insert statements, and their code base wouldn't contain such programs or libraries. Or take transactions. Our code operates on the assumption that the database handles concurrency just fine. And that the code can feel free to haul off and do whatever it likes- insert, delete, update, or select whatever data it feels like- whenever it feels like, without regard to whoever else is accessing the same data. Don't worry, the database will sort it all out for you. I'd be terrified to take this code base to a non-transactional database, because I have no idea where the synchronization issues are- they haven't been a performance issue, so no thought whatsoever has been given to them beyond "let the database handle it". But I don't think there are many, or that they'd be that hard to code around- if the code base had been written from the get go knowing that we couldn't rely on the database for concurrency, adjustments could be made. As a side note, given a choice between having to specially handle inserts vr.s specially handle synchronization, I'll take specially handling inserts any day. Or take the multi-table joins. I used to laugh at the idea of 27-table joins, until I found one in our code. I was doing a complicated query against a view, which was joining together several other views, which were joining together several other views, which were joining together views where were joining together tables. No, I'm not exagerating for comedic effect, the views really did go four layers deep. Why? Because Postgres just handled it, so there was no real pushback for making views of views of views of views. I only noticed because the complicated query was somewhat slow (not intolerably slow, just somewhat slow) and went investigating why. If we were working with a database that didn't handle multi-table joins nearly so well, we probably would have redesigned the table structure at some point to limit the depth of the views, and the number of tables in the query. My point here is to try to explain why moving code between MySQL and Postgres, in either direction, always seems fraught with difficulty. And why the new database always seems to be of less value than the old database. If you've already spent the time and effort to avoid synchronization problems outside the database, and to limit the number of tables you're joining in your queries, then not having to do these things has zero value. On the other hand, if you haven't already special cased inserts, having to now go back and special case inserts is a cost. Likewise, if you've already paid the cost to use copy instead of inserts on inserting, not having to do so is a minor advantage- but if you have given little thought to limiting the number of tables you join (barring performance problems), or synchronization issues (beyond an occassional begin and commit), you're looking at a large cost on switching. Just my $0.02. Brian
pgsql-advocacy by date: