Thread: Postgres vr.s MySQL- style differences?

Postgres vr.s MySQL- style differences?

From
Brian Hurt
Date:
I'm still new to databases (though not to computing in general), and
I'll admit that I haven't ever really used MySQL.  But listening in to
the various conversations comparing MySQL to Postgres, one recurring
theme I've seen is that accessing the different databases requires
different styles.  And that many of the problems in switching from one
database to the other is mainly a matter of problems with the style.
Especially when the style is encoded in software.

Take, for example, inserting large numbers of records into a table.  As
someone who was raised on Postgres (so to speak), when loading large
numbers (more than 10 or so) records into a table I immediate start
reaching for a copy command.  And our code base is littered with copies-
from simple programs that given a CSV file and a table name burst-load a
table, to complicated libraries that use multithreaded queues and switch
between copies and transactional inserts depending upon how much data
needs to be copied.  But someone used to MySQL would just start firing
off lone, non-transactional insert statements, and their code base
wouldn't contain such programs or libraries.

Or take transactions.  Our code operates on the assumption that the
database handles concurrency just fine.  And that the code can feel free
to haul off and do whatever it likes- insert, delete, update, or select
whatever data it feels like- whenever it feels like, without regard to
whoever else is accessing the same data.  Don't worry, the database will
sort it all out for you.  I'd be terrified to take this code base to a
non-transactional database, because I have no idea where the
synchronization issues are- they haven't been a performance issue, so no
thought whatsoever has been given to them beyond "let the database
handle it".  But I don't think there are many, or that they'd be that
hard to code around- if the code base had been written from the get go
knowing that we couldn't rely on the database for concurrency,
adjustments could be made.

As a side note, given a choice between having to specially handle
inserts vr.s specially handle synchronization, I'll take specially
handling inserts any day.

Or take the multi-table joins.  I used to laugh at the idea of 27-table
joins, until I found one in our code.  I was doing a complicated query
against a view, which was joining together several other views, which
were joining together several other views, which were joining together
views where were joining together tables.  No, I'm not exagerating for
comedic effect, the views really did go four layers deep.  Why?  Because
Postgres just handled it, so there was no real pushback for making views
of views of views of views.  I only noticed because the complicated
query was somewhat slow (not intolerably slow, just somewhat slow) and
went investigating why.  If we were working with a database that didn't
handle multi-table joins nearly so well, we probably would have
redesigned the table structure at some point to limit the depth of  the
views, and the number of tables in the query.

My point here is to try to explain why moving code between MySQL and
Postgres, in either direction, always seems fraught with difficulty.
And why the new database always seems to be of less value than the old
database.  If you've already spent the time and effort to avoid
synchronization problems outside the database, and to limit the number
of tables you're joining in your queries, then not having to do these
things has zero value.  On the other hand, if you haven't already
special cased inserts, having to now go back and special case inserts is
a cost.  Likewise, if you've already paid the cost to use copy instead
of inserts on inserting, not having to do so is a minor advantage- but
if you have given little thought to limiting the number of tables you
join (barring performance problems), or synchronization issues (beyond
an occassional begin and commit), you're looking at a large cost on
switching.

Just my $0.02.

Brian


Re: Postgres vr.s MySQL- style differences?

From
Andrew Sullivan
Date:
On Mon, May 28, 2007 at 10:29:59AM -0400, Brian Hurt wrote:
> different styles.  And that many of the problems in switching from one
> database to the other is mainly a matter of problems with the style.
> Especially when the style is encoded in software.

Right.  This is not only true of PostgreSQL-MySQL comparisons,
though.  Even with databases that are rather more compliant with the
ANSI specifications than MySQL, there are plenty of details that you
have to pay attention to.  (MySQL is a special case, really, because
in its early history it actively disparaged many of the features that
non-toy databases took to be obvious goods, like referential
integrity and ACID.)

I would argue that IBM tries pretty hard to conform with the ANSI SQL
spec in DB2, for instance.  Yet moving from Postgres DB2 (or
conversely) would not be trivial, if you wanted to preserve your
optimizations.  DB2's approach to concurrency causes what is called
lock escalation.  In most cases, it works well, and the DB2 people
are _really smart_ and have implemented impressive algorithms to
handle this.  But if you're used to MVCC, the chances are pretty good
under DB2 that you'll eventually run into lock starvation, and find
that you have strange occasions of suddenly near-stopped performance
unless you go through your application and port it very carefully.

This is why many applications that have some portability layer for
the database seem to be so awful at using the database: they can't
use any of the strengths of any of the underlying databases, because
those strengths are usually the thing that some other database is
really bad at.  So you often end up with an application that isn't
truly stellar at anything, and your database system has to be
oversized to perform acceptably.  (Note that there's nothing
intrinisically wrong with this approach; you've just optimised your
code for portability at the expense of performance.)

SQL really gives a minimal, rather than maximal portability.  It's
nice that PostgreSQL tends to conform with SQL, though, because at
least if you did have to move in or (less likely :) out of it, your
reference point is fairly well-defined.

A
--
Andrew Sullivan  | ajs@crankycanuck.ca
The whole tendency of modern prose is away from concreteness.
        --George Orwell

Re: Postgres vr.s MySQL- style differences?

From
Lukas Kahwe Smith
Date:
Brian Hurt wrote:

>
> Take, for example, inserting large numbers of records into a table.  As
> someone who was raised on Postgres (so to speak), when loading large
> numbers (more than 10 or so) records into a table I immediate start
> reaching for a copy command.  And our code base is littered with copies-
> from simple programs that given a CSV file and a table name burst-load a
> table, to complicated libraries that use multithreaded queues and switch
> between copies and transactional inserts depending upon how much data
> needs to be copied.  But someone used to MySQL would just start firing
> off lone, non-transactional insert statements, and their code base
> wouldn't contain such programs or libraries.

Just to make sure that nobody gets confused here. MySQL has LODA DATA
INFILE as their COPY alternative, which performs marvelously.

regards,
Lukas

Re: Postgres vr.s MySQL- style differences?

From
Lukas Kahwe Smith
Date:
Andrew Sullivan wrote:

> This is why many applications that have some portability layer for
> the database seem to be so awful at using the database: they can't
> use any of the strengths of any of the underlying databases, because
> those strengths are usually the thing that some other database is
> really bad at.  So you often end up with an application that isn't
> truly stellar at anything, and your database system has to be
> oversized to perform acceptably.  (Note that there's nothing
> intrinisically wrong with this approach; you've just optimised your
> code for portability at the expense of performance.)

Yup, the tend to heavily rely on data integrity management on the middle
tier. Which by the way is not totally crazy, because you need most of
the data integrity rules on the frontend anyways, to generate proper
GUI's. Actually in non object-relational RDBMS (which are the norm,
PostgreSQL is special), you can never do your full integrity management
inside the database. Which is all kind of ok, as long as you never mess
with the database through anything but your single middle tier
implementation.

My point is, there are solid reasons for writing applications that use
the database as a mostly stupid data store with very efficient retrieval
and these applications tend to also perform pretty well regardless of
what RDBMS you stick underneath.

regards,
Lukas

Re: Postgres vr.s MySQL- style differences?

From
Andrew Sullivan
Date:
On Mon, May 28, 2007 at 09:37:53PM +0200, Lukas Kahwe Smith wrote:

> inside the database. Which is all kind of ok, as long as you never mess
> with the database through anything but your single middle tier
> implementation.

That's a pretty big if.  It's part of the reason I really dislike the
idea of moving integrity work out of the db and into the application.
For it's not only ensuring the data only gets updated by the
application that you need worry about; you also have to make sure
that all the application programmers (and there will be more than
one, for any application that lasts any length of time or does
anything non-trivial) do all the integrity stuff in the same way.
And they rarely do.  The whole point of centralising your data is not
only to share it; it's to share it in a consistent way.  Otherwise,
why not just dump it on the filesystem?

> My point is, there are solid reasons for writing applications that use
> the database as a mostly stupid data store with very efficient retrieval
> and these applications tend to also perform pretty well regardless of
> what RDBMS you stick underneath.

RDBMS are often not actually very efficient retrieval engines.  If
that's what you need, straight filesystem access is a better bet.
The _R_ in RDBMS is there for a reason, and if you're not going to
use your database in a relational way, then you don't need an RDBMS.
You _can_ use the RDBMS as a mostly stupid data store, and it likely
won't break anything.  But it seems a waste of computing cycles to
me.  Often, the "stupid datastore" systems I've seen look like
they've used SQL because that was the hammer the programmer had,
rather than the right tool for the job.  (This is optimising for
developer learning curve rather than anything else.  Also a
legitimate trade-off, if you want to make it, so long as you know
you're doing so.)

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what
you told them to.  That actually seems sort of quaint now.
        --J.D. Baldwin

Re: Postgres vr.s MySQL- style differences?

From
Lukas Kahwe Smith
Date:
Andrew Sullivan wrote:

> RDBMS are often not actually very efficient retrieval engines.  If

Yeah, well but it is more convinient, than writing your own :)

regards,
Lukas

Re: Postgres vr.s MySQL- style differences?

From
Jim Nasby
Date:
On May 28, 2007, at 2:37 PM, Lukas Kahwe Smith wrote:
> Andrew Sullivan wrote:
>> This is why many applications that have some portability layer for
>> the database seem to be so awful at using the database: they can't
>> use any of the strengths of any of the underlying databases, because
>> those strengths are usually the thing that some other database is
>> really bad at.  So you often end up with an application that isn't
>> truly stellar at anything, and your database system has to be
>> oversized to perform acceptably.  (Note that there's nothing
>> intrinisically wrong with this approach; you've just optimised your
>> code for portability at the expense of performance.)
>
> Yup, the tend to heavily rely on data integrity management on the
> middle tier. Which by the way is not totally crazy, because you
> need most of the data integrity rules on the frontend anyways, to
> generate proper GUI's. Actually in non object-relational RDBMS
> (which are the norm, PostgreSQL is special), you can never do your
> full integrity management inside the database.

Can you give an example of that, because I can't think of one, unless
you're talking about integrity that has to lie outside the database
(ie: if you're storing filesystem locations in the database).

> Which is all kind of ok, as long as you never mess with the
> database through anything but your single middle tier implementation.

I do a lot of work on systems where it would be impractical or
impossible to do everything through middleware. The other issue is
that you have to be very, very careful how you do integrity checking
in the middleware if you want to avoid things like race conditions.
The database not only gives that to you for free, but it's also
extremely efficient at it.

> My point is, there are solid reasons for writing applications that
> use the database as a mostly stupid data store with very efficient
> retrieval and these applications tend to also perform pretty well
> regardless of what RDBMS you stick underneath.

Perhaps if you carefully hand-code your middleware... my experience
is that things like Hibernate tend to destroy performance because
unless you really, really know it well you end up with some pretty
bad database access methods. It doesn't have to be that way, but it
often is.

Put another way, if a database is a foot-gun when it comes to
performance, middleware is a foot-bazooka. :)
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



Re: Postgres vr.s MySQL- style differences?

From
Jim Nasby
Date:
On May 28, 2007, at 9:29 AM, Brian Hurt wrote:
> I'm still new to databases (though not to computing in general),
> and I'll admit that I haven't ever really used MySQL.  But
> listening in to the various conversations comparing MySQL to
> Postgres, one recurring theme I've seen is that accessing the
> different databases requires different styles.  And that many of
> the problems in switching from one database to the other is mainly
> a matter of problems with the style.  Especially when the style is
> encoded in software.
>
> Take, for example, inserting large numbers of records into a
> table.  As someone who was raised on Postgres (so to speak), when
> loading large numbers (more than 10 or so) records into a table I
> immediate start reaching for a copy command.  And our code base is
> littered with copies- from simple programs that given a CSV file
> and a table name burst-load a table, to complicated libraries that
> use multithreaded queues and switch between copies and
> transactional inserts depending upon how much data needs to be
> copied.  But someone used to MySQL would just start firing off
> lone, non-transactional insert statements, and their code base
> wouldn't contain such programs or libraries.

And that someone would have horrid performance in *any* ACID RDBMS.
This is one of the things I hate about MySQL: it teaches people how
to abuse an RDBMS.

Of course, that came around and bit them in the rear when they put
InnoDB in: since it's (mostly) ACID, inserts suddenly started to
suck. So to make their users happy they then added a bunch of knobs
that let you dial-back the ACIDity.

> Or take the multi-table joins.  I used to laugh at the idea of 27-
> table joins, until I found one in our code.  I was doing a
> complicated query against a view, which was joining together
> several other views, which were joining together several other
> views, which were joining together views where were joining
> together tables.  No, I'm not exagerating for comedic effect, the
> views really did go four layers deep.  Why?  Because Postgres just
> handled it, so there was no real pushback for making views of views
> of views of views.  I only noticed because the complicated query
> was somewhat slow (not intolerably slow, just somewhat slow) and
> went investigating why.  If we were working with a database that
> didn't handle multi-table joins nearly so well, we probably would
> have redesigned the table structure at some point to limit the
> depth of  the views, and the number of tables in the query.

I have a story about the small-time mentality that MySQL and some
other tools generate. Someone posted what he didn't like about Ruby
On Rails on a Joel On Software forum. One of the beefs was that rails
won't pick up RI information from the database, because the author of
rails has this hair-brained idea that the database should be nothing
but a stupid data dump (if that's really what he wanted, why the heck
didn't just use flat files??) Of course, this resulted in a big
flamewar between what were largely professional DBAs (from various
RDBMS backgrounds) and the rails fanboys. One of the threads was
talking about what features you need when you have large numbers of
tables... to which one of the rails fans replied:

"I guess I could see that if you had a *really* complex database,
with say, 30 tables."

> My point here is to try to explain why moving code between MySQL
> and Postgres, in either direction, always seems fraught with
> difficulty.  And why the new database always seems to be of less
> value than the old database.  If you've already spent the time and
> effort to avoid synchronization problems outside the database, and
> to limit the number of tables you're joining in your queries, then
> not having to do these things has zero value.  On the other hand,
> if you haven't already special cased inserts, having to now go back
> and special case inserts is a cost.  Likewise, if you've already
> paid the cost to use copy instead of inserts on inserting, not
> having to do so is a minor advantage- but if you have given little
> thought to limiting the number of tables you join (barring
> performance problems), or synchronization issues (beyond an
> occassional begin and commit), you're looking at a large cost on
> switching.

Switching databases always has some kind of cost, because none of
them work the same. The problem with MySQL is that it has the ability
to act *very* different from any other database. Shunning ANSI SQL is
one example, but there's things like silently truncating data, mis-
handling nulls, etc. (google:"mysql gotchas" for a good starting
list). Similarly, moving from DB2, MSSQL, or Sybase to PostgreSQL or
Oracle is easier than the other way around, because PostgreSQL and
Oracle are MVCC, which means you spend *far* less time worrying about
locking issues. With non-MVCC databases, you have to be very
conscious of what the access pattern is for each table and make sure
not to break that.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



Re: Postgres vr.s MySQL- style differences?

From
Lew
Date:
Jim Nasby wrote:
> I do a lot of work on systems where it would be impractical or
> impossible to do everything through middleware. The other issue is that
> you have to be very, very careful how you do integrity checking in the
> middleware if you want to avoid things like race conditions. The
> database not only gives that to you for free, but it's also extremely
> efficient at it.

Data integrity in the data just seems to make sense on the face of it.
Consider enterprise databases, with many different applications accessing any
given database, some of which reside outside the hosting enterprise.  It is
unreasonable if not impossible to route all such applications through a single
(choke-) point on the way to the database, another single point.  As Jim says,
enterprise-scale DMBSes are good at handling this sort of load.

> Perhaps if you carefully hand-code your middleware... my experience is
> that things like Hibernate tend to destroy performance because unless
> you really, really know it well you end up with some pretty bad database
> access methods. It doesn't have to be that way, but it often is.

I've seen similar difficulties with OO-relational frameworks.

The trap of such a framework is that getting to "really, really know it well"
requires at least as much database expertise as without the framework, plus
you have to know the framework even better.  Even then you still often "end up
with some pretty bad database access"; best case is you get a subset of the
database system's power for an increase in application complexity. This is the
irony lost on the proponents of these frameworks: "If you work really, really
hard it'll save you effort, except in development or maintenance."

--
Lew

Re: Postgres vr.s MySQL- style differences?

From
Andrew Sullivan
Date:
On Tue, May 29, 2007 at 09:18:23AM -0400, Lew wrote:
> still often "end up with some pretty bad database access"; best case is you
> get a subset of the database system's power for an increase in application
> complexity. This is the irony lost on the proponents of these frameworks:
> "If you work really, really hard it'll save you effort, except in
> development or maintenance."

Well, maybe not.  Remember, there are lots of ways to optimise your
code design.  One is to design for the tools you have and know how to
use.  If you have a bunch of developers who know Beans (sorry, I
couldn't resist), and you have six weeks to deliver functionality
that oughta take six months to do properly, then you just throw
together the things you know how to do.  This makes for much less
good code, of course, and will cost in the long run.  But for
immediate-term problems, it might be a good trade-off.  (There are
plenty of companies who then never do step two, which is to throw all
that away and do the job properly.  But that's just bad management,
and is not an argument that the original trade-off was necessarily a
bad one.)

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
The plural of anecdote is not data.
        --Roger Brinner

Re: Postgres vr.s MySQL- style differences?

From
Lukas Kahwe Smith
Date:
Jim Nasby wrote:

>> Yup, the tend to heavily rely on data integrity management on the
>> middle tier. Which by the way is not totally crazy, because you need
>> most of the data integrity rules on the frontend anyways, to generate
>> proper GUI's. Actually in non object-relational RDBMS (which are the
>> norm, PostgreSQL is special), you can never do your full integrity
>> management inside the database.
>
> Can you give an example of that, because I can't think of one, unless
> you're talking about integrity that has to lie outside the database (ie:
> if you're storing filesystem locations in the database).

Yes, but more trivially since like "is this a valid email address" (note
I said this is not the case with object-relational databases like
PostgreSQL). Checking for the format of an email address is just
something I would want to do with a check constraint (especially if I
have to replicate that check constraint in every relevant column).

> Perhaps if you carefully hand-code your middleware... my experience is
> that things like Hibernate tend to destroy performance because unless
> you really, really know it well you end up with some pretty bad database
> access methods. It doesn't have to be that way, but it often is.

Yeah, I think at the very least ORM's make it less obvious when you are
calling something expensive. It all happens sort of magical and you do
not really know when something is going to cost you and if so how much.

regards,
Lukas

Re: Postgres vr.s MySQL- style differences?

From
David Fetter
Date:
On Tue, May 29, 2007 at 07:26:51PM +0200, Lukas Kahwe Smith wrote:
> Jim Nasby wrote:
>
> >>Yup, the tend to heavily rely on data integrity management on the
> >>middle tier. Which by the way is not totally crazy, because you
> >>need most of the data integrity rules on the frontend anyways, to
> >>generate proper GUI's. Actually in non object-relational RDBMS
> >>(which are the norm, PostgreSQL is special), you can never do your
> >>full integrity management inside the database.
> >
> >Can you give an example of that, because I can't think of one,
> >unless you're talking about integrity that has to lie outside the
> >database (ie: if you're storing filesystem locations in the
> >database).
>
> Yes, but more trivially since like "is this a valid email address"

This is where PostgreSQL really shines.  You can create a DOMAIN
constraint that checks whether an address is valid all the way up to
checking whether it's accepted for delivery, although I'd imagine you
wouldn't want to do that last.  You can then use that DOMAIN all over
your code.

> >Perhaps if you carefully hand-code your middleware... my experience
> >is that things like Hibernate tend to destroy performance because
> >unless you really, really know it well you end up with some pretty
> >bad database access methods. It doesn't have to be that way, but it
> >often is.
>
> Yeah, I think at the very least ORM's make it less obvious when you
> are calling something expensive.  It all happens sort of magical and
> you do not really know when something is going to cost you and if so
> how much.

All this magic means that you're taking on a larger hassle, more
subtle bugs, etc., etc. for no long-term benefit.

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate

Re: Postgres vr.s MySQL- style differences?

From
Lukas Kahwe Smith
Date:
David Fetter wrote:
> On Tue, May 29, 2007 at 07:26:51PM +0200, Lukas Kahwe Smith wrote:
>> Jim Nasby wrote:
>>
>>>> Yup, the tend to heavily rely on data integrity management on the
>>>> middle tier. Which by the way is not totally crazy, because you
>>>> need most of the data integrity rules on the frontend anyways, to
>>>> generate proper GUI's. Actually in non object-relational RDBMS
>>>> (which are the norm, PostgreSQL is special), you can never do your
>>>> full integrity management inside the database.
>>> Can you give an example of that, because I can't think of one,
>>> unless you're talking about integrity that has to lie outside the
>>> database (ie: if you're storing filesystem locations in the
>>> database).
>> Yes, but more trivially since like "is this a valid email address"
>
> This is where PostgreSQL really shines.  You can create a DOMAIN
> constraint that checks whether an address is valid all the way up to
> checking whether it's accepted for delivery, although I'd imagine you
> wouldn't want to do that last.  You can then use that DOMAIN all over
> your code.

I know! I know!
/me hugs PostgreSQL

Actually while thinking of an example for my previous post I simply used
the "email validation" example from Roberts pl/php talk.

regards,
Lukas

Re: Postgres vr.s MySQL- style differences?

From
Lew
Date:
Andrew Sullivan wrote:
> Remember, there are lots of ways to optimise your
> code design.  One is to design for the tools you have and know how to
> use.  If you have a bunch of developers who know Beans (sorry, I
> couldn't resist), and you have six weeks to deliver functionality
> that oughta take six months to do properly, then you just throw
> together the things you know how to do.  This makes for much less
> good code, of course, and will cost in the long run.  But for
> immediate-term problems, it might be a good trade-off.  (There are
> plenty of companies who then never do step two, which is to throw all
> that away and do the job properly.  But that's just bad management,
> and is not an argument that the original trade-off was necessarily a
> bad one.)

I completely agree.  And laugh at your pun.

My post had to do with the ideal; yours focuses on the pragmatic.

If your one-off approach employs standard patterns (like JavaBeans) and
programmers steeped in a culture of excellence, the refactoring to a
full-blown architecture is much less painful.

My own practices are slighlty schizoid - I aim for technical elegance but
frequently make compromises for expediency.

--
Lew