Raymond Chui writes:
> I am wonder why most of people choose MySQL in Linux
> world rather than PostgreSQL?
I normally stay out of the advocacy wars, but as a recent convert I
figure I'll pipe up.
I switched to PostgreSQL as the back-end to my services at
http://www.flutterby.com/ from MySQL when I was collaborating on a
project with a "data warehousing" person who came from a larger
systems background, where I came from a microcomputer background.
I've been seriously programming microcomputers for a decade and a
half. One of the great accomplishments of my career was shoehorning a
big database application into a 4.77 MHz PC XT under that 640k limit,
including the bigger versions of DOS and the Novell drivers. 30 meg
was a *huge* database. Assembly language and special case hacks to
CTREE (which we only used part of because the record management was
more inefficient than we could tolerate) were the norm.
When we were struggling over linker paging issues and 500 bytes here
and there the idea of wastefully having the database functionality
uncoupled from the application just wasn't an option. So referential
integrity and update triggers and transactions all those other things
that a real database system provides had to be special-cased in the
application code.
Since then I've been a games programmer, and then on to a rendering
system where we were extremely careful about space in data structures
because we were bumping up against 2 gig physical RAM limits, and the
application was running on farms of a few thousand processors.
When you're shipping a hundred thousand copies of software, hardware
costs are leveraged very hard. Saving 10 dollars in minimum hardware
requirements gives you a million bucks to spend on programming.
MySQL blends that philosophy (provide the basic services as fast and
lightweight as possible, offload the thinking on to the programmer)
with a few extremely useful data types (ENUM, longer strings and
BLOBs). It's not really a database manager, more a loosely coupled
system of tables.
However, good programmers are hard to find, the world is heading
towards a much more client-server way of thinking, and hardware is
cheaper than sand and getting more so. Now that I code for the web
it's much more reasonable to suggest that someone spend an extra $10k
on a server than that they employ a $100k+/year coder.
PostgreSQL comes from the latter camp.
The benchmarks that show PostgreSQL outperforming MySQL generally talk
about joins between 10 or more tables. This is not something that the
microcomputing side of me would even consider. When first implemented
for MySQL, my simple comment system for my personal web page tried to
intelligently cache the sort of lookups that would usually be done
with joins because that's the way I'm used to seeing things.
My impression of the PostgreSQL philosophy is that ENUM is irrelevant
because you just throw that definition in another table.
In many web apps transactions just don't matter, generally you've got
to order inserts so that other things can link to the new sequence
numbers anyway, and disk space is cheaper than testing out delete
routines (Oracle users I've talked to, and the (in)famous Philip
Greenspun in one of his talks in Berkeley, have said "Set a flag in
the record and make your queries check DELETED='N', don't ever delete,
that'll make your tables unstable"). In the sequence of things, if
that final INSERT that makes the user's message visible fails the user
will try again.
It takes a while to make the conceptual shift to thinking about how to
implement the structure of the database completely inside the database
manager rather than letting the application manage integrity.
Finally, version 7 is fairly recent, and it's the first version of
PostgreSQL that's really rich enough to be usable for quick and dirty
projects. Even at that I'm hoping that none of my users post comments
longer than 8k and holding off on converting a few other bits of my
website over to SQL 'til 7.1.
And in the process of changing my thinking I've almost ripped out
PostgreSQL and gone back to MySQL several times. Changing my code to
use different syntax for "INSERT" and "UPDATE" ("SET" versus "VALUES")
was a pain in the tail. The huge latency on connections meant that I
had to get mod_perl installed and tuned, which broke some legacy
CGI. Having to write TIMESTAMP functionality that MySQL included from
scratch meant yet another stumble in the learning curve, and I still
need to rework some of the gross places I used the wrong ENUM
workarounds while I was learning.
While I'm now convinced that PostgreSQL is the way for me to go, this
was a long uphill battle driven by my respect for people who had
different backgrounds than I, and my xenophilia for different ways of
thinking.
That momentum in the coder populace at large is going to take a while
to overcome.
Dan