Re: young guy wanting (Postgres DBA) ammo - Mailing list pgsql-general

From Greg Smith
Subject Re: young guy wanting (Postgres DBA) ammo
Date
Msg-id Pine.GSO.4.64.0711020735280.7194@westnet.com
In response to young guy wanting (Postgres DBA) ammo  (Kevin Hunter <hunteke@earlham.edu>)
Responses Re: young guy wanting (Postgres DBA) ammo  (Ted Byers <r.ted.byers@rogers.com>)
List pgsql-general
On Fri, 2 Nov 2007, Kevin Hunter wrote:

> I don't have "ammo" to defend (or agree?) with my friend when he says
> that "Postgres requires a DBA and MySQL doesn't so that's why they
> choose the latter."

A statement like this suggests a fundamental misunderstanding of what a
DBA does, and unfortunately for you that means you're stuck educating
them about a concept they don't even understand--which is particularly
tough when you're not a DBA yourself.

The job of a DBA is to make sure the data you're storing in the database
is safe and that the system as a whole performs fast enough to keep up
with demand.  If your data is so trivial that it doesn't really matter
whether the data stays intact or gets corrupted, and there are no
performance requirements to meet, then you don't need someone operating as
a DBA; in every other case, you do.

It's simple to set up MySQL with the default configuration running such
trivial workloads, giving the impression you've built a system that works
fine.  There are a number of ways this default setup can end up with
corrupted data one day.  As mentioned in the paper you've already read,
it's possible to set up recent MySQL versions to run in the new strict
modes with the right type of engine such that it has reasonable standards
for data integrity.  Actually doing that work _correctly_ will require a
DBA, but since it's possible not to do it at all and have things appear to
work, many people walk away thinking they didn't need someone acting in
that role at all.
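As a rough sketch of what that work involves, the relevant settings look
something like this in my.cnf (option names vary across MySQL versions;
treat this as illustrative, not a complete hardening recipe):

```ini
# my.cnf -- illustrative fragment, not a complete configuration
[mysqld]
# Reject invalid data instead of silently truncating or coercing it
sql_mode = STRICT_ALL_TABLES
# Default to the transactional InnoDB engine rather than MyISAM
default-storage-engine = InnoDB
```

Knowing that these two knobs exist, which engine your existing tables
actually use, and what the strict modes will break in an application that
grew up without them is exactly the sort of thing a DBA does.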

PostgreSQL defaults to high standards for data integrity and as a result
you can't avoid being exposed to some amount of fighting with the
inevitable ramifications of that.  An example already thrown out here is
that you must do some amount of initially frustrating configuration in
order to even get users to log in the way people expect.  Another one on
the performance side is that you'll be forced to understand the trade-offs
in how vacuuming works in PostgreSQL in order to keep your system running
acceptably.  It's not possible to run a secure database on a larger scale
without going through these sorts of exercises.  But if you don't care
about security and never reach a large scale, you could get the impression
that this work was a waste of time, and that the database that forced you
to go through it was unreasonably difficult to set up without a DBA.
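For the login example, the configuration in question lives in
pg_hba.conf; a minimal sketch, assuming a local-only machine and
password authentication, might look like:

```
# pg_hba.conf -- illustrative fragment
# TYPE  DATABASE  USER  ADDRESS       METHOD
local   all       all                 md5
host    all       all   127.0.0.1/32  md5
```

Changes take effect after a reload (e.g. "pg_ctl reload"), and the exact
authentication methods available depend on your version.  The vacuuming
side is similarly configuration-driven: recent versions ship with
autovacuum, but its thresholds still need tuning for your workload.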

To step back for a second, the software industry as a whole is going
through a phase right now where programmers are more empowered than
ever to run complicated database-driven designs without actually having to
be DBAs.  It used to be that you "needed a DBA" for every job like this
because they were the only people who knew how to set up the database
tables at all, and once they were involved they also (if they were any
good) did higher-level design planning, with scalability in mind, and
worried about data integrity issues.

Software frameworks like Ruby on Rails and Hibernate have made it simple
for programmers to churn out code that operates on databases without
having the slightest idea what is going on under the hood.  From a
programmer's perspective, the "better" database is the one that requires
the least work to get running.  This leads to projects where a system that
worked fine "in development" crashes and burns once it reaches a
non-trivial workload, because if you don't design databases with an eye
towards scalability and integrity you don't magically get either.

The sad part is that it's nearly impossible to educate people going
through this process about what they're doing wrong.  Human nature is such
that until you've had a day where a sloppy setup caused you to lose a
gigantic amount of data, spending some time with that sick feeling in your
stomach that everyone who has been through this knows, it's hard to ever
reach the level of paranoia necessary to be a successful DBA.  Until
you've fought to speed up a database application where data normalization
is the only way to solve the fundamental problem causing the slowdown,
it's impossible to truly appreciate why you should consider design
trade-offs in that area from day one.  Can you build a database without
someone who has been through these experiences?  Sure.  That doesn't mean
it's a good idea.
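To make the normalization point a bit more concrete, here's a small
sketch using a hypothetical schema (SQLite via Python, purely for
brevity).  A denormalized design that repeats customer data on every
order row means an address change touches many rows and risks
inconsistency; splitting customers into their own table makes it one
update, which is the kind of trade-off worth weighing on day one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized design: customer data lives in exactly one place.
cur.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    item TEXT)""")

cur.execute("INSERT INTO customers VALUES (1, 'Alice', 'Baltimore')")
cur.executemany("INSERT INTO orders VALUES (?, 1, ?)",
                [(1, 'widget'), (2, 'gadget'), (3, 'gizmo')])

# The customer moves: a single-row update, and every order
# automatically sees the new city through the join.
cur.execute("UPDATE customers SET city = 'Boston' WHERE id = 1")

rows = cur.execute("""SELECT o.item, c.city
                      FROM orders o
                      JOIN customers c ON o.customer_id = c.id""").fetchall()
print(rows)
```

In the denormalized version the UPDATE would have to find and rewrite
every order row, and missing one leaves contradictory data behind.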

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
