On Sat, 3 Nov 2007, Ted Byers wrote:
> As one of these programmers, where is the best place to find the
> information I need to get it right...I ask you where I can learn what
> you believe a good DBA needs to know.
What a DBA should know in general is a little different from the question
I think you want an answer to, which is "what should a programmer know so
that they can effectively work like/without a DBA?"
There's an academic answer to that question. I could tell you to learn
something about data normalization, indexing, what happens on the server
when you join two tables, and how cursors work in your language of choice.
But without some practice, I don't know how much of that would stick.
The most valuable exercise I think someone with a good programming
background, but is relatively new to databases, can go through is to work
on a dramatically larger data set than you would normally encounter. The
main thing I've seen developers do wrong is writing code or designing
tables that don't scale well. Since coding works better when you can get
quick feedback after changes, it's very easy to settle into working with
only small test cases, and that can turn into a serious problem when such
code runs into the real world.
The only way to really understand how to think more like a DBA is to try
and write something that works well against a big pile of data. To throw
out some simple guidelines, you want to be working with a database that's
at least 10X as big as the amount of RAM on your system, and if you do
something that scans the full table like "select * from x" that should
take at least a couple of minutes to complete.
Now, try to build a program that operates quickly on subsets of this data.
Working on this scale will let you discover very quickly if you've made
any of the blatant mistakes that programmers unfamiliar with working on
full-size data sets tend to make. Most importantly, you don't ever want
to write queries that assume it's OK to send all the data to the client to
sort through, rather than pushing as much as possible toward the database
server. Second, you'll end up needing to construct indexes properly to
optimize the server side. And if there's more than one big table
involved, you'll be forced to come to grips with how joins work and
probably learn something about normalization.
You'll know you're learning something valuable whenver you run something
that you expect to return almost instantly, but instead it churns away for
minutes before finishing. Resist the urge to stop it too quickly, and
instead spend that time thinking about what's gone wrong, or monitoring
the client and/or server for clues.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD