Re: DB design advice: lots of small tables? - Mailing list pgsql-general

From Kevin Grittner
Subject Re: DB design advice: lots of small tables?
Date
Msg-id 1363646970.31085.YahooMailNeo@web162906.mail.bf1.yahoo.com
Whole thread Raw
In response to Re: DB design advice: lots of small tables?  (Shaun Thomas <sthomas@optionshouse.com>)
List pgsql-general
Shaun Thomas <sthomas@optionshouse.com> wrote:
> On 03/15/2013 08:36 AM, Kevin Grittner wrote:
>
>> I occasionally hear someone maintaining that having a meaningless
>> sequential ID column as the primary key of each table is required
>> by the relational model.
>
> You know, I've heard you mention this a couple times, and I still don't
> understand why you maintain such a position. Artificial the linkages may be, but
> unless I'm missing something, the level of abstraction is often necessary to
> avoid excessive storage requirements and bulky foreign keys.
>
> For me, it comes down to two things:
>
> 1. Copy a "natural" arbitrary-length text string possibly millions of
> times, or use a 32/64-bit integer lookup value.
> 2. Use multiple natural columns in a primary key, necessitating copying all
> columns to child tables for foreign keys, or use one surrogate key for both.

When I was working for the Wisconsin Courts the Circuit Court
databases had some tables which were maintained by a central Court
Operations group and some tables holding data generated by each
county.  The primary key of each county maintained table included a
county number (smallint).  Most county-maintained tables were
related to court cases, and included a court case number as part of
the primary key (varchar(14)).  Beyond that, other fields were
added to each table, and where there was a natural heirarchy to the
data the "child" normally had the PK of its parent and something to
make it unique within that set.  Often this was a unique number
that started at 1 for each parent.  Some of these tables, when
replicated to the central state-wide site, had hundreds of millions
of rows.

You would probably be surprised how many queries were able to
optimize much better than they could have with a single-column ID
in each table.  The plans often surprised me, and when I would
force the planner to take the plan which seemed obvious to me, 90%
of the time the planner had me beat.  The other times often
suggested new indexes or adjustments to cost factors which
benefited the whole workload.

> Yes, it complicates the schema.

That's not the point.

> But until someone comes up with a database storage method that automatically deduplicates stored data, I
> can never advocate using arbitrary strings as natural keys, no matter how unique
> and meaningful they are.

If micro-managing disk space usage it the primary concern,
single-column synthetic keys are likely to win in many
circumstances.  I have no doubt that there are cases where it can
benefit performance.  I can guarantee you that many of the queries
we ran at Wisconsin Courts would have been much slower with such
synthetic keys because they limit the available plans and force
some unnatural "navigation".  I know because I saw cases where
people forced the type of navigation the planner would need to do
if synthetic keys were used, and the result was performance orders
of magnitude worse.

> Maybe I just say that because I was burned by just such a column in a previous
> engagement. It was 64-characters of arbitrary text, and was used as a lookup
> value for dozens of tables. Had it been mapped to a "meaningless"
> surrogate key, several tables would have been halved (or more) in size.
> Don't even get me started on indexing that horrible monstrosity in every
> table it lived in.

That is getting pretty extreme.

> Small tables? Yeah, whatever. I don't care. But anything that has the
> potential to be duplicated millions of times? You better bet that's going to
> be a serial identifier.

Like I said, Wis Cts. has a table that has hundreds of millions of
rows and the primary key is a smallint, a varchar(14), and another
smallint.  You can poke around in it at this site by drilling down
on the "Court Record Events" button for a case:

http://wcca.wicourts.gov/

> Now, you've been doing this longer than I have, in a wider array of
> contexts, so you clearly have some perspective I don't. From where I'm
> sitting though, I don't get the barely suppressed rage. ;)

Oh, every time a programmer who had been working in Access or MySQL
was hired, I had to have this discussion all over again.  You're
ripping the scabs off the old wounds from those battles.  :-)  I usually
had to pull out plans from complex (and fast!) queries to make my
point.

The other thing is that it brings back memories from the early '80s
of working in a database product called TOTAL which required
explicit navigation over links.  Relational technology was such a
breath of fresh air compared to that, I dread a pendulum swing back
to that.

That's been my experience, anyway.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-general by date:

Previous
From: Wasim Arif
Date:
Subject: Roadmap for Postgres on AIX
Next
From: John R Pierce
Date:
Subject: Re: Roadmap for Postgres on AIX