Re: Need schema design advice - Mailing list pgsql-general

From Jeff Soules
Subject Re: Need schema design advice
Date
Msg-id c956da920810111811i574d89c9hc55577f50b1641bf@mail.gmail.com
Whole thread Raw
In response to Need schema design advice  (Matthew Wilson <matt@tplus1.com>)
List pgsql-general
On Sat, Oct 11, 2008 at 1:10 PM, Matthew Wilson <matt@tplus1.com> wrote:
> I need to track employees and their preferred locations, shifts, and
> stations.
>
> For example, I need to track that Alice prefers to work the morning
> shift at the west-side location, and she likes to work the cash-register
> station.
>
> Also, I need to track that Bob likes the west-side and north-side
> locations, likes the night shift, and likes the dishwasher station. Note
> the one-to-many relationship between Bob and his preferred locations.  I
> need to support that possibility.
>
> So, I see two ways to make my tables, and I'd like some advice.
>
> FIRST METHOD:
>
> create table preferred_location (
>    employee_id int references employee (id),
>    location_id int references location (id));
>
> create table preferred_shift (
>    employee_id int references employee (id),
>    shift int references shift (id));
>
> create table preferred_station (
>    employee_id int references employee (id),
>    station_id int references station (id));
>
> SECOND METHOD:
>
> create table preferences (
>
>    employee_id int references employee (id),
>    other_table_name text, /
>    other_table_id int));
>
> In the second method, I'd store tuples like this in the preferences
> table:
>
>    (<Alice's ID>, 'location', <west-side location ID>),
>    (<Alice's ID>, 'shift', <morning shift ID>)
>    (<Alice's ID>, 'station', <cash register station ID>)
>
> The nice thing about the second approach is I can extend this to store
> all sorts of preferences as I dream them up.  But on the downside, I
> don't have any FK constraints.
>
> I suspect this is a pretty common dilemma.  Any commentary from the
> experts on this list is welcome.
>
> Thanks in advance!
>
> Matt


I'm certainly not an expert, but hopefully my commentary will still be
somewhat helpful.

Your "method 2" is something called an Entity-Attribute-Value table
design[1].  There was a discussion on this list a couple weeks ago
about the merits and drawbacks of designing your tables this way.

Honestly, it probably depends on what your ultimate needs are.

As the "Downsides" section of the Wiki link [1] shows, most of the
problems with EAV really start to emerge when the tables get huge and
you're dealing with hundreds of thousands to millions of entities,
each with potentially hundreds of attribute-value pairs.  If you're
intending to roll out your application for every Starbucks on your
continent, that might start to be a problem.  (From my experience,
implementations like this over large data sets suffer a big
performance hit and carry a lot of data integrity baggage.)  If you're
talking about something for use in your chain of three internet cafes
around one town, and you aren't going to have more than a dozen
Attributes per Entity, it probably doesn't matter, because the
complications will be more manageable without screwing something up.

That said, by going the EAV/"Method-2" route, you're gaining
flexibility, but at the cost of increased complication, and ultimately
repurposing a relational database to do something that isn't very
database-like, that's really more like a spreadsheet.  (So why not
just use a spreadsheet?)  You have little room for recording
additional information, like ordering preferences, or indicating that
(say) a station preference depends on a location preference, or that a
shift time depends on day of the week, etc -- so you're probably not
getting as much flexibility as you think.  Sure, you could add an
"Extra_Data" column, so you have rows:
 Marie-Location-West-1,
 Marie-Location-East-2,
 Marie-Shift-Evening-Tuesday,
 Marie-Station-Register-West,
 Marie-Shift-Morning-Sunday,
etc.  But you can see the data integrity nightmare already, when you
somehow manage to record "Marie-Shift-Register-1".  Not to mention
that you'll have to do format conversions for that "Extra_Data" field,
and incorporate logic somewhere else in your program that deciphers
whatever's in the generic data field to come up with ordering
preferences for locations, station preferences by shift times, or
whatever else you want to store.

Essentially, in my humble opinion, you're putting off the problem of
thinking about the nature and structure of your data, and most
importantly, what you're going to use that data for, when ultimately
those are the decisions that should be guiding how you design and use
the database.  Particularly given that this sounds like a
management-efficiency project rather than one that your business (I'm
assuming it's your business) hinges upon, I am imagining that you have
time
to be sure about exactly what you want to do with the data.  If you
expect that your business needs will change dramatically over the
lifetime of the product, or you don't have time to make these
decisions now, then maybe the flexibility outweighs the drawbacks.

Good luck!


[1] See e.g. http://en.wikipedia.org/wiki/Entity-Attribute-Value_model

pgsql-general by date:

Previous
From: "Scott Marlowe"
Date:
Subject: Re: Need advice on PITR Slony(?) Backups etc.
Next
From: "Raj K"
Date:
Subject: Re: Reg: Permission error in Windows psql while trying to read sql commands from file