Re: There can be only one! How to avoid the "highlander-problem". - Mailing list pgsql-general
From | Erwin Brandstetter |
---|---|
Subject | Re: There can be only one! How to avoid the "highlander-problem". |
Date | |
Msg-id | 1181008482.859185.111800@q69g2000hsb.googlegroups.com Whole thread Raw |
In response to | Re: There can be only one! How to avoid the "highlander-problem". (Lew <lew@nospam.lewscanon.com>) |
Responses |
Re: There can be only one! How to avoid the "highlander-problem".
Re: There can be only one! How to avoid the "highlander-problem". |
List | pgsql-general |
Hi Lew! Thank you for your comments. I have elaborated on them. On Jun 3, 7:22 pm, Lew <l...@nospam.lewscanon.com> wrote: (...) > The trouble with this is that it models "kingship" as an attribute of every > man. (What, no female rulers allowed?) Yeah, saddening, isn't it? Actually, for simplicity's sake I restricted my model to a "male, monarchistic world". > The overhead of being "not king" is > carried in every "mankind" record. This may suffice for your particular model, > but if you were designing for evolution you'd have a problem. Every new > attribute of "mankind" would need a new column in the table - "isDuke", > "isNoble", "isHogSlopCleaner". You are right, of course. (I switch to "nation" instead of "people" in my examples like you did, as the term seems clearer.) However, in your SQL model, you extracted nationality instead of kingship. If every man has to be member of exactly one nation (which I postulate), nationality can reside with the man. (we need man.nation_id instead of nation.man_id) That leaves only the kingship to be allocated. I postulate further that a king only be king of his own people (rules out multiple kingships, too). So the "king" needs only to have 1 attribute: man_id. To make room for other roles, as you mentioned, I include a role_id. However, roles must be as unique like the kingship. To enforce uniqueness of one king (or other role) per nation I include the seemingly redundant nation_id and impose a UNIQUE (nation_id, role_id) on it. To enforce that a man can only become king of his own people, I wrap both (man_id, nation_id) in a FOREIGN KEY constraint on "man". PostgreSQL therefore requires a corresponding (redundant) UNIQUE (nation_id, role_id) on "man". !NOTE that I do NOT reference table "nation", so we have no circular foreign-key constraints! 0.) Lets number the models: CREATE TABLE nation ( nation_id INTEGER PRIMARY KEY ); CREATE TABLE man ( man_id INTEGER PRIMARY KEY, nation_id INTEGER NOT NULL REFERENCES nation (nation_id) ON UPDATE CASCADE ON DELETE CASCADE ); CREATE TABLE role -- "role" is non-reserved word in postgresql or SQL2003, but reserved in SQL99 ( man_id INTEGER PRIMARY KEY REFERENCES man (man_id) ON UPDATE CASCADE ON DELETE CASCADE, nation_id INTEGER, role_id INTEGER, UNIQUE (nation_id, role_id) FOREIGN KEY (man_id, nation_id) REFERENCES man (man_id, nation_id) ON UPDATE CASCADE ON DELETE CASCADE ); This makes sense if we have a lot of men per nation and an unknown number of unique roles per nation. I will simplify this model step by step now, along with simplified conditions: 1.) First, lets get rid of multiple roles. My model only needs kingship. So I replace table "role" with the following table "king" (the rest is the same). : CREATE TABLE king ( king_id INTEGER PRIMARY KEY REFERENCES man (man_id) ON UPDATE CASCADE ON DELETE CASCADE, nation_id INTEGER UNIQUE, FOREIGN KEY (man_id, nation_id) REFERENCES man (man_id, nation_id) ON UPDATE CASCADE ON DELETE CASCADE ); 2.) Now we can further simplify the structure. Skip the table "king" and merge kingship as an attribute into table "man". This makes sense with one (or a small number of ) known role(s). Adds a field to _every_ man and gets rid of one tuple per king and the overhead for that extra table. Whether this is preferable over 1.) depends on the typical number of men per nation. If there is more than just a few, you should stick to 1.). If there is only a few, however, you gain something. Note, how we reference nation(nation_id) twice (!), but only one time is NOT NULL. We are still avoiding circular references. CREATE TABLE man ( man_id INTEGER PRIMARY KEY, nation_id INTEGER NOT NULL REFERENCES nation (nation_id) ON UPDATE CASCADE ON DELETE CASCADE, king_id INTEGER UNIQUE REFERENCES nation (nation_id) ON UPDATE CASCADE ON DELETE CASCADE, CHECK ((nation_id = king_id)) -- needed to make sure a man can only become king of his own people. ); 3.) As an improvement over 2.) we can merge kingship into nation (as you suggested). Note the "ON DELETE SET NULL" clause, that allows a king to die. Actually I would pass on kingship to another man (or NULL if none are left) per trigger, much like in my initial post: "trg_mankind_delaft()". Note also that king_id isn't "NOT NULL", so we need to be prepared for nations without a king (king_id IS NULL). To enforce a king we'd set it "NOT NULL DEFAULT 0", but then we'd need a dummy man with man_id = 0 to serve referential integrity and that's where the circular references begin to bite. Because the dummy man needs a nation first. This could only be solved by entering a dummy nation and a dummy man before enforcing referential integrity. We also need triggers BEFORE INSERT AND UPDATE to check that the king is member of the nation IF NEW.king_id IS NOT NULL AND nation_id IS DISTINCT FROM NEW.nation_id FROM man WHERE man_id = NEW.king_id THEN RAISE EXCEPTION 'Usurper!'; END IF; Now we have to store only one field per nation and not per man. CREATE TABLE nation ( nation_id INTEGER PRIMARY KEY king_id INTEGER REFERENCES man (man_id) ON UPDATE CASCADE ON DELETE SET NULL ); CREATE TABLE man ( man_id INTEGER PRIMARY KEY, nation_id INTEGER NOT NULL REFERENCES nation (nation_id) ON UPDATE CASCADE ON DELETE CASCADE ); 4.) Finally, we can simplify the model 2.) in another way and avoid the circular references. As man.king_id only ever references the same nation.nation_id as man.nation_id, we can substitute the integer with a (smaller) boolean and get rid of the FOREIGN KEY constraint and the CHECK constraint. We still need to take care of the UNIQUEness of kingship per nation, though. That's where my initially posted code example comes in. Also the overhead of being "not king" has to be carried by every man instead of every nation, as you mentioned. BUT in my case, there are fewer men than nations - only 0.7 per nation on average. Yeah, more than half of the "nations" have no men, and no nation has more than handful. That's where the analogy to "nation" fails and my example is a bit misleading. I have chosen it because everyone understands the "highlander problem" without much explanation: there can be only one. :) I think, all of the four models make sense under the right circumstances. 3.) or 4.) are my choices. I have implemented 3.) a number of times in the past. It works, but you have to pay attention to the circular references. Now I have come up with the new 4.) and it works like a charm. So far. Even more so, as I have _many_ SELECTs (often only including "nation" and not so often also "man"), few INSERTs and DELETEs, and rare UPDATEs. In the operations on "man", >99% of the INSERTs only need to operate on one tuple, and all of them only need to manipulate the much smaller "man"-table. As there is more trigger work in 4.) versus more checking of referential integrity in 3.), I am not really sure, which one is the winner, though. Overhead for UPDATEs or SELECTs should be roughly the same as for 3.). As for SELECTS, 3.) should be slightly faster than 4.), but no real nation join man on nation.king_id = man.man_id nation join man USING (nation_id) WHERE king And storage (in my case): 0.7 x 1 Byte clearly beats 1 x 4 Bytes. (half of which are NULL though, so use only a bit of the NULL-bitmask) OK, I may have gotten carried away. Hopefully all of this is of any use to anyone - or at least of interest to you, Lew. Regards Erwin
pgsql-general by date: