Thread: disjoint union types
Hi, I was after opinions as to the best way to lay tables out to get the effect of a "disjoint union" type (also known as a "tagged union"). When I have to do this at the moment, I'm creating a structure like: CREATE TABLE circle ( id SERIAL PRIMARY KEY, radius REAL NOT NULL ); CREATE TABLE square ( id SERIAL PRIMARY KEY, sidelen REAL NOT NULL ); CREATE TABLE shapes ( id SERIAL PRIMARY KEY, tag INTEGER NOT NULL, circleid INTEGER REFERENCES circle CHECK ((tag = 1) = (circleid IS NOT NULL)), squareid INTEGER REFERENCES square CHECK ((tag = 2) = (squareid IS NOT NULL)) ); I can then put data into this by doing: BEGIN; INSERT INTO circle (radius) VALUES (1); INSERT INTO shapes (tag,circleid) VALUES (1,currval('circle_id_seq')); COMMIT; This works, but it's sometimes a bit of a headache turning things around so they fit this structure. Are there standard solutions to this that work better? Thanks, Sam
On Oct 9, 2007, at 9:38 AM, Sam Mason wrote: > Hi, > > I was after opinions as to the best way to lay tables out to get the > effect of a "disjoint union" type (also known as a "tagged union"). > When I have to do this at the moment, I'm creating a structure like: > > CREATE TABLE circle ( id SERIAL PRIMARY KEY, radius REAL NOT > NULL ); > CREATE TABLE square ( id SERIAL PRIMARY KEY, sidelen REAL NOT > NULL ); > > CREATE TABLE shapes ( > id SERIAL PRIMARY KEY, > > tag INTEGER NOT NULL, > > circleid INTEGER REFERENCES circle > CHECK ((tag = 1) = (circleid IS NOT NULL)), > squareid INTEGER REFERENCES square > CHECK ((tag = 2) = (squareid IS NOT NULL)) > ); > > I can then put data into this by doing: > > BEGIN; > INSERT INTO circle (radius) VALUES (1); > INSERT INTO shapes (tag,circleid) VALUES (1,currval > ('circle_id_seq')); > COMMIT; > > This works, but it's sometimes a bit of a headache turning things > around > so they fit this structure. Are there standard solutions to this that > work better? You could use after triggers on your circle and shape tables to automatically make the insert into shapes for you. Erik Jones Software Developer | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
On Tue, Oct 09, 2007 at 10:30:15AM -0500, Erik Jones wrote: >On Oct 9, 2007, at 9:38 AM, Sam Mason wrote: >> CREATE TABLE circle ( id SERIAL PRIMARY KEY, radius REAL NOT NULL ); >> CREATE TABLE square ( id SERIAL PRIMARY KEY, sidelen REAL NOT NULL ); >> >> CREATE TABLE shapes ( >> id SERIAL PRIMARY KEY, >> >> tag INTEGER NOT NULL, >> >> circleid INTEGER REFERENCES circle >> CHECK ((tag = 1) = (circleid IS NOT NULL)), >> squareid INTEGER REFERENCES square >> CHECK ((tag = 2) = (squareid IS NOT NULL)) >> ); > > You could use after triggers on your circle and shape tables to > automatically make the insert into shapes for you. Yes, that helps a bit with getting data in. Doing anything generally useful with this data once it's there is still quite painful. If you've ever used a language supporting something like this natively then things get easier. In, say, Haskell you could do: data Shape = Circle Double | Square Double if I then wanted to get the area out I'd be able to do something like: area (Circle r) = pi * r ^ 2 area (Square l) = l ^ 2 mapping this over a list is easy. In SQL I'd need to do something much more complicated to get the areas of all these shapes out, maybe: SELECT s.id, CASE WHEN s.tag = 1 THEN pi() * r ^ 2 WHEN s.tag = 2 THEN l ^ 2 END AS area FROM shapes s LEFT JOIN circle c ON s.tag = 1 AND s.circleid = c.id LEFT JOIN square q ON s.tag = 2 AND s.squareid = q.id This is just a simple example, but if you've got a few of these structures to match up it starts to get complicated pretty quickly. Sam
On 10/10/07, Sam Mason <sam@samason.me.uk> wrote:
On Tue, Oct 09, 2007 at 10:30:15AM -0500, Erik Jones wrote:
>On Oct 9, 2007, at 9:38 AM, Sam Mason wrote:
>> CREATE TABLE circle ( id SERIAL PRIMARY KEY, radius REAL NOT NULL );
>> CREATE TABLE square ( id SERIAL PRIMARY KEY, sidelen REAL NOT NULL );
>>
>> CREATE TABLE shapes (
>> id SERIAL PRIMARY KEY,
>>
>> tag INTEGER NOT NULL,
>>
>> circleid INTEGER REFERENCES circle
>> CHECK ((tag = 1) = (circleid IS NOT NULL)),
>> squareid INTEGER REFERENCES square
>> CHECK ((tag = 2) = (squareid IS NOT NULL))
>> );
>
> You could use after triggers on your circle and shape tables to
> automatically make the insert into shapes for you.
Yes, that helps a bit with getting data in. Doing anything generally
useful with this data once it's there is still quite painful. If you've
ever used a language supporting something like this natively then things
get easier. In, say, Haskell you could do:
data Shape = Circle Double | Square Double
if I then wanted to get the area out I'd be able to do something like:
area (Circle r) = pi * r ^ 2
area (Square l) = l ^ 2
mapping this over a list is easy. In SQL I'd need to do something much
more complicated to get the areas of all these shapes out, maybe:
SELECT s.id,
CASE WHEN s.tag = 1 THEN pi() * r ^ 2
WHEN s.tag = 2 THEN l ^ 2 END AS area
FROM shapes s
LEFT JOIN circle c ON s.tag = 1 AND s.circleid = c.id
LEFT JOIN square q ON s.tag = 2 AND s.squareid = q.id
This is just a simple example, but if you've got a few of these
structures to match up it starts to get complicated pretty quickly.
Sam
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
I wonder if the best way to go would be to use the OO stuff.
If you had a "shapes" table, that had the various operations you were interested in (say area), then you could have a circle table inherit from that, and automatically compute the area with a trigger, as Erik suggested. Then you could just query shapes:
CREATE TABLE shapes (
shape_id serial PRIMARY KEY,
area real not null
);
shape_id serial PRIMARY KEY,
area real not null
);
CREATE TABLE circle (
radius real not null
) INHERITS (shapes);
CREATE FUNCTION circle_area() RETURNS trigger AS $circle_area$
BEGIN
NEW.area = pi() * NEW.radius ^ 2;
RETURN NEW;
END;
$circle_area$ LANGUAGE plpgsql;
CREATE TRIGGER circle_area BEFORE INSERT OR UPDATE ON circle
FOR EACH ROW EXECUTE PROCEDURE circle_area();
INSERT INTO circle (radius) values (5)
SELECT * FROM shapes
shape_id 1
area 78.5398
Ian Barber
On Wed, Oct 10, 2007 at 05:02:36PM +0100, Ian Barber wrote: > I wonder if the best way to go would be to use the OO stuff. I don't see how the following is object orientated, but I'm not sure it matters much. > If you had a "shapes" table, that had the various operations you were > interested in (say area) I prefer to keep everything as normalised as possible, if I start putting an area column somewhere then I'm forced to keep it updated somehow. The example I gave was rather simple, but I'd like to do lots of other things beside some (simple) calculations, e.g. matching tables up depending on the internal state of each object. > , then you could have a circle table inherit from > that, and automatically compute the area with a trigger, as Erik suggested. > Then you could just query shapes: > > CREATE TABLE shapes ( > shape_id serial PRIMARY KEY, > area real not null > ); > > CREATE TABLE circle ( > radius real not null > ) INHERITS (shapes); Postgres implements inheritance in a strange way (the way it is at the moment makes sense from an implementation, but not users', point of view), you can end up with a circle and square both with shape_id=1 if I don't take a lot of care. > CREATE FUNCTION circle_area() RETURNS trigger AS $circle_area$ > BEGIN > NEW.area = pi() * NEW.radius ^ 2; > RETURN NEW; > END; > $circle_area$ LANGUAGE plpgsql; > > CREATE TRIGGER circle_area BEFORE INSERT OR UPDATE ON circle > FOR EACH ROW EXECUTE PROCEDURE circle_area(); > > INSERT INTO circle (radius) values (5) > SELECT * FROM shapes > > shape_id 1 > area 78.5398 This works to store the area of the shape, but doesn't allow me to work with work with more complicated structures. I'll try and think up a better example and send it along to the list when I can describe it. Thanks, Sam
On Oct 10, 2007, at 11:42 AM, Sam Mason wrote: > On Wed, Oct 10, 2007 at 05:02:36PM +0100, Ian Barber wrote: >> I wonder if the best way to go would be to use the OO stuff. > > I don't see how the following is object orientated, but I'm not > sure it > matters much. > >> If you had a "shapes" table, that had the various operations you were >> interested in (say area) > > I prefer to keep everything as normalised as possible, if I start > putting an area column somewhere then I'm forced to keep it updated > somehow. The example I gave was rather simple, but I'd like to do > lots > of other things beside some (simple) calculations, e.g. matching > tables > up depending on the internal state of each object. > >> , then you could have a circle table inherit from >> that, and automatically compute the area with a trigger, as Erik >> suggested. >> Then you could just query shapes: >> >> CREATE TABLE shapes ( >> shape_id serial PRIMARY KEY, >> area real not null >> ); >> >> CREATE TABLE circle ( >> radius real not null >> ) INHERITS (shapes); > > Postgres implements inheritance in a strange way (the way it is at the > moment makes sense from an implementation, but not users', point of > view), you can end up with a circle and square both with shape_id=1 > if I > don't take a lot of care. It doesn't take much care at all to avoid that: don't use SERIAL for the primary key of the parent. Instead use an explicity "id integer NOT NULL DEFAULT nextval('some_seq'), that way all of the child tables will use the same sequence as the parent. That being said, I'm not convinced that table inheritance is what's needed here. I'll wait until you post the example you mention below before commenting further. > >> CREATE FUNCTION circle_area() RETURNS trigger AS $circle_area$ >> BEGIN >> NEW.area = pi() * NEW.radius ^ 2; >> RETURN NEW; >> END; >> $circle_area$ LANGUAGE plpgsql; >> >> CREATE TRIGGER circle_area BEFORE INSERT OR UPDATE ON circle >> FOR EACH ROW EXECUTE PROCEDURE circle_area(); >> >> INSERT INTO circle (radius) values (5) >> SELECT * FROM shapes >> >> shape_id 1 >> area 78.5398 > > This works to store the area of the shape, but doesn't allow me to > work > with work with more complicated structures. I'll try and think up a > better example and send it along to the list when I can describe it. Erik Jones Software Developer | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
On Wed, Oct 10, 2007 at 12:10:10PM -0500, Erik Jones wrote: > On Oct 10, 2007, at 11:42 AM, Sam Mason wrote: >> On Wed, Oct 10, 2007 at 05:02:36PM +0100, Ian Barber wrote: >>> CREATE TABLE shapes ( >>> shape_id serial PRIMARY KEY, >>> area real not null >>> ); >>> >>> CREATE TABLE circle ( >>> radius real not null >>> ) INHERITS (shapes); >> >> Postgres implements inheritance in a strange way (the way it is at the >> moment makes sense from an implementation, but not users', point of >> view), you can end up with a circle and square both with shape_id=1 if I >> don't take a lot of care. > > It doesn't take much care at all to avoid that: don't use SERIAL for the > primary key of the parent. Instead use an explicity "id integer NOT NULL > DEFAULT nextval('some_seq'), that way all of the child tables will use the > same sequence as the parent. That being said, I'm not convinced that table > inheritance is what's needed here. I'll wait until you post the example > you mention below before commenting further. The main problem I have with inheritance is that the unique constraints are only checked per-table. If I have: CREATE TABLE a ( id INTEGER NOT NULL PRIMARY KEY ); CREATE TABLE b ( ) inherits a; INSERT INTO a VALUES (1); I can run this without anything having any complaints. INSERT INTO b SELECT * FROM a; If I'm careful about getting data into the database then this isn't a problem, but, to me, this is exactly the thing I shouldn't have to be careful about because it's the database's job to keep track of this. As I said before, I understand, from an implementation point of view, why it has this behaviour; it just isn't very nice from a users'. The shape example I used before was me attempting to simplify the problem, I think I took things too far. My original motivation for wanting to do this sort of thing was a (sort of) stock control problem. I've got various types of items that I want to store information about. I want to treat the union of these different types in a uniform manner, for example I keep track of whether they are in their canonical location or are "checked out" and in use. To do this I had something like: CREATE TABLE books ( id SERIAL PRIMARY KEY, name TEXT ); CREATE TABLE computer ( id SERIAL PRIMARY KEY, name TEXT ); CREATE TABLE stock ( id SERIAL PRIMARY KEY, tag INTEGER NOT NULL, bookid INTEGER REFERENCES books CHECK ((tag = 1) = (bookid IS NOT NULL)), computerid INTEGER REFERENCES computer CHECK ((tag = 2) = (computerid IS NOT NULL)), barcode TEXT UNIQUE ); This, for example, allows me to rely on the database to check that the barcode uniquely identifies each piece of equipment. It also doesn't require touching more than is needed when scanning for the actual item's detail as the tag is there to specify where to look. We've recently added barcodes to non-"stock" items and it took me a while to realise that a similar rotation of things allows me to store a single barcode in a similar way. For example, I could drop the barcode from the stock table and table like: CREATE TABLE barcodes ( id SERIAL PRIMARY KEY, tag INTEGER NOT NULL, stockid INTEGER REFERENCES stock CHECK ((tag = 1) = (stockid IS NOT NULL)), plateid INTEGER REFERENCES testplates CHECK ((tag = 2) = (plateid IS NOT NULL)), barcode TEXT UNIQUE ); and I can get back to wherever I want to. The annoyance is that it's a bit of a fiddle, schema wise, to do this translation. So that's my reason for asking if there were other ways of doing this sort of thing. Sam