Thread: disjoint union types

disjoint union types

From

Sam Mason

Date:

09 October 2007, 15:18:37

Hi,

I was after opinions as to the best way to lay tables out to get the
effect of a "disjoint union" type (also known as a "tagged union").
When I have to do this at the moment, I'm creating a structure like:

  CREATE TABLE circle ( id SERIAL PRIMARY KEY, radius  REAL NOT NULL );
  CREATE TABLE square ( id SERIAL PRIMARY KEY, sidelen REAL NOT NULL );

  CREATE TABLE shapes (
    id SERIAL PRIMARY KEY,

    tag INTEGER NOT NULL,

    circleid INTEGER REFERENCES circle
      CHECK ((tag = 1) = (circleid IS NOT NULL)),
    squareid INTEGER REFERENCES square
      CHECK ((tag = 2) = (squareid IS NOT NULL))
  );

I can then put data into this by doing:

  BEGIN;
  INSERT INTO circle (radius) VALUES (1);
  INSERT INTO shapes (tag,circleid) VALUES (1,currval('circle_id_seq'));
  COMMIT;

This works, but it's sometimes a bit of a headache turning things around
so they fit this structure.  Are there standard solutions to this that
work better?


Thanks,
  Sam

Re: disjoint union types

From

Erik Jones

Date:

09 October 2007, 15:31:06

On Oct 9, 2007, at 9:38 AM, Sam Mason wrote:

> Hi,
>
> I was after opinions as to the best way to lay tables out to get the
> effect of a "disjoint union" type (also known as a "tagged union").
> When I have to do this at the moment, I'm creating a structure like:
>
>   CREATE TABLE circle ( id SERIAL PRIMARY KEY, radius  REAL NOT
> NULL );
>   CREATE TABLE square ( id SERIAL PRIMARY KEY, sidelen REAL NOT
> NULL );
>
>   CREATE TABLE shapes (
>     id SERIAL PRIMARY KEY,
>
>     tag INTEGER NOT NULL,
>
>     circleid INTEGER REFERENCES circle
>       CHECK ((tag = 1) = (circleid IS NOT NULL)),
>     squareid INTEGER REFERENCES square
>       CHECK ((tag = 2) = (squareid IS NOT NULL))
>   );
>
> I can then put data into this by doing:
>
>   BEGIN;
>   INSERT INTO circle (radius) VALUES (1);
>   INSERT INTO shapes (tag,circleid) VALUES (1,currval
> ('circle_id_seq'));
>   COMMIT;
>
> This works, but it's sometimes a bit of a headache turning things
> around
> so they fit this structure.  Are there standard solutions to this that
> work better?

You could use after triggers on your circle and shape tables to
automatically make the insert into shapes for you.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

Re: disjoint union types

From

Sam Mason

Date:

10 October 2007, 14:02:46

On Tue, Oct 09, 2007 at 10:30:15AM -0500, Erik Jones wrote:
>On Oct 9, 2007, at 9:38 AM, Sam Mason wrote:
>>   CREATE TABLE circle ( id SERIAL PRIMARY KEY, radius  REAL NOT NULL );
>>   CREATE TABLE square ( id SERIAL PRIMARY KEY, sidelen REAL NOT NULL );
>>
>>   CREATE TABLE shapes (
>>     id SERIAL PRIMARY KEY,
>>
>>     tag INTEGER NOT NULL,
>>
>>     circleid INTEGER REFERENCES circle
>>       CHECK ((tag = 1) = (circleid IS NOT NULL)),
>>     squareid INTEGER REFERENCES square
>>       CHECK ((tag = 2) = (squareid IS NOT NULL))
>>   );
>
> You could use after triggers on your circle and shape tables to
> automatically make the insert into shapes for you.

Yes, that helps a bit with getting data in.  Doing anything generally
useful with this data once it's there is still quite painful.  If you've
ever used a language supporting something like this natively then things
get easier.  In, say, Haskell you could do:

  data Shape = Circle Double | Square Double

if I then wanted to get the area out I'd be able to do something like:

  area (Circle r) = pi * r ^ 2
  area (Square l) = l ^ 2

mapping this over a list is easy.  In SQL I'd need to do something much
more complicated to get the areas of all these shapes out, maybe:

  SELECT s.id,
    CASE WHEN s.tag = 1 THEN pi() * r ^ 2
         WHEN s.tag = 2 THEN l ^ 2 END AS area
  FROM shapes s
    LEFT JOIN circle c ON s.tag = 1 AND s.circleid = c.id
    LEFT JOIN square q ON s.tag = 2 AND s.squareid = q.id

This is just a simple example, but if you've got a few of these
structures to match up it starts to get complicated pretty quickly.

  Sam

Re: disjoint union types

From

"Ian Barber"

Date:

10 October 2007, 16:02:45

On 10/10/07, Sam Mason <sam@samason.me.uk> wrote:

On Tue, Oct 09, 2007 at 10:30:15AM -0500, Erik Jones wrote:
>On Oct 9, 2007, at 9:38 AM, Sam Mason wrote:
>>   CREATE TABLE circle ( id SERIAL PRIMARY KEY, radius  REAL NOT NULL );
>>   CREATE TABLE square ( id SERIAL PRIMARY KEY, sidelen REAL NOT NULL );
>>
>>   CREATE TABLE shapes (
>>     id SERIAL PRIMARY KEY,
>>
>>     tag INTEGER NOT NULL,
>>
>>     circleid INTEGER REFERENCES circle
>>       CHECK ((tag = 1) = (circleid IS NOT NULL)),
>>     squareid INTEGER REFERENCES square
>>       CHECK ((tag = 2) = (squareid IS NOT NULL))
>>   );
>
> You could use after triggers on your circle and shape tables to
> automatically make the insert into shapes for you.

Yes, that helps a bit with getting data in.  Doing anything generally
useful with this data once it's there is still quite painful.  If you've
ever used a language supporting something like this natively then things
get easier.  In, say, Haskell you could do:

data Shape = Circle Double | Square Double

if I then wanted to get the area out I'd be able to do something like:

area (Circle r) = pi * r ^ 2
area (Square l) = l ^ 2

mapping this over a list is easy.  In SQL I'd need to do something much
more complicated to get the areas of all these shapes out, maybe:

SELECT s.id,
   CASE WHEN s.tag = 1 THEN pi() * r ^ 2
        WHEN s.tag = 2 THEN l ^ 2 END AS area
FROM shapes s
   LEFT JOIN circle c ON s.tag = 1 AND s.circleid = c.id
   LEFT JOIN square q ON s.tag = 2 AND s.squareid = q.id

This is just a simple example, but if you've got a few of these
structures to match up it starts to get complicated pretty quickly.

Sam

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

I wonder if the best way to go would be to use the OO stuff.

If you had a "shapes" table, that had the various operations you were interested in (say area), then you could have a circle table inherit from that, and automatically compute the area with a trigger, as Erik suggested. Then you could just query shapes:

CREATE TABLE shapes (
shape_id serial PRIMARY KEY,
area real not null
);

CREATE TABLE circle (
radius real not null
) INHERITS (shapes);

CREATE FUNCTION circle_area() RETURNS trigger AS $circle_area$
    BEGIN
    NEW.area = pi() * NEW.radius ^ 2;
        RETURN NEW;
    END;
$circle_area$ LANGUAGE plpgsql;

CREATE TRIGGER circle_area BEFORE INSERT OR UPDATE ON circle
FOR EACH ROW EXECUTE PROCEDURE circle_area();

INSERT INTO circle (radius) values (5)

SELECT * FROM shapes

shape_id 1

area 78.5398

Ian Barber

Re: disjoint union types

From

Sam Mason

Date:

10 October 2007, 16:42:49

On Wed, Oct 10, 2007 at 05:02:36PM +0100, Ian Barber wrote:
> I wonder if the best way to go would be to use the OO stuff.

I don't see how the following is object orientated, but I'm not sure it
matters much.

> If you had a "shapes" table, that had the various operations you were
> interested in (say area)

I prefer to keep everything as normalised as possible, if I start
putting an area column somewhere then I'm forced to keep it updated
somehow.  The example I gave was rather simple, but I'd like to do lots
of other things beside some (simple) calculations, e.g. matching tables
up depending on the internal state of each object.

> , then you could have a circle table inherit from
> that, and automatically compute the area with a trigger, as Erik suggested.
> Then you could just query shapes:
>
> CREATE TABLE shapes (
>  shape_id serial PRIMARY KEY,
>  area real not null
> );
>
> CREATE TABLE circle (
>  radius real not null
> ) INHERITS (shapes);

Postgres implements inheritance in a strange way (the way it is at the
moment makes sense from an implementation, but not users', point of
view), you can end up with a circle and square both with shape_id=1 if I
don't take a lot of care.

> CREATE FUNCTION circle_area() RETURNS trigger AS $circle_area$
>     BEGIN
>      NEW.area = pi() * NEW.radius ^ 2;
>         RETURN NEW;
>     END;
> $circle_area$ LANGUAGE plpgsql;
>
> CREATE TRIGGER circle_area BEFORE INSERT OR UPDATE ON circle
>     FOR EACH ROW EXECUTE PROCEDURE circle_area();
>
> INSERT INTO circle (radius) values (5)
> SELECT * FROM shapes
>
> shape_id 1
> area 78.5398

This works to store the area of the shape, but doesn't allow me to work
with work with more complicated structures.  I'll try and think up a
better example and send it along to the list when I can describe it.

Thanks,
  Sam

Re: disjoint union types

From

Erik Jones

Date:

10 October 2007, 17:10:40

On Oct 10, 2007, at 11:42 AM, Sam Mason wrote:

> On Wed, Oct 10, 2007 at 05:02:36PM +0100, Ian Barber wrote:
>> I wonder if the best way to go would be to use the OO stuff.
>
> I don't see how the following is object orientated, but I'm not
> sure it
> matters much.
>
>> If you had a "shapes" table, that had the various operations you were
>> interested in (say area)
>
> I prefer to keep everything as normalised as possible, if I start
> putting an area column somewhere then I'm forced to keep it updated
> somehow.  The example I gave was rather simple, but I'd like to do
> lots
> of other things beside some (simple) calculations, e.g. matching
> tables
> up depending on the internal state of each object.
>
>> , then you could have a circle table inherit from
>> that, and automatically compute the area with a trigger, as Erik
>> suggested.
>> Then you could just query shapes:
>>
>> CREATE TABLE shapes (
>>  shape_id serial PRIMARY KEY,
>>  area real not null
>> );
>>
>> CREATE TABLE circle (
>>  radius real not null
>> ) INHERITS (shapes);
>
> Postgres implements inheritance in a strange way (the way it is at the
> moment makes sense from an implementation, but not users', point of
> view), you can end up with a circle and square both with shape_id=1
> if I
> don't take a lot of care.

It doesn't take much care at all to avoid that:  don't use SERIAL for
the primary key of the parent.  Instead use an explicity "id integer
NOT NULL DEFAULT nextval('some_seq'), that way all of the child
tables will use the same sequence as the parent.  That being said,
I'm not convinced that table inheritance is what's needed here.  I'll
wait until you post the example you mention below before commenting
further.

>
>> CREATE FUNCTION circle_area() RETURNS trigger AS $circle_area$
>>     BEGIN
>>      NEW.area = pi() * NEW.radius ^ 2;
>>         RETURN NEW;
>>     END;
>> $circle_area$ LANGUAGE plpgsql;
>>
>> CREATE TRIGGER circle_area BEFORE INSERT OR UPDATE ON circle
>>     FOR EACH ROW EXECUTE PROCEDURE circle_area();
>>
>> INSERT INTO circle (radius) values (5)
>> SELECT * FROM shapes
>>
>> shape_id 1
>> area 78.5398
>
> This works to store the area of the shape, but doesn't allow me to
> work
> with work with more complicated structures.  I'll try and think up a
> better example and send it along to the list when I can describe it.



Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

Re: disjoint union types

From

Sam Mason

Date:

12 October 2007, 13:46:32

On Wed, Oct 10, 2007 at 12:10:10PM -0500, Erik Jones wrote:
> On Oct 10, 2007, at 11:42 AM, Sam Mason wrote:
>> On Wed, Oct 10, 2007 at 05:02:36PM +0100, Ian Barber wrote:
>>> CREATE TABLE shapes (
>>>  shape_id serial PRIMARY KEY,
>>>  area real not null
>>> );
>>>
>>> CREATE TABLE circle (
>>>  radius real not null
>>> ) INHERITS (shapes);
>>
>> Postgres implements inheritance in a strange way (the way it is at the
>> moment makes sense from an implementation, but not users', point of
>> view), you can end up with a circle and square both with shape_id=1 if I
>> don't take a lot of care.
>
> It doesn't take much care at all to avoid that:  don't use SERIAL for the
> primary key of the parent.  Instead use an explicity "id integer NOT NULL
> DEFAULT nextval('some_seq'), that way all of the child tables will use the
> same sequence as the parent.  That being said, I'm not convinced that table
> inheritance is what's needed here.  I'll wait until you post the example
> you mention below before commenting further.

The main problem I have with inheritance is that the unique constraints
are only checked per-table.  If I have:

  CREATE TABLE a ( id INTEGER NOT NULL PRIMARY KEY );
  CREATE TABLE b ( ) inherits a;
  INSERT INTO a VALUES (1);

I can run this without anything having any complaints.

  INSERT INTO b
    SELECT * FROM a;

If I'm careful about getting data into the database then this isn't a
problem, but, to me, this is exactly the thing I shouldn't have to be
careful about because it's the database's job to keep track of this.  As
I said before, I understand, from an implementation point of view, why
it has this behaviour; it just isn't very nice from a users'.

The shape example I used before was me attempting to simplify the
problem, I think I took things too far.  My original motivation for
wanting to do this sort of thing was a (sort of) stock control problem.
I've got various types of items that I want to store information about.
I want to treat the union of these different types in a uniform manner,
for example I keep track of whether they are in their canonical location
or are "checked out" and in use.

To do this I had something like:

  CREATE TABLE books ( id SERIAL PRIMARY KEY, name TEXT );
  CREATE TABLE computer ( id SERIAL PRIMARY KEY, name TEXT );

  CREATE TABLE stock (
    id SERIAL PRIMARY KEY,
    tag INTEGER NOT NULL,
    bookid INTEGER REFERENCES books
      CHECK ((tag = 1) = (bookid IS NOT NULL)),
    computerid INTEGER REFERENCES computer
      CHECK ((tag = 2) = (computerid IS NOT NULL)),
    barcode TEXT UNIQUE
  );

This, for example, allows me to rely on the database to check that the
barcode uniquely identifies each piece of equipment.  It also doesn't
require touching more than is needed when scanning for the actual item's
detail as the tag is there to specify where to look.  We've recently
added barcodes to non-"stock" items and it took me a while to realise
that a similar rotation of things allows me to store a single barcode
in a similar way.  For example, I could drop the barcode from the stock
table and table like:

  CREATE TABLE barcodes (
    id SERIAL PRIMARY KEY,
    tag INTEGER NOT NULL,
    stockid INTEGER REFERENCES stock
      CHECK ((tag = 1) = (stockid IS NOT NULL)),
    plateid INTEGER REFERENCES testplates
      CHECK ((tag = 2) = (plateid IS NOT NULL)),
    barcode TEXT UNIQUE
  );

and I can get back to wherever I want to.  The annoyance is that it's
a bit of a fiddle, schema wise, to do this translation.  So that's my
reason for asking if there were other ways of doing this sort of thing.

  Sam