Thread: Feature Proposal: Constant Values in Columns or Foreign Keys
Hi! I have a little feature proposal. Let me try to explain the motivation behind it. Suppose our application has two types of objects, looking somewhat like this: abstract class Base { public int Id; public int SomeData; } class Derived1 : Base { public int Data1; } class Derived2 : Base { public string Data2; } There are many ways of modeling this in a relational database. I am interested in this one: CREATE TYPE derived_type AS ENUM ('derived1', 'derived2); CREATE TABLE base ( id int PRIMARY KEY, some_data int NOT NULL, type derived_type NOT NULL ); CREATE UNIQUE INDEX base_derived_type_index ON base (id, derived_type); CREATE TABLE derived1 ( id int PRIMARY KEY, data1 int NOT NULL, type derived_type NOT NULL CHECK (type = 'derived1'), FOREIGN KEY (id, type) REFERENCES base (id, type) ON DELETE CASCADE ) CREATE TABLE derived2 ( id int PRIMARY KEY, data2 text NOT NULL, type derived_type NOT NULL CHECK (type = 'derived2'), FOREIGN KEY (id, type) REFERENCES base (id, type) ON DELETE CASCADE ) Note that the type column in derived1 and derived2 ensures that there is at most one row in either derived1 or derived2 which refers to a given row in "base". This works fine, actually. What bugs me, though, is the redundant data in the type columns of derived1 and derived2. It would be nice if I could either declare the columns as constant (so the data wouldn't be stored on disk anymore), or (even better) use a constant value in the foreign keys, as in FOREIGN KEY (id, 'derived2') REFERENCES base (id, type) ON DELETE CASCADE In the latter case, I could omit the type column of derived1 and derived2 altogether. I suspect that it wouldn't be terribly hard to implement this. What do you think? Wouldn't this be nice to have? Regards, -- Nils Gösche "Don't ask for whom the <CTRL-G> tolls."
Hi,
Regards,
Bartek
how about inheritance in postgres?
CREATE TABLE "tblBase"
(
id serial NOT NULL, -- serial type is my assumption.
"SomeData" integer,
CONSTRAINT "tblBase_pkey" PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
CREATE TABLE "tblDerived1"
(
-- Inherited from table "tblBase": id integer NOT NULL DEFAULT nextval('"tblBase_id_seq"'::regclass),
-- Inherited from table "tblBase": "SomeData" integer,
"Data1" integer,
CONSTRAINT "tblDerived1_pkey" PRIMARY KEY (id )
)
INHERITS ("tblBase")
WITH (
OIDS=FALSE
);
CREATE TABLE "tblDerived2"
(
-- Inherited from table "tblBase": id integer NOT NULL DEFAULT nextval('"tblBase_id_seq"'::regclass),
-- Inherited from table "tblBase": "SomeData" integer,
"Data2" text,
CONSTRAINT "tblDerived2_pkey" PRIMARY KEY (id )
)
INHERITS ("tblBase")
WITH (
OIDS=FALSE
);
inheritance is described in doc here: http://www.postgresql.org/docs/9.1/static/ddl-inherit.html
With this approach all IDs will use the same sequence so there will not be duplicated PKs in inherited tables.
This could be also modeled with "standard" SQL approach without redundant information. Solution depends on requirements.
Regards,
Bartek
2012/4/17 Nils Gösche <cartan@cartan.de>
Hi!
I have a little feature proposal. Let me try to explain the motivation
behind it.
Suppose our application has two types of objects, looking somewhat like
this:
abstract class Base
{
public int Id;
public int SomeData;
}
class Derived1 : Base
{
public int Data1;
}
class Derived2 : Base
{
public string Data2;
}
There are many ways of modeling this in a relational database. I am
interested in this one:
CREATE TYPE derived_type AS ENUM ('derived1', 'derived2);
CREATE TABLE base (
id int PRIMARY KEY,
some_data int NOT NULL,
type derived_type NOT NULL
);
CREATE UNIQUE INDEX base_derived_type_index ON base (id, derived_type);
CREATE TABLE derived1 (
id int PRIMARY KEY,
data1 int NOT NULL,
type derived_type NOT NULL CHECK (type = 'derived1'),
FOREIGN KEY (id, type) REFERENCES base (id, type) ON DELETE CASCADE
)
CREATE TABLE derived2 (
id int PRIMARY KEY,
data2 text NOT NULL,
type derived_type NOT NULL CHECK (type = 'derived2'),
FOREIGN KEY (id, type) REFERENCES base (id, type) ON DELETE CASCADE
)
Note that the type column in derived1 and derived2 ensures that there is at
most one row in either derived1 or derived2 which refers to a given row in
"base".
This works fine, actually. What bugs me, though, is the redundant data in
the type columns of derived1 and derived2. It would be nice if I could
either declare the columns as constant (so the data wouldn't be stored on
disk anymore), or (even better) use a constant value in the foreign keys, as
in
FOREIGN KEY (id, 'derived2') REFERENCES base (id, type) ON DELETE
CASCADE
In the latter case, I could omit the type column of derived1 and derived2
altogether.
I suspect that it wouldn't be terribly hard to implement this. What do you
think? Wouldn't this be nice to have?
Regards,
--
Nils Gösche
"Don't ask for whom the <CTRL-G> tolls."
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Bartosz Dmytrak wrote: > how about inheritance in postgres? I know about Postgres' inheritance feature, but would prefer a more standard relational solution. > With this approach all IDs will use the same sequence so there will not be duplicated PKs in inherited tables. In my case, the primary keys are usually uuids, actually. It was just an example, anyway. > This could be also modeled with "standard" SQL approach without redundant information. Solution depends on requirements. Well, it could be modeled in the way I described, if only I could use constant values in foreign keys: FOREIGN KEY (id, 'derived2') REFERENCES base (id, type) ON DELETE CASCADE I was quite surprised to find that this wasn't possible. Is there any good reason why not? The reason I like this particular way of modeling the data is that I have a guarantee that there won't be an entry in bothderived tables at the same time for the same row in the base table; also, I can have further constraints and foreignkeys from and to the base table. Of course, I could also omit the type field and simply live with the possibility of having two rows in the derived tablesreferring to the same row of the base table. But it would be nice if I could rule that out with simple constraints. Regards, -- Nils Gösche Don't ask for whom the <Ctrl-G> tolls.
Attachment
=?UTF-8?Q?Nils_G=C3=B6sche?= <cartan@cartan.de> writes: > Bartosz Dmytrak wrote: >> how about inheritance in postgres? > I know about Postgres' inheritance feature, but would prefer a more standard relational solution. [ blink... ] That seems like a pretty silly argument for proposing something that is *more* nonstandard. > I was quite surprised to find that this wasn't possible. Is there any good reason why not? It's contrary to SQL standard is why not. And it's not just a matter of being outside the spec, as inheritance is; this is messing with the details of something that is defined in the standard. As an example, I would wonder how such an FK is supposed to be represented in the spec-defined information schema views. Other interesting questions include what would happen to the supposed constant during actions such as ON DELETE SET NULL or ON UPDATE CASCADE, which normally would result in a change in the referencing row. regards, tom lane
Tom Lane wrote: > =?UTF-8?Q?Nils_G=C3=B6sche?= <cartan@cartan.de> writes: > > I was quite surprised to find that this wasn't possible. Is there any > > good reason why not? > > It's contrary to SQL standard is why not. And it's not just a matter > of > being outside the spec, as inheritance is; this is messing with the > details of something that is defined in the standard. As an example, > I would wonder how such an FK is supposed to be represented in the > spec-defined information schema views. Other interesting questions > include what would happen to the supposed constant during actions such > as ON DELETE SET NULL or ON UPDATE CASCADE, which normally would result > in a change in the referencing row. Well, something I can have already is a column together with a NOT NULL constraint, and a CHECK constraint that ensures thatthe value is really constant, and a foreign key that includes this constant column. You could just handle it the sameway. The information schema is a good point, though. If the only way to put this into the information schema is to introduce somedummy column, possibly with NULL as a name or something, then I don't like the idea much myself anymore. I guess the best I could hope for, then, is an optimization that does not store constant columns on disk ;-) Regards, -- Nils Gösche "Don't ask for whom the <CTRL-G> tolls."
Le mercredi 18 avril 2012 à 00:06 +0200, Nils Gösche a écrit : > Bartosz Dmytrak wrote: > > > > The reason I like this particular way of modeling the data is that I have a guarantee that there won't be an entry in bothderived tables at the same time for the same row in the base table; also, I can have further constraints and foreignkeys from and to the base table. > use a trigger on each of the derived tables, that cancels any insert if the same id already exists in the other table? > Of course, I could also omit the type field and simply live with the possibility of having two rows in the derived tablesreferring to the same row of the base table. But it would be nice if I could rule that out with simple constraints. You don't say how your data gets inserted, but considering how complicated your preferred option looks, I have to ask why you can't use something as simple as : CREATE TABLE base ( id int PRIMARY KEY, some_data int NOT NULL, type integer NOT NULL DEFAULT 1 ); -- type 1 = derived1, type 2 = derived2 CREATE TABLE derived1 ( id int PRIMARY KEY, data1 int NOT NULL, FOREIGN KEY (id) REFERENCES base (id) ON DELETE CASCADE ); CREATE TABLE derived2 ( id int PRIMARY KEY, data2 text NOT NULL, FOREIGN KEY (id) REFERENCES base (id) ON DELETE CASCADE ); You'll have to build the queries according to the value of type, but this should give you the features you mention? -- Vincent Veyron http://marica.fr/ Logiciel de gestion des sinistres assurances et des dossiers contentieux pour le service juridique
Vincent Veyron wrote: > use a trigger on each of the derived tables, that cancels any insert if > the same id already exists in the other table? Yes, that would work. > You don't say how your data gets inserted, but considering how > complicated your preferred option looks, I have to ask why you can't > use something as simple as : [...] > You'll have to build the queries according to the value of type, but > this should give you the features you mention? Sure, this is what I meant when I said that you could simply omit the type field. However, if you still want the guarantee about not having two derived rows for the same base row, you'll have to use a trigger function. The whole point of the exercise is to avoid that trigger function and replace it with a simple foreign key :-). It's ok, maybe it's not worth the trouble. Regards, -- Nils Gösche "Don't ask for whom the <CTRL-G> tolls."
according to DB theory:
1NF: Table faithfully represents a relation and has no repeating groups
2NF: No non-prime attribute in the table is functionally dependent on a proper subset of anycandidate key.
so these constants are not in line with this approach.
You can implement one to one relation:
CREATE TABLE "tblBase"(
id text NOT NULL,
"SomeData" integer,
CONSTRAINT "tblBase_pkey" PRIMARY KEY (id )
);
CREATE TABLE "tblDerived1"(
id text NOT NULL,
"Data1" integer,
CONSTRAINT "tblDerived1_pkey" PRIMARY KEY (id ),
CONSTRAINT "tblDerived1_id_fkey" FOREIGN KEY (id)
REFERENCES "tblBase" (id) MATCH FULL
ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
);
CREATE TABLE "tblDerived2"(
id text NOT NULL,
"Data1" text,
CONSTRAINT "tblDerived2_pkey" PRIMARY KEY (id ),
CONSTRAINT "tblDerived2_id_fkey" FOREIGN KEY (id)
REFERENCES "tblBase" (id) MATCH FULL
ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
);
So, You don't have any duplicated PK, every class is described in separate entity (table).
If Your business logic needs to recognize classes in other way You can use views, with constant field which is not stored on disk:
CREATE VIEW "vDerived1"
AS
SELECT *, 'DERIVED1'::text as "ClassType" FROM
"tblBase" NATURAL JOIN "tblDerived1";
CREATE VIEW "vDerived2"
AS
SELECT *, 'DERIVED2'::text as "ClassType" FROM
"tblBase" NATURAL JOIN "tblDerived2";
The problem is that "tblDerived1".id is not guaranteed to be not present in
"tblDerived2". This could be handled e.g. by trigger (before update) on both tables (cross check), or using a kind of "middleware", I mean a function which is responsible do perform inserts.
In my opinion it should be possible to recognize proper class based on its attributes, so it should be quite easy to implement this function. Function could be overloaded (same name, different set of attributes).
Regards,
Bartek
Bartek
Bartosz Dmytrak wrote: > according to DB theory: > 1NF: Table faithfully represents a relation and has no repeating groups > 2NF: No non-prime attribute in the table is functionally dependent on a proper subset of anycandidate key. > source: http://en.wikipedia.org/wiki/Database_normalization#Normal_forms > so these constants are not in line with this approach. This is true. That's why you would have to guard the "constantness" of the column/attribute with a CHECK constraint, toavoid update anomalies. Yes, the whole model would be simpler and more beautiful without the type column, and trigger functions on the derived tablesinstead. On the other hand, the foreign key including the type field might be faster than the foreign key without thetype field plus trigger function. So, if this approach is any good, then only if it is actually faster. No, I haven'ttimed it :-) Regards, -- Nils Gösche Don't ask for whom the <Ctrl-G> tolls.