Thread: db design question

db design question

From
"Jules Alberts"
Date:
Hello everyone,

I'm currently designing the structure of our new pg database, and I
have a question. In our current platform, addresses are stored in
several places, like this (in semicode):

  company(name varchar(100), street varchar(100), state varchar(100))
  employee(code int, street varchar(100), state varchar(100))
  consultant(name varchar(50), street varchar(100), state varchar(100))

The colums street and state (and a lot more in real life) are the same.
My idea for the new db was someting like this:

  company(name varchar(100))
  employee(code int)
  consultant(name varchar(50))
    address(ref_oid OID, street varchar(100), state varchar(100))

In this way, I can store all the addresses together and find them with.
SELECT * WHERE addres.ref_oid = company.oid;

Sort of the same manner as the storeage of BLOBs. Is this approach
common practice? Is it safe? I know I have to make backups with the
OIDs etc., but besides that, any more pitfalls?

TIA!

Re: db design question

From
"Josh Berkus"
Date:
Jules,

> My idea for the new db was someting like this:
>
>   company(name varchar(100))
>   employee(code int)
>   consultant(name varchar(50))
>  address(ref_oid OID, street varchar(100), state varchar(100))
>
> In this way, I can store all the addresses together and find them
> with.
> SELECT * WHERE addres.ref_oid = company.oid;

That's a fine idea, except that you have the referential integrity
backward:

  Company(name varchar(100), address_id INT)
  employee(code int, address_id INT)
  consultant(name varchar(50), address_id INT)
 address(address_id INT PRIMARY KEY, street varchar(100), state
varchar(100))

While there are reasons to do the kind of multi-table join that you
propose, the standard relational model (above) works better.  You can
even automate the creation and relationship of addresses to companies,
employees, etc. through VIEWS and RULES.

I heartily reccomend "Practical Issues in Database Management" to you.
 Fabian Pascal, the author, treats extensively some of the pitfalls of
getting unneccessarily creative with the relational model.

BTW, don't use the OID.   The OID, as of 7.2.0, is for *system purposes
only* and should not be used for queries, joins, indexes, or keys.   If
you need a table-indepentant unique ID, use a sequence.

-Josh Berkus

Re: db design question

From
"Jules Alberts"
Date:
On 15 Oct 2002 at 9:38, Josh Berkus wrote:
> Jules,
>
> > My idea for the new db was someting like this:
> >
> >   company(name varchar(100))
> >   employee(code int)
> >   consultant(name varchar(50))
> >  address(ref_oid OID, street varchar(100), state varchar(100))
> >
> > In this way, I can store all the addresses together and find them
> > with.
> > SELECT * WHERE addres.ref_oid = company.oid;
>
> That's a fine idea, except that you have the referential integrity
> backward:
>
>   Company(name varchar(100), address_id INT)
>   employee(code int, address_id INT)
>   consultant(name varchar(50), address_id INT)
>  address(address_id INT PRIMARY KEY, street varchar(100), state
> varchar(100))
>
> While there are reasons to do the kind of multi-table join that you
> propose, the standard relational model (above) works better.  You can
> even automate the creation and relationship of addresses to companies,
> employees, etc. through VIEWS and RULES.

Thanks, great advice!

> I heartily reccomend "Practical Issues in Database Management" to you.
>  Fabian Pascal, the author, treats extensively some of the pitfalls of
> getting unneccessarily creative with the relational model.

OK, I will have a look.

> BTW, don't use the OID.   The OID, as of 7.2.0, is for *system purposes
> only* and should not be used for queries, joins, indexes, or keys.   If
> you need a table-indepentant unique ID, use a sequence.

I'll drop it in this case, your approach "feels" a lot safer. However,
I plan to store BLOBs in my db, and is this case I'm afraid I will
_have_ to use OIDs. The idea is to be able to "attach" a blob to _any_
row in the db. An example of how I planned to use it (this works BTW)

-- import a BLOB
insert into blobs (ref_oid, blob_oid, blob_name, description) values (
  (select oid from employee where name='Jules'),
  lo_import('/usr/share/pixmaps/gimp.png'),
  '/usr/share/pixmaps/gimp.png',
  'test: imported picture');

-- retrieve the BLOB
select lo_export((select blob_oid from blobs where ref_oid=
  (select oid from employee where name='Jules')),
  '/tmp/gimp.png');

One pitfall I was already warned for: allways use the datatype OID, or
get in trouble when dumping / restoring, as the actual values of the
OIDs change.

Re: db design question

From
"Jules Alberts"
Date:
On 15 Oct 2002 at 9:38, Josh Berkus wrote:
> Jules,
>
> > My idea for the new db was someting like this:
> >
> >   company(name varchar(100))
> >   employee(code int)
> >   consultant(name varchar(50))
> >  address(ref_oid OID, street varchar(100), state varchar(100))
> >
> > In this way, I can store all the addresses together and find them
> > with.
> > SELECT * WHERE addres.ref_oid = company.oid;
>
> That's a fine idea, except that you have the referential integrity
> backward:
>
>   Company(name varchar(100), address_id INT)
>   employee(code int, address_id INT)
>   consultant(name varchar(50), address_id INT)
>  address(address_id INT PRIMARY KEY, street varchar(100), state
> varchar(100))
>
> While there are reasons to do the kind of multi-table join that you
> propose, the standard relational model (above) works better.
<snip>

I just thought of something... If I wanted the possibility to have more
than one address per employee / company / consultant the OID approach
would be better.

A way (other than using OIDs) to solve this "(several tables):N" issue
(if you know what I mean) might be using an array for the address
references:

    company(name varchar(50), address_id INT[])

but somehow I don't like the idea of an array datatype -a table within
a table- in the relational model.

Re: db design question

From
"Josh Berkus"
Date:
Jules,

> I'll drop it in this case, your approach "feels" a lot safer.
> However,
> I plan to store BLOBs in my db, and is this case I'm afraid I will
> _have_ to use OIDs. The idea is to be able to "attach" a blob to
> _any_
> row in the db. An example of how I planned to use it (this works BTW)

Yeah, you need to use OIDs for lo_export.   I'm not personally familiar
with the issues on referencing these OIDs in tables.

Instead of using the row_oid for the standard tables, though, consider
doing this:

1) Create a table-independant sequence, "global_seq"
2) Make the primary key of each significant table DEFAULT NEXTVAL
('global_seq') instead of SERIAL.
3) Reference the primary keys in your blobs table, instead of the OID.

The above will work as well as using the OID, without the potential
headaches and with a greater degree of control.   I used this scheme,
in fact, to collectivize modification timestamps and journaled notes
for 5 tables in one DB design.

Keep in mind one other thing, though: while collectivizing your BLOBS
in the fashion above simplifies your database schema (almost always a
good thing) it can come at a substantial performance penalty if your
database contains many large tables.   For example, I did *not* merge
my modification timestamps into a single table for my latest database
effort, as it contains 6 significant tables totalling 2.5 million rows.
  And a single, 2.5 million row mod_data table searches and sorts very
much slower than 6 sets of columns with an average of 400,000 rows
each.

-Josh Berkus





Re: db design question

From
"Jules Alberts"
Date:
On 16 Oct 2002 at 9:19, Josh Berkus wrote:
<snip>
> Instead of using the row_oid for the standard tables, though, consider
> doing this:
>
> 1) Create a table-independant sequence, "global_seq"
> 2) Make the primary key of each significant table DEFAULT NEXTVAL
> ('global_seq') instead of SERIAL.
> 3) Reference the primary keys in your blobs table, instead of the OID.
>
> The above will work as well as using the OID, without the potential
> headaches and with a greater degree of control.
<snip>

I hadn't thought of that approach. Sounds like a workable solution, but
then again, so does the OID way. I allways prefer to keep things as
simple as possible, using the most commonly applicated method.

Both OIDs and sequences exist and should do the job, but the question
is, which one is more reliable? Which one will keep it's current
syntax, functionality etc. longest? The db I'm designing will be the
core of my companies IT and will be in use for at least 5 years (the
current one is into its 7th) and a lot can change in that time. That's
why these decisions are so important now, on what horse do I put my
money?

I'll have a closer look at sequences, thanks for sharing your solution!

Re: db design question

From
Vijay Deval
Date:
Hi Jules

What will happen to oid if you dump the table, upgrade PostgreSQL and
recreate tables?

In case a company has more than one consultent, company is going to
feature in more than one tuples with different oid's.

It might be a better idea to have two tables. One table assigns serial
id number to company. Other table could have this serial id as one
attribute , along with other things like consultant name etc. This
should avert problem of repeating data.

It is quite possible that one consultant gives consultations to more
than one companies. In that case the data needs to be stored in three
tables.

Vijay

Jules Alberts wrote:
>
> Hello everyone,

>
> The colums street and state (and a lot more in real life) are the same.
> My idea for the new db was someting like this:
>
>   company(name varchar(100))
>   employee(code int)
>   consultant(name varchar(50))
>         address(ref_oid OID, street varchar(100), state varchar(100))
>
> In this way, I can store all the addresses together and find them with.
> SELECT * WHERE addres.ref_oid = company.oid;


Re: db design question

From
"Jules Alberts"
Date:
On 20 Oct 2002 at 19:16, Vijay Deval wrote:
> Hi Jules

Hello Vijay,

> What will happen to oid if you dump the table, upgrade PostgreSQL and
> recreate tables?

A dump / restore (you don't even need an upgrade) will change the value
of the OIDs. However, it will keep the relations working. Like this:

create table person(name varchar);
insert into person values ('Joe');
select oid from person where name = 'Joe'; -- e.g. 123456
create table address (ref_oid oid, street varchar);
insert into address values (123456, 'Penny Lane');

dump + restore

select oid from person where name = 'Joe'; -- new value! 888888
select stree from address where ref_oid = 888888 -- 'Penny Lane'

I tested this and it works. You just have to avoid hardcoded OID's in
your app, allways works with a subselect. Still, there is a translation
going on in the dump / restore mechanism. That's one step extra that
could cause a problem.

> In case a company has more than one consultent, company is going to
> feature in more than one tuples with different oid's.
>
> It might be a better idea to have two tables. One table assigns serial
> id number to company. Other table could have this serial id as one
> attribute , along with other things like consultant name etc. This
> should avert problem of repeating data.

I already do that.

> It is quite possible that one consultant gives consultations to more
> than one companies. In that case the data needs to be stored in three
> tables.

That too.

The actual problem was that I have several tables (address, BLOBs,
actions etc.) that are possibly related to _any_ table in my db. A
solution could be to create a referring column from every table to an
INT primary key in address, action etc., but that would limit the link
to one at most. For "address" this is probably OK, but for "actions" an
"BLOBs" surely not. I guess I could work with intermediate tables
(between "any table" and address, BLOB's and actions), but that would
create an overhead of factor 2. Quite a bit IMO.

That's why I'm looking at the possibilities of using OID for this.
Advantage:
- OID is global (througout the db) unique, even if I don't care from
  which table a column comes, I will find it.

Disadvantages:
- more coding in select stataments. Each select that involves one of
  the address / BLOBs / actions tables will be about twice the size
- the changing of OID value with dump and restore. This makes me a bit
  nervous, allthough the longer select statements should cover this

There are many people saying "don't do it". :-) OIDs are considered an
internal system mechanism that shouldn't be used in everyday life. If
this is the common view on OIDs the developers may well change their
behaviour, or replace them with some other mechanism ("why not?
nobody's using them anyway").

A workaround for the address and actions tables may be using a global
sequence and use nextval() primary keys for every table. Someone
recommended a book (Pascal: practical issues in db management). I've
ordered it and will read it, hope it will help me make up my mind.

Re: db design question

From
"Josh Berkus"
Date:
Jules,

> sequence and use nextval() primary keys for every table. Someone
> recommended a book (Pascal: practical issues in db management). I've
> ordered it and will read it, hope it will help me make up my mind.

Me.  Unfortunately, it won't help you make up your mind on the OID vs.
Sequence issue.   The reason I reccomended it is that Pascal is a real
fanatic about the relational database model, and gives you
a) detailed explanations of why it's dangerous* to get creative with
the relational model, and
b) suggestions for how to do a lot of complex things *within* the
relational model.
OID vs. sequence is strictly a PostgreSQL implementation issue, and
even the core developers are divided on the issue, though it's 3 to 1
against the last time I saw this argued out.

*not to say that you should *never* get creative -- though Pascal
thinks you shouldn't -- just that it's very important to know the
penalties involved.

-Josh Berkus

Re: db design question

From
Andrew McMillan
Date:
On Thu, 2002-10-17 at 19:11, Jules Alberts wrote:
> Both OIDs and sequences exist and should do the job, but the question
> is, which one is more reliable? Which one will keep it's current
> syntax, functionality etc. longest? The db I'm designing will be the
> core of my companies IT and will be in use for at least 5 years (the
> current one is into its 7th) and a lot can change in that time. That's
> why these decisions are so important now, on what horse do I put my
> money?
Don't put your money on OID.  These have changed in the past, are known
to change between dump/reload and offer no advantages.

In some databases the equivalent of the OID can provide fast access to a
record, but this is not the case in PostgreSQL where you will still need
to take normal measures (i.e. index on that column), just as you would a
SERIAL column, which would be invariant between dump/restore.

In more recent PostgreSQL versions you can create tables without OIDs,
so there is no longer even space savings involved.

For the large-object interface you are stuck with OID for now, of
course.

Regards,
                    Andrew.
--
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/         PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201     MOB: +64(21)635-694    OFFICE: +64(4)499-2267
           Survey for free with http://survey.net.nz/


Re: db design question

From
"Jules Alberts"
Date:
On 21 Oct 2002 at 8:44, Josh Berkus wrote:

Hello Josh,

> > sequence and use nextval() primary keys for every table. Someone
> > recommended a book (Pascal: practical issues in db management). I've
> > ordered it and will read it, hope it will help me make up my mind.
>
> Me.  Unfortunately, it won't help you make up your mind on the OID vs.
> Sequence issue.

That's not a problem. I saw some articles from Pascal on the web, and
I'm sure his book will be valuable anyway.

>   The reason I reccomended it is that Pascal is a real
> fanatic about the relational database model, and gives you
> a) detailed explanations of why it's dangerous* to get creative with
> the relational model, and
> b) suggestions for how to do a lot of complex things *within* the
> relational model.
> OID vs. sequence is strictly a PostgreSQL implementation issue, and
> even the core developers are divided on the issue, though it's 3 to 1
> against the last time I saw this argued out.

I guess that's 3:1 against OID? Well, I'm beginning to be convinced not
to use OID (Andrew, thanks for your posting too).

So let's say I don't use OID. Is a SEQUENCE bound to several tables the
best solution? Any other suggestions?

> *not to say that you should *never* get creative -- though Pascal
> thinks you shouldn't -- just that it's very important to know the
> penalties involved.

I don't _want_ to be creative, not in this db anyway :). I want
everything to be as standard as possible, as solid as possible.

The way I'm struggling to solve this issue makes me wonder if I'm not
making a huge design mistake. OTOH I cant imaging that I'm the first
one to encounter this. I guess the issue is

    a one to many relationship where "one" can be any table in the db

How does one solve this in a relational model? Sorry if I keep going on
about this, but I have a hunch that this is _very_ important for my db.
If I don't solve it correctly, I'm sure I will be in a lot of trouble
later on.

TIA!

Re: db design question

From
Andrew McMillan
Date:
On Tue, 2002-10-22 at 20:14, Jules Alberts wrote:
> On 21 Oct 2002 at 8:44, Josh Berkus wrote:
>
> So let's say I don't use OID. Is a SEQUENCE bound to several tables the
> best solution? Any other suggestions?

The sequence doesn't have to be bound to several tables - just your
address table:

Josh originally said:
> That's a fine idea, except that you have the referential integrity
> backward:
>
>   Company(name varchar(100), address_id INT)
>   employee(code int, address_id INT)
>   consultant(name varchar(50), address_id INT)
>  address(address_id INT PRIMARY KEY, street varchar(100), state
> varchar(100))

My only quibble with this would be to change the address table thus:

address( address_id SERIAL PRIMARY KEY, street TEXT, state TEXT );

This will create a sequence for you called address_address_id_seq and
set the default to nextval('address_address_id_seq') so that whenever
you create a new record without specifically assigning a sequence, it
will get handed one.

When writing a company record you do something like:

BEGIN;
addr_id = "SELECT nextval('address_address_id_seq');

insert into company (name, address_id ) values('company name', addr_id
);

insert into address( address_id, street, state ) values( addr_id,
'George Street', 'New South Wales');
COMMIT;

Obviously similar things happen writing an employee record or whatever.



>     a one to many relationship where "one" can be any table in the db
>
> How does one solve this in a relational model? Sorry if I keep going on
> about this, but I have a hunch that this is _very_ important for my db.
> If I don't solve it correctly, I'm sure I will be in a lot of trouble
> later on.

It isn't at all unusual, I'm afraid.  Codes tables do this sort of thing
all the time - where you have a table that contains:
Code    Value
M    Male
F    Female
X    Unknown

And you want to refer to that in all sorts of places in the system.
Well, in the places you want to refer to it you store the unique
identifier, i.e. the "Code".

For your case, you want to store the unique identifier (i.e.
address_id).

> In this way, I can store all the addresses together and find them
> with.
> SELECT * WHERE addres.ref_oid = company.oid;

This turns around and becomes something like:

SELECT * FROM address a, company c WHERE a.address_id = c.address_id ;

Cheers,
                    Andrew.
--
---------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/         PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201     MOB: +64(21)635-694    OFFICE: +64(4)499-2267
           Survey for free with http://survey.net.nz/
---------------------------------------------------------------------


arc relationship [was: db design question]

From
"Jules Alberts"
Date:
On 22 Oct 2002 at 21:30, Andrew McMillan wrote:
> On Tue, 2002-10-22 at 20:14, Jules Alberts wrote:
> > On 21 Oct 2002 at 8:44, Josh Berkus wrote:
> >
> > So let's say I don't use OID. Is a SEQUENCE bound to several tables the
> > best solution? Any other suggestions?
>
> The sequence doesn't have to be bound to several tables - just your
> address table:
>
> Josh originally said:
> > That's a fine idea, except that you have the referential integrity
> > backward:
> >
> >   Company(name varchar(100), address_id INT)
> >   employee(code int, address_id INT)
> >   consultant(name varchar(50), address_id INT)
> >  address(address_id INT PRIMARY KEY, street varchar(100), state
> > varchar(100))

Hello Andrew,

That wouldn't solve my problem, because I want to be able to store more
addresses per company / client / consultant. Also, I have other tables
(action and blobs) that do the same.

I had a conversation with an Oracle DBA this morning. After explaining,
which took some time :-), he told me that what I wanted is in Oracle
called an "arc" relationship. Knowing this word, Google was a lot
friendlier, I found a good explanation here:

http://www.stormloader.com/yonghuang/computer/SemanticallyDependentAttri
butes.txt

Apparently there are several solutions:

1. create an intermediate table for every table you want address /
action / blobs to be related to. This would create a big overhead, I
don't like that.

2. in address / action / blobs create a column for each table it is
related to. Enforce that per row only one of these columns is filled.
The value would be the primary key of the related table. The other
columns would be NULL, which could cause problems.

3. in address / action / blobs create a column that contains the unique
identifier of the row in the related table. This is the OID / sequence
approach I was thinking about.

4. same as 3., but in address / action / blobs also create a column
that contains the name of the related table. This will makes a backward
search (which is the company / client / consultant this address belongs
to) easier.

I think number 4 is the best solution. Create a sequence from which all
tables get their primary key and include an integer row in address etc
that points to the primary key of the related table and store the name
of that table. Things to keep an eye on:

- don't change table names without changing the values in address etc
- set a huge maximum value for the global sequence

Thanks for your reactions everyone! If there is more feedback /
remarks, please let me know. TIA!

Re: arc relationship [was: db design question]

From
Andrew McMillan
Date:
On Tue, 2002-10-22 at 22:34, Jules Alberts wrote:
> That wouldn't solve my problem, because I want to be able to store more
> addresses per company / client / consultant. Also, I have other tables
> (action and blobs) that do the same.

Oh, I see.

Yes, the standard way to deal with this is a to 'normalise' the data
through intermediate tables.

In my past I have usually used an 'address_type' or 'contact_type' field
to additionally identify these relationships, but that is probably
irrelevant to this question.

Basic form is:

company ( company_id, company_data );
address ( address_id, address_data);
employee ( employee_id, employee_data );
company_address( company_id, address_id );
employee_address( employee_id, address_id );

However, if you can guarantee that company_id and employee_id are
non-colliding (e.g. you use the same sequence to generate them) then
company_address and employee_address can be combined into a single
table.

Alternatively, a defining attribute can be added to identify the source,
so you can have individual sequences within the source tables:

contact_address( contact_source, contact_id, address_id )

Where a 'contact_source' of 'E' might identify that contact_id is an
employee_id or 'C' might indicate a company_id and so on.


> I had a conversation with an Oracle DBA this morning. After explaining,
> which took some time :-), he told me that what I wanted is in Oracle
> called an "arc" relationship.

Yes.  I've heard them called other things too (polite ones, even :-).


> 4. same as 3., but in address / action / blobs also create a column
> that contains the name of the related table. This will makes a backward
> search (which is the company / client / consultant this address belongs
> to) easier.
>
> I think number 4 is the best solution. Create a sequence from which all
> tables get their primary key and include an integer row in address etc
> that points to the primary key of the related table and store the name
> of that table. Things to keep an eye on:
>
> - don't change table names without changing the values in address etc
> - set a huge maximum value for the global sequence

Note that the 'contact_source' can be anything you want - it doesn't
have to be the full table name from the source table.  It is probably
better if it isn't, in fact, because then you are restricted to a single
reference field in your source table.

Consider a company record that required two signatories to be recorded,
each with their address.  This could potentially (easily) be handled as
two fields on the company record, but you would need your
'contact_source' linking to their address to indicate not just the
table, but the field as well.  Usually it is easier to do this
symbolically, than to use the full table.field sort of notation (which
_still_ might not support arrays, or whatever).

Cheers,
                    Andrew.
--
---------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/         PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201     MOB: +64(21)635-694    OFFICE: +64(4)499-2267
           Survey for free with http://survey.net.nz/
---------------------------------------------------------------------


Re: arc relationship [was: db design question]

From
"Josh Berkus"
Date:
Folks,

Getting back to Jules' original question about having a universal
"blob" table, as I said, I've done this.  An example:

Table Clients(usq PK NEXTVAL('universal_sq'), client_name, etc ...)
Table Orders(usq PK NEXTVAL('universal_sq'), order_date, etc ...)
Table Invoices(usq PK NEXTVAL('universal_sq'), invoice_no, etc ...)

And the multi-relational tables:

Table mod_data (ref_usq PK INT, mod_user, mod_date, create_user,
create_date);
Table notes (note_id SERIAL, ref_usq INT, note_type, note_date,
note_user, note_text);

A simplified query:
SELECT clients.*, mod_data.* FROM clients JOIN mod_data
ON clients.usq = mod_data.ref_usq;

Both of the above tables, through the ref_usq, are related to any of
the tables possessing a USQ as the primary key.    This is in use in a
production system, and has been quite reliable.

Advantages of the above approach:
1) It allows you to "attach" the "flexible child" tables to any
qualified parent table without changing the schema.
2) It reduces the number of tables in your schema, simplifying and
reducing clutter and confusion.

Disadvantages of the above approach:
1) The relationships are not enforcable through the standard SQL
foreign key constraints.   As a result, you need to write your own
custom triggers and rules to enforce the desired relationships.  This
can get annoying, with up to 3 triggers per relationship.
2) Any "flexible child" table has, of necessity, as many rows as the
sum total of the rows in the parent tables, or the requisite multiple
for one-to-many relations.  This results in a child table that is much,
much larger than the standard model of having several different child
tables would be.   As I have previously mentioned, this is a
performance drag as one large table is, in practice, slower than
several small tables.
3) Any auto-journaling system or auto-archiving system of record
changes will have difficulty working around the above ambiguous
relationships.

My conclusion is that the above is a fine approach for small databases
(the production one in which I'm using it has about 1,000-2,000 records
in each of the 6 main tables) but a poor approach for very large
databases for performance reasons.

-Josh Berkus



Re: arc relationship [was: db design question]

From
"Jules Alberts"
Date:
On 22 Oct 2002 at 9:36, Josh Berkus wrote:

Hello Josh,

> Getting back to Jules' original question about having a universal
> "blob" table, as I said, I've done this.  An example:
<snip>
> Both of the above tables, through the ref_usq, are related to any of
> the tables possessing a USQ as the primary key.    This is in use in a
> production system, and has been quite reliable.

That's good to hear!

<snip>
> 2) Any "flexible child" table has, of necessity, as many rows as the
> sum total of the rows in the parent tables, or the requisite multiple
> for one-to-many relations.  This results in a child table that is much,
> much larger than the standard model of having several different child
> tables would be.   As I have previously mentioned, this is a
> performance drag as one large table is, in practice, slower than
> several small tables.

Why the necessity to have a row in the child table for each row in the
parent? Only when customer has an address, I will insert a row in the
child. So the child table will be only as large as the sum of addresses
required for all parents together, not as large as the sum of all rows
in all parents together. Maybe I didn't understand you correctly
(English isn't my native language)

> 3) Any auto-journaling system or auto-archiving system of record
> changes will have difficulty working around the above ambiguous
> relationships.
<snip>

Do you have an URL to such systems? I'm not familiar with them, I guess
you don't mean journalling filesystems? TIA!

With the info I have so far, I plan to work like this:

------------------------------------------------------
-- sequence for global primary key
create sequence glob_sq;

-- the parents
create table customer (
  glob_id int unique default nextval('glob_sq'),
  id serial primary key,
  name varchar);

create table employee (
  glob_id int unique default nextval('glob_sq'),
  id serial primary key,
  name varchar);

-- one of the children
create table address (
  -- tables that have no children need no global ID
  id serial primary key,
  ref_table varchar,
  ref_id int not null,
  street varchar);

-- some testing, create some parents
insert into employee (name) values ('Paul');
insert into employee (name) values ('John');
insert into customer (name) values ('Apple Inc.');
select * from employee;
select * from customer;

-- create some children (that's the nice part :-)
insert into address (ref_table, ref_id, street) values (
  'employee',
  (select glob_id from employee where name = 'Paul'),
  'Penny Lane');
insert into address (ref_table, ref_id, street) values (
  'customer',
  (select glob_id from customer where name = 'Apple Inc.'),
  'Abbey Road');
select * from address;

-- search Paul's address
select street from address where ref_id =
  (select glob_id from employee where name = 'Paul');

-- now backward, find the employee that lives on Penny Lane. If done
-- in real life, the FROM table would be generated in runtime with a
-- SELECT REF_TABLE FROM ADDRES WHERE ...Thas' where the ref_table
-- column comes in handy
select name from employee where glob_id =
  (select glob_id from address where street = 'Penny Lane');
------------------------------------------------------

Re: arc relationship [was: db design question]

From
"Josh Berkus"
Date:
Jules,

> Why the necessity to have a row in the child table for each row in
> the
> parent? Only when customer has an address, I will insert a row in the
>
> child. So the child table will be only as large as the sum of
> addresses
> required for all parents together, not as large as the sum of all
> rows
> in all parents together. Maybe I didn't understand you correctly
> (English isn't my native language)

1) I am not reccomending that you use this approach for addresses --
the standard relational model will serve your purposes, so there's not
reason to get creative.  I was reccomending that you try the "flexible
child" approach *only* for the BLOB reference table.

2) In answer to your question:  Imagine that I have 5 tables, clients,
employees, invoices, orders, and payments.   Imagine that each table
has roughly 20,000 rows.   Each table also has a row in the "mod_data"
table.  If I want to query the mod_data for a particular client, then
the database has to search 100,000 rows, not the 20,000 it would search
if the mod_data were directly in the clients table.   get it?

> Do you have an URL to such systems? I'm not familiar with them, I
> guess
> you don't mean journalling filesystems? TIA!

I'm talking about triggers or other mechanisms that record each change
to the database records into a permanent archive for auditing purposes.

> With the info I have so far, I plan to work like this:

See above.  As I said before, I feel that the "flexible child" approach
is a *bad* approach for storing the addresses.   I just suggested it
for the BOLBs.   For the addresses, see my first e-mail to you on the
topic.

-Josh Berkus

Re: arc relationship [was: db design question]

From
"Jules Alberts"
Date:
On 23 Oct 2002 at 9:39, Josh Berkus wrote:
<snip>
> 1) I am not reccomending that you use this approach for addresses --
> the standard relational model will serve your purposes, so there's not
> reason to get creative.  I was reccomending that you try the "flexible
> child" approach *only* for the BLOB reference table.

OK, I'm convinced. I will apply the "arc" only when there is no other
way. For the addresses I will use the relational method.

> 2) In answer to your question:  Imagine that I have 5 tables, clients,
> employees, invoices, orders, and payments.   Imagine that each table
> has roughly 20,000 rows.   Each table also has a row in the "mod_data"
> table.  If I want to query the mod_data for a particular client, then
> the database has to search 100,000 rows, not the 20,000 it would search
> if the mod_data were directly in the clients table.   get it?
>
> > Do you have an URL to such systems? I'm not familiar with them, I
> > guess
> > you don't mean journalling filesystems? TIA!
>
> I'm talking about triggers or other mechanisms that record each change
> to the database records into a permanent archive for auditing purposes.

Oh, I see.

> > With the info I have so far, I plan to work like this:
>
> See above.  As I said before, I feel that the "flexible child" approach
> is a *bad* approach for storing the addresses.   I just suggested it
> for the BOLBs.   For the addresses, see my first e-mail to you on the
> topic.

I will, thanks again for all your help!