Re: [HACKERS] Fwd: Joins and links - Mailing list pgsql-hackers

From Clark Evans
Subject Re: [HACKERS] Fwd: Joins and links
Date
Msg-id 37820519.9938D467@manhattanproject.com
Whole thread Raw
In response to Re: [HACKERS] Fwd: Joins and links  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Leon wrote:
> No one is talking about abolishing any standard SQL feature. After
> you carefully verified design you can hard-code links to speedup
> access. Before that has happened the usual SQL will do.

Leon,

Interesting idea.  You are re-introducing some of the
hierarchical database ideas, is this right?  Here is 
what I'm receiving, could you correct me if I'm 
mis-understanding?  (much of this you did not say...)

- - - - - - 

In current SQL implementations, if updates are done on a
tuple being modified, referencing tables are not usually
identified or checked, let alone updated.  Then, when a 
query is requested, the database figures out how referenced 
data can be retrieved, and does this at the time of the query.

In this proposal, in addition to carrying a primary key
for a referenced table, tuples in the referencing table 
will also have a place to record the physical address 
of each referenced tuple.  In this way, referenced 
data is easily retrieved during a query, since the
physical address of the referenced information is
stored in the referant.

For example, lets take the following schema

ORDER      (ORDER_ID, ... );
PRODUCT    (PRODUCT_ID, NAME, ... );
ORDER_LINE (ORDER_ID,PRODUCT_ID, ... );

In the current cases, changes to the PRODUCT table,
let's say a changed name, do not result in an update
of the ORDER_LINE tuples which reference the product
tuple being changed.

In this proposal, a few hidden field (ID/REF) would be added:

ORDER      ( LEON_ID, ORDER_ID, ... );
PRODUCT    ( LEON_ID, PRODUCT_ID, NAME, ... );
ORDER_LINE ( LEON_ID, ORDER_ID, PRODUCT_ID, ... , PRODUCT_LEON_REF );

Where the ORDER_LINE table would have a reference to the
physical LEON_ID of the tuple being referenced by PRODUCT_ID.

Then, an update of the PRODUCT table would result in a cascading
update of all referencing tables, including ORDER_LINE to 
change the PRODUCT_LEON_REF from its previous value to the
update value.  The LEON_ID and LEON_REF are internal implementation
fields and not available though SQL.

SUMMARY,

With this method, query speed is drastically improved since
the "join" logic is performed once during insert, instead
of many times during select.

This method should work well, when the referencing table
changes relatively infrequently.  Thus people, products, 
and other relatively static "reference" information is
a key canidate for this 'indexing' technique.

This technique should not be used if the frequency of 
updates exceed the frequency of select statements.  

- - - - - - -

Overall, I think it is a good idea.  I frequently do weaker
versions all the time that I call "field cashing", where 
the NAME field of infrequently chaging tuples are frequently
accessed.  In this case, one may put PRODUCT_NAME in the 
ORDER_LINE table and put a trigger on PRODUCT to cascade 
update of NAME to the ORDER_LINE.PRODUCT_NAME table.  
I tend to make monsters like this a nightly process, since
product name changes need not be immediate (they are rare,
and thus not frequent, and thus, not usually immediate).  
This allows the cascade update to run at night when 
things are alot less stressful on the database.

Is this in-line with what you are saying?  

I suggest borrowing an XML term for the idea, GROVE.
In XML, a GROVE is a tree built from XML/SGML/notations.
In this case, you can think of frequently joined 
information as cutting deep into the landscape, thus
the more the query is done, the more of a chance that
the UPDATE/SELECT ratio wil be small, and thus, the
greater chance that the hard wired physical address
is cashed in the table.   The reason I like the
name, is that it defines a common pathway that is 
easy, without preventing the efficiency of uncommon 
paths (where updates >> select ).

Hmm.  I'm just worrying about the CASCADE nature
of the beast.  On the extreme that I was writing
about earlier, a prototype OO dbms that I was
looking at about 6 years ago (god knows what the
name is), they did *everything* this way.   And
GOD it was slow... especially since it cascaded
when frequency of updates far exceed the frequency
of selects. 

Thoughts?


Best,

Clark


pgsql-hackers by date:

Previous
From: Chris Bitmead
Date:
Subject: CVS, Java etc
Next
From: Peter Mount
Date:
Subject: RE: [HACKERS] CVS, Java etc