Re: [HACKERS] Fwd: Joins and links - Mailing list pgsql-hackers
From | Clark Evans |
---|---|
Subject | Re: [HACKERS] Fwd: Joins and links |
Date | |
Msg-id | 37820519.9938D467@manhattanproject.com Whole thread Raw |
In response to | Re: [HACKERS] Fwd: Joins and links (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Leon wrote: > No one is talking about abolishing any standard SQL feature. After > you carefully verified design you can hard-code links to speedup > access. Before that has happened the usual SQL will do. Leon, Interesting idea. You are re-introducing some of the hierarchical database ideas, is this right? Here is what I'm receiving, could you correct me if I'm mis-understanding? (much of this you did not say...) - - - - - - In current SQL implementations, if updates are done on a tuple being modified, referencing tables are not usually identified or checked, let alone updated. Then, when a query is requested, the database figures out how referenced data can be retrieved, and does this at the time of the query. In this proposal, in addition to carrying a primary key for a referenced table, tuples in the referencing table will also have a place to record the physical address of each referenced tuple. In this way, referenced data is easily retrieved during a query, since the physical address of the referenced information is stored in the referant. For example, lets take the following schema ORDER (ORDER_ID, ... ); PRODUCT (PRODUCT_ID, NAME, ... ); ORDER_LINE (ORDER_ID,PRODUCT_ID, ... ); In the current cases, changes to the PRODUCT table, let's say a changed name, do not result in an update of the ORDER_LINE tuples which reference the product tuple being changed. In this proposal, a few hidden field (ID/REF) would be added: ORDER ( LEON_ID, ORDER_ID, ... ); PRODUCT ( LEON_ID, PRODUCT_ID, NAME, ... ); ORDER_LINE ( LEON_ID, ORDER_ID, PRODUCT_ID, ... , PRODUCT_LEON_REF ); Where the ORDER_LINE table would have a reference to the physical LEON_ID of the tuple being referenced by PRODUCT_ID. Then, an update of the PRODUCT table would result in a cascading update of all referencing tables, including ORDER_LINE to change the PRODUCT_LEON_REF from its previous value to the update value. The LEON_ID and LEON_REF are internal implementation fields and not available though SQL. SUMMARY, With this method, query speed is drastically improved since the "join" logic is performed once during insert, instead of many times during select. This method should work well, when the referencing table changes relatively infrequently. Thus people, products, and other relatively static "reference" information is a key canidate for this 'indexing' technique. This technique should not be used if the frequency of updates exceed the frequency of select statements. - - - - - - - Overall, I think it is a good idea. I frequently do weaker versions all the time that I call "field cashing", where the NAME field of infrequently chaging tuples are frequently accessed. In this case, one may put PRODUCT_NAME in the ORDER_LINE table and put a trigger on PRODUCT to cascade update of NAME to the ORDER_LINE.PRODUCT_NAME table. I tend to make monsters like this a nightly process, since product name changes need not be immediate (they are rare, and thus not frequent, and thus, not usually immediate). This allows the cascade update to run at night when things are alot less stressful on the database. Is this in-line with what you are saying? I suggest borrowing an XML term for the idea, GROVE. In XML, a GROVE is a tree built from XML/SGML/notations. In this case, you can think of frequently joined information as cutting deep into the landscape, thus the more the query is done, the more of a chance that the UPDATE/SELECT ratio wil be small, and thus, the greater chance that the hard wired physical address is cashed in the table. The reason I like the name, is that it defines a common pathway that is easy, without preventing the efficiency of uncommon paths (where updates >> select ). Hmm. I'm just worrying about the CASCADE nature of the beast. On the extreme that I was writing about earlier, a prototype OO dbms that I was looking at about 6 years ago (god knows what the name is), they did *everything* this way. And GOD it was slow... especially since it cascaded when frequency of updates far exceed the frequency of selects. Thoughts? Best, Clark
pgsql-hackers by date: