Re: [HACKERS] Tree type, how best to impliment? - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] Tree type, how best to impliment? |
Date | |
Msg-id | 4058.911343818@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [HACKERS] Tree type, how best to impliment? (Terry Mackintosh <terry@terrym.com>) |
List | pgsql-hackers |
Terry Mackintosh <terry@terrym.com> writes: > On Tue, 17 Nov 1998, Tom Lane wrote: >> Why is "category, pcatid" unique? This seems to constrain a parent >> to have only one child per category value --- is that what you want? > Yes. > It is very much like a directory structure, one would not want two > directories of the same name both off the same point in the file system. Precisely... >> If so, why not use the category code as the ID suffix, and not have to >> bother with maintaining a next-ID counter? > category is human readable, for display, the id is not, and when deciding > what the next child's name should be, if not for the next-ID one would > have to go count all the other records that have the same parent. Why do you need a next ID at all? Think directory structure. Using your example, this is perfectly valid: Things / \ Big Small / \ Cars Boats The index will work just as well (probably better) with key strings like "Things/Big/Small" as with key strings like "1.1.2". Moreover, you don't need a separate index to enforce uniqueness, and you don't need to update the parent row when adding a child. You do need invented IDs if the category items are not necessarily unique, but it seems your problem does not need that, so why complicate the concept? > Yes, I just was not sure how well indexes work with text fields? I use 'em all the time... > I had also though about one field only (text), where the categories would > be all chained together delimited with slashes and have a PRIMARY KEY > index. That would automate by design all of the above problems. But it > would creat new ones:-) Like for many deep records, the table would be > much bigger. If your higher-level nodes have long category names, then the space savings from using an ID instead of a category name might become interesting. But if you were willing to use a fixed-size char(255) (in fact two of 'em!) in every tuple for IDs, I don't think you can complain about the average space cost of this approach... Another way to approach this is to give each node a unique serial number, and use the serial number as the child's back-link: CREATE TABLE tab ( nodeID int4 serial primary key, parentID int4 not null, -- nodeID of parent node category text not null, unique (parentID, category)); (I may not have the "serial" syntax quite right, but you get the idea.) This approach is very compact, but a row's position in the hierarchy is represented only indirectly --- you have to chase the parentID links if you need to build the full pathname given just the nodeID. You can't lookup a row directly by its "category/subcategory/subsubcategory" path either; you need a query for each level in order to fetch the parent IDs. But you needed that in your original design too, since there's no other way to get the numeric IDs for subcategories. A further space saving is to use the Postgres OIDs as the row identifiers, but that makes it difficult to dump and reload the table, so I don't recommend it. regards, tom lane
pgsql-hackers by date: