Threaded Records in SQL: Advice Needed - Mailing list pgsql-sql

From Ingram, Bryan
Subject Threaded Records in SQL: Advice Needed
Date
Msg-id 9B7D4396307CD311809A00500415EB405D1F99@BKMAIL
Whole thread Raw
List pgsql-sql
Hi Everyone,
I'm planning to implement a project using a php/postgres 6.5.3 combination.
Part of what I need to do is develop a model in SQL to keep track of
"threaded records."  What I mean by this, is that I need to keep track of
parent-child relationships in exactly the same way that a
threaded-discussion group does.
The design should be able to support approximately 100 users, mostly
readers, and be able to handle approximately 1 insert per minute.
So far, I've found two methods/algorithms for implenting this in SQL, but
there's a problem with each.  
The first one is extremely efficient and inexpensive for selecting and
ordering all of the records in a thread (taking only one select to get all
of the records and in order), but requires that ALL records be updated on
insert, which is very expensive.  Because I'm expecting about 120000 records
a year, or ca. 450 new records every business day, the prospect of each
insert updating each record in the database is not appealing.  At first it
would be ok, but as the record count increased, the inserts would become
extremely slow.  This method works by modeling the records as "nested sets"
and was explained by Joe Celko in one of his articles.
The second method is simply adding fields to keep track of an id, parent_id,
and root_id.  Its benefit is that it is inexpensive on inserts, requiring
only one insert and no updates, but it is costly to select and order all the
records in a thread.  Essentially, the only way I have found to do this is
to implement a recursive function that selects one record at a time, checks
for children, ad infinitum.  Not very efficient.
My concern with method 1 is that the great expense of inserting new records
will slow the database down to unacceptable levels. This is based on my past
experience with postgres when updating a database of 1.5 million rows (which
took an hour and a half with syncing turned off.)  
Method 2 is problematic because it uses recursion first of all (which I
understand to be heavy in resource usage) and because it makes a select for
each and every record in a thread to be displayed.
What I'd like to find is either a better way that I'm not aware of, or some
kind of middle ground.
Also, what do you make of these methods.  Are they as inefficient as I want
to believe?
Thanks!
Bryan Ingram
Database Programmer
S!XTYFOOTSP!DER
972-492-5676
bingram@sixtyfootspider.com <mailto:bingram@sixtyfootspider.com> <http://www.sixtyfootspider.com/>
www.sixtyfootspider.com


pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: Function
Next
From: "Robert Nosko"
Date:
Subject: How to do this in PostgreSQL?