Thread: Can SQL return a threaded-comment-view result set?

Can SQL return a threaded-comment-view result set?

From
mvppetlab@yahoo.com (Chris)
Date:
[I also posted this to comp.databases but since I'm actually using
PostgreSQL I would be content with a pgsql specific answer.]

I think I already know that the answer is that this can't be done, but
I'll ask anyways.

Suppose you want to use an RDBMS to store messages for a threaded
message forum like usenet and then display the messages. A toy table
definition (that I've tried to make standards compliant) might look
like:

create table messages (
       message_id integer,
       in_reply_to integer,
       created date,
       author varchar(20),
       title varchar(30),
       message varchar(256),
       primary key (message_id)
);


The in_reply_to field, if not null, means that the message is a reply
to the message with the message_id it has stored. Suppose now that we
populate the database with a 5 message discussion.


insert into messages values
 (1, null, '2003-09-01', 'John', 'Favorite DB?',
                     'What is your favorite database?');
insert into messages values
 (2, null, '2003-09-02', 'Mike', 'New DB2 benchmarks',
                     'I just posted some new DB2 benchmarks.');
insert into messages values
 (3,    1, '2003-09-03', 'Mike', 'Re: Favorite DB?',
                 'I\'d say DB2.');
insert into messages values
 (4,    1, '2003-09-05', 'Dave', 'Re: Favorite DB?',
                     'I\'m an Oracle man myself.');
insert into messages values
 (5,    3, '2003-09-07', 'John', 'Re: Favorite DB?',
                 'DB2? I thought you liked free databases?');


If we rendered an oldest-first threaded view of the discussion it
would look like:

Author: John
Title:  Favorite DB?
Date:   2003-09-01
What is your favorite database?

    Author: Mike
    Title:  Re: Favorite DB?
    Date:   2003-09-03
    I'd say DB2.

        Author: John
        Title:  Re: Favorite DB?
        Date:   2003-09-07
        DB2? I thought you liked free databases?.

    Author: Dave
    Title:  Re: Favorite DB?
    Date:   2003-09-05
    I'm an Oracle man myself.

Author: Mike
Title:  New DB2 benchmarks
Date:   2003-09-02
I just posted some new DB2 benchmarks.


My question is: is it possible to use pure SQL to return a result set
that would make rendering a threaded view like the above really easy?
That is, is there an SQL query that would return something like:


 i | r |  created   | auth |       title        |  message  | nesting
---+---+------------+------+--------------------+-----------+---------
 1 |   | 2003-09-01 | John | Favorite DB?       | What is y | 0
 3 | 1 | 2003-09-03 | Mike | Re: Favorite DB?   | I'd say D | 1
 5 | 3 | 2003-09-07 | John | Re: Favorite DB?   | DB2? I th | 2
 4 | 1 | 2003-09-05 | Dave | Re: Favorite DB?   | I'm an Or | 1
 2 |   | 2003-09-02 | Mike | New DB2 benchmarks | I just po | 0


If I had an SQL query that could return that then it would be very
easy to have a computer program print threaded views like the one
above.

If this can't be done, then do any of you have recommendations about
the best way to accomplish this with the least amount of inefficient
back-and-forth between the database and, say, Java or some other
language?

Thank you very much in advance for any answers! This has been a
frustrating matter for me.

Chris

Re: Can SQL return a threaded-comment-view result set?

From
Tom Lane
Date:
mvppetlab@yahoo.com (Chris) writes:
> My question is: is it possible to use pure SQL to return a result set
> that would make rendering a threaded view like the above really easy?

What you're really after is a tree structure.  This is relatively easy
to do with Oracle's CONNECT BY or SQL99's WITH, neither of which we have
in Postgres (yet).  I think in bog-standard SQL92 you can't readily do
it without adding some additional fields to the table.

I highly recommend you buy a copy of Joe Celko's "SQL For Smarties".
He has two whole chapters on different ways to handle trees in SQL.

            regards, tom lane

Re: Can SQL return a threaded-comment-view result set?

From
Joe Conway
Date:
Tom Lane wrote:
> mvppetlab@yahoo.com (Chris) writes:
>>My question is: is it possible to use pure SQL to return a result set
>>that would make rendering a threaded view like the above really easy?
>
> What you're really after is a tree structure.  This is relatively easy
> to do with Oracle's CONNECT BY or SQL99's WITH, neither of which we have
> in Postgres (yet).  I think in bog-standard SQL92 you can't readily do
> it without adding some additional fields to the table.
>
> I highly recommend you buy a copy of Joe Celko's "SQL For Smarties".
> He has two whole chapters on different ways to handle trees in SQL.

In the short term, if you don't mind a Postgres specific answer, you
could also take a look at contrib/tablefunc for a function called
connectby().

HTH,

Joe


TREE STRUCTURES was: Can SQL return a threaded-comment-view result set?

From
Josh Berkus
Date:
Chris,

> Suppose you want to use an RDBMS to store messages for a threaded
> message forum like usenet and then display the messages. A toy table
> definition (that I've tried to make standards compliant) might look
> like:

This is not at all a new problem.    Tree sturctures in SQL are one of those
"classic" problems with many solutions.

Joe Celko's "SQL for Smarties, 2nd Edition" has 2 chapters on tree structures.
The first thing you should do is read these chapters; otherwise, you won't be
able to make an informed decision about what tree structure to use.  The main
ones are:
Relational-table (a table for each level of the tree)
Adjacency List (what you described)
String-Append (tree in a Text field, as USA:California:SanFrancisco)
Nested Set (hard to explain)

In a few months, Joe will be publishing a whole book about them, and Joe and I
will have an article in the first issue of DotDot covering implementation of
a nested set tree using PostgreSQL "data-push" functions.  (I would not
recommend nested sets for your issue, as the tree does not update quickly,
and works poorly for fragmented trees)

Right now, you can also check out two PostgreSQL-proprietary tree structure
solutions in the /contrib directory of your Postgres source (assuming that
you have 7.3 or up):
Joe Conway's connectby() in /dblink, which works similar
    to Oracles' CONNECT BY
/ltree , which uses the "string-building" tree stucture technique.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Can SQL return a threaded-comment-view result set?

From
mvppetlab@yahoo.com (Chris)
Date:
Hello all,

I've been meaning to get back to you all but I just haven't had time.
Okay, I've got a little bit of time now so here goes....

I received many useful answers from many of you including Tom Lane,
Joe Conway, and Josh Berkus. Max Nachlinger in particular on October
5th (which was my birthday) sent me a large amount of threaded
discussion forum code of his own. (Nice birthday present, Max. Thank
you.) I will be investigating his solution when I have more time since
his is almost certainly more efficient than my own.

My own solution is a 20-line PL/pgSQL function I put together after
reading the 7.3 docs at postgresql.org. It requires no modifications
to my original example table definitions other than that I decided to
use a 0 value instead of a NULL value for the in_reply_to column when
a message isn't a reply, because that way my plpgsql function doesn't
have to treat NULL as a special case.

In particular, my solution doesn't require a message to keep pointers
to its children. If a message is a reply, it simply points to its
parent's id via in_reply_to. You can add as many messages as you want
with just single simple INSERT statements; you don't have to do any
tree-refactoring or updating to the parent. The downside is that while
insert speed couldn't be any better and inserting couldn't be any
easier, building the threaded view seems rather algorithmically
inefficient, and in almost all applications optimising for obtaining
the threaded view rather than insert speed is more important. One
probably couldn't base even a moderate-load application on this
solution, but if one wanted to anyways I suppose an in-memory tree
representation could be maintained which allows new messages to be
linked into the in-memory tree efficiently as they're inserted into
the database, and then whenever the application is shutdown and
reloaded it could rebuild that in-memory representation on startup. Or
something. And until you run out of memory.... (Also, simply caching
the results of queries could be effective if you have many identical
queries producing identical results [which my application does] so
this solution might not work too bad for me.)

For the sake of googlers and like novices reading this, I've adapted
my PL/pgSQL function so that it works with the original example I
posted. (My real code uses more fields, different types, and has some
other subtle differences because there's more than one type of table
to consider and there are foreign key constraints.) After loading the
below code, evaluating

select * from threadview(0, 0);

builds a table like the one I wanted in my original posting.

           ---Chris

=================================================
-- This code originally due to Chris Barry,
http://www.goodfig.org/feedback
-- It's hereby placed in the public domain. These public domain
licenses
-- usually have some sort of warning about no guarantee of fitness for
a particular
-- purpose, etc. Well, the below code is DEFINITELY not fit for any
purpose! So,
-- use it at your own peril. Caveat emptor.

-- drop database discussion;
create database discussion;
\c discussion

--  The path to plpgsql.so may need to be edited for your system.
create function plpgsql_call_handler()
         returns opaque as '/usr/local/pgsql/lib/plpgsql.so' language
'c';

create language 'plpgsql' handler plpgsql_call_handler
                            lancompiler 'PL/pgSQL';

create table messages (
       message_id integer,
       in_reply_to integer,
       created date,
       author varchar(20),
       title varchar(30),
       message varchar(256),
       primary key (message_id)
);

-- A threadrow is the same thing as a row from the messages table
-- except a nesting integer has been added so the client knows how
-- much to indent the thread message. I'm not sure if there's a
-- syntax that makes it unnecessary to duplicate the redundant
-- information from the messages table (e.g inheritance).
create type threadrow as (
       message_id integer,
       in_reply_to integer,
       created date,
       author varchar(20),
       title varchar(30),
       message varchar(256),
       nesting integer
);


create or replace function threadview(int, int) returns setof
threadrow as '
declare
    p alias for $1; -- p is the parent
    i alias for $2; -- i is the indentation (nesting)
    c threadrow%rowtype;
    c2 threadrow%rowtype;
begin
    for c in select *, 0 as nesting from messages
                        where in_reply_to = p
                         order by created asc
    loop
              c.nesting = i;
        return next c;
        for c2 in select * from threadview(c.message_id, i+1) loop
            return next c2;
        end loop;
    end loop;
    return;
end;
' language 'plpgsql';


-- Load the table with some example data:

insert into messages values
 (1,    0, '2003-09-01', 'John', 'Favorite DB?',
            'What is your favorite database?');
insert into messages values
 (2,    0, '2003-09-02', 'Mike', 'New DB2 benchmarks',
            'I just posted some new DB2 benchmarks.');
insert into messages values
 (3,    1, '2003-09-03', 'Mike', 'Re: Favorite DB?',
        'I\'d say DB2.');
insert into messages values
 (4,    1, '2003-09-05', 'Dave', 'Re: Favorite DB?',
            'I\'m an Oracle man myself.');
insert into messages values
 (5,    3, '2003-09-07', 'John', 'Re: Favorite DB?',
        'DB2? I thought you liked free databases?');