Thread: Can SQL return a threaded-comment-view result set?

Can SQL return a threaded-comment-view result set?

From
mvppetlab@yahoo.com (Chris)
Date:
[I also posted this to comp.databases but since I'm actually using
PostgreSQL I would be content with a pgsql specific answer, so I
also posted this to comp.databases.postgresql.novice because I'm
a novice and it looked like a good group, but then I realized that
this might not be a novice question even though I'm a novice, so
I guess this group comp.databases.postgresql.general is the last
that I'll try.]

I think I already know that the answer is that this can't be done, but
I'll ask anyways.

Suppose you want to use an RDBMS to store messages for a threaded
message forum like usenet and then display the messages. A toy table
definition (that I've tried to make standards compliant) might look
like:

create table messages (
       message_id integer,
       in_reply_to integer,
       created date,
       author varchar(20),
       title varchar(30),
       message varchar(256),
       primary key (message_id)
);


The in_reply_to field, if not null, means that the message is a reply
to the message with the message_id it has stored. Suppose now that we
populate the database with a 5 message discussion.


insert into messages values
 (1, null, '2003-09-01', 'John', 'Favorite DB?',
                     'What is your favorite database?');
insert into messages values
 (2, null, '2003-09-02', 'Mike', 'New DB2 benchmarks',
                     'I just posted some new DB2 benchmarks.');
insert into messages values
 (3,    1, '2003-09-03', 'Mike', 'Re: Favorite DB?',
                 'I\'d say DB2.');
insert into messages values
 (4,    1, '2003-09-05', 'Dave', 'Re: Favorite DB?',
                     'I\'m an Oracle man myself.');
insert into messages values
 (5,    3, '2003-09-07', 'John', 'Re: Favorite DB?',
                 'DB2? I thought you liked free databases?');


If we rendered an oldest-first threaded view of the discussion it
would look like:

Author: John
Title:  Favorite DB?
Date:   2003-09-01
What is your favorite database?

    Author: Mike
    Title:  Re: Favorite DB?
    Date:   2003-09-03
    I'd say DB2.

        Author: John
        Title:  Re: Favorite DB?
        Date:   2003-09-07
        DB2? I thought you liked free databases?.

    Author: Dave
    Title:  Re: Favorite DB?
    Date:   2003-09-05
    I'm an Oracle man myself.

Author: Mike
Title:  New DB2 benchmarks
Date:   2003-09-02
I just posted some new DB2 benchmarks.


My question is: is it possible to use pure SQL to return a result set
that would make rendering a threaded view like the above really easy?
That is, is there an SQL query that would return something like:


 i | r |  created   | auth |       title        |  message  | nesting
---+---+------------+------+--------------------+-----------+---------
 1 |   | 2003-09-01 | John | Favorite DB?       | What is y | 0
 3 | 1 | 2003-09-03 | Mike | Re: Favorite DB?   | I'd say D | 1
 5 | 3 | 2003-09-07 | John | Re: Favorite DB?   | DB2? I th | 2
 4 | 1 | 2003-09-05 | Dave | Re: Favorite DB?   | I'm an Or | 1
 2 |   | 2003-09-02 | Mike | New DB2 benchmarks | I just po | 0


If I had an SQL query that could return that then it would be very
easy to have a computer program print threaded views like the one
above.

If this can't be done, then do any of you have recommendations about
the best way to accomplish this with the least amount of inefficient
back-and-forth between the database and, say, Java or some other
language?

Thank you very much in advance for any answers! This has been a
frustrating matter for me.

Chris

Re: Can SQL return a threaded-comment-view result set?

From
David W Noon
Date:
On Thursday 02 Oct 2003 09:13 in
<404a8308.0310020013.5294255@posting.google.com>, Chris
(mvppetlab@yahoo.com) wrote:

> Suppose you want to use an RDBMS to store messages for a threaded
> message forum like usenet and then display the messages. A toy table
> definition (that I've tried to make standards compliant) might look
> like:
>
> create table messages (
>        message_id integer,
>        in_reply_to integer,
>        created date,
>        author varchar(20),
>        title varchar(30),
>        message varchar(256),
>        primary key (message_id)
> );
>
>
> The in_reply_to field, if not null, means that the message is a reply
> to the message with the message_id it has stored. Suppose now that we
> populate the database with a 5 message discussion.

You will need a second table, called a path enumeration table. Joe Celko
wrote up this technique in his book "SQL For Smarties".

I think I can dig up some sample SQL for you, as I used this technique
several times a few years ago -- although on DB2 rather than PostrgeSQL.
Since the SQL can be a bit intricate, I have set follow-ups to
comp.databases.postgresql.sql, as it would be more on-topic there.

However, I recommend Joe Celko's book, as it explains the technique as well
as demonstrates it.
--
Regards,

Dave  [RLU#314465]
======================================================
dwnoon@spamtrap.ntlworld.com (David W Noon)
Remove spam trap to reply via e-mail.
======================================================

Re: Can SQL return a threaded-comment-view result set?

From
mvppetlab@yahoo.com (Chris)
Date:
Hello all,

I've been meaning to get back to you all but I just haven't had time.
Okay, I've got a little bit of time now so here goes....

I received many useful answers from many of you including Tom Lane,
Joe Conway, and Josh Berkus. Max Nachlinger in particular on October
5th (which was my birthday) sent me a large amount of threaded
discussion forum code of his own. (Nice birthday present, Max. Thank
you.) I will be investigating his solution when I have more time since
his is almost certainly more efficient than my own.

My own solution is a 20-line PL/pgSQL function I put together after
reading the 7.3 docs at postgresql.org. It requires no modifications
to my original example table definitions other than that I decided to
use a 0 value instead of a NULL value for the in_reply_to column when
a message isn't a reply, because that way my plpgsql function doesn't
have to treat NULL as a special case.

In particular, my solution doesn't require a message to keep pointers
to its children. If a message is a reply, it simply points to its
parent's id via in_reply_to. You can add as many messages as you want
with just single simple INSERT statements; you don't have to do any
tree-refactoring or updating to the parent. The downside is that while
insert speed couldn't be any better and inserting couldn't be any
easier, building the threaded view seems rather algorithmically
inefficient, and in almost all applications optimising for obtaining
the threaded view rather than insert speed is more important. One
probably couldn't base even a moderate-load application on this
solution, but if one wanted to anyways I suppose an in-memory tree
representation could be maintained which allows new messages to be
linked into the in-memory tree efficiently as they're inserted into
the database, and then whenever the application is shutdown and
reloaded it could rebuild that in-memory representation on startup. Or
something. And until you run out of memory.... (Also, simply caching
the results of queries could be effective if you have many identical
queries producing identical results [which my application does] so
this solution might not work too bad for me.)

For the sake of googlers and like novices reading this, I've adapted
my PL/pgSQL function so that it works with the original example I
posted. (My real code uses more fields, different types, and has some
other subtle differences because there's more than one type of table
to consider and there are foreign key constraints.) After loading the
below code, evaluating

select * from threadview(0, 0);

builds a table like the one I wanted in my original posting.

           ---Chris

=================================================
-- This code originally due to Chris Barry,
http://www.goodfig.org/feedback
-- It's hereby placed in the public domain. These public domain
licenses
-- usually have some sort of warning about no guarantee of fitness for
a particular
-- purpose, etc. Well, the below code is DEFINITELY not fit for any
purpose! So,
-- use it at your own peril. Caveat emptor.

-- drop database discussion;
create database discussion;
\c discussion

--  The path to plpgsql.so may need to be edited for your system.
create function plpgsql_call_handler()
         returns opaque as '/usr/local/pgsql/lib/plpgsql.so' language
'c';

create language 'plpgsql' handler plpgsql_call_handler
                            lancompiler 'PL/pgSQL';

create table messages (
       message_id integer,
       in_reply_to integer,
       created date,
       author varchar(20),
       title varchar(30),
       message varchar(256),
       primary key (message_id)
);

-- A threadrow is the same thing as a row from the messages table
-- except a nesting integer has been added so the client knows how
-- much to indent the thread message. I'm not sure if there's a
-- syntax that makes it unnecessary to duplicate the redundant
-- information from the messages table (e.g inheritance).
create type threadrow as (
       message_id integer,
       in_reply_to integer,
       created date,
       author varchar(20),
       title varchar(30),
       message varchar(256),
       nesting integer
);


create or replace function threadview(int, int) returns setof
threadrow as '
declare
    p alias for $1; -- p is the parent
    i alias for $2; -- i is the indentation (nesting)
    c threadrow%rowtype;
    c2 threadrow%rowtype;
begin
    for c in select *, 0 as nesting from messages
                        where in_reply_to = p
                         order by created asc
    loop
              c.nesting = i;
        return next c;
        for c2 in select * from threadview(c.message_id, i+1) loop
            return next c2;
        end loop;
    end loop;
    return;
end;
' language 'plpgsql';


-- Load the table with some example data:

insert into messages values
 (1,    0, '2003-09-01', 'John', 'Favorite DB?',
            'What is your favorite database?');
insert into messages values
 (2,    0, '2003-09-02', 'Mike', 'New DB2 benchmarks',
            'I just posted some new DB2 benchmarks.');
insert into messages values
 (3,    1, '2003-09-03', 'Mike', 'Re: Favorite DB?',
        'I\'d say DB2.');
insert into messages values
 (4,    1, '2003-09-05', 'Dave', 'Re: Favorite DB?',
            'I\'m an Oracle man myself.');
insert into messages values
 (5,    3, '2003-09-07', 'John', 'Re: Favorite DB?',
        'DB2? I thought you liked free databases?');