Thread: In need of help with message fetching query

In need of help with message fetching query

From
Max Strömberg
Date:
Hello everyone.

I'm working on a small project of mine, which basically revolves
around messages.

These messages are to be ordered in a very standard fashion of single-depth
threads. That is, a message can be a reply, and a so-called "head".

A head is simply the head of a chain. To put it more eloquently, the table is:

create table messages (
    id serial not null primary key,
    author_id integer null references account (id) deferrable
initially deferred,
    text varchar(200) not null,
    timestamp timestamp with time zone not null,
    reply_to_id integer null,
    reply_to_account_id integer not null references account (id)
deferrable initially deferred,
    unique (account_id, reply_to_id)
);

That is, a message is made by an account, and *can* be made in reply
to another message. If it is, reply_to_account_id will equal the
author_id of the corresponding row where reply_to_id = id.

So, for example, a simple case:

  #1    by:A1   "Hello world!"  reply_to:null
  #2    by:A2   "Hey there."    reply_to:#1
  #3    by:A3   "'Sup?"         reply_to:#1

Would be:

 Hello world!
  +-- Hey there.
  +-- 'Sup?

This is a simple case -- the head in this particular instance is #1.

Now, given the above, imagine we were to have:

  #1    by:A1   "Hello world!"  reply_to:null
  #2    by:A2   "Hey there."    reply_to:#1
  #3    by:A3   "'Sup?"         reply_to:#1
  #4    by:A1   "Not much!"     reply_to:#3

It immediately becomes more complicated -- #4 is a reply to #3, which
is a reply to #1. The desired resultset would be:

  #3    by:A3   "'Sup?"         reply_to:#1
  #4    by:A1   "Not much!"     reply_to:#3
  #1    by:A1   "Hello world!"  reply_to:null
  #2    by:A2   "Hey there."    reply_to:#1
  #3    by:A3   "'Sup?"         reply_to:#1

Now, #3 has become a head in its first appearance. In the second, at
the end, it is a mere reply. Thus:

 'Sup?
  +-- Not much!
 Hello world!
  +-- Hey there.
  +-- 'Sup?

So the head is really "has one or more replies, or reply_to is null."

The query could thus be summed up into, "Get all rows which have
reply_to_id null, or at least one other row refers to in its
reply_to_id. Then, for every such row, include all messages which have
reply_to_id = current row's id"

It is entirely possible that this results in two queries -- one for
fetching the heads, one for fetching all replies.

Further, there are other constraints which would appear trivial: the
heads fetching part will have a limit, and a where clause telling it
"get things older than this", as well as "get rows with author_id in
(x, y, ...)".

There is also a very quirky issue -- the sorting. A head's placement
in the resultset is dependant on its latest reply. That is, if a new
row is added to our table:

  #5    by:A4   "O hi!"         reply_to:#1

With its timestamp set to the current time, this will result in the #1
"chain" getting bubbled up to the top!

I implemented this as: order by (select b.timestamp from messages b
where reply_to_id = o.id order by timestamp desc limit 1) desc

Obviously, this isn't a very cheap query at all! Even with me being
humble, my hardware simply won't cut it after some amount of messages.

So, I think that sums my problem up fairly well. I've been trying to
get this to work the way I want for days, but I'm simply not skilled
enough in SQL to make this work. I did read manuals, and I think I
really exhausted most of my options.

(As an aside: while I deploy using PostgreSQL (what else?), I also use
SQLite on my MacBook, but I could give that up.)

Re: In need of help with message fetching query

From
Sam Mason
Date:
On Tue, Feb 10, 2009 at 11:19:13PM +0100, Max Strrrmberg wrote:
> [...] messages are to be ordered in a very standard fashion of single-depth
> threads. That is, a message can be a reply, and a so-called "head".
>
> create table messages (
>   id serial not null primary key,
>   author_id integer null references account,
>   text varchar(200) not null,
>   timestamp timestamp not null,
>   reply_to_id integer null,
>   reply_to_account_id integer not null references account,
>     unique (account_id, reply_to_id)
> );

I'd almost advocate having a having another table, and thus breaking
various normalization rules, to make your life easier:

  CREATE TABLE heads (
    head_message_id INTEGER PRIMARY KEY REFERENCES messages,
    last_reply  TIMESTAMP NOT NULL
  );

You can have a trigger to maintain this table, and maybe even one
to insert things as well.  Not sure about the other rules you want,
but if a "thread" can only involve two people (obviously for email
conversions this isn't true) then you could move the author_id and
reply_to_account_id into this table as well.  If you did this you could
move other attributes around and get things normalized again.

Does that help with ideas?

--
  Sam  http://samason.me.uk/

Re: In need of help with message fetching query

From
Max Strömberg
Date:
Hey, thanks for your quick response

On Wed, Feb 11, 2009 at 12:00 AM, Sam Mason <sam@samason.me.uk> wrote:
> On Tue, Feb 10, 2009 at 11:19:13PM +0100, Max Strrrmberg wrote:
>> [...] messages are to be ordered in a very standard fashion of single-depth
>> threads. That is, a message can be a reply, and a so-called "head".
>>
>> create table messages (
>>   id serial not null primary key,
>>   author_id integer null references account,
>>   text varchar(200) not null,
>>   timestamp timestamp not null,
>>   reply_to_id integer null,
>>   reply_to_account_id integer not null references account,
>>     unique (account_id, reply_to_id)
>> );
>
> I'd almost advocate having a having another table, and thus breaking
> various normalization rules, to make your life easier:
>
>  CREATE TABLE heads (
>    head_message_id INTEGER PRIMARY KEY REFERENCES messages,
>    last_reply  TIMESTAMP NOT NULL
>  );
>
> You can have a trigger to maintain this table, and maybe even one
> to insert things as well.  Not sure about the other rules you want,
> but if a "thread" can only involve two people (obviously for email
> conversions this isn't true) then you could move the author_id and
> reply_to_account_id into this table as well.  If you did this you could
> move other attributes around and get things normalized again.
>
> Does that help with ideas?

You make an interesting point.

So, whenever a row is inserted into messages, I would want to do
something like `update or insert into heads for id =
coalesce(replied_to, own_id), set last_reply = own_timestamp.`
Correct?

Though I might've been crap at explaining it, I think there's an
essential part missing here: a segment of the tree could be up for
fetch because it has a reply from a certain account. That's why I
chose to denormalize reply_to_account_id.

So... to get an equivalent non-NF table, I would want one row in heads
per reply? :/ And update each row with last_reply. Seems to me that is
getting unwieldy pretty fast.

Hrm...

>
> --
>  Sam  http://samason.me.uk/
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>