Thread: In need of help with message fetching query
Hello everyone. I'm working on a small project of mine, which basically revolves around messages. These messages are to be ordered in a very standard fashion of single-depth threads. That is, a message can be a reply, and a so-called "head". A head is simply the head of a chain. To put it more eloquently, the table is: create table messages ( id serial not null primary key, author_id integer null references account (id) deferrable initially deferred, text varchar(200) not null, timestamp timestamp with time zone not null, reply_to_id integer null, reply_to_account_id integer not null references account (id) deferrable initially deferred, unique (account_id, reply_to_id) ); That is, a message is made by an account, and *can* be made in reply to another message. If it is, reply_to_account_id will equal the author_id of the corresponding row where reply_to_id = id. So, for example, a simple case: #1 by:A1 "Hello world!" reply_to:null #2 by:A2 "Hey there." reply_to:#1 #3 by:A3 "'Sup?" reply_to:#1 Would be: Hello world! +-- Hey there. +-- 'Sup? This is a simple case -- the head in this particular instance is #1. Now, given the above, imagine we were to have: #1 by:A1 "Hello world!" reply_to:null #2 by:A2 "Hey there." reply_to:#1 #3 by:A3 "'Sup?" reply_to:#1 #4 by:A1 "Not much!" reply_to:#3 It immediately becomes more complicated -- #4 is a reply to #3, which is a reply to #1. The desired resultset would be: #3 by:A3 "'Sup?" reply_to:#1 #4 by:A1 "Not much!" reply_to:#3 #1 by:A1 "Hello world!" reply_to:null #2 by:A2 "Hey there." reply_to:#1 #3 by:A3 "'Sup?" reply_to:#1 Now, #3 has become a head in its first appearance. In the second, at the end, it is a mere reply. Thus: 'Sup? +-- Not much! Hello world! +-- Hey there. +-- 'Sup? So the head is really "has one or more replies, or reply_to is null." The query could thus be summed up into, "Get all rows which have reply_to_id null, or at least one other row refers to in its reply_to_id. Then, for every such row, include all messages which have reply_to_id = current row's id" It is entirely possible that this results in two queries -- one for fetching the heads, one for fetching all replies. Further, there are other constraints which would appear trivial: the heads fetching part will have a limit, and a where clause telling it "get things older than this", as well as "get rows with author_id in (x, y, ...)". There is also a very quirky issue -- the sorting. A head's placement in the resultset is dependant on its latest reply. That is, if a new row is added to our table: #5 by:A4 "O hi!" reply_to:#1 With its timestamp set to the current time, this will result in the #1 "chain" getting bubbled up to the top! I implemented this as: order by (select b.timestamp from messages b where reply_to_id = o.id order by timestamp desc limit 1) desc Obviously, this isn't a very cheap query at all! Even with me being humble, my hardware simply won't cut it after some amount of messages. So, I think that sums my problem up fairly well. I've been trying to get this to work the way I want for days, but I'm simply not skilled enough in SQL to make this work. I did read manuals, and I think I really exhausted most of my options. (As an aside: while I deploy using PostgreSQL (what else?), I also use SQLite on my MacBook, but I could give that up.)
On Tue, Feb 10, 2009 at 11:19:13PM +0100, Max Strrrmberg wrote: > [...] messages are to be ordered in a very standard fashion of single-depth > threads. That is, a message can be a reply, and a so-called "head". > > create table messages ( > id serial not null primary key, > author_id integer null references account, > text varchar(200) not null, > timestamp timestamp not null, > reply_to_id integer null, > reply_to_account_id integer not null references account, > unique (account_id, reply_to_id) > ); I'd almost advocate having a having another table, and thus breaking various normalization rules, to make your life easier: CREATE TABLE heads ( head_message_id INTEGER PRIMARY KEY REFERENCES messages, last_reply TIMESTAMP NOT NULL ); You can have a trigger to maintain this table, and maybe even one to insert things as well. Not sure about the other rules you want, but if a "thread" can only involve two people (obviously for email conversions this isn't true) then you could move the author_id and reply_to_account_id into this table as well. If you did this you could move other attributes around and get things normalized again. Does that help with ideas? -- Sam http://samason.me.uk/
Hey, thanks for your quick response On Wed, Feb 11, 2009 at 12:00 AM, Sam Mason <sam@samason.me.uk> wrote: > On Tue, Feb 10, 2009 at 11:19:13PM +0100, Max Strrrmberg wrote: >> [...] messages are to be ordered in a very standard fashion of single-depth >> threads. That is, a message can be a reply, and a so-called "head". >> >> create table messages ( >> id serial not null primary key, >> author_id integer null references account, >> text varchar(200) not null, >> timestamp timestamp not null, >> reply_to_id integer null, >> reply_to_account_id integer not null references account, >> unique (account_id, reply_to_id) >> ); > > I'd almost advocate having a having another table, and thus breaking > various normalization rules, to make your life easier: > > CREATE TABLE heads ( > head_message_id INTEGER PRIMARY KEY REFERENCES messages, > last_reply TIMESTAMP NOT NULL > ); > > You can have a trigger to maintain this table, and maybe even one > to insert things as well. Not sure about the other rules you want, > but if a "thread" can only involve two people (obviously for email > conversions this isn't true) then you could move the author_id and > reply_to_account_id into this table as well. If you did this you could > move other attributes around and get things normalized again. > > Does that help with ideas? You make an interesting point. So, whenever a row is inserted into messages, I would want to do something like `update or insert into heads for id = coalesce(replied_to, own_id), set last_reply = own_timestamp.` Correct? Though I might've been crap at explaining it, I think there's an essential part missing here: a segment of the tree could be up for fetch because it has a reply from a certain account. That's why I chose to denormalize reply_to_account_id. So... to get an equivalent non-NF table, I would want one row in heads per reply? :/ And update each row with last_reply. Seems to me that is getting unwieldy pretty fast. Hrm... > > -- > Sam http://samason.me.uk/ > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >