Best way to store a threaded message list/tree in SQL - Mailing list pgsql-general

From Mike Christensen
Subject Best way to store a threaded message list/tree in SQL
Date
Msg-id 49CC1656.5050100@comcast.net
Whole thread Raw
Responses Re: Best way to store a threaded message list/tree in SQL
List pgsql-general
Hi guys -

I'm looking for the best way to store a set of "posts" as well as
comments on those posts in SQL.  Imagine a design similar to a "Wall" on
Facebook where users can write posts on their wall and other users can
comment on those posts.  I need to be able to display all wall posts as
well as the comments.

When I first started out, I came up with a table such as:

CREATE Table wallposts
(
  id uuid NOT NULL,
  posted timestamp NOT NULL,
  userid uuid NOT NULL,
  posterid uuid NOT NULL,
  parentid uuid NOT NULL,
  comment text NOT NULL
)

id is unique, parentid will be null on original posts and point to an id
if the row is a comment on an existing post.  Easy enough and super fast
to insert new data.  However, doing a select which would return me:

POST 1
COMMENT 1
COMMENT 2
POST 2
COMMENT 1
COMMENT 2

Regardless of which order the rows existed in the database proved to be
extremely difficult.  I obviously can't just order by date, as someone
might comment on post 1 after post 2 has been posted.  If I do a LEFT
JOIN to get the parent post on all rows, and then sort by that date
first, all the original posts group together as they'd have a value of null.

Then I got this idea:

CREATE TABLE wallposts
(
  id uuid NOT NULL,
  threadposted timestamp,
  posted timestamp,
  ...
  comment text
)

On an original post, threadposted and posted would be the same.  On a
comment, timestamp would be the time the original post was posted and
"posted" would be the time the comment on that thread was posted.  Now I
can just do:

select * from wallposts order by threadposted, posted;

This works great, however one thing irks me.  If two people create a
post at the same time, comments on the two posts would get munged
together as they'd have the same timestamp.  I could use "ticks" instead
of a datetime, but still the accuracy is only 1/1000 of a second.  I
could also setup a unique constraint on threadposted and posted which
makes inserts a bit more expensive, but if I had multiple database
servers in a farm, the chance of a collision is still there.  I almost
went ahead with this anyway since the chances of this happening are
extremely small, but I wanted to see if I could eat my cake and still
have it too.  Mostly for my own educational curiosity.

Third solution would be to store this data in the form of a graph.  Each
node would have a v-left and v-right pointer.  I could order by "left"
which would traverse the tree in the order I need.  However, every time
someone inserts a comment I'd have to re balance the whole tree.  This
would create a ton of row locking, and all sorts of problems if the site
was very busy.  Plus, it's kinda extreme and also causes replication
problems.  So I tossed this idea quickly.

I also thought about just storing the original posts and then
serializing the comments in a binary form, since who cares about
individual comments.  This would be very fast, however if a user wants
to delete their comment or append a new comment to the end, I have to
deserialize this data, modify the structure, then serialize it back and
update the row.  If a bunch of people are commenting on the same post at
the same time, I might have random issues with that.

So here's what I eventually did.  I query for all the posts ordered by
date entered.  In the middle ware layer, I loop through the recordset
and create a "stack" of original posts, each node on the stack points to
a linked list of comments.  When I come across an original post, I push
a new node on the stack and when I come across a comment I add a node to
the linked list.  I organize this in memory so I can traverse the
recordset once and have O(n).  After I create the in-memory
representation of the wall, I traverse through this data structure again
and write out HTML.  This works great and has super fast inserts and
super fast selects, and no weird row locking issues; however it's a bit
heavier on my presentation layer and requires me to build an in memory
representation of the user's wall to move stuff around so it's in the
right order.  Still, I believe this is the best approach I've found so far.

I thought I'd check with some other SQL experts and see if 1) Postgres
has some super nifty tree functions that I've never heard of and 2) see
if there's actually a way to do this using joins or unions or something
which would still be performant with millions of users.  Thoughts?

Mike

pgsql-general by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Is there a meaningful benchmark?
Next
From: David Fetter
Date:
Subject: Re: Enumerating a row set