Thread: Recursive CTE trees + Sorting by votes

Recursive CTE trees + Sorting by votes

From
Gregory Taylor
Date:
We are working on a threaded comment system, and found this post by Disqus to be super helpful:


The CTE works wonderfully, and we're really happy with the results. The last obstacle is figuring out how to sort by a "votes" field, meanwhile preserving the tree structure.

If we "ORDER BY path, votes" (assuming we have the same structure as in the article), we never need tie-breaking on "path", so the "votes" part of this doesn't even come into the equation.

I suspect we need to do some path manipulation, but I'm not too sure of where to begin with this. I attempted incorporating "votes" into the path, but I failed pretty badly with this. It's probably way off, but here's my last (failed) attempt:


Any ideas would be greatly appreciated! If we can retain the path structure and also sort by votes, we'll be able to paginate freely without issues.

--

Re: Recursive CTE trees + Sorting by votes

From
Martijn van Oosterhout
Date:
On Wed, Aug 06, 2014 at 05:28:09PM -0400, Gregory Taylor wrote:
> We are working on a threaded comment system, and found this post by Disqus
> to be super helpful:
>
> http://cramer.io/2010/05/30/scaling-threaded-comments-on-django-at-disqus/
>
> The CTE works wonderfully, and we're really happy with the results. The
> last obstacle is figuring out how to sort by a "votes" field, meanwhile
> preserving the tree structure.

What do you mean exactly? Do you mean that want everything at the same
level to be sorted by vote?

> If we "ORDER BY path, votes" (assuming we have the same structure as in the
> article), we never need tie-breaking on "path", so the "votes" part of this
> doesn't even come into the equation.
>
> I suspect we need to do some path manipulation, but I'm not too sure of
> where to begin with this. I attempted incorporating "votes" into the path,
> but I failed pretty badly with this. It's probably way off, but here's my
> last (failed) attempt:
>
> https://gist.github.com/gtaylor/e3926a90fe108d52a4c8

I think what you need to do is do the ordering withing the CTE itself.
Something like:

WITH RECUSIVE cte () AS (
   SELECT ... ORDER BY vote DESC
UNION ALL
   SELECT ... JOIN cte ... ORDER BY vote DESC
) SELECT * from cte;

Or another idea, add a column that is the path of the parent:

WITH RECUSIVE cte () AS (
   SELECT array[] as path_parent, array[id] as path, ... ORDER BY vote DESC
UNION ALL
   SELECT cte.path as path_parent, cte.path || comments.id as path, ... JOIN cte ... ORDER BY vote DESC
) SELECT * from cte order by path, votes desc;

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

Attachment

Re: Recursive CTE trees + Sorting by votes

From
Gregory Taylor
Date:
Hello Martijn,

Thanks for the reply, my responses are inline below.

On Wed, Aug 6, 2014 at 5:38 PM, Martijn van Oosterhout <kleptog@svana.org> wrote:
On Wed, Aug 06, 2014 at 05:28:09PM -0400, Gregory Taylor wrote:
> We are working on a threaded comment system, and found this post by Disqus
> to be super helpful:
>
> http://cramer.io/2010/05/30/scaling-threaded-comments-on-django-at-disqus/
>
> The CTE works wonderfully, and we're really happy with the results. The
> last obstacle is figuring out how to sort by a "votes" field, meanwhile
> preserving the tree structure.

What do you mean exactly? Do you mean that want everything at the same
level to be sorted by vote?

Each level of the tree should be sorted by vote, while retaining the correct hierarchy. So the top level entry with the most votes should be at the top, plus all of the items beneath it (with each level of the tree under that row being sorted correctly).
 

> If we "ORDER BY path, votes" (assuming we have the same structure as in the
> article), we never need tie-breaking on "path", so the "votes" part of this
> doesn't even come into the equation.
>
> I suspect we need to do some path manipulation, but I'm not too sure of
> where to begin with this. I attempted incorporating "votes" into the path,
> but I failed pretty badly with this. It's probably way off, but here's my
> last (failed) attempt:
>
> https://gist.github.com/gtaylor/e3926a90fe108d52a4c8

I think what you need to do is do the ordering withing the CTE itself.
Something like:

WITH RECUSIVE cte () AS (
   SELECT ... ORDER BY vote DESC
UNION ALL
   SELECT ... JOIN cte ... ORDER BY vote DESC
) SELECT * from cte;

It looks like you can't order within a CTE.
 

Or another idea, add a column that is the path of the parent:

WITH RECUSIVE cte () AS (
   SELECT array[] as path_parent, array[id] as path, ... ORDER BY vote DESC
UNION ALL
   SELECT cte.path as path_parent, cte.path || comments.id as path, ... JOIN cte ... ORDER BY vote DESC
) SELECT * from cte order by path, votes desc;

I got this recommendation from someone else, and think that it's probably the way to go. I've been playing with it unsuccessfully so far, though. Most certainly because I've got something weirded up. Here's what I have:


    WITH RECURSIVE cte (
        id, discussion_id, body, num_votes,
        class_section_id, modified_time,
        author_id, reply_parent_id,
        path, votes_path, depth
    )  AS (
        SELECT  discussion_response.id, discussion_response.discussion_id,
            discussion_response.body, discussion_response.num_votes,
            discussion_response.last_edited_time,
            discussion_response.class_section_id,
            discussion_response.author_id, discussion_response.reply_parent_id,
            array[id] AS path,
            array[num_votes, id] AS votes_path,
            1 AS depth
        FROM    discussion_response
        WHERE   reply_parent_id IS NULL AND discussion_id=2763

        UNION ALL

        SELECT  discussion_response.id, discussion_response.discussion_id,
            discussion_response.body, discussion_response.num_votes,
            discussion_response.last_edited_time,
            discussion_response.class_section_id,
            discussion_response.author_id, discussion_response.reply_parent_id,
            cte.path || discussion_response.id,
            cte.votes_path || discussion_response.num_votes || discussion_response.id,
            cte.depth + 1 AS depth
        FROM    discussion_response
        JOIN cte ON discussion_response.reply_parent_id = cte.id
        WHERE discussion_response.discussion_id=2763
    )
    SELECT * FROM cte ORDER BY votes_path DESC, path DESC LIMIT 50 OFFSET 0;

The problem with this is that non-root level (depth > 1) rows end up at the top because of the ordering by votes_path. For example:

id=292839, num_votes=0, reply_parent_id=211957, votes_path={2,211957,0,292839}, path={211957,292839}, depth=2
id=211957, num_votes=2, reply_parent_id=NULL, votes_path={2,211957}, path={211957}, depth=1

I understand why it is ordered this way, it's just not what I was hoping for. Ideally this ends up like this:

id=211957, num_votes=2, reply_parent_id=NULL, votes_path={2,211957}, path={211957}, depth=1
id=292839, num_votes=0, reply_parent_id=211957, votes_path={2,211957,0,292839}, path={211957,292839}, depth=2

Sorting by path causes the correct "tree" structure to be returned and in the right order, but obviously it's not
sorted at all by votes.

--

Re: Recursive CTE trees + Sorting by votes

From
Vik Fearing
Date:
On 08/07/2014 01:22 PM, Gregory Taylor wrote:
> I got this recommendation from someone else, and think that it's
> probably the way to go. I've been playing with it unsuccessfully so far,
> though. Most certainly because I've got something weirded up. Here's
> what I have:
>
>
>     WITH RECURSIVE cte (
>         id, discussion_id, body, num_votes,
>         class_section_id, modified_time,
>         author_id, reply_parent_id,
>         path, votes_path, depth
>     )  AS (
>         SELECT  discussion_response.id <http://discussion_response.id>,
> discussion_response.discussion_id,
>             discussion_response.body, discussion_response.num_votes,
>             discussion_response.last_edited_time,
>             discussion_response.class_section_id,
>             discussion_response.author_id,
> discussion_response.reply_parent_id,
>             array[id] AS path,
>             array[num_votes, id] AS votes_path,
>             1 AS depth
>         FROM    discussion_response
>         WHERE   reply_parent_id IS NULL AND discussion_id=2763
>
>         UNION ALL
>
>         SELECT  discussion_response.id <http://discussion_response.id>,
> discussion_response.discussion_id,
>             discussion_response.body, discussion_response.num_votes,
>             discussion_response.last_edited_time,
>             discussion_response.class_section_id,
>             discussion_response.author_id,
> discussion_response.reply_parent_id,
>             cte.path || discussion_response.id
> <http://discussion_response.id>,
>             cte.votes_path || discussion_response.num_votes ||
> discussion_response.id <http://discussion_response.id>,
>             cte.depth + 1 AS depth
>         FROM    discussion_response
>         JOIN cte ON discussion_response.reply_parent_id = cte.id
> <http://cte.id>
>         WHERE discussion_response.discussion_id=2763
>     )
>     SELECT * FROM cte ORDER BY votes_path DESC, path DESC LIMIT 50 OFFSET 0;
>
> The problem with this is that non-root level (depth > 1) rows end up at
> the top because of the ordering by votes_path. For example:
>
> id=292839, num_votes=0, reply_parent_id=211957,
> votes_path={2,211957,0,292839}, path={211957,292839}, depth=2
> id=211957, num_votes=2, reply_parent_id=NULL, votes_path={2,211957},
> path={211957}, depth=1
>
> I understand why it is ordered this way, it's just not what I was hoping
> for. Ideally this ends up like this:
>
> id=211957, num_votes=2, reply_parent_id=NULL, votes_path={2,211957},
> path={211957}, depth=1
> id=292839, num_votes=0, reply_parent_id=211957,
> votes_path={2,211957,0,292839}, path={211957,292839}, depth=2
>
> Sorting by path causes the correct "tree" structure to be returned and
> in the right order, but obviously it's not
> sorted at all by votes.

Just export the order from your CTE.

WITH RECURSIVE tree AS (
    SELECT dr.id,
           ...,
           array[dr.id] as path,
           1 as depth,
           row_number() over (order by dr.num_votes desc) as sort_order
    FROM discussion_response AS dr
    WHERE dr.reply_parent_id IS NULL
      AND dr.discussion_id = 2763

    UNION ALL

    SELECT dr.id,
           ...,
           tree.path || dr.id,
           tree.depth + 1
           row_number() over (order by dr.num_votes desc)
    FROM discussion_response AS dr
    JOIN tree ON tree.id = dr.reply_parent_id
    WHERE NOT array[dr.id] <@ tree.path
)
SELECT *
FROM tree
ORDER BY depth, sort_order
LIMIT 50;
--
Vik


Re: Recursive CTE trees + Sorting by votes

From
Gregory Taylor
Date:
On Thu, Aug 7, 2014 at 8:12 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:
Just export the order from your CTE.

WITH RECURSIVE tree AS (
    SELECT dr.id,
           ...,
           array[dr.id] as path,
           1 as depth,
           row_number() over (order by dr.num_votes desc) as sort_order
    FROM discussion_response AS dr
    WHERE dr.reply_parent_id IS NULL
      AND dr.discussion_id = 2763

    UNION ALL

    SELECT dr.id,
           ...,
           tree.path || dr.id,
           tree.depth + 1
           row_number() over (order by dr.num_votes desc)
    FROM discussion_response AS dr
    JOIN tree ON tree.id = dr.reply_parent_id
    WHERE NOT array[dr.id] <@ tree.path
)
SELECT *
FROM tree
ORDER BY depth, sort_order
LIMIT 50;

It looks like this clobbers the hierarchy by sorting by depth first. I'm trying to preserve said hierarchy so I can paginate using OFFSET/LIMIT easily. I'm not sure what I'm shooting for is even possible, though.

--

Re: Recursive CTE trees + Sorting by votes

From
Paul Jungwirth
Date:
> Or another idea, add a column that is the path of the parent:

I don't think this will work. The problem is you need the full path to
keep the children with their parents, but you also need the score. If
you make the path an array of (-votes, id) tuples (perhaps flattened
for simplicity), then you get the correct ordering. That way at every
stage you are sorting by votes, but still keeping children with their
parents:

comments=> WITH RECURSIVE cte (id, message, author, path, parent_id,
depth, votes)  AS (
    SELECT  id,
        message,
        author,
        array[-votes,id] AS path,
        parent_id,
        1 AS depth, votes
    FROM    comments
    WHERE   parent_id IS NULL
    UNION ALL
    SELECT  comments.id,
        comments.message,
        comments.author,
        cte.path || -comments.votes || comments.id,
        comments.parent_id,
        cte.depth + 1 AS depth, comments.votes
    FROM    comments
    JOIN cte ON comments.parent_id = cte.id
    )
    SELECT id, message, author, path, depth, votes FROM cte
ORDER BY path;
 id |           message           | author |       path        | depth | votes
----+-----------------------------+--------+-------------------+-------+-------
  5 | Very interesting post!      | thedz  | {-3,5}            |     1 |     3
  8 | Fo sho, Yall                | Mac    | {-3,5,-12,8}      |     2 |    12
  7 | Agreed                      | G      | {-3,5,-5,7}       |     2 |     5
  6 | You sir, are wrong          | Chris  | {-3,5,-3,6}       |     2 |     3
  1 | This thread is really cool! | David  | {-1,1}            |     1 |     1
  3 | I agree David!              | Daniel | {-1,1,-4,3}       |     2 |     4
  2 | Ya David, we love it!       | Jason  | {-1,1,-3,2}       |     2 |     3
  4 | gift Jason                  | Anton  | {-1,1,-3,2,-15,4} |     3 |    15
(8 rows)

Time: 0.966 ms

Paul


On Wed, Aug 6, 2014 at 2:38 PM, Martijn van Oosterhout
<kleptog@svana.org> wrote:
> On Wed, Aug 06, 2014 at 05:28:09PM -0400, Gregory Taylor wrote:
>> We are working on a threaded comment system, and found this post by Disqus
>> to be super helpful:
>>
>> http://cramer.io/2010/05/30/scaling-threaded-comments-on-django-at-disqus/
>>
>> The CTE works wonderfully, and we're really happy with the results. The
>> last obstacle is figuring out how to sort by a "votes" field, meanwhile
>> preserving the tree structure.
>
> What do you mean exactly? Do you mean that want everything at the same
> level to be sorted by vote?
>
>> If we "ORDER BY path, votes" (assuming we have the same structure as in the
>> article), we never need tie-breaking on "path", so the "votes" part of this
>> doesn't even come into the equation.
>>
>> I suspect we need to do some path manipulation, but I'm not too sure of
>> where to begin with this. I attempted incorporating "votes" into the path,
>> but I failed pretty badly with this. It's probably way off, but here's my
>> last (failed) attempt:
>>
>> https://gist.github.com/gtaylor/e3926a90fe108d52a4c8
>
> I think what you need to do is do the ordering withing the CTE itself.
> Something like:
>
> WITH RECUSIVE cte () AS (
>    SELECT ... ORDER BY vote DESC
> UNION ALL
>    SELECT ... JOIN cte ... ORDER BY vote DESC
> ) SELECT * from cte;
>
> Or another idea, add a column that is the path of the parent:
>
> WITH RECUSIVE cte () AS (
>    SELECT array[] as path_parent, array[id] as path, ... ORDER BY vote DESC
> UNION ALL
>    SELECT cte.path as path_parent, cte.path || comments.id as path, ... JOIN cte ... ORDER BY vote DESC
> ) SELECT * from cte order by path, votes desc;
>
> Hope this helps,
> --
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
>> He who writes carelessly confesses thereby at the very outset that he does
>> not attach much importance to his own thoughts.
>    -- Arthur Schopenhauer
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
>
> iQIVAwUBU+KgXUvt++dL5i1EAQgKzQ//fWqd56vcwKsYQDtbUE3Q2/ohUinYxpb6
> HgS9HoEs8QU3b4yzE6VOVXcUcN3/z6PPx4Mz3rqFOVgsFcZR2umGAaVw5oEr57Bd
> mqFDVgUxq8Xio2tijO0XFU89fh+/Cvus08CRh+OH6POLe6M76ox6cmFPtQzeaEon
> iFKXZZRIzFv7zpoE3xsQ7wgqSF44L0TIJIjdw3Dhcs8fN+T/jO0hJtUMKidGwbbv
> 9f08r9kjSMBYAhKCPXZHy/By/E91DhA8GjJFL1MloHPol/lzSkn7v7amWJZaILyE
> g3ghGUG1YhPJPA3Dw2VBKWzumNyu8kXSzTvzN6PacFToCf2ZIfTJH59ehPqztt0o
> FC6auCvO1vWS3NbOKSwdBVvXb/bJsIM3uqN16LSVhHqUp75eOFp5AWKJMCjQF1hE
> MkHk5xyz2CWsYZTlzqCKtGxRjFEbxKGjtqsxcM4qKM3uSjMG/ZhaAY6FZFLIage0
> yxsHrE5N+zfDAGV1EplxxtzMHUEqyFnBYQNRHUSChLPCkgrluOeFFRQU22aVpUUL
> vbPIBI8E16bbtU6zwnE3DoMdBm1Pq5E4c+URbfbzJhGB1e/DkDqf7pOZjojLJ9ue
> DRP777bBbsYwtCdS69kiIDkfwA2f7lliILI9wpnKSg64SIWlCR6NVWFTsfU8OP4l
> cJw8kApkDr4=
> =8bEW
> -----END PGP SIGNATURE-----
>



--
_________________________________
Pulchritudo splendor veritatis.


Re: Recursive CTE trees + Sorting by votes

From
Gregory Taylor
Date:

On Thu, Aug 7, 2014 at 11:57 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
> Or another idea, add a column that is the path of the parent:

I don't think this will work. The problem is you need the full path to
keep the children with their parents, but you also need the score. If
you make the path an array of (-votes, id) tuples (perhaps flattened
for simplicity), then you get the correct ordering. That way at every
stage you are sorting by votes, but still keeping children with their
parents:

comments=> WITH RECURSIVE cte (id, message, author, path, parent_id,
depth, votes)  AS (
    SELECT  id,
        message,
        author,
        array[-votes,id] AS path,
        parent_id,
        1 AS depth, votes
    FROM    comments
    WHERE   parent_id IS NULL
    UNION ALL
    SELECT  comments.id,
        comments.message,
        comments.author,
        cte.path || -comments.votes || comments.id,
        comments.parent_id,
        cte.depth + 1 AS depth, comments.votes
    FROM    comments
    JOIN cte ON comments.parent_id = cte.id
    )
    SELECT id, message, author, path, depth, votes FROM cte
ORDER BY path;
 id |           message           | author |       path        | depth | votes
----+-----------------------------+--------+-------------------+-------+-------
  5 | Very interesting post!      | thedz  | {-3,5}            |     1 |     3
  8 | Fo sho, Yall                | Mac    | {-3,5,-12,8}      |     2 |    12
  7 | Agreed                      | G      | {-3,5,-5,7}       |     2 |     5
  6 | You sir, are wrong          | Chris  | {-3,5,-3,6}       |     2 |     3
  1 | This thread is really cool! | David  | {-1,1}            |     1 |     1
  3 | I agree David!              | Daniel | {-1,1,-4,3}       |     2 |     4
  2 | Ya David, we love it!       | Jason  | {-1,1,-3,2}       |     2 |     3
  4 | gift Jason                  | Anton  | {-1,1,-3,2,-15,4} |     3 |    15
(8 rows)

Time: 0.966 ms


This is outstanding, Paul. I'm still checking things over, but it looks like this is going to work. It looks like I was really close, but didn't think to go negative, and I had one of my arrays flip-flopped from what you've got. I made those two changes and it would appear that this is perfect.

Much appreciated, I would have been beating my head against this for a lot longer without the help!