Thread: BUG #8144: Problem with rank window function and CTEs

BUG #8144: Problem with rank window function and CTEs

From
marc@bloodnok.com
Date:
The following bug has been logged on the website:

Bug reference:      8144
Logged by:          Marc Munro
Email address:      marc@bloodnok.com
PostgreSQL version: 9.2.4
Operating system:   Linux 3.6.3 (debian wheezy)
Description:        =


I have a query in which I want to use the result of a window function to
isolate the most relevant results.   While I was trying to develop and test
the query, I discovered what looks like a bug in the results of the rank()
function.  This has been tried with the same results on 9.1.9 and 9.2.4

I have two versions of the query, the first works as expected while the
second produces incorrect rank values.  I have tried various work-arounds
but they all seem to trigger the same behaviour once a certain degree of
complexity is reached.

This version of the query seems to work, producing the expected rank
values:

with recursive parent_orgs(parent_org_id, org_id) as
        (
          select pr.first_party_id, pr.second_party_id
            from party_rels pr
           where pr.relationship_type_id in (1009, 1010, 1011, 1012, 1013)
        ),
             ancestor_orgs(
                ancestor_org_id, org_id, depth) as
        (
          select org_id, org_id, 0
            from parent_orgs
           union all
          select p.parent_org_id, a.org_id, a.depth + 1
            from ancestor_orgs a
            join parent_orgs p
              on p.org_id =3D a.ancestor_org_id
        )
        select ao.org_id, oi.item_id, =

               oi.seq_id, oi.complete, =

               ao.ancestor_org_id, ao.depth,
               rank() over (partition by oi.item_id order by ao.depth)
          from ancestor_orgs ao
          join oitems oi
            on oi.org_id =3D ao.ancestor_org_id
         where ao.org_id =3D 20150;
        =

        org_id | item_id | seq_id | complete | ancestor_org_id | depth |
rank =

--------+---------+--------+----------+-----------------+-------+------
  20150 |       1 |      1 | t        |           20139 |     4 |    1
  20150 |       2 |      1 | t        |           20139 |     4 |    1
  20150 |  200146 |      1 | t        |           20146 |     3 |    1
  20150 |  200147 |      1 | t        |           20146 |     3 |    1
  20150 |  200148 |      1 | t        |           20146 |     3 |    1
  20150 |  200149 |      2 | t        |           20146 |     3 |    1
  20150 |  200150 |      1 | t        |           20146 |     3 |    1
  20150 |  200151 |      1 | t        |           20146 |     3 |    1
  20150 |  200152 |      1 | t        |           20146 |     3 |    1
  20150 |  200153 |      7 | t        |           20150 |     0 |    1
  20150 |  200153 |      1 | t        |           20146 |     3 |    2
  20150 |  200154 |      1 | t        |           20146 |     3 |    1
[ rows removed for brevity ]
(38 rows)

This version, which should be equivalent, yields crazy rank values:

        with recursive parent_orgs(parent_org_id, org_id) as
        (
          select pr.first_party_id, pr.second_party_id
            from party_rels pr
           where pr.relationship_type_id in (1009, 1010, 1011, 1012, 1013)
        ),
             ancestor_orgs(
                ancestor_org_id, org_id, depth) as
        (
          select org_id, org_id, 0
            from parent_orgs
           union all
          select p.parent_org_id, a.org_id, a.depth + 1
            from ancestor_orgs a
            join parent_orgs p
              on p.org_id =3D a.ancestor_org_id
        ),
             visible_org_items(org_id, item_id, =

                              seq_id, complete, ancestor_org_id, =

                              depth, rank) as
        (
          select ao.org_id, oi.item_id, =

                 oi.seq_id, oi.complete, =

                 ao.ancestor_org_id, ao.depth,
                 rank() over (partition by oi.item_id order by ao.depth)
            from ancestor_orgs ao
            join oitems oi
              on oi.org_id =3D ao.ancestor_org_id
        )
        select *
          from visible_org_items
         where org_id =3D 20150;
        =

         org_id | item_id | seq_id | complete | ancestor_org_id | depth |
rank =

--------+---------+--------+----------+-----------------+-------+------
  20150 |       1 |      1 | t        |           20139 |     4 |   21
  20150 |       2 |      1 | t        |           20139 |     4 |   21
  20150 |  200146 |      1 | t        |           20146 |     3 |    9
  20150 |  200147 |      1 | t        |           20146 |     3 |    9
  20150 |  200148 |      1 | t        |           20146 |     3 |    9
  20150 |  200149 |      2 | t        |           20146 |     3 |    9
  20150 |  200150 |      1 | t        |           20146 |     3 |    9
  20150 |  200151 |      1 | t        |           20146 |     3 |    9
  20150 |  200152 |      1 | t        |           20146 |     3 |    9
  20150 |  200153 |      7 | t        |           20150 |     0 |    1
  20150 |  200153 |      1 | t        |           20146 |     3 |   10
  20150 |  200154 |      1 | t        |           20146 |     3 |    9
[ rows removed for brevity ]
(38 rows)

I have a pg_dump (< 5K in size) with which the problem can be reproduced.

Re: BUG #8144: Problem with rank window function and CTEs

From
Tom Lane
Date:
marc@bloodnok.com writes:
> I have a query in which I want to use the result of a window function to
> isolate the most relevant results.   While I was trying to develop and test
> the query, I discovered what looks like a bug in the results of the rank()
> function.  This has been tried with the same results on 9.1.9 and 9.2.4

Well, you didn't provide enough information for somebody else to
reproduce the problem, but just looking at this I see no particular
reason to think there's a bug.  In the first case, the
"where ao.org_id = 20150" clause filters rows before the rank() is
taken, whereas in the second case it filters after the rank() function.
I think the larger rank values indicate that there are other rows with
different org_id but the same item_id, which you won't see in the final
output in either case --- but in the second query, the rank() function
does see them.

            regards, tom lane