Thread: Planning problem: pushing conditions through GROUP BY

Planning problem: pushing conditions through GROUP BY

From
Martijn van Oosterhout
Date:
Hoi,

I've having a smaller planner problem (server version 8.3.5). I have a
table with revisions (revs) and I have a view on top of that gives the
latest revision of each rule. When I request a specific rule, it works
fine.

# explain select * from maxrevs where rul_id=(select rul_id from rules where rul_sid=15895);
                                      QUERY PLAN
-----------------------------------------------------------------------------------------
 GroupAggregate  (cost=2.93..159.77 rows=2 width=8)
  InitPlan
    ->  Index Scan using rules_index_rul_sid on rules  (cost=0.00..4.27 rows=1 width=4)
          Index Cond: (rul_sid = 15895)
  ->  Bitmap Heap Scan on revs  (cost=2.93..159.32 rows=86 width=8)
        Recheck Cond: (rul_id = $0)
        ->  Bitmap Index Scan on revs_rul_id  (cost=0.00..2.91 rows=86 width=0)
              Index Cond: (rul_id = $0)
(8 rows)

But when I want to join on this view, it all goes pear shaped, as can
be seen in the following (completely equivalent) query:

# explain select * from maxrevs where rul_id=ANY(select rul_id from rules where rul_sid=15895);
                                        QUERY PLAN
---------------------------------------------------------------------------------------------
 Hash IN Join  (cost=6996.32..7080.72 rows=1 width=8)
  Hash Cond: (revs.rul_id = rules.rul_id)
  ->  HashAggregate  (cost=6992.04..7032.23 rows=3215 width=8)
        ->  Seq Scan on revs  (cost=0.00..5610.36 rows=276336 width=8)
  ->  Hash  (cost=4.27..4.27 rows=1 width=4)
        ->  Index Scan using rules_index_rul_sid on rules  (cost=0.00..4.27 rows=1 width=4)
              Index Cond: (rul_sid = 15895)
(7 rows)

As you can see, it correctly realises that there is only one rule that
matches, but fails to push this information through to the hash
aggregate. I would have expected a Nested Loop with a Bitmap Scan under
a GroupAggreate, as in the first case.

Switching between DISTINCT ON() and GROUP BY doesn't help, apparently
the planner is smart enough to see they are the same in this case.

If I manually rewrite the query to put a distinct at the outer level,
it all works fine. But the real situation is somewhat more complicated.
Is the kind of thing I can expect 8.4 to handle better? Or is there
some trick I can use to get the result I want?

Thanks in advance,

-- table definitions --

        Table "test.revs"
  Column    |  Type   | Modifiers
-------------+---------+-----------
 rev_id      | integer |
 rul_id      | integer |
 rev_content | text    |
Indexes:
   "revs_rev_id" btree (rev_id)
   "revs_rul_id" btree (rul_id)

    View "test.maxrevs"
 Column |  Type   | Modifiers
--------+---------+-----------
 rul_id | integer |
 rev_id | integer |
View definition:
 SELECT revs.rul_id, max(revs.rev_id) AS rev_id
  FROM revs
 GROUP BY revs.rul_id;

         Table "public.rules"
     Column       |  Type   | Modifiers
-------------------+---------+-----------
 rul_id            | integer | not null
 rul_sid           | integer | not null
 rul_repository_id | integer | not null
Indexes:
   "rules_rul_id_key" UNIQUE, btree (rul_id)
   "rules_index_rul_sid" btree (rul_sid)

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment

Re: Planning problem: pushing conditions through GROUP BY

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> I've having a smaller planner problem (server version 8.3.5). I have a
> table with revisions (revs) and I have a view on top of that gives the
> latest revision of each rule. When I request a specific rule, it works
> fine.
> ...
> But when I want to join on this view, it all goes pear shaped, as can
> be seen in the following (completely equivalent) query:

> # explain select * from maxrevs where rul_id=ANY(select rul_id from rules where rul_sid=15895);

There isn't any provision in the planner for trying to exchange the
order of a join and a GROUP BY operation.  In the general case it
wouldn't even be allowed anyway since it'd change the results.
In this specific case --- semijoin using an equality condition on the
grouping key --- I agree that it would be a legitimate transformation.
But it's not obvious that it would be a win.  I think if we wanted to
consider it we'd have to develop plans both ways and compare costs.
And that would be a really major amount of work.

The simpler case isn't treated as a join, just a simple WHERE condition
on individual rows, so it does what you want.

I think you'll have to write out your queries without benefit of the
view :-(.  Sorry the news isn't better.

            regards, tom lane

Re: Planning problem: pushing conditions through GROUP BY

From
Martijn van Oosterhout
Date:
On Sun, Nov 22, 2009 at 12:38:36PM -0500, Tom Lane wrote:
> > But when I want to join on this view, it all goes pear shaped, as can
> > be seen in the following (completely equivalent) query:
>
> > # explain select * from maxrevs where rul_id=ANY(select rul_id from rules where rul_sid=15895);
>
> There isn't any provision in the planner for trying to exchange the
> order of a join and a GROUP BY operation.  In the general case it
> wouldn't even be allowed anyway since it'd change the results.
> In this specific case --- semijoin using an equality condition on the
> grouping key --- I agree that it would be a legitimate transformation.
> But it's not obvious that it would be a win.  I think if we wanted to
> consider it we'd have to develop plans both ways and compare costs.
> And that would be a really major amount of work.

It's not obvious it would be a win, but it would be a nice
transformation to have. In the actual case it's an inner join, but
since we know the other side only produces unique entries, the result
is the same.

> I think you'll have to write out your queries without benefit of the
> view :-(.  Sorry the news isn't better.

It's not the end of the world. At least I know it's it's not something
I'm missing. (I wonder if window functions work differently here.
That's 8.4 territory though).

Thanks anyway,

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment