Re: left outer join vs subplan - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Re: left outer join vs subplan
Date
Msg-id 46E159F1.9070103@sigaev.ru
Whole thread Raw
In response to Re: left outer join vs subplan  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers

> I think we need to build up a library of autogenerated queries, so we
> can do things which address multiple use cases. Can you tell us more
> about who/what generated it, so we can research?

Sorry, I can't publish a lot of information, that is on of the biggest russian 
software company, it tries to migrate from MS SQL to PostgreSQL. MS SQL can 
optimize such queries to form similar to second query.

> 
> The query formulation does seem a fairly common one.

It seems to me too. Instead of SUM aggregates it can be MIN/AMX/AVG etc or more 
complex subquery. But pgsql usually optimizes non-aggregate subquery rather well.

> The value of sum(b.val) is never used in the query, so the aggregate
> itself could be discarded. I suspect there are other conditions you
> aren't showing us that would make this impossible?
No, - select *, ie all fields from a and bagg tables.


> The aggregate prevents the condition bagg.id = a.id from being pushed
> down so that we know b.id = a.id. If we knew that then we could use b.id
> = ? as an index condition to retrieve the rows.
In this case, it's safe to push down clause b.id=a.id.

BTW, is pgsql understand that query 'select id,sum() ... group by id' produces result with unique id?

> Since we can't use the best technique, we use another. That then hits a
> third optimization problem. When an IndexScan is used to enforce order,
> we don't estimate how much of the table needs to be scanned before we
> start hitting rows. In the example you give we need to scan 65% of the
Why 65%? a.addon has only 100 unique values and in first 100 tuples in index 
(a2idx) it will be about 80 tuples with  id>10000.

> table using an IndexScan before we hit any rows. So we would probably be
> better off doing a Sort<-SeqScan to apply the condition.

That's true for query without LIMIT clause (second query is slower for 10% in 
compare with first one - both without LIMIT).
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
Next
From: Kenneth Marshall
Date:
Subject: Re: Hash index todo list item