> I think we need to build up a library of autogenerated queries, so we
> can do things which address multiple use cases. Can you tell us more
> about who/what generated it, so we can research?
Sorry, I can't publish much information; it is one of the biggest Russian
software companies, and it is trying to migrate from MS SQL to PostgreSQL.
MS SQL can optimize such queries into a form similar to the second query.
>
> The query formulation does seem a fairly common one.
It seems so to me too. Instead of SUM, the aggregate can be MIN/MAX/AVG etc.,
or a more complex subquery. But pgsql usually optimizes a non-aggregate
subquery rather well.
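For readers without the earlier messages, a minimal sketch of the query shape
under discussion (the table/column names a, b, id, val, addon and the filter
id > 10000 come from this thread; the exact original queries are assumed, not
quoted):

    -- Sketch only: an outer select * joined to an aggregating subquery.
    SELECT *
      FROM a,
           (SELECT id, SUM(val) AS val
              FROM b
             GROUP BY id) AS bagg   -- aggregate blocks pushing bagg.id = a.id down
     WHERE bagg.id = a.id
       AND a.id > 10000
     ORDER BY a.addon
     LIMIT 100;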
> The value of sum(b.val) is never used in the query, so the aggregate
> itself could be discarded. I suspect there are other conditions you
> aren't showing us that would make this impossible?
No, it is 'select *', i.e. all fields from the a and bagg tables.
> The aggregate prevents the condition bagg.id = a.id from being pushed
> down so that we know b.id = a.id. If we knew that then we could use b.id
> = ? as an index condition to retrieve the rows.
In this case, it's safe to push down the clause b.id = a.id.
BTW, does pgsql understand that the query 'select id, sum() ... group by id'
produces a result with a unique id?
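For illustration, a hedged sketch of what the pushed-down form could look like
if rewritten by hand (again with the names assumed from the thread, not the
original query):

    -- Sketch only: pulling the join condition into a correlated subquery
    -- lets b.id = a.id act as an index condition on b. Note the semantics
    -- differ slightly from the join form: rows of a with no match in b are
    -- kept here, with a NULL sum.
    SELECT a.*,
           (SELECT SUM(b.val) FROM b WHERE b.id = a.id) AS val
      FROM a
     WHERE a.id > 10000
     ORDER BY a.addon
     LIMIT 100;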
> Since we can't use the best technique, we use another. That then hits a
> third optimization problem. When an IndexScan is used to enforce order,
> we don't estimate how much of the table needs to be scanned before we
> start hitting rows. In the example you give we need to scan 65% of the
Why 65%? a.addon has only 100 unique values, and among the first 100 tuples in
the index (a2idx) there will be about 80 tuples with id > 10000.
> table using an IndexScan before we hit any rows. So we would probably be
> better off doing a Sort<-SeqScan to apply the condition.
That's true for the query without a LIMIT clause (the second query is about
10% slower than the first one when both run without LIMIT).
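For concreteness, the two plan shapes being compared look roughly like this (a
sketch of typical EXPLAIN structure, not actual output from this case):

    -- IndexScan plan: reads the index in a.addon order and stops at the
    -- LIMIT, but may visit many tuples before the filter id > 10000 matches.
    Limit
      -> Index Scan using a2idx on a
           Filter: (id > 10000)

    -- Sort plan: applies the filter during a full scan, then sorts; without
    -- a LIMIT the early-stop advantage of the IndexScan disappears.
    Limit
      -> Sort
           Sort Key: addon
           -> Seq Scan on a
                Filter: (id > 10000)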
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/