Re: Apparent missed query optimization with self-join and inner grouping - Mailing list pgsql-general

From Tom Lane
Subject Re: Apparent missed query optimization with self-join and inner grouping
Date
Msg-id 1934924.1596306669@sss.pgh.pa.us
Whole thread Raw
In response to Apparent missed query optimization with self-join and inner grouping  (Zack Weinberg <zackw@panix.com>)
Responses How can you find out what point logical replication is at?
List pgsql-general
Zack Weinberg <zackw@panix.com> writes:
> For each (experiment_id, url_id) pair for some small subset of the
> experiment_ids, I need to query the full_url_id corresponding to the
> *largest* value of redirect_num.  The query planner does something
> reasonable with this SELECT:

> => explain (analyze, verbose)
>    select b.experiment_id, b.url_id, b.full_url_id
>      from blockpage b,
>           (select experiment_id, url_id, max(redirect_num) as redirect_num
>              from blockpage group by experiment_id, url_id) bm
>     where b.experiment_id = bm.experiment_id
>       and b.url_id = bm.url_id
>       and b.redirect_num = bm.redirect_num
>       and bm.experiment_id in (16, 43);

With that query, the "bm.experiment_id in (16, 43)" restriction is
pushed into the "bm" sub-select, greatly reducing the amount of work
the GROUP BY step has to do.

> But if I change the final part of the WHERE to reference
> b.experiment_id instead of bm.experiment_id, I get this much more
> expensive query plan:

> => explain (analyze, verbose)
>    select b.experiment_id, b.url_id, b.full_url_id
>      from blockpage b,
>           (select experiment_id, url_id, max(redirect_num) as redirect_num
>              from blockpage group by experiment_id, url_id) bm
>     where b.experiment_id = bm.experiment_id
>       and b.url_id = bm.url_id
>       and b.redirect_num = bm.redirect_num
>       and b.experiment_id in (16, 43);

With that query, the GROUP BY is evaluated in full, and it costs you.

In principle, given the nearby "where b.experiment_id = bm.experiment_id"
clause, we could derive "bm.experiment_id in (16, 43)" from the stated
clause.  But we don't.  The existing machinery around derivation of
implied equalities only works for simple equalities, not OR clauses.
Extending that would be a bit of a research project, and it's far from
clear that the benefits would be worth the additional planning costs.

> What is the best way to report this to the developers?  Should I file
> a bug report?  I'm using Postgres 12.2.

This is not a bug, and you should not hold your breath waiting
for it to change.

            regards, tom lane



pgsql-general by date:

Previous
From: Curt Kolovson
Date:
Subject: how reliable is pg_rewind?
Next
From: David Rowley
Date:
Subject: Re: bad JIT decision