Home > mailing lists

Thread: [PERFORM] Query plan for views and WHERE clauses, Luke is not using the index

[PERFORM] Query plan for views and WHERE clauses, Luke is not using the index

From

kimaidou

Date:

18 August 2017, 19:46:47

Hi all,

I have come across a unexpected behavior.

You can see full detail on an issue on the QGEP project in Github :

https://github.com/QGEP/QGEP/issues/308#issuecomment-323122514

Basically, we have this view with some LEFT JOIN :
http://paste.debian.net/982003/

We have indexes on some fields ( foreign keys, and a GIST index for the PostGIS geometry field)

If I use the raw SQL defining the view, and add a WHERE clause like:

WHERE "progression_geometry" && st_makeenvelope(1728327.03249295568093657,8240789.26074041239917278,1728608.10987572139129043,8240958.16933418624103069,3949)

the query plan is "as expected", as it is using the spatial index (and others too). This query gets 100 lines from a "main" table containing 20000 lines (and child tables having more). It is pretty fast and "low cost"

See the query plan:

https://explain.depesz.com/s/6Qgb

When we call the WHERE on the view:

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT *
FROM "qgep"."vw_qgep_reach"
WHERE "progression_geometry" && st_makeenvelope(1728327.03249295568093657,8240789.26074041239917278,1728608.10987572139129043,8240958.16933418624103069,3949)

The query plan is "wrong", as PostgreSQL seems to consider it should do a seq scan on the tables, and only afterwards filter with the WHERE:

https://explain.depesz.com/s/wXV

The query takes about 1 second instead of less than 100ms.

Do you have any hint on this kind of issue ?

Thanks in advance

Regards,

Michaël

Re: [PERFORM] Query plan for views and WHERE clauses, Luke is not using the index

From

kimaidou

Date:

21 August 2017, 23:34:27

Hi all

I also tried to change the values of join_collapse_limit and rom_collapse_limit to higher values than default: 12, 50 or even 100, with no improvement on the query plan.

Is this a typical behavior, or is there something particular in my query that causes this big difference between the raw query and the view with WHERE ?

Regards

Michaël

2017-08-18 18:46 GMT+02:00 kimaidou <kimaidou@gmail.com>:

Hi all,

I have come across a unexpected behavior.
You can see full detail on an issue on the QGEP project in Github :
https://github.com/QGEP/QGEP/issues/308#issuecomment-323122514

Basically, we have this view with some LEFT JOIN :
http://paste.debian.net/982003/

We have indexes on some fields ( foreign keys, and a GIST index for the PostGIS geometry field)
If I use the raw SQL defining the view, and add a WHERE clause like:

WHERE "progression_geometry" && st_makeenvelope(1728327.03249295568093657,8240789.26074041239917278,1728608.10987572139129043,8240958.16933418624103069,3949)

the query plan is "as expected", as it is using the spatial index (and others too). This query gets 100 lines from a "main" table containing 20000 lines (and child tables having more). It is pretty fast and "low cost"
See the query plan:
https://explain.depesz.com/s/6Qgb

When we call the WHERE on the view:

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT *
FROM "qgep"."vw_qgep_reach"
WHERE "progression_geometry" && st_makeenvelope(1728327.03249295568093657,8240789.26074041239917278,1728608.10987572139129043,8240958.16933418624103069,3949)
The query plan is "wrong", as PostgreSQL seems to consider it should do a seq scan on the tables, and only afterwards filter with the WHERE:
https://explain.depesz.com/s/wXV
The query takes about 1 second instead of less than 100ms.
Do you have any hint on this kind of issue ?
Thanks in advance
Regards,
Michaël

Re: [PERFORM] Query plan for views and WHERE clauses, Luke is notusing the index

From

David Rowley

Date:

22 August 2017, 00:52:17

On 19 August 2017 at 04:46, kimaidou <kimaidou@gmail.com> wrote:
> When we call the WHERE on the view:
>
> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
> SELECT *
> FROM "qgep"."vw_qgep_reach"
> WHERE "progression_geometry" &&
>
st_makeenvelope(1728327.03249295568093657,8240789.26074041239917278,1728608.10987572139129043,8240958.16933418624103069,3949)
>
>
> The query plan is "wrong", as PostgreSQL seems to consider it should do a
> seq scan on the tables, and only afterwards filter with the WHERE:
> https://explain.depesz.com/s/wXV
>
> The query takes about 1 second instead of less than 100ms.
>
> Do you have any hint on this kind of issue ?

This is by design due to the DISTINCT ON() clause. Only quals which
filter columns which are in the DISTINCT ON can be safely pushed down.

Consider the following, where I've manually pushed the WHERE clause.

postgres=# create table tt (a int, b int);
CREATE TABLE
postgres=# create index on tt (a);
CREATE INDEX
postgres=# insert into tt values(1,1),(1,2),(2,1),(2,2);
INSERT 0 4
postgres=# select * from (select distinct on (a) a,b from tt order by
a,b) tt where b = 2;
 a | b
---+---
(0 rows)

postgres=# select * from (select distinct on (a) a,b from tt where b =
2 order by a,b) tt;
 a | b
---+---
 1 | 2
 2 | 2
(2 rows)

Note the results are not the same.

If I'd done WHERE a = 2, then the planner would have pushed the qual
down into the subquery.

More reading in check_output_expressions() in allpaths.c:

/* If subquery uses DISTINCT ON, check point 3 */
if (subquery->hasDistinctOn &&
!targetIsInSortList(tle, InvalidOid, subquery->distinctClause))
{
/* non-DISTINCT column, so mark it unsafe */
safetyInfo->unsafeColumns[tle->resno] = true;
continue;
}

The comment for point 3 reads:

 * 3. If the subquery uses DISTINCT ON, we must not push down any quals that
 * refer to non-DISTINCT output columns, because that could change the set
 * of rows returned.  (This condition is vacuous for DISTINCT, because then
 * there are no non-DISTINCT output columns, so we needn't check.  Note that
 * subquery_is_pushdown_safe already reported that we can't use volatile
 * quals if there's DISTINCT or DISTINCT ON.)

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PERFORM] Query plan for views and WHERE clauses, Luke is notusing the index

From

kimaidou

Date:

22 August 2017, 10:24:12

Thanks a lot for your detailed explanation. I will try ASAP with no DISTINCT ( we are quite sure it is not needed anyway ), and report back here.

Michaël

2017-08-21 23:52 GMT+02:00 David Rowley <david.rowley@2ndquadrant.com>:

On 19 August 2017 at 04:46, kimaidou <kimaidou@gmail.com> wrote:
> When we call the WHERE on the view:
>
> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
> SELECT *
> FROM "qgep"."vw_qgep_reach"
> WHERE "progression_geometry" &&
> st_makeenvelope(1728327.03249295568093657,8240789.26074041239917278,1728608.10987572139129043,8240958.16933418624103069,3949)
>
>
> The query plan is "wrong", as PostgreSQL seems to consider it should do a
> seq scan on the tables, and only afterwards filter with the WHERE:
> https://explain.depesz.com/s/wXV
>
> The query takes about 1 second instead of less than 100ms.
>
> Do you have any hint on this kind of issue ?

This is by design due to the DISTINCT ON() clause. Only quals which
filter columns which are in the DISTINCT ON can be safely pushed down.

Consider the following, where I've manually pushed the WHERE clause.

postgres=# create table tt (a int, b int);
CREATE TABLE
postgres=# create index on tt (a);
CREATE INDEX
postgres=# insert into tt values(1,1),(1,2),(2,1),(2,2);
INSERT 0 4
postgres=# select * from (select distinct on (a) a,b from tt order by
a,b) tt where b = 2;
a | b
---+---
(0 rows)

postgres=# select * from (select distinct on (a) a,b from tt where b =
2 order by a,b) tt;
a | b
---+---
1 | 2
2 | 2
(2 rows)

Note the results are not the same.

If I'd done WHERE a = 2, then the planner would have pushed the qual
down into the subquery.

More reading in check_output_expressions() in allpaths.c:

/* If subquery uses DISTINCT ON, check point 3 */
if (subquery->hasDistinctOn &&
!targetIsInSortList(tle, InvalidOid, subquery->distinctClause))
{
/* non-DISTINCT column, so mark it unsafe */
safetyInfo->unsafeColumns[tle->resno] = true;
continue;
}

The comment for point 3 reads:

* 3. If the subquery uses DISTINCT ON, we must not push down any quals that
* refer to non-DISTINCT output columns, because that could change the set
* of rows returned. (This condition is vacuous for DISTINCT, because then
* there are no non-DISTINCT output columns, so we needn't check. Note that
* subquery_is_pushdown_safe already reported that we can't use volatile
* quals if there's DISTINCT or DISTINCT ON.)

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services