Thread: Postgres not using GiST index in a lateral join

Postgres not using GiST index in a lateral join

From
Igor Stassiy
Date:
Hello,

I have a query plan optimization question. It is formatted nicely on 


But here is a copy for the archive:

Here is the setup:

CREATE EXTENSION postgis;
DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B; 
CREATE TABLE A(shape Geometry, id INT); 
CREATE TABLE B(shape Geometry, id INT, kind INT); 
CREATE INDEX ON A USING GIST (shape); 
CREATE INDEX ON B USING GIST (shape);

I am running the following commands:

ANALYZE A; 
ANALYZE B;

-- for each row in A, select exactly one row in B (if there is one)
-- such that B contains geometry of A
EXPLAIN SELECT * FROM (SELECT A.id as aid, (SELECT B.id FROM B WHERE ST_Contains(B.shape, A.shape) AND B.kind != 1 LIMIT 1) as bid FROM A) AS TMP;

which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan", "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape && A.shape) AND _st_contains(shape, A.shape))" } ] } ] } }


Note that there is a sequential scan inside the lateral join, however there is clearly an index available. However after setting

set enable_seqscan=false;

the index is being used. This actually affects runtime significantly (around 3 times faster) and seems that postgres should figure things like that automatically. { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", "Startup Cost": 10000000000.00, "Total Cost": 10004716493.85, "Plan Rows": 549450, "Plan Width": 1677, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Index Scan", "Parent Relationship": "Outer", "Scan Direction": "NoMovement", "Index Name": "B_shape_idx", "Relation Name": "B", "Startup Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Index Cond": "(shape && A.shape)", "Filter": "_st_contains(shape, A.shape)" } ] } ] } }

Is there any way to tell postgres to use index in a less hacky way? Possibly by rewriting the query? From what I understand the use of set enable_... is not recommended in production.

When you actually run the commands above it will give

{ "Plan": { "Node Type": "Seq Scan", "Relation Name": "a", "Alias": "a", "Startup Cost": 0.00, "Total Cost": 10372.75, "Plan Rows": 1230, "Plan Width": 36, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.14, "Total Cost": 8.41, "Plan Rows": 1, "Plan Width": 4, "Plans": [ { "Node Type": "Index Scan", "Parent Relationship": "Outer", "Scan Direction": "NoMovement", "Index Name": "b_shape_idx", "Relation Name": "b", "Alias": "b", "Startup Cost": 0.14, "Total Cost": 8.41, "Plan Rows": 1, "Plan Width": 4, "Index Cond": "(shape && a.shape)", "Filter": "((kind <> 1) AND _st_contains(shape, a.shape))" } ] } ] } }

Unfortunately I cannot provide data to reproduce the query plan results.

Thanks,
Igor

Re: Postgres not using GiST index in a lateral join

From
Paul Ramsey
Date:
Stop writing so many subqueries, think in joins; the poor planner!

SELECT DISTINCT ON (a.id) a.id AS a_id, b.id AS b_id
FROM a
JOIN b
ON ST_Contains(b.shape, a.shape)
WHERE b.kind != 1

Also, the DISTINCT ON syntax (PgSQL custom) lets you winnow a result
set down to just one of the inputs.

P.


On Wed, Mar 4, 2015 at 6:36 AM, Igor Stassiy <istassiy@gmail.com> wrote:
> Hello,
>
> I have a query plan optimization question. It is formatted nicely on
>
> http://stackoverflow.com/questions/28856452/postgres-not-using-gist-index-in-lateral-join
>
> But here is a copy for the archive:
>
> Here is the setup:
>
> CREATE EXTENSION postgis;
> DROP TABLE IF EXISTS A;
> DROP TABLE IF EXISTS B;
> CREATE TABLE A(shape Geometry, id INT);
> CREATE TABLE B(shape Geometry, id INT, kind INT);
> CREATE INDEX ON A USING GIST (shape);
> CREATE INDEX ON B USING GIST (shape);
>
> I am running the following commands:
>
> ANALYZE A;
> ANALYZE B;
>
> -- for each row in A, select exactly one row in B (if there is one)
> -- such that B contains geometry of A
> EXPLAIN SELECT * FROM (SELECT A.id as aid, (SELECT B.id FROM B WHERE
> ST_Contains(B.shape, A.shape) AND B.kind != 1 LIMIT 1) as bid FROM A) AS
> TMP;
>
> which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A",
> "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan
> Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship":
> "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost":
> 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan",
> "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00,
> "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape &&
> A.shape) AND _st_contains(shape, A.shape))" } ] } ] } }
>
>
> Note that there is a sequential scan inside the lateral join, however there
> is clearly an index available. However after setting
>
> set enable_seqscan=false;
>
> the index is being used. This actually affects runtime significantly (around
> 3 times faster) and seems that postgres should figure things like that
> automatically. { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A",
> "Startup Cost": 10000000000.00, "Total Cost": 10004716493.85, "Plan Rows":
> 549450, "Plan Width": 1677, "Plans": [ { "Node Type": "Limit", "Parent
> Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00,
> "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node
> Type": "Index Scan", "Parent Relationship": "Outer", "Scan Direction":
> "NoMovement", "Index Name": "B_shape_idx", "Relation Name": "B", "Startup
> Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Index
> Cond": "(shape && A.shape)", "Filter": "_st_contains(shape, A.shape)" } ] }
> ] } }
>
> Is there any way to tell postgres to use index in a less hacky way? Possibly
> by rewriting the query? From what I understand the use of set enable_... is
> not recommended in production.
>
> When you actually run the commands above it will give
>
> { "Plan": { "Node Type": "Seq Scan", "Relation Name": "a", "Alias": "a",
> "Startup Cost": 0.00, "Total Cost": 10372.75, "Plan Rows": 1230, "Plan
> Width": 36, "Plans": [ { "Node Type": "Limit", "Parent Relationship":
> "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.14, "Total Cost":
> 8.41, "Plan Rows": 1, "Plan Width": 4, "Plans": [ { "Node Type": "Index
> Scan", "Parent Relationship": "Outer", "Scan Direction": "NoMovement",
> "Index Name": "b_shape_idx", "Relation Name": "b", "Alias": "b", "Startup
> Cost": 0.14, "Total Cost": 8.41, "Plan Rows": 1, "Plan Width": 4, "Index
> Cond": "(shape && a.shape)", "Filter": "((kind <> 1) AND _st_contains(shape,
> a.shape))" } ] } ] } }
>
> Unfortunately I cannot provide data to reproduce the query plan results.
>
> Thanks,
> Igor


Re: Postgres not using GiST index in a lateral join

From
Igor Stassiy
Date:
I would like to stop executing the query for a row of table "a" when a single row of "b" is found. This query would not stop
processing but will filter all the rows that are found at the end of execution.

Is there a way to express this without a subquery?

On Wed, Mar 4, 2015 at 11:53 PM Paul Ramsey <pramsey@cleverelephant.ca> wrote:
Stop writing so many subqueries, think in joins; the poor planner!

SELECT DISTINCT ON (a.id) a.id AS a_id, b.id AS b_id
FROM a
JOIN b
ON ST_Contains(b.shape, a.shape)
WHERE b.kind != 1

Also, the DISTINCT ON syntax (PgSQL custom) lets you winnow a result
set down to just one of the inputs.

P.


On Wed, Mar 4, 2015 at 6:36 AM, Igor Stassiy <istassiy@gmail.com> wrote:
> Hello,
>
> I have a query plan optimization question. It is formatted nicely on
>
> http://stackoverflow.com/questions/28856452/postgres-not-using-gist-index-in-lateral-join
>
> But here is a copy for the archive:
>
> Here is the setup:
>
> CREATE EXTENSION postgis;
> DROP TABLE IF EXISTS A;
> DROP TABLE IF EXISTS B;
> CREATE TABLE A(shape Geometry, id INT);
> CREATE TABLE B(shape Geometry, id INT, kind INT);
> CREATE INDEX ON A USING GIST (shape);
> CREATE INDEX ON B USING GIST (shape);
>
> I am running the following commands:
>
> ANALYZE A;
> ANALYZE B;
>
> -- for each row in A, select exactly one row in B (if there is one)
> -- such that B contains geometry of A
> EXPLAIN SELECT * FROM (SELECT A.id as aid, (SELECT B.id FROM B WHERE
> ST_Contains(B.shape, A.shape) AND B.kind != 1 LIMIT 1) as bid FROM A) AS
> TMP;
>
> which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A",
> "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan
> Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship":
> "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost":
> 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan",
> "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00,
> "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape &&
> A.shape) AND _st_contains(shape, A.shape))" } ] } ] } }
>
>
> Note that there is a sequential scan inside the lateral join, however there
> is clearly an index available. However after setting
>
> set enable_seqscan=false;
>
> the index is being used. This actually affects runtime significantly (around
> 3 times faster) and seems that postgres should figure things like that
> automatically. { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A",
> "Startup Cost": 10000000000.00, "Total Cost": 10004716493.85, "Plan Rows":
> 549450, "Plan Width": 1677, "Plans": [ { "Node Type": "Limit", "Parent
> Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00,
> "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node
> Type": "Index Scan", "Parent Relationship": "Outer", "Scan Direction":
> "NoMovement", "Index Name": "B_shape_idx", "Relation Name": "B", "Startup
> Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Index
> Cond": "(shape && A.shape)", "Filter": "_st_contains(shape, A.shape)" } ] }
> ] } }
>
> Is there any way to tell postgres to use index in a less hacky way? Possibly
> by rewriting the query? From what I understand the use of set enable_... is
> not recommended in production.
>
> When you actually run the commands above it will give
>
> { "Plan": { "Node Type": "Seq Scan", "Relation Name": "a", "Alias": "a",
> "Startup Cost": 0.00, "Total Cost": 10372.75, "Plan Rows": 1230, "Plan
> Width": 36, "Plans": [ { "Node Type": "Limit", "Parent Relationship":
> "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.14, "Total Cost":
> 8.41, "Plan Rows": 1, "Plan Width": 4, "Plans": [ { "Node Type": "Index
> Scan", "Parent Relationship": "Outer", "Scan Direction": "NoMovement",
> "Index Name": "b_shape_idx", "Relation Name": "b", "Alias": "b", "Startup
> Cost": 0.14, "Total Cost": 8.41, "Plan Rows": 1, "Plan Width": 4, "Index
> Cond": "(shape && a.shape)", "Filter": "((kind <> 1) AND _st_contains(shape,
> a.shape))" } ] } ] } }
>
> Unfortunately I cannot provide data to reproduce the query plan results.
>
> Thanks,
> Igor

Re: Postgres not using GiST index in a lateral join

From
Alban Hertroys
Date:
> On 04 Mar 2015, at 22:18, Igor Stassiy <istassiy@gmail.com> wrote:
>
> I would like to stop executing the query for a row of table "a" when a single row of "b" is found. This query would
notstop 
> processing but will filter all the rows that are found at the end of execution.
>
> Is there a way to express this without a subquery?

Does it? Because that would be somewhat surprising.

> On Wed, Mar 4, 2015 at 11:53 PM Paul Ramsey <pramsey@cleverelephant.ca> wrote:
> Stop writing so many subqueries, think in joins; the poor planner!
>
> SELECT DISTINCT ON (a.id) a.id AS a_id, b.id AS b_id
> FROM a
> JOIN b
> ON ST_Contains(b.shape, a.shape)
> WHERE b.kind != 1
>
> Also, the DISTINCT ON syntax (PgSQL custom) lets you winnow a result
> set down to just one of the inputs.


> > -- for each row in A, select exactly one row in B (if there is one)
> > -- such that B contains geometry of A
> > EXPLAIN SELECT * FROM (SELECT A.id as aid, (SELECT B.id FROM B WHERE
> > ST_Contains(B.shape, A.shape) AND B.kind != 1 LIMIT 1) as bid FROM A) AS
> > TMP;
> >
> > which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A",
> > "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan
> > Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship":
> > "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost":
> > 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan",
> > "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00,
> > "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape &&
> > A.shape) AND _st_contains(shape, A.shape))" } ] } ] } }

How did your query plan end up in JSON notation? It's quite difficult to read like this.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: Postgres not using GiST index in a lateral join

From
Igor Stassiy
Date:
I got the query plan using:
EXECUTE EXPLAIN (format json) ... INTO query_plan;
RAISE NOTICE query_plan;

since by default EXPLAIN returns a set of rows and I'd have to create a table for storing that. As I mentioned in the original email,
the queries and output is nicely formatted at:



On Thu, Mar 5, 2015 at 11:54 AM Alban Hertroys <haramrae@gmail.com> wrote:

> On 04 Mar 2015, at 22:18, Igor Stassiy <istassiy@gmail.com> wrote:
>
> I would like to stop executing the query for a row of table "a" when a single row of "b" is found. This query would not stop
> processing but will filter all the rows that are found at the end of execution.
>
> Is there a way to express this without a subquery?

Does it? Because that would be somewhat surprising.

> On Wed, Mar 4, 2015 at 11:53 PM Paul Ramsey <pramsey@cleverelephant.ca> wrote:
> Stop writing so many subqueries, think in joins; the poor planner!
>
> SELECT DISTINCT ON (a.id) a.id AS a_id, b.id AS b_id
> FROM a
> JOIN b
> ON ST_Contains(b.shape, a.shape)
> WHERE b.kind != 1
>
> Also, the DISTINCT ON syntax (PgSQL custom) lets you winnow a result
> set down to just one of the inputs.


> > -- for each row in A, select exactly one row in B (if there is one)
> > -- such that B contains geometry of A
> > EXPLAIN SELECT * FROM (SELECT A.id as aid, (SELECT B.id FROM B WHERE
> > ST_Contains(B.shape, A.shape) AND B.kind != 1 LIMIT 1) as bid FROM A) AS
> > TMP;
> >
> > which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A",
> > "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan
> > Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship":
> > "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost":
> > 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan",
> > "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00,
> > "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape &&
> > A.shape) AND _st_contains(shape, A.shape))" } ] } ] } }

How did your query plan end up in JSON notation? It's quite difficult to read like this.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.