Thread: Postgres not using GiST index in a lateral join
Hello,
Here is the setup:
I have a query plan optimization question. It is formatted nicely on
But here is a copy for the archive:
Here is the setup:
CREATE EXTENSION postgis;
DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B;
CREATE TABLE A(shape Geometry, id INT);
CREATE TABLE B(shape Geometry, id INT, kind INT);
CREATE INDEX ON A USING GIST (shape);
CREATE INDEX ON B USING GIST (shape);
I am running the following commands:
I am running the following commands:
ANALYZE A;
ANALYZE B;
-- for each row in A, select exactly one row in B (if there is one)
-- such that B contains geometry of A
EXPLAIN SELECT * FROM (SELECT A.id as aid, (SELECT B.id FROM B WHERE ST_Contains(B.shape, A.shape) AND B.kind != 1 LIMIT 1) as bid FROM A) AS TMP;
which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan", "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape && A.shape) AND _st_contains(shape, A.shape))" } ] } ] } }
Note that there is a sequential scan inside the lateral join, however there is clearly an index available. However after setting
set enable_seqscan=false;
the index is being used. This actually affects runtime significantly (around 3 times faster) and seems that postgres should figure things like that automatically. { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", "Startup Cost": 10000000000.00, "Total Cost": 10004716493.85, "Plan Rows": 549450, "Plan Width": 1677, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Index Scan", "Parent Relationship": "Outer", "Scan Direction": "NoMovement", "Index Name": "B_shape_idx", "Relation Name": "B", "Startup Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Index Cond": "(shape && A.shape)", "Filter": "_st_contains(shape, A.shape)" } ] } ] } }
Is there any way to tell postgres to use index in a less hacky way? Possibly by rewriting the query? From what I understand the use of set enable_... is not recommended in production.
When you actually run the commands above it will give
which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan", "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape && A.shape) AND _st_contains(shape, A.shape))" } ] } ] } }
Note that there is a sequential scan inside the lateral join, however there is clearly an index available. However after setting
set enable_seqscan=false;
the index is being used. This actually affects runtime significantly (around 3 times faster) and seems that postgres should figure things like that automatically. { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", "Startup Cost": 10000000000.00, "Total Cost": 10004716493.85, "Plan Rows": 549450, "Plan Width": 1677, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Index Scan", "Parent Relationship": "Outer", "Scan Direction": "NoMovement", "Index Name": "B_shape_idx", "Relation Name": "B", "Startup Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Index Cond": "(shape && A.shape)", "Filter": "_st_contains(shape, A.shape)" } ] } ] } }
Is there any way to tell postgres to use index in a less hacky way? Possibly by rewriting the query? From what I understand the use of set enable_... is not recommended in production.
When you actually run the commands above it will give
{ "Plan": { "Node Type": "Seq Scan", "Relation Name": "a", "Alias": "a", "Startup Cost": 0.00, "Total Cost": 10372.75, "Plan Rows": 1230, "Plan Width": 36, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.14, "Total Cost": 8.41, "Plan Rows": 1, "Plan Width": 4, "Plans": [ { "Node Type": "Index Scan", "Parent Relationship": "Outer", "Scan Direction": "NoMovement", "Index Name": "b_shape_idx", "Relation Name": "b", "Alias": "b", "Startup Cost": 0.14, "Total Cost": 8.41, "Plan Rows": 1, "Plan Width": 4, "Index Cond": "(shape && a.shape)", "Filter": "((kind <> 1) AND _st_contains(shape, a.shape))" } ] } ] } }
Unfortunately I cannot provide data to reproduce the query plan results.
Unfortunately I cannot provide data to reproduce the query plan results.
Thanks,
Igor
Stop writing so many subqueries, think in joins; the poor planner! SELECT DISTINCT ON (a.id) a.id AS a_id, b.id AS b_id FROM a JOIN b ON ST_Contains(b.shape, a.shape) WHERE b.kind != 1 Also, the DISTINCT ON syntax (PgSQL custom) lets you winnow a result set down to just one of the inputs. P. On Wed, Mar 4, 2015 at 6:36 AM, Igor Stassiy <istassiy@gmail.com> wrote: > Hello, > > I have a query plan optimization question. It is formatted nicely on > > http://stackoverflow.com/questions/28856452/postgres-not-using-gist-index-in-lateral-join > > But here is a copy for the archive: > > Here is the setup: > > CREATE EXTENSION postgis; > DROP TABLE IF EXISTS A; > DROP TABLE IF EXISTS B; > CREATE TABLE A(shape Geometry, id INT); > CREATE TABLE B(shape Geometry, id INT, kind INT); > CREATE INDEX ON A USING GIST (shape); > CREATE INDEX ON B USING GIST (shape); > > I am running the following commands: > > ANALYZE A; > ANALYZE B; > > -- for each row in A, select exactly one row in B (if there is one) > -- such that B contains geometry of A > EXPLAIN SELECT * FROM (SELECT A.id as aid, (SELECT B.id FROM B WHERE > ST_Contains(B.shape, A.shape) AND B.kind != 1 LIMIT 1) as bid FROM A) AS > TMP; > > which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", > "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan > Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship": > "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": > 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan", > "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00, > "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape && > A.shape) AND _st_contains(shape, A.shape))" } ] } ] } } > > > Note that there is a sequential scan inside the lateral join, however there > is clearly an index available. However after setting > > set enable_seqscan=false; > > the index is being used. This actually affects runtime significantly (around > 3 times faster) and seems that postgres should figure things like that > automatically. { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", > "Startup Cost": 10000000000.00, "Total Cost": 10004716493.85, "Plan Rows": > 549450, "Plan Width": 1677, "Plans": [ { "Node Type": "Limit", "Parent > Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, > "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node > Type": "Index Scan", "Parent Relationship": "Outer", "Scan Direction": > "NoMovement", "Index Name": "B_shape_idx", "Relation Name": "B", "Startup > Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Index > Cond": "(shape && A.shape)", "Filter": "_st_contains(shape, A.shape)" } ] } > ] } } > > Is there any way to tell postgres to use index in a less hacky way? Possibly > by rewriting the query? From what I understand the use of set enable_... is > not recommended in production. > > When you actually run the commands above it will give > > { "Plan": { "Node Type": "Seq Scan", "Relation Name": "a", "Alias": "a", > "Startup Cost": 0.00, "Total Cost": 10372.75, "Plan Rows": 1230, "Plan > Width": 36, "Plans": [ { "Node Type": "Limit", "Parent Relationship": > "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.14, "Total Cost": > 8.41, "Plan Rows": 1, "Plan Width": 4, "Plans": [ { "Node Type": "Index > Scan", "Parent Relationship": "Outer", "Scan Direction": "NoMovement", > "Index Name": "b_shape_idx", "Relation Name": "b", "Alias": "b", "Startup > Cost": 0.14, "Total Cost": 8.41, "Plan Rows": 1, "Plan Width": 4, "Index > Cond": "(shape && a.shape)", "Filter": "((kind <> 1) AND _st_contains(shape, > a.shape))" } ] } ] } } > > Unfortunately I cannot provide data to reproduce the query plan results. > > Thanks, > Igor
I would like to stop executing the query for a row of table "a" when a single row of "b" is found. This query would not stop
processing but will filter all the rows that are found at the end of execution.
processing but will filter all the rows that are found at the end of execution.
Is there a way to express this without a subquery?
On Wed, Mar 4, 2015 at 11:53 PM Paul Ramsey <pramsey@cleverelephant.ca> wrote:
Stop writing so many subqueries, think in joins; the poor planner!
SELECT DISTINCT ON (a.id) a.id AS a_id, b.id AS b_id
FROM a
JOIN b
ON ST_Contains(b.shape, a.shape)
WHERE b.kind != 1
Also, the DISTINCT ON syntax (PgSQL custom) lets you winnow a result
set down to just one of the inputs.
P.
On Wed, Mar 4, 2015 at 6:36 AM, Igor Stassiy <istassiy@gmail.com> wrote:
> Hello,
>
> I have a query plan optimization question. It is formatted nicely on
>
> http://stackoverflow.com/questions/28856452/postgres-not-using-gist-index-in-lateral-join
>
> But here is a copy for the archive:
>
> Here is the setup:
>
> CREATE EXTENSION postgis;
> DROP TABLE IF EXISTS A;
> DROP TABLE IF EXISTS B;
> CREATE TABLE A(shape Geometry, id INT);
> CREATE TABLE B(shape Geometry, id INT, kind INT);
> CREATE INDEX ON A USING GIST (shape);
> CREATE INDEX ON B USING GIST (shape);
>
> I am running the following commands:
>
> ANALYZE A;
> ANALYZE B;
>
> -- for each row in A, select exactly one row in B (if there is one)
> -- such that B contains geometry of A
> EXPLAIN SELECT * FROM (SELECT A.id as aid, (SELECT B.id FROM B WHERE
> ST_Contains(B.shape, A.shape) AND B.kind != 1 LIMIT 1) as bid FROM A) AS
> TMP;
>
> which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A",
> "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan
> Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship":
> "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost":
> 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan",
> "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00,
> "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape &&
> A.shape) AND _st_contains(shape, A.shape))" } ] } ] } }
>
>
> Note that there is a sequential scan inside the lateral join, however there
> is clearly an index available. However after setting
>
> set enable_seqscan=false;
>
> the index is being used. This actually affects runtime significantly (around
> 3 times faster) and seems that postgres should figure things like that
> automatically. { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A",
> "Startup Cost": 10000000000.00, "Total Cost": 10004716493.85, "Plan Rows":
> 549450, "Plan Width": 1677, "Plans": [ { "Node Type": "Limit", "Parent
> Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00,
> "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node
> Type": "Index Scan", "Parent Relationship": "Outer", "Scan Direction":
> "NoMovement", "Index Name": "B_shape_idx", "Relation Name": "B", "Startup
> Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Index
> Cond": "(shape && A.shape)", "Filter": "_st_contains(shape, A.shape)" } ] }
> ] } }
>
> Is there any way to tell postgres to use index in a less hacky way? Possibly
> by rewriting the query? From what I understand the use of set enable_... is
> not recommended in production.
>
> When you actually run the commands above it will give
>
> { "Plan": { "Node Type": "Seq Scan", "Relation Name": "a", "Alias": "a",
> "Startup Cost": 0.00, "Total Cost": 10372.75, "Plan Rows": 1230, "Plan
> Width": 36, "Plans": [ { "Node Type": "Limit", "Parent Relationship":
> "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.14, "Total Cost":
> 8.41, "Plan Rows": 1, "Plan Width": 4, "Plans": [ { "Node Type": "Index
> Scan", "Parent Relationship": "Outer", "Scan Direction": "NoMovement",
> "Index Name": "b_shape_idx", "Relation Name": "b", "Alias": "b", "Startup
> Cost": 0.14, "Total Cost": 8.41, "Plan Rows": 1, "Plan Width": 4, "Index
> Cond": "(shape && a.shape)", "Filter": "((kind <> 1) AND _st_contains(shape,
> a.shape))" } ] } ] } }
>
> Unfortunately I cannot provide data to reproduce the query plan results.
>
> Thanks,
> Igor
> On 04 Mar 2015, at 22:18, Igor Stassiy <istassiy@gmail.com> wrote: > > I would like to stop executing the query for a row of table "a" when a single row of "b" is found. This query would notstop > processing but will filter all the rows that are found at the end of execution. > > Is there a way to express this without a subquery? Does it? Because that would be somewhat surprising. > On Wed, Mar 4, 2015 at 11:53 PM Paul Ramsey <pramsey@cleverelephant.ca> wrote: > Stop writing so many subqueries, think in joins; the poor planner! > > SELECT DISTINCT ON (a.id) a.id AS a_id, b.id AS b_id > FROM a > JOIN b > ON ST_Contains(b.shape, a.shape) > WHERE b.kind != 1 > > Also, the DISTINCT ON syntax (PgSQL custom) lets you winnow a result > set down to just one of the inputs. > > -- for each row in A, select exactly one row in B (if there is one) > > -- such that B contains geometry of A > > EXPLAIN SELECT * FROM (SELECT A.id as aid, (SELECT B.id FROM B WHERE > > ST_Contains(B.shape, A.shape) AND B.kind != 1 LIMIT 1) as bid FROM A) AS > > TMP; > > > > which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", > > "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan > > Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship": > > "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": > > 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan", > > "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00, > > "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape && > > A.shape) AND _st_contains(shape, A.shape))" } ] } ] } } How did your query plan end up in JSON notation? It's quite difficult to read like this. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
I got the query plan using:
EXECUTE EXPLAIN (format json) ... INTO query_plan;
RAISE NOTICE query_plan;
since by default EXPLAIN returns a set of rows and I'd have to create a table for storing that. As I mentioned in the original email,
the queries and output is nicely formatted at:
On Thu, Mar 5, 2015 at 11:54 AM Alban Hertroys <haramrae@gmail.com> wrote:
> On 04 Mar 2015, at 22:18, Igor Stassiy <istassiy@gmail.com> wrote:
>
> I would like to stop executing the query for a row of table "a" when a single row of "b" is found. This query would not stop
> processing but will filter all the rows that are found at the end of execution.
>
> Is there a way to express this without a subquery?
Does it? Because that would be somewhat surprising.
> On Wed, Mar 4, 2015 at 11:53 PM Paul Ramsey <pramsey@cleverelephant.ca> wrote:
> Stop writing so many subqueries, think in joins; the poor planner!
>
> SELECT DISTINCT ON (a.id) a.id AS a_id, b.id AS b_id
> FROM a
> JOIN b
> ON ST_Contains(b.shape, a.shape)
> WHERE b.kind != 1
>
> Also, the DISTINCT ON syntax (PgSQL custom) lets you winnow a result
> set down to just one of the inputs.
> > -- for each row in A, select exactly one row in B (if there is one)
> > -- such that B contains geometry of A
> > EXPLAIN SELECT * FROM (SELECT A.id as aid, (SELECT B.id FROM B WHERE
> > ST_Contains(B.shape, A.shape) AND B.kind != 1 LIMIT 1) as bid FROM A) AS
> > TMP;
> >
> > which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A",
> > "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan
> > Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship":
> > "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost":
> > 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan",
> > "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00,
> > "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape &&
> > A.shape) AND _st_contains(shape, A.shape))" } ] } ] } }
How did your query plan end up in JSON notation? It's quite difficult to read like this.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.