Home > mailing lists

Complex filters -> Bad row estimates -> bad query plan - Mailing list pgsql-general

From	Mathieu Fenniak
Subject	Complex filters -> Bad row estimates -> bad query plan
Date	August 21, 2019 15:53:22
Msg-id	CAHoiPjzcv4T+5a7giM4925Ltj+VegQkmG6+_1PUtGX3-dZ141A@mail.gmail.com Whole thread
Responses	Re: Complex filters -> Bad row estimates -> bad query plan
List	pgsql-general

Tree view

Hi all,

I have a database query where I have a number of "simple" where clauses, a number of "complex" subquery based where clauses, and one NOT EXISTS where clause; it looks something like this:

SELECT ...some fields... FROM Table1

WHERE

Field1 IN (1, 2, 3, 4, 21, 24) AND -- simple filter

Field2 <> 1 AND -- simple filter

Field3 >= '2019-07-08' AND -- simple filter

Field3 <= '2019-08-18' AND -- simple filter

NOT EXISTS (SELECT 1 FROM Table2 WHERE Table2.Status = 2 AND Table2.SomeId = Table1.Id) AND -- anti-join

COALESCE((SELECT Status FROM Table3 WHERE Table3.SomeId = Table1.Id), (SELECT Status FROM Table4 WHERE Table4.SomeId = Table1.Id)) = 2 -- "complex" condition

The problem I'm encountering is that I've observed degraded performance in some cases where the Anti Join merge for the NOT EXISTS clause is planned based upon poor row estimates for Table1. All of the other filters, and the SubPlan filter(s) for the complex clauses, result in the query planner estimating that only 1 row from Table1 will be resulting, so a Nested Loop Anti Join is used and the RHS of that nested loop in a seqscan on Table2. The reality is that many thousands of records match all the conditions; a Merge Anti Join or Hash Anti Join would be a better query plan.

I've tested the query planner with just the simpler conditions, and it makes pretty reasonable estimates about the row count (+/- 10%). Adding the NOT EXISTS results in a Merge Anti Join, and performance is great.

Adding the more "complex" conditions (there are potentially multiple of these subquery plan searches) results in the estimated row count dropping to 1, and, performance dives.

I know there are no "query hints" in PostgreSQL... any thoughts on alternative approaches here? The only option I've used in the past for this is creating specialized indexes, which can provide more targeted statistics; but it's not applicable here since the "complex" conditions use data from another table in a subquery.

Appreciate any thoughts, theories, or directions. :-) Thanks,

Mathieu

pgsql-general by date:

From: Adrian Klaver
Date: 21 August 2019, 15:47:05
Subject: Re: Rename a column if not already renamed.?

From: Rob Sargent
Date: 21 August 2019, 17:54:14
Subject: Re: SELECT all the rows where id is children of other node.

Complex filters -> Bad row estimates -> bad query plan - Mailing list pgsql-general

Previous

Next