Thread: Tricking the optimizer

Tricking the optimizer

From
Vitaliy Garnashevich
Date:
Hi,

In order to implement some security features in our application, we 
sometimes append additional WHERE filters to our queries, so the filer 
in the end looks like:

SELECT ... FROM ... WHERE securityFilter1 AND securityFilter2 AND 
securityFilter3

In the EXPLAIN the filters look something like:

(((SubPlan 1) = 'foo') AND ((SubPlan 2) = 'bar') AND ((SubPlan 3) = 'baz'))

There are no applicable indexes and each filter clause looks opaque to 
the optimizer, so the optimizer is using some generic rules and assumes 
that each AND clause would reduce the total number of returned rows by 
some factor. The problem is that this is not usually the case, and 
usually the clauses would not filter out any rows at all. This leads to 
poor plans, when such query is a part of another bigger query.

Is there any trick to craft the query in such a way, so that to make the 
optimizer believe that the filters would not remove any rows, and all 
rows will likely be returned by the query?

Regards,
Vitaliy



Re: Tricking the optimizer

From
Tom Lane
Date:
Vitaliy Garnashevich <vgarnashevich@gmail.com> writes:
> Is there any trick to craft the query in such a way, so that to make the 
> optimizer believe that the filters would not remove any rows, and all 
> rows will likely be returned by the query?

If you don't mind writing some C code, you could create a dummy operator
that just returns its boolean argument, and attach a selectivity estimator
to it that returns 1.0.  Then you'd write the query as, perhaps,

WHERE ~~~(securityFilter1) AND ...

where ~~~ could be read as "likely()".  I wouldn't recommend using such a
wrapper for any WHERE clause that the optimizer had any intelligence about
at all, because it would defeat all of that.  But in this case you're
not getting any wins anyhow, so burying an opaque subselect in another
layer of opacity won't hurt.

            regards, tom lane


Re: Tricking the optimizer

From
Vitaliy Garnashevich
Date:
We'd rather avoid writing C code. We're trying to stick with the DB 
functionality provided by default.

Could something like the following, in theory, make things at least 
somewhat better in our case (without using the selectivity override)?

... WHERE coalesce(securityFilter1 AND securityFilter2 AND 
securityFilter3)::bool;

Regards,
Vitaliy



Re: Tricking the optimizer

From
Vitaliy Garnashevich
Date:
We've tried to use "... WHERE coalesce(...)::bool;" on a test query:

Before:
->  Seq Scan on public.sys_user sys_user_1  (cost=0.00..27140.46 rows=1 
width=16) (actual time=0.105..38.627 rows=3289 loops=1)
       Filter: (((SubPlan 7) = 0) AND ((SubPlan 8) = 0) AND ((SubPlan 9) 
= 0))

After:
->  Seq Scan on public.sys_user sys_user_1  (cost=0.00..27140.46 
rows=1790 width=16) (actual time=0.136..28.413 rows=3289 loops=1)
       Filter: COALESCE((((SubPlan 7) = 0) AND ((SubPlan 8) = 0) AND 
((SubPlan 9) = 0)))

The estimate got closer to the actual number of rows.

Regards,
Vitaliy