Thread: Query plan change with multiple elements in IN clause

Query plan change with multiple elements in IN clause

From
Mathieu De Zutter
Date:
Hi all,

I'm migrating a web application to an ORM framework (Doctrine) so I need a new way to get statistics about entities into the application without importing all data, only the results (e.g. load total number of children instead of loading all children into the application and counting it afterwards). My current solution is to create a view for those statistics and map it to an (read-only) entity in my application. This view is joined with the table containing the entities on which I need statistics.

The entity table is called 'work'. The view containing statistical information about works is called wps2.

The problem is that this join is performing very badly when more than one work is involved. It chooses a plan that is orders of magnitude slower.

I have attached
- The (simplified) table definitions
- The (simplified) view 
- Two queries with explain analyze plan: "IN (1)" => fast, "IN (1,3)" => slow
- postgresql.conf

I do not understand why the planner does not consider the nested loop in the second case, like it does in the first case.

Can anyone help me?

Thanks.

Kind regards,
Mathieu
Attachment

Re: Query plan change with multiple elements in IN clause

From
Tom Lane
Date:
Mathieu De Zutter <mathieu@dezutter.org> writes:
> The problem is that this join is performing very badly when more than one
> work is involved. It chooses a plan that is orders of magnitude slower.

> I have attached
> - The (simplified) table definitions
> - The (simplified) view
> - Two queries with explain analyze plan: "IN (1)" => fast, "IN (1,3)" =>
> slow
> - postgresql.conf

The reason you get a nice plan in the first case is that "w.id in (1)"
is treated as "w.id = 1", and then there is logic that combines that with
"w.id = wps.id" to conclude that we can synthesize a condition "wps.id = 1".
None of that happens when there's more than one IN item, because it's not
an equality operator anymore.

You might be able to do something like
    JOIN (VALUES (1),(3)) foo(x) ON w.id = foo.x

            regards, tom lane