Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause - Mailing list pgsql-performance

From Sven R. Kunze
Subject Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause
Date
Msg-id a0cde040-44f8-ac0f-c87a-932d2901702e@mail.de
Whole thread Raw
In response to Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance

Hi Jeff,

On 29.09.2016 20:03, Jeff Janes wrote:
I don't know what the subquery plan is, I don't see references to that in the email chain.

Lutz posted the following solution:

SELECT * FROM big_table
WHERE
     id in (SELECT big_table_id FROM table_a WHERE "table_a"."item_id" IN (<handful of items>))
    OR
     id in (SELECT big_table_id FROM table_a WHERE "table_b"."item_id" IN (<handful of items>))

I don't believe that current versions of PostgreSQL are capable of rewriting the plan in the style of a union.  It is not just a matter of tweaking the cost estimates, it simply never considers such a plan in the first place given the text of your query.

That's okay and that's why I am asking here. :)

Perhaps some future version of PostgreSQL could do so, but my gut feeling is that that is not very likely.  It would take a lot of work, would risk breaking or slowing down other things, and is probably too much of a niche issue to attract a lot of interest.

I don't hope so; in business and reports/stats applications there is a lot of room for this.

Why do you think that OR-ing several tables is a niche issue? I can at least name 3 different projects (from 3 different domains) where combining 3 or more tables with OR is relevant and should be reasonably fast.

Most domains that could benefit would probably have star-like schemas. So, big_table corresponds to the center of the star, whereas the rays correspond to various (even dynamic) extensions to the base data structure.

Why not just use the union?

Sure that would work in this particular case. However, this thread actually sought a general answer to "how to OR more than two tables".

Are you using a framework which generates the query automatically and you have no control over it?

We use a framework and we can use the UNION if we want to.

Or do you just think it is ugly or fragile for some other reason?

I don't think it's ugly or fragile. I am just used to the fact that **if it's equivalent** then PostgreSQL can figure it out (without constant supervision from application developers).

So, it's just a matter of inconvenience. ;)

Perhaps moving the union from the outside to the inside would be more suitable?  That way teh select list is only specified once, and if you AND more clauses into the WHERE condition they also only need to be specified once.

SELECT * FROM big_table
WHERE
     id in (SELECT big_table_id FROM table_a WHERE "table_a"."item_id" IN (<handful of items>) union 
             SELECT big_table_id FROM table_a WHERE "table_b"."item_id" IN (<handful of items>)
      );

Yet another solution I guess, so thanks a lot. :)

This multitude of solution also shows that applications developers might be overwhelmed by choosing the most appropriate AND most long-lasting one. Because what I take from the discussion is that a UNION might be appropriate right now but that could change in the future even for the very same use-case at hand.

Cheers,
Sven

pgsql-performance by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause
Next
From: "Sven R. Kunze"
Date:
Subject: Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause