Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause - Mailing list pgsql-performance

From Sven R. Kunze
Subject Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause
Date
Msg-id 0e86e399-61dc-8e78-415e-c83db9592a80@mail.de
Whole thread Raw
In response to Multiple-Table-Spanning Joins with ORs in WHERE Clause  ("Sven R. Kunze" <srkunze@mail.de>)
Responses Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause
List pgsql-performance
Thanks a lot Madusudanan, Igor, Lutz and Jeff for your suggestions.

What I can confirm is that the UNION ideas runs extremely fast (don't
have access to the db right now to test the subquery idea, but will
check next week as I travel right now). Thanks again! :)


I was wondering: would it be possible for PostgreSQL to rewrite the
query to generate the UNION (or subquery plan if it's also fast) on it's
own?


Thanks,
Sven

On 22.09.2016 16:44, lfischer wrote:
> Hi Sven
>
> Why not do something like
>
> SELECT * FROM big_table
> WHERE
>      id in (SELECT big_table_id FROM table_a WHERE "table_a"."item_id"
> IN (<handful of items>))
>     OR
>      id in (SELECT big_table_id FROM table_a WHERE "table_b"."item_id"
> IN (<handful of items>))
>
> that way you don't need the "distinct" and therefore there should be
> less comparison going on.
>
> Lutz
>
> On 22/09/16 14:24, Sven R. Kunze wrote:
>> Hi pgsql-performance list,
>>
>>
>> what is the recommended way of doing **multiple-table-spanning joins
>> with ORs in the WHERE-clause**?
>>
>>
>> Until now, we've used the LEFT OUTER JOIN to filter big_table like so:
>>
>>
>> SELECT DISTINCT <fields of big_table>
>> FROM
>>     "big_table"
>>     LEFT OUTER JOIN "table_a" ON ("big_table"."id" =
>> "table_a"."big_table_id")
>>     LEFT OUTER JOIN "table_b" ON ("big_table"."id" =
>> "table_b"."big_table_id")
>> WHERE
>>     "table_a"."item_id" IN (<handful of items>)
>>     OR
>>     "table_b"."item_id" IN (<handful of items>);
>>
>>
>> However, this results in an awful slow plan (requiring to scan the
>> complete big_table which obviously isn't optimal).
>> So, we decided (at least for now) to split up the query into two
>> separate ones and merge/de-duplicate the result with application logic:
>>
>>
>> SELECT <fields of big_table>
>> FROM
>>     "big_table" INNER JOIN "table_a" ON ("big_table"."id" =
>> "table_a"."big_table_id")
>> WHERE
>>     "table_a"."item_id" IN (<handful of items>);
>>
>>
>> SELECT <fields of big_table>
>> FROM
>>     "big_table" INNER JOIN "table_b" ON ("big_table"."id" =
>> "table_b"."big_table_id")
>> WHERE
>>     "table_b"."item_id" IN (<handful of items>);
>>
>>
>> As you can imagine we would be very glad to solve this issue with a
>> single query and without having to re-code existing logic of
>> PostgreSQL. But how?
>>
>>
>> Best,
>> Sven
>>
>>
>> PS: if you require EXPLAIN ANALYZE, I can post them as well.
>>
>>
>
>



pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: query against single partition uses index, against master table does seq scan
Next
From: Dev Nop
Date:
Subject: Storing large documents - one table or partition by doc?