Re: Query planner not using indexes with JOIN query and OR clause - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: Query planner not using indexes with JOIN query and OR clause
Date
Msg-id CAHyXU0zfScZ3feCpxRO_7xGYQKGRdb7Nca6R1P0xAbowJL5=4A@mail.gmail.com
Whole thread Raw
In response to Re: Query planner not using indexes with JOIN query and OR clause  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-performance
On Mon, Jul 13, 2015 at 4:01 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Mon, Jul 13, 2015 at 3:54 PM, Ben Hoyt <benhoyt@gmail.com> wrote:
>> Hi folks,
>>
>> I have a fairly simple three-table query (pasted below) with two LEFT JOINs
>> and an OR in the WHERE clause that for some reason is doing sequential scans
>> on all three tables (two of them large -- several million rows), even though
>> I have indexes on the relevant "filename" columns.
>>
>> Note the two parts of the where clause -- a filter on the image2 table and a
>> filter on the panoramas table. If I comment out either filter and just
>> filter on i.filename by itself, or p.filename by itself, the query planner
>> uses the relevant index and the query takes a few milliseconds. But when I
>> have both clauses (as shown below) it falls back to sequential scanning all
>> three tables for some reason, taking several seconds.
>>
>> What am I missing? There must be some reason PostgreSQL can't use the index
>> in this case, but I can't see what it is. If I were PostgreSQL I'd be using
>> the index on i.filename and p.filename to filter to a couple of rows first,
>> then join, making it super-quick.
>>
>> In this test I'm running PostgreSQL 9.3.3 on Windows 64-bit, but the same
>
> FYI, this won'f fix your issue, but upgrade your postgres to the
> latest bugfix release, 9.3.9.
>
>> My query and PostgreSQL version and the explain and a lot of other table
>> data is pasted below.
>>
>> QUERY
>> ----------
>> select ai.position, i.filename as image_filename, p.filename as
>> panorama_filename
>> from album_items ai
>> left join image2 i on i.imageid = ai.image_id
>> left join panoramas p on p.id = ai.panorama_id
>> where i.filename in ('pano360--v471.jpg', 'pano360-2--v474.jpg') or
>>       p.filename in ('pano360--v471', 'pano360-2--v474')
>
> Try refactoring to:
>
> select ai.position, i.filename as image_filename, p.filename as
> panorama_filename
> from album_items ai
> left join image2 i on i.imageid = ai.image_id
> left join panoramas p on p.id = ai.panorama_id
> where i.filename in ('pano360--v471.jpg', 'pano360-2--v474.jpg')
> union all select ai.position, i.filename as image_filename, p.filename
> as panorama_filename
> from album_items ai
> left join image2 i on i.imageid = ai.image_id
> left join panoramas p on p.id = ai.panorama_id
> where  p.filename in ('pano360--v471', 'pano360-2--v474')

one thing, if a row matches both conditions you may get duplicate
rows.   If that's an issue, one possible way to fix that would be to
convert 'union all' to 'union'.  I'd have to see what the query output
and how it performed.  Another way to tackle queries like this is with
EXISTS:

select ai.position, i.filename as image_filename, p.filename as
panorama_filename
from album_items ai
where exists (
  select 1 from image2 i
  where i.imageid = ai.image_id
    and i.filename in ('pano360--v471.jpg', 'pano360-2--v474.jpg')
  union all select 1 from panoramas p
  where p.id = ai.panorama_id
    and p.filename in ('pano360--v471', 'pano360-2--v474')
)

why do this? by pushing the UNION ALL into the exists statement, you
guarantee your self at most one row from album_items without having to
resort to performance dangerous tradeoffs like DISTINCT or UNION.
Give it a try...

merlin


pgsql-performance by date:

Previous
From: Scott Whitney
Date:
Subject: Re: [ADMIN] could not create shared memory segment: Invalid argument
Next
From: Ben Hoyt
Date:
Subject: Re: Query planner not using indexes with JOIN query and OR clause