Re: Query planner not using indexes with JOIN query and OR clause - Mailing list pgsql-performance
From | Merlin Moncure |
---|---|
Subject | Re: Query planner not using indexes with JOIN query and OR clause |
Date | |
Msg-id | CAHyXU0zfScZ3feCpxRO_7xGYQKGRdb7Nca6R1P0xAbowJL5=4A@mail.gmail.com Whole thread Raw |
In response to | Re: Query planner not using indexes with JOIN query and OR clause (Merlin Moncure <mmoncure@gmail.com>) |
List | pgsql-performance |
On Mon, Jul 13, 2015 at 4:01 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Mon, Jul 13, 2015 at 3:54 PM, Ben Hoyt <benhoyt@gmail.com> wrote: >> Hi folks, >> >> I have a fairly simple three-table query (pasted below) with two LEFT JOINs >> and an OR in the WHERE clause that for some reason is doing sequential scans >> on all three tables (two of them large -- several million rows), even though >> I have indexes on the relevant "filename" columns. >> >> Note the two parts of the where clause -- a filter on the image2 table and a >> filter on the panoramas table. If I comment out either filter and just >> filter on i.filename by itself, or p.filename by itself, the query planner >> uses the relevant index and the query takes a few milliseconds. But when I >> have both clauses (as shown below) it falls back to sequential scanning all >> three tables for some reason, taking several seconds. >> >> What am I missing? There must be some reason PostgreSQL can't use the index >> in this case, but I can't see what it is. If I were PostgreSQL I'd be using >> the index on i.filename and p.filename to filter to a couple of rows first, >> then join, making it super-quick. >> >> In this test I'm running PostgreSQL 9.3.3 on Windows 64-bit, but the same > > FYI, this won'f fix your issue, but upgrade your postgres to the > latest bugfix release, 9.3.9. > >> My query and PostgreSQL version and the explain and a lot of other table >> data is pasted below. >> >> QUERY >> ---------- >> select ai.position, i.filename as image_filename, p.filename as >> panorama_filename >> from album_items ai >> left join image2 i on i.imageid = ai.image_id >> left join panoramas p on p.id = ai.panorama_id >> where i.filename in ('pano360--v471.jpg', 'pano360-2--v474.jpg') or >> p.filename in ('pano360--v471', 'pano360-2--v474') > > Try refactoring to: > > select ai.position, i.filename as image_filename, p.filename as > panorama_filename > from album_items ai > left join image2 i on i.imageid = ai.image_id > left join panoramas p on p.id = ai.panorama_id > where i.filename in ('pano360--v471.jpg', 'pano360-2--v474.jpg') > union all select ai.position, i.filename as image_filename, p.filename > as panorama_filename > from album_items ai > left join image2 i on i.imageid = ai.image_id > left join panoramas p on p.id = ai.panorama_id > where p.filename in ('pano360--v471', 'pano360-2--v474') one thing, if a row matches both conditions you may get duplicate rows. If that's an issue, one possible way to fix that would be to convert 'union all' to 'union'. I'd have to see what the query output and how it performed. Another way to tackle queries like this is with EXISTS: select ai.position, i.filename as image_filename, p.filename as panorama_filename from album_items ai where exists ( select 1 from image2 i where i.imageid = ai.image_id and i.filename in ('pano360--v471.jpg', 'pano360-2--v474.jpg') union all select 1 from panoramas p where p.id = ai.panorama_id and p.filename in ('pano360--v471', 'pano360-2--v474') ) why do this? by pushing the UNION ALL into the exists statement, you guarantee your self at most one row from album_items without having to resort to performance dangerous tradeoffs like DISTINCT or UNION. Give it a try... merlin
pgsql-performance by date: