Query planner not using indexes with JOIN query and OR clause - Mailing list pgsql-performance

From Ben Hoyt
Subject Query planner not using indexes with JOIN query and OR clause
Date
Msg-id CAL9jXCEAumhJf6FJUhZnE8vc0r1-EQOMYvPXe9-0-S3xT+zVnQ@mail.gmail.com
Whole thread Raw
Responses Re: Query planner not using indexes with JOIN query and OR clause
List pgsql-performance
Hi folks,

I have a fairly simple three-table query (pasted below) with two LEFT JOINs and an OR in the WHERE clause that for some reason is doing sequential scans on all three tables (two of them large -- several million rows), even though I have indexes on the relevant "filename" columns.

Note the two parts of the where clause -- a filter on the image2 table and a filter on the panoramas table. If I comment out either filter and just filter on i.filename by itself, or p.filename by itself, the query planner uses the relevant index and the query takes a few milliseconds. But when I have both clauses (as shown below) it falls back to sequential scanning all three tables for some reason, taking several seconds.

What am I missing? There must be some reason PostgreSQL can't use the index in this case, but I can't see what it is. If I were PostgreSQL I'd be using the index on i.filename and p.filename to filter to a couple of rows first, then join, making it super-quick.

In this test I'm running PostgreSQL 9.3.3 on Windows 64-bit, but the same thing happens on our Linux-based database. We're using autovacuum on both databases, and I've tried manually VACUUM ANALYZE-ing all three tables just in case, but that doesn't help. Memory config values are set to sensible values. Note: this is a new query, so in terms of "history", it's always been slow.

My query and PostgreSQL version and the explain and a lot of other table data is pasted below.

QUERY
----------
select ai.position, i.filename as image_filename, p.filename as panorama_filename
from album_items ai
left join image2 i on i.imageid = ai.image_id
left join panoramas p on p.id = ai.panorama_id
where i.filename in ('pano360--v471.jpg', 'pano360-2--v474.jpg') or
      p.filename in ('pano360--v471', 'pano360-2--v474')
----------

POSTGRESQL VERSION
----------
PostgreSQL 9.3.3, compiled by Visual C++ build 1600, 64-bit
----------

EXPLAIN (ANALYZE, BUFFERS)
----------

Hash Left Join  (cost=344184.62..963863.99 rows=376 width=57) (actual time=3157.104..8838.329 rows=2 loops=1)
  Hash Cond: (ai.panorama_id = p.id)
  Filter: ((i.filename = ANY ('{pano360--v471.jpg,pano360-2--v474.jpg}'::text[])) OR (p.filename = ANY ('{pano360--v471,pano360-2--v474}'::text[])))
  Rows Removed by Filter: 7347790
  Buffers: shared hit=8967 read=198827, temp read=76936 written=75908
  I/O Timings: read=609.403
  ->  Hash Left Join  (cost=341001.56..781324.85 rows=7346959 width=39) (actual time=2660.821..7842.202 rows=7347792 loops=1)
        Hash Cond: (ai.image_id = i.imageid)
        Buffers: shared hit=6936 read=198827, temp read=76662 written=75640
        I/O Timings: read=609.403
        ->  Seq Scan on album_items ai  (cost=0.00..156576.59 rows=7346959 width=12) (actual time=0.009..981.074 rows=7347792 loops=1)
              Buffers: shared hit=4297 read=78810
              I/O Timings: read=251.402
        ->  Hash  (cost=194687.36..194687.36 rows=7203136 width=35) (actual time=2658.643..2658.643 rows=7200287 loops=1)
              Buckets: 2048  Batches: 512  Memory Usage: 976kB
              Buffers: shared hit=2639 read=120017, temp written=49961
              I/O Timings: read=358.000
              ->  Seq Scan on image2 i  (cost=0.00..194687.36 rows=7203136 width=35) (actual time=0.007..1063.586 rows=7200287 loops=1)
                    Buffers: shared hit=2639 read=120017
                    I/O Timings: read=358.000
  ->  Hash  (cost=2423.47..2423.47 rows=39247 width=26) (actual time=12.100..12.100 rows=39247 loops=1)
        Buckets: 2048  Batches: 4  Memory Usage: 575kB
        Buffers: shared hit=2031, temp written=170
        ->  Seq Scan on panoramas p  (cost=0.00..2423.47 rows=39247 width=26) (actual time=0.003..5.470 rows=39247 loops=1)
              Buffers: shared hit=2031
Total runtime: 8838.701 ms
----------

TABLE METADATA
----------
Number of rows in album_items: 7347792
Number of rows in image2: 7200287
Number of rows in panoramas: 39247
----------

TABLES (AND THEIR INDEXES) REFERENCED IN QUERY
----------
CREATE TABLE content.album_items
(
  album_id integer NOT NULL,
  image_id integer,
  "position" integer,
  caption text,
  active boolean NOT NULL,
  panorama_id integer,
  CONSTRAINT album_items_album_id_fkey FOREIGN KEY (album_id) REFERENCES content.albums (id),
  CONSTRAINT album_items_image_id_fkey FOREIGN KEY (image_id) REFERENCES content.image2 (imageid),
  CONSTRAINT album_items_panorama_id_fkey FOREIGN KEY (panorama_id) REFERENCES content.panoramas (id),
  CONSTRAINT album_image_unique UNIQUE (album_id, image_id)
);

CREATE INDEX album_items_album_id_idx ON content.album_items (album_id);
CREATE INDEX album_items_image_id_idx ON content.album_items (image_id);
CREATE INDEX album_items_panorama_id_idx ON content.album_items (panorama_id);


CREATE TABLE content.image2
(
  imageid integer NOT NULL DEFAULT nextval('image2_imageid_seq'::regclass),
  hotelid integer,
  filename text NOT NULL,
  originalfoldername text NOT NULL,
  width integer,
  height integer,
  active boolean NOT NULL DEFAULT false,
  importid integer,
  timetaken timestamp without time zone,
  state integer NOT NULL DEFAULT 1,
  has_wide boolean NOT NULL DEFAULT false,
  type integer,
  document_id integer,
  property_id integer,
  CONSTRAINT image2_pkey PRIMARY KEY (imageid),
  CONSTRAINT fk_image2_hotelid FOREIGN KEY (hotelid) REFERENCES content.hotel (hotelid),
  CONSTRAINT fk_image2_importid FOREIGN KEY (importid) REFERENCES content.imageimport (importid),
  CONSTRAINT image2_document_id_fkey FOREIGN KEY (document_id) REFERENCES content.documents (id),
  CONSTRAINT image2_property_id_fkey FOREIGN KEY (property_id) REFERENCES content.properties (id),
  CONSTRAINT uq_image2_filename UNIQUE (filename)
);

CREATE INDEX fki_image2_property_id_fkey ON content.image2 (property_id);
CREATE INDEX image2_document_id_idx ON content.image2 (document_id);
CREATE INDEX image2_importid_idx ON content.image2 (importid);
CREATE INDEX ix_image2_hotelid ON content.image2 (hotelid);
CREATE INDEX ix_image2_imageid ON content.image2 (imageid);


CREATE TABLE content.panoramas
(
  id integer NOT NULL DEFAULT nextval('panoramas_id_seq'::regclass),
  hotel_id integer,
  filename text NOT NULL,
  folder text NOT NULL,
  import_id integer NOT NULL,
  active boolean NOT NULL DEFAULT false,
  state integer NOT NULL,
  num_images integer NOT NULL DEFAULT 0,
  type integer,
  hdr boolean NOT NULL DEFAULT false,
  has_preview boolean NOT NULL DEFAULT false,
  property_id integer,
  data json,
  previews_created boolean NOT NULL DEFAULT false,
  CONSTRAINT panoramas_pkey PRIMARY KEY (id),
  CONSTRAINT fk_panoramas_hotel_id FOREIGN KEY (hotel_id) REFERENCES content.hotel (hotelid),
  CONSTRAINT fk_panoramas_import_id FOREIGN KEY (import_id) REFERENCES content.imageimport (importid),
  CONSTRAINT panoramas_property_id_fkey FOREIGN KEY (property_id) REFERENCES content.properties (id),
  CONSTRAINT panoramas_uq_filename UNIQUE (filename)
);

CREATE INDEX fki_panoramas_property_id_fkey ON content.panoramas (property_id);
CREATE INDEX panoramas_hotel_id_idx ON content.panoramas (hotel_id);
CREATE INDEX panoramas_import_id_idx ON content.panoramas (import_id);
----------

Thanks in advance,
Ben

pgsql-performance by date:

Previous
From: James Cloos
Date:
Subject: Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
Next
From: Merlin Moncure
Date:
Subject: Re: Query planner not using indexes with JOIN query and OR clause