Two fast searches turn slow when used with OR clause - Mailing list pgsql-performance

From Craig James
Subject Two fast searches turn slow when used with OR clause
Date
Msg-id 4C5B0445.2000405@emolecules.com
Whole thread Raw
In response to Re: pgbench results on a new server  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: Two fast searches turn slow when used with OR clause
List pgsql-performance
I can query either my PARENT table joined to PRICES, or my VERSION table joined to PRICES, and get an answer in 30-40
msec. But put the two together, it jumps to 4 seconds.  What am I missing here?  I figured this query would be nearly
instantaneous. The VERSION.ISOSMILES and PARENT.ISOSMILES columns both have unique indexes.  Instead of using these
indexes,it's doing a full-table scan of both tables, even though there can't possibly be more than one match in each
table.

I guess I could rewrite this as a UNION of the two subqueries, but that seems contrived.

This is PG 8.3.10 on Linux.

Thanks,
Craig


=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
-> from plus p join sample s
->  on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
-> join version vn on (s.version_id = vn.version_id) join parent pn
->  on (s.parent_id = pn.parent_id)
-> where vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
-> or pn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
-> order by price;

  Sort  (cost=71922.00..71922.00 rows=1 width=19) (actual time=4337.114..4337.122 rows=10 loops=1)
    Sort Key: p.price   Sort Method:  quicksort  Memory: 25kB
    ->  Nested Loop  (cost=18407.53..71921.99 rows=1 width=19) (actual time=1122.685..4337.028 rows=10 loops=1)
          ->  Hash Join  (cost=18407.53..71903.71 rows=4 width=20) (actual time=1122.624..4336.682 rows=7 loops=1)
                Hash Cond: (s.version_id = vn.version_id)
                Join Filter: ((vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'::text) OR (pn.isosmiles =
'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'::text))
                ->  Hash Join  (cost=8807.15..44470.73 rows=620264 width=54) (actual time=431.501..2541.329 rows=620264
loops=1)
                      Hash Cond: (s.parent_id = pn.parent_id)
                      ->  Seq Scan on sample s  (cost=0.00..21707.64 rows=620264 width=24) (actual time=0.008..471.340
rows=620264loops=1) 
                      ->  Hash  (cost=5335.40..5335.40 rows=277740 width=38) (actual time=431.166..431.166 rows=277740
loops=1)
                            ->  Seq Scan on parent pn  (cost=0.00..5335.40 rows=277740 width=38) (actual
time=0.012..195.822rows=277740 loops=1) 
                ->  Hash  (cost=5884.06..5884.06 rows=297306 width=38) (actual time=467.267..467.267 rows=297306
loops=1)
                      ->  Seq Scan on version vn  (cost=0.00..5884.06 rows=297306 width=38) (actual time=0.017..215.285
rows=297306loops=1) 
          ->  Index Scan using i_plus_compound_id on plus p  (cost=0.00..4.51 rows=4 width=26) (actual
time=0.039..0.041rows=1 loops=7) 
                Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id = s.compound_id))
  Total runtime: 4344.222 ms
(17 rows)


If I only query the VERSION table, it's very fast:

x=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
-> from plus p
-> join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
-> join version vn on (s.version_id = vn.version_id)
-> where vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1' order by price;

Sort  (cost=45.73..45.74 rows=1 width=19) (actual time=32.438..32.448 rows=10 loops=1)
    Sort Key: p.price
    Sort Method:  quicksort  Memory: 25kB
    ->  Nested Loop  (cost=0.00..45.72 rows=1 width=19) (actual time=32.309..32.411 rows=10 loops=1)
          ->  Nested Loop  (cost=0.00..36.58 rows=2 width=20) (actual time=32.295..32.319 rows=7 loops=1)
                ->  Index Scan using i_version_isosmiles on version vn  (cost=0.00..8.39 rows=1 width=4) (actual
time=32.280..32.281rows=1 loops=1) 
                      Index Cond: (isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'::text)
                ->  Index Scan using i_sample_version_id on sample s  (cost=0.00..28.12 rows=6 width=20) (actual
time=0.011..0.024rows=7 loops=1) 
                      Index Cond: (s.version_id = vn.version_id)
          ->  Index Scan using i_plus_compound_id on plus p  (cost=0.00..4.51 rows=4 width=26) (actual
time=0.010..0.011rows=1 loops=7) 
                Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id = s.compound_id))
  Total runtime: 32.528 ms
(12 rows)


Same good performance if I only query the PARENT table:

x=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, pn.parent_id from plus p join sample s on
(p.compound_id= s.compound_id and p.supplier_id = s.supplier_id) join parent pn on (s.parent_id = pn.parent_id) where
pn.isosmiles= 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1' order by price; 
                                                                     QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=57.73..57.74 rows=1 width=19) (actual time=43.564..43.564 rows=10 loops=1)
    Sort Key: p.price
    Sort Method:  quicksort  Memory: 25kB
    ->  Nested Loop  (cost=0.00..57.72 rows=1 width=19) (actual time=43.429..43.537 rows=10 loops=1)
          ->  Nested Loop  (cost=0.00..48.58 rows=2 width=20) (actual time=43.407..43.430 rows=7 loops=1)
                ->  Index Scan using i_parent_isosmiles on parent pn  (cost=0.00..8.38 rows=1 width=4) (actual
time=27.342..27.343rows=1 loops=1) 
                      Index Cond: (isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'::text)
                ->  Index Scan using i_sample_parent_id on sample s  (cost=0.00..40.09 rows=9 width=20) (actual
time=16.057..16.070rows=7 loops=1) 
                      Index Cond: (s.parent_id = pn.parent_id)
          ->  Index Scan using i_plus_compound_id on plus p  (cost=0.00..4.51 rows=4 width=26) (actual
time=0.010..0.011rows=1 loops=7) 
                Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id = s.compound_id))
  Total runtime: 43.628 ms




x=> \d version
       Table "x.version"
    Column   |  Type   | Modifiers
------------+---------+-----------
  version_id | integer | not null
  parent_id  | integer | not null
  isosmiles  | text    | not null
  coord_2d   | text    |
Indexes:
     "version_pkey" PRIMARY KEY, btree (version_id)
     "i_version_isosmiles" UNIQUE, btree (isosmiles)
     "i_version_parent_id" btree (parent_id)
Foreign-key constraints:
     "fk_parent" FOREIGN KEY (parent_id) REFERENCES parent(parent_id) ON DELETE CASCADE

x=> \d parent
       Table "x.parent"
   Column   |  Type   | Modifiers
-----------+---------+-----------
  parent_id | integer | not null
  isosmiles | text    | not null
  coord_2d  | text    |
Indexes:
     "parent_pkey" PRIMARY KEY, btree (parent_id)
     "i_parent_isosmiles" UNIQUE, btree (isosmiles)

=> \d sample
                                Table "reaxys.sample"
        Column       |  Type   |                      Modifiers
--------------------+---------+-----------------------------------------------------
  sample_id          | integer | not null default nextval('sample_id_seq'::regclass)
  sample_id_src      | integer |
  parent_id          | integer | not null
  version_id         | integer | not null
  supplier_id        | integer | not null
  catalogue_id       | integer | not null
  catalogue_issue_id | integer | not null
  load_id            | integer | not null
  load_file_id       | integer |
  compound_id        | text    | not null
  cas_number         | text    |
  purity             | text    |
  chemical_name      | text    |
  url                | text    |
  price_code         | text    |
  comment            | text    |
  salt_comment       | text    |
Indexes:
     "sample_pkey" PRIMARY KEY, btree (sample_id)
     "i_sample_casno" btree (cas_number)
     "i_sample_catalogue_id" btree (catalogue_id)
     "i_sample_catalogue_issue_id" btree (catalogue_issue_id)
     "i_sample_chem_name" btree (chemical_name)
     "i_sample_compound_id" btree (compound_id)
     "i_sample_load_id" btree (load_id)
     "i_sample_parent_id" btree (parent_id)
     "i_sample_sample_id_src" btree (sample_id_src)
     "i_sample_supplier_id" btree (supplier_id)
     "i_sample_version_id" btree (version_id)
Foreign-key constraints:
     "fk_item" FOREIGN KEY (version_id) REFERENCES version(version_id) ON DELETE CASCADE

pgsql-performance by date:

Previous
From: "Kenneth Cox"
Date:
Subject: Advice configuring ServeRAID 8k for performance
Next
From: Alan Hodgson
Date:
Subject: Re: Advice configuring ServeRAID 8k for performance