3-table query optimization - Mailing list pgsql-performance

From Michal Taborsky - Internet Mall
Subject 3-table query optimization
Date
Msg-id 44D9EBEB.9010704@mall.cz
Whole thread Raw
Responses Re: 3-table query optimization
List pgsql-performance
Hello everyone.

My (simplified) database structure is:

a) table product (150000 rows)
  product_id BIGINT PRIMARY KEY
  title TEXT
  ...

b) table action (5000 rows)
  action_id BIGINT PRIMARY KEY
  product_id BIGINT, FK to product
  shop_group_id INTEGER (there are about 5 groups, distributed about evenly)

c) table product_program (500000 rows)
  program_id BIGINT (there are about 50 unique)
  product_id BIGINT, FK to product


I need to query products, which are in action table of specific group
and in product_program for a specific program_id. The query is taking
too long to my liking My query is:

SELECT product.product_id
   FROM action
   JOIN product ON (product.product_id=action.product_id)
  WHERE action.shop_group_id=1
    AND EXISTS (SELECT 1
          FROM catalog.product_program
          WHERE product_id=product.product_id
            AND product_program.program_id =1104322
    )


QUERY PLAN
Nested Loop  (cost=0.00..18073.81 rows=1220 width=8) (actual
time=10.153..2705.891 rows=636 loops=1)
   ->  Seq Scan on "action"  (cost=0.00..135.74 rows=2439 width=8)
(actual time=8.108..36.684 rows=2406 loops=1)
         Filter: (shop_group_id = 1)
   ->  Index Scan using product_pkey on product  (cost=0.00..7.34 rows=1
width=8) (actual time=1.031..1.097 rows=0 loops=2406)
         Index Cond: ((product.product_id)::bigint =
("outer".product_id)::bigint)
         Filter: (subplan)
         SubPlan
           ->  Index Scan using product_program_pkey on product_program
  (cost=0.00..4.33 rows=1 width=0) (actual time=0.455..0.455 rows=0
loops=2406)
                 Index Cond: (((program_id)::bigint = 1104322) AND
((product_id)::bigint = ($0)::bigint))
Total runtime: 2708.575 ms



I also tried this:

SELECT product.product_id
   FROM action
   JOIN product ON (product.product_id=action.product_id)
   JOIN catalog.product_program ON (
    product_program.product_id=product.product_id AND
    product_program.program_id =1104322)
  WHERE action.shop_group_id=1


With about the same results (a bit better, but for different groups it
was vice versa):

QUERY PLAN
Nested Loop  (cost=141.84..3494.91 rows=139 width=8) (actual
time=118.584..1295.303 rows=636 loops=1)
   ->  Hash Join  (cost=141.84..2729.11 rows=253 width=16) (actual
time=118.483..231.103 rows=636 loops=1)
         Hash Cond: (("outer".product_id)::bigint =
("inner".product_id)::bigint)
         ->  Index Scan using product_program_pkey on product_program
(cost=0.00..2470.04 rows=7647 width=8) (actual time=0.047..73.514
rows=7468 loops=1)
               Index Cond: ((program_id)::bigint = 1104322)
         ->  Hash  (cost=135.74..135.74 rows=2439 width=8) (actual
time=118.114..118.114 rows=0 loops=1)
               ->  Seq Scan on "action"  (cost=0.00..135.74 rows=2439
width=8) (actual time=0.019..106.864 rows=2406 loops=1)
                     Filter: (shop_group_id = 1)
   ->  Index Scan using product_pkey on product  (cost=0.00..3.01 rows=1
width=8) (actual time=1.300..1.655 rows=1 loops=636)
         Index Cond: (("outer".product_id)::bigint =
(product.product_id)::bigint)


Any ideas if this is really the best I can expect, or is there something
amiss there and my query is wrong for this type of task? My gut feeling
tells me, that this kind of query should be a lot faster. The hardware
is Dual Xeon with enough of RAM and other operations run just fine.

Thank you.

--
Michal Táborský

pgsql-performance by date:

Previous
From: Patrice Beliveau
Date:
Subject: Re: Optimizing queries
Next
From: Tom Lane
Date:
Subject: Re: Optimizing queries