Thread: 3-table query optimization

3-table query optimization

From
Michal Taborsky - Internet Mall
Date:
Hello everyone.

My (simplified) database structure is:

a) table product (150000 rows)
  product_id BIGINT PRIMARY KEY
  title TEXT
  ...

b) table action (5000 rows)
  action_id BIGINT PRIMARY KEY
  product_id BIGINT, FK to product
  shop_group_id INTEGER (there are about 5 groups, distributed about evenly)

c) table product_program (500000 rows)
  program_id BIGINT (there are about 50 unique)
  product_id BIGINT, FK to product


I need to query products, which are in action table of specific group
and in product_program for a specific program_id. The query is taking
too long to my liking My query is:

SELECT product.product_id
   FROM action
   JOIN product ON (product.product_id=action.product_id)
  WHERE action.shop_group_id=1
    AND EXISTS (SELECT 1
          FROM catalog.product_program
          WHERE product_id=product.product_id
            AND product_program.program_id =1104322
    )


QUERY PLAN
Nested Loop  (cost=0.00..18073.81 rows=1220 width=8) (actual
time=10.153..2705.891 rows=636 loops=1)
   ->  Seq Scan on "action"  (cost=0.00..135.74 rows=2439 width=8)
(actual time=8.108..36.684 rows=2406 loops=1)
         Filter: (shop_group_id = 1)
   ->  Index Scan using product_pkey on product  (cost=0.00..7.34 rows=1
width=8) (actual time=1.031..1.097 rows=0 loops=2406)
         Index Cond: ((product.product_id)::bigint =
("outer".product_id)::bigint)
         Filter: (subplan)
         SubPlan
           ->  Index Scan using product_program_pkey on product_program
  (cost=0.00..4.33 rows=1 width=0) (actual time=0.455..0.455 rows=0
loops=2406)
                 Index Cond: (((program_id)::bigint = 1104322) AND
((product_id)::bigint = ($0)::bigint))
Total runtime: 2708.575 ms



I also tried this:

SELECT product.product_id
   FROM action
   JOIN product ON (product.product_id=action.product_id)
   JOIN catalog.product_program ON (
    product_program.product_id=product.product_id AND
    product_program.program_id =1104322)
  WHERE action.shop_group_id=1


With about the same results (a bit better, but for different groups it
was vice versa):

QUERY PLAN
Nested Loop  (cost=141.84..3494.91 rows=139 width=8) (actual
time=118.584..1295.303 rows=636 loops=1)
   ->  Hash Join  (cost=141.84..2729.11 rows=253 width=16) (actual
time=118.483..231.103 rows=636 loops=1)
         Hash Cond: (("outer".product_id)::bigint =
("inner".product_id)::bigint)
         ->  Index Scan using product_program_pkey on product_program
(cost=0.00..2470.04 rows=7647 width=8) (actual time=0.047..73.514
rows=7468 loops=1)
               Index Cond: ((program_id)::bigint = 1104322)
         ->  Hash  (cost=135.74..135.74 rows=2439 width=8) (actual
time=118.114..118.114 rows=0 loops=1)
               ->  Seq Scan on "action"  (cost=0.00..135.74 rows=2439
width=8) (actual time=0.019..106.864 rows=2406 loops=1)
                     Filter: (shop_group_id = 1)
   ->  Index Scan using product_pkey on product  (cost=0.00..3.01 rows=1
width=8) (actual time=1.300..1.655 rows=1 loops=636)
         Index Cond: (("outer".product_id)::bigint =
(product.product_id)::bigint)


Any ideas if this is really the best I can expect, or is there something
amiss there and my query is wrong for this type of task? My gut feeling
tells me, that this kind of query should be a lot faster. The hardware
is Dual Xeon with enough of RAM and other operations run just fine.

Thank you.

--
Michal Táborský

Re: 3-table query optimization

From
Tom Lane
Date:
Michal Taborsky - Internet Mall <michal.taborsky@mall.cz> writes:
> SELECT product.product_id
>    FROM action
>    JOIN product ON (product.product_id=action.product_id)
>   WHERE action.shop_group_id=1
>     AND EXISTS (SELECT 1
>           FROM catalog.product_program
>           WHERE product_id=product.product_id
>             AND product_program.program_id =1104322
>     )

Try converting the EXISTS subquery to an IN.

            regards, tom lane

Re: 3-table query optimization

From
Michal Taborsky - Internet Mall
Date:
Tom Lane napsal(a):
> Michal Taborsky - Internet Mall <michal.taborsky@mall.cz> writes:
>> SELECT product.product_id
>>    FROM action
>>    JOIN product ON (product.product_id=action.product_id)
>>   WHERE action.shop_group_id=1
>>     AND EXISTS (SELECT 1
>>           FROM catalog.product_program
>>           WHERE product_id=product.product_id
>>             AND product_program.program_id =1104322
>>     )
>
> Try converting the EXISTS subquery to an IN.

The performance is roughly the same. For some groups it's better, for
some groups, the bigger ones, it's a bit worse. I forgot to mention,
that the server is running 8.0.2. Upgrading would be a bit painful, as
it is a 24/7 production system, but if it would help significantly, we'd
give it a go.

--
Michal Táborský


Re: 3-table query optimization

From
Alvaro Herrera
Date:
Michal Taborsky - Internet Mall wrote:
> Tom Lane napsal(a):
> >Michal Taborsky - Internet Mall <michal.taborsky@mall.cz> writes:
> >>SELECT product.product_id
> >>   FROM action
> >>   JOIN product ON (product.product_id=action.product_id)
> >>  WHERE action.shop_group_id=1
> >>    AND EXISTS (SELECT 1
> >>          FROM catalog.product_program
> >>          WHERE product_id=product.product_id
> >>            AND product_program.program_id =1104322
> >>    )
> >
> >Try converting the EXISTS subquery to an IN.
>
> The performance is roughly the same.

That's strange -- IN is usually much more amenable to better plans than
EXISTS.  Please post an EXPLAIN ANALYZE of the queries to see what's
going on.  It may be that the query is bound to be "slow" for some
cases (depending on the program_id I guess?)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: 3-table query optimization

From
"Jim C. Nasby"
Date:
On Thu, Aug 10, 2006 at 09:30:35AM +0200, Michal Taborsky - Internet Mall wrote:
> Tom Lane napsal(a):
> >Michal Taborsky - Internet Mall <michal.taborsky@mall.cz> writes:
> >>SELECT product.product_id
> >>   FROM action
> >>   JOIN product ON (product.product_id=action.product_id)
> >>  WHERE action.shop_group_id=1
> >>    AND EXISTS (SELECT 1
> >>          FROM catalog.product_program
> >>          WHERE product_id=product.product_id
> >>            AND product_program.program_id =1104322
> >>    )
> >
> >Try converting the EXISTS subquery to an IN.
>
> The performance is roughly the same. For some groups it's better, for
> some groups, the bigger ones, it's a bit worse. I forgot to mention,
> that the server is running 8.0.2. Upgrading would be a bit painful, as
> it is a 24/7 production system, but if it would help significantly, we'd
> give it a go.

You're exposing yourself to at least one data-loss bug and a security
hole by running 8.0.2. You should at least move to 8.0.8, which won't
require a lot of downtime.

If you can make it happen, moving to 8.1.4 would almost certainly net a
noticable performance gain. I've seen 50-100% improvements, but how much
gain you'll actually see is highly workload dependent.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461