Thread: 3-table query optimization
Hello everyone. My (simplified) database structure is: a) table product (150000 rows) product_id BIGINT PRIMARY KEY title TEXT ... b) table action (5000 rows) action_id BIGINT PRIMARY KEY product_id BIGINT, FK to product shop_group_id INTEGER (there are about 5 groups, distributed about evenly) c) table product_program (500000 rows) program_id BIGINT (there are about 50 unique) product_id BIGINT, FK to product I need to query products, which are in action table of specific group and in product_program for a specific program_id. The query is taking too long to my liking My query is: SELECT product.product_id FROM action JOIN product ON (product.product_id=action.product_id) WHERE action.shop_group_id=1 AND EXISTS (SELECT 1 FROM catalog.product_program WHERE product_id=product.product_id AND product_program.program_id =1104322 ) QUERY PLAN Nested Loop (cost=0.00..18073.81 rows=1220 width=8) (actual time=10.153..2705.891 rows=636 loops=1) -> Seq Scan on "action" (cost=0.00..135.74 rows=2439 width=8) (actual time=8.108..36.684 rows=2406 loops=1) Filter: (shop_group_id = 1) -> Index Scan using product_pkey on product (cost=0.00..7.34 rows=1 width=8) (actual time=1.031..1.097 rows=0 loops=2406) Index Cond: ((product.product_id)::bigint = ("outer".product_id)::bigint) Filter: (subplan) SubPlan -> Index Scan using product_program_pkey on product_program (cost=0.00..4.33 rows=1 width=0) (actual time=0.455..0.455 rows=0 loops=2406) Index Cond: (((program_id)::bigint = 1104322) AND ((product_id)::bigint = ($0)::bigint)) Total runtime: 2708.575 ms I also tried this: SELECT product.product_id FROM action JOIN product ON (product.product_id=action.product_id) JOIN catalog.product_program ON ( product_program.product_id=product.product_id AND product_program.program_id =1104322) WHERE action.shop_group_id=1 With about the same results (a bit better, but for different groups it was vice versa): QUERY PLAN Nested Loop (cost=141.84..3494.91 rows=139 width=8) (actual time=118.584..1295.303 rows=636 loops=1) -> Hash Join (cost=141.84..2729.11 rows=253 width=16) (actual time=118.483..231.103 rows=636 loops=1) Hash Cond: (("outer".product_id)::bigint = ("inner".product_id)::bigint) -> Index Scan using product_program_pkey on product_program (cost=0.00..2470.04 rows=7647 width=8) (actual time=0.047..73.514 rows=7468 loops=1) Index Cond: ((program_id)::bigint = 1104322) -> Hash (cost=135.74..135.74 rows=2439 width=8) (actual time=118.114..118.114 rows=0 loops=1) -> Seq Scan on "action" (cost=0.00..135.74 rows=2439 width=8) (actual time=0.019..106.864 rows=2406 loops=1) Filter: (shop_group_id = 1) -> Index Scan using product_pkey on product (cost=0.00..3.01 rows=1 width=8) (actual time=1.300..1.655 rows=1 loops=636) Index Cond: (("outer".product_id)::bigint = (product.product_id)::bigint) Any ideas if this is really the best I can expect, or is there something amiss there and my query is wrong for this type of task? My gut feeling tells me, that this kind of query should be a lot faster. The hardware is Dual Xeon with enough of RAM and other operations run just fine. Thank you. -- Michal Táborský
Michal Taborsky - Internet Mall <michal.taborsky@mall.cz> writes: > SELECT product.product_id > FROM action > JOIN product ON (product.product_id=action.product_id) > WHERE action.shop_group_id=1 > AND EXISTS (SELECT 1 > FROM catalog.product_program > WHERE product_id=product.product_id > AND product_program.program_id =1104322 > ) Try converting the EXISTS subquery to an IN. regards, tom lane
Tom Lane napsal(a): > Michal Taborsky - Internet Mall <michal.taborsky@mall.cz> writes: >> SELECT product.product_id >> FROM action >> JOIN product ON (product.product_id=action.product_id) >> WHERE action.shop_group_id=1 >> AND EXISTS (SELECT 1 >> FROM catalog.product_program >> WHERE product_id=product.product_id >> AND product_program.program_id =1104322 >> ) > > Try converting the EXISTS subquery to an IN. The performance is roughly the same. For some groups it's better, for some groups, the bigger ones, it's a bit worse. I forgot to mention, that the server is running 8.0.2. Upgrading would be a bit painful, as it is a 24/7 production system, but if it would help significantly, we'd give it a go. -- Michal Táborský
Michal Taborsky - Internet Mall wrote: > Tom Lane napsal(a): > >Michal Taborsky - Internet Mall <michal.taborsky@mall.cz> writes: > >>SELECT product.product_id > >> FROM action > >> JOIN product ON (product.product_id=action.product_id) > >> WHERE action.shop_group_id=1 > >> AND EXISTS (SELECT 1 > >> FROM catalog.product_program > >> WHERE product_id=product.product_id > >> AND product_program.program_id =1104322 > >> ) > > > >Try converting the EXISTS subquery to an IN. > > The performance is roughly the same. That's strange -- IN is usually much more amenable to better plans than EXISTS. Please post an EXPLAIN ANALYZE of the queries to see what's going on. It may be that the query is bound to be "slow" for some cases (depending on the program_id I guess?) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Thu, Aug 10, 2006 at 09:30:35AM +0200, Michal Taborsky - Internet Mall wrote: > Tom Lane napsal(a): > >Michal Taborsky - Internet Mall <michal.taborsky@mall.cz> writes: > >>SELECT product.product_id > >> FROM action > >> JOIN product ON (product.product_id=action.product_id) > >> WHERE action.shop_group_id=1 > >> AND EXISTS (SELECT 1 > >> FROM catalog.product_program > >> WHERE product_id=product.product_id > >> AND product_program.program_id =1104322 > >> ) > > > >Try converting the EXISTS subquery to an IN. > > The performance is roughly the same. For some groups it's better, for > some groups, the bigger ones, it's a bit worse. I forgot to mention, > that the server is running 8.0.2. Upgrading would be a bit painful, as > it is a 24/7 production system, but if it would help significantly, we'd > give it a go. You're exposing yourself to at least one data-loss bug and a security hole by running 8.0.2. You should at least move to 8.0.8, which won't require a lot of downtime. If you can make it happen, moving to 8.1.4 would almost certainly net a noticable performance gain. I've seen 50-100% improvements, but how much gain you'll actually see is highly workload dependent. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461