[6.5.2] join problems ... - Mailing list pgsql-hackers
From | The Hermit Hacker |
---|---|
Subject | [6.5.2] join problems ... |
Date | |
Msg-id | Pine.BSF.4.10.9909190203390.27097-100000@thelab.hub.org Whole thread Raw |
List | pgsql-hackers |
Morning... This weekend, up at a clients site working with them on improving database performance. They are currently running MySQL and I'm trying to convince them to switch over to PostgreSQL, for various features that they just don't have with MySQL... One of the 'queries' that they are currently doing with MySQL consists of two queries that I can reduce down to one using subqueries, but its so slow that its ridiculous...so I figured I'd throw it out as a problem to hopefully solve? The query I'm starting out with is works out as: SELECT id, name, url \ FROM aecCategory \ WHERE ppid='$indid' \ AND pid='$divid'"; The results of this get fed into a while look that takes the id returned and pushes them into: SELECT distinct b.indid, b.divid, b.catid, a.id, a.mid \ FROM aecEntMain a, aecWebEntry b \ WHERE(a.id=b.id AND a.mid=b.mid) \ AND (a.status like 'active%' and b.status like 'active%') AND (a.statuslike '%active:ALL%' and b.status like '%active:ALL%') AND (a.representation like '%:ALL%') AND(b.indid='$indid' and b.divid='$divid' and b.catid='$catid')"; Now, I can/have rewritten this as: SELECT id, name, url FROM aecCategory WHERE ppid='$indid' AND pid='$divid' AND id IN ( SELECT distinct c.id FROM aecEntMain a, aecWebEntry b, aecCategory c WHERE (a.id=b.id AND a.mid=b.mid and b.catid=c.id) AND (a.status like 'active%' and b.status like 'active%') AND (a.status like '%active:ALL%' and b.statuslike '%active:ALL%') AND (a.representation like '%:ALL%') AND (b.indid='$indid' and b.divid='$divid' and b.catidIN ( SELECT id FROM aecCategory WHERE ppid='$indid' AND pid='$divid' ) ));"; An explain of the above shows: Index Scan using aeccategory_primary on aeccategory (cost=8.28 rows=1 width=36)SubPlan -> Unique (cost=1283.70 rows=21width=72) -> Sort (cost=1283.70 rows=21 width=72) -> Nested Loop (cost=1283.70 rows=21 width=72) -> Nested Loop (cost=1280.70 rows=1 width=60) -> Index Scan using aecwebentry_primary on aecwebentry b (cost=1278.63 rows=1 width=36) SubPlan -> Index Scan using aeccategory_primaryon aeccategory (cost=8.28 rows=1 width=12) -> Index Scan using aecentmain_primaryon aecentmain a (cost=2.07 rows=348 width=24) -> Index Scan using aeccategory_idon aeccategory c (cost=3.00 rows=1170 width=12) Now, a few things bother me with the above explain output, based on me hopefully reading this right... The innermost SubPlan reports an estimated rows returned of 1...the actual query returns 59 rows...slightly off? The one that bothers me is the one that reports 1170 rows returned...if you look at the query, the only thing that would/should use aeccategory_id is the line that goes "SELECT distinct c.id"...if I run just that section of the query, it yields a result of 55 rows...way off?? All of my queries are currently on static data, after a vacuum analyze has been performed...everything is faster if I split things up and do a SELECT on a per id basis on return values, but, as the list of 'ids' grow, the number of iterations of the while loop required will slow down the query... I'm not sure what else to look at towards optimizing the query further, or is this something that we still are/need to look at in the server itself? The machine we are working off of right now is an idle Dual-PIII 450Mhz with 512Meg of RAM, very fast SCSI hard drives on a UW controller...and that query is the only thing running while we test things...so we aren't under-powered :) ideas? Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
pgsql-hackers by date: