Re: [HACKERS] [6.5.2] join problems ... - Mailing list pgsql-hackers
From | The Hermit Hacker |
---|---|
Subject | Re: [HACKERS] [6.5.2] join problems ... |
Date | |
Msg-id | Pine.BSF.4.10.9909201822050.66830-100000@thelab.hub.org Whole thread Raw |
In response to | Re: [HACKERS] [6.5.2] join problems ... (Mike Mascari <mascarim@yahoo.com>) |
List | pgsql-hackers |
Comparing my original query against yours, idle machine: Mine: 0.000u 0.023s 0:07.78 0.2% 48+132k 0+0io 0pf+0w (55 rows) Your: 0.006u 0.018s 0:12.16 0.0% 408+904k 0+0io 0pf+0w (55 rows) Takes longer to run, less CPU resources, but, if I'm reading this right, more memory resources? On Sun, 19 Sep 1999, Mike Mascari wrote: > The query you've presented is rather convoluted, but > if I'm reading your query correctly, it should reduce > to a simple, three-way join: > > SELECT c.id, c.name, c.url > FROM aecEntMain a, aecWebEntry b, aecCategory c > WHERE a.status LIKE 'active:ALL%' > AND a.representation LIKE '%:ALL%' > AND b.status LIKE 'active:ALL%' > AND b.indid='$indid' > AND b.divid='$divid' > AND (a.id,a.mid = b.id,b.mid) > AND (b.catid,b.indid,b.divid = c.id,c.ppid,c.pid); > > with the following indexes: > aecEntMain: (status) and (id,mid) > aecWebEntry: (status), (indid), (divid), and > (catid,indid,divid) > aecCategory: (id,ppid,pid) > > Now, there are some differences between the above and > what you wrote. For example, the above requires that > the status begins with 'active:ALL'. Your query > requires the status begin with 'active' and must also > contain the pattern 'active:ALL'. So for the above > to be equivalent, you can't have a status such as > 'active <some stuff> active:ALL'. > > With respect to subqueries and PostgreSQL, as you > know, the IN clause requires a nested scan. If you > are going to use subqueries, correlated subqueries > using EXISTS clauses can use indexes: > > SELECT c.id, c.name, c.url > FROM aecCategory c > WHERE EXISTS ( > SELECT a.status > FROM aecEntMain a, aecWebEntry b > WHERE a.status LIKE 'active:ALL%' > AND a.representation LIKE '%:ALL%' > AND b.status LIKE 'active:ALL%' > AND b.indid='$indid' > AND b.divid='$divid' > AND (a.id,a.mid = b.id,b.mid) > AND (b.catid,b.indid,b.divid = c.id,c.ppid,c.pid)); > > Unfortunately, the lack of index support in IN > subqueries affects more than just the IN subquery > clause, since INTERSECT/EXCEPT uses the rewriter to > rewrite such queries as UNIONS of two queries with > an IN/NOT IN subquery, respectively. This makes the > INTERSECT/EXCEPT feature functionally useless except > on very small tables. > > Hope that helps (and is equivalent), > > Mike Mascari (mascarim@yahoo.com) > > --- The Hermit Hacker <scrappy@hub.org> wrote: > > > > Morning... > > > > This weekend, up at a clients site working with > > them on improving > > database performance. They are currently running > > MySQL and I'm trying to > > convince them to switch over to PostgreSQL, for > > various features that they > > just don't have with MySQL... > > > > One of the 'queries' that they are currently doing > > with MySQL > > consists of two queries that I can reduce down to > > one using subqueries, > > but its so slow that its ridiculous...so I figured > > I'd throw it out as a > > problem to hopefully solve? > > > > The query I'm starting out with is works out as: > > > > SELECT id, name, url \ > > FROM aecCategory \ > > WHERE ppid='$indid' \ > > AND pid='$divid'"; > > > > The results of this get fed into a while look that > > takes the id > > returned and pushes them into: > > > > SELECT distinct b.indid, b.divid, b.catid, > > a.id, a.mid \ > > FROM aecEntMain a, aecWebEntry b \ > > WHERE (a.id=b.id AND a.mid=b.mid) \ > > AND (a.status like 'active%' and b.status > > like 'active%') > > AND (a.status like '%active:ALL%' and > > b.status like '%active:ALL%') > > AND (a.representation like '%:ALL%') > > AND (b.indid='$indid' and > > b.divid='$divid' and b.catid='$catid')"; > > > > Now, I can/have rewritten this as: > > > > SELECT id, name, url > > FROM aecCategory > > WHERE ppid='$indid' > > AND pid='$divid' > > AND id IN ( > > SELECT distinct c.id > > FROM aecEntMain a, aecWebEntry b, aecCategory c > > WHERE (a.id=b.id AND a.mid=b.mid and b.catid=c.id) > > AND (a.status like 'active%' and b.status like > > 'active%') > > AND (a.status like '%active:ALL%' and b.status > > like '%active:ALL%') > > AND (a.representation like '%:ALL%') > > AND (b.indid='$indid' and b.divid='$divid' and > > b.catid IN ( > > SELECT id FROM aecCategory WHERE > > ppid='$indid' AND pid='$divid' ) > > ));"; > > > > An explain of the above shows: > > > > Index Scan using aeccategory_primary on aeccategory > > (cost=8.28 rows=1 width=36) > > SubPlan > > -> Unique (cost=1283.70 rows=21 width=72) > > -> Sort (cost=1283.70 rows=21 width=72) > > -> Nested Loop (cost=1283.70 rows=21 > > width=72) > > -> Nested Loop (cost=1280.70 rows=1 > > width=60) > > -> Index Scan using aecwebentry_primary > > on aecwebentry b > > (cost=1278.63 rows=1 width=36) > > SubPlan > > -> Index Scan using > > aeccategory_primary on aeccategory > > (cost=8.28 rows=1 > > width=12) > > -> Index Scan using aecentmain_primary on > > aecentmain a > > (cost=2.07 rows=348 width=24) > > -> Index Scan using aeccategory_id on > > aeccategory c > > (cost=3.00 rows=1170 width=12) > > > > Now, a few things bother me with the above explain > > output, based on me > > hopefully reading this right... > > > > The innermost SubPlan reports an estimated rows > > returned of 1...the > > actual query returns 59 rows...slightly off? > > > > The one that bothers me is the one that reports > > 1170 rows returned...if you > > look at the query, the only thing that would/should > > use aeccategory_id is the > > line that goes "SELECT distinct c.id"...if I run > > just that section of the > > query, it yields a result of 55 rows...way off?? > > > > All of my queries are currently on static data, > > after a vacuum analyze has > > been performed...everything is faster if I split > > things up and do a SELECT > > on a per id basis on return values, but, as the list > > of 'ids' grow, the > > number of iterations of the while loop required will > > slow down the query... > > > > I'm not sure what else to look at towards > > optimizing the query further, > > or is this something that we still are/need to look > > at in the server itself? > > > > The machine we are working off of right now is an > > idle Dual-PIII 450Mhz with > > 512Meg of RAM, very fast SCSI hard drives on a UW > > controller...and that query > > is the only thing running while we test things...so > > we aren't under-powered :) > > > > ideas? > > > > Marc G. Fournier ICQ#7615664 > > IRC Nick: Scrappy > > Systems Administrator @ hub.org > > primary: scrappy@hub.org secondary: > > scrappy@{freebsd|postgresql}.org > > __________________________________________________ > Do You Yahoo!? > Bid and sell for free at http://auctions.yahoo.com > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
pgsql-hackers by date: