Re: [HACKERS] [6.5.2] join problems ... - Mailing list pgsql-hackers

From The Hermit Hacker
Subject Re: [HACKERS] [6.5.2] join problems ...
Date
Msg-id Pine.BSF.4.10.9909201822050.66830-100000@thelab.hub.org
Whole thread Raw
In response to Re: [HACKERS] [6.5.2] join problems ...  (Mike Mascari <mascarim@yahoo.com>)
List pgsql-hackers
Comparing my original query against yours, idle machine:

Mine: 0.000u 0.023s 0:07.78 0.2%      48+132k 0+0io 0pf+0w (55 rows)
Your: 0.006u 0.018s 0:12.16 0.0%      408+904k 0+0io 0pf+0w (55 rows)

Takes longer to run, less CPU resources, but, if I'm reading this right,
more memory resources?

On Sun, 19 Sep 1999, Mike Mascari wrote:

> The query you've presented is rather convoluted, but
> if I'm reading your query correctly, it should reduce
> to a simple, three-way join:
> 
> SELECT c.id, c.name, c.url 
> FROM aecEntMain a, aecWebEntry b, aecCategory c
> WHERE a.status LIKE 'active:ALL%'
> AND a.representation LIKE '%:ALL%'
> AND b.status LIKE 'active:ALL%'
> AND b.indid='$indid' 
> AND b.divid='$divid' 
> AND (a.id,a.mid = b.id,b.mid) 
> AND (b.catid,b.indid,b.divid = c.id,c.ppid,c.pid);
> 
> with the following indexes:
> aecEntMain:  (status) and (id,mid)
> aecWebEntry: (status), (indid), (divid), and
>              (catid,indid,divid)
> aecCategory: (id,ppid,pid)
> 
> Now, there are some differences between the above and
> what you wrote. For example, the above requires that 
> the status begins with 'active:ALL'. Your query 
> requires the status begin with 'active' and must also
> contain the pattern 'active:ALL'. So for the above 
> to be equivalent, you can't have a status such as
> 'active <some stuff> active:ALL'.
> 
> With respect to subqueries and PostgreSQL, as you 
> know, the IN clause requires a nested scan. If you
> are going to use subqueries, correlated subqueries
> using EXISTS clauses can use indexes:
> 
> SELECT c.id, c.name, c.url 
> FROM aecCategory c
> WHERE EXISTS (
> SELECT a.status 
> FROM aecEntMain a, aecWebEntry b
> WHERE a.status LIKE 'active:ALL%'
> AND a.representation LIKE '%:ALL%'
> AND b.status LIKE 'active:ALL%'
> AND b.indid='$indid' 
> AND b.divid='$divid' 
> AND (a.id,a.mid = b.id,b.mid) 
> AND (b.catid,b.indid,b.divid = c.id,c.ppid,c.pid));
> 
> Unfortunately, the lack of index support in IN
> subqueries affects more than just the IN subquery 
> clause, since INTERSECT/EXCEPT uses the rewriter to
> rewrite such queries as UNIONS of two queries with
> an IN/NOT IN subquery, respectively. This makes the
> INTERSECT/EXCEPT feature functionally useless except
> on very small tables.
> 
> Hope that helps (and is equivalent),
> 
> Mike Mascari (mascarim@yahoo.com)
> 
> --- The Hermit Hacker <scrappy@hub.org> wrote:
> > 
> > Morning...
> > 
> >     This weekend, up at a clients site working with
> > them on improving
> > database performance.  They are currently running
> > MySQL and I'm trying to
> > convince them to switch over to PostgreSQL, for
> > various features that they
> > just don't have with MySQL...
> > 
> >     One of the 'queries' that they are currently doing
> > with MySQL
> > consists of two queries that I can reduce down to
> > one using subqueries,
> > but its so slow that its ridiculous...so I figured
> > I'd throw it out as a
> > problem to hopefully solve?
> > 
> >     The query I'm starting out with is works out as:
> > 
> >                 SELECT id, name, url \
> >                   FROM aecCategory \
> >                  WHERE ppid='$indid' \
> >                    AND pid='$divid'";
> > 
> >     The results of this get fed into a while look that
> > takes the id
> > returned and pushes them into:
> > 
> >             SELECT distinct b.indid, b.divid, b.catid,
> > a.id, a.mid \
> >           FROM aecEntMain a, aecWebEntry b \
> >          WHERE (a.id=b.id AND a.mid=b.mid) \
> >            AND (a.status like 'active%' and b.status
> > like 'active%')
> >            AND (a.status like '%active:ALL%' and
> > b.status like '%active:ALL%')
> >            AND (a.representation like '%:ALL%')
> >            AND (b.indid='$indid' and
> > b.divid='$divid' and b.catid='$catid')";
> > 
> >     Now, I can/have rewritten this as:
> > 
> > SELECT id, name, url 
> >   FROM aecCategory 
> >  WHERE ppid='$indid' 
> >    AND pid='$divid' 
> >    AND id IN ( 
> > SELECT distinct c.id 
> >   FROM aecEntMain a, aecWebEntry b, aecCategory c 
> >  WHERE (a.id=b.id AND a.mid=b.mid and b.catid=c.id) 
> >    AND (a.status like 'active%' and b.status like
> > 'active%') 
> >    AND (a.status like '%active:ALL%' and b.status
> > like '%active:ALL%') 
> >    AND (a.representation like '%:ALL%') 
> >    AND (b.indid='$indid' and b.divid='$divid' and
> > b.catid IN ( 
> >         SELECT id FROM aecCategory WHERE
> > ppid='$indid' AND pid='$divid' ) 
> >    ));";
> > 
> >     An explain of the above shows:
> > 
> > Index Scan using aeccategory_primary on aeccategory 
> > (cost=8.28 rows=1 width=36)
> >  SubPlan
> >   ->  Unique  (cost=1283.70 rows=21 width=72)
> >     ->  Sort  (cost=1283.70 rows=21 width=72)
> >       ->  Nested Loop  (cost=1283.70 rows=21
> > width=72)
> >         ->  Nested Loop  (cost=1280.70 rows=1
> > width=60)
> >           ->  Index Scan using aecwebentry_primary
> > on aecwebentry b  
> >                     (cost=1278.63 rows=1 width=36)
> >                 SubPlan
> >                   ->  Index Scan using
> > aeccategory_primary on aeccategory  
> >                             (cost=8.28 rows=1
> > width=12)
> >           ->  Index Scan using aecentmain_primary on
> > aecentmain a  
> >                     (cost=2.07 rows=348 width=24)
> >         ->  Index Scan using aeccategory_id on
> > aeccategory c  
> >                   (cost=3.00 rows=1170 width=12)
> > 
> >     Now, a few things bother me with the above explain
> > output, based on me 
> > hopefully reading this right...
> > 
> >     The innermost SubPlan reports an estimated rows
> > returned of 1...the 
> > actual query returns 59 rows...slightly off?
> > 
> >     The one that bothers me is the one that reports
> > 1170 rows returned...if you
> > look at the query, the only thing that would/should
> > use aeccategory_id is the
> > line that goes "SELECT distinct c.id"...if I run
> > just that section of the 
> > query, it yields a result of 55 rows...way off??
> > 
> >     All of my queries are currently on static data,
> > after a vacuum analyze has 
> > been performed...everything is faster if I split
> > things up and do a SELECT
> > on a per id basis on return values, but, as the list
> > of 'ids' grow, the
> > number of iterations of the while loop required will
> > slow down the query...
> > 
> >     I'm not sure what else to look at towards
> > optimizing the query further,
> > or is this something that we still are/need to look
> > at in the server itself?
> > 
> >     The machine we are working off of right now is an
> > idle Dual-PIII 450Mhz with
> > 512Meg of RAM, very fast SCSI hard drives on a UW
> > controller...and that query
> > is the only thing running while we test things...so
> > we aren't under-powered :)
> > 
> >     ideas?  
> > 
> > Marc G. Fournier                   ICQ#7615664      
> >         IRC Nick: Scrappy
> > Systems Administrator @ hub.org 
> > primary: scrappy@hub.org           secondary:
> > scrappy@{freebsd|postgresql}.org 
> 
> __________________________________________________
> Do You Yahoo!?
> Bid and sell for free at http://auctions.yahoo.com
> 

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



pgsql-hackers by date:

Previous
From: The Hermit Hacker
Date:
Subject: Re: [HACKERS] All things equal, we are still alot slower then MySQL?
Next
From: The Hermit Hacker
Date:
Subject: Re: [HACKERS] Status on Jan Wieck