Re: [HACKERS] [6.5.2] join problems ... - Mailing list pgsql-hackers

From Mike Mascari
Subject Re: [HACKERS] [6.5.2] join problems ...
Date
Msg-id 19990919071724.20897.rocketmail@web119.yahoomail.com
Whole thread Raw
Responses Re: [HACKERS] [6.5.2] join problems ...
Re: [HACKERS] [6.5.2] join problems ...
Re: [HACKERS] [6.5.2] join problems ...
List pgsql-hackers
The query you've presented is rather convoluted, but
if I'm reading your query correctly, it should reduce
to a simple, three-way join:

SELECT c.id, c.name, c.url 
FROM aecEntMain a, aecWebEntry b, aecCategory c
WHERE a.status LIKE 'active:ALL%'
AND a.representation LIKE '%:ALL%'
AND b.status LIKE 'active:ALL%'
AND b.indid='$indid' 
AND b.divid='$divid' 
AND (a.id,a.mid = b.id,b.mid) 
AND (b.catid,b.indid,b.divid = c.id,c.ppid,c.pid);

with the following indexes:
aecEntMain:  (status) and (id,mid)
aecWebEntry: (status), (indid), (divid), and            (catid,indid,divid)
aecCategory: (id,ppid,pid)

Now, there are some differences between the above and
what you wrote. For example, the above requires that 
the status begins with 'active:ALL'. Your query 
requires the status begin with 'active' and must also
contain the pattern 'active:ALL'. So for the above 
to be equivalent, you can't have a status such as
'active <some stuff> active:ALL'.

With respect to subqueries and PostgreSQL, as you 
know, the IN clause requires a nested scan. If you
are going to use subqueries, correlated subqueries
using EXISTS clauses can use indexes:

SELECT c.id, c.name, c.url 
FROM aecCategory c
WHERE EXISTS (
SELECT a.status 
FROM aecEntMain a, aecWebEntry b
WHERE a.status LIKE 'active:ALL%'
AND a.representation LIKE '%:ALL%'
AND b.status LIKE 'active:ALL%'
AND b.indid='$indid' 
AND b.divid='$divid' 
AND (a.id,a.mid = b.id,b.mid) 
AND (b.catid,b.indid,b.divid = c.id,c.ppid,c.pid));

Unfortunately, the lack of index support in IN
subqueries affects more than just the IN subquery 
clause, since INTERSECT/EXCEPT uses the rewriter to
rewrite such queries as UNIONS of two queries with
an IN/NOT IN subquery, respectively. This makes the
INTERSECT/EXCEPT feature functionally useless except
on very small tables.

Hope that helps (and is equivalent),

Mike Mascari (mascarim@yahoo.com)

--- The Hermit Hacker <scrappy@hub.org> wrote:
> 
> Morning...
> 
>     This weekend, up at a clients site working with
> them on improving
> database performance.  They are currently running
> MySQL and I'm trying to
> convince them to switch over to PostgreSQL, for
> various features that they
> just don't have with MySQL...
> 
>     One of the 'queries' that they are currently doing
> with MySQL
> consists of two queries that I can reduce down to
> one using subqueries,
> but its so slow that its ridiculous...so I figured
> I'd throw it out as a
> problem to hopefully solve?
> 
>     The query I'm starting out with is works out as:
> 
>                 SELECT id, name, url \
>                   FROM aecCategory \
>                  WHERE ppid='$indid' \
>                    AND pid='$divid'";
> 
>     The results of this get fed into a while look that
> takes the id
> returned and pushes them into:
> 
>             SELECT distinct b.indid, b.divid, b.catid,
> a.id, a.mid \
>           FROM aecEntMain a, aecWebEntry b \
>          WHERE (a.id=b.id AND a.mid=b.mid) \
>            AND (a.status like 'active%' and b.status
> like 'active%')
>            AND (a.status like '%active:ALL%' and
> b.status like '%active:ALL%')
>            AND (a.representation like '%:ALL%')
>            AND (b.indid='$indid' and
> b.divid='$divid' and b.catid='$catid')";
> 
>     Now, I can/have rewritten this as:
> 
> SELECT id, name, url 
>   FROM aecCategory 
>  WHERE ppid='$indid' 
>    AND pid='$divid' 
>    AND id IN ( 
> SELECT distinct c.id 
>   FROM aecEntMain a, aecWebEntry b, aecCategory c 
>  WHERE (a.id=b.id AND a.mid=b.mid and b.catid=c.id) 
>    AND (a.status like 'active%' and b.status like
> 'active%') 
>    AND (a.status like '%active:ALL%' and b.status
> like '%active:ALL%') 
>    AND (a.representation like '%:ALL%') 
>    AND (b.indid='$indid' and b.divid='$divid' and
> b.catid IN ( 
>         SELECT id FROM aecCategory WHERE
> ppid='$indid' AND pid='$divid' ) 
>    ));";
> 
>     An explain of the above shows:
> 
> Index Scan using aeccategory_primary on aeccategory 
> (cost=8.28 rows=1 width=36)
>  SubPlan
>   ->  Unique  (cost=1283.70 rows=21 width=72)
>     ->  Sort  (cost=1283.70 rows=21 width=72)
>       ->  Nested Loop  (cost=1283.70 rows=21
> width=72)
>         ->  Nested Loop  (cost=1280.70 rows=1
> width=60)
>           ->  Index Scan using aecwebentry_primary
> on aecwebentry b  
>                     (cost=1278.63 rows=1 width=36)
>                 SubPlan
>                   ->  Index Scan using
> aeccategory_primary on aeccategory  
>                             (cost=8.28 rows=1
> width=12)
>           ->  Index Scan using aecentmain_primary on
> aecentmain a  
>                     (cost=2.07 rows=348 width=24)
>         ->  Index Scan using aeccategory_id on
> aeccategory c  
>                   (cost=3.00 rows=1170 width=12)
> 
>     Now, a few things bother me with the above explain
> output, based on me 
> hopefully reading this right...
> 
>     The innermost SubPlan reports an estimated rows
> returned of 1...the 
> actual query returns 59 rows...slightly off?
> 
>     The one that bothers me is the one that reports
> 1170 rows returned...if you
> look at the query, the only thing that would/should
> use aeccategory_id is the
> line that goes "SELECT distinct c.id"...if I run
> just that section of the 
> query, it yields a result of 55 rows...way off??
> 
>     All of my queries are currently on static data,
> after a vacuum analyze has 
> been performed...everything is faster if I split
> things up and do a SELECT
> on a per id basis on return values, but, as the list
> of 'ids' grow, the
> number of iterations of the while loop required will
> slow down the query...
> 
>     I'm not sure what else to look at towards
> optimizing the query further,
> or is this something that we still are/need to look
> at in the server itself?
> 
>     The machine we are working off of right now is an
> idle Dual-PIII 450Mhz with
> 512Meg of RAM, very fast SCSI hard drives on a UW
> controller...and that query
> is the only thing running while we test things...so
> we aren't under-powered :)
> 
>     ideas?  
> 
> Marc G. Fournier                   ICQ#7615664      
>         IRC Nick: Scrappy
> Systems Administrator @ hub.org 
> primary: scrappy@hub.org           secondary:
> scrappy@{freebsd|postgresql}.org 

__________________________________________________
Do You Yahoo!?
Bid and sell for free at http://auctions.yahoo.com


pgsql-hackers by date:

Previous
From: Michael Simms
Date:
Subject: Re: [HACKERS] Re: HISTORY for 6.5.2
Next
From: Keith Parks
Date:
Subject: Re: [HACKERS] case bug?