Thread: RE: [HACKERS] Bug on complex subselect (was: Bug on complex join)

RE: [HACKERS] Bug on complex subselect (was: Bug on complex join)

From
"Jackson, DeJuan"
Date:
> Hello!
>
>    I rewrote my 4-tables join to use subselects:
>
> SELECT DISTINCT subsec_id FROM positions
>    WHERE pos_id IN
>       (SELECT DISTINCT pos_id
>          FROM central
>             WHERE shop_id IN
>                (SELECT shop_id FROM shops
>                   WHERE distr_id IN
>                      (SELECT distr_id FROM districts
>                         WHERE city_id = 2)
>                )
>       )
> ;
>
>    This does not work, either - postgres loops forever, until I cancel
> psql.
>
>    I splitted it - I ran
>
>       (SELECT DISTINCT pos_id
>          FROM central
>             WHERE shop_id IN
>                (SELECT shop_id FROM shops
>                   WHERE distr_id IN
>                      (SELECT distr_id FROM districts
>                         WHERE city_id = 2)
>                )
>       )
>
>    and stored result in a file. Then I substituted the
> subselect with the
> file:
>
> SELECT DISTINCT subsec_id FROM positions
>    WHERE pos_id IN
> (1, 2, 3, 6, 22, 25, 26, 27, 28, 29, 31, 33, 34, 35, 38, 41,
> 42, 44, 45,
> 46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 62, 63, 64)
>
>    and got desired result within a second.
>
>    This finally solves my problem, but I need to pass a long
> way to find
> that postgres cannot handle such not too complex joins and subselects.

If you think about the query long enough you'll realize that several
things have to be assummed for that query to be efficient.
Looking at you final query first, and assuming that you have an index on
positions(pos_id):
 SELECT DISTINCT subsec_id FROM positions
    WHERE pos_id IN
 (1, 2, 3, 6, 22, 25, 26, 27, 28, 29, 31, 33, 34, 35, 38, 41,
  42, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
  60, 61, 62, 63, 64)

This turns into 36 OR clauses in the backend; which as you have
expressed is not a problem for PostgreSQL.  But as soon as you make that
IN clause a subselect Postgres can't assume that the results will be
static for each row that it's evaluating, therefore you have:
   X rows in positions compared to Y rows from the subselect = X*Y
compares
Let's assume that there are only 10 rows in positions, and that each
select of central return 36 rows.  We get up to 360 comparisons for that
query, and none of the compares is likely to use the index because they
are OR'ed together.
Now let's throw in the other subselect and assume that each table only
has 10 rows besides for central which obviously has to have more so
we'll assume 40.
         10 rows from district
    indexed on city_id (*YAY*)    (btree index (maybe 2 compares)
    -------------------------
       2 rows results
    OR= 10 rows from shops        +  2) * 10
    -------------------------
         5 rows results
    OR= 40 rows from central        +  5) * 40
    -------------------------
        36 rows results
    OR= 10 rows from position       + 36) * 10)
    -------------------------     ------------------
         5 rows results                18360 comparisons for this query
And you have to remember that only the innermost subselect is likely to
use an index.  (My math could be wrong but,) I think you get the idea.

Try your query this way:
 SELECT DISTINCT subsec_id
   FROM positions p
  WHERE EXISTS(SELECT 1
                 FROM central c, shops s, districts d
                WHERE p.pos_id = c.pos_id AND
                      c.shop_id = s.shop_id AND
                      s.distr_id = d.distr_id AND
                      d.city_id = 2);
Make sure you have indexes on pos_id, shop_id, distr_id, and city_id.

Re: [HACKERS] Bug on complex subselect (was: Bug on complex join)

From
Thomas Reinke
Date:
Forgive what may be a dumb question. When I subscribed to this group,
instructions for unsubscribing where as indicated in the excerpt below:
  >  >--   >  >Welcome to the pgsql-hackers mailing list!  >  >Please save this message for future reference.  Thank
you. >  >If you ever want to remove yourself from this mailing list,  >you can send mail to <Majordomo@hub.org> with
thefollowing  >command in the body of your email message:  >  >    unsubscribe pgsql-hackers  >  >or from another
account,besides reinke@e-softinc.com:  >  >    unsubscribe pgsql-hackers reinke@e-softinc.com
 

This doesn't work. The response was
   >>>> unsubscribe psql-hackers   **** unsubscribe: unknown list 'psql-hackers'.   **** Help for Majordomo@hub.org:


So I read the instructions on the PostGresQL page
for the hackers mailing list, and it says
    "To subscribe or unsubscribe from the list, send mail to     pgsql-hackers-request@postgresql.org. The body of the
message
should      contain the single line "subscribe" or "unsubscribe". 

When I do this, I get
  ----- The following addresses had permanent fatal errors -----  <psql-hackers-request@postgresql.org>
  ----- Transcript of session follows -----  ... while talking to postgresql.org.:  >>> RCPT
To:<psql-hackers-request@postgresql.org> <<< 550 <psql-hackers-request@postgresql.org>... User unknown  550
<psql-hackers-request@postgresql.org>...User unknown
 

Not to be ungrateful or anything, but I would like to get myself off
this
list. (I don't have time to filter through 50-100 messages/day).

Any suggestions on how I get myself off the list? (Perhaps it is
worthwhile
either updating the instructions on your website, or including an
_up_to_date_
copy in the signatures being sent out on the list?)

Thomas


RE: [HACKERS] Bug on complex subselect (was: Bug on complex join)

From
Oleg Broytmann
Date:
Hello!
  Vadim already gave the idea to use EXISTS. Will try it.  Thanks to all who replied!

On Wed, 10 Mar 1999, Jackson, DeJuan wrote:
> Try your query this way:
>  SELECT DISTINCT subsec_id
>    FROM positions p
>   WHERE EXISTS(SELECT 1
>                  FROM central c, shops s, districts d
>                 WHERE p.pos_id = c.pos_id AND 
>                       c.shop_id = s.shop_id AND
>                       s.distr_id = d.distr_id AND
>                       d.city_id = 2);

> Make sure you have indexes on pos_id, shop_id, distr_id, and city_id.
  All these are primary keys in corresponding tables, and hence have
UNIQUE indicies. Is it enough?

Oleg.
----    Oleg Broytmann     http://members.xoom.com/phd2/     phd2@earthling.net          Programmers don't die, they
justGOSUB without RETURN.