RE: [HACKERS] Bug on complex subselect (was: Bug on complex join) - Mailing list pgsql-hackers

From Jackson, DeJuan
Subject RE: [HACKERS] Bug on complex subselect (was: Bug on complex join)
Date
Msg-id F10BB1FAF801D111829B0060971D839F703F67@cpsmail
Whole thread Raw
Responses RE: [HACKERS] Bug on complex subselect (was: Bug on complex join)  (Oleg Broytmann <phd@sun.med.ru>)
List pgsql-hackers
> Hello!
>
>    I rewrote my 4-tables join to use subselects:
>
> SELECT DISTINCT subsec_id FROM positions
>    WHERE pos_id IN
>       (SELECT DISTINCT pos_id
>          FROM central
>             WHERE shop_id IN
>                (SELECT shop_id FROM shops
>                   WHERE distr_id IN
>                      (SELECT distr_id FROM districts
>                         WHERE city_id = 2)
>                )
>       )
> ;
>
>    This does not work, either - postgres loops forever, until I cancel
> psql.
>
>    I splitted it - I ran
>
>       (SELECT DISTINCT pos_id
>          FROM central
>             WHERE shop_id IN
>                (SELECT shop_id FROM shops
>                   WHERE distr_id IN
>                      (SELECT distr_id FROM districts
>                         WHERE city_id = 2)
>                )
>       )
>
>    and stored result in a file. Then I substituted the
> subselect with the
> file:
>
> SELECT DISTINCT subsec_id FROM positions
>    WHERE pos_id IN
> (1, 2, 3, 6, 22, 25, 26, 27, 28, 29, 31, 33, 34, 35, 38, 41,
> 42, 44, 45,
> 46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 62, 63, 64)
>
>    and got desired result within a second.
>
>    This finally solves my problem, but I need to pass a long
> way to find
> that postgres cannot handle such not too complex joins and subselects.

If you think about the query long enough you'll realize that several
things have to be assummed for that query to be efficient.
Looking at you final query first, and assuming that you have an index on
positions(pos_id):
 SELECT DISTINCT subsec_id FROM positions
    WHERE pos_id IN
 (1, 2, 3, 6, 22, 25, 26, 27, 28, 29, 31, 33, 34, 35, 38, 41,
  42, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
  60, 61, 62, 63, 64)

This turns into 36 OR clauses in the backend; which as you have
expressed is not a problem for PostgreSQL.  But as soon as you make that
IN clause a subselect Postgres can't assume that the results will be
static for each row that it's evaluating, therefore you have:
   X rows in positions compared to Y rows from the subselect = X*Y
compares
Let's assume that there are only 10 rows in positions, and that each
select of central return 36 rows.  We get up to 360 comparisons for that
query, and none of the compares is likely to use the index because they
are OR'ed together.
Now let's throw in the other subselect and assume that each table only
has 10 rows besides for central which obviously has to have more so
we'll assume 40.
         10 rows from district
    indexed on city_id (*YAY*)    (btree index (maybe 2 compares)
    -------------------------
       2 rows results
    OR= 10 rows from shops        +  2) * 10
    -------------------------
         5 rows results
    OR= 40 rows from central        +  5) * 40
    -------------------------
        36 rows results
    OR= 10 rows from position       + 36) * 10)
    -------------------------     ------------------
         5 rows results                18360 comparisons for this query
And you have to remember that only the innermost subselect is likely to
use an index.  (My math could be wrong but,) I think you get the idea.

Try your query this way:
 SELECT DISTINCT subsec_id
   FROM positions p
  WHERE EXISTS(SELECT 1
                 FROM central c, shops s, districts d
                WHERE p.pos_id = c.pos_id AND
                      c.shop_id = s.shop_id AND
                      s.distr_id = d.distr_id AND
                      d.city_id = 2);
Make sure you have indexes on pos_id, shop_id, distr_id, and city_id.

pgsql-hackers by date:

Previous
From: jwieck@debis.com (Jan Wieck)
Date:
Subject: Re: [HACKERS] Developers globe
Next
From: jwieck@debis.com (Jan Wieck)
Date:
Subject: Re: [HACKERS] Re: map