RE: [HACKERS] Bug on complex subselect (was: Bug on complex join) - Mailing list pgsql-hackers
From | Jackson, DeJuan |
---|---|
Subject | RE: [HACKERS] Bug on complex subselect (was: Bug on complex join) |
Date | |
Msg-id | F10BB1FAF801D111829B0060971D839F703F67@cpsmail Whole thread Raw |
Responses |
RE: [HACKERS] Bug on complex subselect (was: Bug on complex join)
|
List | pgsql-hackers |
> Hello! > > I rewrote my 4-tables join to use subselects: > > SELECT DISTINCT subsec_id FROM positions > WHERE pos_id IN > (SELECT DISTINCT pos_id > FROM central > WHERE shop_id IN > (SELECT shop_id FROM shops > WHERE distr_id IN > (SELECT distr_id FROM districts > WHERE city_id = 2) > ) > ) > ; > > This does not work, either - postgres loops forever, until I cancel > psql. > > I splitted it - I ran > > (SELECT DISTINCT pos_id > FROM central > WHERE shop_id IN > (SELECT shop_id FROM shops > WHERE distr_id IN > (SELECT distr_id FROM districts > WHERE city_id = 2) > ) > ) > > and stored result in a file. Then I substituted the > subselect with the > file: > > SELECT DISTINCT subsec_id FROM positions > WHERE pos_id IN > (1, 2, 3, 6, 22, 25, 26, 27, 28, 29, 31, 33, 34, 35, 38, 41, > 42, 44, 45, > 46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 62, 63, 64) > > and got desired result within a second. > > This finally solves my problem, but I need to pass a long > way to find > that postgres cannot handle such not too complex joins and subselects. If you think about the query long enough you'll realize that several things have to be assummed for that query to be efficient. Looking at you final query first, and assuming that you have an index on positions(pos_id): SELECT DISTINCT subsec_id FROM positions WHERE pos_id IN (1, 2, 3, 6, 22, 25, 26, 27, 28, 29, 31, 33, 34, 35, 38, 41, 42, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 62, 63, 64) This turns into 36 OR clauses in the backend; which as you have expressed is not a problem for PostgreSQL. But as soon as you make that IN clause a subselect Postgres can't assume that the results will be static for each row that it's evaluating, therefore you have: X rows in positions compared to Y rows from the subselect = X*Y compares Let's assume that there are only 10 rows in positions, and that each select of central return 36 rows. We get up to 360 comparisons for that query, and none of the compares is likely to use the index because they are OR'ed together. Now let's throw in the other subselect and assume that each table only has 10 rows besides for central which obviously has to have more so we'll assume 40. 10 rows from district indexed on city_id (*YAY*) (btree index (maybe 2 compares) ------------------------- 2 rows results OR= 10 rows from shops + 2) * 10 ------------------------- 5 rows results OR= 40 rows from central + 5) * 40 ------------------------- 36 rows results OR= 10 rows from position + 36) * 10) ------------------------- ------------------ 5 rows results 18360 comparisons for this query And you have to remember that only the innermost subselect is likely to use an index. (My math could be wrong but,) I think you get the idea. Try your query this way: SELECT DISTINCT subsec_id FROM positions p WHERE EXISTS(SELECT 1 FROM central c, shops s, districts d WHERE p.pos_id = c.pos_id AND c.shop_id = s.shop_id AND s.distr_id = d.distr_id AND d.city_id = 2); Make sure you have indexes on pos_id, shop_id, distr_id, and city_id.
pgsql-hackers by date: