Thread: problems with a sub-select (takes donkeys' years)

problems with a sub-select (takes donkeys' years)

From
Stuart Rison
Date:
Hello,

I've got two SELECT statements which work fine (take about 5secs each) when
I do them sequentially but take ages (around 5mins) if in a statement with
a sub-SELECT.

The individual SELECTs are:

SELECT p1.brecard_id FROM
    malignant_pathologies p1,
    malignant_pathologies p2,
    malignant_pathologies p3
WHERE p1.code='MAPH'
AND p1.brecard_id=p2.brecard_id
AND p2.code='AMCA'
AND p3.brecard_id=p1.brecard_id
AND p3.code='LOCA';

brecard_id
----------------
DSTL12031999016
DPHA12031999017
DCCH12031999056
DUCH12031999059
DUCH12031999063
DCCH12031999077
DUCH12031999098
(7 rows)

SELECT brecard_id,count(brecard_id) FROM
    malignant_pathologies
WHERE brecard_id IN
('DSTL12031999016','DPHA12031999017','DCCH12031999056','DUCH12031999059','DUCH12
031999063','DCCH12031999077','DUCH12031999098')
GROUP BY brecard_id;

brecard_id      |count
----------------+-----
DCCH12031999056 |   10
DCCH12031999077 |   12
DPHA12031999017 |   11
DSTL12031999016 |   11
DUCH12031999059 |   13
DUCH12031999063 |    6
DUCH12031999098 |   14
(7 rows)

(This is just an example query, what I'm trying to do here is use the list
of brecard_id's generated by the previous query).

Both of these execute in about 5 seconds (on 1000 rows)

When I put them together as:

SELECT brecard_id,count(brecard_id) FROM
    malignant_pathologies
WHERE brecard_id IN (
    SELECT p1.brecard_id FROM
        malignant_pathologies p1,
        malignant_pathologies p2,
        malignant_pathologies p3
    WHERE p1.code='MAPH'
    AND p1.brecard_id=p2.brecard_id
    AND p2.code='AMCA'
    AND p3.brecard_id=p1.brecard_id
    AND p3.code='LOCA');

It takes around 5mins for the query to complete!  The EXPLAIN for the quey is:

NOTICE:  QUERY PLAN:

Aggregate  (cost=4.27 size=0 width=0)
  ->  Group  (cost=4.27 size=0 width=0)
        ->  Sort  (cost=4.27 size=0 width=0)
              ->  Seq Scan on malignant_pathologies  (cost=4.27 size=99
width=12)
                    SubPlan
                      ->  Nested Loop  (cost=8.27 size=1 width=36)
                            ->  Nested Loop  (cost=6.27 size=1 width=24)
                                  ->  Seq Scan on malignant_pathologies p3
(cost=4.27 size=1 width=12)
                                  ->  Index Scan using
malignant_pathologies_pkey on malignant_pathologies p1  (cost=2.00 size=1
width=12)
                            ->  Index Scan using malignant_pathologies_pkey
on malignant_pathologies p2  (cost=2.00 size=1 width=12)

EXPLAIN

[Which is not different from putting together the EXPLAINs from each of the
individual queries]

Can anyone explain why the sub-query form takes so long?

Regards,

Stuart.




+-------------------------+--------------------------------------+
| Stuart Rison            | Ludwig Institute for Cancer Research |
+-------------------------+ 91 Riding House Street               |
| Tel. (0171) 878 4041    | London, W1P 8BT, UNITED KINGDOM.     |
| Fax. (0171) 878 4040    | stuart@ludwig.ucl.ac.uk              |
+-------------------------+--------------------------------------+



Re: [GENERAL] problems with a sub-select (takes donkeys' years)

From
Herouth Maoz
Date:
At 18:32 +0200 on 29/03/1999, Stuart Rison wrote:


> Can anyone explain why the sub-query form takes so long?

Basically, when you do the separation yourself, it means you executed the
sub query once, and then use it as a constant for another query. When you
combine them together, the internal select is ran over and over again,
because there is no way for the optimiser to know that the result will be
the same in all runs...

How about trying a different approach, such as:

SELECT brecard_id,count(brecard_id)
FROM malignant_pathologies p1
WHERE 3 = (
    SELECT count(*)
    FROM malignant_pathologies p2
    WHERE p2.brecard_id = p1.brecard_id
      AND code in ( 'MAPH', 'AMCA', 'LOCA' )
);

This may need tweaking, you are the one who knows distinctness. One of the
troubles of Postgres's SQL92 compatibility is its lack of support for
SELECT COUNT( DISTINCT code ) which is necessary if the combination of
brecard_id and code is not unique. Also, I'm not sure it allows constructs
such as WHERE (brecard_id,3) = ( SELECT brecard_id, count(*) FROM ....).
Gurus?

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma