WHERE IN for JOIN subquery? - Mailing list pgsql-performance

From Dave Johansen
Subject WHERE IN for JOIN subquery?
Date
Msg-id CAAcYxUfsY+t49bDVT4um2-E4qUb6iyEudMeLR0n8t9zTwEvX0g@mail.gmail.com
Whole thread Raw
Responses Re: WHERE IN for JOIN subquery?
List pgsql-performance
I want to do a JOIN against a subquery that is doing an aggregation. The query itself is relatively straightforward, but has poor performance.

Here it is:
SELECT a.*, b.*
    FROM base AS a
    LEFT OUTER JOIN
        (SELECT other, COUNT(value), COUNT(DISTINCT value) FROM other GROUP BY other) AS b
    USING (other)
    WHERE id IN (4, 56, 102);

It's significantly faster, but more complicated (and repetitive), if I add the following:
WHERE other = ANY(ARRAY(SELECT DISTINCT other FROM base WHERE id IN (4, 56, 102)))

I tried adding the following:
other IN (a.other)
Or:
other = a.other
But I get this error:
ERROR:  invalid reference to FROM-clause entry for table "a"

LINE 1: ...ue), COUNT(DISTINCT value) FROM other WHERE other=a.other GR...

                                                             ^

HINT:  There is an entry for table "a", but it cannot be referenced from this part of the query.

Is there a way to do something like that simpler query so the subquery can get better performance by filtering only to what it needs instead of doing the GROUP BY on the whole table?

Thanks,
Dave

In case it's helpful, here's the table definitions:
CREATE TABLE base (id INTEGER PRIMARY KEY, value TEXT, other INTEGER);
CREATE TABLE other (other INTEGER, value INTEGER);

And the explain results:
EXPLAIN ANALYZE SELECT a.*, b.* FROM base AS a LEFT OUTER JOIN (SELECT other, COUNT(value), COUNT(DISTINCT value) FROM other WHERE other = ANY(ARRAY(SELECT DISTINCT other FROM base WHERE id IN (4, 56, 102))) GROUP BY other) AS b USING (other) WHERE id IN (4, 56, 102);

                                                                QUERY PLAN                                                                

-------------------------------------------------------------------------------------------------------------------------------------------

 Merge Right Join  (cost=27619.21..27741.23 rows=3 width=33) (actual time=105.045..115.539 rows=3 loops=1)

   Merge Cond: (other.other = a.other)

   ->  GroupAggregate  (cost=27602.28..27711.74 rows=1001 width=20) (actual time=104.989..115.452 rows=3 loops=1)

         Group Key: other.other

         InitPlan 1 (returns $0)

           ->  Unique  (cost=16.93..16.95 rows=3 width=4) (actual time=0.083..0.127 rows=3 loops=1)

                 ->  Sort  (cost=16.93..16.94 rows=3 width=4) (actual time=0.073..0.085 rows=3 loops=1)

                       Sort Key: base.other

                       Sort Method: quicksort  Memory: 25kB

                       ->  Index Scan using base_pkey on base  (cost=0.29..16.91 rows=3 width=4) (actual time=0.019..0.042 rows=3 loops=1)

                             Index Cond: (id = ANY ('{4,56,102}'::integer[]))

         ->  Sort  (cost=27585.34..27610.20 rows=9945 width=8) (actual time=99.401..107.199 rows=3035 loops=1)

               Sort Key: other.other

               Sort Method: quicksort  Memory: 239kB

               ->  Seq Scan on other  (cost=0.00..26925.00 rows=9945 width=8) (actual time=0.708..90.738 rows=3035 loops=1)

                     Filter: (other = ANY ($0))

                     Rows Removed by Filter: 996965

   ->  Sort  (cost=16.93..16.94 rows=3 width=13) (actual time=0.044..0.051 rows=3 loops=1)

         Sort Key: a.other

         Sort Method: quicksort  Memory: 25kB

         ->  Index Scan using base_pkey on base a  (cost=0.29..16.91 rows=3 width=13) (actual time=0.016..0.027 rows=3 loops=1)

               Index Cond: (id = ANY ('{4,56,102}'::integer[]))

 Planning time: 4.163 ms

 Execution time: 115.665 ms


EXPLAIN ANALYZE SELECT a.*, b.* FROM base AS a LEFT OUTER JOIN (SELECT other, COUNT(value), COUNT(DISTINCT value) FROM other GROUP BY other) AS b USING (other) WHERE id IN (4, 56, 102);

                                                             QUERY PLAN                                                            

------------------------------------------------------------------------------------------------------------------------------------

 Merge Right Join  (cost=127786.02..137791.07 rows=3 width=60) (actual time=7459.042..12060.805 rows=3 loops=1)

   Merge Cond: (other.other = a.other)

   ->  GroupAggregate  (cost=127763.19..137765.69 rows=200 width=20) (actual time=7143.486..12057.835 rows=830 loops=1)

         Group Key: other.other

         ->  Sort  (cost=127763.19..130263.31 rows=1000050 width=8) (actual time=7137.594..9624.119 rows=829088 loops=1)

               Sort Key: other.other

               Sort Method: external merge  Disk: 17576kB

               ->  Seq Scan on other  (cost=0.00..14425.50 rows=1000050 width=8) (actual time=0.555..2727.461 rows=1000000 loops=1)

   ->  Sort  (cost=22.83..22.84 rows=3 width=40) (actual time=0.103..0.112 rows=3 loops=1)

         Sort Key: a.other

         Sort Method: quicksort  Memory: 25kB

         ->  Bitmap Heap Scan on base a  (cost=12.87..22.81 rows=3 width=40) (actual time=0.048..0.064 rows=3 loops=1)

               Recheck Cond: (id = ANY ('{4,56,102}'::integer[]))

               Heap Blocks: exact=1

               ->  Bitmap Index Scan on base_pkey  (cost=0.00..12.87 rows=3 width=0) (actual time=0.029..0.029 rows=3 loops=1)

                     Index Cond: (id = ANY ('{4,56,102}'::integer[]))

 Planning time: 2.179 ms

 Execution time: 12080.172 ms

pgsql-performance by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Bitmap scan is undercosted? - overestimated correlation andcost_index
Next
From: "David G. Johnston"
Date:
Subject: Re: WHERE IN for JOIN subquery?