Thread: WHERE IN (subselect) versus WHERE IN (1,2,3,)

WHERE IN (subselect) versus WHERE IN (1,2,3,)

From: Kevin Goess
Date:
My apologies, I'm sure this question has been asked before but I couldn't find anything on the list that meant anything to me.

We have a table "contexts" with 1.6 million rows, and a table "articles" with 1.4 million rows, where an "article" is a particular kind of "context".  We want to select from a join on those two tables like this

    SELECT COUNT(*)
    FROM contexts
    JOIN articles ON (articles.context_key=contexts.context_key)
    WHERE contexts.context_key IN (...);
    /* and some combination of columns from articles and contexts */

If "IN (...)" is a subquery, then this guy does a seq scan on the contexts table, even if the subquery is "select col_a from kgtest" where kgtest has one row.  If however I read the ids beforehand and write them into the query, a la "IN (111,222,333...)", then everything is happy, up to at least 20,000 values written into the SQL, at which point smaller machines will take 2-5 minutes to parse the query.

I can certainly write the ids inline into the SQL, but when I do that I get the distinct impression that I'm Doing It Wrong.  Is this expected behavior?  It seems surprising to me.


To demonstrate:

    /* nothing up my sleeve */
    # select * from kgtest;
      cola  
    ---------
     1652729
    (1 row)

    /* inline, good query plan */
# explain (analyze, buffers) select count(*) from contexts JOIN articles ON (articles.context_key=contexts.context_key) where contexts.context_key in (1652729);
                                                             QUERY PLAN                                              
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3.82..3.83 rows=1 width=0) (actual time=0.188..0.189 rows=1 loops=1)
   Buffers: shared hit=7
   ->  Nested Loop  (cost=0.00..3.81 rows=1 width=0) (actual time=0.181..0.181 rows=0 loops=1)
         Buffers: shared hit=7
         ->  Index Scan using contexts_pkey on contexts  (cost=0.00..1.90 rows=1 width=4) (actual time=0.109..0.112 ro
               Index Cond: (context_key = 1652729)
               Buffers: shared hit=4
         ->  Index Scan using articles_pkey on articles  (cost=0.00..1.90 rows=1 width=4) (actual time=0.060..0.060 ro
               Index Cond: (articles.context_key = 1652729)
               Buffers: shared hit=3
 Total runtime: 0.324 ms
(11 rows)

    /* subselect, query plan does seq scan on contexts */
# explain (analyze, buffers) select count(*) from contexts JOIN articles ON (articles.context_key=contexts.context_key) where contexts.context_key in (select cola from kgtest);
                                                                   QUERY PLAN                                        
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=118505.72..118505.73 rows=1 width=0) (actual time=0.274..0.275 rows=1 loops=1)
   Buffers: shared hit=5
   ->  Hash Join  (cost=12512.61..116661.91 rows=737524 width=0) (actual time=0.269..0.269 rows=0 loops=1)
         Hash Cond: (contexts.context_key = articles.context_key)
         Buffers: shared hit=5
         ->  Seq Scan on contexts  (cost=0.00..64533.03 rows=1648203 width=4) (actual time=0.009..0.009 rows=1 loops=1
               Buffers: shared hit=1
         ->  Hash  (cost=412.56..412.56 rows=737524 width=8) (actual time=0.110..0.110 rows=0 loops=1)
               Buckets: 4096  Batches: 32  Memory Usage: 0kB
               Buffers: shared hit=4
               ->  Nested Loop  (cost=40.00..412.56 rows=737524 width=8) (actual time=0.107..0.107 rows=0 loops=1)
                     Buffers: shared hit=4
                     ->  HashAggregate  (cost=40.00..42.00 rows=200 width=4) (actual time=0.069..0.071 rows=1 loops=1)
                           Buffers: shared hit=1
                           ->  Seq Scan on kgtest  (cost=0.00..34.00 rows=2400 width=4) (actual time=0.048..0.050 rows
                                 Buffers: shared hit=1
                     ->  Index Scan using articles_pkey on articles  (cost=0.00..1.84 rows=1 width=4) (actual time=0.0
                           Index Cond: (articles.context_key = kgtest.cola)
                           Buffers: shared hit=3
 Total runtime: 0.442 ms

--
Kevin M. Goess
Software Engineer
Berkeley Electronic Press
kgoess@bepress.com

510-665-1200 x179
www.bepress.com

bepress: sustainable scholarly publishing  

Re: WHERE IN (subselect) versus WHERE IN (1,2,3,)

From: "Albe Laurenz"
Date:
Kevin Goess wrote:
> We have a table "contexts" with 1.6 million rows, and a table
> "articles" with 1.4 million rows, where an "article" is a particular
> kind of "context".  We want to select from a join on those two tables
> like this
>
>     SELECT COUNT(*)
>     FROM contexts
>     JOIN articles ON (articles.context_key=contexts.context_key)
>     WHERE contexts.context_key IN (...);
>     /* and some combination of columns from articles and contexts */
>
> If "IN (...)" is a subquery, then this guy does a seq scan on the
> contexts table, even if the subquery is "select col_a from kgtest"
> where kgtest has one row.  If however I read the ids beforehand and
> write them into the query, a la "IN (111,222,333...)", then everything
> is happy, up to at least 20,000 values written into the SQL, at which
> point smaller machines will take 2-5 minutes to parse the query.
>
> I can certainly write the ids inline into the SQL, but when I do that
> I get the distinct impression that I'm Doing It Wrong.  Is this
> expected behavior?  It seems surprising to me.
>
>
> To demonstrate:
>
>     /* nothing up my sleeve */
>     # select * from kgtest;
>       cola
>     ---------
>      1652729
>     (1 row)

[...]

>   /* subselect, query plan does seq scan on contexts */
[...]
>     ->  Seq Scan on kgtest  (cost=0.00..34.00 rows=2400 width=4) (actual time=0.048..0.050 rows
[...]

There is something missing in this line, but according to what you wrote
it must be "actual [...] rows=1".  And yet the planner assumes that the
scan will return 2400 rows.
That means that your statistics are not accurate.

As a first measure, you should ANALYZE the tables involved and see if
the problem persists.  If yes, post the new plans.
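In psql, that first step is just (table names taken from this thread):

```sql
-- Refresh the planner's statistics for the tables in the query.
ANALYZE kgtest;
ANALYZE contexts;
ANALYZE articles;
```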

Yours,
Laurenz Albe

Re: WHERE IN (subselect) versus WHERE IN (1,2,3,)

From: Kevin Goess
Date:
On Mon, Mar 19, 2012 at 9:24 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> That means that your statistics are not accurate.
>
> As a first measure, you should ANALYZE the tables involved and see if
> the problem persists.  If yes, post the new plans.

Aha, thanks, that explains why my test table with one row was so bad.  But even with all freshly ANALYZE'd tables, I still see the query reverting to a sequential scan on that big contexts table once the number of rows in the subselect goes over 199.  Here's a simplified version that demonstrates the problem. 

production=> explain (analyze, buffers) SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) WHERE contexts.context_key IN (SELECT context_key FROM virtual_ancestors limit 200) AND articles.indexed;
                                                                     QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=7086.13..219322.15 rows=411736 width=4) (actual time=50.118..1213.046 rows=35 loops=1)
   Hash Cond: (contexts.context_key = articles.context_key)
   Buffers: shared hit=72539 read=100104
   ->  Seq Scan on contexts  (cost=0.00..190285.83 rows=1783283 width=4) (actual time=0.040..769.891 rows=1786074 loops=1)
         Buffers: shared hit=72399 read=100054
   ->  Hash  (cost=1939.43..1939.43 rows=411736 width=8) (actual time=3.510..3.510 rows=35 loops=1)
         Buckets: 65536  Batches: 1  Memory Usage: 2kB
         Buffers: shared hit=140 read=50
         ->  Nested Loop  (cost=6.18..1939.43 rows=411736 width=8) (actual time=0.203..3.487 rows=35 loops=1)
               Buffers: shared hit=140 read=50
               ->  HashAggregate  (cost=6.18..8.18 rows=200 width=4) (actual time=0.174..0.198 rows=48 loops=1)
                     Buffers: shared read=2
                     ->  Limit  (cost=0.00..3.68 rows=200 width=4) (actual time=0.015..0.108 rows=200 loops=1)
                           Buffers: shared read=2
                           ->  Seq Scan on virtual_ancestors  (cost=0.00..87676.17 rows=4759617 width=4) (actual time=0.015..0.075 rows=200 loops=1)
                                 Buffers: shared read=2
               ->  Index Scan using articles_pkey on articles  (cost=0.00..9.64 rows=1 width=4) (actual time=0.015..0.068 rows=1 loops=48)
                     Index Cond: (articles.context_key = virtual_ancestors.context_key)
                     Filter: articles.indexed
                     Buffers: shared hit=140 read=48
 Total runtime: 1213.138 ms
(21 rows)


But if I write the keys in the subquery inline, I get a very nice execution plan, all the way up to a tested maximum of about 50,000 keys:


production=> explain (analyze, buffers) SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) WHERE contexts.context_key IN (2482612,2482612,...) AND articles.indexed;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=758.71..3418.40 rows=200 width=4) (actual time=0.621..1.089 rows=35 loops=1)
   Buffers: shared hit=826 read=1
   ->  Bitmap Heap Scan on contexts  (cost=752.58..1487.55 rows=200 width=4) (actual time=0.604..0.699 rows=48 loops=1)
         Recheck Cond: (context_key = ANY ('{2482612,2482612,...}'::integer[]))
         Buffers: shared hit=639
         ->  Bitmap Index Scan on contexts_pkey  (cost=0.00..752.53 rows=200 width=0) (actual time=0.591..0.591 rows=200 loops=1)
               Index Cond: (context_key = ANY ('{2482612,2482612,...}'::integer[]))
               Buffers: shared hit=600
   ->  Bitmap Heap Scan on articles  (cost=6.13..9.64 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=48)
         Recheck Cond: (articles.context_key = contexts.context_key)
         Filter: articles.indexed
         Buffers: shared hit=187 read=1
         ->  Bitmap Index Scan on articles_pkey  (cost=0.00..6.13 rows=1 width=0) (actual time=0.005..0.005 rows=1 loops=48)
               Index Cond: (articles.context_key = contexts.context_key)
               Buffers: shared hit=148
 Total runtime: 1.147 ms

Is this expected behavior, that writing the ids inline does much better than the subquery?  I've been told that it's not, but this isn't the first time I've seen this, so I feel like I'm not understanding something.
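For what it's worth, one way to get that plan without interpolating thousands of literals (a sketch, not something tested in this thread) is to bind the keys as a single array parameter, since the plan above shows the inline list is executed as "context_key = ANY ('{...}'::integer[])" anyway:

```sql
-- Hypothetical sketch: pass the ids as one array value, e.g. via a
-- prepared statement or a driver-bound parameter, instead of expanding
-- them into the SQL text.  Whether the planner chooses the same plan
-- for a parameter as for a literal array depends on the server version.
PREPARE by_keys(integer[]) AS
  SELECT contexts.context_key
  FROM contexts
  JOIN articles ON (articles.context_key = contexts.context_key)
  WHERE contexts.context_key = ANY ($1)
    AND articles.indexed;

EXECUTE by_keys('{2482612,2482612}');
```

This also sidesteps the multi-minute parse times reported above for 20,000 inlined values, since the statement text stays constant.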



Re: WHERE IN (subselect) versus WHERE IN (1,2,3,)

From: Tom Lane
Date:
Kevin Goess <kgoess@bepress.com> writes:
> On Mon, Mar 19, 2012 at 9:24 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> That means that your statistics are not accurate.

> Aha, thanks, that explains why my test table with one row was so bad.  But
> even with all freshly ANALYZE'd tables, I still see the query reverting to
> a sequential scan on that big contexts table once the number of rows in the
> subselect goes over 199.  Here's a simplified version that demonstrates the
> problem.

You've still got a nasty join-size estimation error:

>          ->  Nested Loop  (cost=6.18..1939.43 rows=411736 width=8) (actual
> time=0.203..3.487 rows=35 loops=1)

It's not apparent why that's so far off ...

            regards, tom lane

Re: WHERE IN (subselect) versus WHERE IN (1,2,3,)

From: Tom Lane
Date:
I wrote:
> You've still got a nasty join-size estimation error:

>> ->  Nested Loop  (cost=6.18..1939.43 rows=411736 width=8) (actual
>> time=0.203..3.487 rows=35 loops=1)

> It's not apparent why that's so far off ...

What PG version is this, anyway?  It strikes me that this estimation
error might have something to do with the eqjoinsel bugs that we
repaired in 9.0.5.  I'm not having any luck reproducing such a bogus estimate
in 9.0.5.  I'm not having any luck reproducing such a bogus estimate
with current code, either, though that may just mean you've omitted
some critical info about how the tables are set up.

            regards, tom lane

Re: WHERE IN (subselect) versus WHERE IN (1,2,3,)

From: Kevin Goess
Date:
Thanks for looking into it, Tom.  We're using 9.0.4, so that might indeed be the problem. What additional data (if any) would you like to see?  If you want to look into it further, I can give you the schema, though I hesitate to spam the whole list.  I could also mock up some tables, find the smallest data set that shows the problem, and send you those in a dump.

The fact that the behavior changes so radically when the limit on the joined table goes from 199 to 200 rows does make me suspect something's not behaving the way it should.

On Tue, Mar 20, 2012 at 4:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> You've still got a nasty join-size estimation error:
>
>>> ->  Nested Loop  (cost=6.18..1939.43 rows=411736 width=8) (actual
>>> time=0.203..3.487 rows=35 loops=1)
>
>> It's not apparent why that's so far off ...
>
> What PG version is this, anyway?  It strikes me that this estimation
> error might have something to do with the eqjoinsel bugs that we
> repaired in 9.0.5.  I'm not having any luck reproducing such a bogus
> estimate with current code, either, though that may just mean you've
> omitted some critical info about how the tables are set up.
>
>                        regards, tom lane



--
Kevin M. Goess
Software Engineer
Berkeley Electronic Press
kgoess@bepress.com

510-665-1200 x179
www.bepress.com

bepress: sustainable scholarly publishing  

Re: WHERE IN (subselect) versus WHERE IN (1,2,3,)

From: Tom Lane
Date:
Kevin Goess <kgoess@bepress.com> writes:
> Thanks for looking into it, Tom.  We're using 9.0.4, so that might indeed
> be the problem. What additional data (if any) would you like to see?

Well, the first thing to do is update to 9.0.latest and see if the plan
changes.  There are plenty of good reasons to do that besides this
issue; see the release notes.
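To confirm what the server is actually running, before and after the update:

```sql
-- Either of these reports the running server version.
SELECT version();
SHOW server_version;
```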

            regards, tom lane