unexpected results with NOT IN query - Mailing list pgsql-general

From Mason Hale
Subject unexpected results with NOT IN query
Date
Msg-id 8bca3aa10803200745q147e06f6j39ef9c5ab4b21b52@mail.gmail.com
Whole thread Raw
Responses Re: unexpected results with NOT IN query  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: unexpected results with NOT IN query  (Stephan Szabo <sszabo@megazone.bigpanda.com>)
List pgsql-general
Hello --

I'm getting some unexpected results with a NOT IN query -- this is on 8.2.5.

This is the query in question:

prod_2=> select id from feed_download_task where id in (111102466,141701504) and id not in (select last_feed_download_task_id from subscription);
 id
----
(0 rows)


This query returns zero rows, but I expect it to return 1 row, because I know that 111102466 *is not* in (select last_feed_download_task_id from subscription) and I know that 1471701504 *is* in that set, as demonstrated below:

Verify that both id values are in the feed_download_task table:

prod_2=> select id from feed_download_task where id in (111102466,141701504);
    id    
-----------
 141701504
 111102466
(2 rows)


Verify that 111102466 is NOT in the set of last_feed_download_task_id's, and that 141701504 is in this set:

prod_2=> select last_feed_download_task_id from subscription where last_feed_download_task_id in (111102466,141701504);
 last_feed_download_task_id
----------------------------
                  141701504
(1 row)


Here's the problem query again, with explain analyze.

prod_2=> select id from feed_download_task where id in (111102466,141701504) and id not in (select last_feed_download_task_id from subscription);
 id
----
(0 rows)

prod_2=> explain analyze select id from feed_download_task where id in (111102466,141701504) and id not in (select last_feed_download_task_id from subscription);
                              
                             QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on feed_download_task  (cost=45077.24..45083.27 rows=1 width=4) (actual time=601.229..601.229 rows=0 loops=1)
   Recheck Cond: (id = ANY ('{111102466,141701504}'::integer[]))
   Filter: (NOT (hashed subplan))
   ->  Bitmap Index Scan on feed_download_task_pkey  (cost=0.00..30.52 rows=2 width=0) (actual time=0.095..0.095 rows=2 loops=1)
         Index Cond: (id = ANY ('{111102466,141701504}'::integer[]))
   SubPlan
     ->  Seq Scan on subscription  (cost=0.00..44097.78 rows=379578 width=4) (actual time=0.032..488.193 rows=162365 loops=1)
 Total runtime: 601.281 ms
(8 rows)


I've tried re-analyzing and re-indexing the tables involved in this query, but I still left scratching my head.

I am also aware that I can use a left join instead of a NOT IN query -- but in this case I need to use this in a DELETE statement, which eliminates the possibility of the left join (I think).

Here is a version using a left outer join, it returns the expected result:

prod_2=# select feed_download_task.id from feed_download_task left join subscription on (subscription.last_feed_download_task_id = feed_download_task.id) where feed_download_task.id in (111102466,141701504) and subscription.id IS NOT NULL;
    id    
-----------
 141701504
(1 row)

Here is the explain analyze output for the above query:

prod_2=# explain analyze select feed_download_task.id from feed_download_task left join subscription on (subscription.last_feed_download_task_id = feed_download_task.id) where feed_download_task.id in (111102466,141701504) and subscription.id IS NOT NULL;
                                                                             QUERY PLAN                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=31.19..51.69 rows=1 width=4) (actual time=0.158..0.210 rows=1 loops=1)
   ->  Bitmap Heap Scan on feed_download_task  (cost=31.19..37.21 rows=2 width=4) (actual time=0.120..0.134 rows=2 loops=1)
         Recheck Cond: (id = ANY ('{111102466,141701504}'::integer[]))
         ->  Bitmap Index Scan on feed_download_task_pkey  (cost=0.00..31.19 rows=2 width=0) (actual time=0.102..0.102 rows=2 loops=1)
               Index Cond: (id = ANY ('{111102466,141701504}'::integer[]))
   ->  Index Scan using index_subscription_on_last_feed_download_task_id on subscription  (cost=0.00..7.23 rows=1 width=4) (actual time=0.036..0.037 rows=0 loops=2)
         Index Cond: (subscription.last_feed_download_task_id = feed_download_task.id)
         Filter: (id IS NOT NULL)

I feel like I must be missing something obvious.

Thanks in advance for the assistance.

cheers,
Mason

pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Row size overhead
Next
From: Tom Lane
Date:
Subject: Re: unexpected results with NOT IN query