Thread: select distinct in a subquery bug/problem

select distinct in a subquery bug/problem

From

"Dan Halbert"

Date:

11 August 2012, 03:32:41

In version 9.1.4-0ubuntu12.04:

Hi - I am getting wrong answers from a certain kind of query, and have narrowed it down to a change in the query plan between two similar queries. The two queries below use different query plans, and generate different results, one of which is completely wrong.

1. select count(t1_id) from t1 where t1_id not in (select distinct t1_id from t2 limit 1103) ==> 13357 [CORRECT result]

2. select count(t1_id) from t1 where t1_id not in (select distinct t1_id from t2 limit 1104) ==> 0 [WRONG result; should be close to 13357]

(See the query plans at end of this message.)

Obviously, the exact numbers are dependent on the table sizes and the data. In this case, t1 has about 14k rows, and t2 has about 210k rows. t1_id in t2 is a foreign key to t1.

Originally I was not using the LIMIT above, and was getting 0 from the query, so I tried some limits to figure out if it made a difference, and then noticed it was due to the query plan difference.

I've tried to reproduce this bug using test tables with at least as many rows. I can generate both query plans, but the query results are correct for both.

So there seems to be something odd or corrupted about my particular table t2, or something about it is exercising a bug. T2 is actually quite a large table with 20 columns, 9 foreign keys, and some more indexed columns. T2 has had various columns renamed, deleted and added over its lifetime.

I've tried VACUUM FULL, REINDEX, and I've tried dropping and recreating the foreign key on t2. Nothing has fixed the problem. I have not yet dumped and reloaded the whole database.

Any suggestions or insight on this? I'm pretty disturbed at getting wrong answers. There was some discussion in the past on this list about the efficiency of "NOT IN (SELECT DISTINCT ...)", but I haven't yet found any bug reports about incorrect results. I'm sorry I haven't been able to create reproducible test case yet.

Thanks,

Dan

------------------------------------

1. Query plan:

Aggregate (cost=6621.84..6621.85 rows=1 width=4)
-> Seq Scan on t1 (cost=6346.02..6603.77 rows=7230 width=4)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> Limit (cost=0.00..6343.26 rows=1103 width=4)
-> Unique (cost=0.00..19926.92 rows=3465 width=4)

-> Index Scan using fki_t2_t1_id_fkey on t2 (cost=0.00..19402.46 rows=209786 width=4)

2. Query plan:

Aggregate (cost=6625.95..6625.96 rows=1 width=4)
-> Seq Scan on t1 (cost=6350.13..6607.88 rows=7230 width=4)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> Limit (cost=6336.33..6347.37 rows=1104 width=4)
-> HashAggregate (cost=6336.33..6370.98 rows=3465 width=4)
-> Seq Scan on t2 (cost=0.00..5811.86 rows=209786 width=4)

Re: select distinct in a subquery bug/problem

From

Tom Lane

Date:

11 August 2012, 05:52:42

"Dan Halbert" <halbert@halwitz.org> writes:
> 1. select count(t1_id) from t1 where t1_id not in (select distinct t1_id from t2 limit 1103)  ==> 13357   [CORRECT
result]
> 2. select count(t1_id) from t1 where t1_id not in (select distinct t1_id from t2 limit 1104)  ==> 0   [WRONG result;
shouldbe close to 13357] 

I'm betting the 1104'th row in t2 is a null.  NOT IN generally doesn't
do what people expect when there are nulls in the sub-select ... but
it is acting per spec.

The apparent dependency on which plan is chosen is illusory and stems
from the fact that "select distinct ... limit" gives you an unspecified
set of rows.  I think the indexscan/unique plan would produce nulls
last, so you'd not see them as long as the limit was less than the
number of distinct values ... but if the distinct is done via hash
aggregation, the null entry could come out much earlier.

            regards, tom lane

Re: select distinct in a subquery bug/problem

From

Raymond O'Donnell

Date:

11 August 2012, 18:21:12

On 11/08/2012 04:32, Dan Halbert wrote:
> In version 9.1.4-0ubuntu12.04:
>
>
>
> Hi - I am getting wrong answers from a certain kind of query, and have
> narrowed it down to a change in the query plan between two similar
> queries. The two queries below use different query plans, and generate
> different results, one of which is completely wrong.
>
>
>
> 1. select count(t1_id) from t1 where t1_id not in (select distinct t1_id
> from t2 limit 1103)  ==> 13357   [CORRECT result]
>
> 2. select count(t1_id) from t1 where t1_id not in (select distinct t1_id
> from t2 limit 1104)  ==> 0   [WRONG result; should be close to 13357]

Does it make a difference if you include an ORDER BY in the subquery?
AIUI, the particular result set from the subquery is indeterminate (in
theory anyway) without it.

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

Re: select distinct in a subquery bug/problem

From

Dan Halbert

Date:

11 August 2012, 19:01:24

On 8/11/2012 2:21 PM, Raymond O'Donnell wrote:
> On 11/08/2012 04:32, Dan Halbert wrote:
>> 1. select count(t1_id) from t1 where t1_id not in (select distinct t1_id
>> from t2 limit 1103)  ==> 13357   [CORRECT result]
>>
>> 2. select count(t1_id) from t1 where t1_id not in (select distinct t1_id
>> from t2 limit 1104)  ==> 0   [WRONG result; should be close to 13357]
> Does it make a difference if you include an ORDER BY in the subquery?
> AIUI, the particular result set from the subquery is indeterminate (in
> theory anyway) without it.
>
> Ray.
>
Yes, it's artificial. The real problem, as Tom Lane hinted, is that NOT
IN (..., NULL, ...) returns NULL and messes up the result, so if the
SELECT produces any NULL's, the query doesn't do what I want. So I
needed to not use NOT IN or make sure there are no NULL's

Dan