Thread: BUG #18465: Wrong results from SELECT DISTINCT MIN in scalar subquery using HashAggregate

The following bug has been logged on the website:

Bug reference:      18465
Logged by:          Hal Takahara
Email address:      mtakahar@gmail.com
PostgreSQL version: 15.7
Operating system:   macOS 13.6
Description:

* The query in the example below returns wrong results when HashAggregate is
used for eliminating the duplicates for DISTINCT.

postgres=# select version();
                                                           version
                                                 

------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.7 (Homebrew) on x86_64-apple-darwin22.6.0, compiled by Apple
clang version 15.0.0 (clang-1500.1.0.2.5), 64-bit
(1 row)

postgres=# CREATE TABLE b (col_int int);
CREATE TABLE cc (col_int int);

INSERT INTO b values (1);
INSERT INTO cc values (null), (1), (2);

CREATE TABLE
postgres=# CREATE TABLE
postgres=# postgres=# INSERT 0 1
postgres=# INSERT 0 3
postgres=# postgres=# \pset null '<null>'
Null display is "<null>".
postgres=# SET enable_hashagg = ON; SET enable_sort = OFF;
SET
SET
postgres=# SELECT (  SELECT  DISTINCT  MIN( col_int  )  FROM B  AS
SUBQUERY1_t1  WHERE SUBQUERY1_t1 .col_int  = table1 .col_int  ), table1
.col_int  FROM CC  AS table1    ;
  min   | col_int 
--------+---------
 <null> |  <null>
 <null> |       1        <------- *** wrong ***
 <null> |       2
(3 rows)

postgres=# EXPLAIN SELECT (  SELECT  DISTINCT  MIN( col_int  )  FROM B  AS
SUBQUERY1_t1  WHERE SUBQUERY1_t1 .col_int  = table1 .col_int  ), table1
.col_int  FROM CC  AS table1    ;
                                       QUERY PLAN
           
----------------------------------------------------------------------------------------
 Seq Scan on cc table1  (cost=0.00..8306.82 rows=2550 width=8)
   SubPlan 2
     ->  HashAggregate  (cost=3.23..3.24 rows=1 width=4)
           Group Key: $1
           InitPlan 1 (returns $1)
             ->  Limit  (cost=0.00..3.22 rows=1 width=4)
                   ->  Seq Scan on b subquery1_t1  (cost=0.00..41.88 rows=13
width=4)
                         Filter: ((col_int IS NOT NULL) AND (col_int =
table1.col_int))
           ->  Result  (cost=3.22..3.23 rows=1 width=4)
(9 rows)

postgres=# SET enable_hashagg = OFF; SET enable_sort = ON;
SET
SET
postgres=# SELECT (  SELECT  DISTINCT  MIN( col_int  )  FROM B  AS
SUBQUERY1_t1  WHERE SUBQUERY1_t1 .col_int  = table1 .col_int  ), table1
.col_int  FROM CC  AS table1    ;
  min   | col_int 
--------+---------
 <null> |  <null>
      1 |       1        <------- *** correct ***
 <null> |       2
(3 rows)

postgres=# EXPLAIN SELECT (  SELECT  DISTINCT  MIN( col_int  )  FROM B  AS
SUBQUERY1_t1  WHERE SUBQUERY1_t1 .col_int  = table1 .col_int  ), table1
.col_int  FROM CC  AS table1    ;
                                       QUERY PLAN
           
----------------------------------------------------------------------------------------
 Seq Scan on cc table1  (cost=0.00..8319.57 rows=2550 width=8)
   SubPlan 2
     ->  Unique  (cost=3.24..3.25 rows=1 width=4)
           InitPlan 1 (returns $1)
             ->  Limit  (cost=0.00..3.22 rows=1 width=4)
                   ->  Seq Scan on b subquery1_t1  (cost=0.00..41.88 rows=13
width=4)
                         Filter: ((col_int IS NOT NULL) AND (col_int =
table1.col_int))
           ->  Sort  (cost=3.24..3.25 rows=1 width=4)
                 Sort Key: ($1)
                 ->  Result  (cost=3.22..3.23 rows=1 width=4)
(10 rows)


* The subquery is returning the first result for all the subsequent
tuples:

postgres=# SET enable_hashagg = ON; SET enable_sort = OFF;
SET
SET
postgres=# TRUNCATE TABLE cc;
TRUNCATE TABLE
postgres=# INSERT INTO cc values (1), (2), (null);
INSERT 0 3
postgres=# SET enable_hashagg = ON; SET enable_sort = OFF;
SET
SET
postgres=# SELECT (  SELECT  DISTINCT  MIN( col_int  )  FROM B  AS
SUBQUERY1_t1  WHERE SUBQUERY1_t1 .col_int  = table1 .col_int  ), table1
.col_int  FROM CC  AS table1    ;
 min | col_int 
-----+---------
   1 |       1        <------- *** correct ***
   1 |       2        <------- *** wrong ***
   1 |  <null>        <------- *** wrong ***
(3 rows)


> On Tue, May 14, 2024 at 09:14:34PM +0000, PG Bug reporting form wrote:
>
> postgres=# SELECT (  SELECT  DISTINCT  MIN( col_int  )  FROM B  AS
> SUBQUERY1_t1  WHERE SUBQUERY1_t1 .col_int  = table1 .col_int  ), table1
> .col_int  FROM CC  AS table1    ;
>   min   | col_int
> --------+---------
>  <null> |  <null>
>  <null> |       1        <------- *** wrong ***
>  <null> |       2
> (3 rows)
>
> postgres=# SELECT (  SELECT  DISTINCT  MIN( col_int  )  FROM B  AS
> SUBQUERY1_t1  WHERE SUBQUERY1_t1 .col_int  = table1 .col_int  ), table1
> .col_int  FROM CC  AS table1    ;
>  min | col_int
> -----+---------
>    1 |       1        <------- *** correct ***
>    1 |       2        <------- *** wrong ***
>    1 |  <null>        <------- *** wrong ***
> (3 rows)

Thanks, I could reproduce this in v15 and v16. Interestingly enough it
was fixed in d0d44049d1, but looks like due to lack of error reports the
fix was applied only on the main branch. I assume this one makes it
necessary to back-patch the fix.



Dmitry Dolgov <9erthalion6@gmail.com> writes:
> Thanks, I could reproduce this in v15 and v16. Interestingly enough it
> was fixed in d0d44049d1, but looks like due to lack of error reports the
> fix was applied only on the main branch. I assume this one makes it
> necessary to back-patch the fix.

Hah.  Nothing like fixing a bug that you know is there but you can't
demonstrate it ;-).  I confirm the problem appears in d0d44049d1^
(and at least as far back as v12), but not in d0d44049d1 or HEAD.
So yeah, add test case and back-patch.

Many thanks for the test case!

            regards, tom lane