BUG #16749: EXPLAIN only shows filtering from the first query in a union chain when using distinct on. - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #16749: EXPLAIN only shows filtering from the first query in a union chain when using distinct on.
Date
Msg-id 16749-7a879a6a752b0843@postgresql.org
Whole thread Raw
Responses Re: BUG #16749: EXPLAIN only shows filtering from the first query in a union chain when using distinct on.
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16749
Logged by:          Michael Richards
Email address:      mike22e@gmail.com
PostgreSQL version: 12.2
Operating system:   macOS Big Sur (11.0.1)
Description:

### Bug description:

The query plan produced by EXPLAIN ANALYZE does not show all filters being
applied when querying with distinct on over a union'd set of queries. It
only shows a filter as if there was just a single query (the first one in
the union chain). Confirmed in both Postgres 12 and 13.

### Example to reproduce:

Take the following query:

select * from generate_series(0,20) n1
  left join (
    select distinct on (id) * from (
      select n2 as id, n2+2 as a from generate_series(0,10) n2
      union all
      select n3+5 as id, n3+12 as a from generate_series(0,10) n3
    ) s1 order by id, a
  ) s2 on s2.id = n1 where 20 is distinct from s2.a;

Which produces the correct results (row with column a = 20 is removed), as
shown below:

 n1 | id | a  
----+----+----
  0 |  0 |  2
  1 |  1 |  3
  2 |  2 |  4
  3 |  3 |  5
  4 |  4 |  6
  5 |  5 |  7
  6 |  6 |  8
  7 |  7 |  9
  8 |  8 | 10
  9 |  9 | 11
 10 | 10 | 12
 11 | 11 | 18
 12 | 12 | 19
 14 | 14 | 21
 15 | 15 | 22
 16 |    |   
 17 |    |   
 18 |    |   
 19 |    |   
 20 |    |   
(20 rows)

However, the row that contained a = 20 was produced using n3+12 in the
second query in the union, but the EXPLAIN ANALYZE for this query only shows
the n2+2 as a filter; the n3+12 is nowhere to be seen as a filter in the
query plan.

 Hash Left Join  (cost=1.52..1.79 rows=20 width=12) (actual
time=0.098..0.105 rows=20 loops=1)
   Hash Cond: (n1.n1 = n2.n2)
   Filter: (20 IS DISTINCT FROM ((n2.n2 + 2)))
   Rows Removed by Filter: 1
   ->  Function Scan on generate_series n1  (cost=0.00..0.21 rows=21
width=4) (actual time=0.005..0.007 rows=21 loops=1)
   ->  Hash  (cost=1.24..1.24 rows=22 width=8) (actual time=0.068..0.068
rows=16 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Unique  (cost=0.91..1.02 rows=22 width=8) (actual
time=0.058..0.064 rows=16 loops=1)
               ->  Sort  (cost=0.91..0.96 rows=22 width=8) (actual
time=0.058..0.059 rows=22 loops=1)
                     Sort Key: n2.n2, ((n2.n2 + 2))
                     Sort Method: quicksort  Memory: 26kB
                     ->  Append  (cost=0.00..0.42 rows=22 width=8) (actual
time=0.010..0.024 rows=22 loops=1)
                           ->  Function Scan on generate_series n2
(cost=0.00..0.14 rows=11 width=8) (actual time=0.009..0.011 rows=11
loops=1)
                           ->  Function Scan on generate_series n3
(cost=0.00..0.17 rows=11 width=8) (actual time=0.009..0.011 rows=11
loops=1)

The relevant part of the query plan is lines 3-4.

Filter: (20 IS DISTINCT FROM ((n2.n2 + 2)))
Rows Removed by Filter: 1

The above filter did not actually remove any rows at all. The filter that
removed that 1 row should be Filter: (20 IS DISTINCT FROM ((n3.n3 + 12))).

This bug also applies when there are any number of union'd queries, not just
two—the query plan will still only show the filter from the first query in
the union chain. The query results seem to be correct, but the displayed
query plan does not account for it.

Cheers,
Michael


pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #16748: postgis30_13 installation broken
Next
From: Tom Lane
Date:
Subject: Re: SV: Problem with pg_notify / listen