BUG #16749: EXPLAIN only shows filtering from the first query in a union chain when using distinct on. - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #16749: EXPLAIN only shows filtering from the first query in a union chain when using distinct on. |
Date | |
Msg-id | 16749-7a879a6a752b0843@postgresql.org Whole thread Raw |
Responses |
Re: BUG #16749: EXPLAIN only shows filtering from the first query in a union chain when using distinct on.
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 16749 Logged by: Michael Richards Email address: mike22e@gmail.com PostgreSQL version: 12.2 Operating system: macOS Big Sur (11.0.1) Description: ### Bug description: The query plan produced by EXPLAIN ANALYZE does not show all filters being applied when querying with distinct on over a union'd set of queries. It only shows a filter as if there was just a single query (the first one in the union chain). Confirmed in both Postgres 12 and 13. ### Example to reproduce: Take the following query: select * from generate_series(0,20) n1 left join ( select distinct on (id) * from ( select n2 as id, n2+2 as a from generate_series(0,10) n2 union all select n3+5 as id, n3+12 as a from generate_series(0,10) n3 ) s1 order by id, a ) s2 on s2.id = n1 where 20 is distinct from s2.a; Which produces the correct results (row with column a = 20 is removed), as shown below: n1 | id | a ----+----+---- 0 | 0 | 2 1 | 1 | 3 2 | 2 | 4 3 | 3 | 5 4 | 4 | 6 5 | 5 | 7 6 | 6 | 8 7 | 7 | 9 8 | 8 | 10 9 | 9 | 11 10 | 10 | 12 11 | 11 | 18 12 | 12 | 19 14 | 14 | 21 15 | 15 | 22 16 | | 17 | | 18 | | 19 | | 20 | | (20 rows) However, the row that contained a = 20 was produced using n3+12 in the second query in the union, but the EXPLAIN ANALYZE for this query only shows the n2+2 as a filter; the n3+12 is nowhere to be seen as a filter in the query plan. Hash Left Join (cost=1.52..1.79 rows=20 width=12) (actual time=0.098..0.105 rows=20 loops=1) Hash Cond: (n1.n1 = n2.n2) Filter: (20 IS DISTINCT FROM ((n2.n2 + 2))) Rows Removed by Filter: 1 -> Function Scan on generate_series n1 (cost=0.00..0.21 rows=21 width=4) (actual time=0.005..0.007 rows=21 loops=1) -> Hash (cost=1.24..1.24 rows=22 width=8) (actual time=0.068..0.068 rows=16 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Unique (cost=0.91..1.02 rows=22 width=8) (actual time=0.058..0.064 rows=16 loops=1) -> Sort (cost=0.91..0.96 rows=22 width=8) (actual time=0.058..0.059 rows=22 loops=1) Sort Key: n2.n2, ((n2.n2 + 2)) Sort Method: quicksort Memory: 26kB -> Append (cost=0.00..0.42 rows=22 width=8) (actual time=0.010..0.024 rows=22 loops=1) -> Function Scan on generate_series n2 (cost=0.00..0.14 rows=11 width=8) (actual time=0.009..0.011 rows=11 loops=1) -> Function Scan on generate_series n3 (cost=0.00..0.17 rows=11 width=8) (actual time=0.009..0.011 rows=11 loops=1) The relevant part of the query plan is lines 3-4. Filter: (20 IS DISTINCT FROM ((n2.n2 + 2))) Rows Removed by Filter: 1 The above filter did not actually remove any rows at all. The filter that removed that 1 row should be Filter: (20 IS DISTINCT FROM ((n3.n3 + 12))). This bug also applies when there are any number of union'd queries, not just two—the query plan will still only show the filter from the first query in the union chain. The query results seem to be correct, but the displayed query plan does not account for it. Cheers, Michael
pgsql-bugs by date: