BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field
Date
Msg-id 15922-969d62a9b4d95ee1@postgresql.org
Whole thread Raw
Responses Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      15922
Logged by:          David Raymond
Email address:      david.raymond@tomtom.com
PostgreSQL version: 11.4
Operating system:   Windows 7
Description:

Running version 11.4 on Windows 7, EnterpriseDB installer.
I have a case where I'm doing a simple select from a table but I'm getting
returned duplicates from its primary key field when I have two different
exists statements in the where clause.

I simplified the tables down to the bare bones, dumped them out with pg_dump
and re-loaded them. Immediately after loading the data the query returns the
correct, consistent answer. Then after running analyze it starts returning
the duplicates.

The query is this:
select count(*), count(distinct id) from n where exists(select 1 from n2h
where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id =
'00005831-4900-1200-0000-0000773ae45f');

If either one of the 2 exists statements is not included then the results
don't have duplicates, it's only when they are both included that there's a
problem.

The three tables involved have 2 million rows in them, so I can't include
the full dump in the form here. Below is output from psql starting
immediately after the fresh table load and before analyze is done. Shows the
table layouts, the record counts, and the explain results for the same query
before and after analyze is done.

Please let me know what the next bits you need from me are.
Thank you,
-David Raymond <david.raymond@tomtom.com>

mnr=> set search_path to bug_test;
SET
Time: 1.427 ms
mnr=> \d+
                    List of relations
  Schema  | Name | Type  | Owner |  Size   | Description
----------+------+-------+-------+---------+-------------
 bug_test | n    | table | mnr   | 16 MB   |
 bug_test | n2a  | table | mnr   | 89 MB   |
 bug_test | n2h  | table | mnr   | 5800 kB |
(3 rows)

mnr=> \d+ n
                                  Table "bug_test.n"
 Column | Type | Collation | Nullable | Default | Storage | Stats target |
Description
--------+------+-----------+----------+---------+---------+--------------+-------------
 id     | uuid |           | not null |         | plain   |              |
Indexes:
    "n_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "n2a" CONSTRAINT "n2a_n_id_fkey" FOREIGN KEY (n_id) REFERENCES
n(id)
    TABLE "n2h" CONSTRAINT "n2h_n_id_fkey" FOREIGN KEY (n_id) REFERENCES
n(id)

mnr=> \d+ n2a
                                 Table "bug_test.n2a"
 Column | Type | Collation | Nullable | Default | Storage | Stats target |
Description
--------+------+-----------+----------+---------+---------+--------------+-------------
 n_id   | uuid |           | not null |         | plain   |              |
 a_id   | uuid |           | not null |         | plain   |              |
Indexes:
    "n2a_pkey" PRIMARY KEY, btree (n_id, a_id)
    "n2a_a_id_n_id_idx" btree (a_id, n_id)
Foreign-key constraints:
    "n2a_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id)

mnr=> \d+ n2h
                                 Table "bug_test.n2h"
 Column | Type | Collation | Nullable | Default | Storage | Stats target |
Description
--------+------+-----------+----------+---------+---------+--------------+-------------
 n_id   | uuid |           | not null |         | plain   |              |
 h_id   | uuid |           | not null |         | plain   |              |
Indexes:
    "n2h_pkey" PRIMARY KEY, btree (n_id, h_id)
    "n2h_h_id_n_id_idx" btree (h_id, n_id)
Foreign-key constraints:
    "n2h_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id)

mnr=> select 'n' as table_name, count(*) as record_count from n union all
select 'n2a', count(*) from n2a union all select 'n2h', count(*) from n2h;
 table_name | record_count
------------+--------------
 n          |      366,869
 n2a        |    1,546,626
 n2h        |       98,180
(3 rows)

Time: 342.223 ms
mnr=> select count(*), count(distinct id) from n where exists(select 1 from
n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and
a_id = '00005831-4900-1200-0000-0000773ae45f');
 count | count
-------+-------
 6,531 | 6,531
(1 row)

Time: 426.043 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n
where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from
n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
                                                                     QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10609.94..10609.95 rows=1 width=16) (actual
time=300.783..300.783 rows=1 loops=1)
   Output: count(*), count(DISTINCT n.id)
   ->  Nested Loop  (cost=1950.10..10590.61 rows=3866 width=16) (actual
time=30.074..295.967 rows=6531 loops=1)
         Output: n.id
         Inner Unique: true
         Join Filter: (n2h.n_id = n.id)
         ->  Nested Loop  (cost=1949.68..2044.12 rows=3866 width=32) (actual
time=30.064..270.423 rows=6531 loops=1)
               Output: n2a.n_id, n2h.n_id
               Inner Unique: true
               ->  HashAggregate  (cost=1949.25..1951.25 rows=200 width=16)
(actual time=29.997..46.915 rows=61325 loops=1)
                     Output: n2h.n_id
                     Group Key: n2h.n_id
                     ->  Seq Scan on bug_test.n2h  (cost=0.00..1703.80
rows=98180 width=16) (actual time=0.023..6.345 rows=98180 loops=1)
                           Output: n2h.n_id, n2h.h_id
               ->  Index Only Scan using n2a_pkey on bug_test.n2a
(cost=0.43..1.67 rows=39 width=16) (actual time=0.003..0.003 rows=0
loops=61325)
                     Output: n2a.n_id, n2a.a_id
                     Index Cond: ((n2a.n_id = n2h.n_id) AND (n2a.a_id =
'00005831-4900-1200-0000-0000773ae45f'::uuid))
                     Heap Fetches: 6531
         ->  Index Only Scan using n_pkey on bug_test.n  (cost=0.42..2.20
rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=6531)
               Output: n.id
               Index Cond: (n.id = n2a.n_id)
               Heap Fetches: 6531
 Planning Time: 0.601 ms
 Execution Time: 301.614 ms
(24 rows)

Time: 302.919 ms
mnr=> analyze n, n2a, n2h;
ANALYZE
Time: 252.862 ms
mnr=> select count(*), count(distinct id) from n where exists(select 1 from
n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and
a_id = '00005831-4900-1200-0000-0000773ae45f');
 count | count
-------+-------
 8,858 | 6,531
(1 row)

Time: 170.372 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n
where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from
n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');

      QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=21318.14..21318.15 rows=1 width=16) (actual
time=148.410..148.410 rows=1 loops=1)
   Output: count(*), count(DISTINCT n.id)
   ->  Gather  (cost=7592.81..21291.01 rows=5426 width=16) (actual
time=65.294..146.754 rows=8858 loops=1)
         Output: n.id
         Workers Planned: 2
         Workers Launched: 2
         ->  Nested Loop  (cost=6592.81..19748.41 rows=2261 width=16)
(actual time=21.688..48.404 rows=2953 loops=3)
               Output: n.id
               Inner Unique: true
               Worker 0: actual time=1.096..4.734 rows=24 loops=1
               Worker 1: actual time=0.470..6.426 rows=225 loops=1
               ->  Parallel Hash Join  (cost=6592.38..18216.45 rows=2263
width=32) (actual time=21.544..31.739 rows=2953 loops=3)
                     Output: n2a.n_id, n2h.n_id
                     Hash Cond: (n2a.n_id = n2h.n_id)
                     Worker 0: actual time=0.828..3.801 rows=24 loops=1
                     Worker 1: actual time=0.333..4.385 rows=225 loops=1
                     ->  Parallel Bitmap Heap Scan on bug_test.n2a
(cost=1113.32..12667.03 rows=14457 width=16) (actual time=1.187..5.693
rows=10193 loops=3)
                           Output: n2a.n_id, n2a.a_id
                           Recheck Cond: (n2a.a_id =
'00005831-4900-1200-0000-0000773ae45f'::uuid)
                           Heap Blocks: exact=2173
                           Worker 0: actual time=0.196..2.202 rows=246
loops=1
                           Worker 1: actual time=0.105..2.785 rows=1394
loops=1
                           ->  Bitmap Index Scan on n2a_a_id_n_id_idx
(cost=0.00..1104.65 rows=34696 width=0) (actual time=2.861..2.861 rows=30578
loops=1)
                                 Index Cond: (n2a.a_id =
'00005831-4900-1200-0000-0000773ae45f'::uuid)
                     ->  Parallel Hash  (cost=4967.71..4967.71 rows=40908
width=16) (actual time=19.568..19.569 rows=32727 loops=3)
                           Output: n2h.n_id
                           Buckets: 131072  Batches: 1  Memory Usage:
5632kB
                           Worker 0: actual time=0.077..0.077 rows=0
loops=1
                           Worker 1: actual time=0.050..0.050 rows=0
loops=1
                           ->  Parallel Index Only Scan using n2h_pkey on
bug_test.n2h  (cost=0.42..4967.71 rows=40908 width=16) (actual
time=0.024..32.132 rows=98180 loops=1)
                                 Output: n2h.n_id
                                 Heap Fetches: 98180
               ->  Index Only Scan using n_pkey on bug_test.n
(cost=0.42..0.68 rows=1 width=16) (actual time=0.004..0.004 rows=1
loops=8858)
                     Output: n.id
                     Index Cond: (n.id = n2h.n_id)
                     Heap Fetches: 8858
                     Worker 0: actual time=0.035..0.035 rows=1 loops=24
                     Worker 1: actual time=0.007..0.007 rows=1 loops=225
 Planning Time: 0.755 ms
 Execution Time: 155.490 ms
(40 rows)

Time: 156.952 ms
mnr=> select count(*), count(distinct id) from n where exists (select 1 from
n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
 count  | count
--------+--------
 30,578 | 30,578
(1 row)

Time: 95.252 ms
mnr=> select count(*), count(distinct id) from n where exists (select 1 from
n2h where n_id = n.id);
 count  | count
--------+--------
 61,325 | 61,325
(1 row)

Time: 574.642 ms
mnr=> select version();
                          version
------------------------------------------------------------
 PostgreSQL 11.4, compiled by Visual C++ build 1914, 64-bit
(1 row)

Time: 0.305 ms
mnr=>


pgsql-bugs by date:

Previous
From: Manuel Rigger
Date:
Subject: DROP STATISTICS results in "ERROR: tuple concurrently updated"
Next
From: Tom Lane
Date:
Subject: Re: A function privilege problem