BUG #15577: Query returns different results when executed multipletimes - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #15577: Query returns different results when executed multipletimes
Date
Msg-id 15577-ca61ab18904af852@postgresql.org
Whole thread Raw
Responses Re: BUG #15577: Query returns different results when executedmultiple times  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      15577
Logged by:          Bartosz Polnik
Email address:      bartoszpolnik@gmail.com
PostgreSQL version: 11.1
Operating system:   Windows 10
Description:

Hello,

A couple of days ago I stumbled upon a query that doesn't return all rows it
should. When we execute it multiple times, we get only a subset of data. The
query is:

SELECT ta.id AS table_a_id,
       tc.id as table_c_id,
       tba.id AS table_b_id
FROM test.table_b_active tba
       INNER JOIN test.table_c tc ON tba.target_id = tc.id
       INNER JOIN test.table_d td ON tc.table_d_id = td.id
       LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
  AND td.group = 'A'
  AND tc.table_e_id = 4
  AND (
    (tba.target_id = tc.id AND tba.group_type = 'A')
    OR tba.source_id = tc.id
  );

We always expect 31 rows in return:

table_a_id | table_c_id | table_b_id
------------+------------+------------
   16116185 |     328860 |    2936924
   16115788 |     348539 |    2913754
   16115788 |     348539 |    3039173
   16115788 |     348539 |    2913874
   16116256 |     293541 |    2997160
   16116256 |     293541 |    2901938
   16116256 |     293541 |    2901933
   16114789 |     292051 |    3038539
   16114813 |     342353 |    3052371
   16116069 |     351585 |    3025941
   16114814 |     331329 |    2946332
   16115861 |     350487 |    2933633
   16115106 |     350047 |    2902075
   16116066 |     351434 |    3010909
   16114811 |     298605 |    2998909
   16114811 |     298605 |    2893809
   16114811 |     298605 |    2987038
   16114811 |     298605 |    3038877
   16114811 |     298605 |    3010694
   16114811 |     298605 |    2893188
   16114811 |     298605 |    2893391
   16114811 |     298605 |    2983360
   16114811 |     298605 |    3038221
   16114811 |     298605 |    3026078
   16114811 |     298605 |    2998966
   16116260 |     290583 |    2955483
   16116260 |     290583 |    2921135
   16116260 |     290583 |    2947914
   16116260 |     290583 |    2901669
   16116241 |     295971 |    3038921
   16116249 |     296708 |    3038888

But when we run it using psql, we get different results (not all rows are
returned). Here's the query I used to get the execution plan and example
output from two consecutive plan executions:

explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );


Nested Loop Left Join  (actual time=157.436..329.538 rows=21 loops=1)
  ->  Gather  (actual time=157.407..349.743 rows=21 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Nested Loop (actual time=167.978..281.540 rows=11 loops=2)
              ->  Hash Join  (actual time=0.300..18.206 rows=945 loops=2)
                    Hash Cond: (tc.table_d_id = td.id)
                    ->  Parallel Seq Scan on table_c tc (actual
time=0.107..17.326 rows=1751 loops=2)
                          Filter: (table_e_id = 4)
                          Rows Removed by Filter: 49117
                    ->  Hash (actual time=0.065..0.065 rows=8 loops=2)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on table_d td  (actual
time=0.043..0.049 rows=8 loops=2)
                                Filter: (group = 'A')
                                Rows Removed by Filter: 55
              ->  Index Scan using table_b_idx_target_id on table_b (actual
time=0.273..0.278 rows=0 loops=1890)
                    Index Cond: (target_id = tc.id)
                    Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A')) OR
(source_id = tc.id)))
                    Rows Removed by Filter: 26
  ->  Index Scan using table_a_uq_001 on table_a ta (actual
time=0.019..0.019 rows=1 loops=21)
        Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 1.218 ms
Execution Time: 350.283 ms

Above we only got 21 rows, instead of 31.

Nested Loop Left Join (actual time=164.189..358.068 rows=28 loops=1)
  ->  Gather  (actual time=164.140..370.034 rows=28 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Nested Loop  (actual time=188.718..306.289 rows=14 loops=2)
              ->  Hash Join (actual time=0.321..21.860 rows=945 loops=2)
                    Hash Cond: (tc.table_d_id = td.id)
                    ->  Parallel Seq Scan on table_c tc (actual
time=0.113..20.784 rows=1751 loops=2)
                          Filter: (table_e_id = 4)
                          Rows Removed by Filter: 49117
                    ->  Hash (actual time=0.063..0.063 rows=8 loops=2)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on table_d td (actual
time=0.040..0.049 rows=8 loops=2)
                                Filter: (group = 'A')
                                Rows Removed by Filter: 55
              ->  Index Scan using table_b_idx_target_id on table_b (actual
time=0.294..0.300 rows=0 loops=1890)
                    Index Cond: (target_id = tc.id)
                    Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A')) OR
(source_id = tc.id)))
                    Rows Removed by Filter: 26
  ->  Index Scan using table_a_uq_001 on table_a ta (actual
time=0.031..0.031 rows=1 loops=28)
        Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 0.971 ms
Execution Time: 371.202 ms

Here we only got 28 rows, instead of 31.

Used indices:
create index table_b_idx_target_id
   on test.table_b (target_id)
   where (t_active IS TRUE);

create unique index table_a_uq_001
  on table_a (table_c_id, date);

And views:
create view test.table_b_active as
SELECT *
FROM test.table_b
WHERE (table_b.t_active IS TRUE);

I made a couple of tests and couldn't reproduce this problem with
max_parallel_workers=0. With this parameter set, we get the execution
plan:

Nested Loop Left Join (actual time=212.010..509.274 rows=31 loops=1)
  ->  Gather (actual time=211.976..508.855 rows=31 loops=1)
        Workers Planned: 1
        Workers Launched: 0
        ->  Nested Loop (actual time=211.593..508.425 rows=31 loops=1)
              ->  Hash Join (actual time=0.063..31.628 rows=1890 loops=1)
                    Hash Cond: (tc.table_d_id = td.id)
                    ->  Parallel Seq Scan on table_c tc (actual
time=0.042..30.000 rows=3501 loops=1)
                          Filter: (table_e_id = 4)
                          Rows Removed by Filter: 98233
                    ->  Hash (actual time=0.016..0.016 rows=8 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on table_d_id td (actual
time=0.008..0.012 rows=8 loops=1)
                                Filter: (group = 'A')
                                Rows Removed by Filter: 55
              ->  Index Scan using table_b_idx_target_id on table_b (actual
time=0.248..0.251 rows=0 loops=1890)
                    Index Cond: (target_id = tc.id)
                    Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A')) OR
(source_id = tc.id)))
                    Rows Removed by Filter: 26
  ->  Index Scan using table_a_uq_001 on table_a ta (actual
time=0.014..0.014 rows=1 loops=31)
        Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 1.135 ms
Execution Time: 509.431 ms

Nr of rows returned is correct - 31.

I tried to create a clean schema with test data, but couldn't get the same
execution plan, so I can't include that. 
The data above comes from db snapshot with table, index and view names
changed by hand, so there might be some typos. Sorry about them.

I'm using PostgreSQL 11.1, compiled by Visual C++ build 1914, 64-bit.

Best regards,
Bartosz Polnik


pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: BUG #15446: Crash on ALTER TABLE
Next
From: Andrew Dunstan
Date:
Subject: Re: BUG #15446: Crash on ALTER TABLE