Re: BUG #15577: Query returns different results when executedmultiple times - Mailing list pgsql-bugs

From Bartosz Polnik
Subject Re: BUG #15577: Query returns different results when executedmultiple times
Date
Msg-id CAM37ZevSHt0pdKhZ_Bu7nDoMMWSDsJaaPUPR_ApowoFida4fcA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #15577: Query returns different results when executedmultiple times  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: BUG #15577: Query returns different results when executedmultiple times  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-bugs
For the following sql:

SET force_parallel_mode = on;
explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );


I got:

                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=145.151..284.612 rows=26 loops=1)
   ->  Gather (actual time=145.092..304.166 rows=26 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=153.289..247.768 rows=13 loops=2)
               ->  Hash Join (actual time=0.221..15.028 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.085..14.442 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.044..0.045 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.028..0.033 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.242..0.246 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.016..0.016 rows=1 loops=26)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.024 ms
 Execution Time: 304.858 ms


On Second execution:
                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=228.405..279.383 rows=19 loops=1)
   ->  Gather (actual time=228.376..288.807 rows=19 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=150.785..242.440 rows=10 loops=2)
               ->  Hash Join (actual time=0.212..14.353 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.075..13.794 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.050..0.050 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d_id td (actual time=0.031..0.037 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.237..0.241 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.021..0.021 rows=1 loops=19)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 4.816 ms
 Execution Time: 289.503 ms


I executed it a couple of times and saw in output 26 rows (as above), 21 rows (also above), 28 rows, 28 rows, 29 rows etc. 
The changing parts (nr of rows) were: Nested Loop Left Join and Nested Loop.


For the following sql:

SET parallel_leader_participation = on;
explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );

I received:

                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=143.805..279.354 rows=29 loops=1)
   ->  Gather (actual time=143.735..299.066 rows=29 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=149.334..242.108 rows=15 loops=2)
               ->  Hash Join (actual time=0.203..14.536 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.072..13.971 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.043..0.043 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.028..0.034 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.237..0.240 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.018..0.018 rows=1 loops=29)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.070 ms
 Execution Time: 299.849 ms



                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=231.714..276.770 rows=26 loops=1)
   ->  Gather (actual time=231.654..295.971 rows=26 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=151.209..239.965 rows=13 loops=2)
               ->  Hash Join (actual time=0.214..14.467 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.070..13.923 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.045..0.045 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.028..0.033 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.235..0.238 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.017..0.017 rows=1 loops=26)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 4.976 ms
 Execution Time: 296.729 ms


I also made a test with:

SET parallel_leader_participation = off;
SET force_parallel_mode = on;
explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );

The output is correct - 31 rows, but the plan is different:
                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather (actual time=263.880..507.668 rows=31 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Nested Loop Left Join (actual time=193.246..415.020 rows=31 loops=1)
         ->  Nested Loop (actual time=192.997..414.336 rows=31 loops=1)
               ->  Hash Join (actual time=0.287..23.958 rows=1890 loops=1)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Seq Scan on table_c tc (actual time=0.128..23.044 rows=3501 loops=1)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 98233
                     ->  Hash (actual time=0.064..0.069 rows=8 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.045..0.052 rows=8 loops=1)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.203..0.206 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
         ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.021..0.021 rows=1 loops=31)
               Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.051 ms
 Execution Time: 507.858 ms


On Mon, Jan 7, 2019 at 8:59 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Tue, Jan 8, 2019 at 1:56 AM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
> I've modified the script:
> 1. I Added SET force_parallel_mode = on;:

What if you try that with your original query in the database that
shows the problem?  Or alternatively SET parallel_leader_participation
= on.  What I'm wondering is whether the parallel worker is
consistently getting the same result, but it's a different result from
the leader, and the variation comes from the race between leader and
worker, or whether the instability comes from somewhere else.

--
Thomas Munro
http://www.enterprisedb.com

pgsql-bugs by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: BUG #15579: Adding a column with default from configurationparameter fails on 11.1
Next
From: Thomas Munro
Date:
Subject: Re: BUG #15577: Query returns different results when executedmultiple times