Home > mailing lists

Re: BUG #15577: Query returns different results when executedmultiple times - Mailing list pgsql-bugs

From	Bartosz Polnik
Subject	Re: BUG #15577: Query returns different results when executedmultiple times
Date	January 7, 2019 23:04:41
Msg-id	CAM37Zeu1gv=PswA-C4WeCN13KrrXcD9s1mTdaTuVTKjV=aLoiA@mail.gmail.com Whole thread Raw
In response to	Re: BUG #15577: Query returns different results when executedmultiple times (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: BUG #15577: Query returns different results when executedmultiple times
List	pgsql-bugs

Tree view

For:

SET max_parallel_workers = 0;

explain (costs false, analyze true)

SELECT ta.id AS table_a_id,

tc.id as table_c_id,

tba.id AS table_b_id

FROM test.table_b_active tba

INNER JOIN test.table_c tc ON tba.target_id = tc.id

INNER JOIN test.table_d td ON tc.table_d_id = td.id

LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =

'2018-08-31' :: DATE

WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE

AND td.group = 'A'

AND tc.table_e_id = 4

AND (

(tba.target_id = tc.id AND tba.group_type = 'A')

OR tba.source_id = tc.id

);

SELECT ta.id AS table_a_id,

tc.id as table_c_id,

tba.id AS table_b_id

FROM test.table_b_active tba

INNER JOIN test.table_c tc ON tba.target_id = tc.id

INNER JOIN test.table_d td ON tc.table_d_id = td.id

LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =

'2018-08-31' :: DATE

WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE

AND td.group = 'A'

AND tc.table_e_id = 4

AND (

(tba.target_id = tc.id AND tba.group_type = 'A')

OR tba.source_id = tc.id

);

I got:

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Nested Loop Left Join (actual time=196.969..433.340 rows=31 loops=1)

-> Gather (actual time=196.897..432.757 rows=31 loops=1)

Workers Planned: 1

Workers Launched: 0

-> Nested Loop (actual time=196.384..432.197 rows=31 loops=1)

-> Hash Join (actual time=0.104..24.477 rows=1890 loops=1)

Hash Cond: (tc.table_d_id = td.id)

-> Parallel Seq Scan on table_c tc (actual time=0.060..23.639 rows=3501 loops=1)

Filter: (table_e_id = 4)

Rows Removed by Filter: 98233

-> Hash (actual time=0.022..0.023 rows=8 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Seq Scan on table_d td (actual time=0.009..0.013 rows=8 loops=1)

Filter: (group = 'A'::test.group)

Rows Removed by Filter: 55

-> Index Scan using table_b_idx_target_id on table_b (actual time=0.212..0.215 rows=0 loops=1890)

Index Cond: (target_id = tc.id)

Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))

Rows Removed by Filter: 26

-> Index Scan using table_a_uq_001 on table_a ta (actual time=0.019..0.019 rows=1 loops=31)

Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))

Planning Time: 5.084 ms

Execution Time: 433.630 ms

table_a_id | table_c_id | table_b_id

------------+------------+------------

16116185 | 328860 | 2936924

16115788 | 348539 | 3039173

16115788 | 348539 | 2913874

16115788 | 348539 | 2913754

16116256 | 293541 | 2901938

16116256 | 293541 | 2901933

16116256 | 293541 | 2997160

16114789 | 292051 | 3038539

16114813 | 342353 | 3052371

16116069 | 351585 | 3025941

16114814 | 331329 | 2946332

16115861 | 350487 | 2933633

16115106 | 350047 | 2902075

16116066 | 351434 | 3010909

16114811 | 298605 | 2893809

16114811 | 298605 | 2987038

16114811 | 298605 | 3038877

16114811 | 298605 | 3010694

16114811 | 298605 | 2893188

16114811 | 298605 | 2893391

16114811 | 298605 | 2983360

16114811 | 298605 | 3038221

16114811 | 298605 | 3026078

16114811 | 298605 | 2998966

16114811 | 298605 | 2998909

16116260 | 290583 | 2955483

16116260 | 290583 | 2921135

16116260 | 290583 | 2947914

16116260 | 290583 | 2901669

16116241 | 295971 | 3038921

16116249 | 296708 | 3038888

(31 rows)

When I use force_parallel_mode = on, parallel_leader_participation = off and removed max_parallel_workers line from the script (so that default applies), I got:

QUERY PLAN

Gather (actual time=271.255..530.713 rows=31 loops=1)

Workers Planned: 1

Workers Launched: 1

Single Copy: true

-> Nested Loop Left Join (actual time=200.434..437.321 rows=31 loops=1)

-> Nested Loop (actual time=200.126..436.547 rows=31 loops=1)

-> Hash Join (actual time=0.322..25.510 rows=1890 loops=1)

Hash Cond: (tc.table_d_id = td.id)

-> Seq Scan on table_c tc (actual time=0.141..24.501 rows=3501 loops=1)

Filter: (table_e_id = 4)

Rows Removed by Filter: 98233

-> Hash (actual time=0.082..0.084 rows=8 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Seq Scan on table_d td (actual time=0.049..0.066 rows=8 loops=1)

Filter: (group = 'A'::test.group)

Rows Removed by Filter: 55

-> Index Scan using table_b_idx_target_id on table_b (actual time=0.214..0.217 rows=0 loops=1890)

Index Cond: (target_id = tc.id)

Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))

Rows Removed by Filter: 26

-> Index Scan using table_a_uq_001 on table_a ta (actual time=0.024..0.024 rows=1 loops=31)

Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))

Planning Time: 5.641 ms

Execution Time: 530.926 ms

table_a_id | table_c_id | table_b_id

------------+------------+------------

16116185 | 328860 | 2936924

16115788 | 348539 | 3039173

16115788 | 348539 | 2913874

16115788 | 348539 | 2913754

16116256 | 293541 | 2901938

16116256 | 293541 | 2901933

16116256 | 293541 | 2997160

16114789 | 292051 | 3038539

16114813 | 342353 | 3052371

16116069 | 351585 | 3025941

16114814 | 331329 | 2946332

16115861 | 350487 | 2933633

16115106 | 350047 | 2902075

16116066 | 351434 | 3010909

16114811 | 298605 | 2893809

16114811 | 298605 | 2987038

16114811 | 298605 | 3038877

16114811 | 298605 | 3010694

16114811 | 298605 | 2893188

16114811 | 298605 | 2893391

16114811 | 298605 | 2983360

16114811 | 298605 | 3038221

16114811 | 298605 | 3026078

16114811 | 298605 | 2998966

16114811 | 298605 | 2998909

16116260 | 290583 | 2955483

16116260 | 290583 | 2921135

16116260 | 290583 | 2947914

16116260 | 290583 | 2901669

16116241 | 295971 | 3038921

16116249 | 296708 | 3038888

(31 rows)

Rows returned by two queries are the same.

On Mon, Jan 7, 2019 at 11:23 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Tue, Jan 8, 2019 at 10:39 AM Bartosz Polnik <bartoszpolnik@gmail.com> wrote:
> Gather (actual time=263.880..507.668 rows=31 loops=1)
> Workers Planned: 1
> Workers Launched: 1
> Single Copy: true
> -> Nested Loop Left Join (actual time=193.246..415.020 rows=31 loops=1)
> -> Nested Loop (actual time=192.997..414.336 rows=31 loops=1)
> -> Hash Join (actual time=0.287..23.958 rows=1890 loops=1)
> Hash Cond: (tc.table_d_id = td.id)
> -> Seq Scan on table_c tc (actual time=0.128..23.044 rows=3501 loops=1)
> Filter: (table_e_id = 4)
> Rows Removed by Filter: 98233
> -> Hash (actual time=0.064..0.069 rows=8 loops=1)
> Buckets: 1024 Batches: 1 Memory Usage: 9kB
> -> Seq Scan on table_d td (actual time=0.045..0.052 rows=8 loops=1)
> Filter: (group = 'A'::test.group)
> Rows Removed by Filter: 55
> -> Index Scan using table_b_idx_target_id on table_b (actual time=0.203..0.206 rows=0 loops=1890)
> Index Cond: (target_id = tc.id)
> Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
> Rows Removed by Filter: 26
> -> Index Scan using table_a_uq_001 on table_a ta (actual time=0.021..0.021 rows=1 loops=31)
> Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))

Hmm. So if the plan runs entirely in a worker, we get 31 rows. If
the plan runs entirely in the leader, we get 31 rows. But if it runs
in both, we get fewer. In all cases the Hash Join produces 1890 rows,
and in all cases the Nested Loop probes table_b_idx_target_id for all
of them: "loops=1890". But somehow the output of the Nested Loop is
not 31 rows (or 31 / 2) when divided over two processes.

As a sanity check, can you please see if the run-only-in-leader case
(max_parallel_workers = 0, so that we see "Workers Launched: 0")
produces the *same* 31 rows as the run-only-in-worker case
(force_parallel_mode = on, so that we see "Single Copy: true")? That
is, the actual values of those 31 rows, in particular the columns
coming from table_b.

--
Thomas Munro
http://www.enterprisedb.com

pgsql-bugs by date:

From: Thomas Munro
Date: 07 January 2019, 22:22:27
Subject: Re: BUG #15577: Query returns different results when executedmultiple times

From: Thomas Munro
Date: 07 January 2019, 23:38:48
Subject: Re: BUG #15577: Query returns different results when executedmultiple times

Re: BUG #15577: Query returns different results when executedmultiple times - Mailing list pgsql-bugs

Previous

Next