Thread: variation of row_number with parallel

variation of row_number with parallel

From

Rajkumar Raghuwanshi

Date:

14 April 2020, 06:58:50

Hi,

I have observed row_number() is giving different results when query executed in parallel. is this expected w.r.t parallel execution.

CREATE TABLE tbl1 (c1 INT) partition by list (c1);

CREATE TABLE tbl1_p1 partition of tbl1 FOR VALUES IN (10);
CREATE TABLE tbl1_p2 partition of tbl1 FOR VALUES IN (20);
CREATE TABLE tbl1_p3 partition of tbl1 FOR VALUES IN (30);

CREATE TABLE tbl2 (c1 INT, c2 INT,c3 INT) partition by list (c1);
CREATE TABLE tbl2_p1 partition of tbl2 FOR VALUES IN (1);
CREATE TABLE tbl2_p2 partition of tbl2 FOR VALUES IN (2);
CREATE TABLE tbl2_p3 partition of tbl2 FOR VALUES IN (3);
CREATE TABLE tbl2_p4 partition of tbl2 FOR VALUES IN (4);
CREATE TABLE tbl2_p5 partition of tbl2 FOR VALUES IN (5);

INSERT INTO tbl1 VALUES (10),(20),(30);

INSERT INTO tbl2 VALUES (1,100,20),(2,200,10),(3,100,20),(4,100,30),(5,100,10);

postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3;
QUERY PLAN
---------------------------------------------------------------------------------------
WindowAgg (cost=1520.35..12287.73 rows=390150 width=12)
-> Merge Join (cost=1520.35..7410.85 rows=390150 width=4)
Merge Cond: (d.c1 = e.c3)
-> Sort (cost=638.22..657.35 rows=7650 width=4)
Sort Key: d.c1
-> Append (cost=0.00..144.75 rows=7650 width=4)
-> Seq Scan on tbl1_p1 d_1 (cost=0.00..35.50 rows=2550 width=4)
-> Seq Scan on tbl1_p2 d_2 (cost=0.00..35.50 rows=2550 width=4)
-> Seq Scan on tbl1_p3 d_3 (cost=0.00..35.50 rows=2550 width=4)
-> Sort (cost=882.13..907.63 rows=10200 width=8)
Sort Key: e.c3
-> Append (cost=0.00..203.00 rows=10200 width=8)
-> Seq Scan on tbl2_p1 e_1 (cost=0.00..30.40 rows=2040 width=8)
-> Seq Scan on tbl2_p2 e_2 (cost=0.00..30.40 rows=2040 width=8)
-> Seq Scan on tbl2_p3 e_3 (cost=0.00..30.40 rows=2040 width=8)
-> Seq Scan on tbl2_p4 e_4 (cost=0.00..30.40 rows=2040 width=8)
-> Seq Scan on tbl2_p5 e_5 (cost=0.00..30.40 rows=2040 width=8)
(17 rows)

postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3;
c2 | row_number
-----+------------
200 | 1
100 | 2
100 | 3
100 | 4
100 | 5
(5 rows)

postgres=#
postgres=# set parallel_setup_cost = 0;
SET
postgres=# set parallel_tuple_cost = 0;
SET
postgres=#
postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3;
QUERY PLAN
------------------------------------------------------------------------------------------------------
WindowAgg (cost=130.75..7521.21 rows=390150 width=12)
-> Gather (cost=130.75..2644.34 rows=390150 width=4)
Workers Planned: 2
-> Parallel Hash Join (cost=130.75..2644.34 rows=162562 width=4)
Hash Cond: (e.c3 = d.c1)
-> Parallel Append (cost=0.00..131.25 rows=4250 width=8)
-> Parallel Seq Scan on tbl2_p1 e_1 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Seq Scan on tbl2_p2 e_2 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Seq Scan on tbl2_p3 e_3 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Seq Scan on tbl2_p4 e_4 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Seq Scan on tbl2_p5 e_5 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Hash (cost=90.93..90.93 rows=3186 width=4)
-> Parallel Append (cost=0.00..90.93 rows=3186 width=4)
-> Parallel Seq Scan on tbl1_p1 d_1 (cost=0.00..25.00 rows=1500 width=4)
-> Parallel Seq Scan on tbl1_p2 d_2 (cost=0.00..25.00 rows=1500 width=4)
-> Parallel Seq Scan on tbl1_p3 d_3 (cost=0.00..25.00 rows=1500 width=4)
(16 rows)

postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3;
c2 | row_number
-----+------------
100 | 1
100 | 2
100 | 3
200 | 4
100 | 5
(5 rows)

Thanks & Regards,

Rajkumar Raghuwanshi

Re: variation of row_number with parallel

From

Pavel Stehule

Date:

14 April 2020, 07:08:29

út 14. 4. 2020 v 5:59 odesílatel Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> napsal:

Hi,

I have observed row_number() is giving different results when query executed in parallel. is this expected w.r.t parallel execution.

CREATE TABLE tbl1 (c1 INT) partition by list (c1);
CREATE TABLE tbl1_p1 partition of tbl1 FOR VALUES IN (10);
CREATE TABLE tbl1_p2 partition of tbl1 FOR VALUES IN (20);
CREATE TABLE tbl1_p3 partition of tbl1 FOR VALUES IN (30);

CREATE TABLE tbl2 (c1 INT, c2 INT,c3 INT) partition by list (c1);
CREATE TABLE tbl2_p1 partition of tbl2 FOR VALUES IN (1);
CREATE TABLE tbl2_p2 partition of tbl2 FOR VALUES IN (2);
CREATE TABLE tbl2_p3 partition of tbl2 FOR VALUES IN (3);
CREATE TABLE tbl2_p4 partition of tbl2 FOR VALUES IN (4);
CREATE TABLE tbl2_p5 partition of tbl2 FOR VALUES IN (5);

INSERT INTO tbl1 VALUES (10),(20),(30);

INSERT INTO tbl2 VALUES (1,100,20),(2,200,10),(3,100,20),(4,100,30),(5,100,10);

postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3;
QUERY PLAN
---------------------------------------------------------------------------------------
WindowAgg (cost=1520.35..12287.73 rows=390150 width=12)
-> Merge Join (cost=1520.35..7410.85 rows=390150 width=4)
Merge Cond: (d.c1 = e.c3)
-> Sort (cost=638.22..657.35 rows=7650 width=4)
Sort Key: d.c1
-> Append (cost=0.00..144.75 rows=7650 width=4)
-> Seq Scan on tbl1_p1 d_1 (cost=0.00..35.50 rows=2550 width=4)
-> Seq Scan on tbl1_p2 d_2 (cost=0.00..35.50 rows=2550 width=4)
-> Seq Scan on tbl1_p3 d_3 (cost=0.00..35.50 rows=2550 width=4)
-> Sort (cost=882.13..907.63 rows=10200 width=8)
Sort Key: e.c3
-> Append (cost=0.00..203.00 rows=10200 width=8)
-> Seq Scan on tbl2_p1 e_1 (cost=0.00..30.40 rows=2040 width=8)
-> Seq Scan on tbl2_p2 e_2 (cost=0.00..30.40 rows=2040 width=8)
-> Seq Scan on tbl2_p3 e_3 (cost=0.00..30.40 rows=2040 width=8)
-> Seq Scan on tbl2_p4 e_4 (cost=0.00..30.40 rows=2040 width=8)
-> Seq Scan on tbl2_p5 e_5 (cost=0.00..30.40 rows=2040 width=8)
(17 rows)

postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3;
c2 | row_number
-----+------------
200 | 1
100 | 2
100 | 3
100 | 4
100 | 5
(5 rows)

postgres=#
postgres=# set parallel_setup_cost = 0;
SET
postgres=# set parallel_tuple_cost = 0;
SET
postgres=#
postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3;
QUERY PLAN
------------------------------------------------------------------------------------------------------
WindowAgg (cost=130.75..7521.21 rows=390150 width=12)
-> Gather (cost=130.75..2644.34 rows=390150 width=4)
Workers Planned: 2
-> Parallel Hash Join (cost=130.75..2644.34 rows=162562 width=4)
Hash Cond: (e.c3 = d.c1)
-> Parallel Append (cost=0.00..131.25 rows=4250 width=8)
-> Parallel Seq Scan on tbl2_p1 e_1 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Seq Scan on tbl2_p2 e_2 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Seq Scan on tbl2_p3 e_3 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Seq Scan on tbl2_p4 e_4 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Seq Scan on tbl2_p5 e_5 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Hash (cost=90.93..90.93 rows=3186 width=4)
-> Parallel Append (cost=0.00..90.93 rows=3186 width=4)
-> Parallel Seq Scan on tbl1_p1 d_1 (cost=0.00..25.00 rows=1500 width=4)
-> Parallel Seq Scan on tbl1_p2 d_2 (cost=0.00..25.00 rows=1500 width=4)
-> Parallel Seq Scan on tbl1_p3 d_3 (cost=0.00..25.00 rows=1500 width=4)
(16 rows)

postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3;
c2 | row_number
-----+------------
100 | 1
100 | 2
100 | 3
200 | 4
100 | 5
(5 rows)

there are not ORDER BY clause, so order is not defined - paralel hash join surely doesn't ensure a order.

I think so this behave is expected.

Regards

Pavel

Thanks & Regards,
Rajkumar Raghuwanshi

Re: variation of row_number with parallel

From

Rajkumar Raghuwanshi

Date:

14 April 2020, 07:18:42

On Tue, Apr 14, 2020 at 9:39 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:

út 14. 4. 2020 v 5:59 odesílatel Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> napsal:
Hi,

I have observed row_number() is giving different results when query executed in parallel. is this expected w.r.t parallel execution.

CREATE TABLE tbl1 (c1 INT) partition by list (c1);
CREATE TABLE tbl1_p1 partition of tbl1 FOR VALUES IN (10);
CREATE TABLE tbl1_p2 partition of tbl1 FOR VALUES IN (20);
CREATE TABLE tbl1_p3 partition of tbl1 FOR VALUES IN (30);

CREATE TABLE tbl2 (c1 INT, c2 INT,c3 INT) partition by list (c1);
CREATE TABLE tbl2_p1 partition of tbl2 FOR VALUES IN (1);
CREATE TABLE tbl2_p2 partition of tbl2 FOR VALUES IN (2);
CREATE TABLE tbl2_p3 partition of tbl2 FOR VALUES IN (3);
CREATE TABLE tbl2_p4 partition of tbl2 FOR VALUES IN (4);
CREATE TABLE tbl2_p5 partition of tbl2 FOR VALUES IN (5);

INSERT INTO tbl1 VALUES (10),(20),(30);

INSERT INTO tbl2 VALUES (1,100,20),(2,200,10),(3,100,20),(4,100,30),(5,100,10);

postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3;
QUERY PLAN
---------------------------------------------------------------------------------------
WindowAgg (cost=1520.35..12287.73 rows=390150 width=12)
-> Merge Join (cost=1520.35..7410.85 rows=390150 width=4)
Merge Cond: (d.c1 = e.c3)
-> Sort (cost=638.22..657.35 rows=7650 width=4)
Sort Key: d.c1
-> Append (cost=0.00..144.75 rows=7650 width=4)
-> Seq Scan on tbl1_p1 d_1 (cost=0.00..35.50 rows=2550 width=4)
-> Seq Scan on tbl1_p2 d_2 (cost=0.00..35.50 rows=2550 width=4)
-> Seq Scan on tbl1_p3 d_3 (cost=0.00..35.50 rows=2550 width=4)
-> Sort (cost=882.13..907.63 rows=10200 width=8)
Sort Key: e.c3
-> Append (cost=0.00..203.00 rows=10200 width=8)
-> Seq Scan on tbl2_p1 e_1 (cost=0.00..30.40 rows=2040 width=8)
-> Seq Scan on tbl2_p2 e_2 (cost=0.00..30.40 rows=2040 width=8)
-> Seq Scan on tbl2_p3 e_3 (cost=0.00..30.40 rows=2040 width=8)
-> Seq Scan on tbl2_p4 e_4 (cost=0.00..30.40 rows=2040 width=8)
-> Seq Scan on tbl2_p5 e_5 (cost=0.00..30.40 rows=2040 width=8)
(17 rows)

postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3;
c2 | row_number
-----+------------
200 | 1
100 | 2
100 | 3
100 | 4
100 | 5
(5 rows)

postgres=#
postgres=# set parallel_setup_cost = 0;
SET
postgres=# set parallel_tuple_cost = 0;
SET
postgres=#
postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3;
QUERY PLAN
------------------------------------------------------------------------------------------------------
WindowAgg (cost=130.75..7521.21 rows=390150 width=12)
-> Gather (cost=130.75..2644.34 rows=390150 width=4)
Workers Planned: 2
-> Parallel Hash Join (cost=130.75..2644.34 rows=162562 width=4)
Hash Cond: (e.c3 = d.c1)
-> Parallel Append (cost=0.00..131.25 rows=4250 width=8)
-> Parallel Seq Scan on tbl2_p1 e_1 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Seq Scan on tbl2_p2 e_2 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Seq Scan on tbl2_p3 e_3 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Seq Scan on tbl2_p4 e_4 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Seq Scan on tbl2_p5 e_5 (cost=0.00..22.00 rows=1200 width=8)
-> Parallel Hash (cost=90.93..90.93 rows=3186 width=4)
-> Parallel Append (cost=0.00..90.93 rows=3186 width=4)
-> Parallel Seq Scan on tbl1_p1 d_1 (cost=0.00..25.00 rows=1500 width=4)
-> Parallel Seq Scan on tbl1_p2 d_2 (cost=0.00..25.00 rows=1500 width=4)
-> Parallel Seq Scan on tbl1_p3 d_3 (cost=0.00..25.00 rows=1500 width=4)
(16 rows)

postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where d.c1=e.c3;
c2 | row_number
-----+------------
100 | 1
100 | 2
100 | 3
200 | 4
100 | 5
(5 rows)

there are not ORDER BY clause, so order is not defined - paralel hash join surely doesn't ensure a order.
I think so this behave is expected.

thanks.

Regards

Pavel

Thanks & Regards,
Rajkumar Raghuwanshi