Home > mailing lists

Re: [HACKERS] Possible regression with gather merge. - Mailing list pgsql-hackers

From	Rushabh Lathia
Subject	Re: [HACKERS] Possible regression with gather merge.
Date	March 22, 2017 10:39:22
Msg-id	CAGPqQf1akRcSgC9=6iwx=sEPap9UvPpHJLzg8_N+OuHdb6fL+g@mail.gmail.com Whole thread
In response to	Re: [HACKERS] Possible regression with gather merge. (Rushabh Lathia <rushabh.lathia@gmail.com>)
Responses	Re: [HACKERS] Possible regression with gather merge. Re: [HACKERS] Possible regression with gather merge.
List	pgsql-hackers

Tree view

Hi,

postgres=# explain analyze select * from test order by v1, v2 limit 10;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=19576.71..19577.88 rows=10 width=16) (actual time=408.842..408.853 rows=10 loops=1)
   -> Gather Merge (cost=19576.71..116805.80 rows=833334 width=16) (actual time=408.841..408.850 rows=10 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         -> Sort (cost=18576.69..19618.36 rows=416667 width=16) (actual time=393.602..394.080 rows=911 loops=3)
               Sort Key: v1, v2
               Sort Method: external merge Disk: 8128kB
               -> Parallel Seq Scan on test (cost=0.00..9572.67 rows=416667 width=16) (actual time=0.053..46.238 rows=333333 loops=3)
Planning time: 0.118 ms
Execution time: 414.763 ms
(10 rows)

postgres=# set enable_gathermerge = off;
SET
postgres=# explain analyze select * from test order by v1, v2 limit 10;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit (cost=37015.64..37015.67 rows=10 width=16) (actual time=268.859..268.861 rows=10 loops=1)
   -> Sort (cost=37015.64..39515.64 rows=1000000 width=16) (actual time=268.858..268.859 rows=10 loops=1)
         Sort Key: v1, v2
         Sort Method: top-N heapsort Memory: 25kB
         -> Seq Scan on test (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.085..107.841 rows=1000000 loops=1)
Planning time: 0.163 ms
Execution time: 268.897 ms
(7 rows)

Looking at the explain analyze output of both the plan, its clear that GM
taking longer as its using external merge dist for the sort, where as
another plan perform top-N heapsort. For normal sort path, it can consider
the limit as bound, but for GM its not possible.

In the code, gather merge consider the limit for the sort into create_ordered_paths,
which is wrong. Into create_ordered_paths(), GM should not consider the limit
while doing costing for the sort node.

Attached patch fix the bug.

On Wed, Mar 22, 2017 at 12:05 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Thanks for reporting, I am looking into this.

On Wed, Mar 22, 2017 at 11:51 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
Adding more rows to table make gather merge execution time very slow
when compared to non-parallel plan we get after disabling gather
merge.

create table test as (select id, (random()*10000)::int as v1, random() as
v2 from generate_series(1,100000000) id);

postgres=# set max_parallel_workers_per_gather = default;
SET
postgres=# explain analyze select * from test order by v1, v2 limit 10;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1858610.53..1858611.70 rows=10 width=16) (actual
time=31103.880..31103.885 rows=10 loops=1)
-> Gather Merge (cost=1858610.53..11581520.05 rows=83333406
width=16) (actual time=31103.878..31103.882 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=1857610.50..1961777.26 rows=41666703
width=16) (actual time=30560.865..30561.046 rows=911 loops=3)
Sort Key: v1, v2
Sort Method: external merge Disk: 841584kB
-> Parallel Seq Scan on test (cost=0.00..957208.03
rows=41666703 width=16) (actual time=0.050..2330.275 rows=33333333
loops=3)
Planning time: 0.292 ms
Execution time: 31502.896 ms
(10 rows)

postgres=# set max_parallel_workers_per_gather = 0;
SET
postgres=# explain analyze select * from test order by v1, v2 limit 10;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3701507.83..3701507.85 rows=10 width=16) (actual
time=13231.264..13231.266 rows=10 loops=1)
-> Sort (cost=3701507.83..3951508.05 rows=100000088 width=16)
(actual time=13231.261..13231.262 rows=10 loops=1)
Sort Key: v1, v2
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on test (cost=0.00..1540541.88 rows=100000088
width=16) (actual time=0.045..6759.363 rows=100000000 loops=1)
Planning time: 0.131 ms
Execution time: 13231.299 ms
(7 rows)

On Wed, Mar 22, 2017 at 11:07 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> I accidently encountered a case where gather merge was picked as
> default but disabling same by setting max_parallel_workers_per_gather
> = 0; produced a non-parallel plan which was faster than gather merge,
> but its cost is marked too high when compared to gather merge.
>
> I guess we need some cost adjustment is planner code.
>
> Test setting
> =========
> create table test as (select id, (random()*10000)::int as v1, random() as
> v2 from generate_series(1,1000000) id);
> create index test_v1_idx on test (v1);
>
>
> Server setting is default.
>
>
> postgres=# explain analyze select * from test order by v1, v2 limit 10;
> QUERY
> PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=19576.71..19577.88 rows=10 width=16) (actual
> time=265.989..265.995 rows=10 loops=1)
> -> Gather Merge (cost=19576.71..116805.80 rows=833334 width=16)
> (actual time=265.987..265.992 rows=10 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> Sort (cost=18576.69..19618.36 rows=416667 width=16)
> (actual time=250.202..250.424 rows=911 loops=3)
> Sort Key: v1, v2
> Sort Method: external merge Disk: 9272kB
> -> Parallel Seq Scan on test (cost=0.00..9572.67
> rows=416667 width=16) (actual time=0.053..41.397 rows=333333 loops=3)
> Planning time: 0.193 ms
> Execution time: 271.222 ms
>
> postgres=# set max_parallel_workers_per_gather = 0;
> SET
> postgres=# explain analyze select * from test order by v1, v2 limit 10;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
> Limit (cost=37015.64..37015.67 rows=10 width=16) (actual
> time=211.582..211.584 rows=10 loops=1)
> -> Sort (cost=37015.64..39515.64 rows=1000000 width=16) (actual
> time=211.581..211.582 rows=10 loops=1)
> Sort Key: v1, v2
> Sort Method: top-N heapsort Memory: 25kB
> -> Seq Scan on test (cost=0.00..15406.00 rows=1000000
> width=16) (actual time=0.085..107.522 rows=1000000 loops=1)
> Planning time: 0.093 ms
> Execution time: 211.608 ms
> (7 rows)
>
>
>
> --
> Thanks and Regards
> Mithun C Y
> EnterpriseDB: http://www.enterprisedb.com

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Rushabh Lathia

regards,

Rushabh Lathia

www.EnterpriseDB.com

Attachment

gm_with_limit.patch

pgsql-hackers by date:

From: "Mengxing Liu"
Date: 22 March 2017, 10:24:50
Subject: Re: [HACKERS] Guidelines for GSoC student proposals / EliminateO(N^2) scaling from rw-conflict tracking in serializable transactions

From: Haribabu Kommi
Date: 22 March 2017, 10:47:37
Subject: Re: [WIP] RE: [HACKERS] DECLARE STATEMENT setting up a connection in ECPG

Re: [HACKERS] Possible regression with gather merge. - Mailing list pgsql-hackers

Attachment

Previous

Next