Re: [HACKERS] Gather Merge - Mailing list pgsql-hackers

From: Rushabh Lathia
Subject: Re: [HACKERS] Gather Merge
Msg-id: CAGPqQf3D1bPT4ONxkpWAX+BevpgY5mgP_Ucu_OjFknBByLy_AQ@mail.gmail.com
In response to: Re: [HACKERS] Gather Merge (Rushabh Lathia <rushabh.lathia@gmail.com>)
Responses: Re: [HACKERS] Gather Merge
List: pgsql-hackers


On Tue, Jan 17, 2017 at 5:19 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


On Fri, Jan 13, 2017 at 10:52 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


On Thu, Jan 12, 2017 at 8:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Dec 4, 2016 at 7:36 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> On Thu, Nov 24, 2016 at 11:12 PM, Rushabh Lathia <rushabh.lathia@gmail.com>
> wrote:
>> PFA latest patch with fix as well as few cosmetic changes.
>
> Moved to next CF with "needs review" status.

I spent quite a bit of time on this patch over the last couple of
days.  I was hoping to commit it, but I think it's not quite ready for
that yet and I hit a few other issues along the way.  Meanwhile,
here's an updated version with the following changes:

* Adjusted cost_gather_merge because we don't need to worry about less
than 1 worker.
* Don't charge double maintenance cost of the heap per 34ca0905.  This
was pointed out previously and Rushabh said it was fixed, but it wasn't
fixed in v5.
* cost_gather_merge claimed to charge a slightly higher IPC cost
because we have to block, but didn't.  Fix it so it does.
* Move several hunks to more appropriate places in the file, near
related code or in a more logical position relative to surrounding
code.
* Fixed copyright dates for the new files.  One said 2015, one said 2016.
* Removed unnecessary code from create_gather_merge_plan that tried to
handle an empty list of pathkeys (shouldn't happen).
* Make create_gather_merge_plan more consistent with
create_merge_append_plan.  Remove make_gather_merge for the same
reason.
* Changed generate_gather_paths to generate gather merge paths.  In
the previous coding, only the upper planner nodes ever tried to
generate gather merge nodes, but that seems unnecessarily limiting,
since it could be useful to generate a gathered path with pathkeys at
any point in the tree where we'd generate a gathered path with no
pathkeys.
* Rewrote generate_ordered_paths() logic to consider only the one
potentially-useful path not now covered by the new code in
generate_gather_paths().
* Reverted changes in generate_distinct_paths().  I think we should
add something here but the existing logic definitely isn't right
considering the change to generate_gather_paths().
* Assorted cosmetic cleanup in nodeGatherMerge.c.
* Documented the new GUC enable_gathermerge.
* Improved comments.  Dropped one that seemed unnecessary.
* Fixed parts of the patch to be more pgindent-clean.


Thanks, Robert, for hacking on this.
 
Testing this against the TPC-H queries at 10GB with
max_parallel_workers_per_gather = 4, seq_page_cost = 0.1,
random_page_cost = 0.1, work_mem = 64MB initially produced somewhat
demoralizing results.  Only Q17, Q4, and Q8 picked Gather Merge, and
of those only Q17 got faster.  Investigating this led to me realizing
that join costing for parallel joins is all messed up: see
https://www.postgresql.org/message-id/CA+TgmoYt2pyk2CTyvYCtFySXN=jsorGh8_MJTTLoWU5qkJOkYQ@mail.gmail.com
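
The settings above, expressed as session-level SETs (they could just as
well be put in postgresql.conf instead):

set max_parallel_workers_per_gather = 4;
set seq_page_cost = 0.1;
set random_page_cost = 0.1;
set work_mem = '64MB';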

With that patch applied, in my testing, Gather Merge got picked for
Q3, Q4, Q5, Q6, Q7, Q8, Q10, and Q17, but a lot of those queries get a
little slower instead of a little faster.  Here are the timings --
these are with EXPLAIN ANALYZE, so take them with a grain of salt --
first number is without Gather Merge, second is with Gather Merge:

Q3 16943.938 ms -> 18645.957 ms
Q4 3155.350 ms -> 4179.431 ms
Q5 13611.484 ms -> 13831.946 ms
Q6 9264.942 ms -> 8734.899 ms
Q7 9759.026 ms -> 10007.307 ms
Q8 2473.899 ms -> 2459.225 ms
Q10 13814.950 ms -> 12255.618 ms
Q17 49552.298 ms -> 46633.632 ms


This is strange; I will re-run the tests and post the results soon.


Here are the benchmark numbers I got with the latest (v6) patch:

- max_worker_processes = DEFAULT (8)
- max_parallel_workers_per_gather = 4 (see the snippet after this list)
- A cold-cache environment is ensured: before every query execution the
  server is stopped and the OS caches are dropped.
- The reported execution times (in ms) are the median of 3 executions.
- power2 machine with 512GB of RAM
- Otherwise default postgresql.conf
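
The one non-default setting can be applied per session like this (shown
with SET for convenience; setting it in postgresql.conf works equally well):

set max_parallel_workers_per_gather = 4;   -- max_worker_processes stays at its default of 8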

Timing with v6 patch on REL9_6_STABLE branch
(last commit: 8a70d8ae7501141d283e56b31e10c66697c986d5).

Query 3: 49888.300 -> 45914.426
Query 4: 8531.939 -> 7790.498
Query 5: 40668.920 -> 38403.658
Query 9: 90922.825 -> 50277.646
Query 10: 45776.445 -> 39189.086
Query 12: 21644.593 -> 21180.402
Query 15: 63889.061 -> 62027.351
Query 17: 142208.463 -> 118704.424
Query 18: 244051.155 -> 186498.456
Query 20: 212046.605 -> 159360.520

Timing with v6 patch on master branch:
(last commit: 0777f7a2e8e0a51f0f60cfe164d538bb459bf9f2)

Query 3: 45261.722 -> 43499.739
Query 4: 7444.630 -> 6363.999
Query 5: 37146.458 -> 37081.952
Query 9: 88874.243 -> 50232.088
Query 10: 43583.133 -> 38118.195
Query 12: 19918.149 -> 20414.114
Query 15: 62554.860 -> 61039.048
Query 17: 131369.235 -> 111587.287
Query 18: 246162.686 -> 195434.292
Query 20: 201221.952 -> 169093.834

Looking at these results, it seems like the patch is good to go. I did notice
that in your TPC-H run, queries 9, 18 and 20 did not pick a gather merge
plan, and those are the queries that benefit the most from gather merge.
Another observation: if work_mem is set high, some queries end up picking
HashAggregate even though gather merge performs better (I verified that
manually by forcing gather merge). I am still looking into this issue.
 

I am able to reproduce the issue with a smaller case, where gather merge
is not getting picked over hash aggregate.

Consider the following cases:

Testcase setup:

1) ./db/bin/pgbench postgres -i -F 100 -s 20

2) update pgbench_accounts set filler = 'foo' where aid%10 = 0;
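
With scale factor 20, pgbench_accounts has 2,000,000 rows, so step 2 marks
200,000 of them (every tenth aid). A quick sanity check, if you want one:

select count(*) from pgbench_accounts where filler like '%foo%';   -- expect 200000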

Example:

postgres=# show shared_buffers ;
 shared_buffers
----------------
 1GB
(1 row)

postgres=# show work_mem ;
 work_mem
----------
 64MB
(1 row)

1) Case 1:

postgres=# explain analyze select aid, sum(abalance) from pgbench_accounts where filler like '%foo%' group by aid;
                                                            QUERY PLAN                                                           
----------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=62081.49..64108.32 rows=202683 width=12) (actual time=1017.802..1079.324 rows=200000 loops=1)
   Group Key: aid
   ->  Seq Scan on pgbench_accounts  (cost=0.00..61068.07 rows=202683 width=8) (actual time=738.439..803.310 rows=200000 loops=1)
         Filter: (filler ~~ '%foo%'::text)
         Rows Removed by Filter: 1800000
 Planning time: 0.189 ms
 Execution time: 1094.933 ms
(7 rows)


2) Case 2:

postgres=# set enable_hashagg = off;
SET
postgres=# set enable_gathermerge = off;
SET
postgres=# explain analyze select aid, sum(abalance) from pgbench_accounts where filler like '%foo%' group by aid;
                                                               QUERY PLAN                                                              
----------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=78933.43..82480.38 rows=202683 width=12) (actual time=980.983..1097.461 rows=200000 loops=1)
   Group Key: aid
   ->  Sort  (cost=78933.43..79440.14 rows=202683 width=8) (actual time=980.975..1006.891 rows=200000 loops=1)
         Sort Key: aid
         Sort Method: quicksort  Memory: 17082kB
         ->  Seq Scan on pgbench_accounts  (cost=0.00..61068.07 rows=202683 width=8) (actual time=797.553..867.359 rows=200000 loops=1)
               Filter: (filler ~~ '%foo%'::text)
               Rows Removed by Filter: 1800000
 Planning time: 0.152 ms
 Execution time: 1111.742 ms
(10 rows)

3)  Case 3:

postgres=# set enable_hashagg = off;
SET
postgres=# set enable_gathermerge = on;
SET
postgres=# explain analyze select aid, sum(abalance) from pgbench_accounts where filler like '%foo%' group by aid;
                                                                        QUERY PLAN                                                                        
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=47276.23..76684.51 rows=202683 width=12) (actual time=287.383..542.064 rows=200000 loops=1)
   Group Key: aid
   ->  Gather Merge  (cost=47276.23..73644.26 rows=202684 width=0) (actual time=287.375..441.698 rows=200000 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial GroupAggregate  (cost=46276.17..47162.91 rows=50671 width=12) (actual time=278.801..305.772 rows=40000 loops=5)
               Group Key: aid
               ->  Sort  (cost=46276.17..46402.85 rows=50671 width=8) (actual time=278.792..285.111 rows=40000 loops=5)
                     Sort Key: aid
                     Sort Method: quicksort  Memory: 9841kB
                     ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..42316.52 rows=50671 width=8) (actual time=206.602..223.203 rows=40000 loops=5)
                           Filter: (filler ~~ '%foo%'::text)
                           Rows Removed by Filter: 360000
 Planning time: 0.251 ms
 Execution time: 553.569 ms
(15 rows)

Now, in the above case we can clearly see that GM performs way better, but
the planner still chooses HashAggregate because the cost of hash aggregate
is lower compared to GM.

Another observation is that HashAggregate (case 1) performs better compared
to GroupAggregate (case 2), but the runtime difference still doesn't justify
the cost difference between the two.

-- Cost difference
postgres=# select (82480.38 - 64108.32)/64108.32;
        ?column?       
------------------------
 0.28657840355198825987
(1 row)

-- Execution time
postgres=# select (1111.742 - 1094.933) / 1094.933;
        ?column?       
------------------------
 0.01535162425463475847
(1 row)
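
Doing the same comparison between Gather Merge (case 3) and HashAggregate
(case 1) makes the mismatch even clearer: the GM plan is costed roughly 20%
higher, yet it runs in about half the time (numbers taken from the plans
above):

-- Cost difference
select (76684.51 - 64108.32) / 64108.32;     -- ~0.196
-- Execution time
select (553.569 - 1094.933) / 1094.933;      -- ~-0.494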

It might be a problem with the HashAggregate costing, or something else. I am
still looking into this problem.


--
Rushabh Lathia
