Thread: pg13dev: explain partial, parallel hashagg, and memory use
I'm testing with a customer's data on pg13dev and got output for which Peak
Memory doesn't look right/useful.  I reproduced it on 565f16902.

CREATE TABLE p(i int) PARTITION BY RANGE(i);
CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (0)TO(1000);
CREATE TABLE p2 PARTITION OF p FOR VALUES FROM (1000)TO(2000);
CREATE TABLE p3 PARTITION OF p FOR VALUES FROM (2000)TO(3000);
INSERT INTO p SELECT i%3000 FROM generate_series(1,999999)i;
VACUUM ANALYZE p;

postgres=# explain(analyze,settings) SELECT i, COUNT(1) FROM p GROUP BY 1;
                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=7469.00..14214.45 rows=2502 width=12) (actual time=489.409..514.209 rows=3000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Append  (cost=6469.00..12964.25 rows=1251 width=12) (actual time=476.291..477.179 rows=1000 loops=3)
         ->  HashAggregate  (cost=6487.99..6497.99 rows=1000 width=12) (actual time=474.454..475.203 rows=1000 loops=1)
               Group Key: p.i
               Batches: 1  Memory Usage: 0kB
               Worker 0:  Batches: 1  Memory Usage: 193kB
               Worker 1:  Batches: 1  Memory Usage: 0kB
               ->  Seq Scan on p1 p  (cost=0.00..4817.99 rows=333999 width=4) (actual time=0.084..100.677 rows=333999 loops=1)
         ->  HashAggregate  (cost=6469.00..6479.00 rows=1000 width=12) (actual time=468.517..469.272 rows=1000 loops=1)
               Group Key: p_1.i
               Batches: 1  Memory Usage: 0kB
               Worker 0:  Batches: 1  Memory Usage: 0kB
               Worker 1:  Batches: 1  Memory Usage: 193kB
               ->  Seq Scan on p2 p_1  (cost=0.00..4804.00 rows=333000 width=4) (actual time=0.082..102.154 rows=333000 loops=1)
         ->  HashAggregate  (cost=6469.00..6479.00 rows=1000 width=12) (actual time=485.887..486.509 rows=1000 loops=1)
               Group Key: p_2.i
               Batches: 1  Memory Usage: 193kB
               Worker 0:  Batches: 1  Memory Usage: 0kB
               Worker 1:  Batches: 1  Memory Usage: 0kB
               ->  Seq Scan on p3 p_2  (cost=0.00..4804.00 rows=333000 width=4) (actual time=0.043..104.631 rows=333000 loops=1)
 Settings: effective_io_concurrency = '0', enable_partitionwise_aggregate = 'on', enable_partitionwise_join = 'on', work_mem = '127MB'

-- 
Justin
On Wed, 5 Aug 2020 at 13:21, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> I'm testing with a customer's data on pg13dev and got output for which Peak
> Memory doesn't look right/useful.  I reproduced it on 565f16902.

Likely the sanity of those results depends on whether you think that
the Memory Usage reported outside of the workers is meant to be the
sum of all processes or the memory usage for the leader backend.

All that's going on here is that the Parallel Append is using some
parallel safe paths and giving one to each worker. The 2 workers take
the first 2 subpaths and the leader takes the third. The memory usage
reported helps confirm that's the case.

Can you explain what you'd want to see changed about this? Or do you
want to see the non-parallel worker memory be the sum of all workers?
Sort does not seem to do that, so I'm not sure if we should consider
hash agg as an exception to that.

One thing I did notice from playing with this table is that Sort does
not show the memory used by the leader process when it didn't do any
of the work itself.

postgres=# set parallel_leader_participation = off;
SET
postgres=# explain analyze select i from p group by i;

   ->  Sort  (cost=59436.92..60686.92 rows=500000 width=4) (actual time=246.836..280.985 rows=500000 loops=2)
         Sort Key: p.i
         Worker 0:  Sort Method: quicksort  Memory: 27898kB
         Worker 1:  Sort Method: quicksort  Memory: 55842kB

Whereas with the leader helping out we get:

   ->  Sort  (cost=51284.39..52326.05 rows=416666 width=4) (actual time=191.814..213.418 rows=333333 loops=3)
         Sort Key: p.i
         Sort Method: quicksort  Memory: 33009kB
         Worker 0:  Sort Method: quicksort  Memory: 25287kB
         Worker 1:  Sort Method: quicksort  Memory: 25445kB

Maybe we should do the same for hash agg when the leader didn't assist?

David
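For context on why the leader line vanishes in the Sort case above:
show_sort_info() in src/backend/commands/explain.c only prints the
leader's own stats when the leader actually completed a sort. Roughly
(a sketch paraphrased from the PostgreSQL 13 sources from memory, so
treat the exact conditions and elided parts as approximate):

    static void
    show_sort_info(SortState *sortstate, ExplainState *es)
    {
        if (!es->analyze)
            return;

        /*
         * The leader's own "Sort Method ... Memory ..." line is printed
         * only when the leader finished a sort itself; with
         * parallel_leader_participation = off there is nothing to report,
         * so the line is omitted entirely.
         */
        if (sortstate->sort_Done && sortstate->tuplesortstate != NULL)
        {
            TuplesortInstrumentation stats;

            tuplesort_get_stats((Tuplesortstate *) sortstate->tuplesortstate,
                                &stats);
            /* ... format and print the leader's stats ... */
        }

        /* ... then a per-worker loop over sortstate->shared_info ... */
    }

This is the behaviour being proposed for Hash Agg later in the thread.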
On Tue, Aug 4, 2020 at 9:44 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 5 Aug 2020 at 13:21, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > I'm testing with a customer's data on pg13dev and got output for which Peak
> > Memory doesn't look right/useful.  I reproduced it on 565f16902.
>
> Likely the sanity of those results depends on whether you think that
> the Memory Usage reported outside of the workers is meant to be the
> sum of all processes or the memory usage for the leader backend.
>
> All that's going on here is that the Parallel Append is using some
> parallel safe paths and giving one to each worker. The 2 workers take
> the first 2 subpaths and the leader takes the third. The memory usage
> reported helps confirm that's the case.
>
> Can you explain what you'd want to see changed about this? Or do you
> want to see the non-parallel worker memory be the sum of all workers?
> Sort does not seem to do that, so I'm not sure if we should consider
> hash agg as an exception to that.

I've always found the way we report parallel workers in EXPLAIN quite
confusing. I realize it matches the actual implementation model (the
leader often is also "another worker"), but I think the natural
expectation from a user perspective would be that you'd show as
workers all backends (including the leader) that did work, and then
aggregate into a summary line (where the leader is displayed now). In
the current output there's nothing really to hint to the user that
the model is leader + workers and that the "summary" line is really
the leader.

If I were to design this from scratch, I'd want to propose doing what
I said above (summary aggregate line + treat the leader as a worker
line, likely with a "leader" tag), but that seems like a big change
to make now. On the other hand, perhaps designating what looks like a
summary line as the "leader" or some such would help clear up the
confusion?

Perhaps it could also say "Participating" or "Non-participating"?

James
On Wed, Aug 05, 2020 at 01:44:17PM +1200, David Rowley wrote:
> On Wed, 5 Aug 2020 at 13:21, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > I'm testing with a customer's data on pg13dev and got output for which Peak
> > Memory doesn't look right/useful.  I reproduced it on 565f16902.
>
> Likely the sanity of those results depends on whether you think that
> the Memory Usage reported outside of the workers is meant to be the
> sum of all processes or the memory usage for the leader backend.
>
> All that's going on here is that the Parallel Append is using some
> parallel safe paths and giving one to each worker. The 2 workers take
> the first 2 subpaths and the leader takes the third. The memory usage
> reported helps confirm that's the case.

I'm not sure there's a problem, but all the 0kB were suspicious to me.
I think you're saying that one worker alone handled each HashAgg, and
the other worker (and leader) show 0kB. I guess in my naive thinking
it's odd to show a worker which wasn't active for that subpath (at
least in text output). But I don't know the expected behavior of
parallel hashagg, so that explains most of my confusion.

On Tue, Aug 04, 2020 at 10:01:18PM -0400, James Coleman wrote:
> Perhaps it could also say "Participating" or "Non-participating"?

Yes, that'd help me a lot :)

Also odd (to me). If I encourage more workers, there are "slots" for each
"planned" worker, even though fewer were launched:

postgres=# ALTER TABLE p3 SET (parallel_workers=11);
postgres=# SET max_parallel_workers_per_gather=11;

 Finalize HashAggregate  (cost=10299.64..10329.64 rows=3000 width=12) (actual time=297.793..299.933 rows=3000 loops=1)
   Group Key: p.i
   Batches: 1  Memory Usage: 625kB
   ->  Gather  (cost=2928.09..10134.64 rows=33000 width=12) (actual time=233.398..282.429 rows=13000 loops=1)
         Workers Planned: 11
         Workers Launched: 7
         ->  Parallel Append  (cost=1928.09..5834.64 rows=3000 width=12) (actual time=214.358..232.980 rows=1625 loops=8)
               ->  Partial HashAggregate  (cost=1933.46..1943.46 rows=1000 width=12) (actual time=167.936..171.345 rows=1000 loops=4)
                     Group Key: p.i
                     Batches: 1  Memory Usage: 0kB
                     Worker 0:  Batches: 1  Memory Usage: 193kB
                     Worker 1:  Batches: 1  Memory Usage: 193kB
                     Worker 2:  Batches: 1  Memory Usage: 0kB
                     Worker 3:  Batches: 1  Memory Usage: 0kB
                     Worker 4:  Batches: 1  Memory Usage: 193kB
                     Worker 5:  Batches: 1  Memory Usage: 0kB
                     Worker 6:  Batches: 1  Memory Usage: 193kB
                     Worker 7:  Batches: 0  Memory Usage: 0kB
                     Worker 8:  Batches: 0  Memory Usage: 0kB
                     Worker 9:  Batches: 0  Memory Usage: 0kB
                     Worker 10:  Batches: 0  Memory Usage: 0kB

Thanks,
-- 
Justin
On Wed, 5 Aug 2020 at 14:13, Justin Pryzby <pryzby@telsasoft.com> wrote:
> Also odd (to me). If I encourage more workers, there are "slots" for each
> "planned" worker, even though fewer were launched:

Searching explain.c for "num_workers; " (note the trailing space) and
looking at each for loop that loops over the workers, quite a number
of those locations have a condition that skips the worker.

For example, show_sort_info() does:

    if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
        continue;       /* ignore any unfilled slots */

So maybe Hash Agg should be doing something similar. Additionally,
maybe it should not show the leader details if the leader didn't help.

David
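To make that concrete, here is a rough sketch of what the analogous
checks in show_hashagg_info() could look like. This is only an
illustration of the idea being discussed, not the patch posted
downthread, and the field and label names (hash_batches_used,
hash_mem_peak, "HashAgg Batches", and so on) are assumptions based on
the PostgreSQL 13 sources:

    /* Illustrative sketch only; not the patch attached downthread. */
    static void
    show_hashagg_info(AggState *aggstate, ExplainState *es)
    {
        /* ... AGG_HASHED/AGG_MIXED test and cost-related output elided ... */

        if (!es->analyze)
            return;

        /*
         * Only report the leader's own stats if the leader actually built a
         * hash table; otherwise "Batches: 1  Memory Usage: 0kB" is noise.
         */
        if (aggstate->hash_batches_used > 0)
        {
            int64   memPeakKb = (aggstate->hash_mem_peak + 1023) / 1024;

            ExplainPropertyInteger("HashAgg Batches", NULL,
                                   aggstate->hash_batches_used, es);
            ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
            ExplainPropertyInteger("Disk Usage", "kB",
                                   aggstate->hash_disk_used, es);
        }

        /* Per-worker stats: skip instrumentation slots that were never filled. */
        if (aggstate->shared_info != NULL)
        {
            for (int n = 0; n < aggstate->shared_info->num_workers; n++)
            {
                AggregateInstrumentation *sinstrument;

                sinstrument = &aggstate->shared_info->sinstrument[n];
                if (sinstrument->hash_batches_used == 0)
                    continue;   /* ignore any unfilled slots */

                /* ... ExplainOpenWorker(n, es), print stats, ExplainCloseWorker(es) ... */
            }
        }
    }

The two checks mirror show_sort_info(): the first suppresses the
leader's own line when it did no hash aggregation, the second skips
worker slots that were planned but never used.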
On Wed, 5 Aug 2020 at 14:27, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 5 Aug 2020 at 14:13, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > Also odd (to me). If I encourage more workers, there are "slots" for each
> > "planned" worker, even though fewer were launched:
>
> Searching explain.c for "num_workers; " (note the trailing space) and
> looking at each for loop that loops over the workers, quite a number
> of those locations have a condition that skips the worker.
>
> For example, show_sort_info() does:
>
>     if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
>         continue;       /* ignore any unfilled slots */
>
> So maybe Hash Agg should be doing something similar. Additionally,
> maybe it should not show the leader details if the leader didn't help.

Here's what I had in mind. The unpatched format got even more broken
with EXPLAIN (ANALYZE, VERBOSE), so this is certainly a bug fix.

David
On Wed, 5 Aug 2020 at 17:25, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 5 Aug 2020 at 14:27, David Rowley <dgrowleyml@gmail.com> wrote:
> > So maybe Hash Agg should be doing something similar. Additionally,
> > maybe it should not show the leader details if the leader didn't help.
>
> Here's what I had in mind.

Just coming back to this. I'd like to push it soon, but it's currently
late here. I'll look at pushing it in my morning, in about 8 hours'
time. If anyone has any comments, please let me know before then.

David
On Fri, 7 Aug 2020 at 00:44, David Rowley <dgrowleyml@gmail.com> wrote:
> Just coming back to this. I'd like to push it soon, but it's currently
> late here. I'll look at pushing it in my morning, in about 8 hours'
> time.

Pushed.

David