Thread: Why is Postgres only using 8 cores for partitioned count?
hi,

How can I convince Postgres to use more than 8 cores?

I've got an r6gd.16xlarge with 64 vCPUs and 512 GB RAM running Postgres 13.1 on Ubuntu 20.04.

CREATE TABLE tbl (
  [...]
) PARTITION BY HASH (address_key);

It has 64 partitions - e.g. FOR VALUES WITH (modulus 64, remainder 0) etc.

We're running `SELECT COUNT(*) FROM tbl`.

I've watched top and I never see more than 8 cores going 100%.

Here is my (admittedly ridiculous) postgresql.conf:

checkpoint_completion_target = 0.9
data_directory = '/tank/postgresql/13/main'
default_statistics_target = 100
effective_cache_size = 381696MB
effective_io_concurrency = 200
enable_partition_pruning = on
enable_partitionwise_aggregate = on
enable_partitionwise_join = on
listen_addresses = '*'
maintenance_work_mem = 2GB
max_connections = 200
max_parallel_maintenance_workers = 4
max_parallel_workers = 512
max_parallel_workers_per_gather = 512
max_wal_size = 4GB
max_worker_processes = 512
min_wal_size = 1GB
random_page_cost = 1.1
shared_buffers = 127232MB
shared_preload_libraries = 'cstore_fdw'
synchronous_commit = off
wal_buffers = 16MB
work_mem = 1628560kB

Best,
Seamus

--
Seamus Abshere, SCEA
https://faraday.ai
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere
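A quick way to see how many workers the planner is actually asking for here - a minimal sketch, reusing the `tbl` name from the message above:

EXPLAIN ANALYZE SELECT COUNT(*) FROM tbl;
-- the Gather node reports "Workers Planned: N" and "Workers Launched: N";
-- in this thread N tops out around 7, despite the 512-worker settings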
Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]
From
"Seamus Abshere"
Date:
hi,

I've traced this back to the formula for Parallel Append workers - log2(partitions).

The comment from Robert (src/backend/optimizer/path/allpaths.c) says:

/*
 * If the use of parallel append is permitted, always request at least
 * log2(# of children) workers.

In my case, every partition takes 1 second to scan, I have 64 cores, I have 64 partitions, and the wall time is 8 seconds with 8 workers.

I assume that if it planned significantly more workers (16? 32? even 64?), it would get significantly faster (even accounting for transaction cost). So why doesn't it ask for more? Note that I've set max_parallel_workers=512, etc. (postgresql.conf in my first message).

Here are full plans: https://gist.github.com/seamusabshere/1c1d869ee24145689f45e4124d8730d3

Best,
Seamus

On Fri, Feb 12, 2021, at 4:50 PM, Seamus Abshere wrote:
> hi,
>
> How can I convince Postgres to use more than 8 cores?
>
> [...]
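As a back-of-envelope check on that comment: the code beneath it floors the worker count at fls(# of children), i.e. floor(log2(n)) + 1, so 64 partitions come out at 7 - which matches the "Workers Planned: 7" in the plans further down the thread. In SQL:

SELECT floor(log(2, 64))::int + 1 AS min_append_workers;
-- => 7; with max_parallel_workers_per_gather high enough, this floor
-- is what actually gets planned for a 64-partition Parallel Append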
700% faster (was: Re: Why is Postgres only using 8 cores for partitioned count? [Parallel append])
From
"Seamus Abshere"
Date:
hi,

Apologies for the self-reply, but I made it 700% faster by compiling from source and forcing parallel_workers from 7 to 96. If this weren't a partitioned table, I could just do `alter table test_3pd_cstore_partitioned set (parallel_workers = 96)` - but that isn't currently supported.

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..0057a69d4e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1544,6 +1544,9 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 							   max_parallel_workers_per_gather);

 		Assert(parallel_workers > 0);

+		// force a crazy parallelism
+		parallel_workers = 96;
+
 		appendpath = create_append_path(root, rel, pa_nonpartial_subpaths,
 										pa_partial_subpaths,
 										NIL, NULL, parallel_workers, true,

BEFORE:

postgres=# explain analyze select count(*) from test_3pd_cstore_partitioned where age > 50;
 Finalize Aggregate  (cost=558860.96..558860.97 rows=1 width=8) (actual time=8528.509..8538.627 rows=1 loops=1)
   ->  Gather  (cost=56918.65..558860.94 rows=7 width=8) (actual time=1863.462..8538.603 rows=64 loops=1)
         Workers Planned: 7
         Workers Launched: 7
         ->  Parallel Append  (cost=55918.65..557860.24 rows=1 width=8) (actual time=1877.875..8417.486 rows=8 loops=8)
         [...]
 Execution Time: 8565.734 ms

AFTER:

postgres=# explain analyze select count(*) from test_3pd_cstore_partitioned where age > 50;
 Finalize Aggregate  (cost=57030.20..57030.21 rows=1 width=8) (actual time=1125.828..1129.958 rows=1 loops=1)
   ->  Gather  (cost=56611.92..57029.96 rows=96 width=8) (actual time=994.708..1129.940 rows=64 loops=1)
         Workers Planned: 96
         Workers Launched: 96
         ->  Parallel Append  (cost=55611.92..56020.36 rows=1 width=8) (actual time=656.027..656.051 rows=1 loops=97)
         [...]
 Execution Time: 1133.810 ms

Should I try to submit a patch that adds support for `alter table test_3pd_cstore_partitioned set (parallel_workers = 96)` for partitioned tables?

Best,
Seamus

PS. The awesome thing about this is that I'm table scanning a 270 million row, 600 column table in 1.2 seconds, which I never thought was possible.

PPS. I have no idea why 96 worked better than 64 (the number of cores I have), but it did - 700% vs 400%.

On Sat, Feb 13, 2021, at 7:09 PM, Seamus Abshere wrote:
> hi,
>
> I've traced this back to the formula for Parallel Append workers -
> log2(partitions).
>
> [...]
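One practical note for anyone reproducing the AFTER plan: the hardcoded 96 still has to clear the server's process limits, so settings like these (already 512 in the postgresql.conf from the first message) need to be in effect:

SET max_parallel_workers_per_gather = 512;  -- caps workers per Gather node
SET max_parallel_workers = 512;             -- caps parallel workers system-wide
-- max_worker_processes = 512 must be set in postgresql.conf (requires restart)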
Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]
From
David Rowley
Date:
On Sun, 14 Feb 2021 at 13:15, Seamus Abshere <sabshere@alumni.princeton.edu> wrote:
> The comment from Robert (src/backend/optimizer/path/allpaths.c) says:
>
> /*
>  * If the use of parallel append is permitted, always request at least
>  * log2(# of children) workers.
>
> In my case, every partition takes 1 second to scan, I have 64 cores, I have 64 partitions, and the wall time is 8 seconds with 8 workers.
>
> I assume that if it planned significantly more workers (16? 32? even 64?), it would get significantly faster (even accounting for transaction cost). So why doesn't it ask for more? Note that I've set max_parallel_workers=512, etc. (postgresql.conf in my first message).

There's perhaps an argument for allowing ALTER TABLE <partitioned table> SET (parallel_workers=N); to be set on partitioned tables, but we don't currently allow it.

What you might want to try is setting that for any of those 64 partitions. Shortly above the code comment that you quoted, there's some code that finds the path for the partition with the maximum number of parallel workers. If one of those partitions is using, say, 64 workers because you set that partition's "parallel_workers" setting to 64, and providing you have max_parallel_workers_per_gather set high enough, then your Append should get 64 workers.

You'll need to be careful though, since changing a partition's parallel_workers may affect things for other queries too. Also, if you were to only change 1 partition and that partition were to be pruned, then you'd not get the 64 workers.

David
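A minimal sketch of that workaround, with hypothetical child names (tbl_p0 standing in for whichever hash partition you pick):

-- raise one child's setting; Parallel Append takes the max over its children
ALTER TABLE tbl_p0 SET (parallel_workers = 64);
SET max_parallel_workers_per_gather = 64;
EXPLAIN SELECT COUNT(*) FROM tbl;  -- Gather should now plan 64 workers
-- per the caveat above: if tbl_p0 gets pruned, the boost disappears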
Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]
From
"Seamus Abshere"
Date:
Hi David,

That is a great suggestion, except my partitions are foreign tables.

I wonder if I could figure out a single empty partition that's just a normal table... but I'm using hash partitioning, so I think I would lose a modulus.

Best,
Seamus

--
Seamus Abshere, SCEA
+1 (608) 772-0696
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere

On Sun, Feb 14, 2021, at 4:47 AM, David Rowley wrote:
> There's perhaps an argument for allowing ALTER TABLE <partitioned
> table> SET (parallel_workers=N); to be set on partitioned tables, but
> we don't currently allow it.
>
> [...]
Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]
From
Gavin Flower
Date:
On 14/02/2021 22:47, David Rowley wrote:
> There's perhaps an argument for allowing ALTER TABLE <partitioned
> table> SET (parallel_workers=N); to be set on partitioned tables, but
> we don't currently allow it.
> [...]
> David

Just wondering why there is a hard coded limit.

While I agree it might be good to be able to specify the number of workers, surely it would be possible to derive a suitable default based on the number of effective processors available?

Cheers,
Gavin
Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]
From
Laurenz Albe
Date:
On Sun, 2021-02-14 at 22:47 +1300, David Rowley wrote:
> There's perhaps an argument for allowing ALTER TABLE <partitioned
> table> SET (parallel_workers=N); to be set on partitioned tables, but
> we don't currently allow it.

That would be great; I have been hit by this before.

> What you might want to try is setting that for any of those 64
> partitions. Shortly above the code comment that you quoted, there's
> some code that finds the path for the partition with the maximum
> number of parallel workers. If one of those partitions is using, say,
> 64 workers because you set that partition's "parallel_workers" setting
> to 64, and providing you have max_parallel_workers_per_gather set high
> enough, then your Append should get 64 workers.

Hmm - that didn't work when I tried it, but perhaps I should try again.

> You'll need to be careful though, since changing a partition's
> parallel_workers may affect things for other queries too. Also, if you
> were to only change 1 partition and that partition were to be pruned,
> then you'd not get the 64 workers.

I think this is an area where parallel query could be improved. One thing is run-time partition pruning: if the optimizer thinks that it will have to scan a lot of partitions, it will plan a lot of workers. But if the executor reduces that number to 1, we end up with way too many workers.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]
From
Fabio Pardi
Date:
On 14/02/2021 22:16, Gavin Flower wrote:
> On 14/02/2021 22:47, David Rowley wrote:
>> [...]
>
> Just wondering why there is a hard coded limit.
>
> While I agree it might be good to be able to specify the number of workers, surely it would be possible to derive a suitable default based on the number of effective processors available?

I had the same problem and my conclusion was that it is not possible to go above 8 cores because of Amdahl's law on parallel computing. More here: https://en.wikipedia.org/wiki/Amdahl%27s_law

regards,

fabio pardi
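For reference, Amdahl's law says the best speedup with N workers is 1 / ((1 - p) + p/N), where p is the parallelizable fraction of the work. A one-liner to tabulate it (p = 0.95 is an invented figure, purely for illustration):

-- Amdahl's law: speedup(N) = 1 / ((1 - p) + p / N), here with p = 0.95
SELECT n AS workers,
       round(1 / ((1 - 0.95) + 0.95 / n), 1) AS max_speedup
FROM (VALUES (1), (8), (16), (32), (64)) AS t(n);
-- => 1.0, 5.9, 9.1, 12.5, 15.4: diminishing returns, but no hard wall at 8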
Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]
From
David Rowley
Date:
On Mon, 15 Feb 2021 at 10:16, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
> Just wondering why there is a hard coded limit.

I don't see where the hardcoded limit is. The total number is limited to max_parallel_workers_per_gather, but there's nothing hardcoded about the value of that.

> While I agree it might be good to be able to specify the number of workers,
> surely it would be possible to derive a suitable default based on the
> number of effective processors available?

It's a pretty tricky thing to get right. The problem is that something has to rationalise the use of parallel worker processes. Does it seem reasonable to you to use the sum of the Append child parallel workers? If so, I imagine someone else would think that was pretty insane. We do have to consider the fact that we're trying to share those parallel worker processes with other backends which also might want to get some use out of them.

As for whether we need some reloption for partitioned tables, I think that's also tricky. Sure, we could go and add a "parallel_workers" relopt to partitioned tables, but then that's going to be applied regardless of how many partitions survive partition pruning. There could be as few as 2 subpaths in an Append, or the number could be in the thousands. I can't imagine anyone really wants the same number of parallel workers in each of those two cases. So I can understand why ab7271677 wanted to take into account the number of append children.

Maybe there's some room for some other relopt that just changes the behaviour of that code. It does not seem too unreasonable that someone might like to take the sum of the Append child parallel workers. That value would still be capped at max_parallel_workers_per_gather, so it shouldn't ever go too insane unless someone set that GUC to something insane, which would be their choice. I'm not sure what such a relopt would be called.

Additionally, for the case being reported here: since all the Append children are foreign tables, there is actually some work going on to make it so workers don't have to sit by and wait until the foreign server returns the results. I don't think anyone would disagree that it's pretty poor use of a parallel worker to have it sit there doing nothing for minutes at a time waiting for a single tuple from a foreign data wrapper. I'm not sure of the status of that work, but if you want to learn more about it, please see [1].

David

[1] https://commitfest.postgresql.org/32/2491/
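A toy illustration of that "sum of the children, capped by the GUC" idea (all numbers invented for the example):

-- 64 surviving children planning 2 workers each would sum to 128,
-- then be clamped by max_parallel_workers_per_gather (32 in this scenario):
SELECT least(sum(workers), 32) AS append_workers
FROM (SELECT 2 AS workers FROM generate_series(1, 64)) AS kids;
-- => 32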
Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]
From
David Rowley
Date:
On Tue, 16 Feb 2021 at 02:12, Fabio Pardi <f.pardi@portavita.eu> wrote:
>
> On 14/02/2021 22:16, Gavin Flower wrote:
> > While I agree it might be good to be able to specify the number of workers, surely it would be possible to derive a suitable default based on the number of effective processors available?
>
> I had the same problem and my conclusion was that it is not possible to go above 8 cores because of Amdahl's law on parallel computing. More here: https://en.wikipedia.org/wiki/Amdahl%27s_law

That would really depend on what the non-parallel part of the equation was. There are some plan shapes, such as GROUP BY or aggregate queries with very few or just 1 group, where the serial portion of the execution is very small indeed.

David
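Plugging a small serial fraction into the same formula as above makes the point concrete (the 0.1% serial share is an invented figure for a bare parallel count(*)):

-- p = 0.999: 64 workers get ~60x, close to linear - no ceiling at 8
SELECT round(1 / ((1 - 0.999) + 0.999 / 64), 1) AS max_speedup;  -- => 60.2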