Re: Expose Parallelism counters planned/execute in pg_stat_statements - Mailing list pgsql-hackers

From: Anthony Sotolongo
Subject: Re: Expose Parallelism counters planned/execute in pg_stat_statements
Date:
Msg-id: 18bad111-2fbe-97dd-33f1-7b3fafa0e506@gmail.com
In response to: Re: Expose Parallelism counters planned/execute in pg_stat_statements (Julien Rouhaud <rjuju123@gmail.com>)
Responses: Re: Expose Parallelism counters planned/execute in pg_stat_statements
List: pgsql-hackers


On 22-07-22 12:08, Julien Rouhaud wrote:
Hi,

On Fri, Jul 22, 2022 at 11:17:52AM -0400, Anthony Sotolongo wrote:
On 21-07-22 20:35, Justin Pryzby wrote:
On Thu, Jul 21, 2022 at 06:26:58PM -0400, Anthony Sotolongo wrote:
Hi all:
Here's a patch to add counters to pg_stat_statements for how often queries are
planned/executed with parallelism, as a way to follow up on whether the queries
are actually planning/executing in parallel. This can help you understand whether
you have a good or bad configuration, or whether your hardware is sufficient.
+1, I was missing something like this before, but it didn't occur to me to use
PSS:
First of all, thanks for reviewing the patch and for the comments.


https://www.postgresql.org/message-id/20200310190142.GB29065@telsasoft.com
My hope is to answer questions like these:

. is query (ever? usually?) using parallel paths?
. is query usefully using parallel paths?
. what queries are my max_parallel_workers(_per_process) being used for ?
. Are certain long-running or frequently running queries which are using parallel paths using all max_parallel_workers and precluding other queries from using parallel query?  Or, are semi-short queries sometimes precluding long-running queries from using parallelism, when the long queries would better benefit?
This patch is storing the number of times the query was planned/executed using
parallelism, but not the number of workers.  Would it make sense to instead
store the *number* of workers launched/planned?  Otherwise, it might be
that a query is consistently planned to use a large number of workers, but then
runs with few.  I'm referring to the fields shown in "explain/analyze".  (Then,
the 2nd field should be renamed to "launched").
          Workers Planned: 2
          Workers Launched: 2
The main idea of the patch is to store the number of times the statements
were planned and executed in parallel, not the number of workers used in the
execution. Of course, what you mention can be helpful; we will review how it
can be achieved.
I think you would need both pieces of information.

With your current patch it only says whether the plan and execution had parallelism
enabled, but not whether parallelism could actually be used at all.  It gives
some information, but it's not that useful on its own.

The original idea of this patch was to identify when some of the circumstances
occur under which it is impossible to execute a parallel plan in parallel at
execution time, as mentioned in the documentation at [1].

For example:

Depending on the client configuration, the execution behavior can differ, and that can affect performance.

As you can see in the execution plans below:


From psql

            ->  Gather Merge  (cost=779747.43..795700.62 rows=126492 width=40) (actual time=1109.515..1472.369 rows=267351 loops=1)
                  Output: t.entity_node_id, t.configuration_id, t.stream_def_id, t.run_type_id, t.state_datetime, (PARTIAL count(1))
                  Workers Planned: 6
                  Workers Launched: 6
                  ->  Partial GroupAggregate  (cost=778747.33..779327.09 rows=21082 width=40) (actual time=889.129..974.028 rows=38193 loops=7)

From JDBC (from DBeaver)

            ->  Gather Merge  (cost=779747.43..795700.62 rows=126492 width=40) (actual time=4383.576..4385.856 rows=398 loops=1)
                  Output: t.entity_node_id, t.configuration_id, t.stream_def_id, t.run_type_id, t.state_datetime, (PARTIAL count(1))
                  Workers Planned: 6
                  Workers Launched: 0
                  ->  Partial GroupAggregate  (cost=778747.33..779327.09 rows=21082 width=40) (actual time=4383.574..4385.814 rows=398 loops=1)

This example was also discussed in this thread [2].
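
For what it's worth, the planned-but-not-launched symptom can also be reproduced
from plain psql through another of the runtime causes listed in [1]: making no
background workers available at execution time. A minimal sketch, assuming a
hypothetical table big_table and assuming the planner sizes the Gather only from
max_parallel_workers_per_gather:

    -- A parallel plan is still generated, but no workers can be obtained
    -- from the max_parallel_workers pool at execution time, so
    -- EXPLAIN ANALYZE reports "Workers Planned: N / Workers Launched: 0".
    SET max_parallel_workers_per_gather = 6;
    SET max_parallel_workers = 0;
    EXPLAIN (ANALYZE, VERBOSE)
        SELECT count(*) FROM big_table;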

With these PSS counters it will be easy to identify when some of these causes are happening; see the query sketch after the references below.
 [1] https://www.postgresql.org/docs/current/when-can-parallel-query-be-used.html

 [2] https://www.postgresql.org/message-id/flat/32277_1555482629_5CB6C805_32277_8_1_A971FB43DFBC3D4C859ACB3316C9FF4632D98B37%40OPEXCAUBM42.corporate.adroot.infra.ftgroup
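
For illustration, here is a sketch of how such counters could be consumed; the
column names parallelized_plans and parallelized_execs are hypothetical, not
necessarily what the patch will end up exposing:

    -- Statements that are planned with parallelism but rarely manage to
    -- run with it are candidates for checking the runtime causes in [1].
    SELECT query,
           calls,
           parallelized_plans,   -- times a parallel plan was chosen
           parallelized_execs    -- times the query actually ran in parallel
      FROM pg_stat_statements
     WHERE parallelized_plans > 0
     ORDER BY parallelized_plans - parallelized_execs DESC
     LIMIT 20;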


Also, a cumulative number of workers isn't really useful if you don't know what
fraction of the number of executions (or plannings) it refers to.

We will try to investigate how to do this.


That being said, I'm not sure how exactly the information about the number of
workers can be exposed, as there might be multiple gathers per plan and AFAIK
they can run at different parts of the query execution.  So in some cases having
a total of 3 workers planned means that you ideally needed 3 workers available
at the same time, and in other cases it might be only 2 or even 1.
