Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes - Mailing list pgsql-hackers

From Guillaume Lelarge
Subject Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
Date
Msg-id CAECtzeWb15z6UJcvRx14Pp0Z5YC5KYq2rGd__W_Y=QH-QFF4ug@mail.gmail.com
In response to Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
List pgsql-hackers

On Mon, Oct 7, 2024 at 02:41, Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Oct 07, 2024 at 12:43:18AM +0300, Alena Rybakina wrote:
> Maybe I'm not aware of the whole context of the thread and my
> questions may seem a bit stupid, but honestly it's not entirely clear
> to me how these statistics will help to adjust the number of parallel
> workers.
> We may have situations where, because the cardinality was
> overestimated during query optimization, more parallel workers were
> launched than were justified, and vice versa: because of a high
> workload, only a few workers were launched.
> How do we identify such cases, so as not to increase or decrease the
> number of parallel workers when it is not necessary?

Well.  For spiky workloads, these numbers alone are not going to help.
They become useful if you can map them to the number of times a query
related to these tables has been called, something that
pg_stat_statements would be able to show more information about.

FWIW, I have doubts that these numbers attached to this portion of the
system are always useful.  For OLTP workloads, parallel workers are
unlikely to be spawned because even with JOINs we won't work with a
high number of tuples that require them.  This could be interesting
for analytics, however complex query sequences mean that we'd still
need to look at all the plans involving the relations where there is
an imbalance of planned/spawned workers, because these can usually
involve quite a few Gather nodes.  At the end of the day, it seems to
me that we would still need data that involves statements to track
down specific plans that are starving.  If your application does not
have that many statements, looking at individual plans is OK, but if
you have hundreds of them to dig into, this is time-consuming, and
stats at table/index level don't offer data in terms of stuff that
stands out and needs adjustments.

And that is without counting the argument of bloating the stats
entries further for each table, even if it matters less now that
these stats live in shared memory.

We need granularity because we have granularity in the config. There is pg_stat_database because it gives the whole picture and it is easier to monitor. And then, there is pg_stat_statements to analyze problematic statements. And finally there is pg_stat_all* because you can set parallel_workers on a specific table.
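To illustrate (a rough sketch: the parallel_seq_scan column name and the "orders" table are only assumptions for the sake of the example, not necessarily the final names from the patch), the per-table knob and the proposed per-table counters would be used together like this:

    -- cap the number of workers the planner may consider for this table
    ALTER TABLE orders SET (parallel_workers = 4);

    -- with the proposed columns, compare total vs parallel sequential scans
    SELECT relname, seq_scan, parallel_seq_scan
      FROM pg_stat_all_tables
     WHERE relname = 'orders';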

Anyway, offering various ways of getting the same information is not unheard of. Pretty much like temp_files/temp_bytes in pg_stat_database, temp_blks_read/temp_blks_written in pg_stat_statements and log_temp_files in log files if you ask me :)
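For example (a minimal sketch, assuming the pg_stat_statements extension is installed; the 10MB threshold is arbitrary), the same temp-file story can already be read from three places:

    -- database-wide picture
    SELECT datname, temp_files, temp_bytes
      FROM pg_stat_database;

    -- per-statement picture
    SELECT query, temp_blks_read, temp_blks_written
      FROM pg_stat_statements
     ORDER BY temp_blks_written DESC
     LIMIT 10;

    -- per-file detail in the server logs (superuser only)
    SET log_temp_files = '10MB';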


--
Guillaume.
