Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
Date
Msg-id aWCz5D3vkbhIlCpX@ip-10-97-1-34.eu-west-3.compute.internal
In response to Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
Hi,

On Tue, Nov 12, 2024 at 12:41:19PM +0900, Michael Paquier wrote:
> On Mon, Nov 11, 2024 at 11:06:43AM -0500, Robert Haas wrote:
> > But it is unclear to me what sort of tuning we would do based on
> > knowing how many of the scans on a certain table or a certain index
> > were parallel vs non-parallel. I have not fully reviewed the threads
> > linked in the original post; but I did look at them briefly and did
> > not immediately see discussion of the specific counters proposed here.
> > I also don't see anything in this thread that clearly explains why we
> > should want this exact thing. I don't want to make it sound like I
> > know that this is useless; I'm sure that Guillaume probably has lots
> > of hands-on tuning experience with this stuff that I lack. But the
> > reasons aren't clearly spelled out as far as I can see, and I'm having
> > some trouble imagining what they are.
> 
> Thanks for the summary.  My main worry is that these are kind of hard
> to act on for tuning when aggregated at relation level (Guillaume,
> feel free to counter-argue!).  The main point that comes into mind is
> that single table scans would be mostly involved with OLTP workloads
> or simple joins, where parallel workers are of little use.  That could
> be much more interesting for analytical-ish workloads with more
> complex plan pattern where one or more Gather or GatherMerge nodes are
> involved.  Still, even in this case I suspect that most users will
> finish by looking at plan patterns, and that these counters added for
> index or tables would have a limited impact at the end.

While working on flushing stats outside of transaction boundaries (patch not
shared yet, but related to [1]), I realized that parallel workers can produce
incomplete and misleading statistics: they flush "their" relation stats during
their shutdown, regardless of the status of the "main" transaction.

It means that, for example, stats like seq_scan, last_seq_scan and seq_tup_read
are updated by the parallel workers during their shutdown while the main
transaction has not finished yet. The stats are then incomplete, because the
leader has not flushed its own stats yet. A patch like this one could help
address that. For example, parallel workers could update dedicated parallel_*
stats and leave the responsibility of updating the non-parallel_* stats to the
leader when the transaction finishes. That would make the non-parallel_* stats
consistent whether parallel workers are used or not.
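
To make the idea a bit more concrete, here is a toy model of the two behaviors.
It is plain Python, not PostgreSQL code, and only a sketch: the
parallel_seq_scan and parallel_seq_tup_read names are hypothetical, and it
assumes that each participant counts one scan and that the leader reports only
the tuples it read itself.

```python
from dataclasses import dataclass, replace


@dataclass
class RelStats:
    seq_scan: int = 0
    seq_tup_read: int = 0
    parallel_seq_scan: int = 0       # hypothetical column
    parallel_seq_tup_read: int = 0   # hypothetical column


def run_query(leader_tuples, worker_tuples, split_counters):
    """Return (stats visible mid-transaction, stats visible after commit)."""
    shared = RelStats()

    # Workers flush at shutdown, before the main transaction has finished.
    for tuples in worker_tuples:
        if split_counters:
            shared.parallel_seq_scan += 1
            shared.parallel_seq_tup_read += tuples
        else:
            shared.seq_scan += 1
            shared.seq_tup_read += tuples

    # Snapshot taken while the leader is still inside the transaction.
    mid_txn = replace(shared)

    # The leader flushes its own counters when the transaction finishes
    # (assumption: it reports only its own share of the scan).
    shared.seq_scan += 1
    shared.seq_tup_read += leader_tuples

    return mid_txn, shared
```

With split_counters=False, a reader of the stats in the middle of the
transaction already sees seq_scan and seq_tup_read moving, but without the
leader's share. With split_counters=True, the non-parallel counters stay
untouched until the leader flushes them.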

Thoughts?

[1]: https://www.postgresql.org/message-id/aVvgJu0BhnmzBWZ1@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


