Thread: Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes

Hi,

On Thu, Aug 29, 2024 at 04:04:05PM +0200, Guillaume Lelarge wrote:
> Hello,
> 
> This patch was a bit discussed on [1], and with more details on [2]. It
> introduces four new columns in pg_stat_all_tables:
> 
> * parallel_seq_scan
> * last_parallel_seq_scan
> * parallel_idx_scan
> * last_parallel_idx_scan
> 
> and two new columns in pg_stat_all_indexes:
> 
> * parallel_idx_scan
> * last_parallel_idx_scan
> 
> As Benoit said yesterday, the intent is to help administrators evaluate the
> usage of parallel workers in their databases and help configuring
> parallelization usage.

Thanks for the patch. I think that's a good idea to provide more instrumentation
in this area. So, +1 regarding this patch.

A few random comments:

1 ===

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>parallel_seq_scan</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of parallel sequential scans initiated on this table
+      </para></entry>
+     </row>

I wonder if we should not also update the seq_scan description to indicate that it
includes parallel_seq_scan.

Same kind of comment for last_seq_scan, idx_scan and last_idx_scan.

2 ===

@@ -410,6 +410,8 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
         */
        if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN)
                pgstat_count_heap_scan(scan->rs_base.rs_rd);
+  if (scan->rs_base.rs_parallel != NULL)
+               pgstat_count_parallel_heap_scan(scan->rs_base.rs_rd);

Indentation seems broken.

Shouldn't the parallel counter rely on the "scan->rs_base.rs_flags & SO_TYPE_SEQSCAN"
test too?

What about getting rid of pgstat_count_parallel_heap_scan and adding an extra
boolean parameter to pgstat_count_heap_scan to indicate whether
counts.parallelnumscans should be incremented too?

Something like:

pgstat_count_heap_scan(scan->rs_base.rs_rd, scan->rs_base.rs_parallel != NULL)
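
Combined with the SO_TYPE_SEQSCAN remark above, the initscan() hunk could then
end up looking roughly like this (an illustrative sketch only, not the exact patch):

	/* count the seq scan, and the parallel seq scan if one is involved */
	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN)
		pgstat_count_heap_scan(scan->rs_base.rs_rd,
							   scan->rs_base.rs_parallel != NULL);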

3 ===

Same comment for pgstat_count_index_scan (add an extra boolean parameter) and
get rid of pgstat_count_parallel_index_scan().

I think that 2 === and 3 === would help to avoid missing increments should we
add those calls in other places in the future.

4 ===

+       if (lstats->counts.numscans || lstats->counts.parallelnumscans)

Is it possible to have (lstats->counts.parallelnumscans) without having
(lstats->counts.numscans)?

> First time I had to add new columns to a statistics catalog. I'm actually
> not sure that we were right to change pg_proc.dat manually.

I think that's the right way to do it.

I don't see a CF entry for this patch. Would you mind creating one so that
we don't lose track of it?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Wed, Sep 04, 2024 at 02:51:51PM +0200, Guillaume Lelarge wrote:
> On Wed, Sep 4, 2024 at 10:47 AM, Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
> wrote:
> 
> > I don't see a CF entry for this patch. Would you mind creating one so that
> > we don't lose track of it?
> >
> >
> I don't mind adding it, though I don't know if I should add it to the
> September or November commit fest. Which one should I choose?

Thanks! That should be the November one (as the September one has already started).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Sep 4, 2024 at 2:58 PM, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
Hi,

On Wed, Sep 04, 2024 at 02:51:51PM +0200, Guillaume Lelarge wrote:
> On Wed, Sep 4, 2024 at 10:47 AM, Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
> wrote:
>
> > I don't see a CF entry for this patch. Would you mind creating one so that
> > we don't lose track of it?
> >
> >
> I don't mind adding it, though I don't know if I should add it to the
> September or November commit fest. Which one should I choose?

Thanks! That should be the November one (as the September one has already started).


I should have gone to the commit fest website; it says the same. I had the recollection that it started on the 15th. Anyway, added to the November commit fest (https://commitfest.postgresql.org/50/5238/).
 

--
Guillaume.

Hi,

On Wed, Sep 04, 2024 at 02:51:51PM +0200, Guillaume Lelarge wrote:
> Hi,
> 
> On Wed, Sep 4, 2024 at 10:47 AM, Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
> wrote:
> 
> > What about getting rid of pgstat_count_parallel_heap_scan and adding an extra
> > boolean parameter to pgstat_count_heap_scan to indicate whether
> > counts.parallelnumscans should be incremented too?
> >
> > Something like:
> >
> > pgstat_count_heap_scan(scan->rs_base.rs_rd, scan->rs_base.rs_parallel !=
> > NULL)
> >
> > 3 ===
> >
> > Same comment for pgstat_count_index_scan (add an extra boolean parameter) and
> > get rid of pgstat_count_parallel_index_scan().
> >
> > I think that 2 === and 3 === would help to avoid missing increments should we
> > add those calls in other places in the future.
> >
> >
> Oh OK, understood.  Done for both.

Thanks for v2!

1 ===

-#define pgstat_count_heap_scan(rel)
+#define pgstat_count_heap_scan(rel, parallel)
        do {
-               if (pgstat_should_count_relation(rel))
-                       (rel)->pgstat_info->counts.numscans++;
+            if (pgstat_should_count_relation(rel)) {
+                       if (!parallel)
+                               (rel)->pgstat_info->counts.numscans++;
+                       else
+                               (rel)->pgstat_info->counts.parallelnumscans++;
+               }

I think counts.numscans has to be incremented in all cases (so even if
"parallel" is true).

Same comment for pgstat_count_index_scan().
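
With that, the heap macro could end up looking something like this (a rough
sketch using the counter names from your patch, not necessarily the final
wording):

#define pgstat_count_heap_scan(rel, parallel)						\
	do {															\
		if (pgstat_should_count_relation(rel))						\
		{															\
			/* numscans counts every scan, parallel or not */		\
			(rel)->pgstat_info->counts.numscans++;					\
			/* parallel scans are counted on top of numscans */		\
			if (parallel)											\
				(rel)->pgstat_info->counts.parallelnumscans++;		\
		}															\
	} while (0)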

> 4 ===
> >
> > +       if (lstats->counts.numscans || lstats->counts.parallelnumscans)
> >
> > Is it possible to have (lstats->counts.parallelnumscans) without having
> > (lstats->counts.numscans)?
> >
> >
> Nope, parallel scans are included in seq/index scans, as far as I can tell.
> I could remove the parallelnumscans test but the code would then be less
> obvious to read.

2 ===

What about adding a comment instead of this extra check?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Wed, Sep 04, 2024 at 04:37:19PM +0200, Guillaume Lelarge wrote:
> Hi,
> 
> Le mer. 4 sept. 2024 à 16:18, Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
> a écrit :
> > What about adding a comment instead of this extra check?
> >
> >
> Done too in v3.

Thanks!

1 ===

+       /*
+        * Don't check counts.parallelnumscans because counts.numscans includes
+        * counts.parallelnumscans
+        */

"." is missing at the end of the comment.

2 ===

-               if (t > tabentry->lastscan)
+               if (t > tabentry->lastscan && lstats->counts.numscans)

The extra check on lstats->counts.numscans is not needed as it's already done
a few lines before.

3 ===

+               if (t > tabentry->parallellastscan && lstats->counts.parallelnumscans)

This one makes sense.
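
Put together, the flush-time handling we're discussing could look roughly like
this (a simplified sketch of the surrounding pgstat relation flush code, using
the field names from your patch):

	if (lstats->counts.numscans)
	{
		TimestampTz t = GetCurrentTransactionStopTimestamp();

		/* counts.numscans covers parallel scans too, so this check is enough */
		if (t > tabentry->lastscan)
			tabentry->lastscan = t;
		/* only bump parallellastscan when a parallel scan was counted */
		if (t > tabentry->parallellastscan && lstats->counts.parallelnumscans)
			tabentry->parallellastscan = t;
	}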
 
And now I'm wondering whether the extra comment added in v3 is really worth it (and
whether it might sound confusing). I mean, the parallel check is only done once we
pass the initial test on counts.numscans. I think the code is clear enough without
this extra comment, thoughts?

4 ===

What about adding a few tests? Or do you want to wait a bit more to see if
"there's an agreement on this patch" (as you stated at the start of this thread)?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi!

Finally found some time to work on this. Tests added in the v5 patch (attached).
Maybe I'm not aware of the whole context of the thread and maybe my questions will seem a bit stupid, but honestly
it's not entirely clear to me how these statistics will help to adjust the number of parallel workers.
We may have situations where, due to cardinality overestimation during query optimization, an unjustifiably large number of parallel workers is launched,
and vice versa: due to a high workload, only a few workers are launched.
How do we identify such cases, so as not to increase or decrease the number of parallel workers when it is not necessary?
-- 
Regards,
Alena Rybakina
Postgres Professional

On Mon, Oct 7, 2024 at 2:41 AM, Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Oct 07, 2024 at 12:43:18AM +0300, Alena Rybakina wrote:
> Maybe I'm not aware of the whole context of the thread and maybe my
> questions will seem a bit stupid, but honestly it's not entirely clear to me
> how these statistics will help to adjust the number of parallel workers.
> We may have situations where, due to cardinality overestimation during query
> optimization, an unjustifiably large number of parallel workers is launched,
> and vice versa: due to a high workload, only a few workers are launched.
> How do we identify such cases, so as not to increase or decrease the number
> of parallel workers when it is not necessary?

Well.  For spiky workloads, these numbers alone are not going to help.
They would only help if you can map them to the number of times a query
related to these tables has been called, something that pg_stat_statements
would be able to show more information about.

FWIW, I have doubts that these numbers attached to this portion of the
system are always useful.  For OLTP workloads, parallel workers are
unlikely to be spawned because even with JOINs we won't work with a high
enough number of tuples to require them.  This could be interesting for
analytics; however, complex query sequences mean that we'd still need
to look at all the plans involving the relations where there is an
imbalance of planned/spawned workers, because these can usually
involve quite a few Gather nodes.  At the end of the day, it seems to
me that we would still need data tied to statements to track
down the specific plans that are starving.  If your application does not
have that many statements, looking at individual plans is OK, but if
you have hundreds of them to dig into, this is time-consuming, and
stats at table/index level don't offer data that makes what needs
adjustment stand out.

And this is without the argument of bloating the stats entries further
for each table, even if that matters less now that these stats live in
shmem.

We need granularity because we have granularity in the config. There is pg_stat_database because it gives the whole picture and it is easier to monitor. And then, there is pg_stat_statements to analyze problematic statements. And finally there is pg_stat_all* because you can set parallel_workers on a specific table.

Anyway, offering various ways of getting the same information is not unheard of. Pretty much like temp_files/temp_bytes in pg_stat_database, temp_blks_read/temp_blks_written in pg_stat_statements and log_temp_files in log files if you ask me :)


--
Guillaume.

On 07.10.2024 03:41, Michael Paquier wrote:
> On Mon, Oct 07, 2024 at 12:43:18AM +0300, Alena Rybakina wrote:
>> Maybe I'm not aware of the whole context of the thread and maybe my
>> questions will seem a bit stupid, but honestly it's not entirely clear to me
>> how these statistics will help to adjust the number of parallel workers.
>> We may have situations where, due to cardinality overestimation during query
>> optimization, an unjustifiably large number of parallel workers is launched,
>> and vice versa: due to a high workload, only a few workers are launched.
>> How do we identify such cases, so as not to increase or decrease the number
>> of parallel workers when it is not necessary?
> Well.  For spiky workloads, these numbers alone are not going to help.
> They would only help if you can map them to the number of times a query
> related to these tables has been called, something that pg_stat_statements
> would be able to show more information about.
>
> FWIW, I have doubts that these numbers attached to this portion of the
> system are always useful.  For OLTP workloads, parallel workers are
> unlikely to be spawned because even with JOINs we won't work with a high
> enough number of tuples to require them.  This could be interesting for
> analytics; however, complex query sequences mean that we'd still need
> to look at all the plans involving the relations where there is an
> imbalance of planned/spawned workers, because these can usually
> involve quite a few Gather nodes.  At the end of the day, it seems to
> me that we would still need data tied to statements to track
> down the specific plans that are starving.  If your application does not
> have that many statements, looking at individual plans is OK, but if
> you have hundreds of them to dig into, this is time-consuming, and
> stats at table/index level don't offer data that makes what needs
> adjustment stand out.
>
> And this is without the argument of bloating the stats entries further
> for each table, even if that matters less now that these stats live in
> shmem.

To be honest, it’s not entirely clear to me how these statistics will 
help in setting up parallel workers.

As I understand it, we need additional tools for analytics, which are 
available in pg_stat_statements, but how does that work in practice? Maybe you 
have the opportunity to demonstrate this?

-- 
Regards,
Alena Rybakina
Postgres Professional




On 07.10.2024 09:34, Guillaume Lelarge wrote:
> We need granularity because we have granularity in the config. There 
> is pg_stat_database because it gives the whole picture and it is 
> easier to monitor. And then, there is pg_stat_statements to analyze 
> problematic statements. And finally there is pg_stat_all* because you 
> can set parallel_workers on a specific table.
>
Yes, I agree with you. Even when I experimented with vacuum settings for a 
database and used my vacuum statistics patch [0] for analysis, I first 
looked at changes in the number of blocks or deleted rows at the 
database level, and only then did an analysis of each table and index.

[0] https://commitfest.postgresql.org/50/5012/

-- 
Regards,
Alena Rybakina
Postgres Professional




On Mon, Nov 11, 2024 at 3:05 AM, Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Oct 08, 2024 at 06:24:54PM +0300, Alena Rybakina wrote:
> Yes, I agree with you. Even when I experimented with vacuum settings for a
> database and used my vacuum statistics patch [0] for analysis, I first
> looked at changes in the number of blocks or deleted rows at the
> database level, and only then did an analysis of each table and index.
>
> [0] https://commitfest.postgresql.org/50/5012/

As hinted on other related threads like around [1], I am so-so about
the proposal of these numbers at table and index level now that we
have e7a9496de906 and 5d4298e75f25.

In such cases, I apply the concept that I call the "Mention Bien" (what
you get in France when you pass the baccalauréat with honors, with a
14~16/20).  What we have is not perfect, but it's good enough to get
a 14/20 IMO, hopefully making 70~80% of users happy with these new
metrics.  Perhaps I'm wrong, but I'd be curious to know whether this
thread's proposal is required at all in the end.


I agree with you. We'll see if we need more, but it's already good to have the metrics that were committed.
 
I have not looked at the logging proposal yet.


I hope you'll have time to look at it. It seems very important to me to get that kind of info in the logs.

Thanks again.
 
[1]: https://www.postgresql.org/message-id/Zywxw7vqPLBfVfXN@paquier.xyz
--
Michael


--
Guillaume.

On Sun, Nov 10, 2024 at 9:05 PM Michael Paquier <michael@paquier.xyz> wrote:
> As hinted on other related threads like around [1], I am so-so about
> the proposal of these numbers at table and index level now that we
> have e7a9496de906 and 5d4298e75f25.

I think the question to which we don't have a clear answer is: for
what purpose would you want to be able to distinguish parallel and
non-parallel scans on a per-table basis?

I think it's fairly clear why the existing counters exist at a table
level. If an index isn't getting very many index scans, perhaps it's
useless -- or worse than useless if it interferes with HOT -- and
should be dropped. On the other hand if a table is getting a lot of
sequential scans even though it happens to be quite large, perhaps new
indexes are needed. Or if the indexes that we expect to get used are
not the same as those actually getting used, perhaps we want to add or
drop indexes or adjust the queries.

But it is unclear to me what sort of tuning we would do based on
knowing how many of the scans on a certain table or a certain index
were parallel vs non-parallel. I have not fully reviewed the threads
linked in the original post; but I did look at them briefly and did
not immediately see discussion of the specific counters proposed here.
I also don't see anything in this thread that clearly explains why we
should want this exact thing. I don't want to make it sound like I
know that this is useless; I'm sure that Guillaume probably has lots
of hands-on tuning experience with this stuff that I lack. But the
reasons aren't clearly spelled out as far as I can see, and I'm having
some trouble imagining what they are.

Compare the parallel worker draught stuff. It's really clear how that
is intended to be used. If we're routinely failing to launch workers,
then either max_parallel_workers_per_gather is too high or
max_parallel_workers is too low. Now, I will admit that I have a few
doubts about whether that feature will get much real-world use, but it
seems hard to doubt that it HAS a use. In this case, that seems less
clear.

--
Robert Haas
EDB: http://www.enterprisedb.com