Re: Add index scan progress to pg_stat_progress_vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add index scan progress to pg_stat_progress_vacuum
Date
Msg-id CAD21AoD13LQ4+suYVWCyp07PHgoHVmAPZwc8neNdkTFVsSi-ww@mail.gmail.com
Whole thread Raw
In response to Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
Responses Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
List pgsql-hackers
On Sat, Mar 12, 2022 at 4:00 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> > nitpick: Can we remove the extra spaces in the parentheses?
>
> fixed
>
> > What does it mean if there isn't an entry in the map?  Is this actually
> > expected, or should we ERROR instead?
>
> I cleaned up the code here and added comments.
>
> > I think the number of entries should be shared between
> > VacuumWorkerProgressShmemInit() and VacuumWorkerProgressShmemSize().
> > Otherwise, we might update one and not the other.
>
> Fixed
>
> > I think we should elaborate a bit more in this comment.  It's difficult to
> > follow what this is doing without referencing the comment above
> > set_vacuum_worker_progress().
>
> More comments added
>
> I also simplified the 0002 patch as well.

I'm still unsure the current design of 0001 patch is better than other
approaches we’ve discussed. Even users who don't use parallel vacuum
are forced to allocate shared memory for index vacuum progress, with
GetMaxBackends() entries from the beginning. Also, it’s likely to
extend the progress tracking feature for other parallel operations in
the future but I think the current design is not extensible. If we
want to do that, we will end up creating similar things for each of
them or re-creating index vacuum progress tracking feature while
creating a common infra. It might not be a problem as of now but I'm
concerned that introducing a feature that is not extensible and forces
users to allocate additional shmem might be a blocker in the future.
Looking at the precedent example, When we introduce the progress
tracking feature, we implemented it in an extensible way. On the other
hand, others in this thread seem to agree with this approach, so I'd
like to leave it to committers.

Anyway, here are some comments on v5-0001 patch:

+/* in commands/vacuumparallel.c */
+extern void VacuumWorkerProgressShmemInit(void);
+extern Size VacuumWorkerProgressShmemSize(void);
+extern void vacuum_worker_end(int leader_pid);
+extern void vacuum_worker_update(int leader_pid);
+extern void vacuum_worker_end_callback(int code, Datum arg);
+extern void set_vaccum_worker_progress(Datum *values);

These functions' body is not in vacuumparallel.c. As the comment says,
I think these functions should be implemented in vacuumparallel.c.

---
+/*
+ * set_vaccum_worker_progress --- updates the number of indexes that have been
+ * vacuumed or cleaned up in a parallel vacuum.
+ */
+void
+set_vaccum_worker_progress(Datum *values)

s/vaccum/vacuum/

---
+void
+set_vaccum_worker_progress(Datum *values)
+{
+        VacProgressEntry *entry;
+        int leader_pid = values[0];

I thik we should use DatumGetInt32().

---
+        entry = (VacProgressEntry *)
hash_search(VacuumWorkerProgressHash, &leader_pid, HASH_ENTER_NULL,
&found);
+
+        if (!entry)
+                elog(ERROR, "cannot allocate shared memory for vacuum
worker progress");

Since we raise an error in case of out of memory, I think we can use
HASH_ENTER instead of HASH_ENTER_NULL. Or do we want to emit a
detailed error message here?

---
+       VacuumWorkerProgressHash = NULL;

This line is not necessary.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: logical decoding and replication of sequences
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: pg_walinspect - a new extension to get raw WAL data and WAL stats