Thread: Add index scan progress to pg_stat_progress_vacuum
The current implementation of pg_stat_progress_vacuum does not provide progress on which index is being vacuumed, making it difficult for a user to determine whether the "vacuuming indexes" phase is making progress. By exposing which index is being scanned, as well as the total progress the scan has made for the current cycle, a user can make a better estimate of when the vacuum will complete.
The proposed patch adds 4 new columns to pg_stat_progress_vacuum:
1. indrelid - the relid of the index being vacuumed
2. index_blks_total - total number of blocks to be scanned in the current cycle
3. index_blks_scanned - number of blocks scanned in the current cycle
4. leader_pid - indicates whether the pid for the pg_stat_progress_vacuum entry is a leader or a vacuum worker. With this patch, the view has an entry for every worker pid (if parallel) as well as the leader pid
Attached is the patch.
Here is sample output from a parallel vacuum of a table with relid = 16638:
postgres=# select * from pg_stat_progress_vacuum ;
-[ RECORD 1 ]------+------------------
pid                | 18180
datid              | 13732
datname            | postgres
relid              | 16638
phase              | vacuuming indexes
heap_blks_total    | 5149825
heap_blks_scanned  | 5149825
heap_blks_vacuumed | 3686381
index_vacuum_count | 2
max_dead_tuples    | 178956969
num_dead_tuples    | 142086544
indrelid           | 0                 <<-----
index_blks_total   | 0                 <<-----
index_blks_scanned | 0                 <<-----
leader_pid         |                   <<-----
-[ RECORD 2 ]------+------------------
pid                | 1543
datid              | 13732
datname            | postgres
relid              | 16638
phase              | vacuuming indexes
heap_blks_total    | 0
heap_blks_scanned  | 0
heap_blks_vacuumed | 0
index_vacuum_count | 0
max_dead_tuples    | 0
num_dead_tuples    | 0
indrelid           | 16646
index_blks_total   | 3030305
index_blks_scanned | 2356564
leader_pid         | 18180
-[ RECORD 3 ]------+------------------
pid                | 1544
datid              | 13732
datname            | postgres
relid              | 16638
phase              | vacuuming indexes
heap_blks_total    | 0
heap_blks_scanned  | 0
heap_blks_vacuumed | 0
index_vacuum_count | 0
max_dead_tuples    | 0
num_dead_tuples    | 0
indrelid           | 16651
index_blks_total   | 2685921
index_blks_scanned | 2119179
leader_pid         | 18180
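
With the proposed columns, a user could estimate per-index scan progress for the current cycle with a query along these lines (a sketch only; pct_scanned is an illustrative alias, and these columns are not in core yet):

    select pid, indrelid, index_blks_scanned, index_blks_total,
           round(100.0 * index_blks_scanned / nullif(index_blks_total, 0), 1) as pct_scanned
    from pg_stat_progress_vacuum
    where index_blks_total > 0;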
Regards,
Sami Imseih
Database Engineer @ Amazon Web Services
On 12/1/21, 3:02 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

> The current implementation of pg_stat_progress_vacuum does not provide progress on which index is being vacuumed, making it difficult for a user to determine whether the "vacuuming indexes" phase is making progress.

+1

> The proposed patch adds 4 new columns to pg_stat_progress_vacuum: [...]

nitpick: Shouldn't index_blks_scanned be index_blks_vacuumed? IMO it is more analogous to heap_blks_vacuumed.

This will tell us which indexes are currently being vacuumed and the current progress of those operations, but it doesn't tell us which indexes have already been vacuumed or which ones are pending vacuum. I think such information is necessary to truly understand the current progress of vacuuming indexes, and I can think of a couple of ways we might provide it:

1. Make the new columns you've proposed return arrays. This isn't very clean, but it would keep all the information for a given vacuum operation in a single row. The indrelids column would be populated with all the indexes that have been vacuumed, need to be vacuumed, or are presently being vacuumed. The other index-related columns would then have the associated stats and the worker PID (which might be the same as the pid column depending on whether parallel index vacuum was being done). Alternatively, the index column could have an array of records, each containing all the information for a given index.

2. Create a new view for just index vacuum progress information. This would have similar information as 1. There would be an entry for each index that has been vacuumed, needs to be vacuumed, or is currently being vacuumed. And there would be an easy way to join with pg_stat_progress_vacuum (e.g., leader_pid, which again might be the same as our index vacuum PID depending on whether we were doing parallel index vacuum). Note that it would be possible for the PID of these entries to be null before and after we process the index.

3. Instead of adding columns to pg_stat_progress_vacuum, adjust the current ones to be more general, and then add new entries for each of the indexes that have been, need to be, or currently are being vacuumed. This is the most similar option to your current proposal, but instead of introducing a column like index_blks_total, we'd rename heap_blks_total to blks_total and use that for both the heap and indexes. I think we'd still want to add a leader_pid column. Again, we have to be prepared for the PID to be null in this case. Or we could just make the pid column always refer to the leader, and we could introduce a worker_pid column. That might create confusion, though.

I wish option #1 was cleaner, because I think it would be really nice to have all this information in a single row. However, I don't expect much support for a 3-dimensional view, so I suspect option #2 (creating a separate view for index vacuum progress) is the way to go.
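
For illustration, querying option #2 might look something like this (the view and column names here are hypothetical sketches, not part of any patch yet):

    select v.pid as vacuum_pid, i.pid as index_vacuum_pid, i.indexrelid
    from pg_stat_progress_vacuum v
    join pg_stat_progress_vacuum_index i on i.leader_pid = v.pid;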
The other benefit of option #2 versus option #3 or your original proposal is that it cleanly separates the top-level vacuum operations and the index vacuum operations, which are related at the moment, but which might not always be tied so closely together.

Nathan
On 12/15/21, 4:10 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

> nitpick: Shouldn't index_blks_scanned be index_blks_vacuumed? IMO it is more analogous to heap_blks_vacuumed.

No. What is being tracked is the number of index blocks scanned out of the total index blocks. The block will be scanned regardless of whether it will be vacuumed or not.

> [options #1, #2, and #3 trimmed]

Option #1 is not clean, as you will need to unnest the array to make sense of it. It will be too complex to use.

Option #3: I am reluctant to spend time looking at this option. It's more valuable to see progress per index instead of in total.

Option #2 was one that I originally designed but backed away from, as it was introducing a new view. Thinking about it a bit more, this is a cleaner approach:

1. Having a view called pg_stat_progress_vacuum_worker to join with pg_stat_progress_vacuum is clean.
2. No changes are required to pg_stat_progress_vacuum.
3. I'll lean towards calling the view "pg_stat_progress_vacuum_worker" instead of "pg_stat_progress_vacuum_index", to perhaps allow us to track other items a vacuum worker may do in future releases. As of now, only indexes are vacuumed by workers.

I will rework the patch for option #2.
I had a similar question, and I'm still not clear from the response what exactly index_blks_total is and whether it addresses it. I think I agree that a user is likely to want to see the progress in a way they can understand, which means for a single index at a time.

I think what you're describing is that index_blks_total and index_blks_scanned are the totals across all the indexes? That isn't clear from the definitions, but if that's what you intend then I think that would work.

(For what it's worth, what I was imagining was having a pair of counters for blocks scanned and max blocks in this index, and a second counter for number of indexes processed and max number of indexes. But I don't think that's necessarily any better than what you have.)
On Wed, Dec 15, 2021 at 2:10 PM Bossart, Nathan <bossartn@amazon.com> wrote:

> nitpick: Shouldn't index_blks_scanned be index_blks_vacuumed? IMO it is more analogous to heap_blks_vacuumed.

+1.

> This will tell us which indexes are currently being vacuumed and the current progress of those operations, but it doesn't tell us which indexes have already been vacuumed or which ones are pending vacuum.

VACUUM will process a table's indexes in pg_class OID order (outside of parallel VACUUM, I suppose). See comments about sort order above RelationGetIndexList().

Anyway, it might be useful to add ordinal numbers to each index, that line up with this processing/OID order. It would also be reasonable to display the same number in log_autovacuum* (and VACUUM VERBOSE) per-index output, to reinforce the idea. Note that we don't necessarily display a distinct line for each distinct index in this log output, which is why including the ordinal number there makes sense.

> I wish option #1 was cleaner, because I think it would be really nice to have all this information in a single row.

I do too. I agree with the specific points you raise in your remarks about what you've called options #2 and #3, but those options still seem unappealing to me.

--
Peter Geoghegan
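
(For reference, the pg_class OID processing order described above can be inspected directly from pg_index; 'my_table' below is a placeholder table name.)

    select indexrelid::regclass as index_name
    from pg_index
    where indrelid = 'my_table'::regclass
    order by indexrelid;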
On Wed, Dec 1, 2021 at 2:59 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:

> The current implementation of pg_stat_progress_vacuum does not provide progress on which index is being vacuumed, making it difficult for a user to determine if the "vacuuming indexes" phase is making progress.

I notice that your patch largely assumes that indexes can be treated like heap relations, in the sense that they're scanned sequentially, and process each block exactly once (or exactly once per "pass"). But that isn't quite true. There are a few differences that seem like they might matter:

* An ambulkdelete() scan of an index cannot take the size of the relation once, at the start, and ignore any blocks that are added after the scan begins. And so the code may need to re-establish the total size of the index multiple times, to make sure no index tuples are missed -- there may be index tuples that VACUUM needs to process that appear in later pages due to concurrent page splits. You don't have the issue with things like IndexBulkDeleteResult.num_pages, because they report on the index after ambulkdelete/amvacuumcleanup return (they're not granular progress indicators).

* Some index AMs don't work like nbtree and GiST in that they cannot do their scan sequentially -- they have to do something like a logical/keyspace order scan instead, which is *totally* different to heapam (not just a bit different). There is no telling how many times each page will be accessed in these other index AMs, and in what order, even under optimal conditions. We should arguably not even try to provide any granular progress information here, since it'll probably be too messy.

I'm not sure what to recommend for your patch, in light of this. Maybe you should change the names of the new columns to own the squishiness. For example, instead of using the name index_blks_total, you might instead use the name index_blks_initial. That might be enough to avoid user confusion when we scan more blocks than the index initially contained (within a single ambulkdelete scan).

Note also that we have to do something called backtracking in btvacuumpage(), which you've ignored -- that's another reasonably common way that we'll end up scanning a page twice. But that probably should just be ignored -- it's too narrow a case to be worth caring about.

--
Peter Geoghegan
This view also doesn't show vacuum progress across a partitioned table. For comparison:

pg_stat_progress_create_index (added in v12) has:
  partitions_total
  partitions_done

pg_stat_progress_analyze (added in v13) has:
  child_tables_total
  child_tables_done

pg_stat_progress_cluster should have something similar.

--
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581
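
(For comparison, those existing counters can be read directly from the views named above, e.g.:)

    select partitions_total, partitions_done from pg_stat_progress_create_index;
    select child_tables_total, child_tables_done from pg_stat_progress_analyze;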
On 21/12/2021 00:05, Peter Geoghegan wrote:

> * Some index AMs don't work like nbtree and GiST in that they cannot do their scan sequentially -- they have to do something like a logical/keyspace order scan instead, which is *totally* different to heapam (not just a bit different). There is no telling how many times each page will be accessed in these other index AMs, and in what order, even under optimal conditions. We should arguably not even try to provide any granular progress information here, since it'll probably be too messy.

Maybe we could add callbacks to the AM interface for the send/receive/representation implementation of progress? An AM would then define a set of parameters to send to the stats collector and show to users.

--
regards,
Andrey Lepikhov
Postgres Professional
Please send your patches as *.diff or *.patch, so they're processed by the patch tester. Preferably with commit messages; git format-patch is the usual tool for this.

http://cfbot.cputube.org/sami-imseih.html

(Occasionally, it's also useful to send a *.txt to avoid the cfbot processing the wrong thing, in case one sends an unrelated, secondary patch, or sends fixes to a patch as a "relative patch" which doesn't include the main patch.)

I'm including a patch rebased on 8e1fae193.
I do agree that tracking progress by # of blocks scanned is not deterministic for all index types. Based on this feedback, I went back to the drawing board on this. Something like the below may make more sense.

In pg_stat_progress_vacuum, introduce 2 new columns:

1. total_index_vacuum - total # of indexes to vacuum
2. max_cycle_time - the time in seconds of the longest index cycle

Introduce another view called pg_stat_progress_vacuum_index_cycle:

postgres=# \d pg_stat_progress_vacuum_index_cycle
      View "public.pg_stat_progress_vacuum_worker"
       Column        |  Type   | Collation | Nullable | Default
---------------------+---------+-----------+----------+---------
 pid                 | integer |           |          |   <<<-- the PID of the vacuum worker (or the leader, if it's doing index vacuuming)
 leader_pid          | bigint  |           |          |   <<<-- the leader PID, to allow this view to be joined back to pg_stat_progress_vacuum
 indrelid            | bigint  |           |          |   <<<-- the relid of the index being vacuumed
 ordinal_position    | bigint  |           |          |   <<<-- the processing position of the index being vacuumed
 dead_tuples_removed | bigint  |           |          |   <<<-- the number of dead rows removed in the current cycle for the index

Having this information, one can:

1. Determine which index is being vacuumed. For monitoring tools, this can help identify the index that accounts for most of the index vacuuming time.
2. Having the processing order of the current index will allow the user to determine how many of the total indexes have been completed in the current cycle.
3. dead_tuples_removed will show progress on the index vacuum in the current cycle.
4. max_cycle_time will give an idea of how long the longest index cycle took for the current vacuum operation.

On 12/23/21, 2:46 AM, "Masahiko Sawada" <sawada.mshk@gmail.com> wrote:

> On Tue, Dec 21, 2021 at 3:37 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > VACUUM will process a table's indexes in pg_class OID order (outside
> > of parallel VACUUM, I suppose). See comments about sort order above
> > RelationGetIndexList().
>
> Right.
>
> > Anyway, it might be useful to add ordinal numbers to each index, that
> > line up with this processing/OID order. [...]
>
> An alternative idea would be to show the number of indexes on the table and the number of indexes that have been processed in the leader's entry of pg_stat_progress_vacuum. Even in parallel vacuum cases, since we have index vacuum status for each index, it would not be hard for the leader process to count how many indexes have been processed.
>
> Regarding the details of the progress of index vacuum, I'm not sure this progress information can fit in pg_stat_progress_vacuum. As Peter already mentioned, the behavior varies quite a bit depending on index AM.
>
> Regards,
>
> --
> Masahiko Sawada
> EDB: https://www.enterprisedb.com/
Attached is the latest revision of the patch.

In "pg_stat_progress_vacuum", introduce 2 columns:

* total_index_vacuum: the # of indexes that will be vacuumed. Keep in mind that if failsafe mode kicks in mid-flight to the vacuum, Postgres may choose to forgo index scans. This value will be adjusted accordingly.

* max_index_vacuum_cycle_time: the total elapsed time for an index vacuum cycle is calculated, and this value will be updated to reflect the longest vacuum cycle. Until the first cycle completes, this value will be 0. The purpose of this column is to give the user an idea of how long an index vacuum cycle takes to complete.

postgres=# \d pg_stat_progress_vacuum
               View "pg_catalog.pg_stat_progress_vacuum"
           Column            |  Type   | Collation | Nullable | Default
-----------------------------+---------+-----------+----------+---------
 pid                         | integer |           |          |
 datid                       | oid     |           |          |
 datname                     | name    |           |          |
 relid                       | oid     |           |          |
 phase                       | text    |           |          |
 heap_blks_total             | bigint  |           |          |
 heap_blks_scanned           | bigint  |           |          |
 heap_blks_vacuumed          | bigint  |           |          |
 index_vacuum_count          | bigint  |           |          |
 max_dead_tuples             | bigint  |           |          |
 num_dead_tuples             | bigint  |           |          |
 total_index_vacuum          | bigint  |           |          |
 max_index_vacuum_cycle_time | bigint  |           |          |

Introduce a new view called "pg_stat_progress_vacuum_index". This view tracks the progress of a worker (or leader PID) while it's vacuuming an index. It exposes some key columns:

* pid: the PID of the worker process

* leader_pid: the PID of the leader process. This is the column that can be joined with "pg_stat_progress_vacuum". leader_pid and pid can have the same value, as a leader can also perform an index vacuum.

* indrelid: the relid of the index currently being vacuumed

* vacuum_cycle_ordinal_position: the processing position of the index being vacuumed. This can be useful to determine how many indexes out of the total indexes (pg_stat_progress_vacuum.total_index_vacuum) have been vacuumed.

* index_tuples_vacuumed: the number of index tuples vacuumed for the index overall. This is useful to show that the vacuum is actually doing work, as the # of tuples keeps increasing.

postgres=# \d pg_stat_progress_vacuum_index
          View "pg_catalog.pg_stat_progress_vacuum_index"
            Column             |  Type   | Collation | Nullable | Default
-------------------------------+---------+-----------+----------+---------
 pid                           | integer |           |          |
 leader_pid                    | bigint  |           |          |
 indrelid                      | bigint  |           |          |
 vacuum_cycle_ordinal_position | bigint  |           |          |
 index_tuples_vacuumed         | bigint  |           |          |
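
As a sketch, the two views described above could be joined on leader_pid (all of these columns are still proposals and may change):

    select v.relid::regclass as table_name,
           v.total_index_vacuum,
           i.indrelid,
           i.vacuum_cycle_ordinal_position,
           i.index_tuples_vacuumed
    from pg_stat_progress_vacuum v
    join pg_stat_progress_vacuum_index i on i.leader_pid = v.pid;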
Attaching the latest revision of the patch with the fixes suggested. Also ran make check and make check-world successfully.

On 12/29/21, 11:51 AM, "Justin Pryzby" <pryzby@telsasoft.com> wrote:

> http://cfbot.cputube.org/sami-imseih.html
>
> You should run "make check" and update rules.out. You should also use make check-world - usually something like:
> make check-world -j4 >check-world.out 2>&1 ; echo ret $?
>
> > indrelid: The relid of the index currently being vacuumed
>
> I think it should be called indexrelid not indrelid, for consistency with pg_index.
>
> > S.param10 vacuum_cycle_ordinal_position,
> > S.param13 index_rows_vacuumed
>
> These should both say "AS" for consistency.
>
> system_views.sql is using tabs, but should use spaces for consistency.
>
> > #include "commands/progress.h"
>
> The postgres convention is to alphabetize the includes.
>
> > /* VACCUM operation's longest index scan cycle */
>
> VACCUM => VACUUM
>
> Ultimately you'll also need to update the docs.
On 12/29/21, 8:44 AM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

> In "pg_stat_progress_vacuum", introduce 2 columns:
>
> * total_index_vacuum: the # of indexes that will be vacuumed. Keep in mind that if failsafe mode kicks in mid-flight to the vacuum, Postgres may choose to forgo index scans. This value will be adjusted accordingly.
> * max_index_vacuum_cycle_time: the total elapsed time for an index vacuum cycle is calculated, and this value will be updated to reflect the longest vacuum cycle. Until the first cycle completes, this value will be 0. The purpose of this column is to give the user an idea of how long an index vacuum cycle takes to complete.

I think that total_index_vacuum is a good thing to have. I would expect this to usually just be the number of indexes on the table, but as you pointed out, this can be different when we are skipping indexes. My only concern with this new column is the potential for confusion when compared with the index_vacuum_count value. index_vacuum_count indicates the number of vacuum cycles completed, but total_index_vacuum indicates the number of indexes that will be vacuumed. However, the names sound like they could refer to the same thing to me. Perhaps we should rename index_vacuum_count to index_vacuum_cycles/index_vacuum_cycle_count, and the new column should be something like num_indexes_to_vacuum or index_vacuum_total.

I don't think we need the max_index_vacuum_cycle_time column. While the idea is to give users a rough estimate for how long an index cycle will take, I don't think it will help generate any meaningful estimates for how much longer the vacuum operation will take. IIUC we won't have any idea how many total index vacuum cycles will be needed. Even if we did, the current cycle could take much more or much less time. Also, none of the other progress views seem to provide any timing information, which I suspect is by design to avoid inaccurate estimates.

> Introduce a new view called "pg_stat_progress_vacuum_index". [...]
> * index_tuples_vacuumed: the number of index tuples vacuumed for the index overall. This is useful to show that the vacuum is actually doing work, as the # of tuples keeps increasing.

Should we also provide some information for determining the progress of the current cycle? Perhaps there should be an index_tuples_vacuumed_current_cycle column that users can compare with the num_dead_tuples value in pg_stat_progress_vacuum. However, perhaps the number of tuples vacuumed in the current cycle can already be discovered via index_tuples_vacuumed % max_dead_tuples.

+void
+rusage_adjust(const PGRUsage *ru0, PGRUsage *ru1)
+{
+	if (ru1->tv.tv_usec < ru0->tv.tv_usec)
+	{
+		ru1->tv.tv_sec--;
+		ru1->tv.tv_usec += 1000000;
+	}
+	if (ru1->ru.ru_stime.tv_usec < ru0->ru.ru_stime.tv_usec)
+	{
+		ru1->ru.ru_stime.tv_sec--;
+		ru1->ru.ru_stime.tv_usec += 1000000;
+	}
+	if (ru1->ru.ru_utime.tv_usec < ru0->ru.ru_utime.tv_usec)
+	{
+		ru1->ru.ru_utime.tv_sec--;
+		ru1->ru.ru_utime.tv_usec += 1000000;
+	}
+}

I think this function could benefit from a comment. Without going through it line by line, it is not clear to me exactly what it is doing.

I know we're still working on what exactly this stuff should look like, but I would suggest adding the documentation changes in the near future.

Nathan
Thanks for the review.

I am hesitant to make column name changes for obvious reasons, as it breaks existing tooling. However, I think there is a really good case to change "index_vacuum_count", as the name is confusing. "index_vacuum_cycles_completed" is the name I suggest if we agree to rename.

For the new column, "num_indexes_to_vacuum" is good with me.

As far as max_index_vacuum_cycle_time goes, besides the points you make, another reason is that until one cycle completes, this value will remain at 0. It will not be helpful data for most vacuum cases. Removing it also reduces the complexity of the patch.
On 1/6/22, 6:14 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

> I am hesitant to make column name changes for obvious reasons, as it breaks existing tooling. However, I think there is a really good case to change "index_vacuum_count", as the name is confusing. "index_vacuum_cycles_completed" is the name I suggest if we agree to rename.
>
> For the new column, "num_indexes_to_vacuum" is good with me.

Yeah, I think we can skip renaming index_vacuum_count for now. In any case, it would probably be good to discuss that in a separate thread.

Nathan
I agree, renaming "index_vacuum_count" can be taken up in a separate discussion.

I have attached the 3rd revision of the patch, which also includes the documentation changes. Also attached is a rendered html of the docs for review.

"max_index_vacuum_cycle_time" has been removed.

"index_rows_vacuumed" has been renamed to "index_tuples_removed"; "tuples" is more consistent with the terminology used.

"vacuum_cycle_ordinal_position" has been renamed to "index_ordinal_position".

On 1/10/22, 12:30 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

> Yeah, I think we can skip renaming index_vacuum_count for now. In any case, it would probably be good to discuss that in a separate thread.
On 1/10/22, 5:01 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

> I have attached the 3rd revision of the patch, which also includes the documentation changes. Also attached is a rendered html of the docs for review.
>
> "max_index_vacuum_cycle_time" has been removed.
> "index_rows_vacuumed" has been renamed to "index_tuples_removed"; "tuples" is more consistent with the terminology used.
> "vacuum_cycle_ordinal_position" has been renamed to "index_ordinal_position".

Thanks for the new version of the patch!

nitpick: I get one whitespace error when applying the patch.

    Applying: Expose progress for the "vacuuming indexes" phase of a VACUUM operation.
    .git/rebase-apply/patch:44: tab in indent.
            Whenever <xref linkend="guc-vacuum-failsafe-age"/> is triggered, index
    warning: 1 line adds whitespace errors.

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>num_indexes_to_vacuum</structfield> <type>bigint</type>
+      </para>
+      <para>
+       The number of indexes that will be vacuumed. Only indexes with
+       <literal>pg_index.indisready</literal> set to "true" will be vacuumed.
+       Whenever <xref linkend="guc-vacuum-failsafe-age"/> is triggered, index
+       vacuuming will be bypassed.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>

We may want to avoid exhaustively listing the cases when this value will be zero. I would suggest saying, "When index cleanup is skipped, this value will be zero" instead.

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the table being vacuumed.
+      </para></entry>
+     </row>

Do we need to include this field? I would expect indexrelid to go here.

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>leader_pid</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Process ID of the parallel group leader. This field is <literal>NULL</literal>
+       if this process is a parallel group leader or the
+       <literal>vacuuming indexes</literal> phase is not performed in parallel.
+      </para></entry>
+     </row>

Are there cases where the parallel group leader will have an entry in this view when parallelism is enabled?

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>index_ordinal_position</structfield> <type>bigint</type>
+      </para>
+      <para>
+       The order in which the index is being vacuumed. Indexes are vacuumed by OID in ascending order.
+      </para></entry>
+     </row>

Should we include the bit about the OID ordering? I suppose that is unlikely to change in the near future, but I don't know if it is relevant information. Also, do we need to include the "index_" prefix? This view is specific to indexes. (I have the same question for index_tuples_removed.)

Should this new table go after the "VACUUM phases" table? It might make sense to keep the phases table closer to where it is referenced.

+	/* Advertise the number of indexes to vacuum if we are not in failsafe mode */
+	if (!lazy_check_wraparound_failsafe(vacrel))
+		pgstat_progress_update_param(PROGRESS_VACUUM_TOTAL_INDEX_VACUUM, vacrel->nindexes);

Shouldn't this be 0 when INDEX_CLEANUP is off, too?

+#define PROGRESS_VACUUM_CURRENT_INDRELID        7
+#define PROGRESS_VACUUM_LEADER_PID              8
+#define PROGRESS_VACUUM_INDEX_ORDINAL           9
+#define PROGRESS_VACUUM_TOTAL_INDEX_VACUUM      10
+#define PROGRESS_VACUUM_DEAD_TUPLES_VACUUMED    11

nitpick: I would suggest the following names to match the existing style:

    PROGRESS_VACUUM_NUM_INDEXES_TO_VACUUM
    PROGRESS_VACUUM_INDEX_LEADER_PID
    PROGRESS_VACUUM_INDEX_INDEXRELID
    PROGRESS_VACUUM_INDEX_ORDINAL_POSITION
    PROGRESS_VACUUM_INDEX_TUPLES_REMOVED

Nathan
On 1/11/22, 1:01 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

> nitpick: I get one whitespace error when applying the patch.

That was missed. Will fix it.

> We may want to avoid exhaustively listing the cases when this value will be zero. I would suggest saying, "When index cleanup is skipped, this value will be zero" instead.

What about something like "The number of indexes that are eligible for vacuuming"? This covers the cases where either an individual index is skipped or the entire "index vacuuming" phase is skipped.

> > <structfield>relid</structfield> <type>oid</type>
>
> Do we need to include this field? I would expect indexrelid to go here.

Having indexrelid and relid makes the pg_stat_progress_vacuum_index view "self-contained". A user can look up the index and table being vacuumed without joining back to pg_stat_progress_vacuum.

> Are there cases where the parallel group leader will have an entry in this view when parallelism is enabled?

Yes. A parallel group leader can perform an index vacuum just like a parallel worker. If you do something like "vacuum (parallel 3)", you may have up to 4 processes vacuuming indexes: the leader + 3 workers.

> Should we include the bit about the OID ordering? [...] Also, do we need to include the "index_" prefix? This view is specific to indexes. (I have the same question for index_tuples_removed.)

I was on the fence about both of these as well. Will make a change to this.

> Should this new table go after the "VACUUM phases" table? It might make sense to keep the phases table closer to where it is referenced.

I did not think that would read better. The introduction discusses both views, and the "phases" table is linked from pg_stat_progress_vacuum.

> > +	/* Advertise the number of indexes to vacuum if we are not in failsafe mode */
> > +	if (!lazy_check_wraparound_failsafe(vacrel))
> > +		pgstat_progress_update_param(PROGRESS_VACUUM_TOTAL_INDEX_VACUUM, vacrel->nindexes);
>
> Shouldn't this be 0 when INDEX_CLEANUP is off, too?

This view is only covering the "vacuum index" phase, but it should also cover the index_cleanup phase as well. Will update the patch.

> nitpick: I would suggest the following names to match the existing style: [...]

That looks better.
On 1/11/22, 12:33 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

> What about something like "The number of indexes that are eligible for vacuuming"?
> This covers the cases where either an individual index is skipped or the entire "index vacuuming" phase is skipped.

Hm. I don't know if "eligible" is the right word. An index can be eligible for vacuuming but skipped because we set INDEX_CLEANUP to false. Maybe we should just stick with "The number of indexes that will be vacuumed." The only thing we may want to clarify is whether this value will change in some cases (e.g., vacuum failsafe takes effect).

Nathan
(We had better avoid top-posting [1])

On Tue, Jan 11, 2022 at 10:01 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:

> I agree, renaming "index_vacuum_count" can be taken up in a separate discussion.
>
> I have attached the 3rd revision of the patch, which also includes the documentation changes. Also attached is a rendered html of the docs for review.

Thank you for updating the patch!

Regarding the new pg_stat_progress_vacuum_index view, why do we need to have a separate view? Users will have to check two views. If this view is expected to be used together with and joined to pg_stat_progress_vacuum, why don't we provide one view that has full information from the beginning? Especially, I think it's not useful that the total number of indexes to vacuum (num_indexes_to_vacuum column) and the current number of indexes that have been vacuumed (index_ordinal_position column) are shown in separate views.

Also, I’m not sure how useful index_tuples_removed is; what can we infer from this value (without a total number)?

Regards,

[1] https://en.wikipedia.org/wiki/Posting_style#Top-posting

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On 1/11/22, 11:46 PM, "Masahiko Sawada" <sawada.mshk@gmail.com> wrote:

> Regarding the new pg_stat_progress_vacuum_index view, why do we need to have a separate view? Users will have to check two views. [...]

I suppose we could add all of the new columns to pg_stat_progress_vacuum and just set columns to NULL as appropriate. But is that really better than having a separate view?

> Also, I’m not sure how useful index_tuples_removed is; what can we infer from this value (without a total number)?

I think the idea was that you can compare it against max_dead_tuples and num_dead_tuples to get an estimate of the current cycle progress. Otherwise, it just shows that progress is being made.

Nathan

[0] https://postgr.es/m/7874FB21-FAA5-49BD-8386-2866552656C7%40amazon.com
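
(A sketch of that comparison, using the column names proposed so far; this is a heuristic only, since tuples_removed can exceed num_dead_tuples for some index AMs:)

    select i.indexrelid,
           i.index_tuples_removed % nullif(v.max_dead_tuples, 0) as est_tuples_removed_this_cycle,
           v.num_dead_tuples
    from pg_stat_progress_vacuum v
    join pg_stat_progress_vacuum_index i on coalesce(i.leader_pid, i.pid) = v.pid;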
On 1/12/22, 1:28 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

> I suppose we could add all of the new columns to pg_stat_progress_vacuum and just set columns to NULL as appropriate. But is that really better than having a separate view?

To add: since a vacuum can utilize parallel worker processes + the main vacuum process to perform index vacuuming, it made sense to separate the backends doing index vacuum/cleanup into a separate view.

Besides what Nathan suggested, the only other clean option I can think of is to perhaps create a json column in pg_stat_progress_vacuum which would include all the new fields. My concern with this approach is that it will make usability difficult, as users would have to flatten the json.

> I think the idea was that you can compare it against max_dead_tuples and num_dead_tuples to get an estimate of the current cycle progress. Otherwise, it just shows that progress is being made.

The main purpose is to really show that the "index vacuum" phase is actually making progress. Note that for certain types of indexes, i.e. GIN/GiST, the number of tuples_removed will end up exceeding num_dead_tuples.
Attached is the latest patch and associated documentation.

This version addresses the index_ordinal_position column confusion. Rather than displaying the index position, the pg_stat_progress_vacuum view now has 2 new columns:

index_total - the total number of indexes to be vacuumed.

index_complete_count - the total number of indexes processed so far. In order to deal with parallel vacuums, the parallel_workers (planned workers) value had to be exposed, and each backend performing an index vacuum/cleanup in parallel had to advertise the number of indexes it vacuumed/cleaned. The # of indexes vacuumed for the parallel cleanup can then be derived from the pg_stat_progress_vacuum view.

postgres=# \d pg_stat_progress_vacuum
          View "pg_catalog.pg_stat_progress_vacuum"
        Column        |  Type   | Collation | Nullable | Default
----------------------+---------+-----------+----------+---------
 pid                  | integer |           |          |
 datid                | oid     |           |          |
 datname              | name    |           |          |
 relid                | oid     |           |          |
 phase                | text    |           |          |
 heap_blks_total      | bigint  |           |          |
 heap_blks_scanned    | bigint  |           |          |
 heap_blks_vacuumed   | bigint  |           |          |
 index_vacuum_count   | bigint  |           |          |
 max_dead_tuples      | bigint  |           |          |
 num_dead_tuples      | bigint  |           |          |
 index_total          | bigint  |           |          | <<<---------------------
 index_complete_count | numeric |           |          | <<<---------------------

The pg_stat_progress_vacuum_index view includes:

indexrelid - the currently vacuumed index.
leader_pid - the pid of the leader process. NULL if the process is the leader or the vacuum is not parallel.
tuples_removed - the number of index tuples removed. The user can use this column to see that the index vacuum has movement.

postgres=# \d pg_stat_progress_vacuum_index
       View "pg_catalog.pg_stat_progress_vacuum_index"
     Column     |  Type   | Collation | Nullable | Default
----------------+---------+-----------+----------+---------
 pid            | integer |           |          |
 datid          | oid     |           |          |
 datname        | name    |           |          |
 indexrelid     | bigint  |           |          |
 phase          | text    |           |          |
 leader_pid     | bigint  |           |          |
 tuples_removed | bigint  |           |          |
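
With these two columns, overall progress through the "vacuuming indexes" phase could be read with a single query (a sketch; both columns are proposals at this point):

    select pid, relid::regclass as table_name, index_complete_count, index_total,
           round(100.0 * index_complete_count / nullif(index_total, 0), 1) as pct_indexes_done
    from pg_stat_progress_vacuum;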
My apologies. The last attachment of the documentation was the wrong file. Attached is the correct documentation file.

Thanks

On 1/26/22, 8:07 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

> Attached is the latest patch and associated documentation. [...]
My concern with this approach is that it will make usability,to flatten the json, difficult for users. > Also, I’m not sure how useful index_tuples_removed is; what can we > infer from this value (without a total number)? > I think the idea was that you can compare it against max_dead_tuples > and num_dead_tuples to get an estimate of the current cycle progress. > Otherwise, it just shows that progress is being made. The main purpose is to really show that the "index vacuum" phase is actually making progress. Note that for certaintypes of indexes, i.e. GIN/GIST the number of tuples_removed will end up exceeding the number of num_dead_tuples. Nathan [0] https://postgr.es/m/7874FB21-FAA5-49BD-8386-2866552656C7%40amazon.com
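For illustration, the intended usage would presumably be a join of the two views on the leader's pid, along these lines (a sketch only: pg_stat_progress_vacuum_index and leader_pid are proposed by this patch and not part of any released PostgreSQL; COALESCE falls back to the entry's own pid for non-parallel vacuums):

    -- Sketch against the proposed (uncommitted) view definitions above.
    SELECT v.pid, v.relid, v.phase,
           i.indexrelid, i.tuples_removed
    FROM pg_stat_progress_vacuum v
    JOIN pg_stat_progress_vacuum_index i
      ON COALESCE(i.leader_pid, i.pid) = v.pid;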
Attachment
Resending patch as I see the last attachment was not annotated to the commitfest entry.

On 1/26/22, 8:07 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

[...]
Attachment
After speaking with Nathan offline, a few changes have been made to the patch.

As mentioned earlier in the thread, tracking how many indexes are processed in PARALLEL vacuum mode is not very straightforward, since only the worker or leader processes have the ability to inspect the vacuum shared parallel state.

The latest version of the patch introduces a shared memory area to track indexes vacuumed/cleaned by each worker ( or leader ) in a PARALLEL vacuum. In order to present this data in the pg_stat_progress_vacuum view, the value of the new column "indexes_processed" is retrieved from shared memory by pg_stat_get_progress_info. For non-parallel vacuums, the value of "indexes_processed" is retrieved from the backend progress array directly.

The patch also includes the changes to implement the new view pg_stat_progress_vacuum_index, which exposes the index being vacuumed/cleaned up.

postgres=# \d+ pg_stat_progress_vacuum ;
                        View "pg_catalog.pg_stat_progress_vacuum"
       Column       |  Type   | Collation | Nullable | Default | Storage  | Description
--------------------+---------+-----------+----------+---------+----------+-------------
 pid                | integer |           |          |         | plain    |
 datid              | oid     |           |          |         | plain    |
 datname            | name    |           |          |         | plain    |
 relid              | oid     |           |          |         | plain    |
 phase              | text    |           |          |         | extended |
 heap_blks_total    | bigint  |           |          |         | plain    |
 heap_blks_scanned  | bigint  |           |          |         | plain    |
 heap_blks_vacuumed | bigint  |           |          |         | plain    |
 index_vacuum_count | bigint  |           |          |         | plain    |
 max_dead_tuples    | bigint  |           |          |         | plain    |
 num_dead_tuples    | bigint  |           |          |         | plain    |
 indexes_total      | bigint  |           |          |         | plain    |   <<<-- new column
 indexes_processed  | bigint  |           |          |         | plain    |   <<<-- new column

<<<--- new view --->>>

postgres=# \d pg_stat_progress_vacuum_index
       View "pg_catalog.pg_stat_progress_vacuum_index"
     Column     |  Type   | Collation | Nullable | Default
----------------+---------+-----------+----------+---------
 pid            | integer |           |          |
 datid          | oid     |           |          |
 datname        | name    |           |          |
 indexrelid     | bigint  |           |          |
 leader_pid     | bigint  |           |          |
 phase          | text    |           |          |
 tuples_removed | bigint  |           |          |

On 1/26/22, 8:07 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

[...]
Attachment
Attached is the latest version of the patch to deal with the changes in the recent commit aa64f23b02924724eafbd9eadbf26d85df30a12b.

On 2/1/22, 2:32 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

[...]
Attachment
The change has been broken up as 3 separate patches.

0007-Expose-progress-for-the-vacuuming-indexes-and-cleani.patch - Introduces 2 new columns to pg_stat_progress_vacuum, indexes_total and indexes_processed. These 2 columns will provide progress on the index vacuuming/cleanup.

0001-Expose-the-index-being-processed-in-the-vacuuming-in.patch - Introduces a new view called pg_stat_progress_vacuum_index. This view tracks the index being vacuumed/cleaned and the total number of index tuples removed.

0001-Rename-index_vacuum_count-to-index_vacuum_cycle_coun.patch - Renames the existing index_vacuum_count to index_vacuum_cycle_count in pg_stat_progress_vacuum. Due to the other changes, it makes sense to include "cycle" in the column name to be crystal clear that the column refers to the index cycle count.

Thanks

On 2/10/22, 1:39 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

[...]
Attachment
> The change has been broken up as 3 separate patches.
>
> 0007-Expose-progress-for-the-vacuuming-indexes-and-cleani.patch - Introduces 2 new columns to pg_stat_progress_vacuum, indexes_total and indexes_processed. These 2 columns will provide progress on the index vacuuming/cleanup.
>
> 0001-Expose-the-index-being-processed-in-the-vacuuming-in.patch - Introduces a new view called pg_stat_progress_vacuum_index. This view tracks the index being vacuumed/cleaned and the total number of index tuples removed.
>
> 0001-Rename-index_vacuum_count-to-index_vacuum_cycle_coun.patch - Renames the existing index_vacuum_count to index_vacuum_cycle_count in pg_stat_progress_vacuum. Due to the other changes, it makes sense to include "cycle" in the column name to be crystal clear that the column refers to the index cycle count.

Sending again with patch files renamed to ensure correct apply order.

On 2/10/22, 1:39 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

[...]
Attachment
On Mon, Feb 21, 2022 at 07:03:39PM +0000, Imseih (AWS), Sami wrote:
> Sending again with patch files renamed to ensure correct apply order.

I haven't had a chance to test this too much, but I did look through the
patch set and have a couple of small comments.

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>indexes_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       The number of indexes to be processed in the
+       <literal>vacuuming indexes</literal> or <literal>cleaning up indexes</literal> phase
+       of the vacuum.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>indexes_processed</structfield> <type>bigint</type>
+      </para>
+      <para>
+       The number of indexes processed in the
+       <literal>vacuuming indexes</literal> or <literal>cleaning up indexes</literal> phase.
+       At the start of an index vacuum cycle, this value is set to <literal>0</literal>.
+      </para></entry>
+     </row>

Will these be set to 0 for failsafe vacuums and vacuums with INDEX_CLEANUP
turned off?

+typedef struct VacWorkerProgressInfo
+{
+    int         num_vacuums;   /* number of active VACUUMS with parallel workers */
+    int         max_vacuums;   /* max number of VACUUMS with parallel workers */
+    slock_t     mutex;
+    VacOneWorkerProgressInfo vacuums[FLEXIBLE_ARRAY_MEMBER];
+} VacWorkerProgressInfo;

max_vacuums appears to just be a local copy of MaxBackends.  Does this
information really need to be stored here?  Also, is there a strong reason
for using a spinlock instead of an LWLock?

+void
+vacuum_worker_end(int leader_pid)
+{
+    SpinLockAcquire(&vacworkerprogress->mutex);
+    for (int i = 0; i < vacworkerprogress->num_vacuums; i++)
+    {
+        VacOneWorkerProgressInfo *vac = &vacworkerprogress->vacuums[i];
+
+        if (vac->leader_pid == leader_pid)
+        {
+            *vac = vacworkerprogress->vacuums[vacworkerprogress->num_vacuums - 1];
+            vacworkerprogress->num_vacuums--;
+            SpinLockRelease(&vacworkerprogress->mutex);
+            break;
+        }
+    }
+    SpinLockRelease(&vacworkerprogress->mutex);
+}

I see this loop pattern in a couple of places, and it makes me wonder if
this information would fit more naturally in a hash table.

+    if (callback)
+        callback(values, 3);

Why does this need to be set up as a callback function?  Could we just call
the function if cmdtype == PROGRESS_COMMAND_VACUUM?  ISTM that is pretty
much all this is doing.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>indexes_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       The number of indexes to be processed in the
+       <literal>vacuuming indexes</literal> or <literal>cleaning up indexes</literal> phase
+       of the vacuum.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>indexes_processed</structfield> <type>bigint</type>
+      </para>
+      <para>
+       The number of indexes processed in the
+       <literal>vacuuming indexes</literal> or <literal>cleaning up indexes</literal> phase.
+       At the start of an index vacuum cycle, this value is set to <literal>0</literal>.
+      </para></entry>
+     </row>

> Will these be set to 0 for failsafe vacuums and vacuums with INDEX_CLEANUP
> turned off?

If the failsafe kicks in midway through a vacuum, the number indexes_total will not be reset to 0. If INDEX_CLEANUP is turned off, then the value will be 0 at the start of the vacuum.

+typedef struct VacWorkerProgressInfo
+{
+    int         num_vacuums;   /* number of active VACUUMS with parallel workers */
+    int         max_vacuums;   /* max number of VACUUMS with parallel workers */
+    slock_t     mutex;
+    VacOneWorkerProgressInfo vacuums[FLEXIBLE_ARRAY_MEMBER];
+} VacWorkerProgressInfo;

> max_vacuums appears to just be a local copy of MaxBackends.  Does this
> information really need to be stored here?  Also, is there a strong reason
> for using a spinlock instead of an LWLock?

First, the BTVacInfo code in backend/access/nbtree/nbtutils.c inspired this, so I wanted to follow this pattern. With that said, I do see max_vacuums being redundant here, and I am inclined to replace it with a MaxBackends() call.

Second, there is no strong reason to use a spinlock here, except that I incorrectly assumed it would be better for this case. After reading more about this and reading up on src/backend/storage/lmgr/README, an LWLock will be better.

+void
+vacuum_worker_end(int leader_pid)
+{
+    SpinLockAcquire(&vacworkerprogress->mutex);
+    for (int i = 0; i < vacworkerprogress->num_vacuums; i++)
+    {
+        VacOneWorkerProgressInfo *vac = &vacworkerprogress->vacuums[i];
+
+        if (vac->leader_pid == leader_pid)
+        {
+            *vac = vacworkerprogress->vacuums[vacworkerprogress->num_vacuums - 1];
+            vacworkerprogress->num_vacuums--;
+            SpinLockRelease(&vacworkerprogress->mutex);
+            break;
+        }
+    }
+    SpinLockRelease(&vacworkerprogress->mutex);
+}

> I see this loop pattern in a couple of places, and it makes me wonder if
> this information would fit more naturally in a hash table.

Followed the pattern in backend/access/nbtree/nbtutils.c for this as well. Using dynahash may make sense here if it simplifies the code. Will look.

+    if (callback)
+        callback(values, 3);

> Why does this need to be set up as a callback function?  Could we just call
> the function if cmdtype == PROGRESS_COMMAND_VACUUM?  ISTM that is pretty
> much all this is doing.

The intention will be for the caller to set the callback early on in the function using the existing " if (pg_strcasecmp(cmd, "VACUUM") == 0), etc." statement. This way we avoid having to add another if/else block before tuplestore_putvalues is called.

-- 
Sami Imseih
Amazon Web Services
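For illustration, a minimal sketch of the LWLock variant (VacuumWorkerProgressLock is a hypothetical lock name, not one registered in the tree; a real patch would need to add it to lwlocknames or request a named tranche). Note it acquires and releases the lock exactly once on all paths, which also sidesteps the double SpinLockRelease in the snippet above:

    /*
     * Hedged sketch only, not the patch: vacuum_worker_end() guarded by a
     * hypothetical LWLock instead of a spinlock.
     */
    void
    vacuum_worker_end(int leader_pid)
    {
        LWLockAcquire(VacuumWorkerProgressLock, LW_EXCLUSIVE);
        for (int i = 0; i < vacworkerprogress->num_vacuums; i++)
        {
            VacOneWorkerProgressInfo *vac = &vacworkerprogress->vacuums[i];

            if (vac->leader_pid == leader_pid)
            {
                /* move the last entry into the hole and shrink the array */
                *vac = vacworkerprogress->vacuums[vacworkerprogress->num_vacuums - 1];
                vacworkerprogress->num_vacuums--;
                break;
            }
        }
        LWLockRelease(VacuumWorkerProgressLock);
    }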
On Wed, Feb 23, 2022 at 10:02 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
> If the failsafe kicks in midway through a vacuum, the number indexes_total will not be reset to 0. If INDEX_CLEANUP is turned off, then the value will be 0 at the start of the vacuum.

The way that this works with num_index_scans is that we "round up"
when there has been non-zero work in lazy_vacuum_all_indexes(), but
not if the precheck in lazy_vacuum_all_indexes() fails. That seems
like a good model to generalize from here. Note that this makes
INDEX_CLEANUP=off affect num_index_scans in much the same way as a
VACUUM where the failsafe kicks in very early, during the initial heap
pass. That is, if the failsafe kicks in before we reach lazy_vacuum()
for the first time (which is not unlikely), or even in the
lazy_vacuum_all_indexes() precheck, then num_index_scans will remain
at 0, just like INDEX_CLEANUP=off.

The actual failsafe WARNING shows num_index_scans, possibly before it
gets incremented one last time (by "rounding up"). So it's reasonably
clear how this all works from that context (assuming that the
autovacuum logging stuff, which reports num_index_scans, outputs a
report for a table where the failsafe kicked in).

-- 
Peter Geoghegan
On Wed, Feb 23, 2022 at 10:41:36AM -0800, Peter Geoghegan wrote:
> On Wed, Feb 23, 2022 at 10:02 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>> If the failsafe kicks in midway through a vacuum, the number indexes_total will not be reset to 0. If INDEX_CLEANUP is turned off, then the value will be 0 at the start of the vacuum.
>
> The way that this works with num_index_scans is that we "round up"
> when there has been non-zero work in lazy_vacuum_all_indexes(), but
> not if the precheck in lazy_vacuum_all_indexes() fails. That seems
> like a good model to generalize from here. Note that this makes
> INDEX_CLEANUP=off affect num_index_scans in much the same way as a
> VACUUM where the failsafe kicks in very early, during the initial heap
> pass. That is, if the failsafe kicks in before we reach lazy_vacuum()
> for the first time (which is not unlikely), or even in the
> lazy_vacuum_all_indexes() precheck, then num_index_scans will remain
> at 0, just like INDEX_CLEANUP=off.
>
> The actual failsafe WARNING shows num_index_scans, possibly before it
> gets incremented one last time (by "rounding up"). So it's reasonably
> clear how this all works from that context (assuming that the
> autovacuum logging stuff, which reports num_index_scans, outputs a
> report for a table where the failsafe kicked in).

I am confused.  If failsafe kicks in during the middle of a vacuum, I
(perhaps naively) would expect indexes_total and indexes_processed to go to
zero, and I'd expect to no longer see the "vacuuming indexes" and "cleaning
up indexes" phases.  Otherwise, how would I know that we are now skipping
indexes?  Of course, you won't have any historical context about the index
work done before failsafe kicked in, but IMO it is misleading to still
include it in the progress view.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
> On Wed, Feb 23, 2022 at 10:02 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>> If the failsafe kicks in midway through a vacuum, the number indexes_total will not be reset to 0. If INDEX_CLEANUP is turned off, then the value will be 0 at the start of the vacuum.
>
> The way that this works with num_index_scans is that we "round up"
> when there has been non-zero work in lazy_vacuum_all_indexes(), but
> not if the precheck in lazy_vacuum_all_indexes() fails. [...]

> I am confused.  If failsafe kicks in during the middle of a vacuum, I
> (perhaps naively) would expect indexes_total and indexes_processed to go to
> zero, and I'd expect to no longer see the "vacuuming indexes" and "cleaning
> up indexes" phases.  Otherwise, how would I know that we are now skipping
> indexes?  Of course, you won't have any historical context about the index
> work done before failsafe kicked in, but IMO it is misleading to still
> include it in the progress view.

Failsafe occurring in the middle of a vacuum and resetting "indexes_total" to 0 will be misleading. I am thinking that it is a better idea to expose only one column, "indexes_remaining".

If index_cleanup is set to OFF, the value of indexes_remaining will be 0 at the start of the vacuum.

If failsafe kicks in during a vacuum in-progress, "indexes_remaining" will be calculated to 0.

This approach will provide progress based on how many indexes remain, with no ambiguity.

-- 
Sami Imseih
Amazon Web Services
> I am confused.  If failsafe kicks in during the middle of a vacuum, I
> (perhaps naively) would expect indexes_total and indexes_processed to go to
> zero, and I'd expect to no longer see the "vacuuming indexes" and "cleaning
> up indexes" phases.  Otherwise, how would I know that we are now skipping
> indexes?  Of course, you won't have any historical context about the index
> work done before failsafe kicked in, but IMO it is misleading to still
> include it in the progress view.

After speaking with Nathan offline, the best way forward is to reset indexes_total and indexes_processed to 0 after the start of the "vacuuming indexes" or "cleaning up indexes" phase.

Also, if failsafe is triggered midway through a vacuum, the values for both indexes_total and indexes_processed are (re)set to 0.

Revision of the patch is attached. Below is a test that shows the output.

-[ RECORD 1 ]------+------------------
pid                | 4360
datid              | 5
datname            | postgres
relid              | 16399
phase              | vacuuming indexes
heap_blks_total    | 401092
heap_blks_scanned  | 211798
heap_blks_vacuumed | 158847
index_vacuum_count | 3
max_dead_tuples    | 1747625
num_dead_tuples    | 1747366
indexes_total      | 8          <<<<--- index_vacuum_count is 3, indexes_total is 8 and indexes_processed so far is 1
indexes_processed  | 1

-[ RECORD 1 ]------+--------------
pid                | 4360
datid              | 5
datname            | postgres
relid              | 16399
phase              | scanning heap
heap_blks_total    | 401092
heap_blks_scanned  | 234590
heap_blks_vacuumed | 211797
index_vacuum_count | 4
max_dead_tuples    | 1747625
num_dead_tuples    | 752136
indexes_total      | 0          <<<<--- index_vacuum_count is 4 and not in an index phase. indexes_total is 0 and indexes_processed so far is 0
indexes_processed  | 0

-[ RECORD 1 ]------+------------------
pid                | 4360
datid              | 5
datname            | postgres
relid              | 16399
phase              | vacuuming indexes
heap_blks_total    | 401092
heap_blks_scanned  | 264748
heap_blks_vacuumed | 211797
index_vacuum_count | 4
max_dead_tuples    | 1747625
num_dead_tuples    | 1747350
indexes_total      | 8
indexes_processed  | 6          <<<<--- index_vacuum_count is 4, indexes_total is 8 and indexes_processed so far is 6

-- 
Sami Imseih
Amazon Web Services
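As a usage note, with these semantics the per-cycle index progress could be watched with a query along these lines (a sketch against the proposed, uncommitted columns; NULLIF guards the phases where indexes_total is 0):

    -- Sketch only: indexes_total and indexes_processed are proposed in this patch.
    SELECT pid, relid::regclass AS relation, phase,
           round(100.0 * indexes_processed / NULLIF(indexes_total, 0), 1) AS index_cycle_pct
    FROM pg_stat_progress_vacuum
    WHERE phase IN ('vacuuming indexes', 'cleaning up indexes');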
Attachment
On Thu, Mar 3, 2022 at 2:08 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> After speaking with Nathan offline, the best way forward is to reset indexes_total and indexes_processed to 0 after the start of the "vacuuming indexes" or "cleaning up indexes" phase.

+1

+/*
+ * vacuum_worker_init --- initialize this module's shared memory hash
+ * to track the progress of a vacuum worker
+ */
+void
+vacuum_worker_init(void)
+{
+    HASHCTL     info;
+    long        max_table_size = GetMaxBackends();
+
+    VacuumWorkerProgressHash = NULL;
+
+    info.keysize = sizeof(pid_t);
+    info.entrysize = sizeof(VacProgressEntry);
+
+    VacuumWorkerProgressHash = ShmemInitHash("Vacuum Progress Hash",
+                                             max_table_size,
+                                             max_table_size,
+                                             &info,
+                                             HASH_ELEM | HASH_BLOBS);
+}

It seems to me that creating a shmem hash with max_table_size entries
for parallel vacuum process tracking is too much. IIRC an old patch
had parallel vacuum workers advertise its progress and changed the
pg_stat_progress_vacuum view so that it aggregates the results
including workers' stats. I think it’s better than the current one.
Why did you change that?

Regards,

-- 
Masahiko Sawada
EDB: https://www.enterprisedb.com/
> +/*
> + * vacuum_worker_init --- initialize this module's shared memory hash
> + * to track the progress of a vacuum worker
> + */
> +void
> +vacuum_worker_init(void)
> +{
> +    HASHCTL     info;
> +    long        max_table_size = GetMaxBackends();
> +
> +    VacuumWorkerProgressHash = NULL;
> +
> +    info.keysize = sizeof(pid_t);
> +    info.entrysize = sizeof(VacProgressEntry);
> +
> +    VacuumWorkerProgressHash = ShmemInitHash("Vacuum Progress Hash",
> +                                             max_table_size,
> +                                             max_table_size,
> +                                             &info,
> +                                             HASH_ELEM | HASH_BLOBS);
> +}

> It seems to me that creating a shmem hash with max_table_size entries
> for parallel vacuum process tracking is too much. IIRC an old patch
> had parallel vacuum workers advertise its progress and changed the
> pg_stat_progress_vacuum view so that it aggregates the results
> including workers' stats. I think it’s better than the current one.
> Why did you change that?

I was trying to avoid using shared memory to track completed indexes, but aggregating stats does not work with parallel vacuums. This is because a parallel worker will exit before the vacuum completes, causing the aggregated total to be wrong.

For example:

Leader pid advertises it completed 2 indexes.
Parallel worker advertises it completed 2 indexes.

When aggregating we see 4 indexes completed. After the parallel worker exits, the aggregation will show only 2 indexes completed.

-- 
Sami Imseih
Amazon Web Services
On Wed, Mar 9, 2022 at 12:41 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> I was trying to avoid using shared memory to track completed indexes, but aggregating stats does not work with parallel vacuums. This is because a parallel worker will exit before the vacuum completes, causing the aggregated total to be wrong.
>
> For example:
>
> Leader pid advertises it completed 2 indexes.
> Parallel worker advertises it completed 2 indexes.
>
> When aggregating we see 4 indexes completed. After the parallel worker exits, the aggregation will show only 2 indexes completed.

Indeed.

It might have already been discussed but other than using a new shmem
hash for parallel vacuum, I wonder if we can allow workers to change
the leader’s progress information. It would break the assumption that
the backend status entry is modified by its own backend, though. But
it might help for progress updates of other parallel operations too.
This essentially does the same thing as what the current patch does
but it doesn't require a new shmem hash.

Another idea I come up with is that the parallel vacuum leader checks
PVIndStats.status and updates how many indexes are processed to its
progress information. The leader can check it and update the progress
information before and after index vacuuming. And possibly we can add
a callback to the main loop of index AM's bulkdelete and vacuumcleanup
so that the leader can periodically make it up-to-date.

Regards,

-- 
Masahiko Sawada
EDB: https://www.enterprisedb.com/
> Indeed.

> It might have already been discussed but other than using a new shmem
> hash for parallel vacuum, I wonder if we can allow workers to change
> the leader’s progress information. It would break the assumption that
> the backend status entry is modified by its own backend, though. But
> it might help for progress updates of other parallel operations too.
> This essentially does the same thing as what the current patch does
> but it doesn't require a new shmem hash.

I experimented with this idea, but it did not work. The idea would have been to create a pgstat_progress_update function that takes the leader pid; however, infrastructure does not exist to allow one backend to manipulate another backend's backend status array. pgstat_fetch_stat_beentry returns a local copy only.

> Another idea I come up with is that the parallel vacuum leader checks
> PVIndStats.status and updates how many indexes are processed to its
> progress information. The leader can check it and update the progress
> information before and after index vacuuming. And possibly we can add
> a callback to the main loop of index AM's bulkdelete and vacuumcleanup
> so that the leader can periodically make it up-to-date.

The PVIndStats idea is also one I experimented with, but it did not work. The reason being the backend checking the progress needs to do a shm_toc_lookup to access the data, but such backends are not prepared to do so.

I have not considered the callback in the index AM's bulkdelete and vacuumcleanup, but I can imagine this is not possible since a leader could be busy vacuuming rather than updating counters; I may be misunderstanding the suggestion, though.

-- 
Sami Imseih
Amazon Web Services
Attached is the latest revision of the patch(es). Renamed the patches correctly for Cfbot.

-- 
Sami Imseih
Amazon Web Services
Attachment
On Wed, Mar 9, 2022 at 11:35 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> I experimented with this idea, but it did not work. The idea would have been to create a pgstat_progress_update function that takes the leader pid; however, infrastructure does not exist to allow one backend to manipulate another backend's backend status array.
> pgstat_fetch_stat_beentry returns a local copy only.

I think if it's a better approach we can do that including adding a
new infrastructure for it.

> The PVIndStats idea is also one I experimented with, but it did not work. The reason being the backend checking the progress needs to do a shm_toc_lookup to access the data, but such backends are not prepared to do so.

What I imagined is that the leader checks how many PVIndStats.status
values are PARALLEL_INDVAC_STATUS_COMPLETED and updates the result to
its progress information as indexes_processed. That way, the backend
checking the progress can see it.

> I have not considered the callback in the index AM's bulkdelete and vacuumcleanup, but I can imagine this is not possible since a leader could be busy vacuuming rather than updating counters; I may be misunderstanding the suggestion, though.

Checking PVIndStats.status values is cheap. Probably the leader can
check it every 1GB of index blocks, for example.

Regards,

-- 
Masahiko Sawada
EDB: https://www.enterprisedb.com/
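A rough sketch of that idea (assuming the PVIndStats array and status field from the in-tree vacuumparallel.c; PROGRESS_VACUUM_INDEXES_COMPLETED is the counter this patch set proposes, and parallel_vacuum_update_progress with these arguments is hypothetical):

    /*
     * Hedged sketch of the suggestion: the leader periodically counts
     * completed index entries and reports the total in its own progress
     * slot, so no extra shmem hash is needed.
     */
    static void
    parallel_vacuum_update_progress(PVIndStats *indstats, int nindexes)
    {
        int         ncompleted = 0;

        for (int i = 0; i < nindexes; i++)
        {
            if (indstats[i].status == PARALLEL_INDVAC_STATUS_COMPLETED)
                ncompleted++;
        }

        pgstat_progress_update_param(PROGRESS_VACUUM_INDEXES_COMPLETED,
                                     ncompleted);
    }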
> I think if it's a better approach we can do that, including adding
> new infrastructure for it.

+1

This is a beneficial idea, especially for other progress reporting, but I see this as a separate thread targeting the next major version.
I took a look at the latest patch set.

+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>indexes_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       The number of indexes to be processed in the
+       <literal>vacuuming indexes</literal>
+       or <literal>cleaning up indexes</literal> phase. It is set to
+       <literal>0</literal> when vacuum is not in any of these phases.
+      </para></entry>

Could we avoid resetting it to 0 unless INDEX_CLEANUP was turned off or failsafe kicked in? It might be nice to know how many indexes the vacuum intends to process. I don't feel too strongly about this, so if it would add a lot of complexity, it might be okay as is.

 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	vacuum_worker_init();

Don't we also need to add the size of the hash table to CalculateShmemSize()?

+ * A command type can optionally define a callback function
+ * which will derive Datum values rather than use values
+ * directly from the backends progress array.

I think this can be removed.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
> Could we avoid resetting it to 0 unless INDEX_CLEANUP was turned off or
> failsafe kicked in? It might be nice to know how many indexes the vacuum
> intends to process. I don't feel too strongly about this, so if it would
> add a lot of complexity, it might be okay as is.

Your suggestion is valid. With INDEX_CLEANUP off it is set to 0 from the start, and when failsafe kicks in it will be reset to 0. I will remove the reset call for the common index vacuum path.

> 	BTreeShmemInit();
> 	SyncScanShmemInit();
> 	AsyncShmemInit();
> +	vacuum_worker_init();

> Don't we also need to add the size of the hash table to
> CalculateShmemSize()?

No, ShmemInitHash takes the min and max size of the hash and in turn calls ShmemInitStruct to set up the shared memory.

> + * A command type can optionally define a callback function
> + * which will derive Datum values rather than use values
> + * directly from the backends progress array.

> I think this can be removed.

Good catch.

--
Sami Imseih
Amazon Web Services
> > 	BTreeShmemInit();
> > 	SyncScanShmemInit();
> > 	AsyncShmemInit();
> > +	vacuum_worker_init();

> > Don't we also need to add the size of the hash table to
> > CalculateShmemSize()?

> No, ShmemInitHash takes the min and max size of the hash and in turn calls ShmemInitStruct to set up the shared memory.

Sorry, I am wrong here. The size needs to be accounted for at startup.

--
Sami Imseih
Amazon Web Services
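For reference, a sketch of how the size estimate and the hash creation can be kept in agreement, using the function and struct names from the patch under review (the shared constant is an illustration, and the sizing itself changes later in the thread):

	#define VACUUM_PROGRESS_MAX_ENTRIES  (GetMaxBackends())

	Size
	VacuumWorkerProgressShmemSize(void)
	{
		/* must match the sizing used by ShmemInitHash below */
		return hash_estimate_size(VACUUM_PROGRESS_MAX_ENTRIES,
								  sizeof(VacProgressEntry));
	}

	void
	VacuumWorkerProgressShmemInit(void)
	{
		HASHCTL		info;

		info.keysize = sizeof(int);		/* keyed by leader PID */
		info.entrysize = sizeof(VacProgressEntry);

		VacuumWorkerProgressHash =
			ShmemInitHash("Vacuum Worker Progress Hash",
						  VACUUM_PROGRESS_MAX_ENTRIES,
						  VACUUM_PROGRESS_MAX_ENTRIES,
						  &info,
						  HASH_ELEM | HASH_BLOBS);
	}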
Attached is v4, which includes accounting for the hash size on startup, removal of the no-longer-needed comment in pgstatfuncs.c, and a change in both code/docs to only reset indexes_total to 0 when failsafe is triggered.

--
Sami Imseih
Amazon Web Services
Attachment
On Thu, Mar 10, 2022 at 09:30:57PM +0000, Imseih (AWS), Sami wrote:
> Attached is v4, which includes accounting for the hash size on startup, removal of the no-longer-needed comment in pgstatfuncs.c, and a change in both code/docs to only reset indexes_total to 0 when failsafe is triggered.

Thanks for the new patch set.

+/*
+ * Structs for tracking shared Progress information
+ * amongst worker ( and leader ) processes of a vacuum.
+ */

nitpick: Can we remove the extra spaces in the parentheses?

+	if (entry != NULL)
+		values[PGSTAT_NUM_PROGRESS_COMMON + PROGRESS_VACUUM_INDEXES_COMPLETED] = entry->indexes_processed;

What does it mean if there isn't an entry in the map? Is this actually expected, or should we ERROR instead?

+	/* vacuum worker progress hash table */
+	max_table_size = GetMaxBackends();
+	size = add_size(size, hash_estimate_size(max_table_size,
+											 sizeof(VacProgressEntry)));

I think the number of entries should be shared between VacuumWorkerProgressShmemInit() and VacuumWorkerProgressShmemSize(). Otherwise, we might update one and not the other.

+	/* Call the command specific function to override datum values */
+	if (pg_strcasecmp(cmd, "VACUUM") == 0)
+		set_vaccum_worker_progress(values);

I think we should elaborate a bit more in this comment. It's difficult to follow what this is doing without referencing the comment above set_vacuum_worker_progress().

IMO the patches are in decent shape, and this should likely be marked as ready-for-committer in the near future. Before doing so, I think we should check that Sawada-san is okay with moving the deeper infrastructure changes to a separate thread.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
> nitpick: Can we remove the extra spaces in the parentheses?

Fixed.

> What does it mean if there isn't an entry in the map? Is this actually
> expected, or should we ERROR instead?

I cleaned up the code here and added comments.

> I think the number of entries should be shared between
> VacuumWorkerProgressShmemInit() and VacuumWorkerProgressShmemSize().
> Otherwise, we might update one and not the other.

Fixed.

> I think we should elaborate a bit more in this comment. It's difficult to
> follow what this is doing without referencing the comment above
> set_vacuum_worker_progress().

More comments added.

I also simplified the 0002 patch.

--
Sami Imseih
Amazon Web Services
Attachment
On Sat, Mar 12, 2022 at 07:00:06AM +0000, Imseih (AWS), Sami wrote:
> More comments added.
>
> I also simplified the 0002 patch.

These patches look pretty good to me. Barring additional feedback, I intend to mark this as ready-for-committer early next week.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Sat, Mar 12, 2022 at 4:00 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> I also simplified the 0002 patch.

I'm still unsure the current design of the 0001 patch is better than other approaches we’ve discussed. Even users who don't use parallel vacuum are forced to allocate shared memory for index vacuum progress, with GetMaxBackends() entries from the beginning. Also, it’s likely we will extend the progress tracking feature to other parallel operations in the future, but I think the current design is not extensible. If we want to do that, we will end up creating similar things for each of them or re-creating the index vacuum progress tracking feature while creating a common infrastructure. It might not be a problem as of now, but I'm concerned that introducing a feature that is not extensible and forces users to allocate additional shmem might be a blocker in the future. Looking at the precedent example, when we introduced the progress tracking feature, we implemented it in an extensible way. On the other hand, others in this thread seem to agree with this approach, so I'd like to leave it to committers.

Anyway, here are some comments on the v5-0001 patch:

+/* in commands/vacuumparallel.c */
+extern void VacuumWorkerProgressShmemInit(void);
+extern Size VacuumWorkerProgressShmemSize(void);
+extern void vacuum_worker_end(int leader_pid);
+extern void vacuum_worker_update(int leader_pid);
+extern void vacuum_worker_end_callback(int code, Datum arg);
+extern void set_vaccum_worker_progress(Datum *values);

These functions' bodies are not in vacuumparallel.c. As the comment says, I think these functions should be implemented in vacuumparallel.c.

---
+/*
+ * set_vaccum_worker_progress --- updates the number of indexes that have been
+ * vacuumed or cleaned up in a parallel vacuum.
+ */
+void
+set_vaccum_worker_progress(Datum *values)

s/vaccum/vacuum/

---
+void
+set_vaccum_worker_progress(Datum *values)
+{
+	VacProgressEntry *entry;
+	int			leader_pid = values[0];

I think we should use DatumGetInt32().

---
+	entry = (VacProgressEntry *) hash_search(VacuumWorkerProgressHash, &leader_pid, HASH_ENTER_NULL, &found);
+
+	if (!entry)
+		elog(ERROR, "cannot allocate shared memory for vacuum worker progress");

Since we raise an error in case of out of memory, I think we can use HASH_ENTER instead of HASH_ENTER_NULL. Or do we want to emit a detailed error message here?

---
+	VacuumWorkerProgressHash = NULL;

This line is not necessary.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
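Taken together, the review points above amount to a fragment roughly like this (a sketch of the suggested fixes, not the committed code):

	void
	set_vacuum_worker_progress(Datum *values)
	{
		VacProgressEntry *entry;
		int			leader_pid = DatumGetInt32(values[0]);
		bool		found;

		/* plain HASH_ENTER: out-of-memory raises an error on its own */
		entry = (VacProgressEntry *) hash_search(VacuumWorkerProgressHash,
												 &leader_pid,
												 HASH_ENTER, &found);
		/* ... report entry->indexes_processed as before ... */
	}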
> I'm still unsure the current design of the 0001 patch is better than other
> approaches we’ve discussed. Even users who don't use parallel vacuum
> are forced to allocate shared memory for index vacuum progress, with
> GetMaxBackends() entries from the beginning. Also, it’s likely we will
> extend the progress tracking feature to other parallel operations in
> the future, but I think the current design is not extensible. ...
> On the other hand, others in this thread seem to agree with this
> approach, so I'd like to leave it to committers.

Thanks for the review!

I think you make strong arguments as to why we need to take a different approach now rather than later.

Flaws with the current patch set:

1. GetMaxBackends() is a really heavy-handed overallocation of a shared memory serving a very specific purpose.
2. Going with the approach of a vacuum-specific hash breaks the design of progress reporting, which is meant to be extensible.
3. Even if we go with this current approach as an interim solution, it will be a real pain in the future.

With that said, v7 introduces the new infrastructure. 0001 includes the new infrastructure and 0002 takes advantage of it.

This approach is the following:

1. Introduces a new API called pgstat_progress_update_param_parallel along with some other support functions. This new infrastructure is in backend_progress.c.

2. There is still a shared memory hash involved, but its size is capped at max_worker_processes, which is the maximum number of parallel workers that can be doing work at any given time. The shared memory hash includes a st_progress_param array just like the backend status array.

typedef struct ProgressParallelEntry
{
	pid_t		leader_pid;
	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
} ProgressParallelEntry;

3. The progress update function is pgstat_progress_update_param_parallel, and it will aggregate totals reported for a specific progress parameter.

For example, it can be called like below. In the case below, PROGRESS_VACUUM_INDEXES_COMPLETED is incremented by 1 in the shared memory entry shared by the workers and leader.

		case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
			istat_res = vac_bulkdel_one_index(&ivinfo, istat, pvs->dead_items);
			pgstat_progress_update_param_parallel(pvs->shared->leader_pid, PROGRESS_VACUUM_INDEXES_COMPLETED, 1);   <<-----
			break;

4. pg_stat_get_progress_info will call a function called pgstat_progress_set_parallel, which will set the parameter value to the total from the shared memory hash.

I believe this approach gives proper infrastructure for future use-cases of workers reporting progress -and- does not do the heavy-handed shared memory allocation.

--
Sami Imseih
Amazon Web Services
Attachment
Attaching v8 due to naming convention fixes and one slight change in where indexes_processed is set after all indexes are vacuumed.

s/indexes_completed/indexes_processed/

--
Sami Imseih
Amazon Web Services
Attachment
Spoke to Nathan offline and fixed some more comments/nitpicks in the patch.
Attachment
On Wed, Mar 16, 2022 at 09:47:49PM +0000, Imseih (AWS), Sami wrote:
> Spoke to Nathan offline and fixed some more comments/nitpicks in the patch.

I don't have any substantial comments for v9, so I think this can be marked as ready-for-committer. However, we probably should first see whether Sawada-san has any comments on the revised approach.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Sorry for the late reply.

On Tue, Mar 15, 2022 at 1:20 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> With that said, v7 introduces the new infrastructure. 0001 includes the new infrastructure and 0002 takes advantage of it.
>
> This approach is the following:
>
> 1. Introduces a new API called pgstat_progress_update_param_parallel along with some other support functions. This new infrastructure is in backend_progress.c.
>
> 2. There is still a shared memory hash involved, but its size is capped at max_worker_processes, which is the maximum number of parallel workers that can be doing work at any given time. The shared memory hash includes a st_progress_param array just like the backend status array.

I think that there is a corner case where a parallel operation could not be performed due to the lack of a free shared hash entry, because there is a window between a parallel worker exiting and the leader deallocating the hash table entry.

BTW have we discussed another idea I mentioned before, that we have the leader process periodically check the number of completed indexes and advertise it in its progress information? I'm not sure which one is better, but this idea would require only changes to the vacuum code and would probably be simpler than the current idea.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Hi,

On 2022-03-16 21:47:49 +0000, Imseih (AWS), Sami wrote:
> From 85c47dfb3bb72f764b9052e74a7282c19ebd9898 Mon Sep 17 00:00:00 2001
> From: "Sami Imseih (AWS)" <simseih@amazon.com>
> Date: Wed, 16 Mar 2022 20:39:52 +0000
> Subject: [PATCH 1/1] Add infrastructure for parallel progress reporting
>
> Infrastructure to allow a parallel worker to report
> progress. In a PARALLEL command, the workers and
> leader can report progress using a new pgstat_progress
> API.

What happens if we run out of memory for hashtable entries?

> +void
> +pgstat_progress_update_param_parallel(int leader_pid, int index, int64 val)
> +{
> +	ProgressParallelEntry *entry;
> +	bool		found;
> +
> +	LWLockAcquire(ProgressParallelLock, LW_EXCLUSIVE);
> +
> +	entry = (ProgressParallelEntry *) hash_search(ProgressParallelHash, &leader_pid, HASH_ENTER, &found);
> +
> +	/*
> +	 * If the entry is not found, set the value for the index'th member,
> +	 * else increment the current value of the index'th member.
> +	 */
> +	if (!found)
> +		entry->st_progress_param[index] = val;
> +	else
> +		entry->st_progress_param[index] += val;
> +
> +	LWLockRelease(ProgressParallelLock);
> +}

I think that's an absolute no-go. Adding locking to progress reporting, particularly a single central lwlock, is going to *vastly* increase the overhead incurred by progress reporting.

Greetings,

Andres Freund
> BTW have we discussed another idea I mentioned before, that we have the
> leader process periodically check the number of completed indexes and
> advertise it in its progress information? I'm not sure which one is
> better, but this idea would require only changes to the vacuum code and
> would probably be simpler than the current idea.

If I understand correctly, to accomplish this we will need to have the leader check the number of indexes completed in the ambulkdelete or amvacuumcleanup callbacks. These routines do not know about PVIndStats, and they are called by both parallel and non-parallel vacuums.

From what I can see, PVIndStats will need to be passed down to these routines, or a NULL passed for non-parallel vacuums.

Sami
On Tue, Mar 22, 2022 at 4:27 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> If I understand correctly, to accomplish this we will need to have the leader
> check the number of indexes completed in the ambulkdelete or amvacuumcleanup
> callbacks. These routines do not know about PVIndStats, and they are called
> by both parallel and non-parallel vacuums.
>
> From what I can see, PVIndStats will need to be passed down to these routines,
> or a NULL passed for non-parallel vacuums.

Can the leader pass a callback that checks PVIndStats to the ambulkdelete and amvacuumcleanup callbacks? I think that in the passed callback, the leader checks the number of processed indexes and updates its progress information if the current progress needs to be updated.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
> Can the leader pass a callback that checks PVIndStats to the ambulkdelete
> and amvacuumcleanup callbacks? I think that in the passed callback, the
> leader checks the number of processed indexes and updates its
> progress information if the current progress needs to be updated.

Thanks for the suggestion.

I looked at this option a bit today and found that passing the callback will also require signature changes to the ambulkdelete and amvacuumcleanup routines.

This will also require us to check after x pages have been scanned inside vacuumscan and vacuumcleanup. After x pages the callback can then update the leader's progress. I am not sure if adding additional complexity to the scan/cleanup path is justified for what this patch is attempting to do.

There will also be a lag in the leader updating the progress, as it must scan x amount of pages before updating. Obviously, the more pages to scan, the longer the lag will be.

Would like to hear your thoughts on the above.

Regards,

Sami Imseih
Amazon Web Services.
On Wed, Mar 23, 2022 at 6:57 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> I looked at this option a bit today and found that passing the callback
> will also require signature changes to the ambulkdelete and
> amvacuumcleanup routines.

I think that would not be a critical problem since it's a new feature.

> This will also require us to check after x pages have been
> scanned inside vacuumscan and vacuumcleanup. After x pages
> the callback can then update the leader's progress.
> I am not sure if adding additional complexity to the scan/cleanup path
> is justified for what this patch is attempting to do.
>
> There will also be a lag in the leader updating the progress, as it
> must scan x amount of pages before updating. Obviously, the more
> pages to scan, the longer the lag will be.

Fair points. On the other hand, the approach of the current patch requires more memory for progress tracking, which could fail, e.g., due to running out of hashtable entries. I think it would be worse if the parallel operation failed to start because progress could not be tracked than the concerns you mention above, such as introducing additional complexity and a possible lag in progress updates. So if we go with the current approach, I think we need to make sure there are enough (and not too many) hash table entries.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Sorry for the late reply.

> So if we go with the current approach, I think we need to make sure
> there are enough (and not too many) hash table entries.

The hash table can be sized at 4 times max_worker_processes, which should give more than enough padding. Note that max_parallel_maintenance_workers is what should ideally be used, but since it's dynamic, it cannot be used to determine the size of shared memory.

Regards,

---
Sami Imseih
Amazon Web Services
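In code form, the proposed sizing would be a one-line change in the size estimate. The function and struct names here follow the v7 patches and are approximations, not quotations:

	Size
	ParallelProgressShmemSize(void)
	{
		/*
		 * 4x head-room over max_worker_processes, per the above, to cover
		 * the window between a worker exiting and the leader deallocating
		 * its hash entry.
		 */
		return hash_estimate_size(4 * max_worker_processes,
								  sizeof(ProgressParallelEntry));
	}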
> I think that's an absolute no-go. Adding locking to progress reporting,
> particularly a single central lwlock, is going to *vastly* increase the
> overhead incurred by progress reporting.

Sorry for the late reply.

The usage of the shared memory will be limited to PARALLEL maintenance operations. For now, it will only be populated for parallel vacuums. Autovacuum, for example, will not be required to populate this shared memory.

Regards,

---
Sami Imseih
Amazon Web Services
Hi,

On 2022-03-29 12:25:52 +0000, Imseih (AWS), Sami wrote:
> The usage of the shared memory will be limited
> to PARALLEL maintenance operations. For now,
> it will only be populated for parallel vacuums.
> Autovacuum, for example, will not be required to
> populate this shared memory.

I nevertheless think that's not acceptable. The whole premise of the progress reporting infrastructure is to be low overhead. It's OK to require locking to initialize parallel progress reporting; it's definitely not OK to require locking to report progress.

Leaving the locking aside, doing a hashtable lookup for each progress report is pretty expensive.

Why isn't the obvious thing to do here to provide a way to associate workers with their leaders in shared memory, but to use the existing progress fields to report progress? Then, when querying progress, the leader's and workers' progress fields can be combined to show the overall progress?

Greetings,

Andres Freund
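One way to read this suggestion, as a rough sketch: each worker advertises its leader's PID in its own backend status entry, and the view-side code sums the workers' values into the leader's row at query time. st_progress_leader_pid is an assumed field that does not exist in the tree; the fetch functions are the existing ones, which return local copies and therefore need no locking on the reporting side:

	static int64
	progress_param_combined(int leader_pid, int index)
	{
		int64	total = 0;
		int		num_backends = pgstat_fetch_stat_numbackends();

		for (int i = 1; i <= num_backends; i++)
		{
			PgBackendStatus *be = pgstat_fetch_stat_beentry(i);

			/* st_progress_leader_pid is hypothetical */
			if (be != NULL && be->st_progress_leader_pid == leader_pid)
				total += be->st_progress_param[index];
		}

		return total;
	}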
> I nevertheless think that's not acceptable. The whole premise of the progress
> reporting infrastructure is to be low overhead. It's OK to require locking to
> initialize parallel progress reporting; it's definitely not OK to require
> locking to report progress.

Fair point.

> Why isn't the obvious thing to do here to provide a way to associate workers
> with their leaders in shared memory, but to use the existing progress fields
> to report progress? Then, when querying progress, the leader's and workers'
> progress fields can be combined to show the overall progress?

The original intent was this; however, the workers can exit before the command completes, and the worker progress data will be lost. This is why the shared memory was introduced. This allows the worker progress to persist for the duration of the command.

Regards,

Sami Imseih
Amazon Web Services.
On Tue, Apr 5, 2022 at 12:42 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
> > Why isn't the obvious thing to do here to provide a way to associate workers
> > with their leaders in shared memory, but to use the existing progress fields
> > to report progress? Then, when querying progress, the leader's and workers'
> > progress fields can be combined to show the overall progress?
>
> The original intent was this; however, the workers
> can exit before the command completes, and the
> worker progress data will be lost.
> This is why the shared memory was introduced.
> This allows the worker progress to persist for the duration
> of the command.

At the beginning of a parallel operation, we allocate a chunk of dynamic shared memory which persists even after some or all workers have exited. It's only torn down at the end of the parallel operation. That seems like the appropriate place to be storing any kind of data that needs to be propagated between parallel workers. The current patch uses the main shared memory segment, which seems unacceptable to me.

--
Robert Haas
EDB: http://www.enterprisedb.com
On 2022-04-05 16:42:28 +0000, Imseih (AWS), Sami wrote:
> > Why isn't the obvious thing to do here to provide a way to associate workers
> > with their leaders in shared memory, but to use the existing progress fields
> > to report progress? Then, when querying progress, the leader's and workers'
> > progress fields can be combined to show the overall progress?
>
> The original intent was this; however, the workers
> can exit before the command completes, and the
> worker progress data will be lost.

Can't the progress data trivially be inferred by the fact that the worker completed?
> Can't the progress data trivially be inferred by the fact that the worker
> completed?

Yes. At some point, this idea was experimented with in 0004-Expose-progress-for-the-vacuuming-indexes-cleanup-ph.patch. This patch did the calculation in system_views.sql.

However, the view is complex, and there could be some edge cases with inferring the values that lead to wrong values being reported.

Regards,

Sami Imseih
Amazon Web Services
> At the beginning of a parallel operation, we allocate a chunk of
> dynamic shared memory which persists even after some or all workers
> have exited. It's only torn down at the end of the parallel operation.
> That seems like the appropriate place to be storing any kind of data
> that needs to be propagated between parallel workers. The current
> patch uses the main shared memory segment, which seems unacceptable to
> me.

Correct, DSM does track shared data. However, only participating processes in the parallel vacuum can attach to and look up this data.

The purpose of the main shared memory is to allow a process that is querying the progress views to retrieve the information.

Regards,

Sami Imseih
Amazon Web Services
On Wed, Apr 6, 2022 at 5:22 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
> Correct, DSM does track shared data. However, only participating
> processes in the parallel vacuum can attach to and look up this data.
>
> The purpose of the main shared memory is to allow a process that
> is querying the progress views to retrieve the information.

Sure, but I think that you should likely be doing what Andres recommended before:

# Why isn't the obvious thing to do here to provide a way to associate workers
# with their leaders in shared memory, but to use the existing progress fields
# to report progress? Then, when querying progress, the leader and workers
# progress fields can be combined to show the overall progress?

That is, I am imagining that you would want to use DSM to propagate data from workers back to the leader and then have the leader report the data using the existing progress-reporting facilities. Now, if we really need a whole row from each worker that doesn't work, but why do we need that?

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Apr 7, 2022 at 10:20 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> That is, I am imagining that you would want to use DSM to propagate
> data from workers back to the leader and then have the leader report
> the data using the existing progress-reporting facilities. Now, if we
> really need a whole row from each worker that doesn't work, but why do
> we need that?

+1

I also proposed the same idea before[1]. The leader can know how many indexes are processed so far by checking PVIndStats.status, allocated on DSM for each index. We can have the leader check it and update the progress information before and after vacuuming one index. If we want to update the progress information more timely, probably we can pass a callback function to ambulkdelete and amvacuumcleanup so that the leader can do that periodically, e.g., every 1000 blocks, while vacuuming an index.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoBW6SMJ96CNoMeu%2Bf_BR4jmatPcfVA016FdD2hkLDsaTA%40mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
It looks like this patch got feedback from Andres and Robert with some significant design change recommendations. I'm marking the patch Returned with Feedback. Feel free to add it back to a future commitfest when a new version is ready.
Taking the above feedback, I modified the patches, and I believe this approach should be acceptable.

For now, I also reduced the scope to only exposing indexes_total and indexes_completed in pg_stat_progress_vacuum. I will create a new CF entry for the new view pg_stat_progress_vacuum_index.

V10-0001: This patch adds a callback to ParallelContext that can be called by the leader in vacuumparallel.c, and more importantly while the leader is waiting for the parallel workers to complete inside WaitForParallelWorkersToFinish. This ensures that the leader is continuously polling and reporting completed indexes for the life of the PARALLEL VACUUM. This covers cases where the leader completes vacuuming before the workers complete.

V10-0002: This implements the indexes_total and indexes_completed columns in pg_stat_progress_vacuum.

This work is now tracked in the next commitfest: https://commitfest.postgresql.org/38/3617/

Regards,

Sami Imseih
Amazon Web Services
Attachment
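For context, the registration side of the v10 approach is small; a sketch of what the leader might do when setting up the parallel context (the field names follow the struct excerpt quoted in the next message, and pvs is the leader's ParallelVacuumState):

	/*
	 * Leader, after CreateParallelContext(): let the context call back
	 * into vacuumparallel.c while waiting for workers to finish.
	 */
	pcxt->parallel_progress_callback = parallel_vacuum_progress_report;
	pcxt->parallel_progress_callback_arg = pvs;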
On Thu, Apr 14, 2022 at 10:32 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> V10-0001: This patch adds a callback to ParallelContext
> that can be called by the leader in vacuumparallel.c,
> and more importantly while the leader is waiting
> for the parallel workers to complete inside
> WaitForParallelWorkersToFinish.

Thank you for updating the patch! The new design looks much better to me.

 typedef struct ParallelWorkerInfo
@@ -46,6 +49,8 @@ typedef struct ParallelContext
 	ParallelWorkerInfo *worker;
 	int			nknown_attached_workers;
 	bool	   *known_attached_workers;
+	ParallelProgressCallback parallel_progress_callback;
+	void	   *parallel_progress_callback_arg;
 } ParallelContext;

I think we can pass the progress update function to WaitForParallelWorkersToFinish(), which seems simpler. And we can call the function after updating the index status to PARALLEL_INDVAC_STATUS_COMPLETED.

BTW, currently we don't need a lock for touching index status since each worker touches different indexes. But after this patch, the leader will touch all index statuses; do we need a lock for that?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Thank you for the feedback!

> I think we can pass the progress update function to
> WaitForParallelWorkersToFinish(), which seems simpler.

Directly passing the callback to WaitForParallelWorkersToFinish will require us to modify the function signature. To me, it seemed simpler and touches less code to have the caller set the callback in the ParallelContext.

> And we can call the function after updating the index status to
> PARALLEL_INDVAC_STATUS_COMPLETED.

I also like this better. Will make the change.

> BTW, currently we don't need a lock for touching index status since
> each worker touches different indexes. But after this patch, the
> leader will touch all index statuses; do we need a lock for that?

I do not think locking is needed here. The leader and workers will continue to touch different indexes to update the status.

However, if the process is a leader, it will call the function which will go through indstats and count how many indexes have a status of PARALLEL_INDVAC_STATUS_COMPLETED. This value is then reported to the leader's backend only.

Regards,

Sami Imseih
Amazon Web Services (AWS)
> And we can call the function after updating the index status to
> PARALLEL_INDVAC_STATUS_COMPLETED.

> I also like this better. Will make the change.

I updated the patch. The progress function is called after updating the index status to PARALLEL_INDVAC_STATUS_COMPLETED.

I believe all comments have been addressed at this point.

Regards,

Sami Imseih
Amazon Web Services
Attachment
On Fri, May 6, 2022 at 4:26 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Directly passing the callback to WaitForParallelWorkersToFinish
> will require us to modify the function signature.
>
> To me, it seemed simpler and touches less code to have
> the caller set the callback in the ParallelContext.

Okay, but if we do that, I think we should add comments about when it's used. The callback is used only by WaitForParallelWorkersToFinish(), but not by WaitForParallelWorkersToExit().

Another idea I came up with is that we can wait for all index vacuums to finish while checking and updating the progress information, and then call WaitForParallelWorkersToFinish after confirming all index statuses became COMPLETED. That way, we don’t need to change the parallel query infrastructure. What do you think?

> I do not think locking is needed here. The leader and workers
> will continue to touch different indexes to update the status.
>
> However, if the process is a leader, it will call the function
> which will go through indstats and count how many
> indexes have a status of PARALLEL_INDVAC_STATUS_COMPLETED.
> This value is then reported to the leader's backend only.

I was concerned that the leader process could report the wrong progress if updating and checking index status happen concurrently. But I think it should be fine since we can read PVIndVacStatus atomically.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
> Another idea I came up with is that we can wait for all index vacuums
> to finish while checking and updating the progress information, and
> then call WaitForParallelWorkersToFinish after confirming all index
> statuses became COMPLETED. That way, we don’t need to change the
> parallel query infrastructure. What do you think?

Thinking about this a bit more, the idea of using WaitForParallelWorkersToFinish will not work if you have a leader that is stuck on a large index. The progress will not be updated until the leader completes, even if the parallel workers finish.

What are your thoughts about piggybacking on vacuum_delay_point to update progress? The leader can perhaps keep a counter to update progress every few thousand calls to vacuum_delay_point. This goes back to your original idea to keep updating progress while scanning the indexes.

/*
 * vacuum_delay_point --- check for interrupts and cost-based delay.
 *
 * This should be called in each major loop of VACUUM processing,
 * typically once per page processed.
 */
void
vacuum_delay_point(void)
{

---
Sami Imseih
Amazon Web Services.
On Fri, May 27, 2022 at 10:52 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Thinking about this a bit more, the idea of using
> WaitForParallelWorkersToFinish will not work if you have a leader
> that is stuck on a large index. The progress will not be updated
> until the leader completes, even if the parallel workers finish.

Right.

> What are your thoughts about piggybacking on
> vacuum_delay_point to update progress? The leader can
> perhaps keep a counter to update progress every few thousand
> calls to vacuum_delay_point.
>
> This goes back to your original idea to keep updating progress
> while scanning the indexes.

I think we can have the leader process wait for all index statuses to become COMPLETED before WaitForParallelWorkersToFinish(). While waiting, the leader can update its progress information. After the leader has confirmed all index statuses became COMPLETED, it can wait for the workers to finish with WaitForParallelWorkersToFinish().

Regarding waiting in vacuum_delay_point, it might have side effects, as it's called for every page and is used not only by vacuum but also by other commands such as analyze, but it seems to be worth trying.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Thu, May 26, 2022 at 11:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Another idea I came up with is that we can wait for all index vacuums
> to finish while checking and updating the progress information, and
> then call WaitForParallelWorkersToFinish after confirming all index
> statuses became COMPLETED. That way, we don’t need to change the
> parallel query infrastructure. What do you think?

+1 from me. It doesn't seem to me that we should need to add something like parallel_vacuum_progress_callback in order to solve this problem, because the parallel index vacuum code could just do the waiting itself, as you propose here.

The question Sami asks in his reply is a good one, though -- who is to say that the leader only needs to update progress at the end, once it's finished the index it's handling locally? There will need to be a callback system of some kind to allow the leader to update progress as other workers finish, even if the leader is still working. I am not too sure that the idea of using the vacuum delay points is the best plan. I think we should try to avoid piggybacking on such general infrastructure if we can, and instead look for a way to tie this to something that is specific to parallel vacuum. However, I haven't studied the problem so I'm not sure whether there's a reasonable way to do that.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Jun 6, 2022 at 11:42 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> I am not too sure that the idea of using the vacuum delay points is the best
> plan. I think we should try to avoid piggybacking on such general
> infrastructure if we can, and instead look for a way to tie this to
> something that is specific to parallel vacuum.

One idea would be to add a flag, say report_parallel_vacuum_progress, to the IndexVacuumInfo struct and expect the index AM to check it and update the parallel index vacuum progress, say, every 1GB of blocks processed. The flag is true only when the leader process is vacuuming an index.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
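A rough illustration of that idea follows; the interval constant and the ParallelVacuumState pointer field are invented for the sketch, not taken from any posted patch:

	/*
	 * In an index AM's block loop, e.g. in btvacuumscan(): report only
	 * in the leader, roughly once per 1GB of 8kB blocks.
	 */
	#define PROGRESS_REPORT_BLOCK_INTERVAL \
		((BlockNumber) (((uint64) 1024 * 1024 * 1024) / BLCKSZ))

	if (info->report_parallel_vacuum_progress &&
		(blkno % PROGRESS_REPORT_BLOCK_INTERVAL) == 0)
		parallel_vacuum_progress_report(info->parallel_vacuum_state);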
This entry has been waiting on author input for a while (our current threshold is roughly two weeks), so I've marked it Returned with Feedback. Once you think the patchset is ready for review again, you (or any interested party) can resurrect the patch entry by visiting https://commitfest.postgresql.org/38/3617/ and changing the status to "Needs Review", and then changing the status again to "Move to next CF". (Don't forget the second step; hopefully we will have streamlined this in the near future!) Thanks, --Jacob
> One idea would be to add a flag, say report_parallel_vacuum_progress,
> to the IndexVacuumInfo struct and expect the index AM to check it and
> update the parallel index vacuum progress, say, every 1GB of blocks
> processed. The flag is true only when the leader process is vacuuming
> an index.

Sorry for the long delay on this. I have taken the approach as suggested by Sawada-san and Robert, and attached is v12.

1. The patch introduces a new counter in the same shared memory already used by the parallel leader and workers to keep track of the number of indexes completed. This way there is no reason to loop through the index status every time we want to get the status of indexes completed.

2. A new function in vacuumparallel.c will be used to update the progress of indexes completed by reading from the counter created in point #1.

3. The function is called during vacuum_delay_point as a matter of convenience, since this is called in all major vacuum loops. The function will only do something if the caller sets a boolean to report progress. Doing so will also ensure progress is being reported in case the parallel workers complete before the leader.

4. Rather than adding any complexity to WaitForParallelWorkersToFinish and introducing a new callback, vacuumparallel.c will wait until the number of vacuum workers is 0 and then call WaitForParallelWorkersToFinish as it does currently.

5. Went back to the idea of adding a new view called pg_stat_progress_vacuum_index, which is accomplished by adding a new type called VACUUM_PARALLEL in progress.h.

Thanks,

Sami Imseih
Amazon Web Services (AWS)

FYI: the above message was originally sent yesterday but was created under a separate thread. Please ignore that thread[1].

[1]: https://www.postgresql.org/message-id/flat/4CD97E17-B9E4-421E-9A53-4317C90EFF35%40amazon.com
Attachment
On Tue, Oct 11, 2022 at 10:50 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Sorry for the long delay on this. I have taken the approach as suggested
> by Sawada-san and Robert, and attached is v12.

Thank you for updating the patch!

> 1. The patch introduces a new counter in the same shared memory already
> used by the parallel leader and workers to keep track of the number
> of indexes completed. This way there is no reason to loop through
> the index status every time we want to get the status of indexes completed.

While it seems to be a good idea to have the atomic counter for the number of indexes completed, I think we should not use the global variable referencing the counter as follows:

+static pg_atomic_uint32 *index_vacuum_completed = NULL;
:
+void
+parallel_vacuum_progress_report(void)
+{
+	if (IsParallelWorker() || !report_parallel_vacuum_progress)
+		return;
+
+	pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
+								 pg_atomic_read_u32(index_vacuum_completed));
+}

I think we can pass ParallelVacuumState (or PVIndStats) to the reporting function so that it can check the counter and report the progress.

> 2. A new function in vacuumparallel.c will be used to update
> the progress of indexes completed by reading from the
> counter created in point #1.
>
> 3. The function is called during vacuum_delay_point as a
> matter of convenience, since this is called in all major vacuum
> loops. The function will only do something if the caller
> sets a boolean to report progress. Doing so will also ensure
> progress is being reported in case the parallel workers complete
> before the leader.

Robert pointed out:

---
I am not too sure that the idea of using the vacuum delay points is the best plan. I think we should try to avoid piggybacking on such general infrastructure if we can, and instead look for a way to tie this to something that is specific to parallel vacuum.
---

I agree with this part. Instead, I think we can add a boolean and the pointer for ParallelVacuumState to IndexVacuumInfo. If the boolean is true, an index AM can call the reporting function with the pointer to ParallelVacuumState while scanning index blocks, for example, for every 1GB. The boolean can be true only for the leader process.

> 4. Rather than adding any complexity to WaitForParallelWorkersToFinish
> and introducing a new callback, vacuumparallel.c will wait until
> the number of vacuum workers is 0 and then call
> WaitForParallelWorkersToFinish as it does currently.

Agreed, but with the following change, the leader process waits in a busy loop, which should not be acceptable:

+	if (VacuumActiveNWorkers)
+	{
+		while (pg_atomic_read_u32(VacuumActiveNWorkers) > 0)
+		{
+			parallel_vacuum_progress_report();
+		}
+	}
+

Also, I think it's better to check whether idx_completed_progress equals the number of indexes instead.

> 5. Went back to the idea of adding a new view called pg_stat_progress_vacuum_index,
> which is accomplished by adding a new type called VACUUM_PARALLEL in progress.h.

Probably, we can divide the patch into two patches:
One for adding the new statistic of the number of vacuumed indexes to pg_stat_progress_vacuum, and another one for adding the new statistics view pg_stat_progress_vacuum_index.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
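Concretely, dropping the file-level variable amounts to something like this sketch, derived from the v12 code quoted above:

	/*
	 * Reporting function taking ParallelVacuumState explicitly; only
	 * the leader reports, using the shared atomic counter.
	 */
	void
	parallel_vacuum_progress_report(ParallelVacuumState *pvs)
	{
		if (IsParallelWorker())
			return;

		pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
			pg_atomic_read_u32(&(pvs->shared->idx_completed_progress)));
	}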
Thank you for the feedback!

> While it seems to be a good idea to have the atomic counter for the
> number of indexes completed, I think we should not use the global
> variable referencing the counter as follows:
>
> I think we can pass ParallelVacuumState (or PVIndStats) to the
> reporting function so that it can check the counter and report the
> progress.

Yes, that makes sense.

> ---
> I am not too sure that the idea of using the vacuum delay points is the best
> plan. I think we should try to avoid piggybacking on such general
> infrastructure if we can, and instead look for a way to tie this to
> something that is specific to parallel vacuum.
> ---

Adding the call to vacuum_delay_point made sense since it's called in all major vacuum scans. While it is also called by analyze, it will only do anything if the caller sets the report_parallel_vacuum_progress flag.

> Instead, I think we can add a boolean and the pointer for
> ParallelVacuumState to IndexVacuumInfo. If the boolean is true, an
> index AM can call the reporting function with the pointer to
> ParallelVacuumState while scanning index blocks, for example, for
> every 1GB. The boolean can be true only for the leader process.

Will doing this every 1GB be necessary, considering the function will not do much more than update progress from the value stored in index_vacuum_completed?

> Agreed, but with the following change, the leader process waits in a
> busy loop, which should not be acceptable:

Good point.

> Also, I think it's better to check whether idx_completed_progress equals the number of indexes instead.

Good point.

> Probably, we can divide the patch into two patches.

Makes sense.

Thanks,

Sami Imseih
Amazon Web Services (AWS)
Attached is v13-0001--Show-progress-for-index-vacuums.patch, which addresses the latest comments. The main changes are:

1/ Call parallel_vacuum_progress_report inside the AMs rather than vacuum_delay_point.

2/ A boolean, when set to true in vacuumparallel.c, will be used to determine if calling parallel_vacuum_progress_report is necessary.

3/ Removed the global variable from vacuumparallel.c.

4/ Went back to calling parallel_vacuum_progress_report inside WaitForParallelWorkersToFinish to cover the case when a leader is waiting for parallel workers to finish.

5/ I did not see a need to only report progress after 1GB, as it's a fairly cheap call to update progress.

6/ v1-0001-Function-to-return-currently-vacuumed-or-cleaned-ind.patch is a separate patch for exposing the index relid being vacuumed by a backend.

Thanks,

Sami Imseih
Amazon Web Services (AWS)
Attachment
On Thu, Nov 3, 2022 at 1:52 Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Attached is v13-0001--Show-progress-for-index-vacuums.patch, which addresses
> the latest comments. The main changes are:
>
> 1/ Call parallel_vacuum_progress_report inside the AMs rather than vacuum_delay_point.
>
> 2/ A boolean, when set to true in vacuumparallel.c, will be used to determine if calling
> parallel_vacuum_progress_report is necessary.
>
> 3/ Removed the global variable from vacuumparallel.c.
>
> 4/ Went back to calling parallel_vacuum_progress_report inside
> WaitForParallelWorkersToFinish to cover the case when a
> leader is waiting for parallel workers to finish.
>
> 5/ I did not see a need to only report progress after 1GB, as it's a fairly cheap call to update
> progress.
>
> 6/ v1-0001-Function-to-return-currently-vacuumed-or-cleaned-ind.patch is a separate patch
> for exposing the index relid being vacuumed by a backend.

This entry was marked "Needs review" in the CommitFest app, but cfbot reports the patch [1] no longer applies.

[1] this patch: v1-0001-Function-to-return-currently-vacuumed-or-cleaned-ind.patch

We've marked it as "Waiting on Author". As CommitFest 2022-11 is currently underway, this would be an excellent time to update the patch.

Once you think the patchset is ready for review again, you (or any interested party) can move the patch entry forward by visiting https://commitfest.postgresql.org/40/3617/ and changing the status to "Needs review".

Thanks

Ian Barwick
Resubmitting patches with proper format. Thanks Sami Imseih Amazon Web Services (AWS)
Attachment
On Thu, Nov 3, 2022 at 1:52 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Attached is v13-0001--Show-progress-for-index-vacuums.patch, which addresses
> the latest comments.

Thank you for updating the patch!

> 4/ Went back to calling parallel_vacuum_progress_report inside
> WaitForParallelWorkersToFinish to cover the case when a
> leader is waiting for parallel workers to finish.

I don't think we need to modify WaitForParallelWorkersToFinish to cover that case. Instead, I think the leader process can execute a new function. The function will be like WaitForParallelWorkersToFinish() but simpler; it just updates the progress information if necessary and then checks if idx_completed_progress is equal to the number of indexes to vacuum. If yes, return from the function and call WaitForParallelWorkersToFinish() to wait for all workers to finish. Otherwise, it naps by using WaitLatch() and does this loop again.

---
 typedef struct ParallelWorkerInfo
@@ -46,6 +46,8 @@ typedef struct ParallelContext
 	ParallelWorkerInfo *worker;
 	int			nknown_attached_workers;
 	bool	   *known_attached_workers;
+	void		(*parallel_progress_callback)(void *arg);
+	void	   *parallel_progress_arg;
 } ParallelContext;

With the above change I suggested, I think we won't need to have a callback function in ParallelContext. Instead, I think we can have index AMs call parallel_vacuum_report() if report_parallel_progress is true.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,

On 2022-11-04 13:27:34 +0000, Imseih (AWS), Sami wrote:
> diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
> index b4fa5f6bf8..3d5e4600dc 100644
> --- a/src/backend/access/gin/ginvacuum.c
> +++ b/src/backend/access/gin/ginvacuum.c
> @@ -633,6 +633,9 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
>              UnlockReleaseBuffer(buffer);
>              buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
>                                          RBM_NORMAL, info->strategy);
> +
> +            if (info->report_parallel_progress)
> +                info->parallel_progress_callback(info->parallel_progress_arg);
>          }
>
>          /* right now we found leftmost page in entry's BTree */

I don't think any of these progress callbacks should be done while pinning a buffer and ...

> @@ -677,6 +680,9 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
>          buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
>                                      RBM_NORMAL, info->strategy);
>          LockBuffer(buffer, GIN_EXCLUSIVE);
> +
> +        if (info->report_parallel_progress)
> +            info->parallel_progress_callback(info->parallel_progress_arg);
>      }
>
>      MemoryContextDelete(gvs.tmpCxt);

... definitely not while holding a buffer lock.

I also don't understand why info->parallel_progress_callback exists? It's only set to parallel_vacuum_progress_report(). Why make this stuff more expensive than it has to already be?

> +parallel_vacuum_progress_report(void *arg)
> +{
> +    ParallelVacuumState *pvs = (ParallelVacuumState *) arg;
> +
> +    if (IsParallelWorker())
> +        return;
> +
> +    pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
> +                                 pg_atomic_read_u32(&(pvs->shared->idx_completed_progress)));
> +}

So each of the places that call this needs to make an additional external function call for each page, just to find that there's nothing to do or to make yet another indirect function call. This should probably be a static inline function.

This is called, for every single page, just to read the number of indexes completed by workers? A number that barely ever changes? This seems all wrong.

Greetings,

Andres Freund
> I don't think any of these progress callbacks should be done while pinning a
> buffer and ...

Good point.

> I also don't understand why info->parallel_progress_callback exists? It's only
> set to parallel_vacuum_progress_report(). Why make this stuff more expensive
> than it has to already be?

Agree. Modified the patch to avoid the callback.

> So each of the places that call this needs to make an additional external
> function call for each page, just to find that there's nothing to do or to
> make yet another indirect function call. This should probably be a static inline
> function.

Even better to just remove a function call altogether.

> This is called, for every single page, just to read the number of indexes
> completed by workers? A number that barely ever changes?

I will take the initial suggestion by Sawada-san to update the progress every 1GB of blocks scanned, as sketched after this message.

Also, it seems to me that we don't need to track progress in brin indexes, since it is rare, if ever, that this type of index goes through very heavy block scans. In fact, I noticed the vacuum AMs for brin don't invoke vacuum_delay_point at all.

The attached patch addresses the feedback.

Regards,

Sami Imseih
Amazon Web Services (AWS)
Attachment
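[For illustration, the "every 1GB" idea amounts to a cheap modulo test in an AM's block-scan loop, roughly as below. This is a sketch, not the patch itself: the macro value follows the posted patches, while the helper name (parallel_vacuum_update_progress) and surrounding variables follow later revisions discussed downthread.]

/*
 * Sketch: refresh the progress counter at most once per ~1GB of index
 * blocks scanned, so the per-page overhead stays negligible.
 */
#define REPORT_PARALLEL_VACUUM_EVERY_PAGES \
    ((BlockNumber) (((uint64) 1024 * 1024 * 1024) / BLCKSZ))

    if (info->report_parallel_progress &&
        (blkno % REPORT_PARALLEL_VACUUM_EVERY_PAGES) == 0)
        parallel_vacuum_update_progress();  /* refresh pg_stat_progress_vacuum */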
On Sat, Nov 12, 2022 at 4:10 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> > I don't think any of these progress callbacks should be done while pinning a
> > buffer and ...
>
> Good point.
>
> > I also don't understand why info->parallel_progress_callback exists? It's only
> > set to parallel_vacuum_progress_report(). Why make this stuff more expensive
> > than it has to already be?
>
> Agree. Modified the patch to avoid the callback.
>
> > So each of the places that call this needs to make an additional external
> > function call for each page, just to find that there's nothing to do or to
> > make yet another indirect function call. This should probably be a static inline
> > function.
>
> Even better to just remove a function call altogether.
>
> > This is called, for every single page, just to read the number of indexes
> > completed by workers? A number that barely ever changes?
>
> I will take the initial suggestion by Sawada-san to update the progress
> every 1GB of blocks scanned.
>
> Also, it seems to me that we don't need to track progress in brin indexes,
> since it is rare, if ever, that this type of index goes through very heavy
> block scans. In fact, I noticed the vacuum AMs for brin don't invoke
> vacuum_delay_point at all.
>
> The attached patch addresses the feedback.
>

Thank you for updating the patch! Here are review comments on the v15 patch:

+     <para>
+      Number of indexes that wil be vacuumed. This value will be
+      <literal>0</literal> if there are no indexes to vacuum or
+      vacuum failsafe is triggered.

I think that indexes_total should be 0 also when INDEX_CLEANUP is off.

---
+            /*
+             * Reset the indexes completed at this point.
+             * If we end up in another index vacuum cycle, we will
+             * start counting from the start.
+             */
+            pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED, 0);

I think we don't need to reset it at the end of index vacuuming. There is a small window before switching to the next phase. If we reset this value while showing the "index vacuuming" phase, the user might get confused. Instead, we can reset it at the beginning of the index vacuuming.

---
+void
+parallel_wait_for_workers_to_finish(ParallelVacuumState *pvs)
+{
+    while (pg_atomic_read_u32(&(pvs->shared->idx_completed_progress)) < pvs->nindexes)
+    {
+        pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
+                                     pg_atomic_read_u32(&(pvs->shared->idx_completed_progress)));
+
+        (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, -1,
+                         WAIT_EVENT_PARALLEL_FINISH);
+        ResetLatch(MyLatch);
+    }
+}

We should add CHECK_FOR_INTERRUPTS() at the beginning of the loop to make the wait interruptible.

I think it would be better to update the counter only when the value has been increased.

I think we should set a timeout, say 1 sec, to WaitLatch so that it can periodically check the progress.

Probably it's better to have a new wait event for this wait, in order to distinguish the wait for workers to complete index vacuum from the wait for workers to exit.

---
@@ -838,7 +867,12 @@ parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
     ivinfo.estimated_count = pvs->shared->estimated_count;
     ivinfo.num_heap_tuples = pvs->shared->reltuples;
     ivinfo.strategy = pvs->bstrategy;
-
+    ivinfo.idx_completed_progress = pvs->shared->idx_completed_progress;

and

@@ -998,6 +998,9 @@ btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
             if (info->report_progress)
                 pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
                                              scanblkno);
+            if (info->report_parallel_progress && (scanblkno % REPORT_PARALLEL_VACUUM_EVERY_PAGES) == 0)
+                pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
+                                             pg_atomic_read_u32(&(info->idx_completed_progress)));
         }

I think this doesn't work, since ivinfo.idx_completed_progress is in backend-local memory. Instead, I think we can have a function in vacuumparallel.c that updates the progress. Then we can have index AMs call this function.

---
+    if (!IsParallelWorker())
+        ivinfo.report_parallel_progress = true;
+    else
+        ivinfo.report_parallel_progress = false;

We can do it like:

ivinfo.report_parallel_progress = !IsParallelWorker();

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
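[Putting those review comments together, the leader-side wait loop might end up looking roughly like the sketch below. The interrupt check, the update-only-on-change logic, the 1-second timeout, and the dedicated wait event (named WAIT_EVENT_PARALLEL_VACUUM_FINISH in a later revision) are the reviewer's suggestions rather than code from the posted v15 patch.]

void
parallel_wait_for_workers_to_finish(ParallelVacuumState *pvs)
{
    uint32      last_reported = 0;

    while (pg_atomic_read_u32(&(pvs->shared->idx_completed_progress)) < pvs->nindexes)
    {
        uint32      completed;

        /* keep the wait interruptible */
        CHECK_FOR_INTERRUPTS();

        /* only touch the progress row when the counter has actually moved */
        completed = pg_atomic_read_u32(&(pvs->shared->idx_completed_progress));
        if (completed > last_reported)
        {
            pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
                                         completed);
            last_reported = completed;
        }

        /* nap for up to 1s; a dedicated wait event identifies this wait */
        (void) WaitLatch(MyLatch, WL_TIMEOUT | WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                         1000L, WAIT_EVENT_PARALLEL_VACUUM_FINISH);
        ResetLatch(MyLatch);
    }
}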
> I think that indexes_total should be 0 also when INDEX_CLEANUP is off.

Patch updated for handling of INDEX_CLEANUP = off, with an update to the documentation as well.

> I think we don't need to reset it at the end of index vacuuming. There
> is a small window before switching to the next phase. If we reset this
> value while showing the "index vacuuming" phase, the user might get
> confused. Instead, we can reset it at the beginning of the index
> vacuuming.

No, I think the way it's currently done is correct. We want to reset the number of indexes completed before we increase num_index_scans (the index vacuum cycle). This ensures that when we enter a new index cycle, the number of indexes completed is already reset. The two fields that matter here show how many indexes are vacuumed in the current cycle, and this is called out in the documentation as such.

> We should add CHECK_FOR_INTERRUPTS() at the beginning of the loop to
> make the wait interruptible.

Done.

> I think it would be better to update the counter only when the value
> has been increased.

Done. Did so by checking the progress value from the beentry directly in the function. We can do a more generalized

> I think we should set a timeout, say 1 sec, to WaitLatch so that it
> can periodically check the progress.

Done.

> Probably it's better to have a new wait event for this wait, in order
> to distinguish the wait for workers to complete index vacuum from the wait
> for workers to exit.

I was trying to avoid introducing a new wait event, but thinking about it, I agree with your suggestion. I created a new ParallelVacuumFinish wait event and documentation for the wait event.

> I think this doesn't work, since ivinfo.idx_completed_progress is in
> backend-local memory. Instead, I think we can have a function in
> vacuumparallel.c that updates the progress. Then we can have index AMs
> call this function.

Yeah, you're absolutely correct. To make this work correctly, I did something similar to VacuumActiveNWorkers and introduced an extern variable called ParallelVacuumProgress. This variable points to pvs->shared->idx_completed_progress. The index AMs then call parallel_vacuum_update_progress, which is responsible for updating the progress with the current value of ParallelVacuumProgress. ParallelVacuumProgress is also set to NULL at the end of every index cycle.

> ivinfo.report_parallel_progress = !IsParallelWorker();

Done

Regards,

Sami Imseih
Amazon Web Services (AWS)
Attachment
Hi,

Thank you for updating the patch!

On Tue, Nov 29, 2022 at 8:57 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> > I think that indexes_total should be 0 also when INDEX_CLEANUP is off.
>
> Patch updated for handling of INDEX_CLEANUP = off, with an update to
> the documentation as well.
>
> > I think we don't need to reset it at the end of index vacuuming. There
> > is a small window before switching to the next phase. If we reset this
> > value while showing the "index vacuuming" phase, the user might get
> > confused. Instead, we can reset it at the beginning of the index
> > vacuuming.
>
> No, I think the way it's currently done is correct. We want to reset the number
> of indexes completed before we increase num_index_scans (the index vacuum cycle).
> This ensures that when we enter a new index cycle, the number of indexes completed
> is already reset. The two fields that matter here show how many indexes are vacuumed
> in the current cycle, and this is called out in the documentation as such.
>

Agreed.

Here are comments on the v16 patch.

--- a/contrib/bloom/blvacuum.c
+++ b/contrib/bloom/blvacuum.c
@@ -15,12 +15,14 @@
 #include "access/genam.h"
 #include "bloom.h"
 #include "catalog/storage.h"
+#include "commands/progress.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
 #include "storage/indexfsm.h"
 #include "storage/lmgr.h"
+#include "utils/backend_progress.h"

I think we don't need to include them here. Probably the same is true for other index AMs.

---
         vacuum_delay_point();

+        if (info->report_parallel_progress && (blkno % REPORT_PARALLEL_VACUUM_EVERY_PAGES) == 0)
+            parallel_vacuum_update_progress();
+
+
         buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,

There is an extra new line.

---
+     <row>
+      <entry><literal>ParallelVacuumFinish</literal></entry>
+      <entry>Waiting for parallel vacuum workers to finish computing.</entry>
+     </row>

How about "Waiting for parallel vacuum workers to finish index vacuum"?

---
     vacrel->rel = rel;
     vac_open_indexes(vacrel->rel, RowExclusiveLock, &vacrel->nindexes,
                      &vacrel->indrels);
+
     if (instrument && vacrel->nindexes > 0)
     {
         /* Copy index names used by instrumentation (not error reporting) */

This added line is not necessary.

---
     /* Counter for vacuuming and cleanup */
     pg_atomic_uint32 idx;
+
+    /*
+     * Counter for vacuuming and cleanup progress reporting.
+     * This value is used to report index vacuum/cleanup progress
+     * in parallel_vacuum_progress_report. We keep this
+     * counter to avoid having to loop through
+     * ParallelVacuumState->indstats to determine the number
+     * of indexes completed.
+     */
+    pg_atomic_uint32 idx_completed_progress;

I think the name of idx_completed_progress is very confusing, since the idx of PVShared refers to the current index in pvs->indstats[] whereas idx_completed_progress is the number of vacuumed indexes. How about "nindexes_completed"?

---
+    /*
+     * Set the shared parallel vacuum progress. This will be used
+     * to periodically update progress.h with completed indexes
+     * in a parallel vacuum. See comments in parallel_vacuum_update_progress
+     * for more details.
+     */
+    ParallelVacuumProgress = &(pvs->shared->idx_completed_progress);
+

Since we pass pvs to parallel_wait_for_workers_to_finish(), we don't need to have ParallelVacuumProgress. I see parallel_vacuum_update_progress() uses this value, but I think it's better to pass ParallelVacuumState via IndexVacuumInfo.

---
+    /*
+     * To wait for parallel workers to finish,
+     * first call parallel_wait_for_workers_to_finish
+     * which is responsible for reporting the
+     * number of indexes completed.
+     *
+     * Afterwards, WaitForParallelWorkersToFinish is called
+     * to do the real work of waiting for parallel workers
+     * to finish.
+     *
+     * Note: Both routines will acquire a WaitLatch in their
+     * respective loops.
+     */

How about something like: "Wait for all indexes to be vacuumed while updating the parallel vacuum index progress, and then wait for all workers to finish."

---
                 RelationGetRelationName(indrel));
     }

+    if (ivinfo.report_parallel_progress)
+        parallel_vacuum_update_progress();
+

I think it's better to update the progress info after updating pvs->shared->idx_completed_progress.

---
+/*
+ * Check if we are done vacuuming indexes and report
+ * progress.

How about "Waiting for all indexes to be vacuumed while updating the parallel index vacuum progress"?

+ *
+ * We nap using with a WaitLatch to avoid a busy loop.
+ *
+ * Note: This function should be used by the leader process only,
+ * and it's up to the caller to ensure this.
+ */

I think these comments are not necessary.

+void
+parallel_wait_for_workers_to_finish(ParallelVacuumState *pvs)

How about "parallel_vacuum_wait_to_finish"?

---
+/*
+ * Read the shared ParallelVacuumProgress and update progress.h
+ * with indexes vacuumed so far. This function is called periodically
+ * by index AMs as well as parallel_vacuum_process_one_index.
+ *
+ * To avoid unnecessarily updating progress, we check the progress
+ * values from the backend entry and only update if the value
+ * of completed indexes increases.
+ *
+ * Note: This function should be used by the leader process only,
+ * and it's up to the caller to ensure this.
+ */
+void
+parallel_vacuum_update_progress(void)
+{
+    volatile PgBackendStatus *beentry = MyBEEntry;
+
+    Assert(!IsParallelWorker);
+
+    if (beentry && ParallelVacuumProgress)
+    {
+        int parallel_vacuum_current_value = beentry->st_progress_param[PROGRESS_VACUUM_INDEX_COMPLETED];
+        int parallel_vacuum_new_value = pg_atomic_read_u32(ParallelVacuumProgress);
+
+        if (parallel_vacuum_new_value > parallel_vacuum_current_value)
+            pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED, parallel_vacuum_new_value);
+    }
+}

parallel_vacuum_update_progress() is typically called every 1GB, so I think we don't need to worry about unnecessary updates. Also, I think this code doesn't work when pgstat_track_activities is false. Instead, I think that in parallel_wait_for_workers_to_finish(), we can check the value of pvs->nindexes_completed and update the progress if there is an update or it's the first time.

---
+        (void) WaitLatch(MyLatch, WL_TIMEOUT | WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, PARALLEL_VACUUM_PROGRESS_TIMEOUT,
+                         WAIT_EVENT_PARALLEL_VACUUM_FINISH);
+        ResetLatch(MyLatch);

I think we don't necessarily need to use PARALLEL_VACUUM_PROGRESS_TIMEOUT here. Probably we can use 1000L instead. If we want to use PARALLEL_VACUUM_PROGRESS_TIMEOUT, we need a comment for it:

+#define PARALLEL_VACUUM_PROGRESS_TIMEOUT 1000

---
-    WAIT_EVENT_XACT_GROUP_UPDATE
+    WAIT_EVENT_XACT_GROUP_UPDATE,
+    WAIT_EVENT_PARALLEL_VACUUM_FINISH
 } WaitEventIPC;

Enums of WaitEventIPC should be defined in alphabetical order.

---
cfbot fails.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Thanks for the feedback. I agree with the feedback, except for

> need to have ParallelVacuumProgress. I see
> parallel_vacuum_update_progress() uses this value, but I think it's
> better to pass ParallelVacuumState via IndexVacuumInfo.

I was trying to avoid passing a pointer to ParallelVacuumState in IndexVacuumInfo.

ParallelVacuumProgress is implemented in the same way as VacuumSharedCostBalance and VacuumActiveNWorkers. See vacuum.h.

These values are set at the start of a parallel vacuum cycle and reset at the end of an index vacuum cycle.

This seems like a better approach and less invasive. What would be a reason not to go with this approach?

> parallel_vacuum_update_progress() is typically called every 1GB, so I
> think we don't need to worry about unnecessary updates. Also, I think
> this code doesn't work when pgstat_track_activities is false. Instead,
> I think that in parallel_wait_for_workers_to_finish(), we can check
> the value of pvs->nindexes_completed and update the progress if there
> is an update or it's the first time.

I agree that we don't need to worry about unnecessary updates in parallel_vacuum_update_progress, since we are calling it every 1GB. I also don't think we should do anything additional in parallel_wait_for_workers_to_finish, since there we are only updating every 1 second.

Thanks,

Sami Imseih
Amazon Web Services
On Tue, Dec 13, 2022 at 1:40 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Thanks for the feedback. I agree with the feedback, except for
>
> > need to have ParallelVacuumProgress. I see
> > parallel_vacuum_update_progress() uses this value, but I think it's
> > better to pass ParallelVacuumState via IndexVacuumInfo.
>
> I was trying to avoid passing a pointer to
> ParallelVacuumState in IndexVacuumInfo.
>
> ParallelVacuumProgress is implemented in the same
> way as VacuumSharedCostBalance and
> VacuumActiveNWorkers. See vacuum.h.
>
> These values are set at the start of a parallel vacuum cycle
> and reset at the end of an index vacuum cycle.
>
> This seems like a better approach and less invasive.
> What would be a reason not to go with this approach?

First of all, I don't think we need to declare ParallelVacuumProgress in vacuum.c since it's set and used only in vacuumparallel.c. But I don't even think it's a good idea to declare it in vacuumparallel.c as a static variable. The primary reason is that it adds things we need to care about. For example, what if we raise an error during index vacuum? The transaction aborts but ParallelVacuumProgress still refers to something old. Suppose further that the next parallel vacuum doesn't launch any workers; the leader process would still end up accessing the old value pointed to by ParallelVacuumProgress, which causes a SEGV. So we need to reset it anyway at the beginning of the parallel vacuum. It's easy to fix at this time, but once the parallel vacuum code gets more complex, it could be forgotten.

IMO VacuumSharedCostBalance and VacuumActiveNWorkers have a different story. They are set in vacuumparallel.c and are used in vacuum.c for vacuum delay. If they weren't global variables, we would need to pass them to every function that could eventually call the vacuum delay function. So it makes sense to me to have them as global variables. On the other hand, for ParallelVacuumProgress, it's a common pattern that ambulkdelete(), amvacuumcleanup() or a common index scan routine like btvacuumscan() checks the progress. I don't think an index AM needs to pass the value down to many of its functions. So it makes sense to me to pass it via IndexVacuumInfo.

Having said that, I'd like to hear opinions also from other hackers; I might be wrong, and it might be more invasive as you pointed out.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
> First of all, I don't think we need to declare ParallelVacuumProgress
> in vacuum.c since it's set and used only in vacuumparallel.c. But I
> don't even think it's a good idea to declare it in vacuumparallel.c as
> a static variable. The primary reason is that it adds things we need
> to care about. For example, what if we raise an error during index
> vacuum? The transaction aborts but ParallelVacuumProgress still refers
> to something old. Suppose further that the next parallel vacuum
> doesn't launch any workers; the leader process would still end up
> accessing the old value pointed to by ParallelVacuumProgress, which
> causes a SEGV. So we need to reset it anyway at the beginning of the
> parallel vacuum. It's easy to fix at this time, but once the parallel
> vacuum code gets more complex, it could be forgotten.

> IMO VacuumSharedCostBalance and VacuumActiveNWorkers have a different
> story. They are set in vacuumparallel.c and are used in vacuum.c for
> vacuum delay. If they weren't global variables, we would need to pass
> them to every function that could eventually call the vacuum delay
> function. So it makes sense to me to have them as global variables. On
> the other hand, for ParallelVacuumProgress, it's a common pattern that
> ambulkdelete(), amvacuumcleanup() or a common index scan routine like
> btvacuumscan() checks the progress. I don't think an index AM needs to
> pass the value down to many of its functions. So it makes sense to me
> to pass it via IndexVacuumInfo.

Thanks for the detailed explanation, and especially for clearing up my understanding of VacuumSharedCostBalance and VacuumActiveNWorkers.

I now think that passing ParallelVacuumState in IndexVacuumInfo is the better choice.

Attached version addresses the above and the previous comments.

Thanks

Sami Imseih
Amazon Web Services (AWS)
Attachment
On Wed, Dec 14, 2022 at 05:09:46AM +0000, Imseih (AWS), Sami wrote:
> Attached version addresses the above and the previous comments.

cfbot is complaining that this patch no longer applies. Sami, would you mind rebasing it?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
> cfbot is complaining that this patch no longer applies. Sami, would you
> mind rebasing it?

Rebased patch attached.

--
Sami Imseih
Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, 2 Jan 2023 at 10:04, Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> > cfbot is complaining that this patch no longer applies. Sami, would you
> > mind rebasing it?
>
> Rebased patch attached.

CFBot shows some compilation errors as in [1]; please post an updated version for the same:

[07:01:58.889] In file included from ../../../src/include/postgres.h:47,
[07:01:58.889] from vacuumparallel.c:27:
[07:01:58.889] vacuumparallel.c: In function ‘parallel_vacuum_update_progress’:
[07:01:58.889] vacuumparallel.c:1118:10: error: ‘IsParallelWorker’ undeclared (first use in this function); did you mean ‘ParallelWorkerMain’?
[07:01:58.889] 1118 | Assert(!IsParallelWorker);
[07:01:58.889] | ^~~~~~~~~~~~~~~~
[07:01:58.889] ../../../src/include/c.h:859:9: note: in definition of macro ‘Assert’
[07:01:58.889] 859 | if (!(condition)) \
[07:01:58.889] | ^~~~~~~~~
[07:01:58.889] vacuumparallel.c:1118:10: note: each undeclared identifier is reported only once for each function it appears in
[07:01:58.889] 1118 | Assert(!IsParallelWorker);
[07:01:58.889] | ^~~~~~~~~~~~~~~~

[1] - https://cirrus-ci.com/task/4557389261701120

Regards,
Vignesh
> cirrus-ci.com/task/4557389261701120

I had earlier compiled without --enable-cassert, which is why the compilation errors did not show up in my build.

Fixed in v19.

Thanks

--
Sami Imseih
Amazon Web Services: https://aws.amazon.com
Attachment
Hi,

On 1/3/23 5:46 PM, Imseih (AWS), Sami wrote:
>> cirrus-ci.com/task/4557389261701120
>
> I had earlier compiled without --enable-cassert, which is why the
> compilation errors did not show up in my build.
>
> Fixed in v19.
>
> Thanks
>

Thanks for the updated patch! Some comments about it:

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>indexes_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes that wil be vacuumed. This value will be

Typo: wil

+    /* report number of indexes to vacuum, if we are told to cleanup indexes */
+    if (vacrel->do_index_cleanup)
+        pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_TOTAL, vacrel->nindexes);

"Report" instead? (to look like the surrounding code)

+            /*
+             * Done vacuuming an index. Increment the indexes completed
+             */
+            pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
+                                         idx + 1);

"Increment the indexes completed." (dot at the end) instead?

-     * Increase and report the number of index scans.
+     * Reset and report the number of indexes scanned.
+     * Also, increase and report the number of index
+     * scans.

What about "Reset and report zero as the number of indexes scanned."? (just to make clear we don't want to report the value as it was prior to the reset)

-        /* Disable index vacuuming, index cleanup, and heap rel truncation */
+        /*
+         * Disable index vacuuming, index cleanup, and heap rel truncation

The new "Disable index vacuuming, index cleanup, and heap rel truncation" needs a dot at the end.

+         * Also, report to progress.h that we are no longer tracking
+         * index vacuum/cleanup.
+         */

"Also, report that we are" instead?

+            /*
+             * Done cleaning an index. Increment the indexes completed
+             */

Needs a dot at the end.

-    /* Reset the parallel index processing counter */
+    /* Reset the parallel index processing counter ( index progress counter also ) */

"Reset the parallel index processing and progress counters" instead?

+    /* Update the number of indexes completed. */
+    pg_atomic_add_fetch_u32(&(pvs->shared->nindexes_completed), 1);

Remove the dot at the end? (to look like the surrounding code)

+
+/*
+ * Read pvs->shared->nindexes_completed and update progress.h
+ * with indexes vacuumed so far. This function is called periodically

"Read pvs->shared->nindexes_completed and report the number of indexes vacuumed so far" instead?

+ * Note: This function should be used by the leader process only,

"called" instead of "used"?

         case WAIT_EVENT_XACT_GROUP_UPDATE:
             event_name = "XactGroupUpdate";
             break;
+        case WAIT_EVENT_PARALLEL_VACUUM_FINISH:
+            event_name = "ParallelVacuumFinish";
+            break;
             /* no default case, so that compiler will warn */

It seems to me that the case ordering should follow alphabetical order (that's how it is done currently without the patch).

+#define REPORT_PARALLEL_VACUUM_EVERY_PAGES ((BlockNumber) (((uint64) 1024 * 1024 * 1024) / BLCKSZ))

It seems to me that "#define REPORT_PARALLEL_VACUUM_EVERY_PAGES ((BlockNumber) (1024 * 1024 * 1024 / BLCKSZ))" would be fine too.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Thanks for the review!

Addressed the comments.

> "Increment the indexes completed." (dot at the end) instead?

Used the commenting format being used in other places in this file, with an inclusion of a double-dash, i.e.:

/* Wraparound emergency -- end current index scan */

> It seems to me that "#define REPORT_PARALLEL_VACUUM_EVERY_PAGES ((BlockNumber) (1024 * 1024 * 1024 / BLCKSZ))" would be fine too.

I kept this the same as it matches what we are doing in other places such as FAILSAFE_EVERY_PAGES.

v20 attached.

Regards,

--
Sami Imseih
Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Jan 5, 2023 at 4:24 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Thanks for the review!
>
> Addressed the comments.
>
> > "Increment the indexes completed." (dot at the end) instead?
>
> Used the commenting format being used in other places in this file, with
> an inclusion of a double-dash, i.e.:
> /* Wraparound emergency -- end current index scan */
>
> > It seems to me that "#define REPORT_PARALLEL_VACUUM_EVERY_PAGES ((BlockNumber) (1024 * 1024 * 1024 / BLCKSZ))" would be fine too.
>
> I kept this the same as it matches what we are doing in other places such
> as FAILSAFE_EVERY_PAGES.
>
> v20 attached.

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>indexes_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes that will be vacuumed. This value will be
+       <literal>0</literal> if there are no indexes to vacuum, <literal>INDEX_CLEANUP</literal>
+       is set to <literal>OFF</literal>, or vacuum failsafe is triggered.

Similar to the above three cases, vacuum can bypass index vacuuming if there are almost zero TIDs. Should we set indexes_total to 0 in this case too? If so, I think we can set both indexes_total and indexes_completed at the beginning of the index vacuuming/cleanup and reset them at the end. That is, these values are valid only in the index vacuum phase and the index cleanup phase. Otherwise, 0.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
> Similar to the above three cases, vacuum can bypass index vacuuming if
> there are almost zero TIDs. Should we set indexes_total to 0 in this
> case too? If so, I think we can set both indexes_total and
> indexes_completed at the beginning of the index vacuuming/cleanup and
> reset them at the end.

Unlike the other three cases, in which the vacuum and cleanup are totally skipped, a cleanup still occurs when the index vacuum is bypassed. From what I can tell, this is to allow for things like a gin pending list cleanup even if the index is not vacuumed. There could be other reasons as well.

    if (bypass)
    {
        /*
         * There are almost zero TIDs. Behave as if there were precisely
         * zero: bypass index vacuuming, but do index cleanup.
         *
         * We expect that the ongoing VACUUM operation will finish very
         * quickly, so there is no point in considering speeding up as a
         * failsafe against wraparound failure. (Index cleanup is expected to
         * finish very quickly in cases where there were no ambulkdelete()
         * calls.)
         */
        vacrel->do_index_vacuuming = false;
    }

So it seems like we should still report the total number of indexes as we are currently doing in the patch. With that said, the documentation should make this clearer.

For indexes_total, the description should be:

Number of indexes that will be vacuumed or cleaned up. This value will be <literal>0</literal> if there are no indexes to vacuum, <literal>INDEX_CLEANUP</literal> is set to <literal>OFF</literal>, or vacuum failsafe is triggered. See <xref linkend="guc-vacuum-failsafe-age"/>

For indexes_completed, it should be:

Number of indexes vacuumed in the current vacuum cycle when the phase is <literal>vacuuming indexes</literal>, or the number of indexes cleaned up in the <literal>cleaning up indexes</literal> phase.

Regards,

--
Sami Imseih
Amazon Web Services: https://aws.amazon.com
On Fri, Jan 6, 2023 at 12:07 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> > Similar to the above three cases, vacuum can bypass index vacuuming if
> > there are almost zero TIDs. Should we set indexes_total to 0 in this
> > case too? If so, I think we can set both indexes_total and
> > indexes_completed at the beginning of the index vacuuming/cleanup and
> > reset them at the end.
>
> Unlike the other three cases, in which the vacuum and cleanup are totally skipped,
> a cleanup still occurs when the index vacuum is bypassed. From what I can tell,
> this is to allow for things like a gin pending list cleanup even if the index
> is not vacuumed.

Right. But since we set indexes_total also at the beginning of index cleanup (i.e. lazy_cleanup_all_indexes()), can't we still show the valid value in this case?

My point is whether we should show indexes_total throughout the vacuum execution, even in phases where it is not relevant such as heap scanning/vacuum/truncation. For example, in the analyze progress report, we have child_tables_total and child_tables_done. child_tables_total is set before the actual work of sampling rows from child tables, but not at the beginning of the analyze.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
> My point is whether we should show indexes_total throughout the vacuum
> execution, even in phases where it is not relevant such as heap
> scanning/vacuum/truncation.

That is a good point. We should show indexes_total and indexes_completed only during the relevant phases.

v21 addresses this, along with a documentation fix. indexes_total and indexes_completed can only be a value > 0 while in the "vacuuming indexes" or "cleaning up indexes" phases of vacuum. indexes_total is set at the start of the index vacuum cycle or index cleanup, and set back to 0, along with indexes_completed, at the end of the index vacuum cycle and index cleanup.

Also, for the progress updates in vacuumlazy.c that should be done atomically, I made a change to use pgstat_progress_update_multi_param; see the sketch after this message.

Regards,

--
Sami Imseih
Amazon Web Services: https://aws.amazon.com
Attachment
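[In sketch form, the batched updates described above look roughly as follows. The macro names match those quoted from the patch downthread; progress_index/progress_val are illustrative local names, so treat this as an outline rather than the v21 code itself.]

/*
 * Sketch: enter the "vacuuming indexes" phase, publishing the phase and
 * the index total in a single atomic progress update.
 */
{
    const int   progress_index[] = {
        PROGRESS_VACUUM_PHASE,
        PROGRESS_VACUUM_INDEX_TOTAL
    };
    int64       progress_val[] = {
        PROGRESS_VACUUM_PHASE_VACUUM_INDEX,
        vacrel->nindexes
    };

    pgstat_progress_update_multi_param(2, progress_index, progress_val);
}

/*
 * Sketch: leave the cycle, zeroing both counters atomically so a reader
 * never sees a stale pairing of total and completed.
 */
{
    const int   progress_index[] = {
        PROGRESS_VACUUM_INDEX_TOTAL,
        PROGRESS_VACUUM_INDEX_COMPLETED
    };
    int64       progress_val[] = {0, 0};

    pgstat_progress_update_multi_param(2, progress_index, progress_val);
}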
Hi,

On 2023-01-07 01:59:40 +0000, Imseih (AWS), Sami wrote:
> --- a/src/backend/access/nbtree/nbtree.c
> +++ b/src/backend/access/nbtree/nbtree.c
> @@ -998,6 +998,8 @@ btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
>              if (info->report_progress)
>                  pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
>                                               scanblkno);
> +            if (info->report_parallel_progress && (scanblkno % REPORT_PARALLEL_VACUUM_EVERY_PAGES) == 0)
> +                parallel_vacuum_update_progress(info->parallel_vacuum_state);
>          }
>  }

I still think it's wrong to need multiple pgstat_progress_*() calls within one scan. If we really need it, it should be abstracted into a helper function that wraps all the vacuum progress stuff needed for an index.

> @@ -688,7 +703,13 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
>       */
>      if (nworkers > 0)
>      {
> -        /* Wait for all vacuum workers to finish */
> +        /*
> +         * Wait for all indexes to be vacuumed while
> +         * updating the parallel vacuum index progress,
> +         * and then wait for all workers to finish.
> +         */
> +        parallel_vacuum_wait_to_finish(pvs);
> +
>          WaitForParallelWorkersToFinish(pvs->pcxt);
>
>          for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)

I don't think it's a good idea to have two different routines that wait for workers to exit. And certainly not when one of them basically just polls at a regular interval, as parallel_vacuum_wait_to_finish() does.

I think the problem here is that you're basically trying to work around the lack of an asynchronous state update mechanism between leader and workers. The workaround is to add a lot of different places that poll whether there has been any progress. And you're not doing that by integrating with the existing machinery for interrupt processing (i.e. CHECK_FOR_INTERRUPTS()), but by developing a new mechanism.

I think your best bet would be to integrate with HandleParallelMessages().

Greetings,

Andres Freund
Thanks for the feedback and I apologize for the delay in response. > I think the problem here is that you're basically trying to work around the > lack of an asynchronous state update mechanism between leader and workers. The > workaround is to add a lot of different places that poll whether there has > been any progress. And you're not doing that by integrating with the existing > machinery for interrupt processing (i.e. CHECK_FOR_INTERRUPTS()), but by > developing a new mechanism. > I think your best bet would be to integrate with HandleParallelMessages(). You are correct. I have been trying to work around the async nature of parallel workers performing the index vacuum. As you have pointed out, integrating with HandleParallelMessages does appear to be the proper way. Doing so will also avoid having to do any progress updates in the index AMs. In the attached patch, the parallel workers send a new type of protocol message type to the leader called 'P' which signals the leader that it should handle a progress update. The leader then performs the progress update by invoking a callback set in the ParallelContext. This is done inside HandleParallelMessages. In the index vacuum case, the callback is parallel_vacuum_update_progress. The new message does not contain a payload, and it's merely used to signal the leader that it can invoke a progress update. Also, If the leader performs the index vacuum, it can call parallel_vacuum_update_progress directly inside vacuumparallel.c. Regards, Sami Imseih Amazon Web Services (AWS)
Attachment
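[As a sketch of the worker-side poke being described: the v22 patch appears to construct the message with pq_beginmessage()/pq_endmessage() (a later revision switches to pq_putmessage()); where exactly this lands in vacuumparallel.c is an assumption.]

/*
 * Sketch: after a worker finishes vacuuming one index, send a
 * payload-less 'P' message.  Inside a parallel worker, message output
 * is redirected into the worker's shm_mq error queue, which the leader
 * drains whenever CHECK_FOR_INTERRUPTS() runs HandleParallelMessages().
 */
StringInfoData msgbuf;

pq_beginmessage(&msgbuf, 'P');
pq_endmessage(&msgbuf);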
On Thu, Jan 12, 2023 at 11:02 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Thanks for the feedback, and I apologize for the delay in response.
>
> > I think the problem here is that you're basically trying to work around the
> > lack of an asynchronous state update mechanism between leader and workers. The
> > workaround is to add a lot of different places that poll whether there has
> > been any progress. And you're not doing that by integrating with the existing
> > machinery for interrupt processing (i.e. CHECK_FOR_INTERRUPTS()), but by
> > developing a new mechanism.
>
> > I think your best bet would be to integrate with HandleParallelMessages().
>
> You are correct. I have been trying to work around the async nature
> of parallel workers performing the index vacuum. As you have pointed out,
> integrating with HandleParallelMessages does appear to be the proper way.
> Doing so will also avoid having to do any progress updates in the index AMs.

Very interesting idea. I need to study the parallel query infrastructure more to consider the potential downsides of this idea, but it seems okay as far as I have researched so far.

> In the attached patch, the parallel workers send a new protocol message
> type to the leader, called 'P', which signals the leader that it should handle a
> progress update. The leader then performs the progress update by
> invoking a callback set in the ParallelContext. This is done inside HandleParallelMessages.
> In the index vacuum case, the callback is parallel_vacuum_update_progress.
>
> The new message does not contain a payload, and it's merely used to
> signal the leader that it can invoke a progress update.

Thank you for updating the patch. Here are some comments for the v22 patch:

---
+     <para>
+      Number of indexes that will be vacuumed or cleaned up. This value will be
+      <literal>0</literal> if the phase is not <literal>vacuuming indexes</literal>
+      or <literal>cleaning up indexes</literal>, <literal>INDEX_CLEANUP</literal>
+      is set to <literal>OFF</literal>, index vacuum is skipped due to very
+      few dead tuples in the table, or vacuum failsafe is triggered.

I think that if INDEX_CLEANUP is set to OFF or index vacuum is skipped due to failsafe mode, we enter neither the vacuum indexes phase nor the cleanup indexes phase. So probably we can say something like:

Number of indexes that will be vacuumed or cleaned up. This counter only advances when the phase is vacuuming indexes or cleaning up indexes.

---
-    /* Report that we are now vacuuming indexes */
-    pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
-                                 PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
+    /*
+     * Report that we are now vacuuming indexes
+     * and the number of indexes to vacuum.
+     */
+    progress_start_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_INDEX;
+    progress_start_val[1] = vacrel->nindexes;
+    pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);

According to our code style guideline[1], we limit line lengths so that the code is readable in an 80-column window. Some comments updated in this patch seem too short.

---
+    StringInfoData msgbuf;
+
+    pq_beginmessage(&msgbuf, 'P');
+    pq_endmessage(&msgbuf);

I think we can use pq_putmessage() instead.

---
+/* progress callback definition */
+typedef void (*ParallelProgressCallback) (void *parallel_progress_callback_state);
+
 typedef void (*parallel_worker_main_type) (dsm_segment *seg, shm_toc *toc);

I think it's better to define "void *arg".

---
+            /*
+             * A Leader process that receives this message
+             * must be ready to update progress.
+             */
+            Assert(pcxt->parallel_progress_callback);
+            Assert(pcxt->parallel_progress_callback_arg);
+
+            /* Report progress */
+            pcxt->parallel_progress_callback(pcxt->parallel_progress_callback_arg);

I think the parallel query infra should not require parallel_progress_callback_arg to always be set. I think it can be NULL.

---
+void
+parallel_vacuum_update_progress(void *arg)
+{
+    ParallelVacuumState *pvs = (ParallelVacuumState *) arg;
+
+    Assert(!IsParallelWorker());
+
+    if (pvs)
+        pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
+                                     pg_atomic_add_fetch_u32(&(pvs->shared->nindexes_completed), 1));
+}

Since parallel vacuum always sets the arg, I think we don't need to check it.

Regards,

[1] https://www.postgresql.org/docs/devel/source-format.html

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
> Number of indexes that will be vacuumed or cleaned up. This counter only
> advances when the phase is vacuuming indexes or cleaning up indexes.

I agree, this reads better.

> According to our code style guideline[1], we limit line lengths so
> that the code is readable in an 80-column window. Some comments
> updated in this patch seem too short.

I will correct this.

> I think it's better to define "void *arg".

Agree

> I think the parallel query infra should not require
> parallel_progress_callback_arg to always be set. I think it can be
> NULL.

This assertion is inside the new 'P' message type handling. If a leader is consuming this message, they must have a progress callback set. Right now we only set the callback in the parallel vacuum case, so not all leaders will be prepared to handle this case.

Would you agree this is needed for safety?

case 'P':    /* Parallel progress reporting */
    {
        /*
         * A Leader process that receives this message
         * must be ready to update progress.
         */
        Assert(pcxt->parallel_progress_callback);
        Assert(pcxt->parallel_progress_callback_arg);

> Since parallel vacuum always sets the arg, I think we don't need to check it.

The arg is only set during parallel vacuum. During non-parallel vacuum, it's NULL. This check can be removed, but I realize now that we do need an Assert(pvs). Do you agree?

--

Regards,

Sami Imseih
Amazon Web Services (AWS)
On Fri, Jan 20, 2023 at 11:38 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> > Number of indexes that will be vacuumed or cleaned up. This counter only
> > advances when the phase is vacuuming indexes or cleaning up indexes.
>
> I agree, this reads better.
>
> > According to our code style guideline[1], we limit line lengths so
> > that the code is readable in an 80-column window. Some comments
> > updated in this patch seem too short.
>
> I will correct this.
>
> > I think it's better to define "void *arg".
>
> Agree
>
> > I think the parallel query infra should not require
> > parallel_progress_callback_arg to always be set. I think it can be
> > NULL.
>
> This assertion is inside the new 'P' message type handling.
> If a leader is consuming this message, they must have a
> progress callback set. Right now we only set the callback
> in the parallel vacuum case, so not all leaders will be prepared
> to handle this case.
>
> Would you agree this is needed for safety?

I think it makes sense that we assume pcxt->parallel_progress_callback is never NULL, but my point is that in the future one might want to use this callback without the argument, and we should allow it. If parallel vacuum assumes pcxt->parallel_progress_callback_arg is not NULL, I think we should add an assertion in parallel_vacuum_update_progress(), but not in HandleParallelMessages().

> case 'P': /* Parallel progress reporting */
>     {
>         /*
>          * A Leader process that receives this message
>          * must be ready to update progress.
>          */
>         Assert(pcxt->parallel_progress_callback);
>         Assert(pcxt->parallel_progress_callback_arg);
>
> > Since parallel vacuum always sets the arg, I think we don't need to check it.
>
> The arg is only set during parallel vacuum. During non-parallel vacuum,
> it's NULL. This check can be removed, but I realize now that we do need
> an Assert(pvs). Do you agree?

Agreed to add the assertion in this function.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,

Thanks for your reply!

I addressed the latest comments in v23.

1/ cleaned up the asserts as discussed.
2/ used pq_putmessage to send the message on index scan completion.

Thanks

--
Sami Imseih
Amazon Web Services (AWS)
Attachment
On Wed, Feb 8, 2023 at 11:03 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Hi,
>
> Thanks for your reply!
>
> I addressed the latest comments in v23.
>
> 1/ cleaned up the asserts as discussed.
> 2/ used pq_putmessage to send the message on index scan completion.

Thank you for updating the patch! Here are some comments for the v23 patch:

+     <row>
+      <entry><literal>ParallelVacuumFinish</literal></entry>
+      <entry>Waiting for parallel vacuum workers to finish index vacuum.</entry>
+     </row>

This change is out-of-date.

---
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>indexes_total</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes that will be vacuumed or cleaned up. This value will be
+       <literal>0</literal> if the phase is not <literal>vacuuming indexes</literal>
+       or <literal>cleaning up indexes</literal>, <literal>INDEX_CLEANUP</literal>
+       is set to <literal>OFF</literal>, index vacuum is skipped due to very
+       few dead tuples in the table, or vacuum failsafe is triggered.
+       See <xref linkend="guc-vacuum-failsafe-age"/>
+       for more on vacuum failsafe.
+      </para></entry>
+     </row>

This explanation looks redundant: setting INDEX_CLEANUP to OFF essentially means the vacuum doesn't enter the vacuuming indexes phase. The same is true for the case of skipping index vacuum and failsafe mode. How about the following?

Total number of indexes that will be vacuumed or cleaned up. This number is reported as of the beginning of the vacuuming indexes phase or the cleaning up indexes phase.

---
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>indexes_completed</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of indexes vacuumed in the current vacuum cycle when the
+       phase is <literal>vacuuming indexes</literal>, or the number
+       of indexes cleaned up during the <literal>cleaning up indexes</literal>
+       phase.
+      </para></entry>
+     </row>

How about simplifying the description to something like:

Number of indexes processed. This counter only advances when the phase is vacuuming indexes or cleaning up indexes.

Also, index_processed sounds accurate to me. What do you think?

---
+    pcxt->parallel_progress_callback = NULL;
+    pcxt->parallel_progress_callback_arg = NULL;

I think these settings are not necessary since the pcxt is palloc0'ed.

---
+void
+parallel_vacuum_update_progress(void *arg)
+{
+    ParallelVacuumState *pvs = (ParallelVacuumState *) arg;
+
+    Assert(!IsParallelWorker());
+    Assert(pvs->pcxt->parallel_progress_callback_arg);
+
+    if (pvs)
+        pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
+                                     pg_atomic_add_fetch_u32(&(pvs->shared->nindexes_completed), 1));
+}

Assert(pvs->pcxt->parallel_progress_callback_arg) looks wrong to me. If 'arg' is NULL, a SEGV happens.

I think it's better to update pvs->shared->nindexes_completed by both leader and worker processes who processed the index.

---
+/* progress callback definition */
+typedef void (*ParallelProgressCallback) (void *parallel_progress_callback_state);
+
 typedef void (*parallel_worker_main_type) (dsm_segment *seg, shm_toc *toc);

I think it's better to make the function type consistent with the existing parallel_worker_main_type. How about parallel_progress_callback_type?

I've attached a patch that incorporates the above comments and has some suggestions for updating comments etc.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment
Thanks for the review!

> +     <row>
> +      <entry><literal>ParallelVacuumFinish</literal></entry>
> +      <entry>Waiting for parallel vacuum workers to finish index
> vacuum.</entry>
> +     </row>

> This change is out-of-date.

That was an oversight. Thanks for catching.

> Total number of indexes that will be vacuumed or cleaned up. This
> number is reported as of the beginning of the vacuuming indexes phase
> or the cleaning up indexes phase.

This is cleaner. I was being unnecessarily verbose in the original description.

> Number of indexes processed. This counter only advances when the phase
> is vacuuming indexes or cleaning up indexes.

I agree.

> Also, index_processed sounds accurate to me. What do you think?

At one point, I used index_processed, but decided to change it. "processed" makes sense also. I will use this.

> I think these settings are not necessary since the pcxt is palloc0'ed.

Good point.

> Assert(pvs->pcxt->parallel_progress_callback_arg) looks wrong to me.
> If 'arg' is NULL, a SEGV happens.

Correct, Assert(pvs) is all that is needed.

> I think it's better to update pvs->shared->nindexes_completed by both
> leader and worker processes who processed the index.

No reason for that, since only the leader process can report progress to backend_progress.

> I think it's better to make the function type consistent with the
> existing parallel_worker_main_type. How about
> parallel_progress_callback_type?

Yes, that makes sense.

> I've attached a patch that incorporates the above comments and has
> some suggestions for updating comments etc.

I reviewed and incorporated these changes, with a slight change. See v24.

- * Increase and report the number of index. Also, we reset the progress
- * counters.

+ * Increase and report the number of index scans. Also, we reset the progress
+ * counters.

Thanks

--
Sami Imseih
Amazon Web Services (AWS)
Attachment
On Sat, Feb 18, 2023 at 11:46 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Thanks for the review!
>
> > This change is out-of-date.
>
> That was an oversight. Thanks for catching.
>
> > Total number of indexes that will be vacuumed or cleaned up. This
> > number is reported as of the beginning of the vacuuming indexes phase
> > or the cleaning up indexes phase.
>
> This is cleaner. I was being unnecessarily verbose in the original description.
>
> > Number of indexes processed. This counter only advances when the phase
> > is vacuuming indexes or cleaning up indexes.
>
> I agree.
>
> > Also, index_processed sounds accurate to me. What do you think?
>
> At one point, I used index_processed, but decided to change it.
> "processed" makes sense also. I will use this.
>
> > I think these settings are not necessary since the pcxt is palloc0'ed.
>
> Good point.
>
> > Assert(pvs->pcxt->parallel_progress_callback_arg) looks wrong to me.
> > If 'arg' is NULL, a SEGV happens.
>
> Correct, Assert(pvs) is all that is needed.
>
> > I think it's better to update pvs->shared->nindexes_completed by both
> > leader and worker processes who processed the index.
>
> No reason for that, since only the leader process can report progress to
> backend_progress.
>
> > I think it's better to make the function type consistent with the
> > existing parallel_worker_main_type. How about
> > parallel_progress_callback_type?
>
> Yes, that makes sense.
>
> > I've attached a patch that incorporates the above comments and has
> > some suggestions for updating comments etc.
>
> I reviewed and incorporated these changes, with a slight change. See v24.

Thanks for updating the patch!

 #define PROGRESS_VACUUM_NUM_INDEX_VACUUMS        4
 #define PROGRESS_VACUUM_MAX_DEAD_TUPLES          5
 #define PROGRESS_VACUUM_NUM_DEAD_TUPLES          6
+#define PROGRESS_VACUUM_INDEX_TOTAL              7
+#define PROGRESS_VACUUM_INDEX_PROCESSED          8

-        s.param7 AS num_dead_tuples
+        s.param7 AS num_dead_tuples,
+        s.param8 AS indexes_total,
+        s.param9 AS indexes_processed

I think PROGRESS_VACUUM_INDEXES_TOTAL and PROGRESS_VACUUM_INDEXES_PROCESSED are better for consistency. The rest looks good to me.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Thanks!

> I think PROGRESS_VACUUM_INDEXES_TOTAL and
> PROGRESS_VACUUM_INDEXES_PROCESSED are better for consistency. The rest
> looks good to me.

Took care of that in v25.

Regards

--
Sami Imseih
Amazon Web Services
Attachment
On Tue, Feb 21, 2023 at 1:48 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Thanks!
>
> > I think PROGRESS_VACUUM_INDEXES_TOTAL and
> > PROGRESS_VACUUM_INDEXES_PROCESSED are better for consistency. The rest
> > looks good to me.
>
> Took care of that in v25.

Thanks! It looks good to me, so I've marked it as Ready for Committer.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Fri, Feb 24, 2023 at 03:16:10PM +0900, Masahiko Sawada wrote:
> Thanks! It looks good to me, so I've marked it as Ready for Committer.

+            case 'P':            /* Parallel progress reporting */
+                {
+                    /* Call the progress reporting callback */
+                    Assert(pcxt->parallel_progress_callback);
+                    pcxt->parallel_progress_callback(pcxt->parallel_progress_callback_arg);
+
+                    break;
+                }

The key point of the patch is here. From what I understand based on the information of the thread, this is used as a way to make the progress reporting done by the leader more responsive, so we'd update the index counters each time the leader is poked at with a 'P' message by one of its workers, once a worker is done with the parallel cleanup of one of the indexes. That's appealing, because this design is responsive and cheap, while we'd rely on CFIs to make sure that the leader triggers its callback on a timely basis. Unfortunately, sticking a concept of "parallel progress reporting" here is rather confusing? This stuff can be used for many more purposes than just progress reporting: the leader would execute the callback it has registered based on the timing given by one or more of its workers, these being willing to push an event on the leader. Progress reporting is one application of that to force a refresh and make the information of the leader accurate. What about things like a chain of callbacks, for example? Could the leader want to register more than one callback and act on all of them with one single 'P' message?

Another question I have: could the reporting of each individual worker make sense on its own? The cleanup of the indexes depends on the order they are processed, their number, size and AM with their cleanup strategy; still, there may be a point in getting information about how much work a single worker is doing rather than just having the aggregated information given to the leader.

Btw, is an assertion really helpful here? If parallel_progress_callback is not set, we'd just crash one line after. It seems to me that it could be cleaner to do nothing if a leader gets a poke message from a worker when there are no callbacks registered.

--
Michael
Attachment
On Wed, Apr 5, 2023 at 4:47 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Feb 24, 2023 at 03:16:10PM +0900, Masahiko Sawada wrote:
> > Thanks! It looks good to me, so I've marked it as Ready for Committer.
>
> +            case 'P':            /* Parallel progress reporting */
> +                {
> +                    /* Call the progress reporting callback */
> +                    Assert(pcxt->parallel_progress_callback);
> +                    pcxt->parallel_progress_callback(pcxt->parallel_progress_callback_arg);
> +
> +                    break;
> +                }
>
> The key point of the patch is here.  From what I understand based on
> the information of the thread, this is used as a way to make the
> progress reporting done by the leader more responsive, so that we
> update the index counters each time the leader is poked with a 'P'
> message by one of its workers, once a worker is done with the parallel
> cleanup of one of the indexes.  That's appealing, because this design
> is responsive and cheap, and we rely on CFIs to make sure that the
> leader triggers its callback on a timely basis.
>
> Unfortunately, sticking a concept of "Parallel progress reporting"
> onto this is rather confusing, no?  This facility could be used for
> much more than just progress reporting: the leader would execute
> whatever callback it has registered based on the timing given by one
> or more of its workers willing to push an event on the leader.
> Progress reporting is one application of that, to force a refresh and
> make the information of the leader accurate.  What about things like a
> chain of callbacks, for example?  Could the leader want to register
> more than one callback and act on all of them with one single 'P'
> message?

That seems a valid argument. I was thinking that such an asynchronous
state update mechanism would be a good infrastructure for progress
reporting of parallel operations. It might be worth considering using
it for more general purposes, but since the current implementation is
minimal, can we extend it in the future when we need it for other use
cases?

> Another question I have: could the reporting of each individual worker
> make sense on its own?  The cleanup of the indexes depends on the
> order they are processed, their number, their size, and the cleanup
> strategy of their AM; still, there may be a point in getting
> information about how much work a single worker is doing rather than
> just having the aggregated information given to the leader.

It would also be useful information for users, but I don't think it
can replace the aggregated information. The aggregated information can
answer a question from the user like "how many indexes remain to be
vacuumed?", which helps estimate the remaining time to complete.

> By the way, is an assertion really helpful here?  If
> parallel_progress_callback is not set, we'd just crash one line
> after.  It seems to me that it would be cleaner to do nothing if a
> leader gets a poke message from a worker when no callback is
> registered.

Agreed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
> > The key point of the patch is here.  From what I understand based on
> > the information of the thread, this is used as a way to make the
> > progress reporting done by the leader more responsive, so that we
> > update the index counters each time the leader is poked with a 'P'
> > message by one of its workers, once a worker is done with the parallel
> > cleanup of one of the indexes.  That's appealing, because this design
> > is responsive and cheap, and we rely on CFIs to make sure that the
> > leader triggers its callback on a timely basis.
> >
> > Unfortunately, sticking a concept of "Parallel progress reporting"
> > onto this is rather confusing, no?  This facility could be used for
> > much more than just progress reporting: the leader would execute
> > whatever callback it has registered based on the timing given by one
> > or more of its workers willing to push an event on the leader.
> > Progress reporting is one application of that, to force a refresh and
> > make the information of the leader accurate.  What about things like a
> > chain of callbacks, for example?  Could the leader want to register
> > more than one callback and act on all of them with one single 'P'
> > message?

> That seems a valid argument. I was thinking that such an asynchronous
> state update mechanism would be a good infrastructure for progress
> reporting of parallel operations. It might be worth considering using
> it for more general purposes, but since the current implementation is
> minimal, can we extend it in the future when we need it for other use
> cases?

I don't think we should delay this patch to design a more general
infrastructure. I agree this can be handled as a future requirement.

> > Another question I have: could the reporting of each individual worker
> > make sense on its own?  The cleanup of the indexes depends on the
> > order they are processed, their number, their size, and the cleanup
> > strategy of their AM; still, there may be a point in getting
> > information about how much work a single worker is doing rather than
> > just having the aggregated information given to the leader.

> It would also be useful information for users, but I don't think it
> can replace the aggregated information. The aggregated information can
> answer a question from the user like "how many indexes remain to be
> vacuumed?", which helps estimate the remaining time to complete.

The original intention of the thread was to expose stats for both the
aggregate (leader level) and individual index progress. Both the
aggregate and the individual index information have benefits, as
mentioned by Sawada-san.

For the individual index progress, a patch was posted earlier in the
thread, v1-0001-Function-to-return-currently-vacuumed-or-cleaned-ind.patch.
However, since this particular thread has focused mainly on the
aggregated stats work, my thought has been to start a new thread for
the individual index progress once this gets committed.

> > By the way, is an assertion really helpful here?  If
> > parallel_progress_callback is not set, we'd just crash one line
> > after.  It seems to me that it would be cleaner to do nothing if a
> > leader gets a poke message from a worker when no callback is
> > registered.

> Agreed.

I removed the assert and added an if condition instead. Please see the
attached v26.

Regards,

--
Sami Imseih
Amazon Web Services (AWS)
Attachment
On Wed, Apr 05, 2023 at 02:31:54PM +0000, Imseih (AWS), Sami wrote:
>> That seems a valid argument. I was thinking that such an asynchronous
>> state update mechanism would be a good infrastructure for progress
>> reporting of parallel operations. It might be worth considering using
>> it for more general purposes, but since the current implementation is
>> minimal, can we extend it in the future when we need it for other use
>> cases?
>
> I don't think we should delay this patch to design a more general
> infrastructure. I agree this can be handled as a future requirement.

Not so sure I agree with that.  As the patch stands, we have a rather
general-purpose new message type and facility (a callback trigger
poked from workers to their leader) used for a not-so-general purpose,
while being documented under this not-so-general purpose, which is
progress reporting.  I agree that relying on pqmq.c to force the
leader to be more responsive to refreshes is sensible, because it is
cheap, but the interface seems kind of misplaced to me.  As one thing,
for example, it introduces a dependency on parallel.h to do progress
reporting without touching backend_progress.h.

Is a callback approach combined with a counter in shared memory the
best thing there could be?  Could it be worth thinking about a
different design where the value incremented and the parameters of
pgstat_progress_update_param() are passed through the 'P' message
instead?  It strikes me that gathering data in the leader from a poke
of the workers is something that could be useful in many more areas
than just the parallel index operations done in a vacuum, because we
do more and more things in parallel these days, so the API interface
ought to get some attention.  As some say, the introduction of a new
message type in pqmq.c would basically be a one-way door, because we'd
have to maintain it in a stable branch.  I would not take that
lightly.

One idea for an interface would be an extra set of APIs for workers to
do progress reporting, part of backend_progress.h, where we use a pqmq
message type in a centralized location, say something like a
pgstat_progress_parallel_incr_param().

About the callback interface, we may also want to be more careful
about other things, like the handling of callback chains, or even the
unregistration of callbacks?  There could be many more uses for that
than just progress reporting, though this comes down to a balance of
what the leader needs to do before the workers are able to poke at it
on a periodic basis to keep the aggregated progress reporting data
fresh.  There is also an argument where we could have each worker
report its progress independently of the leader?

--
Michael
Attachment
> As one thing, for example, it introduces a dependency on parallel.h
> to do progress reporting without touching backend_progress.h.

Containing the logic in backend_progress.h is a reasonable point from
a maintenance standpoint. We can create a new function in
backend_progress.h called pgstat_progress_update_leader, which is
called from vacuumparallel.c. pgstat_progress_update_leader can then
call pq_putmessage('P', NULL, 0).

> Is a callback approach combined with a counter in shared memory the
> best thing there could be?

It seems to be the best way. The shared memory, ParallelVacuumState,
is already tracking the counters for the parallel vacuum. Also, the
callback in ParallelContext is the only way I can see to let the 'P'
message know what to do to update progress on the leader.

> Could it be worth thinking about a different design where the value
> incremented and the parameters of pgstat_progress_update_param() are
> passed through the 'P' message instead?

I am not sure how this is different from the approach suggested. In
the current design, the 'P' message is used to pass the
ParallelVacuumState to parallel_vacuum_update_progress, which then
calls pgstat_progress_update_param.

> It strikes me that gathering data in the leader from a poke of the
> workers is something that could be useful in many more areas than
> just the parallel index operations done in a vacuum, because we do
> more and more things in parallel these days, so the API interface
> ought to get some attention.

We may need an interface that does more than progress reporting, but
I am not sure what those use cases would be at this point.

> There is also an argument where we could have each worker report its
> progress independently of the leader?

In that case, we would not need ParallelContext at all, or to go
through the 'P' message.

--
Regards,

Sami Imseih
Amazon Web Services (AWS)
On Thu, Apr 06, 2023 at 03:14:20PM +0000, Imseih (AWS), Sami wrote:
>> Could it be worth thinking about a different design where the value
>> incremented and the parameters of pgstat_progress_update_param() are
>> passed through the 'P' message instead?
>
> I am not sure how this is different from the approach suggested. In
> the current design, the 'P' message is used to pass the
> ParallelVacuumState to parallel_vacuum_update_progress, which then
> calls pgstat_progress_update_param.

The arguments of pgstat_progress_update_param() would be given by the
worker directly as components of the 'P' message.  It seems to me that
this approach has the advantage of not requiring the setup of a shmem
area for the extra counters, and there would be no need for a
callback.  Hence, the only thing the code paths of workers would need
to do is to call this routine, and the leaders would increment their
progress when they reach a CFI and process the 'P' message.  Also, I
guess that we would only need an interface in backend_progress.c to
increment counters, like pgstat_progress_incr_param(), but usable by
workers.  Like a pgstat_progress_worker_incr_param()?

--
Michael
Attachment
Hi,

On 2023-04-06 12:28:04 +0900, Michael Paquier wrote:
> As some say, the introduction of a new message type in pqmq.c would
> basically be a one-way door, because we'd have to maintain it in a
> stable branch.

Why would it mean that? Parallel workers are updated together with the
leader, so there's no compatibility issue?

Greetings,

Andres Freund
> The arguments of pgstat_progress_update_param() would be given by the
> worker directly as components of the 'P' message.  It seems to me that
> this approach has the advantage of not requiring the setup of a shmem
> area for the extra counters, and there would be no need for a
> callback.  Hence, the only thing the code paths of workers would need
> to do is to call this routine, and the leaders would increment their
> progress when they reach a CFI and process the 'P' message.  Also, I
> guess that we would only need an interface in backend_progress.c to
> increment counters, like pgstat_progress_incr_param(), but usable by
> workers.  Like a pgstat_progress_worker_incr_param()?

So, here is what I think should be workable to give a generic progress
interface.

pgstat_progress_parallel_incr_param will be a new API that can be
called by either a worker or the leader from any parallel code path
that chooses to increment a progress index.

If called by a worker, it will send a 'P' message to the leader,
passing both the progress index (e.g., PROGRESS_VACUUM_INDEXES_PROCESSED)
and the value to increment by (e.g., 1 for index vacuum progress).

With that, the additional shared memory counters in ParallelVacuumState
are not needed, and the poke of the worker to the leader goes directly
through a generic backend_progress API.

Let me know your thoughts.

Thanks!

--
Sami Imseih
Amazon Web Services (AWS)
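For illustration, a minimal usage sketch of the proposed API as a
caller would see it; the exact call site in vacuumparallel.c is an
assumption on my part:

/*
 * Hypothetical call site, e.g. at the end of
 * parallel_vacuum_process_one_index(): whichever process just finished
 * vacuuming an index bumps the counter.  A worker routes this through
 * a 'P' message to the leader; the leader updates its own progress row
 * directly.
 */
pgstat_progress_parallel_incr_param(PROGRESS_VACUUM_INDEXES_PROCESSED, 1);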
Attachment
On Fri, Apr 07, 2023 at 07:27:17PM +0000, Imseih (AWS), Sami wrote:
> If called by a worker, it will send a 'P' message to the leader,
> passing both the progress index (e.g., PROGRESS_VACUUM_INDEXES_PROCESSED)
> and the value to increment by (e.g., 1 for index vacuum progress).
>
> With that, the additional shared memory counters in ParallelVacuumState
> are not needed, and the poke of the worker to the leader goes directly
> through a generic backend_progress API.

Thanks for the new version.  This has unfortunately not been able to
make the cut for v16, but let's see it done at the beginning of the
v17 cycle.

+void
+pgstat_progress_parallel_incr_param(int index, int64 incr)
+{
+	/*
+	 * Parallel workers notify a leader through a 'P'
+	 * protocol message to update progress, passing the
+	 * progress index and increment value. Leaders can
+	 * just call pgstat_progress_incr_param directly.
+	 */
+	if (IsParallelWorker())
+	{
+		static StringInfoData progress_message;
+
+		initStringInfo(&progress_message);
+
+		pq_beginmessage(&progress_message, 'P');
+		pq_sendint32(&progress_message, index);
+		pq_sendint64(&progress_message, incr);
+		pq_endmessage(&progress_message);
+	}
+	else
+		pgstat_progress_incr_param(index, incr);
+}

I see.  You need to handle both the leader and worker case because
parallel_vacuum_process_one_index() can be called by either of them.

+			case 'P':			/* Parallel progress reporting */

Perhaps this comment should say that this is only about incremental
progress reporting, for the moment.

+ * Increase and report the number of index scans. Also, we reset the progress
+ * counters.

The counters reset are the two index counts; perhaps this comment
should mention that fact.

+				/* update progress */
+				int			index = pq_getmsgint(msg, 4);
+				int			incr = pq_getmsgint(msg, 1);
[...]
+		pq_beginmessage(&progress_message, 'P');
+		pq_sendint32(&progress_message, index);
+		pq_sendint64(&progress_message, incr);
+		pq_endmessage(&progress_message);

It seems to me that the receiver side is missing one pq_getmsgend()?
incr is defined and sent as an int64 on the sender side, hence the
receiver should use pq_getmsgint64(), no?  pq_getmsgint(msg, 1) means
to receive only one byte; see pqformat.c.  And is the order reversed?

There may be a case in the future for making 'P' more complicated with
more arguments, but what you have here should be sufficient for your
use case.  Were there plans to increment more counters for some
different and/or new progress indexes in the VACUUM path, by the way?
Most of that looked a bit tricky to me as it is AM-dependent, but I
may have missed something.

--
Michael
Attachment
On Fri, Apr 07, 2023 at 12:01:17PM -0700, Andres Freund wrote:
> Why would it mean that? Parallel workers are updated together with the
> leader, so there's no compatibility issue?

My point is that the callback system would still need to be maintained
in a stable branch, and, while useful, it could be used for much more
than what it was originally written for. I guess that it could be used
in custom nodes with their own parallel workers.

--
Michael
Attachment
> +			case 'P':			/* Parallel progress reporting */
>
> Perhaps this comment should say that this is only about incremental
> progress reporting, for the moment.

I kept this comment as-is, but inside the case code block I added more
comments. This is to avoid cluttering up the one-liner comment.

> + * Increase and report the number of index scans. Also, we reset the progress
> + * counters.
>
> The counters reset are the two index counts; perhaps this comment
> should mention that fact.

Yes, since we are using the multi_param API here, it makes sense to
mention the progress fields being reset in the comments.

+				/* update progress */
+				int			index = pq_getmsgint(msg, 4);
+				int			incr = pq_getmsgint(msg, 1);
[...]
+		pq_beginmessage(&progress_message, 'P');
+		pq_sendint32(&progress_message, index);
+		pq_sendint64(&progress_message, incr);
+		pq_endmessage(&progress_message);

> It seems to me that the receiver side is missing one pq_getmsgend()?

Yes. I added this.

> incr is defined and sent as an int64 on the sender side, hence the
> receiver should use pq_getmsgint64(), no?  pq_getmsgint(msg, 1) means
> to receive only one byte; see pqformat.c.

Ah, correct. incr is an int64, so what we need is:

    int64	incr = pq_getmsgint64(msg);

I also added the pq_getmsgend call.

> And is the order reversed?

I don't think so. The index and then incr are sent, and they are read
back in the same order. Testing the patch shows the value increments
correctly.

See v28 addressing the comments.

Regards,

Sami Imseih
AWS (Amazon Web Services)
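Putting those fixes together, the receiver side would look roughly
like the following; this is a sketch of the 'P' arm of the leader's
message-handling switch in parallel.c, not the exact committed hunk:

case 'P':				/* Parallel progress reporting */
    {
        /*
         * The sender writes a 4-byte progress index followed by an
         * int64 increment, so read them back in that order, then
         * check that the message has been fully consumed.
         */
        int			index = pq_getmsgint(msg, 4);
        int64		incr = pq_getmsgint64(msg);

        pq_getmsgend(msg);

        /*
         * The leader applies the increment to its own progress row,
         * which is the entry users see in pg_stat_progress_vacuum.
         */
        pgstat_progress_incr_param(index, incr);
        break;
    }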
Attachment
Hi,

On 2023-04-10 08:14:18 +0900, Michael Paquier wrote:
> On Fri, Apr 07, 2023 at 12:01:17PM -0700, Andres Freund wrote:
> > Why would it mean that? Parallel workers are updated together with the
> > leader, so there's no compatibility issue?
>
> My point is that the callback system would still need to be maintained
> in a stable branch, and, while useful, it could be used for much more
> than what it was originally written for. I guess that it could be used
> in custom nodes with their own parallel workers.

Hm, I'm somewhat doubtful that that's something we should encourage.
And doubtful we'd get it right without a concrete use case at hand to
verify the design.

Greetings,

Andres Freund
On Mon, Apr 10, 2023 at 07:20:42PM +0000, Imseih (AWS), Sami wrote:
> See v28 addressing the comments.

This should be OK (also checked the code paths where the reports are
added).  Note that the patch needed a few adjustments for its
indentation.

--
Michael
Attachment
> This should be OK (also checked the code paths where the reports are
> added).  Note that the patch needed a few adjustments for its
> indentation.

Thanks for the formatting corrections! This looks good to me.

--
Sami
On Wed, Apr 12, 2023 at 9:22 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> > This should be OK (also checked the code paths where the reports are
> > added).  Note that the patch needed a few adjustments for its
> > indentation.
>
> Thanks for the formatting corrections! This looks good to me.

Thank you for updating the patch. It looks good to me too. I've
updated the commit message.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Jul 06, 2023 at 11:07:14AM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch. It looks good to me too. I've
> updated the commit message.

Thanks.  I was planning to review this patch today and/or tomorrow now
that my stack of things to do is getting slightly lower (I registered
my name as committer a few weeks ago as well, so as not to forget).

One thing I was planning to do is to move the new message-processing
API for the incremental updates into its own commit for clarity, as
that's a separate concept from the actual feature and useful on its
own.  Anyway, would you prefer taking care of it?

--
Michael
Attachment
On Thu, Jul 6, 2023 at 11:15 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jul 06, 2023 at 11:07:14AM +0900, Masahiko Sawada wrote:
> > Thank you for updating the patch. It looks good to me too. I've
> > updated the commit message.
>
> Thanks.  I was planning to review this patch today and/or tomorrow now
> that my stack of things to do is getting slightly lower (I registered
> my name as committer a few weeks ago as well, so as not to forget).
>
> One thing I was planning to do is to move the new message-processing
> API for the incremental updates into its own commit for clarity, as
> that's a separate concept from the actual feature and useful on its
> own.

+1. I had the same idea. Please find the attached patches.

> Anyway, would you prefer taking care of it?

I can take care of it if you're okay with that.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Jul 6, 2023 at 2:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Jul 6, 2023 at 11:15 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Thu, Jul 06, 2023 at 11:07:14AM +0900, Masahiko Sawada wrote:
> > > Thank you for updating the patch. It looks good to me too. I've
> > > updated the commit message.
> >
> > Thanks.  I was planning to review this patch today and/or tomorrow now
> > that my stack of things to do is getting slightly lower (I registered
> > my name as committer a few weeks ago as well, so as not to forget).
> >
> > One thing I was planning to do is to move the new message-processing
> > API for the incremental updates into its own commit for clarity, as
> > that's a separate concept from the actual feature and useful on its
> > own.
>
> +1. I had the same idea. Please find the attached patches.
>
> > Anyway, would you prefer taking care of it?
>
> I can take care of it if you're okay with that.

Pushed.

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com