Re: Add index scan progress to pg_stat_progress_vacuum - Mailing list pgsql-hackers

From Imseih (AWS), Sami
Subject Re: Add index scan progress to pg_stat_progress_vacuum
Date
Msg-id 1E7CB510-FB91-4DDB-93E1-A19C05F299F3@amazon.com
Whole thread Raw
In response to Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
Responses Re: Add index scan progress to pg_stat_progress_vacuum  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
>    The change has been broken up as 3 separate patches.

>    0007-Expose-progress-for-the-vacuuming-indexes-and-cleani.patch - Introduces 2 new columns to
pg_stat_progress_vacuum,indexes_total and indexes_processed. These 2 columns will provide progress on the index
vacuuming/cleanup.
 >   0001-Expose-the-index-being-processed-in-the-vacuuming-in.patch - Introduces a new view called
pg_stat_prgoress_vacuum_index.This view tracks the index being vacuumed/cleaned and the total number of index tuples
removed.
 >   0001-Rename-index_vacuum_count-to-index_vacuum_cycle_coun.patch - Renames the existing index_vacuum_count to
index_vacuum_cycle_countin pg_stat_progress_vacuum. Due to the other changes, it makes sense to include "cycle" in the
columnname to be crystal clear that the column refers to the index cycle count.
 

 >   Thanks

Sending again with patch files renamed to ensure correct apply order.

    On 2/10/22, 1:39 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

        Attached is the latest version of the patch to deal with the changes in the recent commit
aa64f23b02924724eafbd9eadbf26d85df30a12b

        On 2/1/22, 2:32 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

            After speaking with Nathan offline, A few changes have been made to the patch.

            As mentioned earlier in the thread, tracking how many indexes are processed in PARALLEL vacuum mode is not
verystraightforward since only the workers or leader process have ability to inspect the Vacuum shared parallel state.


            The latest version of the patch introduces a shared memory to track indexes vacuumed/cleaned by each worker
(or leader ) in a PARALLEL vacuum. In order to present this data in the pg_stat_progress_vacuum view, the value of the
newcolumn "indexes_processed"  is retrieved from shared memory by pg_stat_get_progress_info. For non-parallel vacuums,
thevalue of "indexes_processed" is retrieved from the backend progress array directly. 
 

            The patch also includes the changes to implement the new view pg_stat_progress_vacuum_index which exposes
theindex being vacuumed/cleaned up.
 

            postgres=# \d+ pg_stat_progress_vacuum ;
                                   View "pg_catalog.pg_stat_progress_vacuum"
                   Column       |  Type   | Collation | Nullable | Default | Storage  | Description
            --------------------+---------+-----------+----------+---------+----------+-------------
             pid                | integer |           |          |         | plain    |
             datid              | oid     |           |          |         | plain    |
             datname            | name    |           |          |         | plain    |
             relid              | oid     |           |          |         | plain    |
             phase              | text    |           |          |         | extended |
             heap_blks_total    | bigint  |           |          |         | plain    |
             heap_blks_scanned  | bigint  |           |          |         | plain    |
             heap_blks_vacuumed | bigint  |           |          |         | plain    |
             index_vacuum_count | bigint  |           |          |         | plain    |
             max_dead_tuples    | bigint  |           |          |         | plain    |
             num_dead_tuples    | bigint  |           |          |         | plain    |
             indexes_total      | bigint  |           |          |         | plain    |                  <<<-- new
column
             indexes_processed  | bigint  |           |          |         | plain    |                  <<<-- new
column


            <<<--- new view --->>>

            postgres=# \d pg_stat_progress_vacuum_index
                  View "pg_catalog.pg_stat_progress_vacuum_index"
                 Column     |  Type   | Collation | Nullable | Default
            ----------------+---------+-----------+----------+---------
             pid            | integer |           |          |
             datid          | oid     |           |          |
             datname        | name    |           |          |
             indexrelid     | bigint  |           |          |
             leader_pid     | bigint  |           |          |
             phase          | text    |           |          |
             tuples_removed | bigint  |           |          |



            On 1/26/22, 8:07 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

                Attached is the latest patch and associated documentation.

                This version addresses the index_ordinal_position column confusion. Rather than displaying the index
position,the pg_stat_progress_vacuum view now has 2 new column(s):
 
                index_total - this column will show the total number of indexes to be vacuumed
                index_complete_count - this column will show the total number of indexes processed so far. In order to
dealwith the parallel vacuums, the parallel_workers ( planned workers ) value had to be exposed and each backends
performingan index vacuum/cleanup in parallel had to advertise the number of indexes it vacuumed/cleaned. The # of
indexesvacuumed for the parallel cleanup can then be derived the pg_stat_progress_vacuum view. 
 

                postgres=# \d pg_stat_progress_vacuum
                            View "pg_catalog.pg_stat_progress_vacuum"
                        Column        |  Type   | Collation | Nullable | Default
                ----------------------+---------+-----------+----------+---------
                 pid                  | integer |           |          |
                 datid                | oid     |           |          |
                 datname              | name    |           |          |
                 relid                | oid     |           |          |
                 phase                | text    |           |          |
                 heap_blks_total      | bigint  |           |          |
                 heap_blks_scanned    | bigint  |           |          |
                 heap_blks_vacuumed   | bigint  |           |          |
                 index_vacuum_count   | bigint  |           |          |
                 max_dead_tuples      | bigint  |           |          |
                 num_dead_tuples      | bigint  |           |          |
                 index_total          | bigint  |           |          |.
<<<---------------------
                 index_complete_count | numeric |           |          |.           <<<---------------------

                The pg_stat_progress_vacuum_index view includes:

                Indexrelid - the currently vacuumed index
                Leader_pid - the pid of the leader process. NULL if the process is the leader or vacuum is not
parallel
                tuples_removed - the amount of indexes tuples removed. The user can use this column to see that the
indexvacuum has movement.
 

                postgres=# \d pg_stat_progress_vacuum_index
                      View "pg_catalog.pg_stat_progress_vacuum_index"
                     Column     |  Type   | Collation | Nullable | Default
                ----------------+---------+-----------+----------+---------
                 pid            | integer |           |          |
                 datid          | oid     |           |          |
                 datname        | name    |           |          |
                 indexrelid     | bigint  |           |          |
                 phase          | text    |           |          |
                 leader_pid     | bigint  |           |          |
                 tuples_removed | bigint  |           |          |



                On 1/12/22, 9:52 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

                    On 1/12/22, 1:28 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

                        On 1/11/22, 11:46 PM, "Masahiko Sawada" <sawada.mshk@gmail.com> wrote:
                        > Regarding the new pg_stat_progress_vacuum_index view, why do we need
                        > to have a separate view? Users will have to check two views. If this
                        > view is expected to be used together with and joined to
                        > pg_stat_progress_vacuum, why don't we provide one view that has full
                        > information from the beginning? Especially, I think it's not useful
                        > that the total number of indexes to vacuum (num_indexes_to_vacuum
                        > column) and the current number of indexes that have been vacuumed
                        > (index_ordinal_position column) are shown in separate views.

                     > I suppose we could add all of the new columns to
                     > pg_stat_progress_vacuum and just set columns to NULL as appropriate.
                     > But is that really better than having a separate view?

                    To add, since a vacuum can utilize parallel worker processes + the main vacuum process to perform
indexvacuuming, it made sense to separate the backends doing index vacuum/cleanup in a separate view. 
 
                    Besides what Nathan suggested, the only other clean option I can think of is to perhaps create a
jsoncolumn in pg_stat_progress_vacuum which will include all the new fields. My concern with this approach is that it
willmake usability, to flatten the json, difficult for users.
 

                        > Also, I’m not sure how useful index_tuples_removed is; what can we
                        > infer from this value (without a total number)?

                    >    I think the idea was that you can compare it against max_dead_tuples
                    >   and num_dead_tuples to get an estimate of the current cycle progress.
                    >    Otherwise, it just shows that progress is being made.

                    The main purpose is to really show that the "index vacuum" phase is actually making progress. Note
thatfor certain types of indexes, i.e. GIN/GIST the number of tuples_removed will end up exceeding the number of
num_dead_tuples.

                        Nathan

                        [0] https://postgr.es/m/7874FB21-FAA5-49BD-8386-2866552656C7%40amazon.com








Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: making pg_regress less noisy by removing boilerplate
Next
From: Nathan Bossart
Date:
Subject: Re: Add index scan progress to pg_stat_progress_vacuum