Re: Add index scan progress to pg_stat_progress_vacuum - Mailing list pgsql-hackers

From Imseih (AWS), Sami
Subject Re: Add index scan progress to pg_stat_progress_vacuum
Date
Msg-id 40222BEA-9C20-433F-8476-58179F3547A7@amazon.com
Whole thread Raw
In response to Re: Add index scan progress to pg_stat_progress_vacuum  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses Re: Add index scan progress to pg_stat_progress_vacuum
List pgsql-hackers
On 1/11/22, 1:01 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

    On 1/10/22, 5:01 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:
    > I have attached the 3rd revision of the patch which also includes the documentation changes. Also attached is a
renderedhtml of the docs for review.
 
    >
    > "max_index_vacuum_cycle_time" has been removed.
    > "index_rows_vacuumed" renamed to "index_tuples_removed". "tuples" is a more consistent with the terminology
used.
    > "vacuum_cycle_ordinal_position" renamed to "index_ordinal_position".

    Thanks for the new version of the patch!

    nitpick: I get one whitespace error when applying the patch.

            Applying: Expose progress for the "vacuuming indexes" phase of a VACUUM operation.
            .git/rebase-apply/patch:44: tab in indent.
               Whenever <xref linkend="guc-vacuum-failsafe-age"/> is triggered, index
            warning: 1 line adds whitespace errors.

That was missed. Will fix it.

    +     <row>
    +      <entry role="catalog_table_entry"><para role="column_definition">
    +       <structfield>num_indexes_to_vacuum</structfield> <type>bigint</type>
    +      </para>
    +      <para>
    +       The number of indexes that will be vacuumed. Only indexes with
    +       <literal>pg_index.indisready</literal> set to "true" will be vacuumed.
    +       Whenever <xref linkend="guc-vacuum-failsafe-age"/> is triggered, index
    +       vacuuming will be bypassed.
    +      </para></entry>
    +     </row>
    +    </tbody>
    +   </tgroup>
    +  </table>

    We may want to avoid exhaustively listing the cases when this value
    will be zero.  I would suggest saying, "When index cleanup is skipped,
    this value will be zero" instead.

What about something like  "The number of indexes that are eligible for vacuuming".
This covers the cases where either an individual index is skipped or the entire "index vacuuming" phase is skipped.

    +     <row>
    +      <entry role="catalog_table_entry"><para role="column_definition">
    +       <structfield>relid</structfield> <type>oid</type>
    +      </para>
    +      <para>
    +       OID of the table being vacuumed.
    +      </para></entry>
    +     </row>

    Do we need to include this field?  I would expect indexrelid to go
    here.

Having indexrelid and relid makes the pg_stat_progress_vacuum_index view "self-contained". A user can lookup the index
andtable being vacuumed without joining back to pg_stat_progress_vacuum.
 

    +     <row>
    +      <entry role="catalog_table_entry"><para role="column_definition">
    +       <structfield>leader_pid</structfield> <type>bigint</type>
    +      </para>
    +      <para>
    +       Process ID of the parallel group leader. This field is <literal>NULL</literal>
    +       if this process is a parallel group leader or the
    +       <literal>vacuuming indexes</literal> phase is not performed in parallel.
    +      </para></entry>
    +     </row>

    Are there cases where the parallel group leader will have an entry in
    this view when parallelism is enabled?

Yes. A parallel group leader can perform an index vacuum just like a parallel worker. If you do something like "vacuum
(parallel3) ", you may have up to 4 processes vacuuming indexes. The leader + 3 workers. 
 

    +     <row>
    +      <entry role="catalog_table_entry"><para role="column_definition">
    +       <structfield>index_ordinal_position</structfield> <type>bigint</type>
    +      </para>
    +      <para>
    +       The order in which the index is being vacuumed. Indexes are vacuumed by OID in ascending order.
    +      </para></entry>
    +     </row>

    Should we include the bit about the OID ordering?  I suppose that is
    unlikely to change in the near future, but I don't know if it is
    relevant information.  Also, do we need to include the "index_"
    prefix?  This view is specific for indexes.  (I have the same question
    for index_tuples_removed.)

I was on the fence about both of these as well. Will make a change to this.

    Should this new table go after the "VACUUM phases" table?  It might
    make sense to keep the phases table closer to where it is referenced.

I did not think that would read better. The introduction discusses both views and the "phase" table is linked from the
pg_stat_progress_vacuum
 

    +    /* Advertise the number of indexes to vacuum if we are not in failsafe mode */
    +    if (!lazy_check_wraparound_failsafe(vacrel))
    +        pgstat_progress_update_param(PROGRESS_VACUUM_TOTAL_INDEX_VACUUM, vacrel->nindexes);

    Shouldn't this be 0 when INDEX_CLEANUP is off, too?

This view is only covering the "vacuum index" phase, but it should also cover index_cleanup phase as well. Will update
thepatch.
 

    +#define PROGRESS_VACUUM_CURRENT_INDRELID         7
    +#define PROGRESS_VACUUM_LEADER_PID               8
    +#define PROGRESS_VACUUM_INDEX_ORDINAL            9
    +#define PROGRESS_VACUUM_TOTAL_INDEX_VACUUM       10
    +#define PROGRESS_VACUUM_DEAD_TUPLES_VACUUMED     11

    nitpick: I would suggest the following names to match the existing
    style:

            PROGRESS_VACUUM_NUM_INDEXES_TO_VACUUM
            PROGRESS_VACUUM_INDEX_LEADER_PID
            PROGRESS_VACUUM_INDEX_INDEXRELID
            PROGRESS_VACUUM_INDEX_ORDINAL_POSITION
            PROGRESS_VACUUM_INDEX_TUPLES_REMOVED

That looks better.

    Nathan



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Boyer-More-Horspool searching LIKE queries
Next
From: "Bossart, Nathan"
Date:
Subject: Re: Add jsonlog log_destination for JSON server logs