Re: monitoring CREATE INDEX [CONCURRENTLY] - Mailing list pgsql-hackers

From Rahila Syed
Subject Re: monitoring CREATE INDEX [CONCURRENTLY]
Date
Msg-id CAH2L28v_rWhBxnOM2di30=7GSmx3RSDHWfS-aa7hXYvfesrZCw@mail.gmail.com
Whole thread Raw
In response to monitoring CREATE INDEX [CONCURRENTLY]  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: monitoring CREATE INDEX [CONCURRENTLY]  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Hi Alvaro,

The WIP patch needs a rebase. Please see few in-line comments.

On Fri, Dec 21, 2018 at 3:30 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Monitoring progress of CREATE INDEX [CONCURRENTLY] is sure to be welcome,
so here's a proposal.

There are three distinct interesting cases.  One is straight CREATE
INDEX of a standalone table; then we have CREATE INDEX CONCURRENTLY;
finally, CREATE INDEX on a partitioned table.  Note that there's no
CONCURRENTLY for partitioned tables.

A non-concurrent build is a very straightforward: we call create_index,
which does index_build, done.  See below for how to report for
index_build, which is the interesting part.  I propose not to report
anything else than that for non-concurrent build.  There's some
preparatory work that's identical than for CIC (see below).  Like
VACUUM, it seems a bit pointless to report an initial phase that's
almost immediate, so I propose we just don't report anything until the
actual index building starts.
 
 Aren't we reporting this initial preparatory work in the form of 'initializing' phase that you
have in current patch? IIUC, there are no metrics but the name of the phase.

CREATE INDEX CONCURRENTLY does these things first, which we would not
report (this is just like VACUUM, which only starts reporting once it
starts scanning blocks):

a. lock rel.  No metrics to report.
b. other prep; includes lots of catalog access.  Unlikely to lock, but
   not impossible.  No metrics to report.
c. create_index.  CIC skips index_build here, so there's no reason to
   report it either.

We would start reporting at this point, with these phases:

1. WaitForLockers 1.  Report how many xacts do we need to wait for, how
   many are done.
2. index_build.  See below.
3. WaitForLockers 2.  Report how many xacts do we need to wait for, how
   many are done.
4. validate_index.  Scans the whole rel again.  Report number of blocks
   scanned.
5. wait for virtual XIDs.  Like WaitForLockers: report how many xacts we
   need to wait for, how many are done.

We're done.

(Alternatively, we could have an initial "prep" phase for a/b/c for the
concurrent case and a/b for non-concurrent.  I'm just not sure it's
useful.)
 
index_build
-----------

The actual index building is an AM-specific undertaking, and we report
its progress separately from the AM-agnostic code.  That is, each AM has
freedom to define its own list of phases and counters, separate from the
generic code.  This avoids the need to provide a new AM method or invoke
callbacks.  So when you see that CREATE_INDEX_PHASE is either "index
build" you'll have a separate BTREE_CREATE_PHASE value set to either
"scanning heap" or "sorting" or "building upper layers"; equivalently
for other AMs.

OK. 
I think the main phases in which index_build for most AMs can be divided is as follows:
1. Scanning heap tuples for building index which is common for all AMs
2. Forming index tuples which is AM-specific
3. Writing tuples into the index which is AM-specific. 
Out of these, metrics for phase 1 can be heap_tuples_scanned / total_heap_tuples and 
for phase 3, it can be index_tuples_computed/ total_index_tuples.
I am not sure about metrics for phase 2 especially for Btree which involves
reporting progress of sorting. 

Partitioned indexes
-------------------

For partitioned indexes, we only have the index build phase, but we
repeat it for each partition.  In addition to the index_build metrics
described above, we should report how many partitions we need to handle
in total and how many partitions are already done.  (I'm avoiding
getting in the trouble of reporting *which* partition we're currently
handling and have already handled.)

Thoughts?

+CREATE VIEW pg_stat_progress_create_index AS
+ SELECT
+ s.pid AS pid, S.datid AS datid, D.datname AS datname,
+ S.relid AS relid,
+ CASE s.param1 WHEN 0 THEN 'initializing'
+   WHEN 1 THEN 'waiting for lockers 1'
+   WHEN 2 THEN 'building index'
+   WHEN 3 THEN 'waiting for lockers 2'
+   WHEN 4 THEN 'validating index'
+   WHEN 5 THEN 'waiting for lockers 3'
+   END as phase,
+ S.param2 AS procs_to_wait_for,
+ S.param3 AS procs_waited_for,
+ S.param4 AS partitions_to_build,
+ S.param5 AS partitions_built
+ FROM pg_stat_get_progress_info('CREATE INDEX') AS S
+ LEFT JOIN pg_database D ON S.datid = D.oid;
+

1. In the above code, I think it will be useful if we report phases as
'Initializing phase 1 of n'
'Waiting for lockers phase 2 of n' etc. i.e reporting total number of phases as well.

+               holders = lappend(holders,
+                                                 GetLockConflicts(locktag, lockmode, &count));
+               total += count;
2. IIUC, the above code in WaitForLockersMultiple can be written under
if(progress) condition like rest of the progress checking code in that function
and pass NULL for count otherwise.

3. Will it be useful to report pid's of the backend's 
for the transactions which CREATE INDEX concurrently is waiting for?
I think it can be used to debug long running transactions.

4. Should we have additional statistics update phase before index_update_stats
like PROGRESS_VACUUM_PHASE_FINAL_CLEANUP?

5. I think it may be useful to report number of parallel workers requested and number of workers 
actually running index build in case of btree.

6. Also, this can be reported as an additional validation phase for exclusion constraint   
in index_build function as it involves scanning all live tuples of heap relation. 

 /*
         * If it's for an exclusion constraint, make a second pass over the heap
         * to verify that the constraint is satisfied.  We must not do this until
         * the index is fully valid.  (Broken HOT chains shouldn't matter, though;
         * see comments for IndexCheckExclusion.)
         */
        if (indexInfo->ii_ExclusionOps != NULL)
                IndexCheckExclusion(heapRelation, indexRelation, indexInfo);
*/

Thank you,
Rahila Syed

pgsql-hackers by date:

Previous
From: Adrien NAYRAT
Date:
Subject: Re: Log a sample of transactions
Next
From: "David G. Johnston"
Date:
Subject: Re: pg_dump multi VALUES INSERT