Re: monitoring CREATE INDEX [CONCURRENTLY] - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: monitoring CREATE INDEX [CONCURRENTLY]
Date
Msg-id 20190212031824.GA14185@alvherre.pgsql
Whole thread Raw
In response to Re: monitoring CREATE INDEX [CONCURRENTLY]  (Rahila Syed <rahilasyed90@gmail.com>)
Responses Re: monitoring CREATE INDEX [CONCURRENTLY]
Re: monitoring CREATE INDEX [CONCURRENTLY]
List pgsql-hackers
Hi Rahila, Pavan,

Thanks for the review.  I incorporated some fixes per your comments.
More fixes are needed still.  That said, the patch in attachment gives
good insight into how I think this should turn out.

> > index_build
> > -----------

> OK.
> I think the main phases in which index_build for most AMs can be divided is
> as follows:
> [...]

I ended up defining phases for the index_build phase in the AM itself;
the code reports a phase number using the regular API, and then the
pgstat_progress view obtains the name of each phase using a support
method.

For btree, I ended up not reporting anything about the sort per se; we
just scan the heap (reporting block numbers, which is easy because we
know the heap size in advance) and count heap tuples while scanning;
once that's done, we know how many tuples we need to write out to the
index, so that total becomes the next stage's target total.  That's
simpler. (It is wrong for partial indexes currently, though.)

So for btree we have this:

/*
 *    btphasename() -- Return name of phase, for index build progress report
 */
char *
btphasename(int64 phasenum)
{
    switch (phasenum)
    {
        case PROGRESS_CREATEIDX_SUBPHASE_INITIALIZE:
            return "initializing (1/5)";
        case PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN:
            return "table scan (2/5)";
        case PROGRESS_BTREE_PHASE_PERFORMSORT_1:
            return "sorting tuples, spool 1 (3/5)";
        case PROGRESS_BTREE_PHASE_PERFORMSORT_2:
            return "sorting tuples, spool 2 (4/5)";
        case PROGRESS_BTREE_PHASE_LEAF_LOAD:
            return "final btree sort & load (5/5)";
        default:
            return NULL;
    }
}

Now this is a bit strange, because the report looks like this:

  phase                 | building index (3 of 8): initializing (1/5)
  [...]
  blocks total          | 442478
  blocks done           | 3267

So for phase 3, we always have phase and subphase counters in the phase
name.  However, I don't think there's any clean way to know from the
very beginning how many subphases are there going to be, and it would be
even more confusing to have the total "of N" number vary each time
depending on the access method.  So this leaves the phase counter going
from 1 to 8, and then for phase 3 you have a second part that goes from
1 to N.

Anyway, at some point it completes the heap scan, and the phase changes
to this:

  phase                 | building index (3 of 8): heap scan(2/5)

I think I should take Rahila's suggestion to report the number of
workers running (but I'm not sure I see the point in reporting number of
workers planned).

The heap scan quickly gives way to the next one,

  phase                 | building index (3 of 8): final btree sort & load (5/5)
  [...]
  tuples total          | 100000000
  tuples done           | 58103713

Finally,
  phase                 | validating index scan (phase 5 of 8)
and
  phase                 | validate index heapscan (phase 7 of 8)

and then it's over.

Now, it's clear that I haven't yet nailed all the progress updates
correctly, because there are some stalls where nothing seems to be
happening.  The final index_validate phases have been ignored so far.

> 1. In the above code, I think it will be useful if we report phases as
> 'Initializing phase 1 of n'
> 'Waiting for lockers phase 2 of n' etc. i.e reporting total number of
> phases as well.

Great idea, done.

> 2. IIUC, the above code in WaitForLockersMultiple can be written under
> if(progress) condition like rest of the progress checking code in that
> function
> and pass NULL for count otherwise.

Yep, good point.

> 3. Will it be useful to report pid's of the backend's
> for the transactions which CREATE INDEX concurrently is waiting for?
> I think it can be used to debug long running transactions.

Done.

> 4. Should we have additional statistics update phase before
> index_update_stats
> like PROGRESS_VACUUM_PHASE_FINAL_CLEANUP?

Not sure about this one ... it's supposed to be a fairly quick
operation.

> 5. I think it may be useful to report number of parallel workers requested
> and number of workers
> actually running index build in case of btree.

True, I haven't done this one.  I'll add it next.

> 6. Also, this can be reported as an additional validation phase for
> exclusion constraint
> in index_build function as it involves scanning all live tuples of heap
> relation.
> 
>  /*
>          * If it's for an exclusion constraint, make a second pass over the
> heap
>          * to verify that the constraint is satisfied.  We must not do this
> until
>          * the index is fully valid.  (Broken HOT chains shouldn't matter,
> though;
>          * see comments for IndexCheckExclusion.)
>          */
>         if (indexInfo->ii_ExclusionOps != NULL)
>                 IndexCheckExclusion(heapRelation, indexRelation, indexInfo);
> */

Hmm, I haven't looked into exclusion constraints thus far ... I suppose
this is a critical point to keep in mind.

Thanks for the review!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: pg11.1: dsa_area could not attach to segment
Next
From: Kevin Hale Boyes
Date:
Subject: Re: Connection slots reserved for replication