Re: monitoring CREATE INDEX [CONCURRENTLY] - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: monitoring CREATE INDEX [CONCURRENTLY]
Date
Msg-id 201901242251.sh36e4poxsek@alvherre.pgsql
Whole thread Raw
In response to Re: monitoring CREATE INDEX [CONCURRENTLY]  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: monitoring CREATE INDEX [CONCURRENTLY]
List pgsql-hackers
On 2019-Jan-09, Pavan Deolasee wrote:

> Looks like it's missing the validate_index blocks-scanned report, which is
> not AM-specific and your original proposal does mention that.

True. That phase is actually 3 phases, which would be reported
separately:

  5. index_bulk_delete() scan
  6. performsort
  7. validate_index_heapscan


> I thought a bit about index_build part. If most AMs follow a somewhat
> standard phases while building an index, it might be simpler to define
> those phases and have AM agnostic code report those phases.

Well, think about btrees.  We first scan the whole table putting all
tuples in a spool (two spools actually), then we tell the spools to get
sorted, then we extract tuples from the spools, and finally we read the
spool to produce the leaf pages.  If we just report the table scan, the
reports gets to 100% complete in that phase and then waits for a very
long time during which nothing seems to happen.  That's not cool.

I'm adding a new AM support routine which turns the subphase number into
a textual description, so that we don't have to hardcode phase names in
the agnostic code.

> Can we have more descriptive text for waiters? Such as "waiting for
> existing writers", "waiting for intermediate writers" and "waiting for
> old readers".  Not sure if I got those correct, something of that sort
> will definitely give more insight into what the transaction is waiting
> for.

Can do.

> Can we actually also report the list of transactions the command is waiting
> on?
> That could be useful to the user if CIC appears to be stuck too long.

I'm afraid this is not possible, because the progress report interface
doesn't allow for something like that.

> IMHO we should just use full term INDEX instead of IDX, such as
> PROGRESS_CREATE_INDEX_PARTITIONS_TOTAL. It's already a long name, so couple
> of extra characters won't make a difference.

Really?  I though it was clear enough and it's *three* characters saved
even.

> I think we should clear the PROGRESS_WAITFOR_TOTAL and PROGRESS_WAITFOR_DONE
> when the wait phase is over, to avoid any confusion. For example, I noticed
> that the counters from WAIT_1 are reported as-is if WAIT_2 had no lockers.

Yes, absolutely.

> +GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *ocount)
> 
> Could that out variable be named something differently? "countp" or
> something like that. I did not check if there is some practice that we
> follow, but I remember suffixing with "p" rather than prefixing with
> "o" (for out I assume)

Sure.

> +/* Progress parameters for CREATE INDEX */
> +#define PROGRESS_CREATEIDX_PHASE 0
> +/* 1 and 2 reserved for "waitfor" metrics */
> +#define PROGRESS_CREATEIDX_PARTITIONS_TOTAL 3
> +#define PROGRESS_CREATEIDX_PARTITIONS_DONE 4
> +
> 
> Is there a reason to leave those reserve placeholders, only to fill them a
> few lines down?

Yes -- those are going to be used by other reports that also use similar
metrics, such as the CLUSTER report.


I'm running out of columns to put the numbers into :-(  Right now I have

1. phase
2. subphase (for index_build)
3. lockers total (to wait for)
4. lockers done
5. blocks total
6. blocks done
7. tapes total
8. tapes done
9. partitions total
10. partitions done

The "tapes total/done" bit is about reporting the performsort steps; I'm
not really sure yet it'll be tapes, but I hope it can be done with two
numbers.  Anyway, in btree build I have these subphases

1. spool heapscan using IndexBuildHeapScan
2. _bt_parallel_heapscan stuff            (only one of these in a build)
3. bt_leafbuild, performsort spool 1
4. bt_leafbuild, performsort spool 2
5. bt_load

and I don't have columns to put the metrics for the btload thing,
assuming I figure out what those are.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IOFailure Occurs
Next
From: Thomas Munro
Date:
Subject: Re: Synchronous replay take III