Re: [HACKERS] CLUSTER command progress monitor - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] CLUSTER command progress monitor
Date
Msg-id CA+TgmoYueDAgNKmPtV3AWrKCX8rQcnTeQRY7FxfhrVK63dzoeg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] CLUSTER command progress monitor  (Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp>)
Responses Re: [HACKERS] CLUSTER command progress monitor
List pgsql-hackers
On Mon, Mar 18, 2019 at 10:03 PM Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
> Attached patch is a rebased document patch. :)

Attached is an updated patch.  I went through this patch carefully
today, in the hopes of committing it, and I think the attached version
is pretty closet to being committable, but there's at least one open
issue remaining, as described below.

- The regression tests did not pass because expected/rules.out was not
properly updated.  I fixed that.

- The documentation did not build because some tags were not properly
terminated e.g. <xref linkend='...'> rather than <xref
linkend='...'/>.  I also fixed that.

- The documentation had two nearly-identical lists of phases.  I
merged them into one.  There might be room for some further
fine-tuning here.

- cluster_rel() had multiple places where it could return without
calling pgstat_progress_end_command().  I fixed that.

- cluster_rel() inexplicably delayed updating PROGRESS_CLUSTER_COMMAND
for longer than seems necessary.  I fixed that.

- copy_heap_data() zeroed out the heap-tuples-scanned,
heap-blocks-scanned, and total-heap-blocks counters when it began
PROGRESS_CLUSTER_PHASE_SORT_TUPLES and
PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES.  This seems like throwing away
useful information for no good reason.  I changed it not to do that in
all cases except the one mentioned in the next paragraph.

- It *is* currently to reset PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED
because that counter gets reused to indicate the number of heap tuples
*written back out*, but I think that is bad design for two reasons.
First, the documentation does not explain that sometimes the number of
heap tuples scanned is really reporting the number of heap tuples
written.  Second, it's bad for columns to have misleading names.
Third, it would actually be really useful to store these values in
separate columns, because then you could expect that the number tuples
written would eventually equal the number scanned, and you'd still
have the number that were scanned around so that you could clearly see
how close you were getting to rewriting the entire heap.  This is the
one thing I found but did not fix; any chance you could make this
change and update the documentation to match?

- The comment about reporting that we are now reindexing relations was
jammed in between an existing comment and the associated code.  I
moved it to a more logical place.

- The new if-statement in pg_stat_get_progress_info was missing a
space required by project style.  I added the space.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Willing to fix a PQexec() in libpq module
Next
From: Robert Haas
Date:
Subject: Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?