Re: [HACKERS] CLUSTER command progress monitor - Mailing list pgsql-hackers
From | Tatsuro Yamada |
---|---|
Subject | Re: [HACKERS] CLUSTER command progress monitor |
Date | |
Msg-id | 76d84ded-c81c-98b1-6fa4-4551ff295008@lab.ntt.co.jp Whole thread Raw |
In response to | Re: [HACKERS] CLUSTER command progress monitor (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] CLUSTER command progress monitor
Re: [HACKERS] CLUSTER command progress monitor |
List | pgsql-hackers |
On 2019/02/23 6:02, Robert Haas wrote: > On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> This patch is rebased on HEAD. >> I'll tackle revising the patch based on feedbacks next month. > > + Running <command>VACUUM FULL</command> is listed in > <structname>pg_stat_progress_cluster</structname> > + view because it uses <command>CLUSTER</command> command > internally. See <xref linkend='cluster-progress-reporting'>. > > It's not really true to say that VACUUM FULL uses the CLUSTER command > internally. It's not really true. It uses a good chunk of the same > infrastructure, but it certainly doesn't use the actual command, and > it's not really the exact same thing either, because internally it's > doing a sequential scan but no sort, which never happens with CLUSTER. > I'm not sure exactly how to rephrase this, but I think we need to make > it more precise. > > One idea is that maybe we should try to think of a design that could > also handle the rewriting variants of ALTER TABLE, and call it > pg_stat_progress_rewrite. Maybe that's moving the goalposts too far, > but I'm not saying we'd necessarily have to do all the work now, just > have a view that we think could also handle that. Then again, maybe > the needs are too different. > Hmm..., I see. If possible, I'd like to stop thinking of VACUUM FULL to avoid complication of the implementation. For now, I haven't enough time to design pg_stat_progress_rewrite. I suppose that it's tough work. > + Whenever <command>CLUSTER</command> is running, the > + <structname>pg_stat_progress_cluster</structname> view will contain > + one row for each backend that is currently clustering or vacuuming > (VACUUM FULL). > > That sentence contradicts itself. Just say that it contains a row for > each backend that is currently running CLUSTER or VACUUM FULL. > Fixed. > @@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple, > void > cluster(ClusterStmt *stmt, bool isTopLevel) > { > + > if (stmt->relation != NULL) > { > /* This is the single-relation case. */ > > Useless hunk. > Fixed. > @@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel) > heap_close(rel, NoLock); > > /* Do the job. */ > + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid); > cluster_rel(tableOid, indexOid, stmt->options); > + pgstat_progress_end_command(); > } > else > { > > It seems like that stuff should be inside cluster_rel(). > Fixed. > + /* Set reltuples to total_tuples */ > + pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES, > OldHeap->rd_rel->reltuples); > > I object. If the user wants that, they can get it from pg_class > themselves via an SQL query. It's also an estimate, not something we > know to be accurate; I want us to only report facts here, not theories I understand that progress monitor should only report facts, so I removed that code. > + pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE, > PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP); > + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, > PROGRESS_CLUSTER_METHOD_INDEX_SCAN); > > I think you should use pgstat_progress_update_multi_param() if > updating multiple parameters at the same time. > > Also, some lines in this patch, such as this one, are very long. > Consider techniques to reduce the line length to 80 characters or > less, such as inserting a line break between the two arguments. Fixed. > + /* Set scan_method to NULL */ > + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1); > > NULL and -1 are not the same thing. Oops, fixed. > I think that we shouldn't have both > PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and > PROGRESS_CLUSTER_PHASE_SCAN_HEAP. They're the same thing. Let's just > use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both. Actually, better yet, > let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have > PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and > PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP. That seems noticeably > simpler. Fixed. > I agree that it's acceptable to report > PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and > PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I > understand why it's valuable to do so in the context of a progress > indicator. Actually, I'm not sure why I added it since so much time has passed. :( So, I'll remove PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD at least. Attached patch is wip patch. Thanks! Tatsuro Yamada
Attachment
pgsql-hackers by date: