Re: [HACKERS] CLUSTER command progress monitor - Mailing list pgsql-hackers
From | Tatsuro Yamada |
---|---|
Subject | Re: [HACKERS] CLUSTER command progress monitor |
Date | |
Msg-id | 03cc5c0e-243c-e4a0-c5cf-a1f8380ca530@lab.ntt.co.jp |
In response to | Re: [HACKERS] CLUSTER command progress monitor (Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp>) |
Responses | Re: [HACKERS] CLUSTER command progress monitor |
List | pgsql-hackers |
On 2019/03/05 17:56, Tatsuro Yamada wrote:
> Hi Robert!
>
> On 2019/03/05 11:35, Robert Haas wrote:
>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
>> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>>> === Current design ===
>>>
>>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
>>> Depending on which one is chosen, the command will proceed in the
>>> following sequence of phases:
>>>
>>>   * Scan method: Seq Scan
>>>     0. initializing                 (*2)
>>>     1. seq scanning heap            (*1)
>>>     3. sorting tuples               (*2)
>>>     4. writing new heap             (*1)
>>>     5. swapping relation files      (*2)
>>>     6. rebuilding index             (*2)
>>>     7. performing final cleanup     (*2)
>>>
>>>   * Scan method: Index Scan
>>>     0. initializing                 (*2)
>>>     2. index scanning heap          (*1)
>>>     5. swapping relation files      (*2)
>>>     6. rebuilding index             (*2)
>>>     7. performing final cleanup     (*2)
>>>
>>> VACUUM FULL command will proceed in the following sequence of phases:
>>>
>>>     1. seq scanning heap            (*1)
>>>     5. swapping relation files      (*2)
>>>     6. rebuilding index             (*2)
>>>     7. performing final cleanup     (*2)
>>>
>>> (*1): increasing the value in heap_tuples_scanned column
>>> (*2): only shows the phase in the phase column
>>
>> All of that sounds good.
>>
>>> The view provides the information of CLUSTER command progress details as follows
>>> # \d pg_stat_progress_cluster
>>>               View "pg_catalog.pg_stat_progress_cluster"
>>>           Column           |  Type   | Collation | Nullable | Default
>>> ---------------------------+---------+-----------+----------+---------
>>>  pid                       | integer |           |          |
>>>  datid                     | oid     |           |          |
>>>  datname                   | name    |           |          |
>>>  relid                     | oid     |           |          |
>>>  command                   | text    |           |          |
>>>  phase                     | text    |           |          |
>>>  cluster_index_relid       | bigint  |           |          |
>>>  heap_tuples_scanned       | bigint  |           |          |
>>>  heap_tuples_vacuumed      | bigint  |           |          |
>>
>> Still not sure if we need heap_tuples_vacuumed.  We could try to
>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
>> we're using a Seq Scan.
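
For reference, the phase sequences quoted above can be written down as a small lookup table. This is only an illustrative sketch in Python, not code from the patch: the names PHASES, SEQUENCES, phases_for and counting_phases are hypothetical, and only the phase numbers, phase names and the (*1)/(*2) markers are taken from the design above.

```python
# Illustrative model of the quoted phase sequences.
# Not PostgreSQL code: every identifier below is hypothetical.

# phase number -> (phase name, True if marked (*1), i.e. the phase
# increases heap_tuples_scanned; False if marked (*2), phase name only)
PHASES = {
    0: ("initializing", False),
    1: ("seq scanning heap", True),
    2: ("index scanning heap", True),
    3: ("sorting tuples", False),
    4: ("writing new heap", True),
    5: ("swapping relation files", False),
    6: ("rebuilding index", False),
    7: ("performing final cleanup", False),
}

# Order of phases for each case listed in the design.
SEQUENCES = {
    "CLUSTER (Seq Scan)": [0, 1, 3, 4, 5, 6, 7],
    "CLUSTER (Index Scan)": [0, 2, 5, 6, 7],
    "VACUUM FULL": [1, 5, 6, 7],
}


def phases_for(case):
    """Return (number, name, updates_counter) tuples in execution order."""
    return [(n,) + PHASES[n] for n in SEQUENCES[case]]


def counting_phases(case):
    """Return the names of the phases that increase heap_tuples_scanned."""
    return [name for _, name, counts in phases_for(case) if counts]


if __name__ == "__main__":
    for case in SEQUENCES:
        print(case, "->", counting_phases(case))
```

Reading the table this way shows that, in the quoted design, the Index Scan case has a single counting phase, while the Seq Scan case has two.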
>
> I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in
> next patch.
>
> Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to
> get those from initscan(). I'll investigate it more.
>
> cluster.c
>   copy_heap_data()
>     heap_beginscan()
>       heap_beginscan_internal()
>         initscan()
>
>
>>> === Discussion points ===
>>>
>>> - Progress counter for "3. sorting tuples" phase
>>>    - Should we add pgstat_progress_update_param() in tuplesort.c like a
>>>      "trace_sort"?
>>>      Thanks to Peter Geoghegan for the useful advice!
>>
>> How would we avoid an abstraction violation?
>
> Hmm... What do you mean an abstraction violation?
> If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples.
>
>
>>> - Progress counter for "6. rebuilding index" phase
>>>    - Should we add "index_vacuum_count" in the view like a vacuum progress monitor?
>>>      If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c.
>>>      However, I'm not sure whether it is okay or not.
>>
>> Doesn't seem unreasonable to me.
>
> I see, I'll add it later.

The attached file is a revised WIP patch. It includes the following changes:

  - Remove heap_tuples_vacuumed
  - Add heap_blks_scanned and heap_blks_total
  - Add index_vacuum_count

While adding the heap_blks_scanned and heap_blks_total columns, I realized that
the heap_tuples_scanned column is suitable as a counter for both scan methods
(index scan and seq scan), because CLUSTER works on a tuple basis.

Regards,
Tatsuro Yamada
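
To illustrate the point about the counters, here is a rough sketch of how a client might describe progress from one row of the view. It is Python for illustration only and not part of the patch: the function name describe_progress and the representation of a row as a dict are assumptions. The block counters are turned into a ratio only during the seq scan phase and only when a total is known; otherwise heap_tuples_scanned is reported as a plain count, which works for either scan method.

```python
# Illustrative only: a hypothetical client-side helper, not code from the patch.
# A "row" is assumed to be a dict keyed by the view's column names.

# Phases that increase heap_tuples_scanned according to the design, marked (*1).
TUPLE_COUNTING_PHASES = (
    "seq scanning heap",
    "index scanning heap",
    "writing new heap",
)


def describe_progress(row):
    """Return a short progress description for one view row."""
    phase = row["phase"]
    tuples = row.get("heap_tuples_scanned", 0)
    blks_scanned = row.get("heap_blks_scanned", 0)
    blks_total = row.get("heap_blks_total", 0)

    # A block ratio is only meaningful while seq scanning with a known total.
    if phase == "seq scanning heap" and blks_total > 0:
        pct = 100.0 * blks_scanned / blks_total
        return f"{phase}: {pct:.1f}% of blocks, heap_tuples_scanned={tuples}"

    # The tuple counter applies to both index scan and seq scan.
    if phase in TUPLE_COUNTING_PHASES:
        return f"{phase}: heap_tuples_scanned={tuples}"

    # All other phases only expose the phase name.
    return phase


if __name__ == "__main__":
    print(describe_progress({
        "phase": "seq scanning heap",
        "heap_tuples_scanned": 500,
        "heap_blks_scanned": 25,
        "heap_blks_total": 100,
    }))
    print(describe_progress({"phase": "index scanning heap",
                             "heap_tuples_scanned": 42}))
    print(describe_progress({"phase": "rebuilding index"}))
```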
Attachment