[HACKERS] CLUSTER command progress monitor - Mailing list pgsql-hackers

From Tatsuro Yamada
Subject [HACKERS] CLUSTER command progress monitor
Date
Msg-id 59A77072.3090401@lab.ntt.co.jp
Whole thread Raw
Responses Re: [HACKERS] CLUSTER command progress monitor  (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: [HACKERS] CLUSTER command progress monitor  (Michael Paquier <michael.paquier@gmail.com>)
Re: [HACKERS] CLUSTER command progress monitor  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] CLUSTER command progress monitor  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
Hi,

Following is a proposal for reporting the progress of CLUSTER command:

It seems that the following could be the phases of CLUSTER processing:

  1. scanning heap
  2. sort tuples
  3. writing new heap
  4. scan heap and write new heap
  5. swapping relation files
  6. rebuild index
  7. performing final cleanup

These phases are based on Rahila's presentation at PGCon 2017
 (https://www.pgcon.org/2017/schedule/attachments/472_Progress%20Measurement%20PostgreSQL.pdf)
and I added some phases on it.

CLUSTER command may use Index Scan or Seq Scan when scanning the heap.
Depending on which one is chosen, the command will proceed in the
following sequence of phases:

  * Seq Scan
    1. scanning heap
    2. sort tuples
    3. writing new heap
    5. swapping relation files
    6. rebuild index
    7. performing final cleanup

  * Index Scan
    4. scan heap and write new heap
    5. swapping relation files
    6. rebuild index
    7. performing final cleanup

The view provides the information of CLUSTER command progress details as follows
postgres=# \d pg_stat_progress_cluster
           View "pg_catalog.pg_stat_progress_cluster"
       Column        |  Type   | Collation | Nullable | Default
---------------------+---------+-----------+----------+---------
 pid                 | integer |           |          |
 datid               | oid     |           |          |
 datname             | name    |           |          |
 relid               | oid     |           |          |
 phase               | text    |           |          |
 scan_method         | text    |           |          |
 scan_index_relid    | bigint  |           |          |
 heap_tuples_total   | bigint  |           |          |
 heap_tuples_scanned | bigint  |           |          |


Then I have questions.

  * Should we have separate views for them?  Or should both be covered by the
    same view with some indication of which command (CLUSTER or VACUUM FULL)
    is actually running?
    I mean this progress monitor could be covering not only CLUSTER command but also
    VACUUM FULL command.

  * I chose tuples as scan heap's counter (heap_tuples_scanned) since it's not
    easy to get current blocks from Index Scan. Is it Ok?


I'll add this patch to CF2017-09.
Any comments or suggestion are welcome.

Regards,
Tatsuro Yamada
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: "Bossart, Nathan"
Date:
Subject: Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands
Next
From: Peter Eisentraut
Date:
Subject: [HACKERS] document and use SPI_result_code_string()