Re: [HACKERS] CLUSTER command progress monitor - Mailing list pgsql-hackers

From Tatsuro Yamada
Subject Re: [HACKERS] CLUSTER command progress monitor
Date
Msg-id e73dc8f7-1bb8-efb2-0c20-45f1b7d995d9@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] CLUSTER command progress monitor  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] CLUSTER command progress monitor
Re: [HACKERS] CLUSTER command progress monitor
List pgsql-hackers
Hi Robert!

On 2019/03/05 11:35, Robert Haas wrote:
> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> === Current design ===
>>
>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
>> Depending on which one is chosen, the command will proceed in the
>> following sequence of phases:
>>
>>     * Scan method: Seq Scan
>>       0. initializing                 (*2)
>>       1. seq scanning heap            (*1)
>>       3. sorting tuples               (*2)
>>       4. writing new heap             (*1)
>>       5. swapping relation files      (*2)
>>       6. rebuilding index             (*2)
>>       7. performing final cleanup     (*2)
>>
>>     * Scan method: Index Scan
>>       0. initializing                 (*2)
>>       2. index scanning heap          (*1)
>>       5. swapping relation files      (*2)
>>       6. rebuilding index             (*2)
>>       7. performing final cleanup     (*2)
>>
>> VACUUM FULL command will proceed in the following sequence of phases:
>>
>>       1. seq scanning heap            (*1)
>>       5. swapping relation files      (*2)
>>       6. rebuilding index             (*2)
>>       7. performing final cleanup     (*2)
>>
>> (*1): increasing the value in heap_tuples_scanned column
>> (*2): only shows the phase in the phase column
> 
> All of that sounds good.
> 
>> The view provides the information of CLUSTER command progress details as follows
>> # \d pg_stat_progress_cluster
>>                 View "pg_catalog.pg_stat_progress_cluster"
>>             Column           |  Type   | Collation | Nullable | Default
>> ---------------------------+---------+-----------+----------+---------
>>    pid                       | integer |           |          |
>>    datid                     | oid     |           |          |
>>    datname                   | name    |           |          |
>>    relid                     | oid     |           |          |
>>    command                   | text    |           |          |
>>    phase                     | text    |           |          |
>>    cluster_index_relid       | bigint  |           |          |
>>    heap_tuples_scanned       | bigint  |           |          |
>>    heap_tuples_vacuumed      | bigint  |           |          |
> 
> Still not sure if we need heap_tuples_vacuumed.  We could try to
> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
> we're using a Seq Scan.

I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in
next patch.

Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to
get those from initscan(). I'll investigate it more.

cluster.c
   copy_heap_data()
     heap_beginscan()
       heap_beginscan_internal()
         initscan()



>> === Discussion points ===
>>
>>    - Progress counter for "3. sorting tuples" phase
>>       - Should we add pgstat_progress_update_param() in tuplesort.c like a
>>         "trace_sort"?
>>         Thanks to Peter Geoghegan for the useful advice!
> 
> How would we avoid an abstraction violation?

Hmm... What do you mean an abstraction violation?
If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples.


>>    - Progress counter for "6. rebuilding index" phase
>>       - Should we add "index_vacuum_count" in the view like a vacuum progress monitor?
>>         If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c.
>>         However, I'm not sure whether it is okay or not.
> 
> Doesn't seem unreasonable to me.

I see, I'll add it later.


Regards,
Tatsuro Yamada






pgsql-hackers by date:

Previous
From: "Iwata, Aya"
Date:
Subject: RE: libpq debug log
Next
From: David Steele
Date:
Subject: Re: NOT IN subquery optimization