Re: [HACKERS] CLUSTER command progress monitor - Mailing list pgsql-hackers

From Tatsuro Yamada
Subject Re: [HACKERS] CLUSTER command progress monitor
Date
Msg-id 288ef3f3-3b0b-9780-191b-67fea98e91ed@lab.ntt.co.jp
In response to Re: [HACKERS] CLUSTER command progress monitor  (Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp>)
Responses Re: [HACKERS] CLUSTER command progress monitor
List pgsql-hackers
On 2019/03/06 15:38, Tatsuro Yamada wrote:
> On 2019/03/05 17:56, Tatsuro Yamada wrote:
>> On 2019/03/05 11:35, Robert Haas wrote:
>>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
>>> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>>>> === Current design ===
>>>>
>>>> The CLUSTER command uses either an Index Scan or a Seq Scan when scanning
>>>> the heap. Depending on which one is chosen, the command proceeds through
>>>> the following sequence of phases:
>>>>
>>>>     * Scan method: Seq Scan
>>>>       0. initializing                 (*2)
>>>>       1. seq scanning heap            (*1)
>>>>       3. sorting tuples               (*2)
>>>>       4. writing new heap             (*1)
>>>>       5. swapping relation files      (*2)
>>>>       6. rebuilding index             (*2)
>>>>       7. performing final cleanup     (*2)
>>>>
>>>>     * Scan method: Index Scan
>>>>       0. initializing                 (*2)
>>>>       2. index scanning heap          (*1)
>>>>       5. swapping relation files      (*2)
>>>>       6. rebuilding index             (*2)
>>>>       7. performing final cleanup     (*2)
>>>>
>>>> The VACUUM FULL command proceeds through the following sequence of phases:
>>>>
>>>>       1. seq scanning heap            (*1)
>>>>       5. swapping relation files      (*2)
>>>>       6. rebuilding index             (*2)
>>>>       7. performing final cleanup     (*2)
>>>>
>>>> (*1): increasing the value in heap_tuples_scanned column
>>>> (*2): only shows the phase in the phase column
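The two CLUSTER sequences and the VACUUM FULL sequence quoted above can be written down as small lookup tables. What follows is only a minimal C sketch for illustration: the enum, the `PhaseInfo` table, and `next_phase()` are hypothetical names and are not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical phase numbers, mirroring the numbering in the quoted design. */
enum {
    PHASE_INITIALIZING = 0,
    PHASE_SEQ_SCAN_HEAP = 1,
    PHASE_INDEX_SCAN_HEAP = 2,
    PHASE_SORT_TUPLES = 3,
    PHASE_WRITE_NEW_HEAP = 4,
    PHASE_SWAP_REL_FILES = 5,
    PHASE_REBUILD_INDEX = 6,
    PHASE_FINAL_CLEANUP = 7,
    PHASE_COUNT
};

typedef struct PhaseInfo {
    const char *name;   /* text shown in the phase column */
    int counts_tuples;  /* 1 = (*1) also advances heap_tuples_scanned,
                         * 0 = (*2) only the phase column changes */
} PhaseInfo;

static const PhaseInfo phase_info[PHASE_COUNT] = {
    {"initializing", 0},
    {"seq scanning heap", 1},
    {"index scanning heap", 1},
    {"sorting tuples", 0},
    {"writing new heap", 1},
    {"swapping relation files", 0},
    {"rebuilding index", 0},
    {"performing final cleanup", 0},
};

/* Phase order per command and scan method, as listed in the design. */
static const int cluster_seq_scan_phases[] = {0, 1, 3, 4, 5, 6, 7};
static const int cluster_index_scan_phases[] = {0, 2, 5, 6, 7};
static const int vacuum_full_phases[] = {1, 5, 6, 7};

/* Return the phase that follows `current` in `seq`, or -1 if there is none. */
int next_phase(const int *seq, size_t len, int current)
{
    for (size_t i = 0; i + 1 < len; i++)
    {
        if (seq[i] == current)
            return seq[i + 1];
    }
    return -1;
}
```

With these tables, the Seq Scan path goes from phase 1 to phase 3, while the Index Scan path skips sorting and writing and jumps from phase 2 directly to phase 5.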
>>>
>>> All of that sounds good.
>>>
>>>> The view provides CLUSTER command progress details as follows:
>>>> # \d pg_stat_progress_cluster
>>>>            View "pg_catalog.pg_stat_progress_cluster"
>>>>         Column        |  Type   | Collation | Nullable | Default
>>>> ----------------------+---------+-----------+----------+---------
>>>>  pid                  | integer |           |          |
>>>>  datid                | oid     |           |          |
>>>>  datname              | name    |           |          |
>>>>  relid                | oid     |           |          |
>>>>  command              | text    |           |          |
>>>>  phase                | text    |           |          |
>>>>  cluster_index_relid  | bigint  |           |          |
>>>>  heap_tuples_scanned  | bigint  |           |          |
>>>>  heap_tuples_vacuumed | bigint  |           |          |
>>>
>>> Still not sure if we need heap_tuples_vacuumed.  We could try to
>>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
>>> we're using a Seq Scan.
>>
>> I have no strong opinion about keeping heap_tuples_vacuumed, so I'll remove
>> it in the next patch.
>>
>> Regarding heap_blks_scanned and heap_blks_total, I suppose it is possible to
>> get those from initscan(). I'll investigate further.
>>
>> cluster.c
>>    copy_heap_data()
>>      heap_beginscan()
>>        heap_beginscan_internal()
>>          initscan()
>>
>>
>>
>>>> === Discussion points ===
>>>>
>>>>    - Progress counter for "3. sorting tuples" phase
>>>>       - Should we add pgstat_progress_update_param() in tuplesort.c like a
>>>>         "trace_sort"?
>>>>         Thanks to Peter Geoghegan for the useful advice!
>>>
>>> How would we avoid an abstraction violation?
>>
>> Hmm... What do you mean by an abstraction violation?
>> If it is difficult to solve, I'd rather not add a progress counter for the sorting tuples phase.
>>
>>
>>>>    - Progress counter for "6. rebuilding index" phase
>>>>       - Should we add "index_vacuum_count" to the view, as in the vacuum progress monitor?
>>>>         If yes, I'll add pgstat_progress_update_param() to reindex_relation() in index.c.
>>>>         However, I'm not sure whether that is okay.
>>>
>>> Doesn't seem unreasonable to me.
>>
>> I see, I'll add it later.
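One way to picture the counter for the "rebuilding index" phase is sketched below, assuming it is meant to count the indexes rebuilt so far. This is a self-contained stand-in, not the patch: `progress_update_param()` is a local stub that only mimics the shape of pgstat_progress_update_param(), and the slot number and `rebuild_indexes()` are invented for illustration.

```c
#include <stdint.h>

#define PROGRESS_NPARAMS 10          /* hypothetical number of counter slots */
#define SLOT_INDEX_REBUILD_COUNT 6   /* hypothetical slot for the new counter */

/* Stand-in for the per-backend progress counters. */
static int64_t progress_params[PROGRESS_NPARAMS];

/* Stub: store `val` in slot `index`, ignoring out-of-range slots. */
void progress_update_param(int index, int64_t val)
{
    if (index >= 0 && index < PROGRESS_NPARAMS)
        progress_params[index] = val;
}

/* Rebuild `nindexes` indexes, publishing the running count after each one. */
int rebuild_indexes(int nindexes)
{
    int rebuilt = 0;

    for (int i = 0; i < nindexes; i++)
    {
        /* ... the actual rebuild of index i would happen here ... */
        rebuilt++;
        progress_update_param(SLOT_INDEX_REBUILD_COUNT, rebuilt);
    }
    return rebuilt;
}
```

Publishing the count after each index, rather than once at the end, is what would let a monitoring session see the phase advance on tables with several indexes.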
> 
> 
> The attached file is a revised WIP patch that includes:
> 
>    - Remove heap_tuples_vacuumed
>    - Add heap_blks_scanned and heap_blks_total
>    - Add index_vacuum_count
> 
> I tried adding the "heap_blks_scanned" and "heap_blks_total" columns, and I
> realized that the "heap_tuples_scanned" column is suitable as a counter for
> both scan methods (index scan and seq scan) because CLUSTER works on a tuple basis.
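To illustrate the difference between the two kinds of counters: a block pair (scanned, total) can be turned into a percentage, whereas a tuple counter alone only shows that work is advancing. The following is a rough sketch under that assumption; `ProgressSample` and `heap_scan_percent()` are hypothetical names, and the fields merely echo the column names discussed here.

```c
#include <stdint.h>

/* One snapshot of the counters discussed in this thread (illustrative only). */
typedef struct ProgressSample {
    int64_t heap_tuples_scanned;  /* advances for both seq scan and index scan */
    int64_t heap_blks_scanned;    /* meaningful for a seq scan */
    int64_t heap_blks_total;      /* 0 when the total is not known */
} ProgressSample;

/*
 * Return the percentage of heap blocks scanned, or -1.0 when no block
 * total is available, in which case only heap_tuples_scanned can be used
 * to tell that the scan is making progress.
 */
double heap_scan_percent(const ProgressSample *s)
{
    if (s->heap_blks_total <= 0)
        return -1.0;
    return 100.0 * (double) s->heap_blks_scanned / (double) s->heap_blks_total;
}
```

For example, a sample with 25 of 100 blocks scanned yields 25.0, while a sample with no block total yields -1.0 and leaves the tuple counter as the only signal.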


The attached file is the patch rebased on current HEAD.
I also changed the status. :)


Regards,
Tatsuro Yamada



