Re: [HACKERS] CLUSTER command progress monitor - Mailing list pgsql-hackers

From	Tatsuro Yamada
Subject	Re: [HACKERS] CLUSTER command progress monitor
Date	March 6, 2019 06:38:54
Msg-id	03cc5c0e-243c-e4a0-c5cf-a1f8380ca530@lab.ntt.co.jp
In response to	Re: [HACKERS] CLUSTER command progress monitor (Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp>)
Responses	Re: [HACKERS] CLUSTER command progress monitor
List	pgsql-hackers

On 2019/03/05 17:56, Tatsuro Yamada wrote:
> Hi Robert!
> 
> On 2019/03/05 11:35, Robert Haas wrote:
>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
>> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>>> === Current design ===
>>>
>>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
>>> Depending on which one is chosen, the command will proceed in the
>>> following sequence of phases:
>>>
>>>     * Scan method: Seq Scan
>>>       0. initializing                 (*2)
>>>       1. seq scanning heap            (*1)
>>>       3. sorting tuples               (*2)
>>>       4. writing new heap             (*1)
>>>       5. swapping relation files      (*2)
>>>       6. rebuilding index             (*2)
>>>       7. performing final cleanup     (*2)
>>>
>>>     * Scan method: Index Scan
>>>       0. initializing                 (*2)
>>>       2. index scanning heap          (*1)
>>>       5. swapping relation files      (*2)
>>>       6. rebuilding index             (*2)
>>>       7. performing final cleanup     (*2)
>>>
>>> VACUUM FULL command will proceed in the following sequence of phases:
>>>
>>>       1. seq scanning heap            (*1)
>>>       5. swapping relation files      (*2)
>>>       6. rebuilding index             (*2)
>>>       7. performing final cleanup     (*2)
>>>
>>> (*1): increasing the value in heap_tuples_scanned column
>>> (*2): only shows the phase in the phase column
>>
>> All of that sounds good.
>>
>>> The view provides the progress details of the CLUSTER command as follows:
>>> # \d pg_stat_progress_cluster
>>>                 View "pg_catalog.pg_stat_progress_cluster"
>>>             Column           |  Type   | Collation | Nullable | Default
>>> ---------------------------+---------+-----------+----------+---------
>>>    pid                       | integer |           |          |
>>>    datid                     | oid     |           |          |
>>>    datname                   | name    |           |          |
>>>    relid                     | oid     |           |          |
>>>    command                   | text    |           |          |
>>>    phase                     | text    |           |          |
>>>    cluster_index_relid       | bigint  |           |          |
>>>    heap_tuples_scanned       | bigint  |           |          |
>>>    heap_tuples_vacuumed      | bigint  |           |          |
>>
>> Still not sure if we need heap_tuples_vacuumed.  We could try to
>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
>> we're using a Seq Scan.
> 
> I have no strong opinion about heap_tuples_vacuumed, so I'll remove it in the
> next patch.
> 
> Regarding heap_blks_scanned and heap_blks_total, I suppose they can be
> obtained from initscan(). I'll investigate further.
> 
> cluster.c
>    copy_heap_data()
>      heap_beginscan()
>        heap_beginscan_internal()
>          initscan()
> 
> 
> 
>>> === Discussion points ===
>>>
>>>    - Progress counter for "3. sorting tuples" phase
>>>       - Should we add pgstat_progress_update_param() in tuplesort.c like a
>>>         "trace_sort"?
>>>         Thanks to Peter Geoghegan for the useful advice!
>>
>> How would we avoid an abstraction violation?
> 
> Hmm... What do you mean by an abstraction violation?
> If it is difficult to solve, I would rather not add a progress counter for
> the sorting tuples phase.
> 
> 
>>>    - Progress counter for "6. rebuilding index" phase
>>>       - Should we add "index_vacuum_count" in the view like a vacuum progress monitor?
>>>         If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c.
>>>         However, I'm not sure whether it is okay or not.
>>
>> Doesn't seem unreasonable to me.
> 
> I see, I'll add it later.


The attached file is a revised WIP patch that includes the following changes:

   - Remove heap_tuples_vacuumed
   - Add heap_blks_scanned and heap_blks_total
   - Add index_vacuum_count

While adding the heap_blks_scanned and heap_blks_total columns, I realized that
the heap_tuples_scanned column is suitable as a counter for both scan methods,
index scan and seq scan, because CLUSTER works on a tuple basis.
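
For readers following along, the phase sequences quoted earlier in this message
can be written down as a small standalone C sketch. This is not code from the
patch: every identifier below (SketchPhase, sketch_phase_sequence, and so on)
is hypothetical, and only the phase numbers, phase names and the (*1)/(*2)
markers are taken from the quoted design.

```c
#include <assert.h>
#include <stddef.h>

/* Phase numbers as listed in the quoted design; enum names are hypothetical. */
typedef enum SketchPhase
{
    SKETCH_INITIALIZING = 0,
    SKETCH_SEQ_SCANNING_HEAP = 1,
    SKETCH_INDEX_SCANNING_HEAP = 2,
    SKETCH_SORTING_TUPLES = 3,
    SKETCH_WRITING_NEW_HEAP = 4,
    SKETCH_SWAPPING_REL_FILES = 5,
    SKETCH_REBUILDING_INDEX = 6,
    SKETCH_FINAL_CLEANUP = 7
} SketchPhase;

/* The three cases described in the design (hypothetical names). */
typedef enum SketchCommand
{
    SKETCH_CLUSTER_SEQ_SCAN,
    SKETCH_CLUSTER_INDEX_SCAN,
    SKETCH_VACUUM_FULL
} SketchCommand;

static const SketchPhase seq_scan_phases[] = {
    SKETCH_INITIALIZING, SKETCH_SEQ_SCANNING_HEAP, SKETCH_SORTING_TUPLES,
    SKETCH_WRITING_NEW_HEAP, SKETCH_SWAPPING_REL_FILES,
    SKETCH_REBUILDING_INDEX, SKETCH_FINAL_CLEANUP
};

static const SketchPhase index_scan_phases[] = {
    SKETCH_INITIALIZING, SKETCH_INDEX_SCANNING_HEAP,
    SKETCH_SWAPPING_REL_FILES, SKETCH_REBUILDING_INDEX, SKETCH_FINAL_CLEANUP
};

static const SketchPhase vacuum_full_phases[] = {
    SKETCH_SEQ_SCANNING_HEAP, SKETCH_SWAPPING_REL_FILES,
    SKETCH_REBUILDING_INDEX, SKETCH_FINAL_CLEANUP
};

/* Store the phase sequence for cmd in *out and return its length. */
size_t
sketch_phase_sequence(SketchCommand cmd, const SketchPhase **out)
{
    assert(out != NULL);
    switch (cmd)
    {
        case SKETCH_CLUSTER_SEQ_SCAN:
            *out = seq_scan_phases;
            return sizeof seq_scan_phases / sizeof seq_scan_phases[0];
        case SKETCH_CLUSTER_INDEX_SCAN:
            *out = index_scan_phases;
            return sizeof index_scan_phases / sizeof index_scan_phases[0];
        case SKETCH_VACUUM_FULL:
            *out = vacuum_full_phases;
            return sizeof vacuum_full_phases / sizeof vacuum_full_phases[0];
    }
    *out = NULL;
    return 0;
}

/* Nonzero for (*1) phases, which advance heap_tuples_scanned; 0 for (*2). */
int
sketch_phase_counts_tuples(SketchPhase phase)
{
    return phase == SKETCH_SEQ_SCANNING_HEAP ||
           phase == SKETCH_INDEX_SCANNING_HEAP ||
           phase == SKETCH_WRITING_NEW_HEAP;
}

/* Text for the "phase" column, as written in the design. */
const char *
sketch_phase_name(SketchPhase phase)
{
    static const char *const names[] = {
        "initializing", "seq scanning heap", "index scanning heap",
        "sorting tuples", "writing new heap", "swapping relation files",
        "rebuilding index", "performing final cleanup"
    };

    if ((int) phase < 0 || (int) phase > SKETCH_FINAL_CLEANUP)
        return "unknown";
    return names[phase];
}
```

Both CLUSTER scan methods contain a phase for which
sketch_phase_counts_tuples() returns nonzero, which matches the point above
that a tuple-based counter works for either method.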



Regards,
Tatsuro Yamada

Attachment

progress_monitor_for_cluster_command_v8_code.patch
