Thread: [HACKERS] CLUSTER command progress monitor

[HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

31 August 2017, 05:12:02

Hi,

Following is a proposal for reporting the progress of CLUSTER command:

It seems that the following could be the phases of CLUSTER processing:

  1. scanning heap
  2. sort tuples
  3. writing new heap
  4. scan heap and write new heap
  5. swapping relation files
  6. rebuild index
  7. performing final cleanup

These phases are based on Rahila's presentation at PGCon 2017
 (https://www.pgcon.org/2017/schedule/attachments/472_Progress%20Measurement%20PostgreSQL.pdf)
and I added some phases on it.

CLUSTER command may use Index Scan or Seq Scan when scanning the heap.
Depending on which one is chosen, the command will proceed in the
following sequence of phases:

  * Seq Scan
    1. scanning heap
    2. sort tuples
    3. writing new heap
    5. swapping relation files
    6. rebuild index
    7. performing final cleanup

  * Index Scan
    4. scan heap and write new heap
    5. swapping relation files
    6. rebuild index
    7. performing final cleanup

The view provides the information of CLUSTER command progress details as follows
postgres=# \d pg_stat_progress_cluster
           View "pg_catalog.pg_stat_progress_cluster"
       Column        |  Type   | Collation | Nullable | Default
---------------------+---------+-----------+----------+---------
 pid                 | integer |           |          |
 datid               | oid     |           |          |
 datname             | name    |           |          |
 relid               | oid     |           |          |
 phase               | text    |           |          |
 scan_method         | text    |           |          |
 scan_index_relid    | bigint  |           |          |
 heap_tuples_total   | bigint  |           |          |
 heap_tuples_scanned | bigint  |           |          |


Then I have questions.

  * Should we have separate views for them?  Or should both be covered by the
    same view with some indication of which command (CLUSTER or VACUUM FULL)
    is actually running?
    I mean this progress monitor could be covering not only CLUSTER command but also
    VACUUM FULL command.

  * I chose tuples as scan heap's counter (heap_tuples_scanned) since it's not
    easy to get current blocks from Index Scan. Is it Ok?


I'll add this patch to CF2017-09.
Any comments or suggestion are welcome.

Regards,
Tatsuro Yamada
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

progress_monitor_cluster_v1.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Thomas Munro

Date:

01 September 2017, 08:17:29

On Thu, Aug 31, 2017 at 2:12 PM, Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
> Any comments or suggestion are welcome.

Although this patch updates src/test/regress/expected/rules.out I
think perhaps you included the wrong version?  That regression test
fails for me

-- 
Thomas Munro
http://www.enterprisedb.com

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

01 September 2017, 09:38:12

Hi Thomas,

>> Any comments or suggestion are welcome.
>
> Although this patch updates src/test/regress/expected/rules.out I
> think perhaps you included the wrong version?  That regression test
> fails for me

Thanks for the comment.

I use the patch on 7b69b6ce and it's fine.
Did you use "initdb" command after "make install"?
The pg_stat_progress_cluster view is created in initdb, probably.

Regards,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Masahiko Sawada

Date:

01 September 2017, 11:59:47

On Fri, Sep 1, 2017 at 3:38 PM, Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
> Hi Thomas,
>
>>> Any comments or suggestion are welcome.
>>
>>
>> Although this patch updates src/test/regress/expected/rules.out I
>> think perhaps you included the wrong version?  That regression test
>> fails for me
>
>
> Thanks for the comment.
>
> I use the patch on 7b69b6ce and it's fine.
> Did you use "initdb" command after "make install"?
> The pg_stat_progress_cluster view is created in initdb, probably.
>

I also got a regression test error (applied to abe85ef). Here is
regression.diff file.

*** /home/masahiko/source/postgresql/src/test/regress/expected/rules.out      2017-09-01 17:27:33.680055612 -0700
--- /home/masahiko/source/postgresql/src/test/regress/results/rules.out
2017-09-01 17:28:10.410055596 -0700
***************
*** 1819,1824 ****
--- 1819,1849 ----     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
pg_stat_get_db_conflict_startup_deadlock(d.oid)AS confl_deadlock    FROM pg_database d;
 
+ pg_stat_progress_cluster| SELECT s.pid,
+     s.datid,
+     d.datname,
+     s.relid,
+         CASE s.param1
+             WHEN 0 THEN 'initializing'::text
+             WHEN 1 THEN 'scanning heap'::text
+             WHEN 2 THEN 'sorting tuples'::text
+             WHEN 3 THEN 'writing new heap'::text
+             WHEN 4 THEN 'scan heap and write new heap'::text
+             WHEN 5 THEN 'swapping relation files'::text
+             WHEN 6 THEN 'rebuilding index'::text
+             WHEN 7 THEN 'performing final cleanup'::text
+             ELSE NULL::text
+         END AS phase,
+         CASE s.param2
+             WHEN 0 THEN 'index scan'::text
+             WHEN 1 THEN 'seq scan'::text
+             ELSE NULL::text
+         END AS scan_method,
+     s.param3 AS scan_index_relid,
+     s.param4 AS heap_tuples_total,
+     s.param5 AS heap_tuples_scanned
+    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid,
relid, param1, param2, param3, param4, param5, param6, param7, param8,
param9, param10)
+      LEFT JOIN pg_database d ON ((s.datid = d.oid))); pg_stat_progress_vacuum| SELECT s.pid,     s.datid,
d.datname,
***************
*** 1841,1871 ****     s.param7 AS num_dead_tuples    FROM (pg_stat_get_progress_info('VACUUM'::text) s(pid, datid,
relid, param1, param2, param3, param4, param5, param6, param7, param8,
param9, param10)      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
- pg_stat_progress_cluster| SELECT
-     s.pid,
-     s.datid,
-     d.datname,
-     s.relid,
-         CASE s.param1
-             WHEN 0 THEN 'initializing'::text
-             WHEN 1 THEN 'scanning heap'::text
-             WHEN 2 THEN 'sorting tuples'::text
-             WHEN 3 THEN 'writing new heap'::text
-             WHEN 4 THEN 'scan heap and write new heap'::text
-             WHEN 5 THEN 'swapping relation files'::text
-             WHEN 6 THEN 'rebuilding index'::text
-             WHEN 7 THEN 'performing final cleanup'::text
-             ELSE NULL::text
-         END AS phase,
-         CASE S.param2
-             WHEN 0 THEN 'index scan'
-             WHEN 1 THEN 'seq scan'
-             END AS scan_method,
-     s.param3 AS index_relid,
-     s.param4 AS heap_blks_total,
-     s.param5 AS heap_blks_scanned
-    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid,
relid, param1, param2, param3, param4, param5)
-      LEFT JOIN pg_database d ON ((s.datid = d.oid))); pg_stat_replication| SELECT s.pid,     s.usesysid,
u.rolnameAS usename,
 
--- 1866,1871 ----

======================================================================

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

04 September 2017, 05:37:04

Hi Sawada-san, Thomas,

Thanks for sharing the reggression.diff.
I realized Thomas's comment is right.

Attached patch is fixed version.
Could you try it?

Regards,

Tatsuro Yamada
NTT Open Source Software Center

On 2017/09/01 17:59, Masahiko Sawada wrote:
> On Fri, Sep 1, 2017 at 3:38 PM, Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> Hi Thomas,
>>
>>>> Any comments or suggestion are welcome.
>>>
>>>
>>> Although this patch updates src/test/regress/expected/rules.out I
>>> think perhaps you included the wrong version?  That regression test
>>> fails for me
>>
>>
>> Thanks for the comment.
>>
>> I use the patch on 7b69b6ce and it's fine.
>> Did you use "initdb" command after "make install"?
>> The pg_stat_progress_cluster view is created in initdb, probably.
>>
>
> I also got a regression test error (applied to abe85ef). Here is
> regression.diff file.
>
> *** /home/masahiko/source/postgresql/src/test/regress/expected/rules.out
>         2017-09-01 17:27:33.680055612 -0700
> --- /home/masahiko/source/postgresql/src/test/regress/results/rules.out
> 2017-09-01 17:28:10.410055596 -0700
> ***************
> *** 1819,1824 ****
> --- 1819,1849 ----
>        pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
>        pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
>       FROM pg_database d;
> + pg_stat_progress_cluster| SELECT s.pid,
> +     s.datid,
> +     d.datname,
> +     s.relid,
> +         CASE s.param1
> +             WHEN 0 THEN 'initializing'::text
> +             WHEN 1 THEN 'scanning heap'::text
> +             WHEN 2 THEN 'sorting tuples'::text
> +             WHEN 3 THEN 'writing new heap'::text
> +             WHEN 4 THEN 'scan heap and write new heap'::text
> +             WHEN 5 THEN 'swapping relation files'::text
> +             WHEN 6 THEN 'rebuilding index'::text
> +             WHEN 7 THEN 'performing final cleanup'::text
> +             ELSE NULL::text
> +         END AS phase,
> +         CASE s.param2
> +             WHEN 0 THEN 'index scan'::text
> +             WHEN 1 THEN 'seq scan'::text
> +             ELSE NULL::text
> +         END AS scan_method,
> +     s.param3 AS scan_index_relid,
> +     s.param4 AS heap_tuples_total,
> +     s.param5 AS heap_tuples_scanned
> +    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid,
> relid, param1, param2, param3, param4, param5, param6, param7, param8,
> param9, param10)
> +      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
>    pg_stat_progress_vacuum| SELECT s.pid,
>        s.datid,
>        d.datname,
> ***************
> *** 1841,1871 ****
>        s.param7 AS num_dead_tuples
>       FROM (pg_stat_get_progress_info('VACUUM'::text) s(pid, datid,
> relid, param1, param2, param3, param4, param5, param6, param7, param8,
> param9, param10)
>         LEFT JOIN pg_database d ON ((s.datid = d.oid)));
> - pg_stat_progress_cluster| SELECT
> -     s.pid,
> -     s.datid,
> -     d.datname,
> -     s.relid,
> -         CASE s.param1
> -             WHEN 0 THEN 'initializing'::text
> -             WHEN 1 THEN 'scanning heap'::text
> -             WHEN 2 THEN 'sorting tuples'::text
> -             WHEN 3 THEN 'writing new heap'::text
> -             WHEN 4 THEN 'scan heap and write new heap'::text
> -             WHEN 5 THEN 'swapping relation files'::text
> -             WHEN 6 THEN 'rebuilding index'::text
> -             WHEN 7 THEN 'performing final cleanup'::text
> -             ELSE NULL::text
> -         END AS phase,
> -         CASE S.param2
> -             WHEN 0 THEN 'index scan'
> -             WHEN 1 THEN 'seq scan'
> -             END AS scan_method,
> -     s.param3 AS index_relid,
> -     s.param4 AS heap_blks_total,
> -     s.param5 AS heap_blks_scanned
> -    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid,
> relid, param1, param2, param3, param4, param5)
> -      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
>    pg_stat_replication| SELECT s.pid,
>        s.usesysid,
>        u.rolname AS usename,
> --- 1866,1871 ----
>
> ======================================================================
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

progress_monitor_cluster_v2.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Masahiko Sawada

Date:

04 September 2017, 05:51:28

On Mon, Sep 4, 2017 at 11:37 AM, Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
> Hi Sawada-san, Thomas,
>
> Thanks for sharing the reggression.diff.
> I realized Thomas's comment is right.
>
> Attached patch is fixed version.
> Could you try it?
>

Yeah, in my environment the regression test passed. Thanks.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

04 September 2017, 08:53:30

Hi Sawada-san,

Thanks for taking your time.
I'll be more careful.

Regards,
Tatsuro Yamada

On 2017/09/04 11:51, Masahiko Sawada wrote:
> On Mon, Sep 4, 2017 at 11:37 AM, Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> Hi Sawada-san, Thomas,
>>
>> Thanks for sharing the reggression.diff.
>> I realized Thomas's comment is right.
>>
>> Attached patch is fixed version.
>> Could you try it?
>>
>
> Yeah, in my environment the regression test passed. Thanks.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
>
>

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

04 September 2017, 09:38:32

On Thu, Aug 31, 2017 at 11:12 AM, Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
> Then I have questions.
>
>   * Should we have separate views for them?  Or should both be covered by the
>     same view with some indication of which command (CLUSTER or VACUUM FULL)
>     is actually running?

Using the same view for both, and tell that this is rather VACUUM or
CLUSTER in the view, would be better IMO. Coming up with a name more
generic than pg_stat_progress_cluster may be better though if this
speaks with VACUUM FULL as well, user-facing documentation does not
say that VACUUM FULL is actually CLUSTER.

> I'll add this patch to CF2017-09.
> Any comments or suggestion are welcome.

Nice to see that you are taking the time to implement patches for
upstream, Yamada-san!
-- 
Michael

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

04 September 2017, 14:17:38

On 2017/09/04 15:38, Michael Paquier wrote:
> On Thu, Aug 31, 2017 at 11:12 AM, Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> Then I have questions.
>>
>>    * Should we have separate views for them?  Or should both be covered by the
>>      same view with some indication of which command (CLUSTER or VACUUM FULL)
>>      is actually running?
>
> Using the same view for both, and tell that this is rather VACUUM or
> CLUSTER in the view, would be better IMO. Coming up with a name more
> generic than pg_stat_progress_cluster may be better though if this
> speaks with VACUUM FULL as well, user-facing documentation does not
> say that VACUUM FULL is actually CLUSTER.

Thanks for sharing your thoughts.
Agreed.
I'll add new column like a "command" to tell whether running CLUSTER or VACUUM.
And how about this new view name? - pg_stat_progress_reorg
Is it more general name than previous name if it covers both commands?

>> I'll add this patch to CF2017-09.
>> Any comments or suggestion are welcome.
>
> Nice to see that you are taking the time to implement patches for
> upstream, Yamada-san!

Same here. :)
I'd like to contribute creating feature that is for DBA and users.
Progress monitoring feature is important from my DBA experiences.
I'm happy if you lend your hand.

Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

06 September 2017, 09:58:26

Hi Hackers,

I revised the patch like this:

   - Add "command" column in the view
     It tells that the running command is CLUSTER or VACUUM FULL.

   - Enable VACUUM FULL progress monitor
     Add heap_tuples_vacuumed and heap_tuples_recently_dead as a counter in the view.
     Sequence of phases are below:
       1. scanning heap
       5. swapping relation files
       6. rebuild index
       7. performing final cleanup

I didn't change the name of view (pg_stat_progress_cluster) because I'm not sure
whether the new name (pg_stat_progress_reorg) is suitable or not.

Any comments or suggestion are welcome.

Thanks,
Tatsuro Yamada


On 2017/09/04 20:17, Tatsuro Yamada wrote:
> On 2017/09/04 15:38, Michael Paquier wrote:
>> On Thu, Aug 31, 2017 at 11:12 AM, Tatsuro Yamada
>> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>>> Then I have questions.
>>>
>>>    * Should we have separate views for them?  Or should both be covered by the
>>>      same view with some indication of which command (CLUSTER or VACUUM FULL)
>>>      is actually running?
>>
>> Using the same view for both, and tell that this is rather VACUUM or
>> CLUSTER in the view, would be better IMO. Coming up with a name more
>> generic than pg_stat_progress_cluster may be better though if this
>> speaks with VACUUM FULL as well, user-facing documentation does not
>> say that VACUUM FULL is actually CLUSTER.
>
> Thanks for sharing your thoughts.
> Agreed.
> I'll add new column like a "command" to tell whether running CLUSTER or VACUUM.
> And how about this new view name?
>   - pg_stat_progress_reorg
> Is it more general name than previous name if it covers both commands?
>
>
>>> I'll add this patch to CF2017-09.
>>> Any comments or suggestion are welcome.
>>
>> Nice to see that you are taking the time to implement patches for
>> upstream, Yamada-san!
>
> Same here. :)
> I'd like to contribute creating feature that is for DBA and users.
> Progress monitoring feature is important from my DBA experiences.
> I'm happy if you lend your hand.
>
> Thanks,
> Tatsuro Yamada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

progress_monitor_cluster_v3.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

06 September 2017, 10:11:55

On Wed, Sep 6, 2017 at 3:58 PM, Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
> I revised the patch like this:

You should avoid top-posting.

> I didn't change the name of view (pg_stat_progress_cluster) because I'm not
> sure
> whether the new name (pg_stat_progress_reorg) is suitable or not.

Here are some ideas: rewrite (incorrect for ALTER TABLE), organize
(somewhat fine), order, operate (too general?), bundle, reform,
assemble.
-- 
Michael

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

07 September 2017, 09:46:03

On 2017/09/06 16:11, Michael Paquier wrote:
> On Wed, Sep 6, 2017 at 3:58 PM, Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> I revised the patch like this:
>
> You should avoid top-posting.

I see.

>> I didn't change the name of view (pg_stat_progress_cluster) because I'm not
>> sure
>> whether the new name (pg_stat_progress_reorg) is suitable or not.
>
> Here are some ideas: rewrite (incorrect for ALTER TABLE), organize
> (somewhat fine), order, operate (too general?), bundle, reform,
> assemble.

Thanks for sharing your ideas.
I searched the words like a "reform table", "reassemble table" and "reorganize table"
on google. I realized "reorganaize table" is good choice than others
because many DBMS uses this keyword. Therefore, I'll change the name to it like this:

before  pg_stat_progress_cluster

after  pg_stat_progress_reorg (I abbreviate reorganize to reorg.)

Does anyone have any suggestions?
I'll revise the patch.

Regards,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

08 September 2017, 12:55:53

On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
>   1. scanning heap
>   2. sort tuples

These two phases overlap, though. I believe progress reporting for
sorts is really hard.  In the simple case where the data fits in
work_mem, none of the work of the sort gets done until all the data is
read.  Once you switch to an external sort, you're writing batch
files, so a lot of the work is now being done during data loading.
But as the number of batch files grows, the final merge at the end
becomes an increasingly noticeable part of the cost, and eventually
you end up needing multiple merge passes.  I think we need some smart
way to report on sorts so that we can tell how much of the work has
really been done, but I don't know how to do it.

>  heap_tuples_total   | bigint  |           |          |

The patch is getting the value reported as heap_tuples_total from
OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
see that value anyway if they wish.  The point of the progress
counters is to expose things the user couldn't otherwise see.  It's
also not necessarily accurate: it's only an estimate in the best case,
and may be way off if the relation has recently be extended by a large
amount.  I think it's pretty important that we try hard to only report
values that are known to be accurate, because users hate (and mock)
inaccurate progress reports.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

11 September 2017, 05:36:48

On 2017/09/08 18:55, Robert Haas wrote:
> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>>    1. scanning heap
>>    2. sort tuples
>
> These two phases overlap, though. I believe progress reporting for
> sorts is really hard.  In the simple case where the data fits in
> work_mem, none of the work of the sort gets done until all the data is
> read.  Once you switch to an external sort, you're writing batch
> files, so a lot of the work is now being done during data loading.
> But as the number of batch files grows, the final merge at the end
> becomes an increasingly noticeable part of the cost, and eventually
> you end up needing multiple merge passes.  I think we need some smart
> way to report on sorts so that we can tell how much of the work has
> really been done, but I don't know how to do it.

Thanks for the comment.

As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
cost estimation. In the case of SEQ SCAN, these two phases not overlap.
However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
heap and write new heap" when INDEX SCAN was selected.

I agree that progress reporting for sort is difficult. So it only reports
the phase ("sorting tuples") in the current design of progress monitor of cluster.
It doesn't report counter of sort.

>>   heap_tuples_total   | bigint  |           |          |
>
> The patch is getting the value reported as heap_tuples_total from
> OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
> see that value anyway if they wish.  The point of the progress
> counters is to expose things the user couldn't otherwise see.  It's
> also not necessarily accurate: it's only an estimate in the best case,
> and may be way off if the relation has recently be extended by a large
> amount.  I think it's pretty important that we try hard to only report
> values that are known to be accurate, because users hate (and mock)
> inaccurate progress reports.

Do you mean to use the number of rows by using below calculation instead
OldHeap->rd_rel->reltuples?
 estimate rows = physical table size / average row length

I understand that OldHeap->rd_rel->reltuples is sometimes useless because
it is correct by auto analyze and it can't perform when under a threshold.

I'll add it in next patch and also share more detailed the current design of
progress monitor for cluster.

Regards,
Tatsuro Yamada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

11 September 2017, 17:38:44

On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
> Thanks for the comment.
>
> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
> cost estimation. In the case of SEQ SCAN, these two phases not overlap.
> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
> heap and write new heap" when INDEX SCAN was selected.
>
> I agree that progress reporting for sort is difficult. So it only reports
> the phase ("sorting tuples") in the current design of progress monitor of
> cluster.
> It doesn't report counter of sort.

Doesn't that make it almost useless?  I would guess that scanning the
heap and writing the new heap would ordinarily account for most of the
runtime, or at least enough that you're going to want something more
than just knowing that's the phase you're in.

>> The patch is getting the value reported as heap_tuples_total from
>> OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
>> see that value anyway if they wish.  The point of the progress
>> counters is to expose things the user couldn't otherwise see.  It's
>> also not necessarily accurate: it's only an estimate in the best case,
>> and may be way off if the relation has recently be extended by a large
>> amount.  I think it's pretty important that we try hard to only report
>> values that are known to be accurate, because users hate (and mock)
>> inaccurate progress reports.
>
> Do you mean to use the number of rows by using below calculation instead
> OldHeap->rd_rel->reltuples?
>
>  estimate rows = physical table size / average row length

No, I mean don't report it at all.  The caller can do that calculation
if they wish, without any help from the progress reporting machinery.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CLUSTER command progress monitor

From

Peter Geoghegan

Date:

11 September 2017, 20:23:01

On Mon, Sep 11, 2017 at 7:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> Thanks for the comment.
>>
>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
>> cost estimation. In the case of SEQ SCAN, these two phases not overlap.
>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
>> heap and write new heap" when INDEX SCAN was selected.
>>
>> I agree that progress reporting for sort is difficult. So it only reports
>> the phase ("sorting tuples") in the current design of progress monitor of
>> cluster.
>> It doesn't report counter of sort.
>
> Doesn't that make it almost useless?  I would guess that scanning the
> heap and writing the new heap would ordinarily account for most of the
> runtime, or at least enough that you're going to want something more
> than just knowing that's the phase you're in.

It's definitely my experience that CLUSTER is incredibly I/O bound.
You're shoveling the tuples through tuplesort.c, but the actual
sorting component isn't where the real costs are. Profiling shows that
writing out the new heap (including moderately complicated
bookkeeping) is the bottleneck, IIRC. That's why parallel CLUSTER
didn't look attractive, even though it would be a fairly
straightforward matter to add that on top of the parallel CREATE INDEX
structure from the patch that I wrote to do that.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

12 September 2017, 15:20:41

On 2017/09/11 23:38, Robert Haas wrote:
> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> Thanks for the comment.
>>
>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
>> cost estimation. In the case of SEQ SCAN, these two phases not overlap.
>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
>> heap and write new heap" when INDEX SCAN was selected.
>>
>> I agree that progress reporting for sort is difficult. So it only reports
>> the phase ("sorting tuples") in the current design of progress monitor of
>> cluster.
>> It doesn't report counter of sort.
>
> Doesn't that make it almost useless?  I would guess that scanning the
> heap and writing the new heap would ordinarily account for most of the
> runtime, or at least enough that you're going to want something more
> than just knowing that's the phase you're in.

Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort())
I know that external merge sort takes a time than quick sort.
I'll try investigating how to get a counter from external merge sort processing.
Is this the right way?


>>> The patch is getting the value reported as heap_tuples_total from
>>> OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
>>> see that value anyway if they wish.  The point of the progress
>>> counters is to expose things the user couldn't otherwise see.  It's
>>> also not necessarily accurate: it's only an estimate in the best case,
>>> and may be way off if the relation has recently be extended by a large
>>> amount.  I think it's pretty important that we try hard to only report
>>> values that are known to be accurate, because users hate (and mock)
>>> inaccurate progress reports.
>>
>> Do you mean to use the number of rows by using below calculation instead
>> OldHeap->rd_rel->reltuples?
>>
>>   estimate rows = physical table size / average row length
>
> No, I mean don't report it at all.  The caller can do that calculation
> if they wish, without any help from the progress reporting machinery.

I see. I'll remove that column on next patch.


Regards,
Tatsuro Yamada



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

12 September 2017, 15:57:32

On 2017/09/12 21:20, Tatsuro Yamada wrote:
> On 2017/09/11 23:38, Robert Haas wrote:
>> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
>> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>>> Thanks for the comment.
>>>
>>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
>>> cost estimation. In the case of SEQ SCAN, these two phases not overlap.
>>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
>>> heap and write new heap" when INDEX SCAN was selected.
>>>
>>> I agree that progress reporting for sort is difficult. So it only reports
>>> the phase ("sorting tuples") in the current design of progress monitor of
>>> cluster.
>>> It doesn't report counter of sort.
>>
>> Doesn't that make it almost useless?  I would guess that scanning the
>> heap and writing the new heap would ordinarily account for most of the
>> runtime, or at least enough that you're going to want something more
>> than just knowing that's the phase you're in.
>
> Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort())
> I know that external merge sort takes a time than quick sort.
> I'll try investigating how to get a counter from external merge sort processing.
> Is this the right way?
>
>
>>>> The patch is getting the value reported as heap_tuples_total from
>>>> OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
>>>> see that value anyway if they wish.  The point of the progress
>>>> counters is to expose things the user couldn't otherwise see.  It's
>>>> also not necessarily accurate: it's only an estimate in the best case,
>>>> and may be way off if the relation has recently be extended by a large
>>>> amount.  I think it's pretty important that we try hard to only report
>>>> values that are known to be accurate, because users hate (and mock)
>>>> inaccurate progress reports.
>>>
>>> Do you mean to use the number of rows by using below calculation instead
>>> OldHeap->rd_rel->reltuples?
>>>
>>>   estimate rows = physical table size / average row length
>>
>> No, I mean don't report it at all.  The caller can do that calculation
>> if they wish, without any help from the progress reporting machinery.
>
> I see. I'll remove that column on next patch.


I will summarize the current design and future corrections before sending
the next patch.


=== Current design ===

CLUSTER command may use Index Scan or Seq Scan when scanning the heap.
Depending on which one is chosen, the command will proceed in the
following sequence of phases:
  * Scan method: Seq Scan    1. scanning heap                (*1)    2. sorting tuples               (*2)    3. writing
newheap             (*1)    5. swapping relation files      (*2)    6. rebuilding index             (*2)    7.
performingfinal cleanup     (*2)
 
  * Scan method: Index Scan    4. scan heap and write new heap (*1)    5. swapping relation files      (*2)    6.
rebuildingindex             (*2)    7. performing final cleanup     (*2)
 

VACUUM FULL command will proceed in the following sequence of phases:
    1. scanning heap                (*1)    5. swapping relation files      (*2)    6. rebuilding index
(*2)   7. performing final cleanup     (*2)
 

(*1): increasing the value in heap_tuples_scanned column
(*2): only shows the phase in the phase column

The view provides the information of CLUSTER command progress details as follows
# \d pg_stat_progress_cluster              View "pg_catalog.pg_stat_progress_cluster"          Column           |  Type
 | Collation | Nullable | Default
 
---------------------------+---------+-----------+----------+--------- pid                       | integer |
|         | datid                     | oid     |           |          | datname                   | name    |
|          | relid                     | oid     |           |          | command                   | text    |
 |          | phase                     | text    |           |          | scan_method               | text    |
  |          | scan_index_relid          | bigint  |           |          | heap_tuples_total         | bigint  |
   |          | heap_tuples_scanned       | bigint  |           |          | heap_tuples_vacuumed      | bigint  |
    |          | heap_tuples_recently_dead | bigint  |           |          |
 


=== It will be changed on next patch ===
 - Rename to pg_stat_progress_reolg from pg_stat_progress_cluster - Remove heap_tuples_total column from the view - Add
aprogress counter in the phase of "sorting tuples" (difficult?!)
 


=== My test case as a bonus ===

I share my test case of progress monitor.
If someone wants to watch the current progress monitor, you can use
this test case as a example.

[Terminal1]
Run this query on psql:
   select * from pg_stat_progress_cluster; \watch 0.05

[Terminal2]
Run these queries on psql:

drop table t1;

create table t1 as select a, random() * 1000 as b from generate_series(0, 99999999) a;
create index idx_t1 on t1(a);
create index idx_t1_b on t1(b);
analyze t1;

-- index scan
set enable_seqscan to off;
cluster verbose t1 using idx_t1;

-- seq scan
set enable_seqscan to on;
set enable_indexscan to off;
cluster verbose t1 using idx_t1;

-- only given table name to cluster command
cluster verbose t1;

-- only cluster command
cluster verbose;

-- vacuum full
vacuum full t1;

-- vacuum full
vacuum full;


Thanks,
Tatsuro Yamada



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CLUSTER command progress monitor

From

Jeff Janes

Date:

15 September 2017, 21:09:13

On Wed, Aug 30, 2017 at 7:12 PM, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote:

The view provides the information of CLUSTER command progress details as follows
postgres=# \d pg_stat_progress_cluster
View "pg_catalog.pg_stat_progress_cluster"
Column | Type | Collation | Nullable | Default
---------------------+---------+-----------+----------+---------
pid | integer | | |
datid | oid | | |
datname | name | | |
relid | oid | | |
phase | text | | |
scan_method | text | | |
scan_index_relid | bigint | | |
heap_tuples_total | bigint | | |
heap_tuples_scanned | bigint | | |

I think it should be cluster_index_relid, not scan_index_relid. If the scan_method is seq, then the index isn't being scanned.

Cheers,

Jeff

Re: [HACKERS] CLUSTER command progress monitor

From

Daniel Gustafsson

Date:

02 October 2017, 03:29:12

> On 12 Sep 2017, at 14:57, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote:
>
> On 2017/09/12 21:20, Tatsuro Yamada wrote:
>> On 2017/09/11 23:38, Robert Haas wrote:
>>> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
>>> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>>>> Thanks for the comment.
>>>>
>>>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by
>>>> cost estimation. In the case of SEQ SCAN, these two phases not overlap.
>>>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
>>>> heap and write new heap" when INDEX SCAN was selected.
>>>>
>>>> I agree that progress reporting for sort is difficult. So it only reports
>>>> the phase ("sorting tuples") in the current design of progress monitor of
>>>> cluster.
>>>> It doesn't report counter of sort.
>>>
>>> Doesn't that make it almost useless?  I would guess that scanning the
>>> heap and writing the new heap would ordinarily account for most of the
>>> runtime, or at least enough that you're going to want something more
>>> than just knowing that's the phase you're in.
>>
>> Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort())
>> I know that external merge sort takes a time than quick sort.
>> I'll try investigating how to get a counter from external merge sort processing.
>> Is this the right way?
>>
>>
>>>>> The patch is getting the value reported as heap_tuples_total from
>>>>> OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
>>>>> see that value anyway if they wish.  The point of the progress
>>>>> counters is to expose things the user couldn't otherwise see.  It's
>>>>> also not necessarily accurate: it's only an estimate in the best case,
>>>>> and may be way off if the relation has recently be extended by a large
>>>>> amount.  I think it's pretty important that we try hard to only report
>>>>> values that are known to be accurate, because users hate (and mock)
>>>>> inaccurate progress reports.
>>>>
>>>> Do you mean to use the number of rows by using below calculation instead
>>>> OldHeap->rd_rel->reltuples?
>>>>
>>>>  estimate rows = physical table size / average row length
>>>
>>> No, I mean don't report it at all.  The caller can do that calculation
>>> if they wish, without any help from the progress reporting machinery.
>>
>> I see. I'll remove that column on next patch.
>
>
> I will summarize the current design and future corrections before sending
> the next patch.
>
>
> === Current design ===
>
> CLUSTER command may use Index Scan or Seq Scan when scanning the heap.
> Depending on which one is chosen, the command will proceed in the
> following sequence of phases:
>
>  * Scan method: Seq Scan
>    1. scanning heap                (*1)
>    2. sorting tuples               (*2)
>    3. writing new heap             (*1)
>    5. swapping relation files      (*2)
>    6. rebuilding index             (*2)
>    7. performing final cleanup     (*2)
>
>  * Scan method: Index Scan
>    4. scan heap and write new heap (*1)
>    5. swapping relation files      (*2)
>    6. rebuilding index             (*2)
>    7. performing final cleanup     (*2)
>
> VACUUM FULL command will proceed in the following sequence of phases:
>
>    1. scanning heap                (*1)
>    5. swapping relation files      (*2)
>    6. rebuilding index             (*2)
>    7. performing final cleanup     (*2)
>
> (*1): increasing the value in heap_tuples_scanned column
> (*2): only shows the phase in the phase column
>
> The view provides the information of CLUSTER command progress details as follows
> # \d pg_stat_progress_cluster
>              View "pg_catalog.pg_stat_progress_cluster"
>          Column           |  Type   | Collation | Nullable | Default
> ---------------------------+---------+-----------+----------+---------
> pid                       | integer |           |          |
> datid                     | oid     |           |          |
> datname                   | name    |           |          |
> relid                     | oid     |           |          |
> command                   | text    |           |          |
> phase                     | text    |           |          |
> scan_method               | text    |           |          |
> scan_index_relid          | bigint  |           |          |
> heap_tuples_total         | bigint  |           |          |
> heap_tuples_scanned       | bigint  |           |          |
> heap_tuples_vacuumed      | bigint  |           |          |
> heap_tuples_recently_dead | bigint  |           |          |
>
>
> === It will be changed on next patch ===
>
> - Rename to pg_stat_progress_reolg from pg_stat_progress_cluster
> - Remove heap_tuples_total column from the view
> - Add a progress counter in the phase of "sorting tuples" (difficult?!)
>
>
> === My test case as a bonus ===
>
> I share my test case of progress monitor.
> If someone wants to watch the current progress monitor, you can use
> this test case as a example.
>
> [Terminal1]
> Run this query on psql:
>
>   select * from pg_stat_progress_cluster; \watch 0.05
>
> [Terminal2]
> Run these queries on psql:
>
> drop table t1;
>
> create table t1 as select a, random() * 1000 as b from generate_series(0, 99999999) a;
> create index idx_t1 on t1(a);
> create index idx_t1_b on t1(b);
> analyze t1;
>
> -- index scan
> set enable_seqscan to off;
> cluster verbose t1 using idx_t1;
>
> -- seq scan
> set enable_seqscan to on;
> set enable_indexscan to off;
> cluster verbose t1 using idx_t1;
>
> -- only given table name to cluster command
> cluster verbose t1;
>
> -- only cluster command
> cluster verbose;
>
> -- vacuum full
> vacuum full t1;
>
> -- vacuum full
> vacuum full;

Based on this thread, this patch has been marked Returned with Feedback.
Please re-submit a new version to a future commitfest.

cheers ./daniel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

02 October 2017, 16:04:30

On Tue, Sep 12, 2017 at 8:20 AM, Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
>>> I agree that progress reporting for sort is difficult. So it only reports
>>> the phase ("sorting tuples") in the current design of progress monitor of
>>> cluster.
>>> It doesn't report counter of sort.
>>
>> Doesn't that make it almost useless?  I would guess that scanning the
>> heap and writing the new heap would ordinarily account for most of the
>> runtime, or at least enough that you're going to want something more
>> than just knowing that's the phase you're in.
>
> Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort())
> I know that external merge sort takes a time than quick sort.
> I'll try investigating how to get a counter from external merge sort
> processing.
> Is this the right way?

Progress reporting on sorts seems like a tricky problem to me, as I
said before.  In most cases, a sort is going to involve an initial
stage where it reads all the input tuples and writes out quicksorted
runs, and then a merge phase where it merges all the output tapes into
a sorted result.  There are some complexities; for example, if the
number of tapes is really large, then we might need multiple merge
phases, only the last of which will produce tuples.  On the other
hand, if work_mem is very large, the time taken for sorting each run
might itself be significant that we'd like to have insight into
progress.  If we ignore those complexities, though, a reasonable way
of reporting progress might be to report the following:

1. blocks read from the relation
2. # of tuples we've put into the tuplesort
3. # of tuples we've extracted from the tuplesort

During the first part of the sort, (1) and (2) will be growing, and
the user can measure progress by comparing (1) to the total size of
the relation.  During the final merge, (3) will be growing, eventually
becoming equal to (2), so the user can measure progress my comparing
(2) with (3).

This approach only works for a seqscan-and-sort, though.  I'm not sure
what to do about the index scan case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CLUSTER command progress monitor

From

Antonin Houska

Date:

20 November 2017, 20:05:25

Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
> >   1. scanning heap
> >   2. sort tuples
>
> These two phases overlap, though. I believe progress reporting for
> sorts is really hard.  In the simple case where the data fits in
> work_mem, none of the work of the sort gets done until all the data is
> read.  Once you switch to an external sort, you're writing batch
> files, so a lot of the work is now being done during data loading.
> But as the number of batch files grows, the final merge at the end
> becomes an increasingly noticeable part of the cost, and eventually
> you end up needing multiple merge passes.  I think we need some smart
> way to report on sorts so that we can tell how much of the work has
> really been done, but I don't know how to do it.

Whatever complexity is hidden in the sort, cost_sort() should have taken it
into consideration when called via plan_cluster_use_sort(). Thus I think that
once we have both startup and total cost, the current progress of the sort
stage can be estimated from the current number of input and output
rows. Please remind me if my proposal appears to be too simplistic.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

Re: [HACKERS] CLUSTER command progress monitor

From

Tom Lane

Date:

20 November 2017, 20:25:53

Antonin Houska <ah@cybertec.at> writes:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> These two phases overlap, though. I believe progress reporting for
>> sorts is really hard.

> Whatever complexity is hidden in the sort, cost_sort() should have taken it
> into consideration when called via plan_cluster_use_sort(). Thus I think that
> once we have both startup and total cost, the current progress of the sort
> stage can be estimated from the current number of input and output
> rows. Please remind me if my proposal appears to be too simplistic.

Well, even if you assume that the planner's cost model omits nothing
(which I wouldn't bet on), its result is only going to be as good as the
planner's estimate of the number of rows to be sorted.  And, in cases
where people actually care about progress monitoring, it's likely that
the planner got that wrong, maybe horribly so.  I think it's a bad idea
for progress monitoring to depend on the planner's estimates in any way
whatsoever.
        regards, tom lane

Re: [HACKERS] CLUSTER command progress monitor

From

Antonin Houska

Date:

21 November 2017, 12:14:11

Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Antonin Houska <ah@cybertec.at> writes:
> > Robert Haas <robertmhaas@gmail.com> wrote:
> >> These two phases overlap, though. I believe progress reporting for
> >> sorts is really hard.
>
> > Whatever complexity is hidden in the sort, cost_sort() should have taken it
> > into consideration when called via plan_cluster_use_sort(). Thus I think that
> > once we have both startup and total cost, the current progress of the sort
> > stage can be estimated from the current number of input and output
> > rows. Please remind me if my proposal appears to be too simplistic.
>
> Well, even if you assume that the planner's cost model omits nothing
> (which I wouldn't bet on), its result is only going to be as good as the
> planner's estimate of the number of rows to be sorted.  And, in cases
> where people actually care about progress monitoring, it's likely that
> the planner got that wrong, maybe horribly so.  I think it's a bad idea
> for progress monitoring to depend on the planner's estimates in any way
> whatsoever.

The general idea was that some sort of prediction of the total cost is needed
anyway if we should tell during execution what fraction of work has already
been done. And also that the cost computation that we perform during execution
shouldn't (ideally) differ from cost_sort(). So I thought that it's easier to
refine cost_sort() than to implement the same computation from scratch
elsewhere.

Besides that I see 2 circumstances that make the estimate of the number of
input tuples simpler in the CLUSTER case:

* There's only 1 input relation w/o any kind of clause.

* CLUSTER uses SnapshotAny, so pg_class(reltuples) is closer to the actual number of input rows than it would be in
generalcase. (Of course, pg_class would only be useful for the initial estimate.) 

Unlike planner, the executor could recalculate the cost estimate at some
point(s) as it recognizes that the actual number of tuples per page appears to
differ from the density derived from pg_class initially. Still wrong?

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

21 November 2017, 23:55:23

On Mon, Nov 20, 2017 at 12:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Antonin Houska <ah@cybertec.at> writes:
>> Robert Haas <robertmhaas@gmail.com> wrote:
>>> These two phases overlap, though. I believe progress reporting for
>>> sorts is really hard.
>
>> Whatever complexity is hidden in the sort, cost_sort() should have taken it
>> into consideration when called via plan_cluster_use_sort(). Thus I think that
>> once we have both startup and total cost, the current progress of the sort
>> stage can be estimated from the current number of input and output
>> rows. Please remind me if my proposal appears to be too simplistic.
>
> Well, even if you assume that the planner's cost model omits nothing
> (which I wouldn't bet on), its result is only going to be as good as the
> planner's estimate of the number of rows to be sorted.  And, in cases
> where people actually care about progress monitoring, it's likely that
> the planner got that wrong, maybe horribly so.  I think it's a bad idea
> for progress monitoring to depend on the planner's estimates in any way
> whatsoever.

I agree.

I have been of the opinion all along that progress monitoring needs to
report facts, not theories.  The number of tuples read thus far is a
fact, and is fine to report for whatever value it may have to someone.
The number of tuples that will be read in the future is a theory, and
as you say, progress monitoring is most likely to be used in cases
where theory and practice ended up being very different.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

21 November 2017, 23:57:19

On Mon, Nov 20, 2017 at 12:05 PM, Antonin Houska <ah@cybertec.at> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
>> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> >   1. scanning heap
>> >   2. sort tuples
>>
>> These two phases overlap, though. I believe progress reporting for
>> sorts is really hard.  In the simple case where the data fits in
>> work_mem, none of the work of the sort gets done until all the data is
>> read.  Once you switch to an external sort, you're writing batch
>> files, so a lot of the work is now being done during data loading.
>> But as the number of batch files grows, the final merge at the end
>> becomes an increasingly noticeable part of the cost, and eventually
>> you end up needing multiple merge passes.  I think we need some smart
>> way to report on sorts so that we can tell how much of the work has
>> really been done, but I don't know how to do it.
>
> Whatever complexity is hidden in the sort, cost_sort() should have taken it
> into consideration when called via plan_cluster_use_sort(). Thus I think that
> once we have both startup and total cost, the current progress of the sort
> stage can be estimated from the current number of input and output
> rows. Please remind me if my proposal appears to be too simplistic.

I think it is far too simplistic.  If the sort is being fed by a
sequential scan, reporting the number of blocks scanned so far as
compared to the total number that will be scanned would be a fine way
of reporting on the progress of the sequential scan -- and it's better
to use blocks, which we know for sure about, than rows, at which we
can only guess.  But that's the *scan* progress, not the *sort*
progress.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

From

Peter Geoghegan

Date:

22 November 2017, 00:07:45

On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Progress reporting on sorts seems like a tricky problem to me, as I
> said before.  In most cases, a sort is going to involve an initial
> stage where it reads all the input tuples and writes out quicksorted
> runs, and then a merge phase where it merges all the output tapes into
> a sorted result.  There are some complexities; for example, if the
> number of tapes is really large, then we might need multiple merge
> phases, only the last of which will produce tuples.

This would ordinarily be the point at which I'd say "but you're very
unlikely to require multiple passes for an external sort these days".
But I won't say that on this thread, because CLUSTER generally has
unusually wide tuples, and so is much more likely to be I/O bound, to
require multiple passes, etc. (I bet the v10 enhancements
disproportionately improved CLUSTER performance.)

-- 
Peter Geoghegan

Re: [HACKERS] CLUSTER command progress monitor

From

Peter Geoghegan

Date:

22 November 2017, 00:19:51

On Tue, Nov 21, 2017 at 12:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I agree.
>
> I have been of the opinion all along that progress monitoring needs to
> report facts, not theories.  The number of tuples read thus far is a
> fact, and is fine to report for whatever value it may have to someone.

That makes a lot of sense to me. I sometimes think that we're too
hesitant to expose internal information due to concerns about it being
hard to interpret. I see wait events as bucking this trend, which I
welcome. We see similar trends in the Linux kernel, with tools like
perf and BCC/eBPF now being regularly used to debug production issues.

> The number of tuples that will be read in the future is a theory, and
> as you say, progress monitoring is most likely to be used in cases
> where theory and practice ended up being very different.

You hit the nail on the head here.

It's not that these things are not difficult to interpret - the
concern itself is justified. It just needs to be weighed against the
benefit of having some instrumentation to start with. People are much
more likely to complain about obscure debug information, which makes
them feel dumb, than they are to complain about the absence of any
instrumentation, but I still think that the latter is the bigger
problem.

Besides, you don't necessarily have to understand something to act on
it. The internals of Oracle are trade secrets, but they were the first
to have wait events, I think. At least having something that you can
Google can make all the difference.

-- 
Peter Geoghegan

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

22 November 2017, 03:53:31

On Wed, Nov 22, 2017 at 5:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I have been of the opinion all along that progress monitoring needs to
> report facts, not theories.  The number of tuples read thus far is a
> fact, and is fine to report for whatever value it may have to someone.
> The number of tuples that will be read in the future is a theory, and
> as you say, progress monitoring is most likely to be used in cases
> where theory and practice ended up being very different.

+1. We should never as well enter in things like trying to estimate
the amount of time remaining to finish a task [1].

[1]: https://www.xkcd.com/612/
-- 
Michael

Re: [HACKERS] CLUSTER command progress monitor

From

Thomas Munro

Date:

22 November 2017, 03:58:03

On Wed, Nov 22, 2017 at 1:53 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Nov 22, 2017 at 5:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I have been of the opinion all along that progress monitoring needs to
>> report facts, not theories.  The number of tuples read thus far is a
>> fact, and is fine to report for whatever value it may have to someone.
>> The number of tuples that will be read in the future is a theory, and
>> as you say, progress monitoring is most likely to be used in cases
>> where theory and practice ended up being very different.
>
> +1. We should never as well enter in things like trying to estimate
> the amount of time remaining to finish a task [1].
>
> [1]: https://www.xkcd.com/612/

+1

That is one reason I made pg_stat_replication.XXX_lag report the lag
of WAL that has been processed, not (say) the time until we catch up.
In some information-poor scenarios it interpolates which isn't perfect
but the general idea is that is shows you measurements of the past
(facts), not predictions about the future (theories).

-- 
Thomas Munro
http://www.enterprisedb.com

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

24 August 2018, 08:04:15

On 2017/11/22 6:07, Peter Geoghegan wrote:
> On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Progress reporting on sorts seems like a tricky problem to me, as I
>> said before.  In most cases, a sort is going to involve an initial
>> stage where it reads all the input tuples and writes out quicksorted
>> runs, and then a merge phase where it merges all the output tapes into
>> a sorted result.  There are some complexities; for example, if the
>> number of tapes is really large, then we might need multiple merge
>> phases, only the last of which will produce tuples.
> 
> This would ordinarily be the point at which I'd say "but you're very
> unlikely to require multiple passes for an external sort these days".
> But I won't say that on this thread, because CLUSTER generally has
> unusually wide tuples, and so is much more likely to be I/O bound, to
> require multiple passes, etc. (I bet the v10 enhancements
> disproportionately improved CLUSTER performance.)

Hi,

I came back to develop the feature for community.
V4 patch is corrected these following points:

   - Rebase on master (143290efd)
   - Fix document
   - Replace the column name scan_index_relid with cluster_index_relid.
       Thanks to Jeff Janes!

I'm now working on improving the patch based on Robert's comment related to
"Seqscan and Sort case" and also considering how to handle the "Index scan case".

Please find attached file.

Regards,
Tatsuro Yamada

Attachment

progress_monitor_for_cluster_command_v4.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Dmitry Dolgov

Date:

29 November 2018, 12:20:28

> On Fri, Aug 24, 2018 at 7:06 AM Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote:
>
> On 2017/11/22 6:07, Peter Geoghegan wrote:
> > On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> Progress reporting on sorts seems like a tricky problem to me, as I
> >> said before.  In most cases, a sort is going to involve an initial
> >> stage where it reads all the input tuples and writes out quicksorted
> >> runs, and then a merge phase where it merges all the output tapes into
> >> a sorted result.  There are some complexities; for example, if the
> >> number of tapes is really large, then we might need multiple merge
> >> phases, only the last of which will produce tuples.
> >
> > This would ordinarily be the point at which I'd say "but you're very
> > unlikely to require multiple passes for an external sort these days".
> > But I won't say that on this thread, because CLUSTER generally has
> > unusually wide tuples, and so is much more likely to be I/O bound, to
> > require multiple passes, etc. (I bet the v10 enhancements
> > disproportionately improved CLUSTER performance.)
>
> Hi,
>
> I came back to develop the feature for community.
> V4 patch is corrected these following points:
>
>    - Rebase on master (143290efd)
>    - Fix document
>    - Replace the column name scan_index_relid with cluster_index_relid.
>        Thanks to Jeff Janes!
>
> I'm now working on improving the patch based on Robert's comment related to
> "Seqscan and Sort case" and also considering how to handle the "Index scan case".

Thank you,

Unfortunately, this patch has some conflicts now, could you rebase it? Also
what's is the status of your work on improving it based on the
provided feedback?

In the meantime I'm moving it to the next CF.

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

03 December 2018, 09:22:20

On 2018/11/29 21:20, Dmitry Dolgov wrote:
>> On Fri, Aug 24, 2018 at 7:06 AM Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote:
>>
>> On 2017/11/22 6:07, Peter Geoghegan wrote:
>>> On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> Progress reporting on sorts seems like a tricky problem to me, as I
>>>> said before.  In most cases, a sort is going to involve an initial
>>>> stage where it reads all the input tuples and writes out quicksorted
>>>> runs, and then a merge phase where it merges all the output tapes into
>>>> a sorted result.  There are some complexities; for example, if the
>>>> number of tapes is really large, then we might need multiple merge
>>>> phases, only the last of which will produce tuples.
>>>
>>> This would ordinarily be the point at which I'd say "but you're very
>>> unlikely to require multiple passes for an external sort these days".
>>> But I won't say that on this thread, because CLUSTER generally has
>>> unusually wide tuples, and so is much more likely to be I/O bound, to
>>> require multiple passes, etc. (I bet the v10 enhancements
>>> disproportionately improved CLUSTER performance.)
>>
>> Hi,
>>
>> I came back to develop the feature for community.
>> V4 patch is corrected these following points:
>>
>>     - Rebase on master (143290efd)
>>     - Fix document
>>     - Replace the column name scan_index_relid with cluster_index_relid.
>>         Thanks to Jeff Janes!
>>
>> I'm now working on improving the patch based on Robert's comment related to
>> "Seqscan and Sort case" and also considering how to handle the "Index scan case".
> 
> Thank you,
> 
> Unfortunately, this patch has some conflicts now, could you rebase it? Also
> what's is the status of your work on improving it based on the
> provided feedback?
> 
> In the meantime I'm moving it to the next CF.

Thank you for managing the CF and Sorry for the late reply.
I'll rebase it for the next CF and also I'll clear my head because the patch
needs design change to address the feedbacks, I guess. Therefore, the status is
reconsidering the design of the patch. :)

Regards,
Tatsuro Yamada
NTT Open Source Software Center

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera

Date:

03 December 2018, 17:17:25

On 2018-Dec-03, Tatsuro Yamada wrote:

> > In the meantime I'm moving it to the next CF.
> 
> Thank you for managing the CF and Sorry for the late reply.
> I'll rebase it for the next CF and also I'll clear my head because the patch
> needs design change to address the feedbacks, I guess. Therefore, the status is
> reconsidering the design of the patch. :)

I think we should mark it as Returned with Feedback then.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

04 December 2018, 02:52:20

On Mon, Dec 03, 2018 at 02:17:25PM -0300, Alvaro Herrera wrote:
> I think we should mark it as Returned with Feedback then.

+1.
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera

Date:

18 December 2018, 17:23:11

Hello Yamada-san,

On 2018-Dec-03, Tatsuro Yamada wrote:

> Thank you for managing the CF and Sorry for the late reply.
> I'll rebase it for the next CF and also I'll clear my head because the patch
> needs design change to address the feedbacks, I guess. Therefore, the status is
> reconsidering the design of the patch. :)

Do you have a new version of this patch?  If not, do you think you'll
have something in time for the upcoming commitfest?

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera

Date:

18 December 2018, 21:01:59

On 2017-Nov-21, Peter Geoghegan wrote:

> On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > Progress reporting on sorts seems like a tricky problem to me, as I
> > said before.  In most cases, a sort is going to involve an initial
> > stage where it reads all the input tuples and writes out quicksorted
> > runs, and then a merge phase where it merges all the output tapes into
> > a sorted result.  There are some complexities; for example, if the
> > number of tapes is really large, then we might need multiple merge
> > phases, only the last of which will produce tuples.
> 
> This would ordinarily be the point at which I'd say "but you're very
> unlikely to require multiple passes for an external sort these days".
> But I won't say that on this thread, because CLUSTER generally has
> unusually wide tuples, and so is much more likely to be I/O bound, to
> require multiple passes, etc. (I bet the v10 enhancements
> disproportionately improved CLUSTER performance.)

When the seqscan-and-sort strategy is used, we feed tuplesort with every
tuple from the scan.  Once that's completed, we call `performsort`, then
retrieve tuples.

If we see this in terms of tapes and merges, we can report the total
number of each of those that we have completed.  As far as I understand,
we write one tape to completion, and only then start another one, right?
Since there's no way to know how many tapes/merges are needed in total,
it's not possible to compute a percentage of completion.  That's seems
okay -- we're just telling the user that progress is being made, and we
only report facts not theory.  Perhaps we can (also?) indicate disk I/O
utilization, in terms of the number of blocks written by tuplesort.

I suppose that in order to have tuplesort.c report progress, we would
have to have some kind of API that tuplesort would invoke internally to
indicate events such as "tape started/completed", "merge started/completed".
One idea is to use a callback system; each tuplesort caller could
optionally pass a callback to the "begin" function, for progress
reporting purposes.  Initially only cluster.c would use it, but I
suppose eventually every tuplesort caller would want that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

From

Peter Geoghegan

Date:

18 December 2018, 22:25:12

On Tue, Dec 18, 2018 at 1:02 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> If we see this in terms of tapes and merges, we can report the total
> number of each of those that we have completed.  As far as I understand,
> we write one tape to completion, and only then start another one, right?
> Since there's no way to know how many tapes/merges are needed in total,
> it's not possible to compute a percentage of completion.  That's seems
> okay -- we're just telling the user that progress is being made, and we
> only report facts not theory.  Perhaps we can (also?) indicate disk I/O
> utilization, in terms of the number of blocks written by tuplesort.

The number of blocks tuplesort uses is constant from the end of
initial run generation, since logtape.c will recycle blocks.

> I suppose that in order to have tuplesort.c report progress, we would
> have to have some kind of API that tuplesort would invoke internally to
> indicate events such as "tape started/completed", "merge started/completed".
> One idea is to use a callback system; each tuplesort caller could
> optionally pass a callback to the "begin" function, for progress
> reporting purposes.  Initially only cluster.c would use it, but I
> suppose eventually every tuplesort caller would want that.

I think that you could have a callback that did something with the
information currently reported by trace_sort. That's not a bad way of
scoping the problem. That's how I myself monitor the progress of a
sort, and it works pretty well (whether or not that means other people
can do it is not exactly clear to me).

We predict the number of merge passes within cost_sort() already. That
doesn't seem all that hard to generalize, so that you report the
expected number of passes against the current pass. Some passes are
much quicker than others, but you generally don't have that many with
realistic cases. I don't expect that it will work very well with an
internal sort, but in the case of CLUSTER that almost seems
irrelevant. And maybe even in all cases.

I think that the user is going to have to be willing to develop some
intuition about the progress for it to be all that useful. They're
really looking for something that gives a clue if they'll have to wait
an hour, a day, or a week, which it seems like trace_sort-like
information gives you some idea of. (BTW, dtrace probes can already
give the user much the same information -- I think that more people
should use those, since tracing technology on Linux has improved
drastically in the last few years.)

-- 
Peter Geoghegan

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera

Date:

18 December 2018, 22:47:36

On 2018-Dec-18, Peter Geoghegan wrote:

> On Tue, Dec 18, 2018 at 1:02 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > If we see this in terms of tapes and merges, we can report the total
> > number of each of those that we have completed.  As far as I understand,
> > we write one tape to completion, and only then start another one, right?
> > Since there's no way to know how many tapes/merges are needed in total,
> > it's not possible to compute a percentage of completion.  That's seems
> > okay -- we're just telling the user that progress is being made, and we
> > only report facts not theory.  Perhaps we can (also?) indicate disk I/O
> > utilization, in terms of the number of blocks written by tuplesort.
> 
> The number of blocks tuplesort uses is constant from the end of
> initial run generation, since logtape.c will recycle blocks.

Well, if you think about individual blocks in terms of storage space,
maybe that's true, but I meant in an Heraclitus way of men never
stepping into the same river -- the second time you write the block,
it's not the same block you wrote before, so you count it twice.  It's
not the actual disk space utilization that matters, but how much I/O
have you done (even if it is just to kernel cache, I suppose).

> > I suppose that in order to have tuplesort.c report progress, we would
> > have to have some kind of API that tuplesort would invoke internally to
> > indicate events such as "tape started/completed", "merge started/completed".
> > One idea is to use a callback system; each tuplesort caller could
> > optionally pass a callback to the "begin" function, for progress
> > reporting purposes.  Initially only cluster.c would use it, but I
> > suppose eventually every tuplesort caller would want that.
> 
> I think that you could have a callback that did something with the
> information currently reported by trace_sort. That's not a bad way of
> scoping the problem. That's how I myself monitor the progress of a
> sort, and it works pretty well (whether or not that means other people
> can do it is not exactly clear to me).

Thanks, that looks useful.

I suppose mapping such numbers to actual progress is a bit of an art (or
intuition as you say), but it seems to be the best we can do, if we do
anything at all.

> We predict the number of merge passes within cost_sort() already. That
> doesn't seem all that hard to generalize, so that you report the
> expected number of passes against the current pass. Some passes are
> much quicker than others, but you generally don't have that many with
> realistic cases. I don't expect that it will work very well with an
> internal sort, but in the case of CLUSTER that almost seems
> irrelevant. And maybe even in all cases.

How good are those predictions?  The feeling I get from this thread is
that if the estimation of the number of passes is unreliable, it's
better not to report it at all; just return how many we've done thus
far.  It's undesirable to report that we're about 150% done (or take
hours to get to 40% done, then suddenly be over).

I wonder if internal sorts are really all that interesting from the PoV
of progress reporting.  Also, I have the impression that quicksort isn't
very amenable to letting you know how much work is left.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

From

Peter Geoghegan

Date:

18 December 2018, 23:25:36

On Tue, Dec 18, 2018 at 2:47 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Well, if you think about individual blocks in terms of storage space,
> maybe that's true, but I meant in an Heraclitus way of men never
> stepping into the same river -- the second time you write the block,
> it's not the same block you wrote before, so you count it twice.  It's
> not the actual disk space utilization that matters, but how much I/O
> have you done (even if it is just to kernel cache, I suppose).

Right.

> I suppose mapping such numbers to actual progress is a bit of an art (or
> intuition as you say), but it seems to be the best we can do, if we do
> anything at all.

I think that it's fairly useful. I suspect that you don't have to have
my theoretical grounding in sorting to be able to do almost as well.
All you need is a little bit of experience.

> How good are those predictions?  The feeling I get from this thread is
> that if the estimation of the number of passes is unreliable, it's
> better not to report it at all; just return how many we've done thus
> far.  It's undesirable to report that we're about 150% done (or take
> hours to get to 40% done, then suddenly be over).

Maybe it isn't that reliable. But on second thought I think that it
might not matter, and maybe we should just not do that.

"How slow can I make this sort go by subtracting work_mem?" is a game
that I like to play sometimes. This blogpost plays that game, and
reaches some pretty amusing conclusions:

https://www.cybertec-postgresql.com/en/postgresql-improving-sort-performance/

It says that sorting numeric is 60% slower when you do an external
sort. But it's an apples to asteroids comparison, because the
comparison is made between 4MB of work_mem, and 1GB. I think that it's
pretty damn impressive that it's only 60% slower! Besides, even that
difference is probably on the high side of average, because numeric
abbreviated keys work particularly well, and you won't get the same
benefit with a unique numeric values when you happen to be doing a lot
of merging. If you tried the same experiment with integers, or even
text + abbreviated keys, I bet the difference would be a lot smaller.
Despite the huge gap in the amount of memory used.

On modern hardware, where doing some amount of random I/O is not that
noticeable, you'll have a very hard time finding a case where even a
paltry amount of memory with many passes does all that much worse than
an internal sort (OTOH, it's not hard to find cases where an external
sort is *faster*). Even if you make a generic estimate, it's still
probably going to be pretty good, because there just isn't that much
variation in how long the sort will take as you vary the amount of
memory it can use. Some people will be surprised at this, but it's a
pretty robust effect. (This is why I think that a hash_mem GUC might
be a good medium term solution that improves upon work_mem -- the
situation is dramatically different when it comes to hashing.)

My point is that you could offer users the kind of insight they'd find
very useful with only a very crude estimate of the amount of merging.
Even if it was 60% slower than initially projected, that's still not
an awful estimate to most users. That just leaves initial run
generation, but it's relatively easy to accurately estimate the amount
of initial runs. I rarely see a case where merging takes more than 40%
of the total, barring parallel CREATE INDEX.

> I wonder if internal sorts are really all that interesting from the PoV
> of progress reporting.  Also, I have the impression that quicksort isn't
> very amenable to letting you know how much work is left.

It is hard to predict the duration of one massive quicksort, but it's
seems fairly easy to recognize a kind of cadence across multiple
quicksorts/runs that each take seconds to a couple of minutes. That's
going to be the vast, vast majority of cases we care about.

-- 
Peter Geoghegan

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

19 December 2018, 06:48:27

Hi Alvaro,

On 2018/12/19 2:23, Alvaro Herrera wrote:
> Hello Yamada-san,
> 
> On 2018-Dec-03, Tatsuro Yamada wrote:
> 
>> Thank you for managing the CF and Sorry for the late reply.
>> I'll rebase it for the next CF and also I'll clear my head because the patch
>> needs design change to address the feedbacks, I guess. Therefore, the status is
>> reconsidering the design of the patch. :)
> 
> Do you have a new version of this patch?  If not, do you think you'll
> have something in time for the upcoming commitfest?

Not yet, I'll be able to send only a rebased patch by the end of this month.
I mean it has no design change because I can't catch up on how to get a progress
from sort and index scan. However I'm going to register the patch on next CF.
I'm happy if you have interested in the patch. :)

Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

28 December 2018, 08:17:06

Hi,

>> Do you have a new version of this patch?  If not, do you think you'll
>> have something in time for the upcoming commitfest?
> 
> Not yet, I'll be able to send only a rebased patch by the end of this month.
> I mean it has no design change because I can't catch up on how to get a progress
> from sort and index scan. However I'm going to register the patch on next CF.
> I'm happy if you have interested in the patch.


This patch is rebased on HEAD.
I'll tackle revising the patch based on feedbacks next month.

Happy holidays!
Tatsuro Yamada

Attachment

progress_monitor_for_cluster_command_v5.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

22 February 2019, 21:02:34

On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
> This patch is rebased on HEAD.
> I'll tackle revising the patch based on feedbacks next month.

+   Running <command>VACUUM FULL</command> is listed in
<structname>pg_stat_progress_cluster</structname>
+   view because it uses <command>CLUSTER</command> command
internally.  See <xref linkend='cluster-progress-reporting'>.

It's not really true to say that VACUUM FULL uses the CLUSTER command
internally.  It's not really true.  It uses a good chunk of the same
infrastructure, but it certainly doesn't use the actual command, and
it's not really the exact same thing either, because internally it's
doing a sequential scan but no sort, which never happens with CLUSTER.
I'm not sure exactly how to rephrase this, but I think we need to make
it more precise.

One idea is that maybe we should try to think of a design that could
also handle the rewriting variants of ALTER TABLE, and call it
pg_stat_progress_rewrite.  Maybe that's moving the goalposts too far,
but I'm not saying we'd necessarily have to do all the work now, just
have a view that we think could also handle that.  Then again, maybe
the needs are too different.

+   Whenever <command>CLUSTER</command> is running, the
+   <structname>pg_stat_progress_cluster</structname> view will contain
+   one row for each backend that is currently clustering or vacuuming
(VACUUM FULL).

That sentence contradicts itself.  Just say that it contains a row for
each backend that is currently running CLUSTER or VACUUM FULL.

@@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple,
 void
 cluster(ClusterStmt *stmt, bool isTopLevel)
 {
+
  if (stmt->relation != NULL)
  {
  /* This is the single-relation case. */

Useless hunk.

@@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
  heap_close(rel, NoLock);

  /* Do the job. */
+ pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
  cluster_rel(tableOid, indexOid, stmt->options);
+ pgstat_progress_end_command();
  }
  else
  {

It seems like that stuff should be inside cluster_rel().

+ /* Set reltuples to total_tuples */
+ pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES,
OldHeap->rd_rel->reltuples);

I object.  If the user wants that, they can get it from pg_class
themselves via an SQL query.  It's also an estimate, not something we
know to be accurate; I want us to only report facts here, not theories

+ pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP);
+ pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD,
PROGRESS_CLUSTER_METHOD_INDEX_SCAN);

I think you should use pgstat_progress_update_multi_param() if
updating multiple parameters at the same time.

Also, some lines in this patch, such as this one, are very long.
Consider techniques to reduce the line length to 80 characters or
less, such as inserting a line break between the two arguments.

+ /* Set scan_method to NULL */
+ pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1);

NULL and -1 are not the same thing.

I think that we shouldn't have both
PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and
PROGRESS_CLUSTER_PHASE_SCAN_HEAP.  They're the same thing.  Let's just
use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both.  Actually, better yet,
let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and
PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP.  That seems noticeably
simpler.

I agree that it's acceptable to report
PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and
PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I
understand why it's valuable to do so in the context of a progress
indicator.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

01 March 2019, 04:53:55

On 2019/02/23 6:02, Robert Haas wrote:
> On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> This patch is rebased on HEAD.
>> I'll tackle revising the patch based on feedbacks next month.
> 
> +   Running <command>VACUUM FULL</command> is listed in
> <structname>pg_stat_progress_cluster</structname>
> +   view because it uses <command>CLUSTER</command> command
> internally.  See <xref linkend='cluster-progress-reporting'>.
> 
> It's not really true to say that VACUUM FULL uses the CLUSTER command
> internally.  It's not really true.  It uses a good chunk of the same
> infrastructure, but it certainly doesn't use the actual command, and
> it's not really the exact same thing either, because internally it's
> doing a sequential scan but no sort, which never happens with CLUSTER.
> I'm not sure exactly how to rephrase this, but I think we need to make
> it more precise.
> 
> One idea is that maybe we should try to think of a design that could
> also handle the rewriting variants of ALTER TABLE, and call it
> pg_stat_progress_rewrite.  Maybe that's moving the goalposts too far,
> but I'm not saying we'd necessarily have to do all the work now, just
> have a view that we think could also handle that.  Then again, maybe
> the needs are too different.
> 

Hmm..., I see.
If possible, I'd like to stop thinking of VACUUM FULL to avoid complication of
the implementation.
For now, I haven't enough time to design pg_stat_progress_rewrite. I suppose that
it's tough work.


> +   Whenever <command>CLUSTER</command> is running, the
> +   <structname>pg_stat_progress_cluster</structname> view will contain
> +   one row for each backend that is currently clustering or vacuuming
> (VACUUM FULL).
> 
> That sentence contradicts itself.  Just say that it contains a row for
> each backend that is currently running CLUSTER or VACUUM FULL.
> 

Fixed.


> @@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple,
>   void
>   cluster(ClusterStmt *stmt, bool isTopLevel)
>   {
> +
>    if (stmt->relation != NULL)
>    {
>    /* This is the single-relation case. */
> 
> Useless hunk.
> 

Fixed.


> @@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
>    heap_close(rel, NoLock);
> 
>    /* Do the job. */
> + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
>    cluster_rel(tableOid, indexOid, stmt->options);
> + pgstat_progress_end_command();
>    }
>    else
>    {
> 
> It seems like that stuff should be inside cluster_rel().
>

Fixed.


> + /* Set reltuples to total_tuples */
> + pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES,
> OldHeap->rd_rel->reltuples);
> 
> I object.  If the user wants that, they can get it from pg_class
> themselves via an SQL query.  It's also an estimate, not something we
> know to be accurate; I want us to only report facts here, not theories

I understand that progress monitor should only report facts, so I
removed that code.



> + pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
> PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP);
> + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD,
> PROGRESS_CLUSTER_METHOD_INDEX_SCAN);
> 
> I think you should use pgstat_progress_update_multi_param() if
> updating multiple parameters at the same time.
> 
> Also, some lines in this patch, such as this one, are very long.
> Consider techniques to reduce the line length to 80 characters or
> less, such as inserting a line break between the two arguments.

Fixed.


> + /* Set scan_method to NULL */
> + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1);
> 
> NULL and -1 are not the same thing.

Oops, fixed.


> I think that we shouldn't have both
> PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and
> PROGRESS_CLUSTER_PHASE_SCAN_HEAP.  They're the same thing.  Let's just
> use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both.  Actually, better yet,
> let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have
> PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and
> PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP.  That seems noticeably
> simpler.

Fixed.


> I agree that it's acceptable to report
> PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and
> PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I
> understand why it's valuable to do so in the context of a progress
> indicator.

Actually, I'm not sure why I added it since so much time has passed. :(
So, I'll remove PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD at least.


Attached patch is wip patch.


Thanks!
Tatsuro Yamada

Attachment

progress_monitor_for_cluster_command_v6.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

01 March 2019, 05:17:51

> Attached patch is wip patch.

Is it possible to remove the following patch?
Because I registered the patch twice on CF Mar.

    https://commitfest.postgresql.org/22/2049/

Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Etsuro Fujita

Date:

01 March 2019, 05:48:42

(2019/03/01 14:17), Tatsuro Yamada wrote:
>> Attached patch is wip patch.
>
> Is it possible to remove the following patch?
> Because I registered the patch twice on CF Mar.
>
> https://commitfest.postgresql.org/22/2049/

Please remove the above and keep this:

https://commitfest.postgresql.org/22/1779/

which I moved from the January CF to the March one on behalf of him.

Best regards,
Etsuro Fujita

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

01 March 2019, 19:15:12

On Thu, Feb 28, 2019 at 11:54 PM Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
> Attached patch is wip patch.

+       <command>CLUSTER</command> and <command>VACUUM FULL</command>,
showing current progress.

and -> or

+   certain commands during command execution.  Currently, the suppoted
+   progress reporting commands are <command>VACUUM</command> and
<command>CLUSTER</command>.

suppoted -> supported

But I'd just say: Currently, the only commands which support progress
reporting are <command>VACUUM</command> and
<command>CLUSTER</command>.

+   Running <command>VACUUM FULL</command> is listed in
<structname>pg_stat_progress_cluster</structname>
+   view because it uses <command>CLUSTER</command> command
internally.  See <xref linkend='cluster-progress-reporting'>.

How about: Running <command>VACUUM FULL</command> is listed in
<structname>pg_stat_progress_cluster</structname> because both
<command>VACUUM FULL</command> and <command>CLUSTER</command> rewrite
the table, while regular <command>VACUUM</command> only modifies it in
place.

+       Current processing command: CLUSTER/VACUUM FULL.

The command that is running.  Either CLUSTER or VACUUM FULL.

+       Current processing phase of cluster/vacuum full.  See <xref
linkend='cluster-phases'> or <xref linkend='vacuum-full-phases'>.

Current processing phase of CLUSTER or VACUUM FULL.

Or maybe better, just abbreviate to: Current processing phase.

+       Scan method of table: index scan/seq scan.

Eh, shouldn't this be gone now?  And likewise for the view definition?

+       OID of the index.

If the table is being scanned using an index, this is the OID of the
index being used; otherwise, it is zero.

+     <entry><structfield>heap_tuples_total</structfield></entry>

Leftovers.  Skipping over the rest of your documentation changes since
it looks like a bunch of things there still need to be updated.

+ pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);

This now appears inside cluster_rel(), but also vacuum_rel() is still
doing the same thing.  That's wrong.

+ if(OidIsValid(indexOid))

Missing space.  Please pgindent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

04 March 2019, 10:31:21

On 2019/03/02 4:15, Robert Haas wrote:
> On Thu, Feb 28, 2019 at 11:54 PM Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> Attached patch is wip patch.


Thanks for your comments! :)
I revised the code and the document.


> +       <command>CLUSTER</command> and <command>VACUUM FULL</command>,
> showing current progress.
> 
> and -> or

Fixed.


> +   certain commands during command execution.  Currently, the suppoted
> +   progress reporting commands are <command>VACUUM</command> and
> <command>CLUSTER</command>.
> 
> suppoted -> supported
> 
> But I'd just say: Currently, the only commands which support progress
> reporting are <command>VACUUM</command> and
> <command>CLUSTER</command>.

I choose the latter. Fixed.


> +   Running <command>VACUUM FULL</command> is listed in
> <structname>pg_stat_progress_cluster</structname>
> +   view because it uses <command>CLUSTER</command> command
> internally.  See <xref linkend='cluster-progress-reporting'>.
> 
> How about: Running <command>VACUUM FULL</command> is listed in
> <structname>pg_stat_progress_cluster</structname> because both
> <command>VACUUM FULL</command> and <command>CLUSTER</command> rewrite
> the table, while regular <command>VACUUM</command> only modifies it in
> place.

Fixed.


> +       Current processing command: CLUSTER/VACUUM FULL.
> 
> The command that is running.  Either CLUSTER or VACUUM FULL.

Fixed.


> +       Current processing phase of cluster/vacuum full.  See <xref
> linkend='cluster-phases'> or <xref linkend='vacuum-full-phases'>.
> 
> Current processing phase of CLUSTER or VACUUM FULL.
> 
> Or maybe better, just abbreviate to: Current processing phase.

Fixed as you suggested.


> +       Scan method of table: index scan/seq scan.
> 
> Eh, shouldn't this be gone now?  And likewise for the view definition?

Fixed. Sorry, It was an oversight.


> +       OID of the index.
> 
> If the table is being scanned using an index, this is the OID of the
> index being used; otherwise, it is zero.

Fixed.


> +     <entry><structfield>heap_tuples_total</structfield></entry>
> 
> Leftovers.  Skipping over the rest of your documentation changes since
> it looks like a bunch of things there still need to be updated.

I agree. Thanks a lot!
I'll divide the patch into two patch such as code and document.


> + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
> 
> This now appears inside cluster_rel(), but also vacuum_rel() is still
> doing the same thing.  That's wrong.

It was an oversight too. I fixed.


> + if(OidIsValid(indexOid))
> 
> Missing space.  Please pgindent.

Fixed.
I Will do pgindent later.


Please find attached files. :)

Thanks,
Tatsuro Yamada

Attachment

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

04 March 2019, 10:37:40

On 2019/03/02 4:15, Robert Haas wrote:
> On Thu, Feb 28, 2019 at 11:54 PM Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> Attached patch is wip patch.

I rewrote the current design of the progress monitor and also
wrote discussion points in the middle of this email. I'd like to
get any feedback from -hackers.


=== Current design ===

CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
Depending on which one is chosen, the command will proceed in the
following sequence of phases:

   * Scan method: Seq Scan
     0. initializing                 (*2)
     1. seq scanning heap            (*1)
     3. sorting tuples               (*2)
     4. writing new heap             (*1)
     5. swapping relation files      (*2)
     6. rebuilding index             (*2)
     7. performing final cleanup     (*2)

   * Scan method: Index Scan
     0. initializing                 (*2)
     2. index scanning heap          (*1)
     5. swapping relation files      (*2)
     6. rebuilding index             (*2)
     7. performing final cleanup     (*2)

VACUUM FULL command will proceed in the following sequence of phases:

     1. seq scanning heap            (*1)
     5. swapping relation files      (*2)
     6. rebuilding index             (*2)
     7. performing final cleanup     (*2)

(*1): increasing the value in heap_tuples_scanned column
(*2): only shows the phase in the phase column

The view provides the information of CLUSTER command progress details as follows
# \d pg_stat_progress_cluster
               View "pg_catalog.pg_stat_progress_cluster"
           Column           |  Type   | Collation | Nullable | Default
---------------------------+---------+-----------+----------+---------
  pid                       | integer |           |          |
  datid                     | oid     |           |          |
  datname                   | name    |           |          |
  relid                     | oid     |           |          |
  command                   | text    |           |          |
  phase                     | text    |           |          |
  cluster_index_relid       | bigint  |           |          |
  heap_tuples_scanned       | bigint  |           |          |
  heap_tuples_vacuumed      | bigint  |           |          |


=== Discussion points ===

  - Progress counter for "3. sorting tuples" phase
     - Should we add pgstat_progress_update_param() in tuplesort.c like a
       "trace_sort"?
       Thanks to Peter Geoghegan for the useful advice!

  - Progress counter for "6. rebuilding index" phase
     - Should we add "index_vacuum_count" in the view like a vacuum progress monitor?
       If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c.
       However, I'm not sure whether it is okay or not.

  - pg_stat_progress_rewrite
     - TBA


=== My test case ===

I share my test case of progress monitor.
If someone wants to watch the current progress monitor, you can use
this test case as a example.

[Terminal1]
Run this query on psql:

    select * from pg_stat_progress_cluster; \watch 0.05

[Terminal2]
Run these queries on psql:

drop table t1;

create table t1 as select a, random() * 1000 as b from generate_series(0, 999999) a;
create index idx_t1 on t1(a);
create index idx_t1_b on t1(b);
analyze t1;

-- index scan
set enable_seqscan to off;
cluster verbose t1 using idx_t1;

-- seq scan
set enable_seqscan to on;
set enable_indexscan to off;
cluster verbose t1 using idx_t1;

-- only given table name to cluster command
cluster verbose t1;

-- only cluster command
cluster verbose;

-- vacuum full
vacuum full t1;

-- vacuum full
vacuum full;




Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

05 March 2019, 02:35:37

On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
> === Current design ===
>
> CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
> Depending on which one is chosen, the command will proceed in the
> following sequence of phases:
>
>    * Scan method: Seq Scan
>      0. initializing                 (*2)
>      1. seq scanning heap            (*1)
>      3. sorting tuples               (*2)
>      4. writing new heap             (*1)
>      5. swapping relation files      (*2)
>      6. rebuilding index             (*2)
>      7. performing final cleanup     (*2)
>
>    * Scan method: Index Scan
>      0. initializing                 (*2)
>      2. index scanning heap          (*1)
>      5. swapping relation files      (*2)
>      6. rebuilding index             (*2)
>      7. performing final cleanup     (*2)
>
> VACUUM FULL command will proceed in the following sequence of phases:
>
>      1. seq scanning heap            (*1)
>      5. swapping relation files      (*2)
>      6. rebuilding index             (*2)
>      7. performing final cleanup     (*2)
>
> (*1): increasing the value in heap_tuples_scanned column
> (*2): only shows the phase in the phase column

All of that sounds good.

> The view provides the information of CLUSTER command progress details as follows
> # \d pg_stat_progress_cluster
>                View "pg_catalog.pg_stat_progress_cluster"
>            Column           |  Type   | Collation | Nullable | Default
> ---------------------------+---------+-----------+----------+---------
>   pid                       | integer |           |          |
>   datid                     | oid     |           |          |
>   datname                   | name    |           |          |
>   relid                     | oid     |           |          |
>   command                   | text    |           |          |
>   phase                     | text    |           |          |
>   cluster_index_relid       | bigint  |           |          |
>   heap_tuples_scanned       | bigint  |           |          |
>   heap_tuples_vacuumed      | bigint  |           |          |

Still not sure if we need heap_tuples_vacuumed.  We could try to
report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
we're using a Seq Scan.

> === Discussion points ===
>
>   - Progress counter for "3. sorting tuples" phase
>      - Should we add pgstat_progress_update_param() in tuplesort.c like a
>        "trace_sort"?
>        Thanks to Peter Geoghegan for the useful advice!

How would we avoid an abstraction violation?

>   - Progress counter for "6. rebuilding index" phase
>      - Should we add "index_vacuum_count" in the view like a vacuum progress monitor?
>        If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c.
>        However, I'm not sure whether it is okay or not.

Doesn't seem unreasonable to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: [HACKERS] CLUSTER command progress monitor

From

David Steele

Date:

05 March 2019, 08:29:42

On 3/1/19 7:48 AM, Etsuro Fujita wrote:
> (2019/03/01 14:17), Tatsuro Yamada wrote:
>>> Attached patch is wip patch.
>>
>> Is it possible to remove the following patch?
>> Because I registered the patch twice on CF Mar.
>>
>> https://commitfest.postgresql.org/22/2049/
> 
> Please remove the above and keep this:
> 
> https://commitfest.postgresql.org/22/1779/
> 
> which I moved from the January CF to the March one on behalf of him.

I have closed the duplicate entry (#2049) and retained the entry which 
contains the CF history (#1779).

Regards,
-- 
-David
david@pgmasters.net

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

05 March 2019, 08:37:17

Hi David,

On 2019/03/05 17:29, David Steele wrote:
> On 3/1/19 7:48 AM, Etsuro Fujita wrote:
>> (2019/03/01 14:17), Tatsuro Yamada wrote:
>>>> Attached patch is wip patch.
>>>
>>> Is it possible to remove the following patch?
>>> Because I registered the patch twice on CF Mar.
>>>
>>> https://commitfest.postgresql.org/22/2049/
>>
>> Please remove the above and keep this:
>>
>> https://commitfest.postgresql.org/22/1779/
>>
>> which I moved from the January CF to the March one on behalf of him.
> 
> I have closed the duplicate entry (#2049) and retained the entry which contains the CF history (#1779).


Thank you! :)


Regards,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

05 March 2019, 08:56:39

Hi Robert!

On 2019/03/05 11:35, Robert Haas wrote:
> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> === Current design ===
>>
>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
>> Depending on which one is chosen, the command will proceed in the
>> following sequence of phases:
>>
>>     * Scan method: Seq Scan
>>       0. initializing                 (*2)
>>       1. seq scanning heap            (*1)
>>       3. sorting tuples               (*2)
>>       4. writing new heap             (*1)
>>       5. swapping relation files      (*2)
>>       6. rebuilding index             (*2)
>>       7. performing final cleanup     (*2)
>>
>>     * Scan method: Index Scan
>>       0. initializing                 (*2)
>>       2. index scanning heap          (*1)
>>       5. swapping relation files      (*2)
>>       6. rebuilding index             (*2)
>>       7. performing final cleanup     (*2)
>>
>> VACUUM FULL command will proceed in the following sequence of phases:
>>
>>       1. seq scanning heap            (*1)
>>       5. swapping relation files      (*2)
>>       6. rebuilding index             (*2)
>>       7. performing final cleanup     (*2)
>>
>> (*1): increasing the value in heap_tuples_scanned column
>> (*2): only shows the phase in the phase column
> 
> All of that sounds good.
> 
>> The view provides the information of CLUSTER command progress details as follows
>> # \d pg_stat_progress_cluster
>>                 View "pg_catalog.pg_stat_progress_cluster"
>>             Column           |  Type   | Collation | Nullable | Default
>> ---------------------------+---------+-----------+----------+---------
>>    pid                       | integer |           |          |
>>    datid                     | oid     |           |          |
>>    datname                   | name    |           |          |
>>    relid                     | oid     |           |          |
>>    command                   | text    |           |          |
>>    phase                     | text    |           |          |
>>    cluster_index_relid       | bigint  |           |          |
>>    heap_tuples_scanned       | bigint  |           |          |
>>    heap_tuples_vacuumed      | bigint  |           |          |
> 
> Still not sure if we need heap_tuples_vacuumed.  We could try to
> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
> we're using a Seq Scan.

I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in
next patch.

Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to
get those from initscan(). I'll investigate it more.

cluster.c
   copy_heap_data()
     heap_beginscan()
       heap_beginscan_internal()
         initscan()



>> === Discussion points ===
>>
>>    - Progress counter for "3. sorting tuples" phase
>>       - Should we add pgstat_progress_update_param() in tuplesort.c like a
>>         "trace_sort"?
>>         Thanks to Peter Geoghegan for the useful advice!
> 
> How would we avoid an abstraction violation?

Hmm... What do you mean an abstraction violation?
If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples.


>>    - Progress counter for "6. rebuilding index" phase
>>       - Should we add "index_vacuum_count" in the view like a vacuum progress monitor?
>>         If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c.
>>         However, I'm not sure whether it is okay or not.
> 
> Doesn't seem unreasonable to me.

I see, I'll add it later.


Regards,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

05 March 2019, 16:13:28

On Tue, Mar 5, 2019 at 3:56 AM Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
> >> === Discussion points ===
> >>
> >>    - Progress counter for "3. sorting tuples" phase
> >>       - Should we add pgstat_progress_update_param() in tuplesort.c like a
> >>         "trace_sort"?
> >>         Thanks to Peter Geoghegan for the useful advice!
> >
> > How would we avoid an abstraction violation?
>
> Hmm... What do you mean an abstraction violation?
> If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples.

What I mean is... I think it would be useful to have this counter, but
I'm not sure how the tuplesort code would know to update the counter
in this case and not in other cases.  The tuplesort code is used for
lots of things; we can't update a counter for CLUSTER if the tuplesort
is being used for CREATE INDEX or a Sort node in a query or whatever.
So my question is how we would indicate to the tuplesort that it needs
to do the counter update, and whether that would end up making for
ugly code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera

Date:

06 March 2019, 01:03:38

On 2019-Mar-04, Robert Haas wrote:

> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:

> > === Discussion points ===
> >
> >   - Progress counter for "3. sorting tuples" phase
> >      - Should we add pgstat_progress_update_param() in tuplesort.c like a
> >        "trace_sort"?
> >        Thanks to Peter Geoghegan for the useful advice!
> 
> How would we avoid an abstraction violation?

The theory embodied in my patch at https://postgr.es/m/20190304204607.GA15946@alvherre.pgsql
is that we don't; tuplesort.c functions (index.c's IndexBuildHeapScan in
my case) would get a boolean parameter to indicate whether to update
some params or not -- the param number(s) to update are supposed to be
generic in the sense that it's not part of any individual command's
implementation (PROGRESS_SCAN_BLOCKS_DONE for what you call "blks
scanned", PROGRESS_SCAN_BLOCKS_TOTAL for "blks total"), but rather
defined by the "progress update provider" (index.c or tuplesort.c).

One, err, small issue with that idea is that we need the param numbers
not to conflict for any "progress update providers" that are to be used
simultaneously by any command.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

06 March 2019, 06:25:48

On 2019/03/06 1:13, Robert Haas wrote:
> On Tue, Mar 5, 2019 at 3:56 AM Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>>>> === Discussion points ===
>>>>
>>>>     - Progress counter for "3. sorting tuples" phase
>>>>        - Should we add pgstat_progress_update_param() in tuplesort.c like a
>>>>          "trace_sort"?
>>>>          Thanks to Peter Geoghegan for the useful advice!
>>>
>>> How would we avoid an abstraction violation?
>>
>> Hmm... What do you mean an abstraction violation?
>> If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples.
> 
> What I mean is... I think it would be useful to have this counter, but
> I'm not sure how the tuplesort code would know to update the counter
> in this case and not in other cases.  The tuplesort code is used for
> lots of things; we can't update a counter for CLUSTER if the tuplesort
> is being used for CREATE INDEX or a Sort node in a query or whatever.
> So my question is how we would indicate to the tuplesort that it needs
> to do the counter update, and whether that would end up making for
> ugly code.

Thanks for your explanation!
I understood that now. I guess it means an API to get a progress for sort processing,
so let us just put it aside now. I'd like to leave that as is for an appropriate person.

Regards,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

06 March 2019, 06:38:54

On 2019/03/05 17:56, Tatsuro Yamada wrote:
> Hi Robert!
> 
> On 2019/03/05 11:35, Robert Haas wrote:
>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
>> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>>> === Current design ===
>>>
>>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
>>> Depending on which one is chosen, the command will proceed in the
>>> following sequence of phases:
>>>
>>>     * Scan method: Seq Scan
>>>       0. initializing                 (*2)
>>>       1. seq scanning heap            (*1)
>>>       3. sorting tuples               (*2)
>>>       4. writing new heap             (*1)
>>>       5. swapping relation files      (*2)
>>>       6. rebuilding index             (*2)
>>>       7. performing final cleanup     (*2)
>>>
>>>     * Scan method: Index Scan
>>>       0. initializing                 (*2)
>>>       2. index scanning heap          (*1)
>>>       5. swapping relation files      (*2)
>>>       6. rebuilding index             (*2)
>>>       7. performing final cleanup     (*2)
>>>
>>> VACUUM FULL command will proceed in the following sequence of phases:
>>>
>>>       1. seq scanning heap            (*1)
>>>       5. swapping relation files      (*2)
>>>       6. rebuilding index             (*2)
>>>       7. performing final cleanup     (*2)
>>>
>>> (*1): increasing the value in heap_tuples_scanned column
>>> (*2): only shows the phase in the phase column
>>
>> All of that sounds good.
>>
>>> The view provides the information of CLUSTER command progress details as follows
>>> # \d pg_stat_progress_cluster
>>>                 View "pg_catalog.pg_stat_progress_cluster"
>>>             Column           |  Type   | Collation | Nullable | Default
>>> ---------------------------+---------+-----------+----------+---------
>>>    pid                       | integer |           |          |
>>>    datid                     | oid     |           |          |
>>>    datname                   | name    |           |          |
>>>    relid                     | oid     |           |          |
>>>    command                   | text    |           |          |
>>>    phase                     | text    |           |          |
>>>    cluster_index_relid       | bigint  |           |          |
>>>    heap_tuples_scanned       | bigint  |           |          |
>>>    heap_tuples_vacuumed      | bigint  |           |          |
>>
>> Still not sure if we need heap_tuples_vacuumed.  We could try to
>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
>> we're using a Seq Scan.
> 
> I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in
> next patch.
> 
> Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to
> get those from initscan(). I'll investigate it more.
> 
> cluster.c
>    copy_heap_data()
>      heap_beginscan()
>        heap_beginscan_internal()
>          initscan()
> 
> 
> 
>>> === Discussion points ===
>>>
>>>    - Progress counter for "3. sorting tuples" phase
>>>       - Should we add pgstat_progress_update_param() in tuplesort.c like a
>>>         "trace_sort"?
>>>         Thanks to Peter Geoghegan for the useful advice!
>>
>> How would we avoid an abstraction violation?
> 
> Hmm... What do you mean an abstraction violation?
> If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples.
> 
> 
>>>    - Progress counter for "6. rebuilding index" phase
>>>       - Should we add "index_vacuum_count" in the view like a vacuum progress monitor?
>>>         If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c.
>>>         However, I'm not sure whether it is okay or not.
>>
>> Doesn't seem unreasonable to me.
> 
> I see, I'll add it later.


Attached file is revised and WIP patch including:

   - Remove heap_tuples_vacuumed
   - Add heap_blks_scanned and heap_blks_total
   - Add index_vacuum_count

I tried to "add heap_blks_scanned and heap_blks_total" columns and I realized that
"heap_tuples_scanned" column is suitable as a counter when a scan method is
both index-scan and seq-scan because CLUSTER is on a tuple basis.



Regards,
Tatsuro Yamada

Attachment

progress_monitor_for_cluster_command_v8_code.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

06 March 2019, 14:52:43

On Tue, Mar 5, 2019 at 8:03 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> One, err, small issue with that idea is that we need the param numbers
> not to conflict for any "progress update providers" that are to be used
> simultaneously by any command.

Is that really an issue?  I think progress reporting -- at least with
the current infrastructure -- is only ever going to be possible for
utility commands, not queries.  And those really shouldn't have very
many sorts going on at once.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera

Date:

06 March 2019, 15:13:16

On 2019-Mar-06, Robert Haas wrote:

> On Tue, Mar 5, 2019 at 8:03 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > One, err, small issue with that idea is that we need the param numbers
> > not to conflict for any "progress update providers" that are to be used
> > simultaneously by any command.
> 
> Is that really an issue?  I think progress reporting -- at least with
> the current infrastructure -- is only ever going to be possible for
> utility commands, not queries.  And those really shouldn't have very
> many sorts going on at once.

Well, I don't think it is, but I thought it was worth pointing out
explicitly.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

08 March 2019, 08:13:56

On 2019/03/06 15:38, Tatsuro Yamada wrote:
> On 2019/03/05 17:56, Tatsuro Yamada wrote:
>> On 2019/03/05 11:35, Robert Haas wrote:
>>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
>>> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>>>> === Current design ===
>>>>
>>>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
>>>> Depending on which one is chosen, the command will proceed in the
>>>> following sequence of phases:
>>>>
>>>>     * Scan method: Seq Scan
>>>>       0. initializing                 (*2)
>>>>       1. seq scanning heap            (*1)
>>>>       3. sorting tuples               (*2)
>>>>       4. writing new heap             (*1)
>>>>       5. swapping relation files      (*2)
>>>>       6. rebuilding index             (*2)
>>>>       7. performing final cleanup     (*2)
>>>>
>>>>     * Scan method: Index Scan
>>>>       0. initializing                 (*2)
>>>>       2. index scanning heap          (*1)
>>>>       5. swapping relation files      (*2)
>>>>       6. rebuilding index             (*2)
>>>>       7. performing final cleanup     (*2)
>>>>
>>>> VACUUM FULL command will proceed in the following sequence of phases:
>>>>
>>>>       1. seq scanning heap            (*1)
>>>>       5. swapping relation files      (*2)
>>>>       6. rebuilding index             (*2)
>>>>       7. performing final cleanup     (*2)
>>>>
>>>> (*1): increasing the value in heap_tuples_scanned column
>>>> (*2): only shows the phase in the phase column
>>>
>>> All of that sounds good.
>>>
>>>> The view provides the information of CLUSTER command progress details as follows
>>>> # \d pg_stat_progress_cluster
>>>>                 View "pg_catalog.pg_stat_progress_cluster"
>>>>             Column           |  Type   | Collation | Nullable | Default
>>>> ---------------------------+---------+-----------+----------+---------
>>>>    pid                       | integer |           |          |
>>>>    datid                     | oid     |           |          |
>>>>    datname                   | name    |           |          |
>>>>    relid                     | oid     |           |          |
>>>>    command                   | text    |           |          |
>>>>    phase                     | text    |           |          |
>>>>    cluster_index_relid       | bigint  |           |          |
>>>>    heap_tuples_scanned       | bigint  |           |          |
>>>>    heap_tuples_vacuumed      | bigint  |           |          |
>>>
>>> Still not sure if we need heap_tuples_vacuumed.  We could try to
>>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
>>> we're using a Seq Scan.
>>
>> I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in
>> next patch.
>>
>> Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to
>> get those from initscan(). I'll investigate it more.
>>
>> cluster.c
>>    copy_heap_data()
>>      heap_beginscan()
>>        heap_beginscan_internal()
>>          initscan()
>>
>>
>>
>>>> === Discussion points ===
>>>>
>>>>    - Progress counter for "3. sorting tuples" phase
>>>>       - Should we add pgstat_progress_update_param() in tuplesort.c like a
>>>>         "trace_sort"?
>>>>         Thanks to Peter Geoghegan for the useful advice!
>>>
>>> How would we avoid an abstraction violation?
>>
>> Hmm... What do you mean an abstraction violation?
>> If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples.
>>
>>
>>>>    - Progress counter for "6. rebuilding index" phase
>>>>       - Should we add "index_vacuum_count" in the view like a vacuum progress monitor?
>>>>         If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c.
>>>>         However, I'm not sure whether it is okay or not.
>>>
>>> Doesn't seem unreasonable to me.
>>
>> I see, I'll add it later.
> 
> 
> Attached file is revised and WIP patch including:
> 
>    - Remove heap_tuples_vacuumed
>    - Add heap_blks_scanned and heap_blks_total
>    - Add index_vacuum_count
> 
> I tried to "add heap_blks_scanned and heap_blks_total" columns and I realized that
> "heap_tuples_scanned" column is suitable as a counter when a scan method is
> both index-scan and seq-scan because CLUSTER is on a tuple basis.


Attached file is rebased patch on current HEAD.
I changed a status. :)


Regards,
Tatsuro Yamada

Attachment

Re: [HACKERS] CLUSTER command progress monitor

From

Rafia Sabih

Date:

18 March 2019, 11:42:23

On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
>
> On 2019/03/06 15:38, Tatsuro Yamada wrote:
> > On 2019/03/05 17:56, Tatsuro Yamada wrote:
> >> On 2019/03/05 11:35, Robert Haas wrote:
> >>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
> >>> <yamada.tatsuro@lab.ntt.co.jp> wrote:
> >>>> === Current design ===
> >>>>
> >>>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
> >>>> Depending on which one is chosen, the command will proceed in the
> >>>> following sequence of phases:
> >>>>
> >>>>     * Scan method: Seq Scan
> >>>>       0. initializing                 (*2)
> >>>>       1. seq scanning heap            (*1)
> >>>>       3. sorting tuples               (*2)
> >>>>       4. writing new heap             (*1)
> >>>>       5. swapping relation files      (*2)
> >>>>       6. rebuilding index             (*2)
> >>>>       7. performing final cleanup     (*2)
> >>>>
> >>>>     * Scan method: Index Scan
> >>>>       0. initializing                 (*2)
> >>>>       2. index scanning heap          (*1)
> >>>>       5. swapping relation files      (*2)
> >>>>       6. rebuilding index             (*2)
> >>>>       7. performing final cleanup     (*2)
> >>>>
> >>>> VACUUM FULL command will proceed in the following sequence of phases:
> >>>>
> >>>>       1. seq scanning heap            (*1)
> >>>>       5. swapping relation files      (*2)
> >>>>       6. rebuilding index             (*2)
> >>>>       7. performing final cleanup     (*2)
> >>>>
> >>>> (*1): increasing the value in heap_tuples_scanned column
> >>>> (*2): only shows the phase in the phase column
> >>>
> >>> All of that sounds good.
> >>>
> >>>> The view provides the information of CLUSTER command progress details as follows
> >>>> # \d pg_stat_progress_cluster
> >>>>                 View "pg_catalog.pg_stat_progress_cluster"
> >>>>             Column           |  Type   | Collation | Nullable | Default
> >>>> ---------------------------+---------+-----------+----------+---------
> >>>>    pid                       | integer |           |          |
> >>>>    datid                     | oid     |           |          |
> >>>>    datname                   | name    |           |          |
> >>>>    relid                     | oid     |           |          |
> >>>>    command                   | text    |           |          |
> >>>>    phase                     | text    |           |          |
> >>>>    cluster_index_relid       | bigint  |           |          |
> >>>>    heap_tuples_scanned       | bigint  |           |          |
> >>>>    heap_tuples_vacuumed      | bigint  |           |          |
> >>>
> >>> Still not sure if we need heap_tuples_vacuumed.  We could try to
> >>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
> >>> we're using a Seq Scan.
> >>
> >> I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in
> >> next patch.
> >>
> >> Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to
> >> get those from initscan(). I'll investigate it more.
> >>
> >> cluster.c
> >>    copy_heap_data()
> >>      heap_beginscan()
> >>        heap_beginscan_internal()
> >>          initscan()
> >>
> >>
> >>
> >>>> === Discussion points ===
> >>>>
> >>>>    - Progress counter for "3. sorting tuples" phase
> >>>>       - Should we add pgstat_progress_update_param() in tuplesort.c like a
> >>>>         "trace_sort"?
> >>>>         Thanks to Peter Geoghegan for the useful advice!
> >>>
> >>> How would we avoid an abstraction violation?
> >>
> >> Hmm... What do you mean an abstraction violation?
> >> If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples.
> >>
> >>
> >>>>    - Progress counter for "6. rebuilding index" phase
> >>>>       - Should we add "index_vacuum_count" in the view like a vacuum progress monitor?
> >>>>         If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c.
> >>>>         However, I'm not sure whether it is okay or not.
> >>>
> >>> Doesn't seem unreasonable to me.
> >>
> >> I see, I'll add it later.
> >
> >
> > Attached file is revised and WIP patch including:
> >
> >    - Remove heap_tuples_vacuumed
> >    - Add heap_blks_scanned and heap_blks_total
> >    - Add index_vacuum_count
> >
> > I tried to "add heap_blks_scanned and heap_blks_total" columns and I realized that
> > "heap_tuples_scanned" column is suitable as a counter when a scan method is
> > both index-scan and seq-scan because CLUSTER is on a tuple basis.
>
>
> Attached file is rebased patch on current HEAD.
> I changed a status. :)
>
>
Looks like the patch needs a rebase.
I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2

PFA reject file in case you want to have a look.
> Regards,
> Tatsuro Yamada
>
>
>


-- 
Regards,
Rafia Sabih

Attachment

cluster.c.rej

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

19 March 2019, 01:43:23

Hi Rafia!

On 2019/03/18 20:42, Rafia Sabih wrote:
> On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> Attached file is rebased patch on current HEAD.
>> I changed a status. :)
>>
>>
> Looks like the patch needs a rebase.
> I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2
> 
> PFA reject file in case you want to have a look.

Thanks for testing it. :)
I rebased the patch on the current head: f2004f19ed9c9228d3ea2b12379ccb4b9212641f.

Please find attached file.

Also, I share my test case of progress monitor below.

=== My test case ===

[Terminal1]
Run this query on psql:

    \a \t
    select * from pg_stat_progress_cluster; \watch 0.05

[Terminal2]
Run these queries on psql:

drop table t1;

create table t1 as select a, random() * 1000 as b from generate_series(0, 999999) a;
create index idx_t1 on t1(a);
create index idx_t1_b on t1(b);
analyze t1;

-- index scan
set enable_seqscan to off;
cluster verbose t1 using idx_t1;

-- seq scan
set enable_seqscan to on;
set enable_indexscan to off;
cluster verbose t1 using idx_t1;

-- only given table name to cluster command
cluster verbose t1;

-- only cluster command
cluster verbose;

-- vacuum full
vacuum full t1;

-- vacuum full
vacuum full;

====================


Regards,
Tatsuro Yamada

Attachment

progress_monitor_for_cluster_command_v10_code.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

19 March 2019, 02:02:57

On 2019/03/19 10:43, Tatsuro Yamada wrote:
> Hi Rafia!
> 
> On 2019/03/18 20:42, Rafia Sabih wrote:
>> On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada
>> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>>> Attached file is rebased patch on current HEAD.
>>> I changed a status. :)
>>>
>>>
>> Looks like the patch needs a rebase.
>> I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2
>>
>> PFA reject file in case you want to have a look.
> 
> Thanks for testing it. :)
> I rebased the patch on the current head: f2004f19ed9c9228d3ea2b12379ccb4b9212641f.
> 
> Please find attached file.
> 
> Also, I share my test case of progress monitor below.


Attached patch is a rebased document patch. :)


Thanks,
Tatsuro Yamada

Attachment

progress_monitor_for_cluster_command_v10_doc.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Kyotaro HORIGUCHI

Date:

19 March 2019, 10:00:20

At Tue, 19 Mar 2019 11:02:57 +0900, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote in
<dc0dd07c-f185-0cf9-ba54-c5c31f6514f2@lab.ntt.co.jp>
> On 2019/03/19 10:43, Tatsuro Yamada wrote:
> > Hi Rafia!
> > On 2019/03/18 20:42, Rafia Sabih wrote:
> >> On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada
> >> <yamada.tatsuro@lab.ntt.co.jp> wrote:
> >>> Attached file is rebased patch on current HEAD.
> >>> I changed a status. :)
> >>>
> >>>
> >> Looks like the patch needs a rebase.
> >> I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2
> >>
> >> PFA reject file in case you want to have a look.
> > Thanks for testing it. :)
> > I rebased the patch on the current head:
> > f2004f19ed9c9228d3ea2b12379ccb4b9212641f.
> > Please find attached file.
> > Also, I share my test case of progress monitor below.
> 
> 
> Attached patch is a rebased document patch. :)

The monitor view has four columns:

  heap_tuples_scanned
    : used while the "seq scan" and "index scan" phases.

  heap_blks_total, heap_blks_scanned
    : used only while the "seq scan" phase.

  index_rebuild_count:
    : used only while the "rebuilding index" phase.


Couldn't we change the view like the following?

.. |   phase    | heap tuples scanned | progress indicator  | value
..-+------------+---------------------+---------------------+--------
.. | seq scan ..|  2134214            | heap blocks pct     | 23.5%
.. | index sc ..|  5453395            | (null)              |  (null)
.. | sorting  ..|  <the last value>   | (null)              |  (null)
.. | swapping ..|  <the last value>   | (null)              |  (null)
.. | rebuildi ..|  <the last value>   | index count         |  2
.. | performi ..|  <the last value>   | (null)              |  (null)

Only seq scan phase has two kind of progress indicator so if we
choose "heap blks pct" as the only indicator for the phase, it
could be simplified as:

.. |   phase     | progress indicator  | value
..-+-------------+---------------------+--------
.. | seq scan .. | heap blocks pct     | 23.5%
.. | index sc .. | heap tuples scanned | 2398457
.. | sorting  .. | (null)              | (null)
.. | swapping .. | (null)              | (null)
.. | rebuildi .. | index count         | 2
.. | performi .. | (null)              | (null)

A downside of the view is that it looks quite differently from
pg_stat_progress_vacuum. Since I'm not sure it's a good design,
feel free to oppose/reject this.

finish_heap_swap is also called in matview code path but the
function doesn't seem to detect that situation. Is it right
behavior? (I didn't confirm what happens when it is called from
matview refresh path, though.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

19 March 2019, 18:47:00

On Mon, Mar 18, 2019 at 10:03 PM Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
> Attached patch is a rebased document patch. :)

Attached is an updated patch.  I went through this patch carefully
today, in the hopes of committing it, and I think the attached version
is pretty closet to being committable, but there's at least one open
issue remaining, as described below.

- The regression tests did not pass because expected/rules.out was not
properly updated.  I fixed that.

- The documentation did not build because some tags were not properly
terminated e.g. <xref linkend='...'> rather than <xref
linkend='...'/>.  I also fixed that.

- The documentation had two nearly-identical lists of phases.  I
merged them into one.  There might be room for some further
fine-tuning here.

- cluster_rel() had multiple places where it could return without
calling pgstat_progress_end_command().  I fixed that.

- cluster_rel() inexplicably delayed updating PROGRESS_CLUSTER_COMMAND
for longer than seems necessary.  I fixed that.

- copy_heap_data() zeroed out the heap-tuples-scanned,
heap-blocks-scanned, and total-heap-blocks counters when it began
PROGRESS_CLUSTER_PHASE_SORT_TUPLES and
PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES.  This seems like throwing away
useful information for no good reason.  I changed it not to do that in
all cases except the one mentioned in the next paragraph.

- It *is* currently to reset PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED
because that counter gets reused to indicate the number of heap tuples
*written back out*, but I think that is bad design for two reasons.
First, the documentation does not explain that sometimes the number of
heap tuples scanned is really reporting the number of heap tuples
written.  Second, it's bad for columns to have misleading names.
Third, it would actually be really useful to store these values in
separate columns, because then you could expect that the number tuples
written would eventually equal the number scanned, and you'd still
have the number that were scanned around so that you could clearly see
how close you were getting to rewriting the entire heap.  This is the
one thing I found but did not fix; any chance you could make this
change and update the documentation to match?

- The comment about reporting that we are now reindexing relations was
jammed in between an existing comment and the associated code.  I
moved it to a more logical place.

- The new if-statement in pg_stat_get_progress_info was missing a
space required by project style.  I added the space.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

progress_monitor_for_cluster_command_v11.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

22 March 2019, 18:31:42

On Tue, Mar 19, 2019 at 2:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
> how close you were getting to rewriting the entire heap.  This is the
> one thing I found but did not fix; any chance you could make this
> change and update the documentation to match?

Hi, is anybody working on this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

From

Tattsu Yama

Date:

24 March 2019, 16:55:18

Hi Robert! >On Tue, Mar 19, 2019 at 2:47 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote: >> how close you were getting to rewriting the entire heap. This is the >> one thing I found but did not fix; any chance you could make this >> change and update the documentation to match? > > >Hi, is anybody working on this? Thank you so much for reviewing the patch and sorry for the late reply. Today, I realized that you sent the email for the patch because I took a sick leave from work for a while. So, I created new patch based on your comments asap. I hope it is acceptable to you. :) Please find attached file. Changes - Add new column *heap_tuples_written* in the view This column is updated when the phases are "seq scanning heap", "index scanning heap" or "writing new heap". - Fix document - Revised the patch on 280a408b48d5ee42969f981bceb9e9426c3a344c

Regards,

Tatsuro Yamada

Attachment

progress_monitor_for_cluster_command_v12.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

25 March 2019, 01:01:46

Hi Robert!

On 2019/03/23 3:31, Robert Haas wrote:
> On Tue, Mar 19, 2019 at 2:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> how close you were getting to rewriting the entire heap.  This is the
>> one thing I found but did not fix; any chance you could make this
>> change and update the documentation to match?
> 
> Hi, is anybody working on this?

I sent this email using my personal email address: yamatattsu@gmail-.
I re-send it with the patch and my test result.

Thank you so much for reviewing the patch and sorry for the late reply.
Today, I realized that you sent the email for the patch because I took a
sick leave from work for a while. So, I created new patch based on your comments asap.
I hope it is acceptable to you. :)

Changes
   - Add new column *heap_tuples_written* in the view
       This column is updated when the phases are "seq scanning heap",
       "index scanning heap" or "writing new heap".
   - Fix document
   - Revised the patch on the current head: 940311e4bb32a5fe99155052e41179c88b5d48af.

Please find attached files. :)

Regards,
Tatsuro Yamada

Attachment

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

25 March 2019, 15:02:51

On Sun, Mar 24, 2019 at 9:02 PM Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
> Please find attached files. :)

Committed.  Thanks!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

26 March 2019, 01:04:48

Hi Robert and Reviewers!

On 2019/03/26 0:02, Robert Haas wrote:
> On Sun, Mar 24, 2019 at 9:02 PM Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> Please find attached files. :)
> 
> Committed.  Thanks!

Thank you!
Hope this feature will help DBA and user. :)

Regards,
Tatsuro Yamada
NTT Open Source Software Center

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

26 March 2019, 02:15:37

On Tue, Mar 26, 2019 at 10:04:48AM +0900, Tatsuro Yamada wrote:
> Hope this feature will help DBA and user. :)

Congrats, Yamada-san.
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera

Date:

01 August 2019, 18:43:33

Hmm, I'm trying this out now and I don't see the index_rebuild_count
ever go up.  I think it's because the indexes are built using parallel
index build ... or maybe it was the table AM changes that moved things
around, not sure.  There's a period at the end when the CLUSTER command
keeps working, but it's gone from pg_stat_progress_cluster.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

13 August 2019, 05:40:58

Hi Alvaro!

On 2019/08/02 3:43, Alvaro Herrera wrote:
> Hmm, I'm trying this out now and I don't see the index_rebuild_count
> ever go up.  I think it's because the indexes are built using parallel
> index build ... or maybe it was the table AM changes that moved things
> around, not sure.  There's a period at the end when the CLUSTER command
> keeps working, but it's gone from pg_stat_progress_cluster.


Thanks for your report.
I'll investigate it. :)


Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

14 August 2019, 02:38:01

Hi Alvaro and All,

On 2019/08/13 14:40, Tatsuro Yamada wrote:
> Hi Alvaro!
> 
> On 2019/08/02 3:43, Alvaro Herrera wrote:
>> Hmm, I'm trying this out now and I don't see the index_rebuild_count
>> ever go up.  I think it's because the indexes are built using parallel
>> index build ... or maybe it was the table AM changes that moved things
>> around, not sure.  There's a period at the end when the CLUSTER command
>> keeps working, but it's gone from pg_stat_progress_cluster.
> 
> 
> Thanks for your report.
> I'll investigate it. :)

I did "git bisect" and found the commit:

  03f9e5cba0ee1633af4abe734504df50af46fbd8
  Report progress of REINDEX operations

In src/backend/catalog/index.c,
CLUSTER progress reporting increases index_rebuild_count in reindex_relation()
by pgstat_progress_update_param().
However, reindex_relation() calls reindex_index(), and REINDEX progress reporting is existing on the latter function,
andit starts pgstat_progress_start_command() pgstat_progress_end_command() for REINDEX progress reporting.

Therefore, CLUSTER progress reporting failed to update index_rebuild_count because
it made a mistake to update the REINDEX's view, I think.

My Idea to fix that is following:

  - Add a target view name parameter to Progress monitor's API

   For example:
   <Before>
     pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT, i).

   <After>
     pgstat_progress_update_param(*PROGRESS_CLUSTER_VIEW*, PROGRESS_CLUSTER_INDEX_REBUILD_COUNT, i).

However, I'm not sure whether it is able or not because I haven't read
the code of the API yet.
What do you think about that? :)

Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

14 August 2019, 02:52:53

On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote:
> On 2019/08/13 14:40, Tatsuro Yamada wrote:
>> On 2019/08/02 3:43, Alvaro Herrera wrote:
>>> Hmm, I'm trying this out now and I don't see the index_rebuild_count
>>> ever go up.  I think it's because the indexes are built using parallel
>>> index build ... or maybe it was the table AM changes that moved things
>>> around, not sure.  There's a period at the end when the CLUSTER command
>>> keeps working, but it's gone from pg_stat_progress_cluster.
>>
>> Thanks for your report.
>> I'll investigate it. :)
>
> I did "git bisect" and found the commit:
>
>  03f9e5cba0ee1633af4abe734504df50af46fbd8
>  Report progress of REINDEX operations

I am adding an open item for this one.
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

15 August 2019, 03:48:21

Hi Michael, Alvaro and Robert!

On 2019/08/14 11:52, Michael Paquier wrote:
> On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote:
>> On 2019/08/13 14:40, Tatsuro Yamada wrote:
>>> On 2019/08/02 3:43, Alvaro Herrera wrote:
>>>> Hmm, I'm trying this out now and I don't see the index_rebuild_count
>>>> ever go up.  I think it's because the indexes are built using parallel
>>>> index build ... or maybe it was the table AM changes that moved things
>>>> around, not sure.  There's a period at the end when the CLUSTER command
>>>> keeps working, but it's gone from pg_stat_progress_cluster.
>>>
>>> Thanks for your report.
>>> I'll investigate it. :)
>>
>> I did "git bisect" and found the commit:
>>
>>   03f9e5cba0ee1633af4abe734504df50af46fbd8
>>   Report progress of REINDEX operations
> 
> I am adding an open item for this one.
> --
> Michael


Okay, I checked it on the wiki.

   https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items
   - index_rebuild_count in CLUSTER reporting never increments

To be clear, 03f9e5cb broke CLUSTER progress reporting, but
I investigated little more and share my ideas to fix the problem.

* Call stack
========================================
cluster_rel
   pgstat_progress_start_command(CLUSTER) *A1
   rebuild_relation
     finish_heap_swap
       reindex_relation
         reindex_index
           pgstat_progress_start_command(CREATE_INDEX) *B1
           pgstat_progress_end_command() *B2
         pgstat_progress_update_param(INDEX_REBUILD_COUNT, i) <- failed :(
   pgstat_progress_end_command() *A2

   Note
     These are sets:
       A1 and A2,
       B1 and B2
========================================


* Ideas to fix
   There are Three options, I guess.
========================================
   1. Call pgstat_progress_start_command(CLUSTER) again
      before pgstat_progress_update_param(INDEX_REBUILD_COUNT, i).

   2. Add "save and restore" functions for the following two
      variables of MyBeentry in pgstat.c.
         - st_progress_command
         - st_progress_command_target

   3. Use Hash or List to store multiple values for the two
      variables in pgstat.c.
========================================


I tried 1. and it shown index_rebuild_count, but it also shown
"initializing" phase again and other columns were empty. So, it is
not suitable to fix the problem. :(
I'm going to try 2. and 3., but, unfortunately, I can't get enough
time to do that after PGConf.Asia 2019.
If we selected 3., it affects following these progress reporting
features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
I suppose. Any comments welcome! :)


Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Masahiko Sawada

Date:

30 August 2019, 10:45:57

On Thu, Aug 15, 2019 at 12:48 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
>
> Hi Michael, Alvaro and Robert!
>
> On 2019/08/14 11:52, Michael Paquier wrote:
> > On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote:
> >> On 2019/08/13 14:40, Tatsuro Yamada wrote:
> >>> On 2019/08/02 3:43, Alvaro Herrera wrote:
> >>>> Hmm, I'm trying this out now and I don't see the index_rebuild_count
> >>>> ever go up.  I think it's because the indexes are built using parallel
> >>>> index build ... or maybe it was the table AM changes that moved things
> >>>> around, not sure.  There's a period at the end when the CLUSTER command
> >>>> keeps working, but it's gone from pg_stat_progress_cluster.
> >>>
> >>> Thanks for your report.
> >>> I'll investigate it. :)
> >>
> >> I did "git bisect" and found the commit:
> >>
> >>   03f9e5cba0ee1633af4abe734504df50af46fbd8
> >>   Report progress of REINDEX operations
> >
> > I am adding an open item for this one.
> > --
> > Michael
>
>
> Okay, I checked it on the wiki.
>
>    https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items
>    - index_rebuild_count in CLUSTER reporting never increments
>
> To be clear, 03f9e5cb broke CLUSTER progress reporting, but
> I investigated little more and share my ideas to fix the problem.
>
> * Call stack
> ========================================
> cluster_rel
>    pgstat_progress_start_command(CLUSTER) *A1
>    rebuild_relation
>      finish_heap_swap
>        reindex_relation
>          reindex_index
>            pgstat_progress_start_command(CREATE_INDEX) *B1
>            pgstat_progress_end_command() *B2
>          pgstat_progress_update_param(INDEX_REBUILD_COUNT, i) <- failed :(
>    pgstat_progress_end_command() *A2
>
>    Note
>      These are sets:
>        A1 and A2,
>        B1 and B2
> ========================================
>
>
> * Ideas to fix
>    There are Three options, I guess.
> ========================================
>    1. Call pgstat_progress_start_command(CLUSTER) again
>       before pgstat_progress_update_param(INDEX_REBUILD_COUNT, i).
>
>    2. Add "save and restore" functions for the following two
>       variables of MyBeentry in pgstat.c.
>          - st_progress_command
>          - st_progress_command_target
>
>    3. Use Hash or List to store multiple values for the two
>       variables in pgstat.c.
> ========================================
>
>
> I tried 1. and it shown index_rebuild_count, but it also shown
> "initializing" phase again and other columns were empty. So, it is
> not suitable to fix the problem. :(
> I'm going to try 2. and 3., but, unfortunately, I can't get enough
> time to do that after PGConf.Asia 2019.
> If we selected 3., it affects following these progress reporting
> features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> I suppose. Any comments welcome! :)

I looked at this open item. I prefer #3 but I think it would be enough
to have a small stack using a fixed length array to track nested
progress information and the current executing command (maybe 4 or 8
would be enough for maximum nested level for now?). That way, we don't
need any change the interfaces. AFAICS there is no case where we call
only either pgstat_progress_start_command or
pgstat_progress_end_command without each other (although
pgstat_progress_update_param is called without start). So I think that
having a small stack for tracking multiple reports would work.

Attached the draft patch that fixes this issue. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

track_nested_command_progress.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

02 September 2019, 07:57:10

On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote:
> > I tried 1. and it shown index_rebuild_count, but it also shown
> > "initializing" phase again and other columns were empty. So, it is
> > not suitable to fix the problem. :(
> > I'm going to try 2. and 3., but, unfortunately, I can't get enough
> > time to do that after PGConf.Asia 2019.
> > If we selected 3., it affects following these progress reporting
> > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> > I suppose. Any comments welcome! :)
>
> I looked at this open item. I prefer #3 but I think it would be enough
> to have a small stack using a fixed length array to track nested
> progress information and the current executing command (maybe 4 or 8
> would be enough for maximum nested level for now?). That way, we don't
> need any change the interfaces. AFAICS there is no case where we call
> only either pgstat_progress_start_command or
> pgstat_progress_end_command without each other (although
> pgstat_progress_update_param is called without start). So I think that
> having a small stack for tracking multiple reports would work.
>
> Attached the draft patch that fixes this issue. Please review it.

Do we actually want to show to the user information about CREATE INDEX
which is different than CLUSTER?  It could be confusing for the user
to see a progress report from a command different than the one
actually launched.  There could be a method 4 here: do not start a new
command progress when there is another one already started, and do not
try to end it in the code path where it could not be started as it did
not stack.  So while I see the advantages of stacking the progress
records as you do when doing cascading calls of the progress
reporting, I am not sure that:
1) We should bloat more PgBackendStatus for that.
2) We want to add more complication in this area close to the
release of 12.

Another solution as mentioned by Yamada-san is just to start again the
progress for the cluster command in reindex_relation() however you got
an issue here as reindex_relation() is also called by REINDEX TABLE. I
find actually very weird that we have a cluster-related field added
for REINDEX, so it seems to me that all the interactions between the
code paths of CLUSTER and REINDEX have not been completely thought
through.  This part has been added by 6f97457 :(
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Masahiko Sawada

Date:

02 September 2019, 09:32:59

On Mon, Sep 2, 2019 at 4:59 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote:
> > > I tried 1. and it shown index_rebuild_count, but it also shown
> > > "initializing" phase again and other columns were empty. So, it is
> > > not suitable to fix the problem. :(
> > > I'm going to try 2. and 3., but, unfortunately, I can't get enough
> > > time to do that after PGConf.Asia 2019.
> > > If we selected 3., it affects following these progress reporting
> > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> > > I suppose. Any comments welcome! :)
> >
> > I looked at this open item. I prefer #3 but I think it would be enough
> > to have a small stack using a fixed length array to track nested
> > progress information and the current executing command (maybe 4 or 8
> > would be enough for maximum nested level for now?). That way, we don't
> > need any change the interfaces. AFAICS there is no case where we call
> > only either pgstat_progress_start_command or
> > pgstat_progress_end_command without each other (although
> > pgstat_progress_update_param is called without start). So I think that
> > having a small stack for tracking multiple reports would work.
> >
> > Attached the draft patch that fixes this issue. Please review it.
>
> Do we actually want to show to the user information about CREATE INDEX
> which is different than CLUSTER?  It could be confusing for the user
> to see a progress report from a command different than the one
> actually launched.

I personally think it would be helpful for users. We provide the
progress reporting for each commands but it might not be detailed
enough. But we can provide more details of progress information of
each commands by combining them. Only users who want to confirm the
details need to see different progress reports.

> There could be a method 4 here: do not start a new
> command progress when there is another one already started, and do not
> try to end it in the code path where it could not be started as it did
> not stack.  So while I see the advantages of stacking the progress
> records as you do when doing cascading calls of the progress
> reporting, I am not sure that:
> 1) We should bloat more PgBackendStatus for that.
> 2) We want to add more complication in this area close to the
> release of 12.

I agreed, especially to 2. We can live with #4 method in PG12 and the
patch I proposed could be discussed as a new feature.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] CLUSTER command progress monitor

From

Masahiko Sawada

Date:

03 September 2019, 04:59:00

On Mon, Sep 2, 2019 at 6:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Sep 2, 2019 at 4:59 PM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote:
> > > > I tried 1. and it shown index_rebuild_count, but it also shown
> > > > "initializing" phase again and other columns were empty. So, it is
> > > > not suitable to fix the problem. :(
> > > > I'm going to try 2. and 3., but, unfortunately, I can't get enough
> > > > time to do that after PGConf.Asia 2019.
> > > > If we selected 3., it affects following these progress reporting
> > > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> > > > I suppose. Any comments welcome! :)
> > >
> > > I looked at this open item. I prefer #3 but I think it would be enough
> > > to have a small stack using a fixed length array to track nested
> > > progress information and the current executing command (maybe 4 or 8
> > > would be enough for maximum nested level for now?). That way, we don't
> > > need any change the interfaces. AFAICS there is no case where we call
> > > only either pgstat_progress_start_command or
> > > pgstat_progress_end_command without each other (although
> > > pgstat_progress_update_param is called without start). So I think that
> > > having a small stack for tracking multiple reports would work.
> > >
> > > Attached the draft patch that fixes this issue. Please review it.
> >
> > Do we actually want to show to the user information about CREATE INDEX
> > which is different than CLUSTER?  It could be confusing for the user
> > to see a progress report from a command different than the one
> > actually launched.
>
> I personally think it would be helpful for users. We provide the
> progress reporting for each commands but it might not be detailed
> enough. But we can provide more details of progress information of
> each commands by combining them. Only users who want to confirm the
> details need to see different progress reports.
>
> > There could be a method 4 here: do not start a new
> > command progress when there is another one already started, and do not
> > try to end it in the code path where it could not be started as it did
> > not stack.  So while I see the advantages of stacking the progress
> > records as you do when doing cascading calls of the progress
> > reporting, I am not sure that:
> > 1) We should bloat more PgBackendStatus for that.
> > 2) We want to add more complication in this area close to the
> > release of 12.
>
> I agreed, especially to 2. We can live with #4 method in PG12 and the
> patch I proposed could be discussed as a new feature.

After more thought, even if we don't start a new command progress when
there is another one already started the progress update functions
could be called and these functions don't specify the command type.
Therefore, the progress information might be changed with wrong value
by different command. Probably we can change the caller of progress
updating function so that it doesn't call all of them if the command
could not start a new progress report, but it might be a big change.

As an alternative idea, we can make pgstat_progress_end_command() have
one argument that is command the caller wants to end. That is, we
don't end the command progress when the specified command type doesn't
match to the command type of current running command progress. We
unconditionally clear the progress information at CommitTransaction()
and AbortTransaction() but we can do that by passing
PROGRESS_COMMAND_INVALID to pgstat_progress_end_command().

BTW the following condition in pgstat_progress_end_command() seems to
be wrong. We should return from the function when either
pgstat_track_activities is disabled or the current progress command is
invalid.

       if (!pgstat_track_activities
              && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
              return;

I've attached the patch fixes the issue I newly found.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

fix_progress_end_command.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

03 September 2019, 05:52:28

On Tue, Sep 03, 2019 at 01:59:00PM +0900, Masahiko Sawada wrote:
> After more thought, even if we don't start a new command progress when
> there is another one already started the progress update functions
> could be called and these functions don't specify the command type.
> Therefore, the progress information might be changed with wrong value
> by different command. Probably we can change the caller of progress
> updating function so that it doesn't call all of them if the command
> could not start a new progress report, but it might be a big change.

That's one issue.

> As an alternative idea, we can make pgstat_progress_end_command() have
> one argument that is command the caller wants to end. That is, we
> don't end the command progress when the specified command type doesn't
> match to the command type of current running command progress. We
> unconditionally clear the progress information at CommitTransaction()
> and AbortTransaction() but we can do that by passing
> PROGRESS_COMMAND_INVALID to pgstat_progress_end_command().

Possibly.  I don't dislike the idea of piling up the progress
information for cascading calls and I would use that with a depth
counter and a fixed-size array.

> BTW the following condition in pgstat_progress_end_command() seems to
> be wrong. We should return from the function when either
> pgstat_track_activities is disabled or the current progress command is
> invalid.
>
>        if (!pgstat_track_activities
>               && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
>               return;
>
> I've attached the patch fixes the issue I newly found.

Indeed, good catch.  This is wrong since b6fb647 which has introduced
the progress reports.  I'll fix that one and back-patch if there are
no objections.

With my RMT hat on for v12, I don't think that it is really the moment
to discuss how we want to change this API post beta3, and we have room
for that in future development cycles.  There are quite some questions
which need answers I am unsure of:
- Do we really want to support nested calls of progress reports for
multiple command?
- Perhaps for some commands it makes sense to have an overlap of the
fields used, but we need a clear definition of what can be done or
not.  I am not really comfortable with the idea of having in
reindex_relation() a progress report related only to CLUSTER, which is
also a REINDEX code path.  The semantics shared between both commands
need to be thought a bit more.  For example
PROGRESS_CLUSTER_INDEX_REBUILD_COUNT could cause the system catalog to
report PROGRESS_CREATEIDX_PHASE_WAIT_3 because of an incorrect command
type, which would be just wrong for a CLUSTER command.
- Which command should be reported to the user, only the upper-level
one?
- Perhaps we can live only with the approach of not registering a new
command if one already exists, and actually be more flexible with the
phase fields used, in short we use unique numbers for each phase?

The most conservative bet from a release point of view, and actually
my bet because that's safe, would be to basically revert 6f97457
(CLUSTER), 03f9e5c (REINDEX) and ab0dfc96 (CREATE INDEX which has
overlaps with REINDEX in the build and validation paths).  What I am
really scared of is that we have just barely scratched the surface of
the issues caused by the inter-dependencies between all the code
paths of those commands, and that we have much more waiting behind
this open item.
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

04 September 2019, 06:48:19

On Tue, Sep 03, 2019 at 02:52:28PM +0900, Michael Paquier wrote:
> Indeed, good catch.  This is wrong since b6fb647 which has introduced
> the progress reports.  I'll fix that one and back-patch if there are
> no objections.

OK, applied this part down to 9.6.
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Masahiko Sawada

Date:

04 September 2019, 12:01:42

On Wed, Sep 4, 2019 at 3:48 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Sep 03, 2019 at 02:52:28PM +0900, Michael Paquier wrote:
> > Indeed, good catch.  This is wrong since b6fb647 which has introduced
> > the progress reports.  I'll fix that one and back-patch if there are
> > no objections.
>
> OK, applied this part down to 9.6.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

04 September 2019, 13:18:39

On Tue, Sep 3, 2019 at 1:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> After more thought, even if we don't start a new command progress when
> there is another one already started the progress update functions
> could be called and these functions don't specify the command type.
> Therefore, the progress information might be changed with wrong value
> by different command. Probably we can change the caller of progress
> updating function so that it doesn't call all of them if the command
> could not start a new progress report, but it might be a big change.
>
> As an alternative idea, we can make pgstat_progress_end_command() have
> one argument that is command the caller wants to end. That is, we
> don't end the command progress when the specified command type doesn't
> match to the command type of current running command progress. We
> unconditionally clear the progress information at CommitTransaction()
> and AbortTransaction() but we can do that by passing
> PROGRESS_COMMAND_INVALID to pgstat_progress_end_command().

I think this is all going in the wrong direction.  It's the
responsibility of the people who are calling the pgstat_progress_*
functions to only do so when it's appropriate.  Having the
pgstat_progress_* functions try to untangle whether or not they ought
to ignore calls made to them is backwards.

It seems to me that the problem here can be summarized in this way:
there's a bunch of code reuse in the relevant paths here, and somebody
wasn't careful enough when adding progress reporting for one of the
new commands, and so now things are broken, because e.g. CLUSTER
progress reporting gets ended by a pgstat_progress_end_command() call
that was intended for some other utility command but is reached in the
CLUSTER case anyway.  The solution to that problem in my book is to
figure out which commit broke it, and then the person who made that
commit either needs to fix it or revert it.

It's quite possible here that we need a bigger redesign to make adding
progress reporting for new command easier and less prone to bugs, but
I don't think it's at all clear what such a redesign should look like
or even that we definitely need one, and September is not the right
time to be redesigning features for the pending release.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

05 September 2019, 01:03:16

On Wed, Sep 04, 2019 at 09:18:39AM -0400, Robert Haas wrote:
> I think this is all going in the wrong direction.  It's the
> responsibility of the people who are calling the pgstat_progress_*
> functions to only do so when it's appropriate.  Having the
> pgstat_progress_* functions try to untangle whether or not they ought
> to ignore calls made to them is backwards.

Check.

> It seems to me that the problem here can be summarized in this way:
> there's a bunch of code reuse in the relevant paths here, and somebody
> wasn't careful enough when adding progress reporting for one of the
> new commands, and so now things are broken, because e.g. CLUSTER
> progress reporting gets ended by a pgstat_progress_end_command() call
> that was intended for some other utility command but is reached in the
> CLUSTER case anyway.  The solution to that problem in my book is to
> figure out which commit broke it, and then the person who made that
> commit either needs to fix it or revert it.

I am not sure that it is right as well to say that the first committed
patch is right and that the follow-up ones are wrong.  CLUSTER
progress was committed first (6f97457), followed a couple of days
after by CREATE INDEX (ab0dfc9) and then REINDEX (03f9e5c).  So let's
have a look at them..

For CLUSTER, the progress starts and ends in cluster_rel().  CLUSTER
uses its code paths at the beginning, but then things get more
complicated, particularly with finish_heap_swap() which calls directly
reindex_table().  6f97457 includes one progress update at point which
can be a problem per its shared nature in reindex_relation() with
PROGRESS_CLUSTER_INDEX_REBUILD_COUNT.  This last part is wrong IMO,
why should cluster reporting take priority in this code path,
enforcing anything else?

For CREATE INDEX, the progress reporting starts and ends once in
DefineIndex().  However, we have updates of progress within each index
AM build routine, which could be taken by many code paths.  Is it
actually fine to give priority to CREATE INDEX in those cases?  Those
paths can as well be taken by REINDEX or CLUSTER (right?), so having a
counter for CREATE INDEX looks logically wrong to me.  The part where
we wait for snapshots looks actually good from the perspective of
REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY.

For REINDEX, we have a problematic start progress call in
reindex_index() which is for example called by reindex_relation for
each relation's index for a non-concurrent case (also in
ReindexRelationConcurrently()).  I think that these are incorrect
locations, and I would have placed them in ReindexIndex(),
ReindexTable() and ReindexMultipleTables() so as we avoid anything
low-level.  This has added two calls to pgstat_progress_update_param()
in reindex_index(), which is shared between all.  Why would it be fine
to give the priority to a CREATE INDEX marker here if CLUSTER can also
cross this way?

On top of those issues, I see some problems with the current state of
affairs, and I am repeating myself:
- It is possible that pgstat_progress_update_param() is called for a
given command for a code path taken by a completely different
command, and that we have a real risk of showing a status completely
buggy as the progress phases share the same numbers.
- We don't consider wisely end and start progress handling for
cascading calls, leading to a risk where we start command A, but
for shared code paths where we assume that only command B can run then
the processing abruptly ends for command A.
- Is it actually fine to report information about a command completely
different than the one provided by the client?  It is for example
possible to call CLUSTER, but show up to the user progress report
about PROGRESS_COMMAND_CREATE_INDEX (see reindex_index).  This could
actually make sense if we are able to handle cascading progress
reports.

These are, at least it seems to me, fundamental problems we need to
ponder more about if we begin to include more progress reporting, and
we don't have that now, and that worries me.

> It's quite possible here that we need a bigger redesign to make adding
> progress reporting for new command easier and less prone to bugs, but
> I don't think it's at all clear what such a redesign should look like
> or even that we definitely need one, and September is not the right
> time to be redesigning features for the pending release.

Definitely.
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

05 September 2019, 19:17:51

On Wed, Sep 4, 2019 at 9:03 PM Michael Paquier <michael@paquier.xyz> wrote:
> For CLUSTER, the progress starts and ends in cluster_rel().  CLUSTER
> uses its code paths at the beginning, but then things get more
> complicated, particularly with finish_heap_swap() which calls directly
> reindex_table().  6f97457 includes one progress update at point which
> can be a problem per its shared nature in reindex_relation() with
> PROGRESS_CLUSTER_INDEX_REBUILD_COUNT.  This last part is wrong IMO,
> why should cluster reporting take priority in this code path,
> enforcing anything else?

Oops.  Yeah, that's bogus (as are some of the other things you
mention).  I think we're going to have to fix this by passing down
some flags to these functions to tell them what kind of progress
updates to do (or to do none).  Or else pass down a callback function
and a context object, but that seems like it might be overkill.

> On top of those issues, I see some problems with the current state of
> affairs, and I am repeating myself:
> - It is possible that pgstat_progress_update_param() is called for a
> given command for a code path taken by a completely different
> command, and that we have a real risk of showing a status completely
> buggy as the progress phases share the same numbers.
> - We don't consider wisely end and start progress handling for
> cascading calls, leading to a risk where we start command A, but
> for shared code paths where we assume that only command B can run then
> the processing abruptly ends for command A.

Those are just weaknesses of the infrastructure.  Perhaps there is a
better solution, but there's no intrinsic reason that we can't avoid
them by careful coding.

> - Is it actually fine to report information about a command completely
> different than the one provided by the client?  It is for example
> possible to call CLUSTER, but show up to the user progress report
> about PROGRESS_COMMAND_CREATE_INDEX (see reindex_index).  This could
> actually make sense if we are able to handle cascading progress
> reports.

Well, it might be OK to do that if we're clear that this is the index
progress-reporting view and the command is CLUSTER but it happens to
be building an index now so we're showing it here.  But I don't see
how it would work anyway: you can't reported cascading progress
reports in shared memory because you've only got a fixed amount of
space.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

06 September 2019, 05:44:18

On Thu, Sep 05, 2019 at 03:17:51PM -0400, Robert Haas wrote:
> Oops.  Yeah, that's bogus (as are some of the other things you
> mention).  I think we're going to have to fix this by passing down
> some flags to these functions to tell them what kind of progress
> updates to do (or to do none).  Or else pass down a callback function
> and a context object, but that seems like it might be overkill.

One idea I got was to pass the command ID as an extra argument of the
update routine.  I am not completely sure either if we need this level
of complication.

> Those are just weaknesses of the infrastructure.  Perhaps there is a
> better solution, but there's no intrinsic reason that we can't avoid
> them by careful coding.

Perhaps.  The current infra allows the addition of a progress report
in code paths which are isolated from other things.  For CLUSTER, most
things are fine as long as the progress is updated in cluster_rel(),
the rest is too internal.

> Well, it might be OK to do that if we're clear that this is the index
> progress-reporting view and the command is CLUSTER but it happens to
> be building an index now so we're showing it here.  But I don't see
> how it would work anyway: you can't reported cascading progress
> reports in shared memory because you've only got a fixed amount of
> space.

I don't see exactly why we could not switch to a fixed number of
slots, say 8, with one code path to start a progress which adds an
extra report on the stack, one to remove one entry from the stack, and
a new one to reset the whole thing for a backend.  This would not need
much restructuration of course.

Finally comes the question of what do we do for v12?  I am adding in
CC Peter, Alvaro being already present, who have been involved in the
commits with CREATE INDEX and REINDEX.  It would be sad to revert a
this feature, but well I'd rather do that now than regret later
releasing the feature as it is currently shaped..  Let's see what the
others think.
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

06 September 2019, 05:52:39

On Fri, Sep 06, 2019 at 02:44:18PM +0900, Michael Paquier wrote:
> I don't see exactly why we could not switch to a fixed number of
> slots, say 8, with one code path to start a progress which adds an
> extra report on the stack, one to remove one entry from the stack, and
> a new one to reset the whole thing for a backend.  This would not need
> much restructuration of course.

Wake up, Neo.  Your last sentence is confusing.  I meant that this
would need more design efforts, so that's not in scope for v12.
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

06 September 2019, 12:10:58

On Fri, Sep 6, 2019 at 1:44 AM Michael Paquier <michael@paquier.xyz> wrote:
> One idea I got was to pass the command ID as an extra argument of the
> update routine.  I am not completely sure either if we need this level
> of complication.

I still don't think that's the right approach.

> > Those are just weaknesses of the infrastructure.  Perhaps there is a
> > better solution, but there's no intrinsic reason that we can't avoid
> > them by careful coding.
>
> Perhaps.  The current infra allows the addition of a progress report
> in code paths which are isolated from other things.  For CLUSTER, most
> things are fine as long as the progress is updated in cluster_rel(),
> the rest is too internal.

It's fine if things are updated as well -- it's just you need to make
sure that those places know whether or not they are supposed to be
doing those updates. Again, why can't we just pass down a value
telling them "do reindex-style progress updates" or "do cluster-style
progress updates" or "do no progress updates"?

> > Well, it might be OK to do that if we're clear that this is the index
> > progress-reporting view and the command is CLUSTER but it happens to
> > be building an index now so we're showing it here.  But I don't see
> > how it would work anyway: you can't reported cascading progress
> > reports in shared memory because you've only got a fixed amount of
> > space.
>
> I don't see exactly why we could not switch to a fixed number of
> slots, say 8, with one code path to start a progress which adds an
> extra report on the stack, one to remove one entry from the stack, and
> a new one to reset the whole thing for a backend.  This would not need
> much restructuration of course.

You could do that, but I don't think it's probably that great of an
idea.  Now you've built something which is significantly more complex
than the original design of this feature, but still not good enough to
report on the progress of a query tree.  I tend to think we should
confine ourselves to the progress reporting that can reasonably be
done within the current infrastructure until somebody invents a really
general mechanism that can handle, essentially, an EXPLAIN-on-the-fly
of a current query tree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera from 2ndQuadrant

Date:

06 September 2019, 14:27:02

On 2019-Sep-06, Michael Paquier wrote:

> Finally comes the question of what do we do for v12?  I am adding in
> CC Peter, Alvaro being already present, who have been involved in the
> commits with CREATE INDEX and REINDEX.  It would be sad to revert a
> this feature, but well I'd rather do that now than regret later
> releasing the feature as it is currently shaped..  Let's see what the
> others think.

As far as I understand, CREATE INDEX is not affected -- only REINDEX is.
Of course, it would be sad to revert even the latter, but it's not as
bleak as reverting the whole thing.

That said, I did spend some time on this type of issue when doing CREATE
INDEX support; you can tell because I defined the columns for block
numbers in a scan separately from CREATE INDEX specific fields,
precisely to avoid multiple commands running concurrently from
clobbering unrelated columns:

/* Block numbers in a generic relation scan */
#define PROGRESS_SCAN_BLOCKS_TOTAL                15
#define PROGRESS_SCAN_BLOCKS_DONE                16

I would say that it's fairly useful to have CLUSTER report progress on
indexes being created underneath, but I understand that it might be too
late to be designing the CLUSTER report to take advantage of the CREATE
INDEX metrics.

I think a workable, not terribly invasive approach is to have REINDEX
process its commands conditionally: have the caller indicate whether
progress is to be reported, and skip the calls if not.  That would
(should) prevent it from clobbering the state set up by CLUSTER.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

09 September 2019, 07:31:59

On Fri, Sep 06, 2019 at 10:27:02AM -0400, Alvaro Herrera from 2ndQuadrant wrote:
> That said, I did spend some time on this type of issue when doing CREATE
> INDEX support; you can tell because I defined the columns for block
> numbers in a scan separately from CREATE INDEX specific fields,
> precisely to avoid multiple commands running concurrently from
> clobbering unrelated columns:
>
> /* Block numbers in a generic relation scan */
> #define PROGRESS_SCAN_BLOCKS_TOTAL                15
> #define PROGRESS_SCAN_BLOCKS_DONE                16

Hm.  It is not really clear what is the intention by looking at the
contents progress.h.

> I would say that it's fairly useful to have CLUSTER report progress on
> indexes being created underneath, but I understand that it might be too
> late to be designing the CLUSTER report to take advantage of the CREATE
> INDEX metrics.

The same can be said about the reporting done in reindex_relation for
PROGRESS_CLUSTER_INDEX_REBUILD_COUNT.  I think that it should be
removed for now.

> I think a workable, not terribly invasive approach is to have REINDEX
> process its commands conditionally: have the caller indicate whether
> progress is to be reported, and skip the calls if not.  That would
> (should) prevent it from clobbering the state set up by CLUSTER.

So, you would basically add an extra flag in the options of
reindex_index() to decide if a progress report should be started or
not?  I am not a fan of that because it does not take care of the root
issue which is that the start of the progress reports is too much
internal.  I think that it would be actually less error prone to move
the start of the progress reporting for REINDEX out of reindex_index()
and start it at a higher level.  Looking again at the code, I would
recommend that we should start the progress in ReindexIndex() before
calling reindex_index(), ReindexMultipleTables() before calling
reindex_relation() and ReindexRelationConcurrently(), and
ReindexTable() before calling reindex_relation().  That will avoid
each command to clobber each other's in-progress reports.

It would be also very good to document clearly how the overlaps for
the progress parameter values are not happening.  For example for
CREATE INDEX, we don't know why 1, 2 and 7 are not used.

Note that there is an ID collision with PROGRESS_CREATEIDX_INDEX_OID
updated in reindex_index() and the CLUSTER part
PROGRESS_CLUSTER_HEAP_BLKS_SCANNED..  There could be an argument to
use a completely different range of IDs for each command phase to
avoid any overlap, but it is scary to consider that we may not have
found all the issues with one command cloberring another one's
state..
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Peter Geoghegan

Date:

09 September 2019, 22:13:10

On Fri, Sep 6, 2019 at 5:11 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Sep 6, 2019 at 1:44 AM Michael Paquier <michael@paquier.xyz> wrote:
> > I don't see exactly why we could not switch to a fixed number of
> > slots, say 8, with one code path to start a progress which adds an
> > extra report on the stack, one to remove one entry from the stack, and
> > a new one to reset the whole thing for a backend.  This would not need
> > much restructuration of course.
>
> You could do that, but I don't think it's probably that great of an
> idea.  Now you've built something which is significantly more complex
> than the original design of this feature, but still not good enough to
> report on the progress of a query tree.  I tend to think we should
> confine ourselves to the progress reporting that can reasonably be
> done within the current infrastructure until somebody invents a really
> general mechanism that can handle, essentially, an EXPLAIN-on-the-fly
> of a current query tree.

+1. Let's not complicate the progress reporting infrastructure for an
uncertain benefit.

CLUSTER/VACUUM FULL is fundamentally an awkward utility command to
target with progress reporting infrastructure. I think that it's okay
to redefine how progress reporting works with CLUSTER now, in order to
fix the REINDEX/CLUSTER state clobbering bug.

-- 
Peter Geoghegan

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

13 September 2019, 06:48:51

On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote:
> It's fine if things are updated as well -- it's just you need to make
> sure that those places know whether or not they are supposed to be
> doing those updates. Again, why can't we just pass down a value
> telling them "do reindex-style progress updates" or "do cluster-style
> progress updates" or "do no progress updates"?

That's invasive.  CREATE INDEX reporting goes pretty deep into the
tree, with steps dedicated to the builds and scans of btree (nbtsort.c
for example) and hash index AMs.  In this case we have something that
does somewhat what you are looking for with report_progress which gets
set to true only for VACUUM.  If we were to do something like that, we
would also need to keep some sort of mapping regarding which command
ID (as defined by ProgressCommandType) is able to use which set of
parameter flags (as defined by the arguments of
pgstat_progress_update_param() and its multi_* cousin).  Then comes
the issue that some parameters may be used by multiple command types,
while other don't (REINDEX and CREATE INDEX have some shared
mapping).
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

13 September 2019, 12:12:23

On Fri, Sep 13, 2019 at 2:49 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote:
> > It's fine if things are updated as well -- it's just you need to make
> > sure that those places know whether or not they are supposed to be
> > doing those updates. Again, why can't we just pass down a value
> > telling them "do reindex-style progress updates" or "do cluster-style
> > progress updates" or "do no progress updates"?
>
> That's invasive.  CREATE INDEX reporting goes pretty deep into the
> tree, with steps dedicated to the builds and scans of btree (nbtsort.c
> for example) and hash index AMs.  In this case we have something that
> does somewhat what you are looking for with report_progress which gets
> set to true only for VACUUM.  If we were to do something like that, we
> would also need to keep some sort of mapping regarding which command
> ID (as defined by ProgressCommandType) is able to use which set of
> parameter flags (as defined by the arguments of
> pgstat_progress_update_param() and its multi_* cousin).  Then comes
> the issue that some parameters may be used by multiple command types,
> while other don't (REINDEX and CREATE INDEX have some shared
> mapping).

Well, if CREATE INDEX progress reporting can't be reasonably done
within the current infrastructure, then it should be reverted for v12
and not committed again until somebody upgrades the infrastructure.

I admit that I was a bit suspicious about that commit, but I figured
Alvaro knew what he was doing and didn't study it in any depth.  And
perhaps he did know what he was doing and will disagree with your
assessment. But so far I haven't heard an idea for evolving the
infrastructure that sounds more than half-baked.

It's generally clear, though, that the existing infrastructure is not
well-suited to progress reporting for code that bounces all over the
tree.  It's not clear that *any* infrastructure is *entirely*
well-suited to do that; such problems are inherently complex. But this
is just a very simple system which was designed to be simple and very
low cost to use, and it may be that it's been stretched outside of its
comfort zone.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera

Date:

13 September 2019, 16:03:06

On 2019-Sep-13, Robert Haas wrote:

> On Fri, Sep 13, 2019 at 2:49 AM Michael Paquier <michael@paquier.xyz> wrote:
> > On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote:
> > > It's fine if things are updated as well -- it's just you need to make
> > > sure that those places know whether or not they are supposed to be
> > > doing those updates. Again, why can't we just pass down a value
> > > telling them "do reindex-style progress updates" or "do cluster-style
> > > progress updates" or "do no progress updates"?
> >
> > That's invasive.  CREATE INDEX reporting goes pretty deep into the
> > tree, with steps dedicated to the builds and scans of btree (nbtsort.c
> > for example) and hash index AMs.

> Well, if CREATE INDEX progress reporting can't be reasonably done
> within the current infrastructure, then it should be reverted for v12
> and not committed again until somebody upgrades the infrastructure.

Ummm ... I've been operating --in this thread-- under the assumption
that it is REINDEX to blame for this problem, not CREATE INDEX, because
my recollection is that I tested CREATE INDEX together with CLUSTER and
it worked fine.  Has anybody done any actual research that the problem
is to blame on CREATE INDEX and not REINDEX?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

From

Robert Haas

Date:

13 September 2019, 16:48:40

On Fri, Sep 13, 2019 at 12:03 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Ummm ... I've been operating --in this thread-- under the assumption
> that it is REINDEX to blame for this problem, not CREATE INDEX, because
> my recollection is that I tested CREATE INDEX together with CLUSTER and
> it worked fine.  Has anybody done any actual research that the problem
> is to blame on CREATE INDEX and not REINDEX?

I am not sure. I think, though, that the point is that all three
commands rebuild indexes. So unless they all expect the same things in
terms of which counters get set during that process, things will not
work correctly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera

Date:

13 September 2019, 18:00:25

Hello Tatsuro,

On 2019-Aug-13, Tatsuro Yamada wrote:

> On 2019/08/02 3:43, Alvaro Herrera wrote:
> > Hmm, I'm trying this out now and I don't see the index_rebuild_count
> > ever go up.  I think it's because the indexes are built using parallel
> > index build ... or maybe it was the table AM changes that moved things
> > around, not sure.  There's a period at the end when the CLUSTER command
> > keeps working, but it's gone from pg_stat_progress_cluster.
> 
> Thanks for your report.
> I'll investigate it. :)

I have fixed it.  Can you please verify?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

14 September 2019, 02:45:47

On Fri, Sep 13, 2019 at 12:48:40PM -0400, Robert Haas wrote:
> On Fri, Sep 13, 2019 at 12:03 PM Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> Ummm ... I've been operating --in this thread-- under the assumption
>> that it is REINDEX to blame for this problem, not CREATE INDEX, because
>> my recollection is that I tested CREATE INDEX together with CLUSTER and
>> it worked fine.  Has anybody done any actual research that the problem
>> is to blame on CREATE INDEX and not REINDEX?
>
> I am not sure. I think, though, that the point is that all three
> commands rebuild indexes. So unless they all expect the same things in
> terms of which counters get set during that process, things will not
> work correctly.

I have provided a short summary of the two issues on the open item
page (https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items) as
the open item was too much evasive.  Here is a copy-paste for the
archives of what I wrote:
1) A progress may be started while another one is already in progress.
Hence, if progress gets stopped the previously-started state is
removed, causing all follow-up updates to not happen.
2) Progress updates happening in a code path shared between those
three commands may clobber a previous state present.

Regarding 1) and based on what I found in the code, you can blame
REINDEX reporting which has added progress_start calls in code paths
which are also taken by CREATE INDEX and CLUSTER, causing their
progress reporting to go to the void.  In order to fix this one we
could do what I summarized in [1].

As mentioned by Robert, the problem summarized in 2) is much more
complex using the current infrastructure, and one could blame all the
commands per the way they do not share the same set of progress
phases.  There are a couple of potential solutions which have been
discussed on the thread:
- Allow commands to share the same set of phases, which requires some
kind of mapping between the phases (?).
- Allow progress reports to become a stack.  This would also take care
of any kind of issues in 1) for the future, and this can cause the
incorrect command to be reported to pg_stat_activity if not being
careful.
- Allow only reporting for a given command ID, which would basically
require to pass down the command ID to progress update APIs and bypass
an update if the command ID provided by caller does not match the
existing one started (?).

1) is pretty easy to fix based on the current state of the code, 2)
requires much more consideration, and that's no material for v12.  It
could be perfectly possible as well that we have another solution not
discussed yet on this thread.

[1]: https://www.postgresql.org/message-id/20190905010316.GB14853@paquier.xyz
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Tattsu Yama

Date:

14 September 2019, 04:06:32

Hi Alvaro!

Hello Tatsuro,
On 2019-Aug-13, Tatsuro Yamada wrote:
> On 2019/08/02 3:43, Alvaro Herrera wrote:
> > Hmm, I'm trying this out now and I don't see the index_rebuild_count
> > ever go up. I think it's because the indexes are built using parallel
> > index build ... or maybe it was the table AM changes that moved things
> > around, not sure. There's a period at the end when the CLUSTER command
> > keeps working, but it's gone from pg_stat_progress_cluster.
>
> Thanks for your report.
> I'll investigate it. :)

I have fixed it. Can you please verify?

Thanks! I can review your patch for fix it.

However, I was starting fixing the problem from the last day of PGConf.Asia (11 Sep).
Attached file is WIP patch.In my patch, I added "command id" to all APIs of progress reporting to isolate commands. Therefore, it doesn't allow to cascade updating system views. And my patch is on WIP so it needs clean-up and test.
I share it anyway. :)

Here is a test result of my patch.

The last column index_rebuild count is increased.

========================================

postgres=# select * from pg_stat_progress_cluster ; \watch 0.001;
11636|13591|postgres|16384|CLUSTER|initializing|0|0|0|0|0|0
11636|13591|postgres|16384|CLUSTER|index scanning heap|16389|251|251|0|0|0
...

11636|13591|postgres|16384|CLUSTER|index scanning heap|16389|10000|10000|0|0|0
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|10000|10000|0|0|0...

11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|10000|10000|0|0|1
...

11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|10000|10000|0|0|2
...

11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|10000|10000|0|0|3
...

11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|10000|10000|0|0|4
...

11636|13591|postgres|16384|CLUSTER|performing final cleanup|16389|10000|10000|0|0|5

========================================

Thanks,

Tatsuro Yamada

Attachment

v1_fix_progress_report_for_cluster.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

14 September 2019, 04:30:44

On Sat, Sep 14, 2019 at 01:06:32PM +0900, Tattsu Yama wrote:
> Thanks! I can review your patch for fix it.
> However, I was starting fixing the problem from the last day of PGConf.Asia
> (11 Sep).
> Attached file is WIP patch.In my patch, I added "command id" to all APIs of
> progress reporting to isolate commands. Therefore, it doesn't allow to
> cascade updating system views. And my patch is on WIP so it needs clean-up
> and test.
> I share it anyway. :)

+       if (cmdtype == PROGRESS_COMMAND_INVALID || beentry->st_progress_command == cmdtype)
+       {
+               PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+               beentry->st_progress_param[index] = val;
+               PGSTAT_END_WRITE_ACTIVITY(beentry);
+       }
You basically don't need the progress reports if the command ID is
invalid, no?

Another note is that you don't actually fix the problems related to
the calls of pgstat_progress_end_command() which have been added for
REINDEX reporting, so a progress report started for CLUSTER can get
ended earlier than expected, preventing the follow-up progress updates
to show up.
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Tattsu Yama

Date:

15 September 2019, 03:35:10

Hi Michael!

> Attached file is WIP patch.In my patch, I added "command id" to all APIs of
> progress reporting to isolate commands. Therefore, it doesn't allow to
> cascade updating system views. And my patch is on WIP so it needs clean-up
> and test.
> I share it anyway. :)

+ if (cmdtype == PROGRESS_COMMAND_INVALID || beentry->st_progress_command == cmdtype)
+ {
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->st_progress_param[index] = val;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+ }
You basically don't need the progress reports if the command ID is
invalid, no?

Ah, right.

I'll check and fix that today. :)

Another note is that you don't actually fix the problems related to
the calls of pgstat_progress_end_command() which have been added for
REINDEX reporting, so a progress report started for CLUSTER can get
ended earlier than expected, preventing the follow-up progress updates
to show up.

Hmm... I fixed the problem. Please confirm the test result repeated below.

CLUSTER is able to get the last phase: performing final clean up by using the patch.

# Test result

========================================

11636|13591|postgres|16384|CLUSTER|index scanning heap|16389|10000|10000|0|0|0
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|10000|10000|0|0|0 <== The last column rebuild_index_count is increasing!

11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|10000|10000|0|0|1

11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|10000|10000|0|0|2

11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|10000|10000|0|0|3

11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|10000|10000|0|0|4

11636|13591|postgres|16384|CLUSTER|performing final cleanup|16389|10000|10000|0|0|5 <== The last phase of CLUSTER!

========================================

Thanks,

Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Tattsu Yama

Date:

16 September 2019, 06:26:10

Hi Michael,

> Attached file is WIP patch.In my patch, I added "command id" to all APIs of
> progress reporting to isolate commands. Therefore, it doesn't allow to
> cascade updating system views. And my patch is on WIP so it needs clean-up
> and test.
> I share it anyway. :)

+ if (cmdtype == PROGRESS_COMMAND_INVALID || beentry->st_progress_command == cmdtype)
+ {
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->st_progress_param[index] = val;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+ }
You basically don't need the progress reports if the command ID is
invalid, no?

Ah, right.
I'll check and fix that today. :)

I fixed the patch based on your comment.

Please find attached file. :)

I should have explained the API changes more. I added cmdtype as a given parameter for all functions (See below).

Therefore, I suppose that my patch is similar to the following fix as you mentioned on -hackers.

- Allow only reporting for a given command ID, which would basically
require to pass down the command ID to progress update APIs and bypass an update
if the command ID provided by caller does not match the existing one started (?).

#pgstat.c

pgstat_progress_start_command(ProgressCommandType cmdtype,...)
- Progress reporter starts when beentry->st_progress_command is PROGRESS_COMMAND_INVALID

pgstat_progress_end_command(ProgressCommandType cmdtype,...)
- Progress reporter ends when beentry->st_progress_command equals cmdtype

pgstat_progress_update_param(ProgressCommandType cmdtype,...) and
pgstat_progress_update_multi_param(ProgressCommandType cmdtype,...)
- Progress reporter updates parameters if beentry->st_progress_command equals cmdtype

Note:

cmdtype means the ProgressCommandType below:

# pgstat.h

typedef enum ProgressCommandType
{
PROGRESS_COMMAND_INVALID,
PROGRESS_COMMAND_VACUUM,
PROGRESS_COMMAND_CLUSTER,
PROGRESS_COMMAND_CREATE_INDEX
} ProgressCommandType;

Thanks,

Tatsuro Yamada

Attachment

v2_fix_progress_report_for_cluster.patch

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera

Date:

16 September 2019, 14:12:15

On 2019-Sep-16, Tattsu Yama wrote:

> I should have explained the API changes more. I added cmdtype as a given
> parameter for all functions (See below).
> Therefore, I suppose that my patch is similar to the following fix as you
> mentioned on -hackers.

Is this fix strictly necessary for pg12, or is this something that we
can leave for pg13?


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

17 September 2019, 02:01:28

Hi Alvaro,

On 2019/09/16 23:12, Alvaro Herrera wrote:
> On 2019-Sep-16, Tattsu Yama wrote:
> 
>> I should have explained the API changes more. I added cmdtype as a given
>> parameter for all functions (See below).
>> Therefore, I suppose that my patch is similar to the following fix as you
>> mentioned on -hackers.
> 
> Is this fix strictly necessary for pg12, or is this something that we
> can leave for pg13?

Not only me but many DBA needs this progress report feature on PG12,
therefore I'm trying to fix the problem. If you send other patch to
fix the problem, and it is more elegant than mine, I can withdraw my patch.
Anyway, I want to avoid this feature being reverted.
Do you have any ideas to fix the problem?

Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera

Date:

17 September 2019, 02:08:32

On 2019-Sep-17, Tatsuro Yamada wrote:

> On 2019/09/16 23:12, Alvaro Herrera wrote:

> > Is this fix strictly necessary for pg12, or is this something that we
> > can leave for pg13?
> 
> Not only me but many DBA needs this progress report feature on PG12,
> therefore I'm trying to fix the problem. If you send other patch to
> fix the problem, and it is more elegant than mine, I can withdraw my patch.
> Anyway, I want to avoid this feature being reverted.
> Do you have any ideas to fix the problem?

I committed a fix for the originally reported problem as da47e43dc32e in
branch REL_12_STABLE.  Is that insufficient, and if so why?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

17 September 2019, 02:34:21

Hi Alvaro!

>>> Is this fix strictly necessary for pg12, or is this something that we
>>> can leave for pg13?
>>
>> Not only me but many DBA needs this progress report feature on PG12,
>> therefore I'm trying to fix the problem. If you send other patch to
>> fix the problem, and it is more elegant than mine, I can withdraw my patch.
>> Anyway, I want to avoid this feature being reverted.
>> Do you have any ideas to fix the problem?
> 
> I committed a fix for the originally reported problem as da47e43dc32e in
> branch REL_12_STABLE.  Is that insufficient, and if so why?


Ooops, I misunderstood. I now realized you committed your patch to
fix the problem. Thanks! I'll test it later. :)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=da47e43dc32e3c5916396f0cbcfa974b371e4875


Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Tatsuro Yamada

Date:

17 September 2019, 03:30:12

Hi Alvaro!

>>>> Is this fix strictly necessary for pg12, or is this something that we
>>>> can leave for pg13?
>>>
>>> Not only me but many DBA needs this progress report feature on PG12,
>>> therefore I'm trying to fix the problem. If you send other patch to
>>> fix the problem, and it is more elegant than mine, I can withdraw my patch.
>>> Anyway, I want to avoid this feature being reverted.
>>> Do you have any ideas to fix the problem?
>>
>> I committed a fix for the originally reported problem as da47e43dc32e in
>> branch REL_12_STABLE.  Is that insufficient, and if so why?
> 
> 
> Ooops, I misunderstood. I now realized you committed your patch to
> fix the problem. Thanks! I'll test it later. :)
> 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=da47e43dc32e3c5916396f0cbcfa974b371e4875


I tested your patch (da47e43d) and it works fine. Thanks! :)
So, my patch improving progress reporting API can leave for PG13.


#Test scenario
===================
[Session #1]
select * from pg_stat_progress_cluster ; \watch 0.0001

[Session #2]
create table hoge as select a from generate_series(1, 100000) a;
create index ind_hoge1 on hoge(a);
create index ind_hoge2 on hoge((a%2));
create index ind_hoge3 on hoge((a%3));
create index ind_hoge4 on hoge((a%4));
create index ind_hoge5 on hoge((a%5));
cluster hoge using ind_hoge1;
===================

#Test result
===================
22283|13593|postgres|16384|CLUSTER|initializing|0|0|0|0|0|0
...
22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|100000|100000|0|0|0 <= Increasing from 0 to 5
22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|100000|100000|0|0|1
22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|100000|100000|0|0|2
22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|100000|100000|0|0|3
22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|100000|100000|0|0|4
22283|13593|postgres|16384|CLUSTER|performing final cleanup|16387|100000|100000|0|0|5
===================


Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

17 September 2019, 05:13:39

On Mon, Sep 16, 2019 at 03:26:10PM +0900, Tattsu Yama wrote:
> I should have explained the API changes more. I added cmdtype as a given
> parameter for all functions (See below).
> Therefore, I suppose that my patch is similar to the following fix as you
> mentioned on -hackers.

Yes, that's an option I mentioned here, but it has drawbacks:
https://www.postgresql.org/message-id/20190914024547.GB15406@paquier.xyz

I have just looked at it again after a small rebase and there are
issues with the design of your patch:
- When aborting a transaction, we need to enforce a reset of the
command ID used in st_progress_command to be PROGRESS_COMMAND_INVALID.
Unfortunately, your patch does not consider the case where an error
happens while a command ID is set, causing a command to still be
tracked with the next transactions of the session.  Even worse, it
prevents pgstat_progress_start_command() to be called again in this
case for another command.
- CLUSTER can rebuild indexes, and we'd likely want to be able to
track some of the information from CREATE INDEX for CLUSTER.

The second issue is perhaps fine as it is not really straight-forward
to share the same progress phases across multiple commands, and we
could live without it for now, or require a follow-up patch to make
the information of CREATE INDEX available to CLUSTER.

Now, the first issue is of another caliber and a no-go :(

On HEAD, pgstat_progress_end_command() has the limitation to not be
able to stack multiple commands, so calling it in cascade has the
disadvantage to perhaps erase the progress state of a command (and it
is not designed for that anyway), which is what happens with CLUSTER
when reindex_index() starts a new progress report, but the simplicity
of the current infrastructure is very safe when it comes to failure
handling, to make sure that an reset happens as long as the command ID
is not invalid.  Your patch makes that part unpredictable.
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

18 September 2019, 01:21:04

On Sat, Sep 14, 2019 at 11:45:47AM +0900, Michael Paquier wrote:
> I have provided a short summary of the two issues on the open item
> page (https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items) as
> the open item was too much evasive.  Here is a copy-paste for the
> archives of what I wrote:
> 1) A progress may be started while another one is already in progress.
> Hence, if progress gets stopped the previously-started state is
> removed, causing all follow-up updates to not happen.
> 2) Progress updates happening in a code path shared between those
> three commands may clobber a previous state present.
>
> Regarding 1) and based on what I found in the code, you can blame
> REINDEX reporting which has added progress_start calls in code paths
> which are also taken by CREATE INDEX and CLUSTER, causing their
> progress reporting to go to the void.  In order to fix this one we
> could do what I summarized in [1].
>
> [1]: https://www.postgresql.org/message-id/20190905010316.GB14853@paquier.xyz

So, with the clock ticking and the release getting close by, what do
we do for this set of issues?  REINDEX, CREATE INDEX and CLUSTER all
try to build indexes and the current infrastructure is not really
adapted to hold all that.  Robert, Alvaro and Peter E, do you have any
comments to offer?
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera

Date:

18 September 2019, 01:50:22

On 2019-Sep-18, Michael Paquier wrote:

> So, with the clock ticking and the release getting close by, what do
> we do for this set of issues?  REINDEX, CREATE INDEX and CLUSTER all
> try to build indexes and the current infrastructure is not really
> adapted to hold all that.  Robert, Alvaro and Peter E, do you have any
> comments to offer?

Which part of it is not already fixed?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

From

Michael Paquier

Date:

18 September 2019, 02:10:31

On Tue, Sep 17, 2019 at 10:50:22PM -0300, Alvaro Herrera wrote:
> On 2019-Sep-18, Michael Paquier wrote:
>> So, with the clock ticking and the release getting close by, what do
>> we do for this set of issues?  REINDEX, CREATE INDEX and CLUSTER all
>> try to build indexes and the current infrastructure is not really
>> adapted to hold all that.  Robert, Alvaro and Peter E, do you have any
>> comments to offer?
>
> Which part of it is not already fixed?

I can still see at least two problems.  There is one issue with
pgstat_progress_update_param() which gets called in reindex_table()
for a progress phase of CLUSTER, and this even if
REINDEXOPT_REPORT_PROGRESS is not set in the options.  Also it seems
to me that the calls to pgstat_progress_start_command() and
pgstat_progress_end_command() are at incorrect locations for
reindex_index() and that those should be one level higher on the stack
to avoid any kind of interactions with another command whose progress
has already started.
--
Michael

Attachment

signature.asc

Re: [HACKERS] CLUSTER command progress monitor

From

Alvaro Herrera

Date:

20 September 2019, 16:49:47

On 2019-Sep-18, Michael Paquier wrote:

> On Tue, Sep 17, 2019 at 10:50:22PM -0300, Alvaro Herrera wrote:
> > On 2019-Sep-18, Michael Paquier wrote:
> >> So, with the clock ticking and the release getting close by, what do
> >> we do for this set of issues?  REINDEX, CREATE INDEX and CLUSTER all
> >> try to build indexes and the current infrastructure is not really
> >> adapted to hold all that.  Robert, Alvaro and Peter E, do you have any
> >> comments to offer?
> > 
> > Which part of it is not already fixed?
> 
> I can still see at least two problems.  There is one issue with
> pgstat_progress_update_param() which gets called in reindex_table() 
> for a progress phase of CLUSTER, and this even if
> REINDEXOPT_REPORT_PROGRESS is not set in the options.

(You mean reindex_relation.)

... but that param update is there for CLUSTER, not for REINDEX, so if
we made it dependent on the option to turn on CREATE INDEX progress
updates, it would break CLUSTER progress reporting.  Also, the parameter
being updated is not used by CREATE INDEX, so there's no spurious
change.  I think this ain't broke, and thus it don't need fixin'.

If anything, I would like the CLUSTER report to show index creation
progress, which would go the opposite way.  But that seems a patch for
pg13.

> Also it seems
> to me that the calls to pgstat_progress_start_command() and
> pgstat_progress_end_command() are at incorrect locations for
> reindex_index() and that those should be one level higher on the stack
> to avoid any kind of interactions with another command whose progress
> has already started.

That doesn't work, because the caller doesn't have the OID of the table,
which pgstat_progress_start_command() needs.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services