Thread: [HACKERS] CLUSTER command progress monitor
Hi, Following is a proposal for reporting the progress of CLUSTER command: It seems that the following could be the phases of CLUSTER processing: 1. scanning heap 2. sort tuples 3. writing new heap 4. scan heap and write new heap 5. swapping relation files 6. rebuild index 7. performing final cleanup These phases are based on Rahila's presentation at PGCon 2017 (https://www.pgcon.org/2017/schedule/attachments/472_Progress%20Measurement%20PostgreSQL.pdf) and I added some phases on it. CLUSTER command may use Index Scan or Seq Scan when scanning the heap. Depending on which one is chosen, the command will proceed in the following sequence of phases: * Seq Scan 1. scanning heap 2. sort tuples 3. writing new heap 5. swapping relation files 6. rebuild index 7. performing final cleanup * Index Scan 4. scan heap and write new heap 5. swapping relation files 6. rebuild index 7. performing final cleanup The view provides the information of CLUSTER command progress details as follows postgres=# \d pg_stat_progress_cluster View "pg_catalog.pg_stat_progress_cluster" Column | Type | Collation | Nullable | Default ---------------------+---------+-----------+----------+--------- pid | integer | | | datid | oid | | | datname | name | | | relid | oid | | | phase | text | | | scan_method | text | | | scan_index_relid | bigint | | | heap_tuples_total | bigint | | | heap_tuples_scanned | bigint | | | Then I have questions. * Should we have separate views for them? Or should both be covered by the same view with some indication of which command (CLUSTER or VACUUM FULL) is actually running? I mean this progress monitor could be covering not only CLUSTER command but also VACUUM FULL command. * I chose tuples as scan heap's counter (heap_tuples_scanned) since it's not easy to get current blocks from Index Scan. Is it Ok? I'll add this patch to CF2017-09. Any comments or suggestion are welcome. Regards, Tatsuro Yamada NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Aug 31, 2017 at 2:12 PM, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > Any comments or suggestion are welcome. Although this patch updates src/test/regress/expected/rules.out I think perhaps you included the wrong version? That regression test fails for me -- Thomas Munro http://www.enterprisedb.com
Hi Thomas, >> Any comments or suggestion are welcome. > > Although this patch updates src/test/regress/expected/rules.out I > think perhaps you included the wrong version? That regression test > fails for me Thanks for the comment. I use the patch on 7b69b6ce and it's fine. Did you use "initdb" command after "make install"? The pg_stat_progress_cluster view is created in initdb, probably. Regards, Tatsuro Yamada
On Fri, Sep 1, 2017 at 3:38 PM, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > Hi Thomas, > >>> Any comments or suggestion are welcome. >> >> >> Although this patch updates src/test/regress/expected/rules.out I >> think perhaps you included the wrong version? That regression test >> fails for me > > > Thanks for the comment. > > I use the patch on 7b69b6ce and it's fine. > Did you use "initdb" command after "make install"? > The pg_stat_progress_cluster view is created in initdb, probably. > I also got a regression test error (applied to abe85ef). Here is regression.diff file. *** /home/masahiko/source/postgresql/src/test/regress/expected/rules.out 2017-09-01 17:27:33.680055612 -0700 --- /home/masahiko/source/postgresql/src/test/regress/results/rules.out 2017-09-01 17:28:10.410055596 -0700 *************** *** 1819,1824 **** --- 1819,1849 ---- pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, pg_stat_get_db_conflict_startup_deadlock(d.oid)AS confl_deadlock FROM pg_database d; + pg_stat_progress_cluster| SELECT s.pid, + s.datid, + d.datname, + s.relid, + CASE s.param1 + WHEN 0 THEN 'initializing'::text + WHEN 1 THEN 'scanning heap'::text + WHEN 2 THEN 'sorting tuples'::text + WHEN 3 THEN 'writing new heap'::text + WHEN 4 THEN 'scan heap and write new heap'::text + WHEN 5 THEN 'swapping relation files'::text + WHEN 6 THEN 'rebuilding index'::text + WHEN 7 THEN 'performing final cleanup'::text + ELSE NULL::text + END AS phase, + CASE s.param2 + WHEN 0 THEN 'index scan'::text + WHEN 1 THEN 'seq scan'::text + ELSE NULL::text + END AS scan_method, + s.param3 AS scan_index_relid, + s.param4 AS heap_tuples_total, + s.param5 AS heap_tuples_scanned + FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10) + LEFT JOIN pg_database d ON ((s.datid = d.oid))); pg_stat_progress_vacuum| SELECT s.pid, s.datid, d.datname, *************** *** 1841,1871 **** s.param7 AS num_dead_tuples FROM (pg_stat_get_progress_info('VACUUM'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10) LEFT JOIN pg_database d ON ((s.datid = d.oid))); - pg_stat_progress_cluster| SELECT - s.pid, - s.datid, - d.datname, - s.relid, - CASE s.param1 - WHEN 0 THEN 'initializing'::text - WHEN 1 THEN 'scanning heap'::text - WHEN 2 THEN 'sorting tuples'::text - WHEN 3 THEN 'writing new heap'::text - WHEN 4 THEN 'scan heap and write new heap'::text - WHEN 5 THEN 'swapping relation files'::text - WHEN 6 THEN 'rebuilding index'::text - WHEN 7 THEN 'performing final cleanup'::text - ELSE NULL::text - END AS phase, - CASE S.param2 - WHEN 0 THEN 'index scan' - WHEN 1 THEN 'seq scan' - END AS scan_method, - s.param3 AS index_relid, - s.param4 AS heap_blks_total, - s.param5 AS heap_blks_scanned - FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5) - LEFT JOIN pg_database d ON ((s.datid = d.oid))); pg_stat_replication| SELECT s.pid, s.usesysid, u.rolnameAS usename, --- 1866,1871 ---- ====================================================================== Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi Sawada-san, Thomas, Thanks for sharing the reggression.diff. I realized Thomas's comment is right. Attached patch is fixed version. Could you try it? Regards, Tatsuro Yamada NTT Open Source Software Center On 2017/09/01 17:59, Masahiko Sawada wrote: > On Fri, Sep 1, 2017 at 3:38 PM, Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> Hi Thomas, >> >>>> Any comments or suggestion are welcome. >>> >>> >>> Although this patch updates src/test/regress/expected/rules.out I >>> think perhaps you included the wrong version? That regression test >>> fails for me >> >> >> Thanks for the comment. >> >> I use the patch on 7b69b6ce and it's fine. >> Did you use "initdb" command after "make install"? >> The pg_stat_progress_cluster view is created in initdb, probably. >> > > I also got a regression test error (applied to abe85ef). Here is > regression.diff file. > > *** /home/masahiko/source/postgresql/src/test/regress/expected/rules.out > 2017-09-01 17:27:33.680055612 -0700 > --- /home/masahiko/source/postgresql/src/test/regress/results/rules.out > 2017-09-01 17:28:10.410055596 -0700 > *************** > *** 1819,1824 **** > --- 1819,1849 ---- > pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, > pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock > FROM pg_database d; > + pg_stat_progress_cluster| SELECT s.pid, > + s.datid, > + d.datname, > + s.relid, > + CASE s.param1 > + WHEN 0 THEN 'initializing'::text > + WHEN 1 THEN 'scanning heap'::text > + WHEN 2 THEN 'sorting tuples'::text > + WHEN 3 THEN 'writing new heap'::text > + WHEN 4 THEN 'scan heap and write new heap'::text > + WHEN 5 THEN 'swapping relation files'::text > + WHEN 6 THEN 'rebuilding index'::text > + WHEN 7 THEN 'performing final cleanup'::text > + ELSE NULL::text > + END AS phase, > + CASE s.param2 > + WHEN 0 THEN 'index scan'::text > + WHEN 1 THEN 'seq scan'::text > + ELSE NULL::text > + END AS scan_method, > + s.param3 AS scan_index_relid, > + s.param4 AS heap_tuples_total, > + s.param5 AS heap_tuples_scanned > + FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, > relid, param1, param2, param3, param4, param5, param6, param7, param8, > param9, param10) > + LEFT JOIN pg_database d ON ((s.datid = d.oid))); > pg_stat_progress_vacuum| SELECT s.pid, > s.datid, > d.datname, > *************** > *** 1841,1871 **** > s.param7 AS num_dead_tuples > FROM (pg_stat_get_progress_info('VACUUM'::text) s(pid, datid, > relid, param1, param2, param3, param4, param5, param6, param7, param8, > param9, param10) > LEFT JOIN pg_database d ON ((s.datid = d.oid))); > - pg_stat_progress_cluster| SELECT > - s.pid, > - s.datid, > - d.datname, > - s.relid, > - CASE s.param1 > - WHEN 0 THEN 'initializing'::text > - WHEN 1 THEN 'scanning heap'::text > - WHEN 2 THEN 'sorting tuples'::text > - WHEN 3 THEN 'writing new heap'::text > - WHEN 4 THEN 'scan heap and write new heap'::text > - WHEN 5 THEN 'swapping relation files'::text > - WHEN 6 THEN 'rebuilding index'::text > - WHEN 7 THEN 'performing final cleanup'::text > - ELSE NULL::text > - END AS phase, > - CASE S.param2 > - WHEN 0 THEN 'index scan' > - WHEN 1 THEN 'seq scan' > - END AS scan_method, > - s.param3 AS index_relid, > - s.param4 AS heap_blks_total, > - s.param5 AS heap_blks_scanned > - FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, > relid, param1, param2, param3, param4, param5) > - LEFT JOIN pg_database d ON ((s.datid = d.oid))); > pg_stat_replication| SELECT s.pid, > s.usesysid, > u.rolname AS usename, > --- 1866,1871 ---- > > ====================================================================== > > Regards, > > -- > Masahiko Sawada > NIPPON TELEGRAPH AND TELEPHONE CORPORATION > NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Mon, Sep 4, 2017 at 11:37 AM, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > Hi Sawada-san, Thomas, > > Thanks for sharing the reggression.diff. > I realized Thomas's comment is right. > > Attached patch is fixed version. > Could you try it? > Yeah, in my environment the regression test passed. Thanks. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi Sawada-san, Thanks for taking your time. I'll be more careful. Regards, Tatsuro Yamada On 2017/09/04 11:51, Masahiko Sawada wrote: > On Mon, Sep 4, 2017 at 11:37 AM, Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> Hi Sawada-san, Thomas, >> >> Thanks for sharing the reggression.diff. >> I realized Thomas's comment is right. >> >> Attached patch is fixed version. >> Could you try it? >> > > Yeah, in my environment the regression test passed. Thanks. > > Regards, > > -- > Masahiko Sawada > NIPPON TELEGRAPH AND TELEPHONE CORPORATION > NTT Open Source Software Center > >
On Thu, Aug 31, 2017 at 11:12 AM, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > Then I have questions. > > * Should we have separate views for them? Or should both be covered by the > same view with some indication of which command (CLUSTER or VACUUM FULL) > is actually running? Using the same view for both, and tell that this is rather VACUUM or CLUSTER in the view, would be better IMO. Coming up with a name more generic than pg_stat_progress_cluster may be better though if this speaks with VACUUM FULL as well, user-facing documentation does not say that VACUUM FULL is actually CLUSTER. > I'll add this patch to CF2017-09. > Any comments or suggestion are welcome. Nice to see that you are taking the time to implement patches for upstream, Yamada-san! -- Michael
On 2017/09/04 15:38, Michael Paquier wrote: > On Thu, Aug 31, 2017 at 11:12 AM, Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> Then I have questions. >> >> * Should we have separate views for them? Or should both be covered by the >> same view with some indication of which command (CLUSTER or VACUUM FULL) >> is actually running? > > Using the same view for both, and tell that this is rather VACUUM or > CLUSTER in the view, would be better IMO. Coming up with a name more > generic than pg_stat_progress_cluster may be better though if this > speaks with VACUUM FULL as well, user-facing documentation does not > say that VACUUM FULL is actually CLUSTER. Thanks for sharing your thoughts. Agreed. I'll add new column like a "command" to tell whether running CLUSTER or VACUUM. And how about this new view name? - pg_stat_progress_reorg Is it more general name than previous name if it covers both commands? >> I'll add this patch to CF2017-09. >> Any comments or suggestion are welcome. > > Nice to see that you are taking the time to implement patches for > upstream, Yamada-san! Same here. :) I'd like to contribute creating feature that is for DBA and users. Progress monitoring feature is important from my DBA experiences. I'm happy if you lend your hand. Thanks, Tatsuro Yamada
Hi Hackers, I revised the patch like this: - Add "command" column in the view It tells that the running command is CLUSTER or VACUUM FULL. - Enable VACUUM FULL progress monitor Add heap_tuples_vacuumed and heap_tuples_recently_dead as a counter in the view. Sequence of phases are below: 1. scanning heap 5. swapping relation files 6. rebuild index 7. performing final cleanup I didn't change the name of view (pg_stat_progress_cluster) because I'm not sure whether the new name (pg_stat_progress_reorg) is suitable or not. Any comments or suggestion are welcome. Thanks, Tatsuro Yamada On 2017/09/04 20:17, Tatsuro Yamada wrote: > On 2017/09/04 15:38, Michael Paquier wrote: >> On Thu, Aug 31, 2017 at 11:12 AM, Tatsuro Yamada >> <yamada.tatsuro@lab.ntt.co.jp> wrote: >>> Then I have questions. >>> >>> * Should we have separate views for them? Or should both be covered by the >>> same view with some indication of which command (CLUSTER or VACUUM FULL) >>> is actually running? >> >> Using the same view for both, and tell that this is rather VACUUM or >> CLUSTER in the view, would be better IMO. Coming up with a name more >> generic than pg_stat_progress_cluster may be better though if this >> speaks with VACUUM FULL as well, user-facing documentation does not >> say that VACUUM FULL is actually CLUSTER. > > Thanks for sharing your thoughts. > Agreed. > I'll add new column like a "command" to tell whether running CLUSTER or VACUUM. > And how about this new view name? > - pg_stat_progress_reorg > Is it more general name than previous name if it covers both commands? > > >>> I'll add this patch to CF2017-09. >>> Any comments or suggestion are welcome. >> >> Nice to see that you are taking the time to implement patches for >> upstream, Yamada-san! > > Same here. :) > I'd like to contribute creating feature that is for DBA and users. > Progress monitoring feature is important from my DBA experiences. > I'm happy if you lend your hand. > > Thanks, > Tatsuro Yamada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Wed, Sep 6, 2017 at 3:58 PM, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > I revised the patch like this: You should avoid top-posting. > I didn't change the name of view (pg_stat_progress_cluster) because I'm not > sure > whether the new name (pg_stat_progress_reorg) is suitable or not. Here are some ideas: rewrite (incorrect for ALTER TABLE), organize (somewhat fine), order, operate (too general?), bundle, reform, assemble. -- Michael
On 2017/09/06 16:11, Michael Paquier wrote: > On Wed, Sep 6, 2017 at 3:58 PM, Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> I revised the patch like this: > > You should avoid top-posting. I see. >> I didn't change the name of view (pg_stat_progress_cluster) because I'm not >> sure >> whether the new name (pg_stat_progress_reorg) is suitable or not. > > Here are some ideas: rewrite (incorrect for ALTER TABLE), organize > (somewhat fine), order, operate (too general?), bundle, reform, > assemble. Thanks for sharing your ideas. I searched the words like a "reform table", "reassemble table" and "reorganize table" on google. I realized "reorganaize table" is good choice than others because many DBMS uses this keyword. Therefore, I'll change the name to it like this: before pg_stat_progress_cluster after pg_stat_progress_reorg (I abbreviate reorganize to reorg.) Does anyone have any suggestions? I'll revise the patch. Regards, Tatsuro Yamada
On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > 1. scanning heap > 2. sort tuples These two phases overlap, though. I believe progress reporting for sorts is really hard. In the simple case where the data fits in work_mem, none of the work of the sort gets done until all the data is read. Once you switch to an external sort, you're writing batch files, so a lot of the work is now being done during data loading. But as the number of batch files grows, the final merge at the end becomes an increasingly noticeable part of the cost, and eventually you end up needing multiple merge passes. I think we need some smart way to report on sorts so that we can tell how much of the work has really been done, but I don't know how to do it. > heap_tuples_total | bigint | | | The patch is getting the value reported as heap_tuples_total from OldHeap->rd_rel->reltuples. I think this is pointless: the user can see that value anyway if they wish. The point of the progress counters is to expose things the user couldn't otherwise see. It's also not necessarily accurate: it's only an estimate in the best case, and may be way off if the relation has recently be extended by a large amount. I think it's pretty important that we try hard to only report values that are known to be accurate, because users hate (and mock) inaccurate progress reports. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/08 18:55, Robert Haas wrote: > On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> 1. scanning heap >> 2. sort tuples > > These two phases overlap, though. I believe progress reporting for > sorts is really hard. In the simple case where the data fits in > work_mem, none of the work of the sort gets done until all the data is > read. Once you switch to an external sort, you're writing batch > files, so a lot of the work is now being done during data loading. > But as the number of batch files grows, the final merge at the end > becomes an increasingly noticeable part of the cost, and eventually > you end up needing multiple merge passes. I think we need some smart > way to report on sorts so that we can tell how much of the work has > really been done, but I don't know how to do it. Thanks for the comment. As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by cost estimation. In the case of SEQ SCAN, these two phases not overlap. However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan heap and write new heap" when INDEX SCAN was selected. I agree that progress reporting for sort is difficult. So it only reports the phase ("sorting tuples") in the current design of progress monitor of cluster. It doesn't report counter of sort. >> heap_tuples_total | bigint | | | > > The patch is getting the value reported as heap_tuples_total from > OldHeap->rd_rel->reltuples. I think this is pointless: the user can > see that value anyway if they wish. The point of the progress > counters is to expose things the user couldn't otherwise see. It's > also not necessarily accurate: it's only an estimate in the best case, > and may be way off if the relation has recently be extended by a large > amount. I think it's pretty important that we try hard to only report > values that are known to be accurate, because users hate (and mock) > inaccurate progress reports. Do you mean to use the number of rows by using below calculation instead OldHeap->rd_rel->reltuples? estimate rows = physical table size / average row length I understand that OldHeap->rd_rel->reltuples is sometimes useless because it is correct by auto analyze and it can't perform when under a threshold. I'll add it in next patch and also share more detailed the current design of progress monitor for cluster. Regards, Tatsuro Yamada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > Thanks for the comment. > > As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by > cost estimation. In the case of SEQ SCAN, these two phases not overlap. > However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan > heap and write new heap" when INDEX SCAN was selected. > > I agree that progress reporting for sort is difficult. So it only reports > the phase ("sorting tuples") in the current design of progress monitor of > cluster. > It doesn't report counter of sort. Doesn't that make it almost useless? I would guess that scanning the heap and writing the new heap would ordinarily account for most of the runtime, or at least enough that you're going to want something more than just knowing that's the phase you're in. >> The patch is getting the value reported as heap_tuples_total from >> OldHeap->rd_rel->reltuples. I think this is pointless: the user can >> see that value anyway if they wish. The point of the progress >> counters is to expose things the user couldn't otherwise see. It's >> also not necessarily accurate: it's only an estimate in the best case, >> and may be way off if the relation has recently be extended by a large >> amount. I think it's pretty important that we try hard to only report >> values that are known to be accurate, because users hate (and mock) >> inaccurate progress reports. > > Do you mean to use the number of rows by using below calculation instead > OldHeap->rd_rel->reltuples? > > estimate rows = physical table size / average row length No, I mean don't report it at all. The caller can do that calculation if they wish, without any help from the progress reporting machinery. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 11, 2017 at 7:38 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> Thanks for the comment. >> >> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by >> cost estimation. In the case of SEQ SCAN, these two phases not overlap. >> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan >> heap and write new heap" when INDEX SCAN was selected. >> >> I agree that progress reporting for sort is difficult. So it only reports >> the phase ("sorting tuples") in the current design of progress monitor of >> cluster. >> It doesn't report counter of sort. > > Doesn't that make it almost useless? I would guess that scanning the > heap and writing the new heap would ordinarily account for most of the > runtime, or at least enough that you're going to want something more > than just knowing that's the phase you're in. It's definitely my experience that CLUSTER is incredibly I/O bound. You're shoveling the tuples through tuplesort.c, but the actual sorting component isn't where the real costs are. Profiling shows that writing out the new heap (including moderately complicated bookkeeping) is the bottleneck, IIRC. That's why parallel CLUSTER didn't look attractive, even though it would be a fairly straightforward matter to add that on top of the parallel CREATE INDEX structure from the patch that I wrote to do that. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/11 23:38, Robert Haas wrote: > On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> Thanks for the comment. >> >> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by >> cost estimation. In the case of SEQ SCAN, these two phases not overlap. >> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan >> heap and write new heap" when INDEX SCAN was selected. >> >> I agree that progress reporting for sort is difficult. So it only reports >> the phase ("sorting tuples") in the current design of progress monitor of >> cluster. >> It doesn't report counter of sort. > > Doesn't that make it almost useless? I would guess that scanning the > heap and writing the new heap would ordinarily account for most of the > runtime, or at least enough that you're going to want something more > than just knowing that's the phase you're in. Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort()) I know that external merge sort takes a time than quick sort. I'll try investigating how to get a counter from external merge sort processing. Is this the right way? >>> The patch is getting the value reported as heap_tuples_total from >>> OldHeap->rd_rel->reltuples. I think this is pointless: the user can >>> see that value anyway if they wish. The point of the progress >>> counters is to expose things the user couldn't otherwise see. It's >>> also not necessarily accurate: it's only an estimate in the best case, >>> and may be way off if the relation has recently be extended by a large >>> amount. I think it's pretty important that we try hard to only report >>> values that are known to be accurate, because users hate (and mock) >>> inaccurate progress reports. >> >> Do you mean to use the number of rows by using below calculation instead >> OldHeap->rd_rel->reltuples? >> >> estimate rows = physical table size / average row length > > No, I mean don't report it at all. The caller can do that calculation > if they wish, without any help from the progress reporting machinery. I see. I'll remove that column on next patch. Regards, Tatsuro Yamada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/12 21:20, Tatsuro Yamada wrote: > On 2017/09/11 23:38, Robert Haas wrote: >> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada >> <yamada.tatsuro@lab.ntt.co.jp> wrote: >>> Thanks for the comment. >>> >>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by >>> cost estimation. In the case of SEQ SCAN, these two phases not overlap. >>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan >>> heap and write new heap" when INDEX SCAN was selected. >>> >>> I agree that progress reporting for sort is difficult. So it only reports >>> the phase ("sorting tuples") in the current design of progress monitor of >>> cluster. >>> It doesn't report counter of sort. >> >> Doesn't that make it almost useless? I would guess that scanning the >> heap and writing the new heap would ordinarily account for most of the >> runtime, or at least enough that you're going to want something more >> than just knowing that's the phase you're in. > > Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort()) > I know that external merge sort takes a time than quick sort. > I'll try investigating how to get a counter from external merge sort processing. > Is this the right way? > > >>>> The patch is getting the value reported as heap_tuples_total from >>>> OldHeap->rd_rel->reltuples. I think this is pointless: the user can >>>> see that value anyway if they wish. The point of the progress >>>> counters is to expose things the user couldn't otherwise see. It's >>>> also not necessarily accurate: it's only an estimate in the best case, >>>> and may be way off if the relation has recently be extended by a large >>>> amount. I think it's pretty important that we try hard to only report >>>> values that are known to be accurate, because users hate (and mock) >>>> inaccurate progress reports. >>> >>> Do you mean to use the number of rows by using below calculation instead >>> OldHeap->rd_rel->reltuples? >>> >>> estimate rows = physical table size / average row length >> >> No, I mean don't report it at all. The caller can do that calculation >> if they wish, without any help from the progress reporting machinery. > > I see. I'll remove that column on next patch. I will summarize the current design and future corrections before sending the next patch. === Current design === CLUSTER command may use Index Scan or Seq Scan when scanning the heap. Depending on which one is chosen, the command will proceed in the following sequence of phases: * Scan method: Seq Scan 1. scanning heap (*1) 2. sorting tuples (*2) 3. writing newheap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performingfinal cleanup (*2) * Scan method: Index Scan 4. scan heap and write new heap (*1) 5. swapping relation files (*2) 6. rebuildingindex (*2) 7. performing final cleanup (*2) VACUUM FULL command will proceed in the following sequence of phases: 1. scanning heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) (*1): increasing the value in heap_tuples_scanned column (*2): only shows the phase in the phase column The view provides the information of CLUSTER command progress details as follows # \d pg_stat_progress_cluster View "pg_catalog.pg_stat_progress_cluster" Column | Type | Collation | Nullable | Default ---------------------------+---------+-----------+----------+--------- pid | integer | | | datid | oid | | | datname | name | | | relid | oid | | | command | text | | | phase | text | | | scan_method | text | | | scan_index_relid | bigint | | | heap_tuples_total | bigint | | | heap_tuples_scanned | bigint | | | heap_tuples_vacuumed | bigint | | | heap_tuples_recently_dead | bigint | | | === It will be changed on next patch === - Rename to pg_stat_progress_reolg from pg_stat_progress_cluster - Remove heap_tuples_total column from the view - Add aprogress counter in the phase of "sorting tuples" (difficult?!) === My test case as a bonus === I share my test case of progress monitor. If someone wants to watch the current progress monitor, you can use this test case as a example. [Terminal1] Run this query on psql: select * from pg_stat_progress_cluster; \watch 0.05 [Terminal2] Run these queries on psql: drop table t1; create table t1 as select a, random() * 1000 as b from generate_series(0, 99999999) a; create index idx_t1 on t1(a); create index idx_t1_b on t1(b); analyze t1; -- index scan set enable_seqscan to off; cluster verbose t1 using idx_t1; -- seq scan set enable_seqscan to on; set enable_indexscan to off; cluster verbose t1 using idx_t1; -- only given table name to cluster command cluster verbose t1; -- only cluster command cluster verbose; -- vacuum full vacuum full t1; -- vacuum full vacuum full; Thanks, Tatsuro Yamada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
The view provides the information of CLUSTER command progress details as follows
postgres=# \d pg_stat_progress_cluster
View "pg_catalog.pg_stat_progress_cluster"
Column | Type | Collation | Nullable | Default
---------------------+---------+-----------+----------+----- ----
pid | integer | | |
datid | oid | | |
datname | name | | |
relid | oid | | |
phase | text | | |
scan_method | text | | |
scan_index_relid | bigint | | |
heap_tuples_total | bigint | | |
heap_tuples_scanned | bigint | | |
> On 12 Sep 2017, at 14:57, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > > On 2017/09/12 21:20, Tatsuro Yamada wrote: >> On 2017/09/11 23:38, Robert Haas wrote: >>> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada >>> <yamada.tatsuro@lab.ntt.co.jp> wrote: >>>> Thanks for the comment. >>>> >>>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by >>>> cost estimation. In the case of SEQ SCAN, these two phases not overlap. >>>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan >>>> heap and write new heap" when INDEX SCAN was selected. >>>> >>>> I agree that progress reporting for sort is difficult. So it only reports >>>> the phase ("sorting tuples") in the current design of progress monitor of >>>> cluster. >>>> It doesn't report counter of sort. >>> >>> Doesn't that make it almost useless? I would guess that scanning the >>> heap and writing the new heap would ordinarily account for most of the >>> runtime, or at least enough that you're going to want something more >>> than just knowing that's the phase you're in. >> >> Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort()) >> I know that external merge sort takes a time than quick sort. >> I'll try investigating how to get a counter from external merge sort processing. >> Is this the right way? >> >> >>>>> The patch is getting the value reported as heap_tuples_total from >>>>> OldHeap->rd_rel->reltuples. I think this is pointless: the user can >>>>> see that value anyway if they wish. The point of the progress >>>>> counters is to expose things the user couldn't otherwise see. It's >>>>> also not necessarily accurate: it's only an estimate in the best case, >>>>> and may be way off if the relation has recently be extended by a large >>>>> amount. I think it's pretty important that we try hard to only report >>>>> values that are known to be accurate, because users hate (and mock) >>>>> inaccurate progress reports. >>>> >>>> Do you mean to use the number of rows by using below calculation instead >>>> OldHeap->rd_rel->reltuples? >>>> >>>> estimate rows = physical table size / average row length >>> >>> No, I mean don't report it at all. The caller can do that calculation >>> if they wish, without any help from the progress reporting machinery. >> >> I see. I'll remove that column on next patch. > > > I will summarize the current design and future corrections before sending > the next patch. > > > === Current design === > > CLUSTER command may use Index Scan or Seq Scan when scanning the heap. > Depending on which one is chosen, the command will proceed in the > following sequence of phases: > > * Scan method: Seq Scan > 1. scanning heap (*1) > 2. sorting tuples (*2) > 3. writing new heap (*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > * Scan method: Index Scan > 4. scan heap and write new heap (*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > VACUUM FULL command will proceed in the following sequence of phases: > > 1. scanning heap (*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > (*1): increasing the value in heap_tuples_scanned column > (*2): only shows the phase in the phase column > > The view provides the information of CLUSTER command progress details as follows > # \d pg_stat_progress_cluster > View "pg_catalog.pg_stat_progress_cluster" > Column | Type | Collation | Nullable | Default > ---------------------------+---------+-----------+----------+--------- > pid | integer | | | > datid | oid | | | > datname | name | | | > relid | oid | | | > command | text | | | > phase | text | | | > scan_method | text | | | > scan_index_relid | bigint | | | > heap_tuples_total | bigint | | | > heap_tuples_scanned | bigint | | | > heap_tuples_vacuumed | bigint | | | > heap_tuples_recently_dead | bigint | | | > > > === It will be changed on next patch === > > - Rename to pg_stat_progress_reolg from pg_stat_progress_cluster > - Remove heap_tuples_total column from the view > - Add a progress counter in the phase of "sorting tuples" (difficult?!) > > > === My test case as a bonus === > > I share my test case of progress monitor. > If someone wants to watch the current progress monitor, you can use > this test case as a example. > > [Terminal1] > Run this query on psql: > > select * from pg_stat_progress_cluster; \watch 0.05 > > [Terminal2] > Run these queries on psql: > > drop table t1; > > create table t1 as select a, random() * 1000 as b from generate_series(0, 99999999) a; > create index idx_t1 on t1(a); > create index idx_t1_b on t1(b); > analyze t1; > > -- index scan > set enable_seqscan to off; > cluster verbose t1 using idx_t1; > > -- seq scan > set enable_seqscan to on; > set enable_indexscan to off; > cluster verbose t1 using idx_t1; > > -- only given table name to cluster command > cluster verbose t1; > > -- only cluster command > cluster verbose; > > -- vacuum full > vacuum full t1; > > -- vacuum full > vacuum full; Based on this thread, this patch has been marked Returned with Feedback. Please re-submit a new version to a future commitfest. cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 12, 2017 at 8:20 AM, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: >>> I agree that progress reporting for sort is difficult. So it only reports >>> the phase ("sorting tuples") in the current design of progress monitor of >>> cluster. >>> It doesn't report counter of sort. >> >> Doesn't that make it almost useless? I would guess that scanning the >> heap and writing the new heap would ordinarily account for most of the >> runtime, or at least enough that you're going to want something more >> than just knowing that's the phase you're in. > > Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort()) > I know that external merge sort takes a time than quick sort. > I'll try investigating how to get a counter from external merge sort > processing. > Is this the right way? Progress reporting on sorts seems like a tricky problem to me, as I said before. In most cases, a sort is going to involve an initial stage where it reads all the input tuples and writes out quicksorted runs, and then a merge phase where it merges all the output tapes into a sorted result. There are some complexities; for example, if the number of tapes is really large, then we might need multiple merge phases, only the last of which will produce tuples. On the other hand, if work_mem is very large, the time taken for sorting each run might itself be significant that we'd like to have insight into progress. If we ignore those complexities, though, a reasonable way of reporting progress might be to report the following: 1. blocks read from the relation 2. # of tuples we've put into the tuplesort 3. # of tuples we've extracted from the tuplesort During the first part of the sort, (1) and (2) will be growing, and the user can measure progress by comparing (1) to the total size of the relation. During the final merge, (3) will be growing, eventually becoming equal to (2), so the user can measure progress my comparing (2) with (3). This approach only works for a seqscan-and-sort, though. I'm not sure what to do about the index scan case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: > > 1. scanning heap > > 2. sort tuples > > These two phases overlap, though. I believe progress reporting for > sorts is really hard. In the simple case where the data fits in > work_mem, none of the work of the sort gets done until all the data is > read. Once you switch to an external sort, you're writing batch > files, so a lot of the work is now being done during data loading. > But as the number of batch files grows, the final merge at the end > becomes an increasingly noticeable part of the cost, and eventually > you end up needing multiple merge passes. I think we need some smart > way to report on sorts so that we can tell how much of the work has > really been done, but I don't know how to do it. Whatever complexity is hidden in the sort, cost_sort() should have taken it into consideration when called via plan_cluster_use_sort(). Thus I think that once we have both startup and total cost, the current progress of the sort stage can be estimated from the current number of input and output rows. Please remind me if my proposal appears to be too simplistic. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
Antonin Houska <ah@cybertec.at> writes: > Robert Haas <robertmhaas@gmail.com> wrote: >> These two phases overlap, though. I believe progress reporting for >> sorts is really hard. > Whatever complexity is hidden in the sort, cost_sort() should have taken it > into consideration when called via plan_cluster_use_sort(). Thus I think that > once we have both startup and total cost, the current progress of the sort > stage can be estimated from the current number of input and output > rows. Please remind me if my proposal appears to be too simplistic. Well, even if you assume that the planner's cost model omits nothing (which I wouldn't bet on), its result is only going to be as good as the planner's estimate of the number of rows to be sorted. And, in cases where people actually care about progress monitoring, it's likely that the planner got that wrong, maybe horribly so. I think it's a bad idea for progress monitoring to depend on the planner's estimates in any way whatsoever. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Antonin Houska <ah@cybertec.at> writes: > > Robert Haas <robertmhaas@gmail.com> wrote: > >> These two phases overlap, though. I believe progress reporting for > >> sorts is really hard. > > > Whatever complexity is hidden in the sort, cost_sort() should have taken it > > into consideration when called via plan_cluster_use_sort(). Thus I think that > > once we have both startup and total cost, the current progress of the sort > > stage can be estimated from the current number of input and output > > rows. Please remind me if my proposal appears to be too simplistic. > > Well, even if you assume that the planner's cost model omits nothing > (which I wouldn't bet on), its result is only going to be as good as the > planner's estimate of the number of rows to be sorted. And, in cases > where people actually care about progress monitoring, it's likely that > the planner got that wrong, maybe horribly so. I think it's a bad idea > for progress monitoring to depend on the planner's estimates in any way > whatsoever. The general idea was that some sort of prediction of the total cost is needed anyway if we should tell during execution what fraction of work has already been done. And also that the cost computation that we perform during execution shouldn't (ideally) differ from cost_sort(). So I thought that it's easier to refine cost_sort() than to implement the same computation from scratch elsewhere. Besides that I see 2 circumstances that make the estimate of the number of input tuples simpler in the CLUSTER case: * There's only 1 input relation w/o any kind of clause. * CLUSTER uses SnapshotAny, so pg_class(reltuples) is closer to the actual number of input rows than it would be in generalcase. (Of course, pg_class would only be useful for the initial estimate.) Unlike planner, the executor could recalculate the cost estimate at some point(s) as it recognizes that the actual number of tuples per page appears to differ from the density derived from pg_class initially. Still wrong? -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
On Mon, Nov 20, 2017 at 12:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Antonin Houska <ah@cybertec.at> writes: >> Robert Haas <robertmhaas@gmail.com> wrote: >>> These two phases overlap, though. I believe progress reporting for >>> sorts is really hard. > >> Whatever complexity is hidden in the sort, cost_sort() should have taken it >> into consideration when called via plan_cluster_use_sort(). Thus I think that >> once we have both startup and total cost, the current progress of the sort >> stage can be estimated from the current number of input and output >> rows. Please remind me if my proposal appears to be too simplistic. > > Well, even if you assume that the planner's cost model omits nothing > (which I wouldn't bet on), its result is only going to be as good as the > planner's estimate of the number of rows to be sorted. And, in cases > where people actually care about progress monitoring, it's likely that > the planner got that wrong, maybe horribly so. I think it's a bad idea > for progress monitoring to depend on the planner's estimates in any way > whatsoever. I agree. I have been of the opinion all along that progress monitoring needs to report facts, not theories. The number of tuples read thus far is a fact, and is fine to report for whatever value it may have to someone. The number of tuples that will be read in the future is a theory, and as you say, progress monitoring is most likely to be used in cases where theory and practice ended up being very different. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Nov 20, 2017 at 12:05 PM, Antonin Houska <ah@cybertec.at> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: >> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada >> <yamada.tatsuro@lab.ntt.co.jp> wrote: >> > 1. scanning heap >> > 2. sort tuples >> >> These two phases overlap, though. I believe progress reporting for >> sorts is really hard. In the simple case where the data fits in >> work_mem, none of the work of the sort gets done until all the data is >> read. Once you switch to an external sort, you're writing batch >> files, so a lot of the work is now being done during data loading. >> But as the number of batch files grows, the final merge at the end >> becomes an increasingly noticeable part of the cost, and eventually >> you end up needing multiple merge passes. I think we need some smart >> way to report on sorts so that we can tell how much of the work has >> really been done, but I don't know how to do it. > > Whatever complexity is hidden in the sort, cost_sort() should have taken it > into consideration when called via plan_cluster_use_sort(). Thus I think that > once we have both startup and total cost, the current progress of the sort > stage can be estimated from the current number of input and output > rows. Please remind me if my proposal appears to be too simplistic. I think it is far too simplistic. If the sort is being fed by a sequential scan, reporting the number of blocks scanned so far as compared to the total number that will be scanned would be a fine way of reporting on the progress of the sequential scan -- and it's better to use blocks, which we know for sure about, than rows, at which we can only guess. But that's the *scan* progress, not the *sort* progress. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Progress reporting on sorts seems like a tricky problem to me, as I > said before. In most cases, a sort is going to involve an initial > stage where it reads all the input tuples and writes out quicksorted > runs, and then a merge phase where it merges all the output tapes into > a sorted result. There are some complexities; for example, if the > number of tapes is really large, then we might need multiple merge > phases, only the last of which will produce tuples. This would ordinarily be the point at which I'd say "but you're very unlikely to require multiple passes for an external sort these days". But I won't say that on this thread, because CLUSTER generally has unusually wide tuples, and so is much more likely to be I/O bound, to require multiple passes, etc. (I bet the v10 enhancements disproportionately improved CLUSTER performance.) -- Peter Geoghegan
On Tue, Nov 21, 2017 at 12:55 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I agree. > > I have been of the opinion all along that progress monitoring needs to > report facts, not theories. The number of tuples read thus far is a > fact, and is fine to report for whatever value it may have to someone. That makes a lot of sense to me. I sometimes think that we're too hesitant to expose internal information due to concerns about it being hard to interpret. I see wait events as bucking this trend, which I welcome. We see similar trends in the Linux kernel, with tools like perf and BCC/eBPF now being regularly used to debug production issues. > The number of tuples that will be read in the future is a theory, and > as you say, progress monitoring is most likely to be used in cases > where theory and practice ended up being very different. You hit the nail on the head here. It's not that these things are not difficult to interpret - the concern itself is justified. It just needs to be weighed against the benefit of having some instrumentation to start with. People are much more likely to complain about obscure debug information, which makes them feel dumb, than they are to complain about the absence of any instrumentation, but I still think that the latter is the bigger problem. Besides, you don't necessarily have to understand something to act on it. The internals of Oracle are trade secrets, but they were the first to have wait events, I think. At least having something that you can Google can make all the difference. -- Peter Geoghegan
On Wed, Nov 22, 2017 at 5:55 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I have been of the opinion all along that progress monitoring needs to > report facts, not theories. The number of tuples read thus far is a > fact, and is fine to report for whatever value it may have to someone. > The number of tuples that will be read in the future is a theory, and > as you say, progress monitoring is most likely to be used in cases > where theory and practice ended up being very different. +1. We should never as well enter in things like trying to estimate the amount of time remaining to finish a task [1]. [1]: https://www.xkcd.com/612/ -- Michael
On Wed, Nov 22, 2017 at 1:53 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Nov 22, 2017 at 5:55 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> I have been of the opinion all along that progress monitoring needs to >> report facts, not theories. The number of tuples read thus far is a >> fact, and is fine to report for whatever value it may have to someone. >> The number of tuples that will be read in the future is a theory, and >> as you say, progress monitoring is most likely to be used in cases >> where theory and practice ended up being very different. > > +1. We should never as well enter in things like trying to estimate > the amount of time remaining to finish a task [1]. > > [1]: https://www.xkcd.com/612/ +1 That is one reason I made pg_stat_replication.XXX_lag report the lag of WAL that has been processed, not (say) the time until we catch up. In some information-poor scenarios it interpolates which isn't perfect but the general idea is that is shows you measurements of the past (facts), not predictions about the future (theories). -- Thomas Munro http://www.enterprisedb.com
On 2017/11/22 6:07, Peter Geoghegan wrote: > On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> Progress reporting on sorts seems like a tricky problem to me, as I >> said before. In most cases, a sort is going to involve an initial >> stage where it reads all the input tuples and writes out quicksorted >> runs, and then a merge phase where it merges all the output tapes into >> a sorted result. There are some complexities; for example, if the >> number of tapes is really large, then we might need multiple merge >> phases, only the last of which will produce tuples. > > This would ordinarily be the point at which I'd say "but you're very > unlikely to require multiple passes for an external sort these days". > But I won't say that on this thread, because CLUSTER generally has > unusually wide tuples, and so is much more likely to be I/O bound, to > require multiple passes, etc. (I bet the v10 enhancements > disproportionately improved CLUSTER performance.) Hi, I came back to develop the feature for community. V4 patch is corrected these following points: - Rebase on master (143290efd) - Fix document - Replace the column name scan_index_relid with cluster_index_relid. Thanks to Jeff Janes! I'm now working on improving the patch based on Robert's comment related to "Seqscan and Sort case" and also considering how to handle the "Index scan case". Please find attached file. Regards, Tatsuro Yamada
Attachment
> On Fri, Aug 24, 2018 at 7:06 AM Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > > On 2017/11/22 6:07, Peter Geoghegan wrote: > > On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas <robertmhaas@gmail.com> wrote: > >> Progress reporting on sorts seems like a tricky problem to me, as I > >> said before. In most cases, a sort is going to involve an initial > >> stage where it reads all the input tuples and writes out quicksorted > >> runs, and then a merge phase where it merges all the output tapes into > >> a sorted result. There are some complexities; for example, if the > >> number of tapes is really large, then we might need multiple merge > >> phases, only the last of which will produce tuples. > > > > This would ordinarily be the point at which I'd say "but you're very > > unlikely to require multiple passes for an external sort these days". > > But I won't say that on this thread, because CLUSTER generally has > > unusually wide tuples, and so is much more likely to be I/O bound, to > > require multiple passes, etc. (I bet the v10 enhancements > > disproportionately improved CLUSTER performance.) > > Hi, > > I came back to develop the feature for community. > V4 patch is corrected these following points: > > - Rebase on master (143290efd) > - Fix document > - Replace the column name scan_index_relid with cluster_index_relid. > Thanks to Jeff Janes! > > I'm now working on improving the patch based on Robert's comment related to > "Seqscan and Sort case" and also considering how to handle the "Index scan case". Thank you, Unfortunately, this patch has some conflicts now, could you rebase it? Also what's is the status of your work on improving it based on the provided feedback? In the meantime I'm moving it to the next CF.
On 2018/11/29 21:20, Dmitry Dolgov wrote: >> On Fri, Aug 24, 2018 at 7:06 AM Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: >> >> On 2017/11/22 6:07, Peter Geoghegan wrote: >>> On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>>> Progress reporting on sorts seems like a tricky problem to me, as I >>>> said before. In most cases, a sort is going to involve an initial >>>> stage where it reads all the input tuples and writes out quicksorted >>>> runs, and then a merge phase where it merges all the output tapes into >>>> a sorted result. There are some complexities; for example, if the >>>> number of tapes is really large, then we might need multiple merge >>>> phases, only the last of which will produce tuples. >>> >>> This would ordinarily be the point at which I'd say "but you're very >>> unlikely to require multiple passes for an external sort these days". >>> But I won't say that on this thread, because CLUSTER generally has >>> unusually wide tuples, and so is much more likely to be I/O bound, to >>> require multiple passes, etc. (I bet the v10 enhancements >>> disproportionately improved CLUSTER performance.) >> >> Hi, >> >> I came back to develop the feature for community. >> V4 patch is corrected these following points: >> >> - Rebase on master (143290efd) >> - Fix document >> - Replace the column name scan_index_relid with cluster_index_relid. >> Thanks to Jeff Janes! >> >> I'm now working on improving the patch based on Robert's comment related to >> "Seqscan and Sort case" and also considering how to handle the "Index scan case". > > Thank you, > > Unfortunately, this patch has some conflicts now, could you rebase it? Also > what's is the status of your work on improving it based on the > provided feedback? > > In the meantime I'm moving it to the next CF. Thank you for managing the CF and Sorry for the late reply. I'll rebase it for the next CF and also I'll clear my head because the patch needs design change to address the feedbacks, I guess. Therefore, the status is reconsidering the design of the patch. :) Regards, Tatsuro Yamada NTT Open Source Software Center
On 2018-Dec-03, Tatsuro Yamada wrote: > > In the meantime I'm moving it to the next CF. > > Thank you for managing the CF and Sorry for the late reply. > I'll rebase it for the next CF and also I'll clear my head because the patch > needs design change to address the feedbacks, I guess. Therefore, the status is > reconsidering the design of the patch. :) I think we should mark it as Returned with Feedback then. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Dec 03, 2018 at 02:17:25PM -0300, Alvaro Herrera wrote: > I think we should mark it as Returned with Feedback then. +1. -- Michael
Attachment
Hello Yamada-san, On 2018-Dec-03, Tatsuro Yamada wrote: > Thank you for managing the CF and Sorry for the late reply. > I'll rebase it for the next CF and also I'll clear my head because the patch > needs design change to address the feedbacks, I guess. Therefore, the status is > reconsidering the design of the patch. :) Do you have a new version of this patch? If not, do you think you'll have something in time for the upcoming commitfest? Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2017-Nov-21, Peter Geoghegan wrote: > On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > Progress reporting on sorts seems like a tricky problem to me, as I > > said before. In most cases, a sort is going to involve an initial > > stage where it reads all the input tuples and writes out quicksorted > > runs, and then a merge phase where it merges all the output tapes into > > a sorted result. There are some complexities; for example, if the > > number of tapes is really large, then we might need multiple merge > > phases, only the last of which will produce tuples. > > This would ordinarily be the point at which I'd say "but you're very > unlikely to require multiple passes for an external sort these days". > But I won't say that on this thread, because CLUSTER generally has > unusually wide tuples, and so is much more likely to be I/O bound, to > require multiple passes, etc. (I bet the v10 enhancements > disproportionately improved CLUSTER performance.) When the seqscan-and-sort strategy is used, we feed tuplesort with every tuple from the scan. Once that's completed, we call `performsort`, then retrieve tuples. If we see this in terms of tapes and merges, we can report the total number of each of those that we have completed. As far as I understand, we write one tape to completion, and only then start another one, right? Since there's no way to know how many tapes/merges are needed in total, it's not possible to compute a percentage of completion. That's seems okay -- we're just telling the user that progress is being made, and we only report facts not theory. Perhaps we can (also?) indicate disk I/O utilization, in terms of the number of blocks written by tuplesort. I suppose that in order to have tuplesort.c report progress, we would have to have some kind of API that tuplesort would invoke internally to indicate events such as "tape started/completed", "merge started/completed". One idea is to use a callback system; each tuplesort caller could optionally pass a callback to the "begin" function, for progress reporting purposes. Initially only cluster.c would use it, but I suppose eventually every tuplesort caller would want that. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 18, 2018 at 1:02 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > If we see this in terms of tapes and merges, we can report the total > number of each of those that we have completed. As far as I understand, > we write one tape to completion, and only then start another one, right? > Since there's no way to know how many tapes/merges are needed in total, > it's not possible to compute a percentage of completion. That's seems > okay -- we're just telling the user that progress is being made, and we > only report facts not theory. Perhaps we can (also?) indicate disk I/O > utilization, in terms of the number of blocks written by tuplesort. The number of blocks tuplesort uses is constant from the end of initial run generation, since logtape.c will recycle blocks. > I suppose that in order to have tuplesort.c report progress, we would > have to have some kind of API that tuplesort would invoke internally to > indicate events such as "tape started/completed", "merge started/completed". > One idea is to use a callback system; each tuplesort caller could > optionally pass a callback to the "begin" function, for progress > reporting purposes. Initially only cluster.c would use it, but I > suppose eventually every tuplesort caller would want that. I think that you could have a callback that did something with the information currently reported by trace_sort. That's not a bad way of scoping the problem. That's how I myself monitor the progress of a sort, and it works pretty well (whether or not that means other people can do it is not exactly clear to me). We predict the number of merge passes within cost_sort() already. That doesn't seem all that hard to generalize, so that you report the expected number of passes against the current pass. Some passes are much quicker than others, but you generally don't have that many with realistic cases. I don't expect that it will work very well with an internal sort, but in the case of CLUSTER that almost seems irrelevant. And maybe even in all cases. I think that the user is going to have to be willing to develop some intuition about the progress for it to be all that useful. They're really looking for something that gives a clue if they'll have to wait an hour, a day, or a week, which it seems like trace_sort-like information gives you some idea of. (BTW, dtrace probes can already give the user much the same information -- I think that more people should use those, since tracing technology on Linux has improved drastically in the last few years.) -- Peter Geoghegan
On 2018-Dec-18, Peter Geoghegan wrote: > On Tue, Dec 18, 2018 at 1:02 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > If we see this in terms of tapes and merges, we can report the total > > number of each of those that we have completed. As far as I understand, > > we write one tape to completion, and only then start another one, right? > > Since there's no way to know how many tapes/merges are needed in total, > > it's not possible to compute a percentage of completion. That's seems > > okay -- we're just telling the user that progress is being made, and we > > only report facts not theory. Perhaps we can (also?) indicate disk I/O > > utilization, in terms of the number of blocks written by tuplesort. > > The number of blocks tuplesort uses is constant from the end of > initial run generation, since logtape.c will recycle blocks. Well, if you think about individual blocks in terms of storage space, maybe that's true, but I meant in an Heraclitus way of men never stepping into the same river -- the second time you write the block, it's not the same block you wrote before, so you count it twice. It's not the actual disk space utilization that matters, but how much I/O have you done (even if it is just to kernel cache, I suppose). > > I suppose that in order to have tuplesort.c report progress, we would > > have to have some kind of API that tuplesort would invoke internally to > > indicate events such as "tape started/completed", "merge started/completed". > > One idea is to use a callback system; each tuplesort caller could > > optionally pass a callback to the "begin" function, for progress > > reporting purposes. Initially only cluster.c would use it, but I > > suppose eventually every tuplesort caller would want that. > > I think that you could have a callback that did something with the > information currently reported by trace_sort. That's not a bad way of > scoping the problem. That's how I myself monitor the progress of a > sort, and it works pretty well (whether or not that means other people > can do it is not exactly clear to me). Thanks, that looks useful. I suppose mapping such numbers to actual progress is a bit of an art (or intuition as you say), but it seems to be the best we can do, if we do anything at all. > We predict the number of merge passes within cost_sort() already. That > doesn't seem all that hard to generalize, so that you report the > expected number of passes against the current pass. Some passes are > much quicker than others, but you generally don't have that many with > realistic cases. I don't expect that it will work very well with an > internal sort, but in the case of CLUSTER that almost seems > irrelevant. And maybe even in all cases. How good are those predictions? The feeling I get from this thread is that if the estimation of the number of passes is unreliable, it's better not to report it at all; just return how many we've done thus far. It's undesirable to report that we're about 150% done (or take hours to get to 40% done, then suddenly be over). I wonder if internal sorts are really all that interesting from the PoV of progress reporting. Also, I have the impression that quicksort isn't very amenable to letting you know how much work is left. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 18, 2018 at 2:47 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Well, if you think about individual blocks in terms of storage space, > maybe that's true, but I meant in an Heraclitus way of men never > stepping into the same river -- the second time you write the block, > it's not the same block you wrote before, so you count it twice. It's > not the actual disk space utilization that matters, but how much I/O > have you done (even if it is just to kernel cache, I suppose). Right. > I suppose mapping such numbers to actual progress is a bit of an art (or > intuition as you say), but it seems to be the best we can do, if we do > anything at all. I think that it's fairly useful. I suspect that you don't have to have my theoretical grounding in sorting to be able to do almost as well. All you need is a little bit of experience. > How good are those predictions? The feeling I get from this thread is > that if the estimation of the number of passes is unreliable, it's > better not to report it at all; just return how many we've done thus > far. It's undesirable to report that we're about 150% done (or take > hours to get to 40% done, then suddenly be over). Maybe it isn't that reliable. But on second thought I think that it might not matter, and maybe we should just not do that. "How slow can I make this sort go by subtracting work_mem?" is a game that I like to play sometimes. This blogpost plays that game, and reaches some pretty amusing conclusions: https://www.cybertec-postgresql.com/en/postgresql-improving-sort-performance/ It says that sorting numeric is 60% slower when you do an external sort. But it's an apples to asteroids comparison, because the comparison is made between 4MB of work_mem, and 1GB. I think that it's pretty damn impressive that it's only 60% slower! Besides, even that difference is probably on the high side of average, because numeric abbreviated keys work particularly well, and you won't get the same benefit with a unique numeric values when you happen to be doing a lot of merging. If you tried the same experiment with integers, or even text + abbreviated keys, I bet the difference would be a lot smaller. Despite the huge gap in the amount of memory used. On modern hardware, where doing some amount of random I/O is not that noticeable, you'll have a very hard time finding a case where even a paltry amount of memory with many passes does all that much worse than an internal sort (OTOH, it's not hard to find cases where an external sort is *faster*). Even if you make a generic estimate, it's still probably going to be pretty good, because there just isn't that much variation in how long the sort will take as you vary the amount of memory it can use. Some people will be surprised at this, but it's a pretty robust effect. (This is why I think that a hash_mem GUC might be a good medium term solution that improves upon work_mem -- the situation is dramatically different when it comes to hashing.) My point is that you could offer users the kind of insight they'd find very useful with only a very crude estimate of the amount of merging. Even if it was 60% slower than initially projected, that's still not an awful estimate to most users. That just leaves initial run generation, but it's relatively easy to accurately estimate the amount of initial runs. I rarely see a case where merging takes more than 40% of the total, barring parallel CREATE INDEX. > I wonder if internal sorts are really all that interesting from the PoV > of progress reporting. Also, I have the impression that quicksort isn't > very amenable to letting you know how much work is left. It is hard to predict the duration of one massive quicksort, but it's seems fairly easy to recognize a kind of cadence across multiple quicksorts/runs that each take seconds to a couple of minutes. That's going to be the vast, vast majority of cases we care about. -- Peter Geoghegan
Hi Alvaro, On 2018/12/19 2:23, Alvaro Herrera wrote: > Hello Yamada-san, > > On 2018-Dec-03, Tatsuro Yamada wrote: > >> Thank you for managing the CF and Sorry for the late reply. >> I'll rebase it for the next CF and also I'll clear my head because the patch >> needs design change to address the feedbacks, I guess. Therefore, the status is >> reconsidering the design of the patch. :) > > Do you have a new version of this patch? If not, do you think you'll > have something in time for the upcoming commitfest? Not yet, I'll be able to send only a rebased patch by the end of this month. I mean it has no design change because I can't catch up on how to get a progress from sort and index scan. However I'm going to register the patch on next CF. I'm happy if you have interested in the patch. :) Thanks, Tatsuro Yamada
Hi, >> Do you have a new version of this patch? If not, do you think you'll >> have something in time for the upcoming commitfest? > > Not yet, I'll be able to send only a rebased patch by the end of this month. > I mean it has no design change because I can't catch up on how to get a progress > from sort and index scan. However I'm going to register the patch on next CF. > I'm happy if you have interested in the patch. This patch is rebased on HEAD. I'll tackle revising the patch based on feedbacks next month. Happy holidays! Tatsuro Yamada
Attachment
On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > This patch is rebased on HEAD. > I'll tackle revising the patch based on feedbacks next month. + Running <command>VACUUM FULL</command> is listed in <structname>pg_stat_progress_cluster</structname> + view because it uses <command>CLUSTER</command> command internally. See <xref linkend='cluster-progress-reporting'>. It's not really true to say that VACUUM FULL uses the CLUSTER command internally. It's not really true. It uses a good chunk of the same infrastructure, but it certainly doesn't use the actual command, and it's not really the exact same thing either, because internally it's doing a sequential scan but no sort, which never happens with CLUSTER. I'm not sure exactly how to rephrase this, but I think we need to make it more precise. One idea is that maybe we should try to think of a design that could also handle the rewriting variants of ALTER TABLE, and call it pg_stat_progress_rewrite. Maybe that's moving the goalposts too far, but I'm not saying we'd necessarily have to do all the work now, just have a view that we think could also handle that. Then again, maybe the needs are too different. + Whenever <command>CLUSTER</command> is running, the + <structname>pg_stat_progress_cluster</structname> view will contain + one row for each backend that is currently clustering or vacuuming (VACUUM FULL). That sentence contradicts itself. Just say that it contains a row for each backend that is currently running CLUSTER or VACUUM FULL. @@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple, void cluster(ClusterStmt *stmt, bool isTopLevel) { + if (stmt->relation != NULL) { /* This is the single-relation case. */ Useless hunk. @@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel) heap_close(rel, NoLock); /* Do the job. */ + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid); cluster_rel(tableOid, indexOid, stmt->options); + pgstat_progress_end_command(); } else { It seems like that stuff should be inside cluster_rel(). + /* Set reltuples to total_tuples */ + pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES, OldHeap->rd_rel->reltuples); I object. If the user wants that, they can get it from pg_class themselves via an SQL query. It's also an estimate, not something we know to be accurate; I want us to only report facts here, not theories + pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE, PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP); + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, PROGRESS_CLUSTER_METHOD_INDEX_SCAN); I think you should use pgstat_progress_update_multi_param() if updating multiple parameters at the same time. Also, some lines in this patch, such as this one, are very long. Consider techniques to reduce the line length to 80 characters or less, such as inserting a line break between the two arguments. + /* Set scan_method to NULL */ + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1); NULL and -1 are not the same thing. I think that we shouldn't have both PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and PROGRESS_CLUSTER_PHASE_SCAN_HEAP. They're the same thing. Let's just use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both. Actually, better yet, let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP. That seems noticeably simpler. I agree that it's acceptable to report PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I understand why it's valuable to do so in the context of a progress indicator. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2019/02/23 6:02, Robert Haas wrote: > On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> This patch is rebased on HEAD. >> I'll tackle revising the patch based on feedbacks next month. > > + Running <command>VACUUM FULL</command> is listed in > <structname>pg_stat_progress_cluster</structname> > + view because it uses <command>CLUSTER</command> command > internally. See <xref linkend='cluster-progress-reporting'>. > > It's not really true to say that VACUUM FULL uses the CLUSTER command > internally. It's not really true. It uses a good chunk of the same > infrastructure, but it certainly doesn't use the actual command, and > it's not really the exact same thing either, because internally it's > doing a sequential scan but no sort, which never happens with CLUSTER. > I'm not sure exactly how to rephrase this, but I think we need to make > it more precise. > > One idea is that maybe we should try to think of a design that could > also handle the rewriting variants of ALTER TABLE, and call it > pg_stat_progress_rewrite. Maybe that's moving the goalposts too far, > but I'm not saying we'd necessarily have to do all the work now, just > have a view that we think could also handle that. Then again, maybe > the needs are too different. > Hmm..., I see. If possible, I'd like to stop thinking of VACUUM FULL to avoid complication of the implementation. For now, I haven't enough time to design pg_stat_progress_rewrite. I suppose that it's tough work. > + Whenever <command>CLUSTER</command> is running, the > + <structname>pg_stat_progress_cluster</structname> view will contain > + one row for each backend that is currently clustering or vacuuming > (VACUUM FULL). > > That sentence contradicts itself. Just say that it contains a row for > each backend that is currently running CLUSTER or VACUUM FULL. > Fixed. > @@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple, > void > cluster(ClusterStmt *stmt, bool isTopLevel) > { > + > if (stmt->relation != NULL) > { > /* This is the single-relation case. */ > > Useless hunk. > Fixed. > @@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel) > heap_close(rel, NoLock); > > /* Do the job. */ > + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid); > cluster_rel(tableOid, indexOid, stmt->options); > + pgstat_progress_end_command(); > } > else > { > > It seems like that stuff should be inside cluster_rel(). > Fixed. > + /* Set reltuples to total_tuples */ > + pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES, > OldHeap->rd_rel->reltuples); > > I object. If the user wants that, they can get it from pg_class > themselves via an SQL query. It's also an estimate, not something we > know to be accurate; I want us to only report facts here, not theories I understand that progress monitor should only report facts, so I removed that code. > + pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE, > PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP); > + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, > PROGRESS_CLUSTER_METHOD_INDEX_SCAN); > > I think you should use pgstat_progress_update_multi_param() if > updating multiple parameters at the same time. > > Also, some lines in this patch, such as this one, are very long. > Consider techniques to reduce the line length to 80 characters or > less, such as inserting a line break between the two arguments. Fixed. > + /* Set scan_method to NULL */ > + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1); > > NULL and -1 are not the same thing. Oops, fixed. > I think that we shouldn't have both > PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and > PROGRESS_CLUSTER_PHASE_SCAN_HEAP. They're the same thing. Let's just > use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both. Actually, better yet, > let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have > PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and > PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP. That seems noticeably > simpler. Fixed. > I agree that it's acceptable to report > PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and > PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I > understand why it's valuable to do so in the context of a progress > indicator. Actually, I'm not sure why I added it since so much time has passed. :( So, I'll remove PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD at least. Attached patch is wip patch. Thanks! Tatsuro Yamada
Attachment
> Attached patch is wip patch. Is it possible to remove the following patch? Because I registered the patch twice on CF Mar. https://commitfest.postgresql.org/22/2049/ Thanks, Tatsuro Yamada
(2019/03/01 14:17), Tatsuro Yamada wrote: >> Attached patch is wip patch. > > Is it possible to remove the following patch? > Because I registered the patch twice on CF Mar. > > https://commitfest.postgresql.org/22/2049/ Please remove the above and keep this: https://commitfest.postgresql.org/22/1779/ which I moved from the January CF to the March one on behalf of him. Best regards, Etsuro Fujita
On Thu, Feb 28, 2019 at 11:54 PM Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > Attached patch is wip patch. + <command>CLUSTER</command> and <command>VACUUM FULL</command>, showing current progress. and -> or + certain commands during command execution. Currently, the suppoted + progress reporting commands are <command>VACUUM</command> and <command>CLUSTER</command>. suppoted -> supported But I'd just say: Currently, the only commands which support progress reporting are <command>VACUUM</command> and <command>CLUSTER</command>. + Running <command>VACUUM FULL</command> is listed in <structname>pg_stat_progress_cluster</structname> + view because it uses <command>CLUSTER</command> command internally. See <xref linkend='cluster-progress-reporting'>. How about: Running <command>VACUUM FULL</command> is listed in <structname>pg_stat_progress_cluster</structname> because both <command>VACUUM FULL</command> and <command>CLUSTER</command> rewrite the table, while regular <command>VACUUM</command> only modifies it in place. + Current processing command: CLUSTER/VACUUM FULL. The command that is running. Either CLUSTER or VACUUM FULL. + Current processing phase of cluster/vacuum full. See <xref linkend='cluster-phases'> or <xref linkend='vacuum-full-phases'>. Current processing phase of CLUSTER or VACUUM FULL. Or maybe better, just abbreviate to: Current processing phase. + Scan method of table: index scan/seq scan. Eh, shouldn't this be gone now? And likewise for the view definition? + OID of the index. If the table is being scanned using an index, this is the OID of the index being used; otherwise, it is zero. + <entry><structfield>heap_tuples_total</structfield></entry> Leftovers. Skipping over the rest of your documentation changes since it looks like a bunch of things there still need to be updated. + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid); This now appears inside cluster_rel(), but also vacuum_rel() is still doing the same thing. That's wrong. + if(OidIsValid(indexOid)) Missing space. Please pgindent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2019/03/02 4:15, Robert Haas wrote: > On Thu, Feb 28, 2019 at 11:54 PM Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> Attached patch is wip patch. Thanks for your comments! :) I revised the code and the document. > + <command>CLUSTER</command> and <command>VACUUM FULL</command>, > showing current progress. > > and -> or Fixed. > + certain commands during command execution. Currently, the suppoted > + progress reporting commands are <command>VACUUM</command> and > <command>CLUSTER</command>. > > suppoted -> supported > > But I'd just say: Currently, the only commands which support progress > reporting are <command>VACUUM</command> and > <command>CLUSTER</command>. I choose the latter. Fixed. > + Running <command>VACUUM FULL</command> is listed in > <structname>pg_stat_progress_cluster</structname> > + view because it uses <command>CLUSTER</command> command > internally. See <xref linkend='cluster-progress-reporting'>. > > How about: Running <command>VACUUM FULL</command> is listed in > <structname>pg_stat_progress_cluster</structname> because both > <command>VACUUM FULL</command> and <command>CLUSTER</command> rewrite > the table, while regular <command>VACUUM</command> only modifies it in > place. Fixed. > + Current processing command: CLUSTER/VACUUM FULL. > > The command that is running. Either CLUSTER or VACUUM FULL. Fixed. > + Current processing phase of cluster/vacuum full. See <xref > linkend='cluster-phases'> or <xref linkend='vacuum-full-phases'>. > > Current processing phase of CLUSTER or VACUUM FULL. > > Or maybe better, just abbreviate to: Current processing phase. Fixed as you suggested. > + Scan method of table: index scan/seq scan. > > Eh, shouldn't this be gone now? And likewise for the view definition? Fixed. Sorry, It was an oversight. > + OID of the index. > > If the table is being scanned using an index, this is the OID of the > index being used; otherwise, it is zero. Fixed. > + <entry><structfield>heap_tuples_total</structfield></entry> > > Leftovers. Skipping over the rest of your documentation changes since > it looks like a bunch of things there still need to be updated. I agree. Thanks a lot! I'll divide the patch into two patch such as code and document. > + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid); > > This now appears inside cluster_rel(), but also vacuum_rel() is still > doing the same thing. That's wrong. It was an oversight too. I fixed. > + if(OidIsValid(indexOid)) > > Missing space. Please pgindent. Fixed. I Will do pgindent later. Please find attached files. :) Thanks, Tatsuro Yamada
Attachment
On 2019/03/02 4:15, Robert Haas wrote: > On Thu, Feb 28, 2019 at 11:54 PM Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> Attached patch is wip patch. I rewrote the current design of the progress monitor and also wrote discussion points in the middle of this email. I'd like to get any feedback from -hackers. === Current design === CLUSTER command uses Index Scan or Seq Scan when scanning the heap. Depending on which one is chosen, the command will proceed in the following sequence of phases: * Scan method: Seq Scan 0. initializing (*2) 1. seq scanning heap (*1) 3. sorting tuples (*2) 4. writing new heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) * Scan method: Index Scan 0. initializing (*2) 2. index scanning heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) VACUUM FULL command will proceed in the following sequence of phases: 1. seq scanning heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) (*1): increasing the value in heap_tuples_scanned column (*2): only shows the phase in the phase column The view provides the information of CLUSTER command progress details as follows # \d pg_stat_progress_cluster View "pg_catalog.pg_stat_progress_cluster" Column | Type | Collation | Nullable | Default ---------------------------+---------+-----------+----------+--------- pid | integer | | | datid | oid | | | datname | name | | | relid | oid | | | command | text | | | phase | text | | | cluster_index_relid | bigint | | | heap_tuples_scanned | bigint | | | heap_tuples_vacuumed | bigint | | | === Discussion points === - Progress counter for "3. sorting tuples" phase - Should we add pgstat_progress_update_param() in tuplesort.c like a "trace_sort"? Thanks to Peter Geoghegan for the useful advice! - Progress counter for "6. rebuilding index" phase - Should we add "index_vacuum_count" in the view like a vacuum progress monitor? If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c. However, I'm not sure whether it is okay or not. - pg_stat_progress_rewrite - TBA === My test case === I share my test case of progress monitor. If someone wants to watch the current progress monitor, you can use this test case as a example. [Terminal1] Run this query on psql: select * from pg_stat_progress_cluster; \watch 0.05 [Terminal2] Run these queries on psql: drop table t1; create table t1 as select a, random() * 1000 as b from generate_series(0, 999999) a; create index idx_t1 on t1(a); create index idx_t1_b on t1(b); analyze t1; -- index scan set enable_seqscan to off; cluster verbose t1 using idx_t1; -- seq scan set enable_seqscan to on; set enable_indexscan to off; cluster verbose t1 using idx_t1; -- only given table name to cluster command cluster verbose t1; -- only cluster command cluster verbose; -- vacuum full vacuum full t1; -- vacuum full vacuum full; Thanks, Tatsuro Yamada
On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > === Current design === > > CLUSTER command uses Index Scan or Seq Scan when scanning the heap. > Depending on which one is chosen, the command will proceed in the > following sequence of phases: > > * Scan method: Seq Scan > 0. initializing (*2) > 1. seq scanning heap (*1) > 3. sorting tuples (*2) > 4. writing new heap (*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > * Scan method: Index Scan > 0. initializing (*2) > 2. index scanning heap (*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > VACUUM FULL command will proceed in the following sequence of phases: > > 1. seq scanning heap (*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > (*1): increasing the value in heap_tuples_scanned column > (*2): only shows the phase in the phase column All of that sounds good. > The view provides the information of CLUSTER command progress details as follows > # \d pg_stat_progress_cluster > View "pg_catalog.pg_stat_progress_cluster" > Column | Type | Collation | Nullable | Default > ---------------------------+---------+-----------+----------+--------- > pid | integer | | | > datid | oid | | | > datname | name | | | > relid | oid | | | > command | text | | | > phase | text | | | > cluster_index_relid | bigint | | | > heap_tuples_scanned | bigint | | | > heap_tuples_vacuumed | bigint | | | Still not sure if we need heap_tuples_vacuumed. We could try to report heap_blks_scanned and heap_blks_total like we do for VACUUM, if we're using a Seq Scan. > === Discussion points === > > - Progress counter for "3. sorting tuples" phase > - Should we add pgstat_progress_update_param() in tuplesort.c like a > "trace_sort"? > Thanks to Peter Geoghegan for the useful advice! How would we avoid an abstraction violation? > - Progress counter for "6. rebuilding index" phase > - Should we add "index_vacuum_count" in the view like a vacuum progress monitor? > If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c. > However, I'm not sure whether it is okay or not. Doesn't seem unreasonable to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/1/19 7:48 AM, Etsuro Fujita wrote: > (2019/03/01 14:17), Tatsuro Yamada wrote: >>> Attached patch is wip patch. >> >> Is it possible to remove the following patch? >> Because I registered the patch twice on CF Mar. >> >> https://commitfest.postgresql.org/22/2049/ > > Please remove the above and keep this: > > https://commitfest.postgresql.org/22/1779/ > > which I moved from the January CF to the March one on behalf of him. I have closed the duplicate entry (#2049) and retained the entry which contains the CF history (#1779). Regards, -- -David david@pgmasters.net
Hi David, On 2019/03/05 17:29, David Steele wrote: > On 3/1/19 7:48 AM, Etsuro Fujita wrote: >> (2019/03/01 14:17), Tatsuro Yamada wrote: >>>> Attached patch is wip patch. >>> >>> Is it possible to remove the following patch? >>> Because I registered the patch twice on CF Mar. >>> >>> https://commitfest.postgresql.org/22/2049/ >> >> Please remove the above and keep this: >> >> https://commitfest.postgresql.org/22/1779/ >> >> which I moved from the January CF to the March one on behalf of him. > > I have closed the duplicate entry (#2049) and retained the entry which contains the CF history (#1779). Thank you! :) Regards, Tatsuro Yamada
Hi Robert! On 2019/03/05 11:35, Robert Haas wrote: > On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> === Current design === >> >> CLUSTER command uses Index Scan or Seq Scan when scanning the heap. >> Depending on which one is chosen, the command will proceed in the >> following sequence of phases: >> >> * Scan method: Seq Scan >> 0. initializing (*2) >> 1. seq scanning heap (*1) >> 3. sorting tuples (*2) >> 4. writing new heap (*1) >> 5. swapping relation files (*2) >> 6. rebuilding index (*2) >> 7. performing final cleanup (*2) >> >> * Scan method: Index Scan >> 0. initializing (*2) >> 2. index scanning heap (*1) >> 5. swapping relation files (*2) >> 6. rebuilding index (*2) >> 7. performing final cleanup (*2) >> >> VACUUM FULL command will proceed in the following sequence of phases: >> >> 1. seq scanning heap (*1) >> 5. swapping relation files (*2) >> 6. rebuilding index (*2) >> 7. performing final cleanup (*2) >> >> (*1): increasing the value in heap_tuples_scanned column >> (*2): only shows the phase in the phase column > > All of that sounds good. > >> The view provides the information of CLUSTER command progress details as follows >> # \d pg_stat_progress_cluster >> View "pg_catalog.pg_stat_progress_cluster" >> Column | Type | Collation | Nullable | Default >> ---------------------------+---------+-----------+----------+--------- >> pid | integer | | | >> datid | oid | | | >> datname | name | | | >> relid | oid | | | >> command | text | | | >> phase | text | | | >> cluster_index_relid | bigint | | | >> heap_tuples_scanned | bigint | | | >> heap_tuples_vacuumed | bigint | | | > > Still not sure if we need heap_tuples_vacuumed. We could try to > report heap_blks_scanned and heap_blks_total like we do for VACUUM, if > we're using a Seq Scan. I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in next patch. Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to get those from initscan(). I'll investigate it more. cluster.c copy_heap_data() heap_beginscan() heap_beginscan_internal() initscan() >> === Discussion points === >> >> - Progress counter for "3. sorting tuples" phase >> - Should we add pgstat_progress_update_param() in tuplesort.c like a >> "trace_sort"? >> Thanks to Peter Geoghegan for the useful advice! > > How would we avoid an abstraction violation? Hmm... What do you mean an abstraction violation? If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples. >> - Progress counter for "6. rebuilding index" phase >> - Should we add "index_vacuum_count" in the view like a vacuum progress monitor? >> If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c. >> However, I'm not sure whether it is okay or not. > > Doesn't seem unreasonable to me. I see, I'll add it later. Regards, Tatsuro Yamada
On Tue, Mar 5, 2019 at 3:56 AM Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > >> === Discussion points === > >> > >> - Progress counter for "3. sorting tuples" phase > >> - Should we add pgstat_progress_update_param() in tuplesort.c like a > >> "trace_sort"? > >> Thanks to Peter Geoghegan for the useful advice! > > > > How would we avoid an abstraction violation? > > Hmm... What do you mean an abstraction violation? > If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples. What I mean is... I think it would be useful to have this counter, but I'm not sure how the tuplesort code would know to update the counter in this case and not in other cases. The tuplesort code is used for lots of things; we can't update a counter for CLUSTER if the tuplesort is being used for CREATE INDEX or a Sort node in a query or whatever. So my question is how we would indicate to the tuplesort that it needs to do the counter update, and whether that would end up making for ugly code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2019-Mar-04, Robert Haas wrote: > On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: > > === Discussion points === > > > > - Progress counter for "3. sorting tuples" phase > > - Should we add pgstat_progress_update_param() in tuplesort.c like a > > "trace_sort"? > > Thanks to Peter Geoghegan for the useful advice! > > How would we avoid an abstraction violation? The theory embodied in my patch at https://postgr.es/m/20190304204607.GA15946@alvherre.pgsql is that we don't; tuplesort.c functions (index.c's IndexBuildHeapScan in my case) would get a boolean parameter to indicate whether to update some params or not -- the param number(s) to update are supposed to be generic in the sense that it's not part of any individual command's implementation (PROGRESS_SCAN_BLOCKS_DONE for what you call "blks scanned", PROGRESS_SCAN_BLOCKS_TOTAL for "blks total"), but rather defined by the "progress update provider" (index.c or tuplesort.c). One, err, small issue with that idea is that we need the param numbers not to conflict for any "progress update providers" that are to be used simultaneously by any command. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019/03/06 1:13, Robert Haas wrote: > On Tue, Mar 5, 2019 at 3:56 AM Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >>>> === Discussion points === >>>> >>>> - Progress counter for "3. sorting tuples" phase >>>> - Should we add pgstat_progress_update_param() in tuplesort.c like a >>>> "trace_sort"? >>>> Thanks to Peter Geoghegan for the useful advice! >>> >>> How would we avoid an abstraction violation? >> >> Hmm... What do you mean an abstraction violation? >> If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples. > > What I mean is... I think it would be useful to have this counter, but > I'm not sure how the tuplesort code would know to update the counter > in this case and not in other cases. The tuplesort code is used for > lots of things; we can't update a counter for CLUSTER if the tuplesort > is being used for CREATE INDEX or a Sort node in a query or whatever. > So my question is how we would indicate to the tuplesort that it needs > to do the counter update, and whether that would end up making for > ugly code. Thanks for your explanation! I understood that now. I guess it means an API to get a progress for sort processing, so let us just put it aside now. I'd like to leave that as is for an appropriate person. Regards, Tatsuro Yamada
On 2019/03/05 17:56, Tatsuro Yamada wrote: > Hi Robert! > > On 2019/03/05 11:35, Robert Haas wrote: >> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada >> <yamada.tatsuro@lab.ntt.co.jp> wrote: >>> === Current design === >>> >>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap. >>> Depending on which one is chosen, the command will proceed in the >>> following sequence of phases: >>> >>> * Scan method: Seq Scan >>> 0. initializing (*2) >>> 1. seq scanning heap (*1) >>> 3. sorting tuples (*2) >>> 4. writing new heap (*1) >>> 5. swapping relation files (*2) >>> 6. rebuilding index (*2) >>> 7. performing final cleanup (*2) >>> >>> * Scan method: Index Scan >>> 0. initializing (*2) >>> 2. index scanning heap (*1) >>> 5. swapping relation files (*2) >>> 6. rebuilding index (*2) >>> 7. performing final cleanup (*2) >>> >>> VACUUM FULL command will proceed in the following sequence of phases: >>> >>> 1. seq scanning heap (*1) >>> 5. swapping relation files (*2) >>> 6. rebuilding index (*2) >>> 7. performing final cleanup (*2) >>> >>> (*1): increasing the value in heap_tuples_scanned column >>> (*2): only shows the phase in the phase column >> >> All of that sounds good. >> >>> The view provides the information of CLUSTER command progress details as follows >>> # \d pg_stat_progress_cluster >>> View "pg_catalog.pg_stat_progress_cluster" >>> Column | Type | Collation | Nullable | Default >>> ---------------------------+---------+-----------+----------+--------- >>> pid | integer | | | >>> datid | oid | | | >>> datname | name | | | >>> relid | oid | | | >>> command | text | | | >>> phase | text | | | >>> cluster_index_relid | bigint | | | >>> heap_tuples_scanned | bigint | | | >>> heap_tuples_vacuumed | bigint | | | >> >> Still not sure if we need heap_tuples_vacuumed. We could try to >> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if >> we're using a Seq Scan. > > I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in > next patch. > > Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to > get those from initscan(). I'll investigate it more. > > cluster.c > copy_heap_data() > heap_beginscan() > heap_beginscan_internal() > initscan() > > > >>> === Discussion points === >>> >>> - Progress counter for "3. sorting tuples" phase >>> - Should we add pgstat_progress_update_param() in tuplesort.c like a >>> "trace_sort"? >>> Thanks to Peter Geoghegan for the useful advice! >> >> How would we avoid an abstraction violation? > > Hmm... What do you mean an abstraction violation? > If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples. > > >>> - Progress counter for "6. rebuilding index" phase >>> - Should we add "index_vacuum_count" in the view like a vacuum progress monitor? >>> If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c. >>> However, I'm not sure whether it is okay or not. >> >> Doesn't seem unreasonable to me. > > I see, I'll add it later. Attached file is revised and WIP patch including: - Remove heap_tuples_vacuumed - Add heap_blks_scanned and heap_blks_total - Add index_vacuum_count I tried to "add heap_blks_scanned and heap_blks_total" columns and I realized that "heap_tuples_scanned" column is suitable as a counter when a scan method is both index-scan and seq-scan because CLUSTER is on a tuple basis. Regards, Tatsuro Yamada
Attachment
On Tue, Mar 5, 2019 at 8:03 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > One, err, small issue with that idea is that we need the param numbers > not to conflict for any "progress update providers" that are to be used > simultaneously by any command. Is that really an issue? I think progress reporting -- at least with the current infrastructure -- is only ever going to be possible for utility commands, not queries. And those really shouldn't have very many sorts going on at once. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2019-Mar-06, Robert Haas wrote: > On Tue, Mar 5, 2019 at 8:03 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > One, err, small issue with that idea is that we need the param numbers > > not to conflict for any "progress update providers" that are to be used > > simultaneously by any command. > > Is that really an issue? I think progress reporting -- at least with > the current infrastructure -- is only ever going to be possible for > utility commands, not queries. And those really shouldn't have very > many sorts going on at once. Well, I don't think it is, but I thought it was worth pointing out explicitly. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019/03/06 15:38, Tatsuro Yamada wrote: > On 2019/03/05 17:56, Tatsuro Yamada wrote: >> On 2019/03/05 11:35, Robert Haas wrote: >>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada >>> <yamada.tatsuro@lab.ntt.co.jp> wrote: >>>> === Current design === >>>> >>>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap. >>>> Depending on which one is chosen, the command will proceed in the >>>> following sequence of phases: >>>> >>>> * Scan method: Seq Scan >>>> 0. initializing (*2) >>>> 1. seq scanning heap (*1) >>>> 3. sorting tuples (*2) >>>> 4. writing new heap (*1) >>>> 5. swapping relation files (*2) >>>> 6. rebuilding index (*2) >>>> 7. performing final cleanup (*2) >>>> >>>> * Scan method: Index Scan >>>> 0. initializing (*2) >>>> 2. index scanning heap (*1) >>>> 5. swapping relation files (*2) >>>> 6. rebuilding index (*2) >>>> 7. performing final cleanup (*2) >>>> >>>> VACUUM FULL command will proceed in the following sequence of phases: >>>> >>>> 1. seq scanning heap (*1) >>>> 5. swapping relation files (*2) >>>> 6. rebuilding index (*2) >>>> 7. performing final cleanup (*2) >>>> >>>> (*1): increasing the value in heap_tuples_scanned column >>>> (*2): only shows the phase in the phase column >>> >>> All of that sounds good. >>> >>>> The view provides the information of CLUSTER command progress details as follows >>>> # \d pg_stat_progress_cluster >>>> View "pg_catalog.pg_stat_progress_cluster" >>>> Column | Type | Collation | Nullable | Default >>>> ---------------------------+---------+-----------+----------+--------- >>>> pid | integer | | | >>>> datid | oid | | | >>>> datname | name | | | >>>> relid | oid | | | >>>> command | text | | | >>>> phase | text | | | >>>> cluster_index_relid | bigint | | | >>>> heap_tuples_scanned | bigint | | | >>>> heap_tuples_vacuumed | bigint | | | >>> >>> Still not sure if we need heap_tuples_vacuumed. We could try to >>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if >>> we're using a Seq Scan. >> >> I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in >> next patch. >> >> Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to >> get those from initscan(). I'll investigate it more. >> >> cluster.c >> copy_heap_data() >> heap_beginscan() >> heap_beginscan_internal() >> initscan() >> >> >> >>>> === Discussion points === >>>> >>>> - Progress counter for "3. sorting tuples" phase >>>> - Should we add pgstat_progress_update_param() in tuplesort.c like a >>>> "trace_sort"? >>>> Thanks to Peter Geoghegan for the useful advice! >>> >>> How would we avoid an abstraction violation? >> >> Hmm... What do you mean an abstraction violation? >> If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples. >> >> >>>> - Progress counter for "6. rebuilding index" phase >>>> - Should we add "index_vacuum_count" in the view like a vacuum progress monitor? >>>> If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c. >>>> However, I'm not sure whether it is okay or not. >>> >>> Doesn't seem unreasonable to me. >> >> I see, I'll add it later. > > > Attached file is revised and WIP patch including: > > - Remove heap_tuples_vacuumed > - Add heap_blks_scanned and heap_blks_total > - Add index_vacuum_count > > I tried to "add heap_blks_scanned and heap_blks_total" columns and I realized that > "heap_tuples_scanned" column is suitable as a counter when a scan method is > both index-scan and seq-scan because CLUSTER is on a tuple basis. Attached file is rebased patch on current HEAD. I changed a status. :) Regards, Tatsuro Yamada
Attachment
On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > > On 2019/03/06 15:38, Tatsuro Yamada wrote: > > On 2019/03/05 17:56, Tatsuro Yamada wrote: > >> On 2019/03/05 11:35, Robert Haas wrote: > >>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada > >>> <yamada.tatsuro@lab.ntt.co.jp> wrote: > >>>> === Current design === > >>>> > >>>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap. > >>>> Depending on which one is chosen, the command will proceed in the > >>>> following sequence of phases: > >>>> > >>>> * Scan method: Seq Scan > >>>> 0. initializing (*2) > >>>> 1. seq scanning heap (*1) > >>>> 3. sorting tuples (*2) > >>>> 4. writing new heap (*1) > >>>> 5. swapping relation files (*2) > >>>> 6. rebuilding index (*2) > >>>> 7. performing final cleanup (*2) > >>>> > >>>> * Scan method: Index Scan > >>>> 0. initializing (*2) > >>>> 2. index scanning heap (*1) > >>>> 5. swapping relation files (*2) > >>>> 6. rebuilding index (*2) > >>>> 7. performing final cleanup (*2) > >>>> > >>>> VACUUM FULL command will proceed in the following sequence of phases: > >>>> > >>>> 1. seq scanning heap (*1) > >>>> 5. swapping relation files (*2) > >>>> 6. rebuilding index (*2) > >>>> 7. performing final cleanup (*2) > >>>> > >>>> (*1): increasing the value in heap_tuples_scanned column > >>>> (*2): only shows the phase in the phase column > >>> > >>> All of that sounds good. > >>> > >>>> The view provides the information of CLUSTER command progress details as follows > >>>> # \d pg_stat_progress_cluster > >>>> View "pg_catalog.pg_stat_progress_cluster" > >>>> Column | Type | Collation | Nullable | Default > >>>> ---------------------------+---------+-----------+----------+--------- > >>>> pid | integer | | | > >>>> datid | oid | | | > >>>> datname | name | | | > >>>> relid | oid | | | > >>>> command | text | | | > >>>> phase | text | | | > >>>> cluster_index_relid | bigint | | | > >>>> heap_tuples_scanned | bigint | | | > >>>> heap_tuples_vacuumed | bigint | | | > >>> > >>> Still not sure if we need heap_tuples_vacuumed. We could try to > >>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if > >>> we're using a Seq Scan. > >> > >> I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in > >> next patch. > >> > >> Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to > >> get those from initscan(). I'll investigate it more. > >> > >> cluster.c > >> copy_heap_data() > >> heap_beginscan() > >> heap_beginscan_internal() > >> initscan() > >> > >> > >> > >>>> === Discussion points === > >>>> > >>>> - Progress counter for "3. sorting tuples" phase > >>>> - Should we add pgstat_progress_update_param() in tuplesort.c like a > >>>> "trace_sort"? > >>>> Thanks to Peter Geoghegan for the useful advice! > >>> > >>> How would we avoid an abstraction violation? > >> > >> Hmm... What do you mean an abstraction violation? > >> If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples. > >> > >> > >>>> - Progress counter for "6. rebuilding index" phase > >>>> - Should we add "index_vacuum_count" in the view like a vacuum progress monitor? > >>>> If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c. > >>>> However, I'm not sure whether it is okay or not. > >>> > >>> Doesn't seem unreasonable to me. > >> > >> I see, I'll add it later. > > > > > > Attached file is revised and WIP patch including: > > > > - Remove heap_tuples_vacuumed > > - Add heap_blks_scanned and heap_blks_total > > - Add index_vacuum_count > > > > I tried to "add heap_blks_scanned and heap_blks_total" columns and I realized that > > "heap_tuples_scanned" column is suitable as a counter when a scan method is > > both index-scan and seq-scan because CLUSTER is on a tuple basis. > > > Attached file is rebased patch on current HEAD. > I changed a status. :) > > Looks like the patch needs a rebase. I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2 PFA reject file in case you want to have a look. > Regards, > Tatsuro Yamada > > > -- Regards, Rafia Sabih
Attachment
Hi Rafia! On 2019/03/18 20:42, Rafia Sabih wrote: > On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> Attached file is rebased patch on current HEAD. >> I changed a status. :) >> >> > Looks like the patch needs a rebase. > I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2 > > PFA reject file in case you want to have a look. Thanks for testing it. :) I rebased the patch on the current head: f2004f19ed9c9228d3ea2b12379ccb4b9212641f. Please find attached file. Also, I share my test case of progress monitor below. === My test case === [Terminal1] Run this query on psql: \a \t select * from pg_stat_progress_cluster; \watch 0.05 [Terminal2] Run these queries on psql: drop table t1; create table t1 as select a, random() * 1000 as b from generate_series(0, 999999) a; create index idx_t1 on t1(a); create index idx_t1_b on t1(b); analyze t1; -- index scan set enable_seqscan to off; cluster verbose t1 using idx_t1; -- seq scan set enable_seqscan to on; set enable_indexscan to off; cluster verbose t1 using idx_t1; -- only given table name to cluster command cluster verbose t1; -- only cluster command cluster verbose; -- vacuum full vacuum full t1; -- vacuum full vacuum full; ==================== Regards, Tatsuro Yamada
Attachment
On 2019/03/19 10:43, Tatsuro Yamada wrote: > Hi Rafia! > > On 2019/03/18 20:42, Rafia Sabih wrote: >> On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada >> <yamada.tatsuro@lab.ntt.co.jp> wrote: >>> Attached file is rebased patch on current HEAD. >>> I changed a status. :) >>> >>> >> Looks like the patch needs a rebase. >> I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2 >> >> PFA reject file in case you want to have a look. > > Thanks for testing it. :) > I rebased the patch on the current head: f2004f19ed9c9228d3ea2b12379ccb4b9212641f. > > Please find attached file. > > Also, I share my test case of progress monitor below. Attached patch is a rebased document patch. :) Thanks, Tatsuro Yamada
Attachment
At Tue, 19 Mar 2019 11:02:57 +0900, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote in <dc0dd07c-f185-0cf9-ba54-c5c31f6514f2@lab.ntt.co.jp> > On 2019/03/19 10:43, Tatsuro Yamada wrote: > > Hi Rafia! > > On 2019/03/18 20:42, Rafia Sabih wrote: > >> On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada > >> <yamada.tatsuro@lab.ntt.co.jp> wrote: > >>> Attached file is rebased patch on current HEAD. > >>> I changed a status. :) > >>> > >>> > >> Looks like the patch needs a rebase. > >> I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2 > >> > >> PFA reject file in case you want to have a look. > > Thanks for testing it. :) > > I rebased the patch on the current head: > > f2004f19ed9c9228d3ea2b12379ccb4b9212641f. > > Please find attached file. > > Also, I share my test case of progress monitor below. > > > Attached patch is a rebased document patch. :) The monitor view has four columns: heap_tuples_scanned : used while the "seq scan" and "index scan" phases. heap_blks_total, heap_blks_scanned : used only while the "seq scan" phase. index_rebuild_count: : used only while the "rebuilding index" phase. Couldn't we change the view like the following? .. | phase | heap tuples scanned | progress indicator | value ..-+------------+---------------------+---------------------+-------- .. | seq scan ..| 2134214 | heap blocks pct | 23.5% .. | index sc ..| 5453395 | (null) | (null) .. | sorting ..| <the last value> | (null) | (null) .. | swapping ..| <the last value> | (null) | (null) .. | rebuildi ..| <the last value> | index count | 2 .. | performi ..| <the last value> | (null) | (null) Only seq scan phase has two kind of progress indicator so if we choose "heap blks pct" as the only indicator for the phase, it could be simplified as: .. | phase | progress indicator | value ..-+-------------+---------------------+-------- .. | seq scan .. | heap blocks pct | 23.5% .. | index sc .. | heap tuples scanned | 2398457 .. | sorting .. | (null) | (null) .. | swapping .. | (null) | (null) .. | rebuildi .. | index count | 2 .. | performi .. | (null) | (null) A downside of the view is that it looks quite differently from pg_stat_progress_vacuum. Since I'm not sure it's a good design, feel free to oppose/reject this. finish_heap_swap is also called in matview code path but the function doesn't seem to detect that situation. Is it right behavior? (I didn't confirm what happens when it is called from matview refresh path, though.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Mar 18, 2019 at 10:03 PM Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > Attached patch is a rebased document patch. :) Attached is an updated patch. I went through this patch carefully today, in the hopes of committing it, and I think the attached version is pretty closet to being committable, but there's at least one open issue remaining, as described below. - The regression tests did not pass because expected/rules.out was not properly updated. I fixed that. - The documentation did not build because some tags were not properly terminated e.g. <xref linkend='...'> rather than <xref linkend='...'/>. I also fixed that. - The documentation had two nearly-identical lists of phases. I merged them into one. There might be room for some further fine-tuning here. - cluster_rel() had multiple places where it could return without calling pgstat_progress_end_command(). I fixed that. - cluster_rel() inexplicably delayed updating PROGRESS_CLUSTER_COMMAND for longer than seems necessary. I fixed that. - copy_heap_data() zeroed out the heap-tuples-scanned, heap-blocks-scanned, and total-heap-blocks counters when it began PROGRESS_CLUSTER_PHASE_SORT_TUPLES and PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES. This seems like throwing away useful information for no good reason. I changed it not to do that in all cases except the one mentioned in the next paragraph. - It *is* currently to reset PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED because that counter gets reused to indicate the number of heap tuples *written back out*, but I think that is bad design for two reasons. First, the documentation does not explain that sometimes the number of heap tuples scanned is really reporting the number of heap tuples written. Second, it's bad for columns to have misleading names. Third, it would actually be really useful to store these values in separate columns, because then you could expect that the number tuples written would eventually equal the number scanned, and you'd still have the number that were scanned around so that you could clearly see how close you were getting to rewriting the entire heap. This is the one thing I found but did not fix; any chance you could make this change and update the documentation to match? - The comment about reporting that we are now reindexing relations was jammed in between an existing comment and the associated code. I moved it to a more logical place. - The new if-statement in pg_stat_get_progress_info was missing a space required by project style. I added the space. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Tue, Mar 19, 2019 at 2:47 PM Robert Haas <robertmhaas@gmail.com> wrote: > how close you were getting to rewriting the entire heap. This is the > one thing I found but did not fix; any chance you could make this > change and update the documentation to match? Hi, is anybody working on this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Robert! >On Tue, Mar 19, 2019 at 2:47 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote: >> how close you were getting to rewriting the entire heap. This is the >> one thing I found but did not fix; any chance you could make this >> change and update the documentation to match? > > >Hi, is anybody working on this? Thank you so much for reviewing the patch and sorry for the late reply. Today, I realized that you sent the email for the patch because I took a sick leave from work for a while. So, I created new patch based on your comments asap. I hope it is acceptable to you. :) Please find attached file. Changes - Add new column *heap_tuples_written* in the view This column is updated when the phases are "seq scanning heap", "index scanning heap" or "writing new heap". - Fix document - Revised the patch on 280a408b48d5ee42969f981bceb9e9426c3a344c
Regards,
Tatsuro Yamada
Attachment
Hi Robert! On 2019/03/23 3:31, Robert Haas wrote: > On Tue, Mar 19, 2019 at 2:47 PM Robert Haas <robertmhaas@gmail.com> wrote: >> how close you were getting to rewriting the entire heap. This is the >> one thing I found but did not fix; any chance you could make this >> change and update the documentation to match? > > Hi, is anybody working on this? I sent this email using my personal email address: yamatattsu@gmail-. I re-send it with the patch and my test result. Thank you so much for reviewing the patch and sorry for the late reply. Today, I realized that you sent the email for the patch because I took a sick leave from work for a while. So, I created new patch based on your comments asap. I hope it is acceptable to you. :) Changes - Add new column *heap_tuples_written* in the view This column is updated when the phases are "seq scanning heap", "index scanning heap" or "writing new heap". - Fix document - Revised the patch on the current head: 940311e4bb32a5fe99155052e41179c88b5d48af. Please find attached files. :) Regards, Tatsuro Yamada
Attachment
On Sun, Mar 24, 2019 at 9:02 PM Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > Please find attached files. :) Committed. Thanks! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Robert and Reviewers! On 2019/03/26 0:02, Robert Haas wrote: > On Sun, Mar 24, 2019 at 9:02 PM Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> Please find attached files. :) > > Committed. Thanks! Thank you! Hope this feature will help DBA and user. :) Regards, Tatsuro Yamada NTT Open Source Software Center
On Tue, Mar 26, 2019 at 10:04:48AM +0900, Tatsuro Yamada wrote: > Hope this feature will help DBA and user. :) Congrats, Yamada-san. -- Michael
Attachment
Hmm, I'm trying this out now and I don't see the index_rebuild_count ever go up. I think it's because the indexes are built using parallel index build ... or maybe it was the table AM changes that moved things around, not sure. There's a period at the end when the CLUSTER command keeps working, but it's gone from pg_stat_progress_cluster. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Alvaro! On 2019/08/02 3:43, Alvaro Herrera wrote: > Hmm, I'm trying this out now and I don't see the index_rebuild_count > ever go up. I think it's because the indexes are built using parallel > index build ... or maybe it was the table AM changes that moved things > around, not sure. There's a period at the end when the CLUSTER command > keeps working, but it's gone from pg_stat_progress_cluster. Thanks for your report. I'll investigate it. :) Thanks, Tatsuro Yamada
Hi Alvaro and All, On 2019/08/13 14:40, Tatsuro Yamada wrote: > Hi Alvaro! > > On 2019/08/02 3:43, Alvaro Herrera wrote: >> Hmm, I'm trying this out now and I don't see the index_rebuild_count >> ever go up. I think it's because the indexes are built using parallel >> index build ... or maybe it was the table AM changes that moved things >> around, not sure. There's a period at the end when the CLUSTER command >> keeps working, but it's gone from pg_stat_progress_cluster. > > > Thanks for your report. > I'll investigate it. :) I did "git bisect" and found the commit: 03f9e5cba0ee1633af4abe734504df50af46fbd8 Report progress of REINDEX operations In src/backend/catalog/index.c, CLUSTER progress reporting increases index_rebuild_count in reindex_relation() by pgstat_progress_update_param(). However, reindex_relation() calls reindex_index(), and REINDEX progress reporting is existing on the latter function, andit starts pgstat_progress_start_command() pgstat_progress_end_command() for REINDEX progress reporting. Therefore, CLUSTER progress reporting failed to update index_rebuild_count because it made a mistake to update the REINDEX's view, I think. My Idea to fix that is following: - Add a target view name parameter to Progress monitor's API For example: <Before> pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT, i). <After> pgstat_progress_update_param(*PROGRESS_CLUSTER_VIEW*, PROGRESS_CLUSTER_INDEX_REBUILD_COUNT, i). However, I'm not sure whether it is able or not because I haven't read the code of the API yet. What do you think about that? :) Thanks, Tatsuro Yamada
On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote: > On 2019/08/13 14:40, Tatsuro Yamada wrote: >> On 2019/08/02 3:43, Alvaro Herrera wrote: >>> Hmm, I'm trying this out now and I don't see the index_rebuild_count >>> ever go up. I think it's because the indexes are built using parallel >>> index build ... or maybe it was the table AM changes that moved things >>> around, not sure. There's a period at the end when the CLUSTER command >>> keeps working, but it's gone from pg_stat_progress_cluster. >> >> Thanks for your report. >> I'll investigate it. :) > > I did "git bisect" and found the commit: > > 03f9e5cba0ee1633af4abe734504df50af46fbd8 > Report progress of REINDEX operations I am adding an open item for this one. -- Michael
Attachment
Hi Michael, Alvaro and Robert! On 2019/08/14 11:52, Michael Paquier wrote: > On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote: >> On 2019/08/13 14:40, Tatsuro Yamada wrote: >>> On 2019/08/02 3:43, Alvaro Herrera wrote: >>>> Hmm, I'm trying this out now and I don't see the index_rebuild_count >>>> ever go up. I think it's because the indexes are built using parallel >>>> index build ... or maybe it was the table AM changes that moved things >>>> around, not sure. There's a period at the end when the CLUSTER command >>>> keeps working, but it's gone from pg_stat_progress_cluster. >>> >>> Thanks for your report. >>> I'll investigate it. :) >> >> I did "git bisect" and found the commit: >> >> 03f9e5cba0ee1633af4abe734504df50af46fbd8 >> Report progress of REINDEX operations > > I am adding an open item for this one. > -- > Michael Okay, I checked it on the wiki. https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items - index_rebuild_count in CLUSTER reporting never increments To be clear, 03f9e5cb broke CLUSTER progress reporting, but I investigated little more and share my ideas to fix the problem. * Call stack ======================================== cluster_rel pgstat_progress_start_command(CLUSTER) *A1 rebuild_relation finish_heap_swap reindex_relation reindex_index pgstat_progress_start_command(CREATE_INDEX) *B1 pgstat_progress_end_command() *B2 pgstat_progress_update_param(INDEX_REBUILD_COUNT, i) <- failed :( pgstat_progress_end_command() *A2 Note These are sets: A1 and A2, B1 and B2 ======================================== * Ideas to fix There are Three options, I guess. ======================================== 1. Call pgstat_progress_start_command(CLUSTER) again before pgstat_progress_update_param(INDEX_REBUILD_COUNT, i). 2. Add "save and restore" functions for the following two variables of MyBeentry in pgstat.c. - st_progress_command - st_progress_command_target 3. Use Hash or List to store multiple values for the two variables in pgstat.c. ======================================== I tried 1. and it shown index_rebuild_count, but it also shown "initializing" phase again and other columns were empty. So, it is not suitable to fix the problem. :( I'm going to try 2. and 3., but, unfortunately, I can't get enough time to do that after PGConf.Asia 2019. If we selected 3., it affects following these progress reporting features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay, I suppose. Any comments welcome! :) Thanks, Tatsuro Yamada
On Thu, Aug 15, 2019 at 12:48 PM Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote: > > Hi Michael, Alvaro and Robert! > > On 2019/08/14 11:52, Michael Paquier wrote: > > On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote: > >> On 2019/08/13 14:40, Tatsuro Yamada wrote: > >>> On 2019/08/02 3:43, Alvaro Herrera wrote: > >>>> Hmm, I'm trying this out now and I don't see the index_rebuild_count > >>>> ever go up. I think it's because the indexes are built using parallel > >>>> index build ... or maybe it was the table AM changes that moved things > >>>> around, not sure. There's a period at the end when the CLUSTER command > >>>> keeps working, but it's gone from pg_stat_progress_cluster. > >>> > >>> Thanks for your report. > >>> I'll investigate it. :) > >> > >> I did "git bisect" and found the commit: > >> > >> 03f9e5cba0ee1633af4abe734504df50af46fbd8 > >> Report progress of REINDEX operations > > > > I am adding an open item for this one. > > -- > > Michael > > > Okay, I checked it on the wiki. > > https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items > - index_rebuild_count in CLUSTER reporting never increments > > To be clear, 03f9e5cb broke CLUSTER progress reporting, but > I investigated little more and share my ideas to fix the problem. > > * Call stack > ======================================== > cluster_rel > pgstat_progress_start_command(CLUSTER) *A1 > rebuild_relation > finish_heap_swap > reindex_relation > reindex_index > pgstat_progress_start_command(CREATE_INDEX) *B1 > pgstat_progress_end_command() *B2 > pgstat_progress_update_param(INDEX_REBUILD_COUNT, i) <- failed :( > pgstat_progress_end_command() *A2 > > Note > These are sets: > A1 and A2, > B1 and B2 > ======================================== > > > * Ideas to fix > There are Three options, I guess. > ======================================== > 1. Call pgstat_progress_start_command(CLUSTER) again > before pgstat_progress_update_param(INDEX_REBUILD_COUNT, i). > > 2. Add "save and restore" functions for the following two > variables of MyBeentry in pgstat.c. > - st_progress_command > - st_progress_command_target > > 3. Use Hash or List to store multiple values for the two > variables in pgstat.c. > ======================================== > > > I tried 1. and it shown index_rebuild_count, but it also shown > "initializing" phase again and other columns were empty. So, it is > not suitable to fix the problem. :( > I'm going to try 2. and 3., but, unfortunately, I can't get enough > time to do that after PGConf.Asia 2019. > If we selected 3., it affects following these progress reporting > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay, > I suppose. Any comments welcome! :) I looked at this open item. I prefer #3 but I think it would be enough to have a small stack using a fixed length array to track nested progress information and the current executing command (maybe 4 or 8 would be enough for maximum nested level for now?). That way, we don't need any change the interfaces. AFAICS there is no case where we call only either pgstat_progress_start_command or pgstat_progress_end_command without each other (although pgstat_progress_update_param is called without start). So I think that having a small stack for tracking multiple reports would work. Attached the draft patch that fixes this issue. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote: > > I tried 1. and it shown index_rebuild_count, but it also shown > > "initializing" phase again and other columns were empty. So, it is > > not suitable to fix the problem. :( > > I'm going to try 2. and 3., but, unfortunately, I can't get enough > > time to do that after PGConf.Asia 2019. > > If we selected 3., it affects following these progress reporting > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay, > > I suppose. Any comments welcome! :) > > I looked at this open item. I prefer #3 but I think it would be enough > to have a small stack using a fixed length array to track nested > progress information and the current executing command (maybe 4 or 8 > would be enough for maximum nested level for now?). That way, we don't > need any change the interfaces. AFAICS there is no case where we call > only either pgstat_progress_start_command or > pgstat_progress_end_command without each other (although > pgstat_progress_update_param is called without start). So I think that > having a small stack for tracking multiple reports would work. > > Attached the draft patch that fixes this issue. Please review it. Do we actually want to show to the user information about CREATE INDEX which is different than CLUSTER? It could be confusing for the user to see a progress report from a command different than the one actually launched. There could be a method 4 here: do not start a new command progress when there is another one already started, and do not try to end it in the code path where it could not be started as it did not stack. So while I see the advantages of stacking the progress records as you do when doing cascading calls of the progress reporting, I am not sure that: 1) We should bloat more PgBackendStatus for that. 2) We want to add more complication in this area close to the release of 12. Another solution as mentioned by Yamada-san is just to start again the progress for the cluster command in reindex_relation() however you got an issue here as reindex_relation() is also called by REINDEX TABLE. I find actually very weird that we have a cluster-related field added for REINDEX, so it seems to me that all the interactions between the code paths of CLUSTER and REINDEX have not been completely thought through. This part has been added by 6f97457 :( -- Michael
Attachment
On Mon, Sep 2, 2019 at 4:59 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote: > > > I tried 1. and it shown index_rebuild_count, but it also shown > > > "initializing" phase again and other columns were empty. So, it is > > > not suitable to fix the problem. :( > > > I'm going to try 2. and 3., but, unfortunately, I can't get enough > > > time to do that after PGConf.Asia 2019. > > > If we selected 3., it affects following these progress reporting > > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay, > > > I suppose. Any comments welcome! :) > > > > I looked at this open item. I prefer #3 but I think it would be enough > > to have a small stack using a fixed length array to track nested > > progress information and the current executing command (maybe 4 or 8 > > would be enough for maximum nested level for now?). That way, we don't > > need any change the interfaces. AFAICS there is no case where we call > > only either pgstat_progress_start_command or > > pgstat_progress_end_command without each other (although > > pgstat_progress_update_param is called without start). So I think that > > having a small stack for tracking multiple reports would work. > > > > Attached the draft patch that fixes this issue. Please review it. > > Do we actually want to show to the user information about CREATE INDEX > which is different than CLUSTER? It could be confusing for the user > to see a progress report from a command different than the one > actually launched. I personally think it would be helpful for users. We provide the progress reporting for each commands but it might not be detailed enough. But we can provide more details of progress information of each commands by combining them. Only users who want to confirm the details need to see different progress reports. > There could be a method 4 here: do not start a new > command progress when there is another one already started, and do not > try to end it in the code path where it could not be started as it did > not stack. So while I see the advantages of stacking the progress > records as you do when doing cascading calls of the progress > reporting, I am not sure that: > 1) We should bloat more PgBackendStatus for that. > 2) We want to add more complication in this area close to the > release of 12. I agreed, especially to 2. We can live with #4 method in PG12 and the patch I proposed could be discussed as a new feature. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Sep 2, 2019 at 6:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Sep 2, 2019 at 4:59 PM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote: > > > > I tried 1. and it shown index_rebuild_count, but it also shown > > > > "initializing" phase again and other columns were empty. So, it is > > > > not suitable to fix the problem. :( > > > > I'm going to try 2. and 3., but, unfortunately, I can't get enough > > > > time to do that after PGConf.Asia 2019. > > > > If we selected 3., it affects following these progress reporting > > > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay, > > > > I suppose. Any comments welcome! :) > > > > > > I looked at this open item. I prefer #3 but I think it would be enough > > > to have a small stack using a fixed length array to track nested > > > progress information and the current executing command (maybe 4 or 8 > > > would be enough for maximum nested level for now?). That way, we don't > > > need any change the interfaces. AFAICS there is no case where we call > > > only either pgstat_progress_start_command or > > > pgstat_progress_end_command without each other (although > > > pgstat_progress_update_param is called without start). So I think that > > > having a small stack for tracking multiple reports would work. > > > > > > Attached the draft patch that fixes this issue. Please review it. > > > > Do we actually want to show to the user information about CREATE INDEX > > which is different than CLUSTER? It could be confusing for the user > > to see a progress report from a command different than the one > > actually launched. > > I personally think it would be helpful for users. We provide the > progress reporting for each commands but it might not be detailed > enough. But we can provide more details of progress information of > each commands by combining them. Only users who want to confirm the > details need to see different progress reports. > > > There could be a method 4 here: do not start a new > > command progress when there is another one already started, and do not > > try to end it in the code path where it could not be started as it did > > not stack. So while I see the advantages of stacking the progress > > records as you do when doing cascading calls of the progress > > reporting, I am not sure that: > > 1) We should bloat more PgBackendStatus for that. > > 2) We want to add more complication in this area close to the > > release of 12. > > I agreed, especially to 2. We can live with #4 method in PG12 and the > patch I proposed could be discussed as a new feature. After more thought, even if we don't start a new command progress when there is another one already started the progress update functions could be called and these functions don't specify the command type. Therefore, the progress information might be changed with wrong value by different command. Probably we can change the caller of progress updating function so that it doesn't call all of them if the command could not start a new progress report, but it might be a big change. As an alternative idea, we can make pgstat_progress_end_command() have one argument that is command the caller wants to end. That is, we don't end the command progress when the specified command type doesn't match to the command type of current running command progress. We unconditionally clear the progress information at CommitTransaction() and AbortTransaction() but we can do that by passing PROGRESS_COMMAND_INVALID to pgstat_progress_end_command(). BTW the following condition in pgstat_progress_end_command() seems to be wrong. We should return from the function when either pgstat_track_activities is disabled or the current progress command is invalid. if (!pgstat_track_activities && beentry->st_progress_command == PROGRESS_COMMAND_INVALID) return; I've attached the patch fixes the issue I newly found. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Tue, Sep 03, 2019 at 01:59:00PM +0900, Masahiko Sawada wrote: > After more thought, even if we don't start a new command progress when > there is another one already started the progress update functions > could be called and these functions don't specify the command type. > Therefore, the progress information might be changed with wrong value > by different command. Probably we can change the caller of progress > updating function so that it doesn't call all of them if the command > could not start a new progress report, but it might be a big change. That's one issue. > As an alternative idea, we can make pgstat_progress_end_command() have > one argument that is command the caller wants to end. That is, we > don't end the command progress when the specified command type doesn't > match to the command type of current running command progress. We > unconditionally clear the progress information at CommitTransaction() > and AbortTransaction() but we can do that by passing > PROGRESS_COMMAND_INVALID to pgstat_progress_end_command(). Possibly. I don't dislike the idea of piling up the progress information for cascading calls and I would use that with a depth counter and a fixed-size array. > BTW the following condition in pgstat_progress_end_command() seems to > be wrong. We should return from the function when either > pgstat_track_activities is disabled or the current progress command is > invalid. > > if (!pgstat_track_activities > && beentry->st_progress_command == PROGRESS_COMMAND_INVALID) > return; > > I've attached the patch fixes the issue I newly found. Indeed, good catch. This is wrong since b6fb647 which has introduced the progress reports. I'll fix that one and back-patch if there are no objections. With my RMT hat on for v12, I don't think that it is really the moment to discuss how we want to change this API post beta3, and we have room for that in future development cycles. There are quite some questions which need answers I am unsure of: - Do we really want to support nested calls of progress reports for multiple command? - Perhaps for some commands it makes sense to have an overlap of the fields used, but we need a clear definition of what can be done or not. I am not really comfortable with the idea of having in reindex_relation() a progress report related only to CLUSTER, which is also a REINDEX code path. The semantics shared between both commands need to be thought a bit more. For example PROGRESS_CLUSTER_INDEX_REBUILD_COUNT could cause the system catalog to report PROGRESS_CREATEIDX_PHASE_WAIT_3 because of an incorrect command type, which would be just wrong for a CLUSTER command. - Which command should be reported to the user, only the upper-level one? - Perhaps we can live only with the approach of not registering a new command if one already exists, and actually be more flexible with the phase fields used, in short we use unique numbers for each phase? The most conservative bet from a release point of view, and actually my bet because that's safe, would be to basically revert 6f97457 (CLUSTER), 03f9e5c (REINDEX) and ab0dfc96 (CREATE INDEX which has overlaps with REINDEX in the build and validation paths). What I am really scared of is that we have just barely scratched the surface of the issues caused by the inter-dependencies between all the code paths of those commands, and that we have much more waiting behind this open item. -- Michael
Attachment
On Tue, Sep 03, 2019 at 02:52:28PM +0900, Michael Paquier wrote: > Indeed, good catch. This is wrong since b6fb647 which has introduced > the progress reports. I'll fix that one and back-patch if there are > no objections. OK, applied this part down to 9.6. -- Michael
Attachment
On Wed, Sep 4, 2019 at 3:48 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Sep 03, 2019 at 02:52:28PM +0900, Michael Paquier wrote: > > Indeed, good catch. This is wrong since b6fb647 which has introduced > > the progress reports. I'll fix that one and back-patch if there are > > no objections. > > OK, applied this part down to 9.6. Thank you! Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Sep 3, 2019 at 1:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > After more thought, even if we don't start a new command progress when > there is another one already started the progress update functions > could be called and these functions don't specify the command type. > Therefore, the progress information might be changed with wrong value > by different command. Probably we can change the caller of progress > updating function so that it doesn't call all of them if the command > could not start a new progress report, but it might be a big change. > > As an alternative idea, we can make pgstat_progress_end_command() have > one argument that is command the caller wants to end. That is, we > don't end the command progress when the specified command type doesn't > match to the command type of current running command progress. We > unconditionally clear the progress information at CommitTransaction() > and AbortTransaction() but we can do that by passing > PROGRESS_COMMAND_INVALID to pgstat_progress_end_command(). I think this is all going in the wrong direction. It's the responsibility of the people who are calling the pgstat_progress_* functions to only do so when it's appropriate. Having the pgstat_progress_* functions try to untangle whether or not they ought to ignore calls made to them is backwards. It seems to me that the problem here can be summarized in this way: there's a bunch of code reuse in the relevant paths here, and somebody wasn't careful enough when adding progress reporting for one of the new commands, and so now things are broken, because e.g. CLUSTER progress reporting gets ended by a pgstat_progress_end_command() call that was intended for some other utility command but is reached in the CLUSTER case anyway. The solution to that problem in my book is to figure out which commit broke it, and then the person who made that commit either needs to fix it or revert it. It's quite possible here that we need a bigger redesign to make adding progress reporting for new command easier and less prone to bugs, but I don't think it's at all clear what such a redesign should look like or even that we definitely need one, and September is not the right time to be redesigning features for the pending release. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Sep 04, 2019 at 09:18:39AM -0400, Robert Haas wrote: > I think this is all going in the wrong direction. It's the > responsibility of the people who are calling the pgstat_progress_* > functions to only do so when it's appropriate. Having the > pgstat_progress_* functions try to untangle whether or not they ought > to ignore calls made to them is backwards. Check. > It seems to me that the problem here can be summarized in this way: > there's a bunch of code reuse in the relevant paths here, and somebody > wasn't careful enough when adding progress reporting for one of the > new commands, and so now things are broken, because e.g. CLUSTER > progress reporting gets ended by a pgstat_progress_end_command() call > that was intended for some other utility command but is reached in the > CLUSTER case anyway. The solution to that problem in my book is to > figure out which commit broke it, and then the person who made that > commit either needs to fix it or revert it. I am not sure that it is right as well to say that the first committed patch is right and that the follow-up ones are wrong. CLUSTER progress was committed first (6f97457), followed a couple of days after by CREATE INDEX (ab0dfc9) and then REINDEX (03f9e5c). So let's have a look at them.. For CLUSTER, the progress starts and ends in cluster_rel(). CLUSTER uses its code paths at the beginning, but then things get more complicated, particularly with finish_heap_swap() which calls directly reindex_table(). 6f97457 includes one progress update at point which can be a problem per its shared nature in reindex_relation() with PROGRESS_CLUSTER_INDEX_REBUILD_COUNT. This last part is wrong IMO, why should cluster reporting take priority in this code path, enforcing anything else? For CREATE INDEX, the progress reporting starts and ends once in DefineIndex(). However, we have updates of progress within each index AM build routine, which could be taken by many code paths. Is it actually fine to give priority to CREATE INDEX in those cases? Those paths can as well be taken by REINDEX or CLUSTER (right?), so having a counter for CREATE INDEX looks logically wrong to me. The part where we wait for snapshots looks actually good from the perspective of REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY. For REINDEX, we have a problematic start progress call in reindex_index() which is for example called by reindex_relation for each relation's index for a non-concurrent case (also in ReindexRelationConcurrently()). I think that these are incorrect locations, and I would have placed them in ReindexIndex(), ReindexTable() and ReindexMultipleTables() so as we avoid anything low-level. This has added two calls to pgstat_progress_update_param() in reindex_index(), which is shared between all. Why would it be fine to give the priority to a CREATE INDEX marker here if CLUSTER can also cross this way? On top of those issues, I see some problems with the current state of affairs, and I am repeating myself: - It is possible that pgstat_progress_update_param() is called for a given command for a code path taken by a completely different command, and that we have a real risk of showing a status completely buggy as the progress phases share the same numbers. - We don't consider wisely end and start progress handling for cascading calls, leading to a risk where we start command A, but for shared code paths where we assume that only command B can run then the processing abruptly ends for command A. - Is it actually fine to report information about a command completely different than the one provided by the client? It is for example possible to call CLUSTER, but show up to the user progress report about PROGRESS_COMMAND_CREATE_INDEX (see reindex_index). This could actually make sense if we are able to handle cascading progress reports. These are, at least it seems to me, fundamental problems we need to ponder more about if we begin to include more progress reporting, and we don't have that now, and that worries me. > It's quite possible here that we need a bigger redesign to make adding > progress reporting for new command easier and less prone to bugs, but > I don't think it's at all clear what such a redesign should look like > or even that we definitely need one, and September is not the right > time to be redesigning features for the pending release. Definitely. -- Michael
Attachment
On Wed, Sep 4, 2019 at 9:03 PM Michael Paquier <michael@paquier.xyz> wrote: > For CLUSTER, the progress starts and ends in cluster_rel(). CLUSTER > uses its code paths at the beginning, but then things get more > complicated, particularly with finish_heap_swap() which calls directly > reindex_table(). 6f97457 includes one progress update at point which > can be a problem per its shared nature in reindex_relation() with > PROGRESS_CLUSTER_INDEX_REBUILD_COUNT. This last part is wrong IMO, > why should cluster reporting take priority in this code path, > enforcing anything else? Oops. Yeah, that's bogus (as are some of the other things you mention). I think we're going to have to fix this by passing down some flags to these functions to tell them what kind of progress updates to do (or to do none). Or else pass down a callback function and a context object, but that seems like it might be overkill. > On top of those issues, I see some problems with the current state of > affairs, and I am repeating myself: > - It is possible that pgstat_progress_update_param() is called for a > given command for a code path taken by a completely different > command, and that we have a real risk of showing a status completely > buggy as the progress phases share the same numbers. > - We don't consider wisely end and start progress handling for > cascading calls, leading to a risk where we start command A, but > for shared code paths where we assume that only command B can run then > the processing abruptly ends for command A. Those are just weaknesses of the infrastructure. Perhaps there is a better solution, but there's no intrinsic reason that we can't avoid them by careful coding. > - Is it actually fine to report information about a command completely > different than the one provided by the client? It is for example > possible to call CLUSTER, but show up to the user progress report > about PROGRESS_COMMAND_CREATE_INDEX (see reindex_index). This could > actually make sense if we are able to handle cascading progress > reports. Well, it might be OK to do that if we're clear that this is the index progress-reporting view and the command is CLUSTER but it happens to be building an index now so we're showing it here. But I don't see how it would work anyway: you can't reported cascading progress reports in shared memory because you've only got a fixed amount of space. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Sep 05, 2019 at 03:17:51PM -0400, Robert Haas wrote: > Oops. Yeah, that's bogus (as are some of the other things you > mention). I think we're going to have to fix this by passing down > some flags to these functions to tell them what kind of progress > updates to do (or to do none). Or else pass down a callback function > and a context object, but that seems like it might be overkill. One idea I got was to pass the command ID as an extra argument of the update routine. I am not completely sure either if we need this level of complication. > Those are just weaknesses of the infrastructure. Perhaps there is a > better solution, but there's no intrinsic reason that we can't avoid > them by careful coding. Perhaps. The current infra allows the addition of a progress report in code paths which are isolated from other things. For CLUSTER, most things are fine as long as the progress is updated in cluster_rel(), the rest is too internal. > Well, it might be OK to do that if we're clear that this is the index > progress-reporting view and the command is CLUSTER but it happens to > be building an index now so we're showing it here. But I don't see > how it would work anyway: you can't reported cascading progress > reports in shared memory because you've only got a fixed amount of > space. I don't see exactly why we could not switch to a fixed number of slots, say 8, with one code path to start a progress which adds an extra report on the stack, one to remove one entry from the stack, and a new one to reset the whole thing for a backend. This would not need much restructuration of course. Finally comes the question of what do we do for v12? I am adding in CC Peter, Alvaro being already present, who have been involved in the commits with CREATE INDEX and REINDEX. It would be sad to revert a this feature, but well I'd rather do that now than regret later releasing the feature as it is currently shaped.. Let's see what the others think. -- Michael
Attachment
On Fri, Sep 06, 2019 at 02:44:18PM +0900, Michael Paquier wrote: > I don't see exactly why we could not switch to a fixed number of > slots, say 8, with one code path to start a progress which adds an > extra report on the stack, one to remove one entry from the stack, and > a new one to reset the whole thing for a backend. This would not need > much restructuration of course. Wake up, Neo. Your last sentence is confusing. I meant that this would need more design efforts, so that's not in scope for v12. -- Michael
Attachment
On Fri, Sep 6, 2019 at 1:44 AM Michael Paquier <michael@paquier.xyz> wrote: > One idea I got was to pass the command ID as an extra argument of the > update routine. I am not completely sure either if we need this level > of complication. I still don't think that's the right approach. > > Those are just weaknesses of the infrastructure. Perhaps there is a > > better solution, but there's no intrinsic reason that we can't avoid > > them by careful coding. > > Perhaps. The current infra allows the addition of a progress report > in code paths which are isolated from other things. For CLUSTER, most > things are fine as long as the progress is updated in cluster_rel(), > the rest is too internal. It's fine if things are updated as well -- it's just you need to make sure that those places know whether or not they are supposed to be doing those updates. Again, why can't we just pass down a value telling them "do reindex-style progress updates" or "do cluster-style progress updates" or "do no progress updates"? > > Well, it might be OK to do that if we're clear that this is the index > > progress-reporting view and the command is CLUSTER but it happens to > > be building an index now so we're showing it here. But I don't see > > how it would work anyway: you can't reported cascading progress > > reports in shared memory because you've only got a fixed amount of > > space. > > I don't see exactly why we could not switch to a fixed number of > slots, say 8, with one code path to start a progress which adds an > extra report on the stack, one to remove one entry from the stack, and > a new one to reset the whole thing for a backend. This would not need > much restructuration of course. You could do that, but I don't think it's probably that great of an idea. Now you've built something which is significantly more complex than the original design of this feature, but still not good enough to report on the progress of a query tree. I tend to think we should confine ourselves to the progress reporting that can reasonably be done within the current infrastructure until somebody invents a really general mechanism that can handle, essentially, an EXPLAIN-on-the-fly of a current query tree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2019-Sep-06, Michael Paquier wrote: > Finally comes the question of what do we do for v12? I am adding in > CC Peter, Alvaro being already present, who have been involved in the > commits with CREATE INDEX and REINDEX. It would be sad to revert a > this feature, but well I'd rather do that now than regret later > releasing the feature as it is currently shaped.. Let's see what the > others think. As far as I understand, CREATE INDEX is not affected -- only REINDEX is. Of course, it would be sad to revert even the latter, but it's not as bleak as reverting the whole thing. That said, I did spend some time on this type of issue when doing CREATE INDEX support; you can tell because I defined the columns for block numbers in a scan separately from CREATE INDEX specific fields, precisely to avoid multiple commands running concurrently from clobbering unrelated columns: /* Block numbers in a generic relation scan */ #define PROGRESS_SCAN_BLOCKS_TOTAL 15 #define PROGRESS_SCAN_BLOCKS_DONE 16 I would say that it's fairly useful to have CLUSTER report progress on indexes being created underneath, but I understand that it might be too late to be designing the CLUSTER report to take advantage of the CREATE INDEX metrics. I think a workable, not terribly invasive approach is to have REINDEX process its commands conditionally: have the caller indicate whether progress is to be reported, and skip the calls if not. That would (should) prevent it from clobbering the state set up by CLUSTER. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Sep 06, 2019 at 10:27:02AM -0400, Alvaro Herrera from 2ndQuadrant wrote: > That said, I did spend some time on this type of issue when doing CREATE > INDEX support; you can tell because I defined the columns for block > numbers in a scan separately from CREATE INDEX specific fields, > precisely to avoid multiple commands running concurrently from > clobbering unrelated columns: > > /* Block numbers in a generic relation scan */ > #define PROGRESS_SCAN_BLOCKS_TOTAL 15 > #define PROGRESS_SCAN_BLOCKS_DONE 16 Hm. It is not really clear what is the intention by looking at the contents progress.h. > I would say that it's fairly useful to have CLUSTER report progress on > indexes being created underneath, but I understand that it might be too > late to be designing the CLUSTER report to take advantage of the CREATE > INDEX metrics. The same can be said about the reporting done in reindex_relation for PROGRESS_CLUSTER_INDEX_REBUILD_COUNT. I think that it should be removed for now. > I think a workable, not terribly invasive approach is to have REINDEX > process its commands conditionally: have the caller indicate whether > progress is to be reported, and skip the calls if not. That would > (should) prevent it from clobbering the state set up by CLUSTER. So, you would basically add an extra flag in the options of reindex_index() to decide if a progress report should be started or not? I am not a fan of that because it does not take care of the root issue which is that the start of the progress reports is too much internal. I think that it would be actually less error prone to move the start of the progress reporting for REINDEX out of reindex_index() and start it at a higher level. Looking again at the code, I would recommend that we should start the progress in ReindexIndex() before calling reindex_index(), ReindexMultipleTables() before calling reindex_relation() and ReindexRelationConcurrently(), and ReindexTable() before calling reindex_relation(). That will avoid each command to clobber each other's in-progress reports. It would be also very good to document clearly how the overlaps for the progress parameter values are not happening. For example for CREATE INDEX, we don't know why 1, 2 and 7 are not used. Note that there is an ID collision with PROGRESS_CREATEIDX_INDEX_OID updated in reindex_index() and the CLUSTER part PROGRESS_CLUSTER_HEAP_BLKS_SCANNED.. There could be an argument to use a completely different range of IDs for each command phase to avoid any overlap, but it is scary to consider that we may not have found all the issues with one command cloberring another one's state.. -- Michael
Attachment
On Fri, Sep 6, 2019 at 5:11 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Sep 6, 2019 at 1:44 AM Michael Paquier <michael@paquier.xyz> wrote: > > I don't see exactly why we could not switch to a fixed number of > > slots, say 8, with one code path to start a progress which adds an > > extra report on the stack, one to remove one entry from the stack, and > > a new one to reset the whole thing for a backend. This would not need > > much restructuration of course. > > You could do that, but I don't think it's probably that great of an > idea. Now you've built something which is significantly more complex > than the original design of this feature, but still not good enough to > report on the progress of a query tree. I tend to think we should > confine ourselves to the progress reporting that can reasonably be > done within the current infrastructure until somebody invents a really > general mechanism that can handle, essentially, an EXPLAIN-on-the-fly > of a current query tree. +1. Let's not complicate the progress reporting infrastructure for an uncertain benefit. CLUSTER/VACUUM FULL is fundamentally an awkward utility command to target with progress reporting infrastructure. I think that it's okay to redefine how progress reporting works with CLUSTER now, in order to fix the REINDEX/CLUSTER state clobbering bug. -- Peter Geoghegan
On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote: > It's fine if things are updated as well -- it's just you need to make > sure that those places know whether or not they are supposed to be > doing those updates. Again, why can't we just pass down a value > telling them "do reindex-style progress updates" or "do cluster-style > progress updates" or "do no progress updates"? That's invasive. CREATE INDEX reporting goes pretty deep into the tree, with steps dedicated to the builds and scans of btree (nbtsort.c for example) and hash index AMs. In this case we have something that does somewhat what you are looking for with report_progress which gets set to true only for VACUUM. If we were to do something like that, we would also need to keep some sort of mapping regarding which command ID (as defined by ProgressCommandType) is able to use which set of parameter flags (as defined by the arguments of pgstat_progress_update_param() and its multi_* cousin). Then comes the issue that some parameters may be used by multiple command types, while other don't (REINDEX and CREATE INDEX have some shared mapping). -- Michael
Attachment
On Fri, Sep 13, 2019 at 2:49 AM Michael Paquier <michael@paquier.xyz> wrote: > On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote: > > It's fine if things are updated as well -- it's just you need to make > > sure that those places know whether or not they are supposed to be > > doing those updates. Again, why can't we just pass down a value > > telling them "do reindex-style progress updates" or "do cluster-style > > progress updates" or "do no progress updates"? > > That's invasive. CREATE INDEX reporting goes pretty deep into the > tree, with steps dedicated to the builds and scans of btree (nbtsort.c > for example) and hash index AMs. In this case we have something that > does somewhat what you are looking for with report_progress which gets > set to true only for VACUUM. If we were to do something like that, we > would also need to keep some sort of mapping regarding which command > ID (as defined by ProgressCommandType) is able to use which set of > parameter flags (as defined by the arguments of > pgstat_progress_update_param() and its multi_* cousin). Then comes > the issue that some parameters may be used by multiple command types, > while other don't (REINDEX and CREATE INDEX have some shared > mapping). Well, if CREATE INDEX progress reporting can't be reasonably done within the current infrastructure, then it should be reverted for v12 and not committed again until somebody upgrades the infrastructure. I admit that I was a bit suspicious about that commit, but I figured Alvaro knew what he was doing and didn't study it in any depth. And perhaps he did know what he was doing and will disagree with your assessment. But so far I haven't heard an idea for evolving the infrastructure that sounds more than half-baked. It's generally clear, though, that the existing infrastructure is not well-suited to progress reporting for code that bounces all over the tree. It's not clear that *any* infrastructure is *entirely* well-suited to do that; such problems are inherently complex. But this is just a very simple system which was designed to be simple and very low cost to use, and it may be that it's been stretched outside of its comfort zone. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2019-Sep-13, Robert Haas wrote: > On Fri, Sep 13, 2019 at 2:49 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote: > > > It's fine if things are updated as well -- it's just you need to make > > > sure that those places know whether or not they are supposed to be > > > doing those updates. Again, why can't we just pass down a value > > > telling them "do reindex-style progress updates" or "do cluster-style > > > progress updates" or "do no progress updates"? > > > > That's invasive. CREATE INDEX reporting goes pretty deep into the > > tree, with steps dedicated to the builds and scans of btree (nbtsort.c > > for example) and hash index AMs. > Well, if CREATE INDEX progress reporting can't be reasonably done > within the current infrastructure, then it should be reverted for v12 > and not committed again until somebody upgrades the infrastructure. Ummm ... I've been operating --in this thread-- under the assumption that it is REINDEX to blame for this problem, not CREATE INDEX, because my recollection is that I tested CREATE INDEX together with CLUSTER and it worked fine. Has anybody done any actual research that the problem is to blame on CREATE INDEX and not REINDEX? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Sep 13, 2019 at 12:03 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Ummm ... I've been operating --in this thread-- under the assumption > that it is REINDEX to blame for this problem, not CREATE INDEX, because > my recollection is that I tested CREATE INDEX together with CLUSTER and > it worked fine. Has anybody done any actual research that the problem > is to blame on CREATE INDEX and not REINDEX? I am not sure. I think, though, that the point is that all three commands rebuild indexes. So unless they all expect the same things in terms of which counters get set during that process, things will not work correctly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Tatsuro, On 2019-Aug-13, Tatsuro Yamada wrote: > On 2019/08/02 3:43, Alvaro Herrera wrote: > > Hmm, I'm trying this out now and I don't see the index_rebuild_count > > ever go up. I think it's because the indexes are built using parallel > > index build ... or maybe it was the table AM changes that moved things > > around, not sure. There's a period at the end when the CLUSTER command > > keeps working, but it's gone from pg_stat_progress_cluster. > > Thanks for your report. > I'll investigate it. :) I have fixed it. Can you please verify? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Sep 13, 2019 at 12:48:40PM -0400, Robert Haas wrote: > On Fri, Sep 13, 2019 at 12:03 PM Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> Ummm ... I've been operating --in this thread-- under the assumption >> that it is REINDEX to blame for this problem, not CREATE INDEX, because >> my recollection is that I tested CREATE INDEX together with CLUSTER and >> it worked fine. Has anybody done any actual research that the problem >> is to blame on CREATE INDEX and not REINDEX? > > I am not sure. I think, though, that the point is that all three > commands rebuild indexes. So unless they all expect the same things in > terms of which counters get set during that process, things will not > work correctly. I have provided a short summary of the two issues on the open item page (https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items) as the open item was too much evasive. Here is a copy-paste for the archives of what I wrote: 1) A progress may be started while another one is already in progress. Hence, if progress gets stopped the previously-started state is removed, causing all follow-up updates to not happen. 2) Progress updates happening in a code path shared between those three commands may clobber a previous state present. Regarding 1) and based on what I found in the code, you can blame REINDEX reporting which has added progress_start calls in code paths which are also taken by CREATE INDEX and CLUSTER, causing their progress reporting to go to the void. In order to fix this one we could do what I summarized in [1]. As mentioned by Robert, the problem summarized in 2) is much more complex using the current infrastructure, and one could blame all the commands per the way they do not share the same set of progress phases. There are a couple of potential solutions which have been discussed on the thread: - Allow commands to share the same set of phases, which requires some kind of mapping between the phases (?). - Allow progress reports to become a stack. This would also take care of any kind of issues in 1) for the future, and this can cause the incorrect command to be reported to pg_stat_activity if not being careful. - Allow only reporting for a given command ID, which would basically require to pass down the command ID to progress update APIs and bypass an update if the command ID provided by caller does not match the existing one started (?). 1) is pretty easy to fix based on the current state of the code, 2) requires much more consideration, and that's no material for v12. It could be perfectly possible as well that we have another solution not discussed yet on this thread. [1]: https://www.postgresql.org/message-id/20190905010316.GB14853@paquier.xyz -- Michael
Attachment
Hello Tatsuro,
On 2019-Aug-13, Tatsuro Yamada wrote:
> On 2019/08/02 3:43, Alvaro Herrera wrote:
> > Hmm, I'm trying this out now and I don't see the index_rebuild_count
> > ever go up. I think it's because the indexes are built using parallel
> > index build ... or maybe it was the table AM changes that moved things
> > around, not sure. There's a period at the end when the CLUSTER command
> > keeps working, but it's gone from pg_stat_progress_cluster.
>
> Thanks for your report.
> I'll investigate it. :)
I have fixed it. Can you please verify?
Attached file is WIP patch.In my patch, I added "command id" to all APIs of progress reporting to isolate commands. Therefore, it doesn't allow to cascade updating system views. And my patch is on WIP so it needs clean-up and test.
I share it anyway. :)
11636|13591|postgres|16384|CLUSTER|initializing|0|0|0|0|0|0
11636|13591|postgres|16384|CLUSTER|index scanning heap|16389|251|251|0|0|0
...
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|10000|10000|0|0|0...
...
...
...
...
Attachment
On Sat, Sep 14, 2019 at 01:06:32PM +0900, Tattsu Yama wrote: > Thanks! I can review your patch for fix it. > However, I was starting fixing the problem from the last day of PGConf.Asia > (11 Sep). > Attached file is WIP patch.In my patch, I added "command id" to all APIs of > progress reporting to isolate commands. Therefore, it doesn't allow to > cascade updating system views. And my patch is on WIP so it needs clean-up > and test. > I share it anyway. :) + if (cmdtype == PROGRESS_COMMAND_INVALID || beentry->st_progress_command == cmdtype) + { + PGSTAT_BEGIN_WRITE_ACTIVITY(beentry); + beentry->st_progress_param[index] = val; + PGSTAT_END_WRITE_ACTIVITY(beentry); + } You basically don't need the progress reports if the command ID is invalid, no? Another note is that you don't actually fix the problems related to the calls of pgstat_progress_end_command() which have been added for REINDEX reporting, so a progress report started for CLUSTER can get ended earlier than expected, preventing the follow-up progress updates to show up. -- Michael
Attachment
> Attached file is WIP patch.In my patch, I added "command id" to all APIs of
> progress reporting to isolate commands. Therefore, it doesn't allow to
> cascade updating system views. And my patch is on WIP so it needs clean-up
> and test.
> I share it anyway. :)
+ if (cmdtype == PROGRESS_COMMAND_INVALID || beentry->st_progress_command == cmdtype)
+ {
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->st_progress_param[index] = val;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+ }
You basically don't need the progress reports if the command ID is
invalid, no?
Another note is that you don't actually fix the problems related to
the calls of pgstat_progress_end_command() which have been added for
REINDEX reporting, so a progress report started for CLUSTER can get
ended earlier than expected, preventing the follow-up progress updates
to show up.
11636|13591|postgres|16384|CLUSTER|initializing|0|0|0|0|0|0
11636|13591|postgres|16384|CLUSTER|index scanning heap|16389|251|251|0|0|0
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|10000|10000|0|0|0 <== The last column rebuild_index_count is increasing!
> Attached file is WIP patch.In my patch, I added "command id" to all APIs of
> progress reporting to isolate commands. Therefore, it doesn't allow to
> cascade updating system views. And my patch is on WIP so it needs clean-up
> and test.
> I share it anyway. :)
+ if (cmdtype == PROGRESS_COMMAND_INVALID || beentry->st_progress_command == cmdtype)
+ {
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->st_progress_param[index] = val;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+ }
You basically don't need the progress reports if the command ID is
invalid, no?Ah, right.I'll check and fix that today. :)
- Allow only reporting for a given command ID, which would basically
require to pass down the command ID to progress update APIs and bypass an update
if the command ID provided by caller does not match the existing one started (?).
- Progress reporter starts when beentry->st_progress_command is PROGRESS_COMMAND_INVALID
pgstat_progress_end_command(ProgressCommandType cmdtype,...)
- Progress reporter ends when beentry->st_progress_command equals cmdtype
pgstat_progress_update_param(ProgressCommandType cmdtype,...) and
pgstat_progress_update_multi_param(ProgressCommandType cmdtype,...)
- Progress reporter updates parameters if beentry->st_progress_command equals cmdtype
{
PROGRESS_COMMAND_INVALID,
PROGRESS_COMMAND_VACUUM,
PROGRESS_COMMAND_CLUSTER,
PROGRESS_COMMAND_CREATE_INDEX
} ProgressCommandType;
Attachment
On 2019-Sep-16, Tattsu Yama wrote: > I should have explained the API changes more. I added cmdtype as a given > parameter for all functions (See below). > Therefore, I suppose that my patch is similar to the following fix as you > mentioned on -hackers. Is this fix strictly necessary for pg12, or is this something that we can leave for pg13? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Alvaro, On 2019/09/16 23:12, Alvaro Herrera wrote: > On 2019-Sep-16, Tattsu Yama wrote: > >> I should have explained the API changes more. I added cmdtype as a given >> parameter for all functions (See below). >> Therefore, I suppose that my patch is similar to the following fix as you >> mentioned on -hackers. > > Is this fix strictly necessary for pg12, or is this something that we > can leave for pg13? Not only me but many DBA needs this progress report feature on PG12, therefore I'm trying to fix the problem. If you send other patch to fix the problem, and it is more elegant than mine, I can withdraw my patch. Anyway, I want to avoid this feature being reverted. Do you have any ideas to fix the problem? Thanks, Tatsuro Yamada
On 2019-Sep-17, Tatsuro Yamada wrote: > On 2019/09/16 23:12, Alvaro Herrera wrote: > > Is this fix strictly necessary for pg12, or is this something that we > > can leave for pg13? > > Not only me but many DBA needs this progress report feature on PG12, > therefore I'm trying to fix the problem. If you send other patch to > fix the problem, and it is more elegant than mine, I can withdraw my patch. > Anyway, I want to avoid this feature being reverted. > Do you have any ideas to fix the problem? I committed a fix for the originally reported problem as da47e43dc32e in branch REL_12_STABLE. Is that insufficient, and if so why? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Alvaro! >>> Is this fix strictly necessary for pg12, or is this something that we >>> can leave for pg13? >> >> Not only me but many DBA needs this progress report feature on PG12, >> therefore I'm trying to fix the problem. If you send other patch to >> fix the problem, and it is more elegant than mine, I can withdraw my patch. >> Anyway, I want to avoid this feature being reverted. >> Do you have any ideas to fix the problem? > > I committed a fix for the originally reported problem as da47e43dc32e in > branch REL_12_STABLE. Is that insufficient, and if so why? Ooops, I misunderstood. I now realized you committed your patch to fix the problem. Thanks! I'll test it later. :) https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=da47e43dc32e3c5916396f0cbcfa974b371e4875 Thanks, Tatsuro Yamada
Hi Alvaro! >>>> Is this fix strictly necessary for pg12, or is this something that we >>>> can leave for pg13? >>> >>> Not only me but many DBA needs this progress report feature on PG12, >>> therefore I'm trying to fix the problem. If you send other patch to >>> fix the problem, and it is more elegant than mine, I can withdraw my patch. >>> Anyway, I want to avoid this feature being reverted. >>> Do you have any ideas to fix the problem? >> >> I committed a fix for the originally reported problem as da47e43dc32e in >> branch REL_12_STABLE. Is that insufficient, and if so why? > > > Ooops, I misunderstood. I now realized you committed your patch to > fix the problem. Thanks! I'll test it later. :) > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=da47e43dc32e3c5916396f0cbcfa974b371e4875 I tested your patch (da47e43d) and it works fine. Thanks! :) So, my patch improving progress reporting API can leave for PG13. #Test scenario =================== [Session #1] select * from pg_stat_progress_cluster ; \watch 0.0001 [Session #2] create table hoge as select a from generate_series(1, 100000) a; create index ind_hoge1 on hoge(a); create index ind_hoge2 on hoge((a%2)); create index ind_hoge3 on hoge((a%3)); create index ind_hoge4 on hoge((a%4)); create index ind_hoge5 on hoge((a%5)); cluster hoge using ind_hoge1; =================== #Test result =================== 22283|13593|postgres|16384|CLUSTER|initializing|0|0|0|0|0|0 ... 22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|100000|100000|0|0|0 <= Increasing from 0 to 5 22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|100000|100000|0|0|1 22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|100000|100000|0|0|2 22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|100000|100000|0|0|3 22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|100000|100000|0|0|4 22283|13593|postgres|16384|CLUSTER|performing final cleanup|16387|100000|100000|0|0|5 =================== Thanks, Tatsuro Yamada
On Mon, Sep 16, 2019 at 03:26:10PM +0900, Tattsu Yama wrote: > I should have explained the API changes more. I added cmdtype as a given > parameter for all functions (See below). > Therefore, I suppose that my patch is similar to the following fix as you > mentioned on -hackers. Yes, that's an option I mentioned here, but it has drawbacks: https://www.postgresql.org/message-id/20190914024547.GB15406@paquier.xyz I have just looked at it again after a small rebase and there are issues with the design of your patch: - When aborting a transaction, we need to enforce a reset of the command ID used in st_progress_command to be PROGRESS_COMMAND_INVALID. Unfortunately, your patch does not consider the case where an error happens while a command ID is set, causing a command to still be tracked with the next transactions of the session. Even worse, it prevents pgstat_progress_start_command() to be called again in this case for another command. - CLUSTER can rebuild indexes, and we'd likely want to be able to track some of the information from CREATE INDEX for CLUSTER. The second issue is perhaps fine as it is not really straight-forward to share the same progress phases across multiple commands, and we could live without it for now, or require a follow-up patch to make the information of CREATE INDEX available to CLUSTER. Now, the first issue is of another caliber and a no-go :( On HEAD, pgstat_progress_end_command() has the limitation to not be able to stack multiple commands, so calling it in cascade has the disadvantage to perhaps erase the progress state of a command (and it is not designed for that anyway), which is what happens with CLUSTER when reindex_index() starts a new progress report, but the simplicity of the current infrastructure is very safe when it comes to failure handling, to make sure that an reset happens as long as the command ID is not invalid. Your patch makes that part unpredictable. -- Michael
Attachment
On Sat, Sep 14, 2019 at 11:45:47AM +0900, Michael Paquier wrote: > I have provided a short summary of the two issues on the open item > page (https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items) as > the open item was too much evasive. Here is a copy-paste for the > archives of what I wrote: > 1) A progress may be started while another one is already in progress. > Hence, if progress gets stopped the previously-started state is > removed, causing all follow-up updates to not happen. > 2) Progress updates happening in a code path shared between those > three commands may clobber a previous state present. > > Regarding 1) and based on what I found in the code, you can blame > REINDEX reporting which has added progress_start calls in code paths > which are also taken by CREATE INDEX and CLUSTER, causing their > progress reporting to go to the void. In order to fix this one we > could do what I summarized in [1]. > > [1]: https://www.postgresql.org/message-id/20190905010316.GB14853@paquier.xyz So, with the clock ticking and the release getting close by, what do we do for this set of issues? REINDEX, CREATE INDEX and CLUSTER all try to build indexes and the current infrastructure is not really adapted to hold all that. Robert, Alvaro and Peter E, do you have any comments to offer? -- Michael
Attachment
On 2019-Sep-18, Michael Paquier wrote: > So, with the clock ticking and the release getting close by, what do > we do for this set of issues? REINDEX, CREATE INDEX and CLUSTER all > try to build indexes and the current infrastructure is not really > adapted to hold all that. Robert, Alvaro and Peter E, do you have any > comments to offer? Which part of it is not already fixed? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Sep 17, 2019 at 10:50:22PM -0300, Alvaro Herrera wrote: > On 2019-Sep-18, Michael Paquier wrote: >> So, with the clock ticking and the release getting close by, what do >> we do for this set of issues? REINDEX, CREATE INDEX and CLUSTER all >> try to build indexes and the current infrastructure is not really >> adapted to hold all that. Robert, Alvaro and Peter E, do you have any >> comments to offer? > > Which part of it is not already fixed? I can still see at least two problems. There is one issue with pgstat_progress_update_param() which gets called in reindex_table() for a progress phase of CLUSTER, and this even if REINDEXOPT_REPORT_PROGRESS is not set in the options. Also it seems to me that the calls to pgstat_progress_start_command() and pgstat_progress_end_command() are at incorrect locations for reindex_index() and that those should be one level higher on the stack to avoid any kind of interactions with another command whose progress has already started. -- Michael
Attachment
On 2019-Sep-18, Michael Paquier wrote: > On Tue, Sep 17, 2019 at 10:50:22PM -0300, Alvaro Herrera wrote: > > On 2019-Sep-18, Michael Paquier wrote: > >> So, with the clock ticking and the release getting close by, what do > >> we do for this set of issues? REINDEX, CREATE INDEX and CLUSTER all > >> try to build indexes and the current infrastructure is not really > >> adapted to hold all that. Robert, Alvaro and Peter E, do you have any > >> comments to offer? > > > > Which part of it is not already fixed? > > I can still see at least two problems. There is one issue with > pgstat_progress_update_param() which gets called in reindex_table() > for a progress phase of CLUSTER, and this even if > REINDEXOPT_REPORT_PROGRESS is not set in the options. (You mean reindex_relation.) ... but that param update is there for CLUSTER, not for REINDEX, so if we made it dependent on the option to turn on CREATE INDEX progress updates, it would break CLUSTER progress reporting. Also, the parameter being updated is not used by CREATE INDEX, so there's no spurious change. I think this ain't broke, and thus it don't need fixin'. If anything, I would like the CLUSTER report to show index creation progress, which would go the opposite way. But that seems a patch for pg13. > Also it seems > to me that the calls to pgstat_progress_start_command() and > pgstat_progress_end_command() are at incorrect locations for > reindex_index() and that those should be one level higher on the stack > to avoid any kind of interactions with another command whose progress > has already started. That doesn't work, because the caller doesn't have the OID of the table, which pgstat_progress_start_command() needs. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services