Thread: [patch] CLUSTER blocks scanned progress reporting
Hi, The pg_stat_progress_cluster view can report incorrect heap_blks_scanned values when synchronize_seqscans is enabled, because it allows the sequential heap scan to not start at block 0. This can result in wraparounds in the heap_blks_scanned column when the table scan wraps around, and starting the next phase with heap_blks_scanned != heap_blks_total. This issue was introduced with the pg_stat_progress_cluster view. The attached patch fixes the issue by accounting for a non-0 heapScan->rs_startblock and calculating the correct number with a non-0 heapScan->rs_startblock in mind. The issue is reproducible starting from PG12 by stopping a CLUSTER-command while it is sequential-scanning the table (seqscan enforceable through enable_indexscan = off), and then re-starting the seqscanning CLUSTER command (without other load/seq-scans on the table). Any thoughts? Matthias van de Meent
Attachment
On 2020/11/21 2:32, Matthias van de Meent wrote: > Hi, > > The pg_stat_progress_cluster view can report incorrect > heap_blks_scanned values when synchronize_seqscans is enabled, because > it allows the sequential heap scan to not start at block 0. This can > result in wraparounds in the heap_blks_scanned column when the table > scan wraps around, and starting the next phase with heap_blks_scanned > != heap_blks_total. This issue was introduced with the > pg_stat_progress_cluster view. Good catch! I agree that this is a bug. > > The attached patch fixes the issue by accounting for a non-0 > heapScan->rs_startblock and calculating the correct number with a > non-0 heapScan->rs_startblock in mind. Thanks for the patch! It basically looks good to me. It's a bit waste of cycles to calculate and update the number of scanned blocks every cycles. So I'm inclined to change the code as follows. Thought? + BlockNumber prev_cblock = InvalidBlockNumber; <snip> + if (prev_cblock != heapScan->rs_cblock) + { + pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED, + (heapScan->rs_cblock + + heapScan->rs_nblocks - + heapScan->rs_startblock + ) % heapScan->rs_nblocks + 1); + prev_cblock = heapScan->rs_cblock; + } Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Tue, 24 Nov 2020 at 15:05, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/11/21 2:32, Matthias van de Meent wrote: > > Hi, > > > > The pg_stat_progress_cluster view can report incorrect > > heap_blks_scanned values when synchronize_seqscans is enabled, because > > it allows the sequential heap scan to not start at block 0. This can > > result in wraparounds in the heap_blks_scanned column when the table > > scan wraps around, and starting the next phase with heap_blks_scanned > > != heap_blks_total. This issue was introduced with the > > pg_stat_progress_cluster view. > > Good catch! I agree that this is a bug. > > > > > The attached patch fixes the issue by accounting for a non-0 > > heapScan->rs_startblock and calculating the correct number with a > > non-0 heapScan->rs_startblock in mind. > > Thanks for the patch! It basically looks good to me. Thanks for the feedback! > It's a bit waste of cycles to calculate and update the number of scanned > blocks every cycles. So I'm inclined to change the code as follows. > Thought? > > + BlockNumber prev_cblock = InvalidBlockNumber; > <snip> > + if (prev_cblock != heapScan->rs_cblock) > + { > + pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED, > + (heapScan->rs_cblock + > + heapScan->rs_nblocks - > + heapScan->rs_startblock > + ) % heapScan->rs_nblocks+ 1); > + prev_cblock = heapScan->rs_cblock; > + } That seems quite reasonable. I noticed that with my proposed patch it is still possible to go to the next phase while heap_blks_scanned != heap_blks_total. This can happen when the final heap pages contain only dead tuples, so no tuple is returned from the last heap page(s) of the scan. As the heapScan->rs_cblock is set to InvalidBlockNumber when the scan is finished (see heapam.c#1060-1072), I think it would be correct to set heap_blks_scanned to heapScan->rs_nblocks at the end of the scan instead. Please find attached a patch applying the suggested changes. Matthias van de Meent
Attachment
On 2020/11/25 0:25, Matthias van de Meent wrote: > On Tue, 24 Nov 2020 at 15:05, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> On 2020/11/21 2:32, Matthias van de Meent wrote: >>> Hi, >>> >>> The pg_stat_progress_cluster view can report incorrect >>> heap_blks_scanned values when synchronize_seqscans is enabled, because >>> it allows the sequential heap scan to not start at block 0. This can >>> result in wraparounds in the heap_blks_scanned column when the table >>> scan wraps around, and starting the next phase with heap_blks_scanned >>> != heap_blks_total. This issue was introduced with the >>> pg_stat_progress_cluster view. >> >> Good catch! I agree that this is a bug. >> >>> >>> The attached patch fixes the issue by accounting for a non-0 >>> heapScan->rs_startblock and calculating the correct number with a >>> non-0 heapScan->rs_startblock in mind. >> >> Thanks for the patch! It basically looks good to me. > > Thanks for the feedback! > >> It's a bit waste of cycles to calculate and update the number of scanned >> blocks every cycles. So I'm inclined to change the code as follows. >> Thought? >> >> + BlockNumber prev_cblock = InvalidBlockNumber; >> <snip> >> + if (prev_cblock != heapScan->rs_cblock) >> + { >> + pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED, >> + (heapScan->rs_cblock + >> + heapScan->rs_nblocks - >> + heapScan->rs_startblock >> + ) % heapScan->rs_nblocks+ 1); >> + prev_cblock = heapScan->rs_cblock; >> + } > > That seems quite reasonable. > > I noticed that with my proposed patch it is still possible to go to > the next phase while heap_blks_scanned != heap_blks_total. This can > happen when the final heap pages contain only dead tuples, so no tuple > is returned from the last heap page(s) of the scan. As the > heapScan->rs_cblock is set to InvalidBlockNumber when the scan is > finished (see heapam.c#1060-1072), I think it would be correct to set > heap_blks_scanned to heapScan->rs_nblocks at the end of the scan > instead. Thanks for updating the patch! Please let me check my understanding about this. I was thinking that even when the last page contains only dead tuples, table_scan_getnextslot() returns the last page (because SnapshotAny is used) and heap_blks_scanned is incremented properly. And then, heapam_relation_copy_for_cluster() handles all the dead tuples in that page. No? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Wed, 25 Nov 2020 at 10:35, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/11/25 0:25, Matthias van de Meent wrote: > > > > I noticed that with my proposed patch it is still possible to go to > > the next phase while heap_blks_scanned != heap_blks_total. This can > > happen when the final heap pages contain only dead tuples, so no tuple > > is returned from the last heap page(s) of the scan. As the > > heapScan->rs_cblock is set to InvalidBlockNumber when the scan is > > finished (see heapam.c#1060-1072), I think it would be correct to set > > heap_blks_scanned to heapScan->rs_nblocks at the end of the scan > > instead. > > Thanks for updating the patch! > > Please let me check my understanding about this. I was thinking that even > when the last page contains only dead tuples, table_scan_getnextslot() > returns the last page (because SnapshotAny is used) and heap_blks_scanned > is incremented properly. And then, heapam_relation_copy_for_cluster() > handles all the dead tuples in that page. No? Yes, my description was incorrect. The potential for a discrepancy exists for seemingly empty pages. I thought that 'filled with only dead tuples' would be correct for 'seemingly empty', but SnapshotAny indeed returns all tuples on a page. But pages can still be empty with SnapshotAny, through being emptied by vacuum, so the last pages scanned can still be empty pages. Then, there would be no tuple returned at the last pages of the scan, and subsequently the CLUSTER/VACUUM FULL would start the next phase without reporting on the last pages that were scanned and had no tuples in them (such that heap_blks_scanned != heap_blks_total). Vacuum truncates empty blocks from the end of the relation, and this prevents empty blocks at the end of CLUSTER for the default case of table scans starting at 0; but because the table scan might not start at block 0, we can have an empty page at the end of the table scan due to the last page of the scan not being the last page of the table. Matthias van de Meent
On 2020/11/25 20:52, Matthias van de Meent wrote: > On Wed, 25 Nov 2020 at 10:35, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> On 2020/11/25 0:25, Matthias van de Meent wrote: >>> >>> I noticed that with my proposed patch it is still possible to go to >>> the next phase while heap_blks_scanned != heap_blks_total. This can >>> happen when the final heap pages contain only dead tuples, so no tuple >>> is returned from the last heap page(s) of the scan. As the >>> heapScan->rs_cblock is set to InvalidBlockNumber when the scan is >>> finished (see heapam.c#1060-1072), I think it would be correct to set >>> heap_blks_scanned to heapScan->rs_nblocks at the end of the scan >>> instead. >> >> Thanks for updating the patch! >> >> Please let me check my understanding about this. I was thinking that even >> when the last page contains only dead tuples, table_scan_getnextslot() >> returns the last page (because SnapshotAny is used) and heap_blks_scanned >> is incremented properly. And then, heapam_relation_copy_for_cluster() >> handles all the dead tuples in that page. No? > > Yes, my description was incorrect. > > The potential for a discrepancy exists for seemingly empty pages. I > thought that 'filled with only dead tuples' would be correct for > 'seemingly empty', but SnapshotAny indeed returns all tuples on a > page. But pages can still be empty with SnapshotAny, through being > emptied by vacuum, so the last pages scanned can still be empty pages. > Then, there would be no tuple returned at the last pages of the scan, > and subsequently the CLUSTER/VACUUM FULL would start the next phase > without reporting on the last pages that were scanned and had no > tuples in them (such that heap_blks_scanned != heap_blks_total). > > Vacuum truncates empty blocks from the end of the relation, and this > prevents empty blocks at the end of CLUSTER for the default case of > table scans starting at 0; but because the table scan might not start > at block 0, we can have an empty page at the end of the table scan due > to the last page of the scan not being the last page of the table. Thanks for explaining that! I understood that. That situation can happen easily also by using VACUUM (TRUNCATE off) command. + * A heap scan need not return tuples for the last page it has + * scanned. To ensure that heap_blks_scanned is equivalent to + * total_heap_blks after the table scan phase, this parameter + * is manually updated to the correct value when the table scan + * finishes. So it's better to update this comment a bit? For example, If the scanned last pages are empty, it's possible to go to the next phase while heap_blks_scanned != heap_blks_total. To ensure that they are equivalet after the table scan phase, this parameter is manually updated to the correct value when the table scan finishes. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Thu, 26 Nov 2020 at 00:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > + * A heap scan need not return tuples for the last page it has > + * scanned. To ensure that heap_blks_scanned is equivalent to > + * total_heap_blks after the table scan phase, this parameter > + * is manually updated to the correct value when the table scan > + * finishes. > > So it's better to update this comment a bit? For example, > > If the scanned last pages are empty, it's possible to go to the next phase > while heap_blks_scanned != heap_blks_total. To ensure that they are > equivalet after the table scan phase, this parameter is manually updated > to the correct value when the table scan finishes. > PFA a patch with updated message and comment. I've reworded the messages to specifically mention empty pages and the need for setting heap_blks_scanned = total_heap_blks explicitly. Feel free to update the specifics, other than that I think this is a complete fix for the issues at hand. Regards, Matthias van de Meent
Attachment
On 2020/11/27 1:45, Matthias van de Meent wrote: > On Thu, 26 Nov 2020 at 00:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> + * A heap scan need not return tuples for the last page it has >> + * scanned. To ensure that heap_blks_scanned is equivalent to >> + * total_heap_blks after the table scan phase, this parameter >> + * is manually updated to the correct value when the table scan >> + * finishes. >> >> So it's better to update this comment a bit? For example, >> >> If the scanned last pages are empty, it's possible to go to the next phase >> while heap_blks_scanned != heap_blks_total. To ensure that they are >> equivalet after the table scan phase, this parameter is manually updated >> to the correct value when the table scan finishes. >> > > PFA a patch with updated message and comment. I've reworded the > messages to specifically mention empty pages and the need for setting > heap_blks_scanned = total_heap_blks explicitly. Thanks for updating the patch! It looks good to me. Barring any objection, I will commit this patch (also back-patch to v12). Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020/11/27 1:51, Fujii Masao wrote: > > > On 2020/11/27 1:45, Matthias van de Meent wrote: >> On Thu, 26 Nov 2020 at 00:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>> >>> + * A heap scan need not return tuples for the last page it has >>> + * scanned. To ensure that heap_blks_scanned is equivalent to >>> + * total_heap_blks after the table scan phase, this parameter >>> + * is manually updated to the correct value when the table scan >>> + * finishes. >>> >>> So it's better to update this comment a bit? For example, >>> >>> If the scanned last pages are empty, it's possible to go to the next phase >>> while heap_blks_scanned != heap_blks_total. To ensure that they are >>> equivalet after the table scan phase, this parameter is manually updated >>> to the correct value when the table scan finishes. >>> >> >> PFA a patch with updated message and comment. I've reworded the >> messages to specifically mention empty pages and the need for setting >> heap_blks_scanned = total_heap_blks explicitly. > > Thanks for updating the patch! It looks good to me. > Barring any objection, I will commit this patch (also back-patch to v12). Pushed. Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION