Thread: pgsql: Use streaming read I/O in VACUUM's third phase

From: Melanie Plageman

Use streaming read I/O in VACUUM's third phase

Make vacuum's third phase (its second pass over the heap), which reaps
dead items collected in the first phase and marks them as reusable, use
the read stream API. This commit adds a new read stream callback,
vacuum_reap_lp_read_stream_next(), that looks ahead in the TidStore and
returns the next block number to read for vacuum.

Author: Melanie Plageman <melanieplageman@gmail.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKN3oy0bN_3yv8hd78a4%2BM1tJC9z7mD8%2Bf%2ByA%2BGeoFUwQ%40mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/c3e775e608f2a6d0bcfba147bf08a506827cc567

Modified Files
--------------
src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++++++++++++++++----
1 file changed, 49 insertions(+), 6 deletions(-)
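
For readers unfamiliar with the pattern, the callback described in the commit message can be pictured with a small standalone model. This is only a sketch under stated assumptions, not the code in vacuumlazy.c: DeadItemsEntry, DeadItemsIter and reap_next_block() are hypothetical stand-ins for the TidStore iterator and for vacuum_reap_lp_read_stream_next(), and the real read stream callback also receives the stream itself as an argument. The idea shown is just "advance the dead-items iterator, stash the per-block details in the per-buffer slot, return the block number, and return InvalidBlockNumber when done".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Hypothetical stand-in for one result of iterating the dead-items store. */
typedef struct DeadItemsEntry
{
	BlockNumber blkno;			/* heap block containing dead items */
	int			noffsets;		/* number of valid entries in offsets[] */
	uint16_t	offsets[4];		/* dead item offsets (tiny fixed size for the model) */
} DeadItemsEntry;

/* Hypothetical iterator over entries sorted by block number. */
typedef struct DeadItemsIter
{
	const DeadItemsEntry *entries;
	int			nentries;
	int			next;			/* index of the next entry to hand out */
} DeadItemsIter;

/*
 * Model of a "next block" callback: look ahead in the dead-items store,
 * copy the entry into the caller-provided per-buffer slot so the consumer
 * can find it when the buffer arrives, and return the block to read.
 */
BlockNumber
reap_next_block(void *callback_private_data, void *per_buffer_data)
{
	DeadItemsIter *iter = callback_private_data;
	const DeadItemsEntry *entry;

	assert(iter != NULL && per_buffer_data != NULL);

	if (iter->next >= iter->nentries)
		return InvalidBlockNumber;	/* no more blocks with dead items */

	entry = &iter->entries[iter->next++];
	memcpy(per_buffer_data, entry, sizeof(*entry));
	return entry->blkno;
}
```

In this model the consumer calls reap_next_block() repeatedly with a slot of sizeof(DeadItemsEntry) bytes and stops at InvalidBlockNumber; the slot plays the role that per_buffer_data plays in the read stream API.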


Re: pgsql: Use streaming read I/O in VACUUM's third phase

From: Melanie Plageman

On Fri, Feb 14, 2025 at 12:59 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> Use streaming read I/O in VACUUM's third phase
>
> Make vacuum's third phase (its second pass over the heap), which reaps
> dead items collected in the first phase and marks them as reusable, use
> the read stream API. This commit adds a new read stream callback,
> vacuum_reap_lp_read_stream_next(), that looks ahead in the TidStore and
> returns the next block number to read for vacuum.
>
> Author: Melanie Plageman <melanieplageman@gmail.com>
> Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
> Discussion: https://postgr.es/m/CA%2BhUKGKN3oy0bN_3yv8hd78a4%2BM1tJC9z7mD8%2Bf%2ByA%2BGeoFUwQ%40mail.gmail.com
>
> Branch
> ------
> master
>
> Details
> -------
> https://git.postgresql.org/pg/commitdiff/c3e775e608f2a6d0bcfba147bf08a506827cc567
>
> Modified Files
> --------------
> src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++++++++++++++++----
> 1 file changed, 49 insertions(+), 6 deletions(-)

I'm looking into the valgrind failures [1].

==1526248== VALGRINDERROR-END
{
   <insert_a_suppression_name_here>
   Memcheck:Addr1
   fun:lazy_scan_heap
   fun:heap_vacuum_rel
   fun:table_relation_vacuum
   fun:vacuum_rel
   fun:vacuum_rel
   fun:vacuum
   fun:ExecVacuum
   fun:standard_ProcessUtility
   fun:ProcessUtility
   fun:PortalRunUtility
   fun:PortalRunMulti
   fun:PortalRun
   fun:exec_simple_query
}
**1526248** Valgrind detected 492 error(s) during execution of "VACUUM FREEZE;

==1526248== VALGRINDERROR-END
{
   <insert_a_suppression_name_here>
   Memcheck:Addr8
   fun:TidStoreGetBlockOffsets
   fun:lazy_vacuum_heap_rel
   fun:lazy_vacuum
   fun:lazy_scan_heap
   fun:heap_vacuum_rel
   fun:table_relation_vacuum
   fun:vacuum_rel
   fun:vacuum
   fun:ExecVacuum
   fun:standard_ProcessUtility
   fun:ProcessUtility
   fun:PortalRunUtility
   fun:PortalRunMulti
}
==152



[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-02-14%2018%3A00%3A12



Re: pgsql: Use streaming read I/O in VACUUM's third phase

From: Melanie Plageman

On Fri, Feb 14, 2025 at 1:31 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Fri, Feb 14, 2025 at 12:59 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> > Use streaming read I/O in VACUUM's third phase
> >
> > Make vacuum's third phase (its second pass over the heap), which reaps
> > dead items collected in the first phase and marks them as reusable, use
> > the read stream API. This commit adds a new read stream callback,
> > vacuum_reap_lp_read_stream_next(), that looks ahead in the TidStore and
> > returns the next block number to read for vacuum.
> >
> > Author: Melanie Plageman <melanieplageman@gmail.com>
> > Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
> > Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
> > Discussion: https://postgr.es/m/CA%2BhUKGKN3oy0bN_3yv8hd78a4%2BM1tJC9z7mD8%2Bf%2ByA%2BGeoFUwQ%40mail.gmail.com
> >
> > Branch
> > ------
> > master
> >
> > Details
> > -------
> > https://git.postgresql.org/pg/commitdiff/c3e775e608f2a6d0bcfba147bf08a506827cc567
> >
> > Modified Files
> > --------------
> > src/backend/access/heap/vacuumlazy.c | 55 ++++++++++++++++++++++++++++++++----
> > 1 file changed, 49 insertions(+), 6 deletions(-)
>
> I'm looking into the valgrind failures [1].
>
> ==1526248== VALGRINDERROR-END
> {
>    <insert_a_suppression_name_here>
>    Memcheck:Addr1
>    fun:lazy_scan_heap
>    fun:heap_vacuum_rel
>    fun:table_relation_vacuum
>    fun:vacuum_rel
>    fun:vacuum_rel
>    fun:vacuum
>    fun:ExecVacuum
>    fun:standard_ProcessUtility
>    fun:ProcessUtility
>    fun:PortalRunUtility
>    fun:PortalRunMulti
>    fun:PortalRun
>    fun:exec_simple_query
> }

Looks like there is something wrong with the read stream API. This is
the first read stream user taking advantage of per_buffer_data. Thomas
and I are investigating further. It is trivially reproducible when
running initdb under valgrind.

- Melanie