Re: Parallel heap vacuum - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Parallel heap vacuum
Date
Msg-id 6h7yzrwqo2wxwvk2fajqw7yneg6qasrkiqyhm3wdfr3uzc2fzq@ixenqjp7oehs
In response to Re: Parallel heap vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hi,

On 2025-03-20 01:35:42 -0700, Masahiko Sawada wrote:
> One plausible solution would be that we don't use ReadStream in
> parallel heap vacuum cases but directly use
> table_block_parallelscan_xxx() instead. It works but we end up having
> two different scan methods for parallel and non-parallel lazy heap
> scan. I've implemented this idea in the attached v12 patches.

I think that's a bad idea - this means we'll never be able to use direct IO
for parallel VACUUMs, despite

a) The CPU overhead of buffered reads being a problem for VACUUM

b) Data ending up in the kernel page cache is rather wasteful for VACUUM, as
   that's often data that won't otherwise be used again soon. I.e. these reads
   would particularly benefit from using direct IO.

c) Even disregarding DIO, losing the ability to do larger reads, as provided
   by read streams, loses a fair bit of efficiency (just try doing a
   pg_prewarm of a large relation with io_combine_limit=1 vs a larger
   io_combine_limit).
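
To put a rough number on point c), here is a toy model of read combining. It is
only a sketch and not PostgreSQL code: count_reads and its parameters are
hypothetical names, and the model assumes that each run of consecutive block
numbers can be merged into one read of at most combine_limit blocks.

```python
def count_reads(blocks, combine_limit):
    """Count the read calls needed for `blocks` (ascending block numbers)
    when up to `combine_limit` consecutive blocks may be merged into one read.

    Toy model only: it ignores caching, alignment, and everything else a
    real read stream has to deal with.
    """
    if combine_limit < 1:
        raise ValueError("combine_limit must be at least 1")

    reads = 0      # number of read calls issued so far
    run_len = 0    # blocks already merged into the current read
    prev = None    # previous block number seen

    for blk in blocks:
        contiguous = prev is not None and blk == prev + 1
        if contiguous and run_len < combine_limit:
            # Extend the current read by one more block.
            run_len += 1
        else:
            # Gap in block numbers, or the current read is full: new read.
            reads += 1
            run_len = 1
        prev = blk

    return reads


if __name__ == "__main__":
    n_blocks = 1024
    for limit in (1, 16):
        print(limit, count_reads(range(n_blocks), limit))
```

In this model a fully sequential scan of 1024 blocks needs 1024 read calls with
a limit of 1 and 64 with a limit of 16; a scan that skips many blocks benefits
less, since only adjacent blocks can be merged.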

Greetings,

Andres Freund


