On Mon, May 20, 2024 at 8:41 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, May 20, 2024 at 8:47 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
> >
> > On 5/20/24 2:58 AM, John Naylor wrote:
> > > Hi Jon,
> > >
> > > Regarding vacuum "has shown up to a 6x improvement in overall time to
> > > complete its work" -- I believe I've seen reported numbers close to
> > > that only 1) when measuring the index phase in isolation or maybe 2)
> > > the entire vacuum of unlogged tables with one, perfectly-correlated
> > > index (testing has less variance with WAL out of the picture). I
> > > believe tables with many indexes would show a lot of improvement, but
> > > I'm not aware of testing that case specifically. Can you clarify where
> > > 6x came from?
> >
> > Sawada-san showed me the original context, but I can't rapidly find it
> > in the thread. Sawada-san, can you please share the numbers behind this?
> >
>
> I referenced the numbers that I measured during the development[1]
> (test scripts are here[2]). IIRC I used unlogged tables and indexes,
> and these numbers were the entire vacuum execution time including heap
> scanning, index vacuuming and heap vacuuming.

Thanks for confirming.

The wording "has a new internal data structure that reduces memory
usage and has shown up to a 6x improvement in overall time to complete
its work" is specific about runtime but less specific about memory
use. Unlogged tables are not the norm, so I'd be cautious about
reporting numbers from tests specifically designed to isolate the
thing that changed.

I'm wondering if it might be both more impressive-sounding and more
realistic for the average user's experience to reverse that: specific
on memory, and less specific on speed. The best-case memory reduction
occurs for table update patterns that are highly localized, such as
updates concentrated in the most recently inserted records, and I'd
say those are a lot more common than the use of unlogged tables.

Maybe something like "has a new internal data structure that reduces
overall time to complete its work and can use up to 20x less memory."
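
To give an idea of the localized pattern I have in mind, here's a
rough, untested sketch (table shape and row counts are arbitrary)
where all the dead TIDs end up clustered at the end of the heap:

-- load a table, then kill only the most recently inserted rows
CREATE TABLE t (id int, filler text) WITH (autovacuum_enabled = off);
INSERT INTO t SELECT i, repeat('x', 100)
  FROM generate_series(1, 10000000) i;
CREATE INDEX ON t (id);
DELETE FROM t WHERE id > 9000000;  -- dead TIDs contiguous at heap's end
SET maintenance_work_mem = '1MB';  -- small cap to make memory use visible
VACUUM (VERBOSE) t;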

Now, it is true that when dead tuples are sparse and evenly spaced
(e.g. one every 100 pages), vacuum can now actually use more memory
than v16. However, the nature of that scenario also means that the
number of TIDs just can't get very big to begin with. In contrast,
while the runtime improvement for normal (logged) tables is likely not
earth-shattering, I believe vacuum will always be at least somewhat
faster, and never slower.
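
As for the sparse case, starting from the same freshly loaded table as
above, deleting roughly every 6000th row (at that row width something
like 60 rows fit on an 8kB page) should give on the order of one dead
TID per 100 pages:

-- sparse, evenly spaced dead TIDs: roughly one per 100 heap pages
DELETE FROM t WHERE id % 6000 = 0;
SET maintenance_work_mem = '1MB';
VACUUM (VERBOSE) t;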