On 5/21/24 6:40 AM, John Naylor wrote:
> On Mon, May 20, 2024 at 8:41 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Mon, May 20, 2024 at 8:47 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>>>
>>> On 5/20/24 2:58 AM, John Naylor wrote:
>>>> Hi Jon,
>>>>
>>>> Regarding vacuum "has shown up to a 6x improvement in overall time to
>>>> complete its work" -- I believe I've seen reported numbers close to
>>>> that only 1) when measuring the index phase in isolation or maybe 2)
>>>> the entire vacuum of unlogged tables with one, perfectly-correlated
>>>> index (testing has less variance with WAL out of the picture). I
>>>> believe tables with many indexes would show a lot of improvement, but
>>>> I'm not aware of testing that case specifically. Can you clarify where
>>>> 6x came from?
>>>
>>> Sawada-san showed me the original context, but I can't quickly find it
>>> in the thread. Sawada-san, can you please share the numbers behind this?
>>>
>>
>> I referenced the numbers that I measured during development[1]
>> (test scripts are here[2]). IIRC I used unlogged tables and indexes,
>> and the numbers are for the entire vacuum execution time, including
>> heap scanning, index vacuuming, and heap vacuuming.
>
> Thanks for confirming.
>
> The wording "has a new internal data structure that reduces memory
> usage and has shown up to a 6x improvement in overall time to complete
> its work" is specific about runtime, while the memory claim is less
> specific. Unlogged tables are not the norm, so I'd be cautious about
> reporting numbers from tests specifically designed to isolate the
> thing that changed.
>
> I'm wondering if it might be both more impressive-sounding and more
> realistic for the average user experience to reverse that: specific on
> memory, and less specific on speed. The best-case memory reduction
> occurs for table update patterns that are highly localized, such as
> the most recently inserted records, and I'd say those are a lot more
> common than the use of unlogged tables.
>
> Maybe something like "has a new internal data structure that reduces
> overall time to complete its work and can use up to 20x less memory."
>
> Now, it is true that when dead tuples are sparse and evenly spaced
> (e.g. 1 every 100 pages), vacuum can now actually use more memory than
> v16. However, the nature of that scenario also means that the number
> of TIDs just can't get very big to begin with. In contrast, while the
> runtime improvement for normal (logged) tables is likely not
> earth-shattering, I believe it will always be at least somewhat
> faster, and never slower.
Thanks for the feedback. I flipped it around, per your suggestion:
"has a new internal data structure that has shown up to a 20x memory
reduction for vacuum, along with improvements in overall time to
complete its work."
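
For context, here is a minimal sketch of the kind of workload John is
describing (the table name, row counts, and settings below are made up
for illustration; they are not the actual test scripts from [2]). Dead
TIDs concentrated in recently modified pages are the best case for the
new structure, while sparse, evenly spaced dead tuples are the case
where memory use can exceed v16:

-- Illustrative sketch only; not the benchmark scripts from [2].
-- An unlogged table keeps WAL out of the picture, as in the original
-- tests, and the primary key provides the index-vacuuming work.
CREATE UNLOGGED TABLE vac_demo (id bigint PRIMARY KEY, payload text);
INSERT INTO vac_demo
    SELECT g, repeat('x', 100) FROM generate_series(1, 10000000) g;

-- Best case for memory: update only the most recently inserted rows,
-- so the dead line pointers are highly localized.
UPDATE vac_demo SET payload = payload WHERE id > 9500000;

-- Worst case for memory (for comparison): delete roughly one row per
-- hundred pages or so (assuming ~50 rows per page), so dead TIDs are
-- sparse and evenly spaced.
-- DELETE FROM vac_demo WHERE id % 5000 = 0;

SET maintenance_work_mem = '1GB';
VACUUM (VERBOSE) vac_demo;
-- Watch dead-item memory in pg_stat_progress_vacuum during the run
-- (the relevant columns were renamed in v17) and compare against the
-- same run on v16.
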
Thanks,
Jonathan