Re: PostgreSQL 17 Beta 1 release announcement draft - Mailing list pgsql-hackers

From Jonathan S. Katz
Subject Re: PostgreSQL 17 Beta 1 release announcement draft
Msg-id 07de231a-bd4f-478b-a4af-501cefda2dae@postgresql.org
In response to Re: PostgreSQL 17 Beta 1 release announcement draft  (John Naylor <johncnaylorls@gmail.com>)
List pgsql-hackers
On 5/21/24 6:40 AM, John Naylor wrote:
> On Mon, May 20, 2024 at 8:41 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Mon, May 20, 2024 at 8:47 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>>>
>>> On 5/20/24 2:58 AM, John Naylor wrote:
>>>> Hi Jon,
>>>>
>>>> Regarding vacuum "has shown up to a 6x improvement in overall time to
>>>> complete its work" -- I believe I've seen reported numbers close to
>>>> that only 1) when measuring the index phase in isolation or maybe 2)
>>>> the entire vacuum of unlogged tables with one, perfectly-correlated
>>>> index (testing has less variance with WAL out of the picture). I
>>>> believe tables with many indexes would show a lot of improvement, but
>>>> I'm not aware of testing that case specifically. Can you clarify where
>>>> 6x came from?
>>>
>>> Sawada-san showed me the original context, but I can't rapidly find it
>>> in the thread. Sawada-san, can you please share the numbers behind this?
>>>
>>
>> I referenced the numbers that I measured during the development[1]
>> (test scripts are here[2]). IIRC I used unlogged tables and indexes,
>> and these numbers were the entire vacuum execution time including heap
>> scanning, index vacuuming and heap vacuuming.
> 
> Thanks for confirming.
> 
> The wording "has a new internal data structure that reduces memory
> usage and has shown up to a 6x improvement in overall time to complete
> its work" is specific about runtime but less specific about memory
> use. Unlogged tables are not the norm, so I'd be cautious of
> reporting numbers specifically designed (for testing) to isolate the
> thing that changed.
> 
> I'm wondering if it might be both more impressive-sounding and more
> realistic for the average user experience to reverse that: specific on
> memory, and less specific on speed. The best-case memory reduction
> occurs for table update patterns that are highly localized, such as
> the most recently inserted records, and I'd say those are a lot more
> common than the use of unlogged tables.
> 
> Maybe something like "has a new internal data structure that reduces
> overall time to complete its work and can use up to 20x less memory."
> 
> Now, it is true that when dead tuples are sparse and evenly spaced
> (e.g. 1 every 100 pages), vacuum can now actually use more memory than
> v16. However, the nature of that scenario also means that the number
> of TIDs just can't get very big to begin with. In contrast, while the
> runtime improvement for normal (logged) tables is likely not
> earth-shattering, I believe it will always be at least somewhat
> faster, and never slower.

Thanks for the feedback. I flipped it around, per your suggestion:

"has a new internal data structure that has shown up to a 20x memory 
reduction for vacuum, along with improvements in overall time to 
complete its work."

Thanks,

Jonathan
