Re: PG 18 release notes draft committed - Mailing list pgsql-hackers

From Alexander Borisov
Subject Re: PG 18 release notes draft committed
Date
Msg-id 33fa5fef-3927-4a44-8046-e555a7cd11c1@gmail.com
In response to Re: PG 18 release notes draft committed  (Jelte Fennema-Nio <postgres@jeltef.nl>)
List pgsql-hackers
05.05.2025 03:22, Jelte Fennema-Nio wrote:

[...]

> 
> I think there are a few things at play here why that did not happen in
> Bruce his initial draft:
> 1. I personally think the requirement that Bruce uses for perf
> improvements to make it into the changelog is too strict (see my
> previous email for details)
> 2. Bruce is only a single person, and as such cannot read all emails
> on pgsql-hackers, so he relies only on commit messages to determine
> impact for release notes. The commit message for your change did not
> include any details on the perf improvements that could be expected.
> 3. After skimming the email thread[1], it's hard for me to understand
> where these perf numbers came from. And the first few results only
> mention casefold performance i.e. they call the results: "casefold()
> test." So, it's unclear what perf gains are expected for the other
> functions mentioned in the email subject.

I totally agree with you: it's hard to keep track of everything. It's
also a lot of work to read every commit and understand what it really
does.

I have no complaints; I'm just trying to understand the rules for
getting into the Release Notes.
The rules, as it turns out, are not simple. But they are the rules, and
even though I don't agree with them, I accept them.

> 
> As for how to improve these:
> 1 is discussed/complained about basically every year whenever release
> notes are created. I don't think we can do any better than having
> those discussions. Unless someone else wants to start owning writing
> the release notes, or we somehow share the burden, e.g. by having the
> person that commits also write a release note entry.
> 2 can be improved by people including perf numbers in their commit
> messages. The second way to improve is by sending feedback on the
> release notes if things are missed, like you did.
> 3 is something you could help with I think. It would have been helpful
> if you had shared the script/commands you used to get these
> performance numbers. That way I could reproduce them myself. Also if
> you had included some perf numbers for lower() and upper() that would
> have been great too, as those are (currently) much more commonly used
> than casefold(). NOTE: I might have missed the script or be wrong
> about this some other way, since Jeff did not require this for
> committing it. If so, please disregard.
> 
> [1]: https://www.postgresql.org/message-id/flat/7cac7e66-9a3b-4e3f-a997-42aa0c401f80%40gmail.com

A bit about what those numbers are: in the patch discussion I described
how I got them.

The point is that lower(), upper(), and casefold() share one common
algorithm; the only difference is which mapping table we pass to it.
So there is no point in measuring the performance of each function
separately. Any of them exercises the same algorithm for looking up
codepoints in the tables.

In other words, we can take lower(), upper(), or casefold() and measure
the Unicode table mapping algorithm itself (that is where I changed the
code).
I could measure all three, but the results would not tell us anything
new.
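
As a minimal illustration of that point (a toy sketch, not the
PostgreSQL code), the three functions can be modeled as one shared
lookup routine that differs only in the table it receives. The table
contents and the names map_codepoints, toy_lower, toy_upper, and
toy_casefold are hypothetical and cover only a few codepoints.

```python
# Toy mapping tables (hypothetical, a handful of codepoints only).
# Real Unicode tables are far larger and are stored differently.
LOWER_TABLE = {0x0041: 0x0061, 0x0410: 0x0430}   # A -> a, Cyrillic A -> a
UPPER_TABLE = {0x0061: 0x0041, 0x0430: 0x0410}   # a -> A, Cyrillic a -> A
FOLD_TABLE = {0x0041: 0x0061, 0x0410: 0x0430,
              0x00B5: 0x03BC}                    # micro sign -> Greek mu


def map_codepoints(text, table):
    """Shared algorithm: look up each codepoint, keep it if unmapped."""
    return "".join(chr(table.get(ord(ch), ord(ch))) for ch in text)


def toy_lower(text):
    return map_codepoints(text, LOWER_TABLE)


def toy_upper(text):
    return map_codepoints(text, UPPER_TABLE)


def toy_casefold(text):
    return map_codepoints(text, FOLD_TABLE)


if __name__ == "__main__":
    sample = "A\u0410\u00b5"
    print(toy_lower(sample), toy_upper(sample), toy_casefold(sample))
```

Because every wrapper goes through map_codepoints, a speedup in the
lookup itself would show up in all three functions.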
Here are the measurements made at the time of the patch discussion:

For each test, an SQL file was created for pgbench. The input data is
described with each test.
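
The exact SQL files are not included in this message. As a rough sketch
of how such a pgbench script could be generated for the ASCII and
Cyrillic cases, assuming a single SELECT over one long literal and
reading "700kb" as 700 * 1024 bytes (both assumptions), with
hypothetical helper and file names:

```python
def repeated_range(first, last, target_bytes):
    """Repeat codepoints first..last until the UTF-8 size reaches target_bytes."""
    unit = "".join(chr(cp) for cp in range(first, last + 1))
    unit_bytes = len(unit.encode("utf-8"))
    repeats = -(-target_bytes // unit_bytes)  # ceiling division
    return unit * repeats


def sql_literal(text):
    """Quote text as an SQL string literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def pgbench_script(func, text):
    """Build a one-statement script calling func on the given text."""
    return f"SELECT {func}({sql_literal(text)});\n"


# Data sizes and ranges follow the test descriptions in this message.
ascii_sql = pgbench_script("casefold", repeated_range(0x20, 0x7E, 700 * 1024))
cyrillic_sql = pgbench_script(
    "casefold", repeated_range(0x0410, 0x042F, 1024 * 1024)
)

if __name__ == "__main__":
    # Hypothetical file names; the originals are not given here.
    with open("casefold_ascii.sql", "w", encoding="utf-8") as f:
        f.write(ascii_sql)
    with open("casefold_cyrillic.sql", "w", encoding="utf-8") as f:
        f.write(cyrillic_sql)
```

A file like this could then be passed to pgbench with its -f option;
the other pgbench settings used for the numbers below are not stated in
this message.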

casefold() test.

ASCII:
Repeated characters (700kb) in the range from 0x20 to 0x7E.
Patch: tps = 278.449809
Without: tps = 266.526168

Cyrillic:
Repeated characters (1MB) in the range from 0x0410 to 0x042F.
Patch: tps = 86.740680
Without: tps = 49.373695

Unicode:
A query consisting of all Unicode characters from 0xA0 to 0x2FA1D
(excluding 0xD800..0xDFFF).
Patch: tps = 102.221092
Without: tps = 92.477798

* Ubuntu 24.04.1 (Intel(R) Xeon(R) Gold 6140) (gcc version 13.3.0)

ASCII:
Repeated characters (700kb) in the range from 0x20 to 0x7E.
Patch: tps = 146.712371
Without: tps = 120.794307

Cyrillic:
Repeated characters (1MB) in the range from 0x0410 to 0x042F.
Patch: tps = 44.499567
Without: tps = 24.237999

Unicode:
A query consisting of all Unicode characters from 0xA0 to 0x2FA1D
(excluding 0xD800..0xDFFF).
Patch: tps = 54.354833
Without: tps = 46.556531
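
To make the tps figures above easier to compare, the relative speedup
(patched tps divided by unpatched tps) can be computed directly from
them. This is only arithmetic on the numbers quoted in this message;
the dictionary layout and labels are illustrative.

```python
# (patched tps, unpatched tps) copied from the results above.
RESULTS = {
    "first set (platform not named in this message)": {
        "ASCII": (278.449809, 266.526168),
        "Cyrillic": (86.740680, 49.373695),
        "Unicode": (102.221092, 92.477798),
    },
    "Ubuntu 24.04.1, Xeon Gold 6140, gcc 13.3.0": {
        "ASCII": (146.712371, 120.794307),
        "Cyrillic": (44.499567, 24.237999),
        "Unicode": (54.354833, 46.556531),
    },
}


def speedup(patched, unpatched):
    """Ratio of patched to unpatched throughput; above 1.0 means faster."""
    return patched / unpatched


if __name__ == "__main__":
    for platform, tests in RESULTS.items():
        print(platform)
        for name, (patched, unpatched) in tests.items():
            print(f"  {name}: {speedup(patched, unpatched):.2f}x")
```

In both sets the Cyrillic test shows the largest ratio and the ASCII
and full-range Unicode tests show smaller gains.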

> 
>> I will continue to improve Postgres.
> 
> Please do, your work is very much appreciated!

I thought it deserved a separate line in the Release Notes.
In my view, improving Unicode performance is not easy.
Many users rely on lower() and upper(), and it would be nice for them
to know that work is being done to improve performance in this area.
But again, I'm new to the Postgres community and am still getting to
know what's going on here and how it works.

Thank you for paying attention to it!

-- 
Regards,
Alexander Borisov


