Re: PG 18 release notes draft committed - Mailing list pgsql-hackers
From | Alexander Borisov |
---|---|
Subject | Re: PG 18 release notes draft committed |
Date | |
Msg-id | 33fa5fef-3927-4a44-8046-e555a7cd11c1@gmail.com |
In response to | Re: PG 18 release notes draft committed (Jelte Fennema-Nio <postgres@jeltef.nl>) |
List | pgsql-hackers |
05.05.2025 03:22, Jelte Fennema-Nio wrote:
[...]
>
> I think there are a few things at play here why that did not happen in
> Bruce his initial draft:
> 1. I personally think the requirement that Bruce uses for perf
> improvements to make it into the changelog is too strict (see my
> previous email for details)
> 2. Bruce is only a single person, and as such cannot read all emails
> on pgsql-hackers, so he relies only on commit messages to determine
> impact for release notes. The commit message for your change did not
> include any details on the perf improvements that could be expected.
> 3. After skimming the email thread[1], it's hard for me to understand
> where these perf numbers came from. And the first few results only
> mention casefold performance i.e. they call the results: "casefold()
> test." So, it's unclear what perf gains are expected for the other
> functions mentioned in the email subject.

I totally agree with you; it's hard to keep track of everything. It's also a lot of work to read every commit and understand its essence. I have no complaints; I'm just trying to understand the rules for getting into the release notes. The rules, as it turns out, are not simple. But rules are rules: even though I don't agree with them, I accept them.

>
> As for how to improve these:
> 1 is discussed/complained about basically every year whenever release
> notes are created. I don't think we can do any better than having
> those discussions. Unless someone else wants to start owning writing
> the release notes, or we somehow share the burden, e.g. by having the
> person that commits also write a release note entry.
> 2 can be improved by people including perf numbers in their commit
> messages. The second way to improve is by sending feedback on the
> release notes if things are missed, like you did.
> 3 is something you could help with I think. It would have been helpful
> if you had shared the script/commands you used to get these
> performance numbers.
> That way I could reproduce them myself. Also if
> you had included some perf numbers for lower() and upper() that would
> have been great too, as those are (currently) much more commonly used
> than casefold(). NOTE: I might have missed the script or be wrong
> about this some other way, since Jeff did not require this for
> committing it. If so, please disregard.
>
> [1]: https://www.postgresql.org/message-id/flat/7cac7e66-9a3b-4e3f-a997-42aa0c401f80%40gmail.com

A bit about what those numbers are: in the discussion of the patch, I described how I obtained them. The point is that lower(), upper(), and casefold() share one common algorithm; the only difference is which mapping table is passed to it. Therefore, there is no point in measuring the performance of each function separately: any of them exercises the same algorithm for looking up code points in the tables. So we can take lower(), upper(), or casefold() and measure the Unicode table-mapping algorithm, which is the part of the code I changed. I can measure everything, but it would tell us nothing new.

Here are the measurements made during the patch discussion. For each test, an SQL file was created for pgbench; the data used is described with each test.

casefold() test.

ASCII: Repeated characters (700 KB) in the range from 0x20 to 0x7E.
Patch: tps = 278.449809
Without: tps = 266.526168

Cyrillic: Repeated characters (1 MB) in the range from 0x0410 to 0x042F.
Patch: tps = 86.740680
Without: tps = 49.373695

Unicode: A query consisting of all Unicode characters from 0xA0 to 0x2FA1D (excluding 0xD800..0xDFFF).
Patch: tps = 102.221092
Without: tps = 92.477798

* Ubuntu 24.04.1 (Intel(R) Xeon(R) Gold 6140) (gcc version 13.3.0)

ASCII: Repeated characters (700 KB) in the range from 0x20 to 0x7E.
Patch: tps = 146.712371
Without: tps = 120.794307

Cyrillic: Repeated characters (1 MB) in the range from 0x0410 to 0x042F.
Patch: tps = 44.499567
Without: tps = 24.237999

Unicode: A query consisting of all Unicode characters from 0xA0 to 0x2FA1D (excluding 0xD800..0xDFFF).
Patch: tps = 54.354833
Without: tps = 46.556531

>
>> I will continue to improve Postgres.
>
> Please do, your work is very much appreciated!

I thought it deserved a separate line in the release notes. In my view, it is not easy to improve Unicode performance. Many users rely on lower() and upper(), and it would be good for them to know that work is being done to improve performance in this area. But again, I'm new to the Postgres community and I'm still learning how things work here.

Thank you for paying attention to it!

--
Regards,
Alexander Borisov
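The argument above, that lower(), upper(), and casefold() differ only in the mapping table handed to one shared routine, can be illustrated with a minimal Python sketch. Everything here is hypothetical: the names `convert_case`, `LOWER_MAP`, `UPPER_MAP`, and `FOLD_MAP`, the tiny tables, and the dict lookup are illustrative stand-ins, not PostgreSQL's actual code or its generated Unicode tables. Only the shape (one algorithm, different table) is meant to match the description.

```python
# Hypothetical sketch: lower(), upper() and casefold() as thin wrappers over
# one shared table-driven routine. The tables are tiny illustrative stand-ins,
# NOT PostgreSQL's generated Unicode case tables.

# Each table maps a code point to the code point(s) it converts to.
LOWER_MAP: dict[int, tuple[int, ...]] = {
    0x0041: (0x0061,),  # LATIN CAPITAL LETTER A -> LATIN SMALL LETTER A
    0x0410: (0x0430,),  # CYRILLIC CAPITAL LETTER A -> CYRILLIC SMALL LETTER A
}
UPPER_MAP: dict[int, tuple[int, ...]] = {
    0x0061: (0x0041,),  # LATIN SMALL LETTER A -> LATIN CAPITAL LETTER A
    0x0430: (0x0410,),  # CYRILLIC SMALL LETTER A -> CYRILLIC CAPITAL LETTER A
}
FOLD_MAP: dict[int, tuple[int, ...]] = {
    0x0041: (0x0061,),
    0x0410: (0x0430,),
    0x00DF: (0x0073, 0x0073),  # LATIN SMALL LETTER SHARP S folds to "ss"
}


def convert_case(text: str, table: dict[int, tuple[int, ...]]) -> str:
    """Shared algorithm: look up each code point of `text` in `table`.

    Code points without an entry are copied unchanged. Only the table
    differs between the three public functions below.
    """
    out: list[str] = []
    for ch in text:
        mapped = table.get(ord(ch))
        if mapped is None:
            out.append(ch)
        else:
            out.extend(chr(cp) for cp in mapped)
    return "".join(out)


def lower(text: str) -> str:
    return convert_case(text, LOWER_MAP)


def upper(text: str) -> str:
    return convert_case(text, UPPER_MAP)


def casefold(text: str) -> str:
    return convert_case(text, FOLD_MAP)


if __name__ == "__main__":
    print(lower("A\u0410"), upper("a\u0430"), casefold("\u00df"))
```

In this sketch, all three wrappers spend their time in the same `convert_case` loop, so speeding up the table lookup speeds up all three equally; that is the reasoning behind benchmarking only one of them.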
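The tps figures in this message can be turned into relative speedups with simple arithmetic, (patched / unpatched - 1) x 100%. The sketch below does that for the numbers quoted above. The dictionary labels are illustrative; the platform for the first set of results is not stated in this message, so it is labeled only as "first set".

```python
# Relative speedup of the patched build over the unpatched build, using the
# pgbench tps numbers quoted in this message: (tps with patch, tps without).
RESULTS: dict[str, dict[str, tuple[float, float]]] = {
    "first set (platform not stated)": {
        "ASCII": (278.449809, 266.526168),
        "Cyrillic": (86.740680, 49.373695),
        "Unicode": (102.221092, 92.477798),
    },
    "Ubuntu 24.04.1, Xeon Gold 6140, gcc 13.3.0": {
        "ASCII": (146.712371, 120.794307),
        "Cyrillic": (44.499567, 24.237999),
        "Unicode": (54.354833, 46.556531),
    },
}


def speedup_percent(patched: float, baseline: float) -> float:
    """Percentage change in throughput: positive means the patch is faster."""
    return (patched / baseline - 1.0) * 100.0


def main() -> None:
    for machine, tests in RESULTS.items():
        print(machine)
        for name, (patched, baseline) in tests.items():
            print(f"  {name:8s} {speedup_percent(patched, baseline):+6.1f}%")


if __name__ == "__main__":
    main()
```

On these numbers, Cyrillic text shows the largest gain in both sets (roughly 76% and 84%), and every test improves.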