Thread: glibc qsort() vulnerability
Hi hackers,
There is a bug in glibc's qsort() algorithm that runs the risk of creating an out-of-bounds error if the comparison function is not transitive, for example, if subtraction is used so that it can create an overflow.
I checked the existing functions in the latest version of Postgres source code and most are safe, but there were a few ones that could lead to overflow. I do not know if these can actually lead to problems, but better safe than sorry, so I created a patch to fix those few cases and add a comment to one case that was not clear that it could not overflow.
Best wishes,
Mats Kindahl, Timescale
Mats Kindahl, Timescale
Attachment
Mats Kindahl <mats@timescale.com> writes: > There is a bug in glibc's qsort() algorithm that runs the risk of creating > an out-of-bounds error if the comparison function is not transitive, for > example, if subtraction is used so that it can create an overflow. We don't use glibc's qsort. Have you checked whether there's a problem with the code we do use? regards, tom lane
On Tue, Feb 06, 2024 at 10:11:16AM -0500, Tom Lane wrote: > Mats Kindahl <mats@timescale.com> writes: >> There is a bug in glibc's qsort() algorithm that runs the risk of creating >> an out-of-bounds error if the comparison function is not transitive, for >> example, if subtraction is used so that it can create an overflow. > > We don't use glibc's qsort. Have you checked whether there's a > problem with the code we do use? Oh, interesting. I didn't know that! As of commit 6edd2b4, we've used an in-tree qsort() for everything. port.h has the following line: #define qsort(a,b,c,d) pg_qsort(a,b,c,d) I see a handful of callers that use pg_qsort() directly, which IMO is odd. I would've expected that to be something different if I didn't know about that macro. Maybe we should change those to qsort()... Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest that we make it project policy that comparison functions must be transitive. There might be no real issues today, but if we write all comparison functions the way Mats is suggesting, it should be easier to reason about overflow risks. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes: > Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest > that we make it project policy that comparison functions must be > transitive. There might be no real issues today, but if we write all > comparison functions the way Mats is suggesting, it should be easier to > reason about overflow risks. A comparison routine that is not is probably broken, agreed. I didn't look through the details of the patch --- I was more curious whether we had a version of the qsort bug, because if we do, we should fix that too. regards, tom lane
On Tue, Feb 06, 2024 at 03:55:58PM -0500, Tom Lane wrote: > A comparison routine that is not is probably broken, agreed. > I didn't look through the details of the patch --- I was more > curious whether we had a version of the qsort bug, because > if we do, we should fix that too. +1 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Tue, Feb 6, 2024 at 4:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mats Kindahl <mats@timescale.com> writes:
> There is a bug in glibc's qsort() algorithm that runs the risk of creating
> an out-of-bounds error if the comparison function is not transitive, for
> example, if subtraction is used so that it can create an overflow.
We don't use glibc's qsort. Have you checked whether there's a
problem with the code we do use?
Interesting. No, haven't checked. Will do that.
Best wishes,
Mats Kindahl
regards, tom lane
On Tue, Feb 6, 2024 at 9:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
> Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest
> that we make it project policy that comparison functions must be
> transitive. There might be no real issues today, but if we write all
> comparison functions the way Mats is suggesting, it should be easier to
> reason about overflow risks.
A comparison routine that is not is probably broken, agreed.
I didn't look through the details of the patch --- I was more
curious whether we had a version of the qsort bug, because
if we do, we should fix that too.
The patch basically removes the risk of overflow in three routines and just returns -1, 0, or 1, and adds a comment in one.
The routines modified do a subtraction of int:s and return that, which can cause an overflow. This method is used for some int16 as well but since standard conversions in C will perform the arithmetics in "int" precision, this cannot overflow, so added a comment there. It might still be a good idea to follow the same pattern for the int16 routines, but since there is no bug there, I did not add them to the patch.
Best wishes,
Mats Kindahl
Mats Kindahl
regards, tom lane
On 07/02/2024 11:09, Mats Kindahl wrote: > On Tue, Feb 6, 2024 at 9:56 PM Tom Lane <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>> wrote: > > Nathan Bossart <nathandbossart@gmail.com > <mailto:nathandbossart@gmail.com>> writes: > > Even if the glibc issue doesn't apply to Postgres, I'm tempted to > suggest > > that we make it project policy that comparison functions must be > > transitive. There might be no real issues today, but if we write all > > comparison functions the way Mats is suggesting, it should be > easier to > > reason about overflow risks. > > A comparison routine that is not is probably broken, agreed. > I didn't look through the details of the patch --- I was more > curious whether we had a version of the qsort bug, because > if we do, we should fix that too. > > The patch basically removes the risk of overflow in three routines and > just returns -1, 0, or 1, and adds a comment in one. > > The routines modified do a subtraction of int:s and return that, which > can cause an overflow. This method is used for some int16 as well but > since standard conversions in C will perform the arithmetics in "int" > precision, this cannot overflow, so added a comment there. It might > still be a good idea to follow the same pattern for the int16 routines, > but since there is no bug there, I did not add them to the patch. Doesn't hurt to fix the comparison functions, and +1 on using the same pattern everywhere. However, we use our qsort() with user-defined comparison functions, and we cannot make any guarantees about what they might do. So we must ensure that our qsort() doesn't overflow, no matter what the comparison function does. Looking at our ST_SORT(), it seems safe to me. -- Heikki Linnakangas Neon (https://neon.tech)
On Wed, Feb 07, 2024 at 08:46:56PM +0200, Heikki Linnakangas wrote: > Doesn't hurt to fix the comparison functions, and +1 on using the same > pattern everywhere. I attached a new version of the patch with some small adjustments. I haven't looked through all in-tree qsort() comparators to see if any others need to be adjusted, but we should definitely do so as part of this thread. Mats, are you able to do this? > However, we use our qsort() with user-defined comparison functions, and we > cannot make any guarantees about what they might do. So we must ensure that > our qsort() doesn't overflow, no matter what the comparison function does. > > Looking at our ST_SORT(), it seems safe to me. Cool. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2024-02-07 20:46:56 +0200, Heikki Linnakangas wrote: > > The routines modified do a subtraction of int:s and return that, which > > can cause an overflow. This method is used for some int16 as well but > > since standard conversions in C will perform the arithmetics in "int" > > precision, this cannot overflow, so added a comment there. It might > > still be a good idea to follow the same pattern for the int16 routines, > > but since there is no bug there, I did not add them to the patch. > > Doesn't hurt to fix the comparison functions, and +1 on using the same > pattern everywhere. It actually can hurt - the generated code will often be slower. E.g. #include <stdint.h> int cmp_sub(int16_t a, int16_t b) { return (int32_t) a - (int32_t) b; } int cmp_if(int16_t a, int16_t b) { if (a < b) return -1; if (a > b) return 1; return 0; } yields cmp_sub: movsx eax, di movsx esi, si sub eax, esi ret cmp_if: xor eax, eax cmp di, si mov edx, -1 setg al cmovl eax, edx ret with gcc -O3. With other compilers, e.g. msvc, the difference is considerably bigger, due to msvc for some reason not using cmov. See https://godbolt.org/z/34qerPaPE for a few more details. Now, in most cases this won't matter, the sorting isn't performance critical. But I don't think it's a good idea to standardize on a generally slower pattern. Not that that's a good test, but I did quickly benchmark [1] this with intarray. There's about a 10% difference in performance between using the existing compASC() and one using return (int64) *(const int32 *) a - (int64) *(const int32 *) b; Perhaps we could have a central helper for this somewhere? Greetings, Andres Freund [1] -- prep CREATE EXTENSION IF NOT EXISTS intarray; DROP TABLE IF EXISTS arrays_to_sort; CREATE TABLE arrays_to_sort AS SELECT array_shuffle(a) arr FROM (SELECT ARRAY(SELECT generate_series(1, 1000000)) a), generate_series(1,10); -- bench SELECT (sort(arr))[1] FROM arrays_to_sort;
On Wed, Feb 07, 2024 at 01:48:57PM -0800, Andres Freund wrote: > Now, in most cases this won't matter, the sorting isn't performance > critical. But I don't think it's a good idea to standardize on a generally > slower pattern. > > Not that that's a good test, but I did quickly benchmark [1] this with > intarray. There's about a 10% difference in performance between using the > existing compASC() and one using > return (int64) *(const int32 *) a - (int64) *(const int32 *) b; > > > Perhaps we could have a central helper for this somewhere? Maybe said helper could use __builtin_sub_overflow() and fall back to the slow "if" version only if absolutely necessary. The assembly for that looks encouraging, but I still need to actually test it... -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote: > On Wed, Feb 07, 2024 at 01:48:57PM -0800, Andres Freund wrote: > > Now, in most cases this won't matter, the sorting isn't performance > > critical. But I don't think it's a good idea to standardize on a generally > > slower pattern. > > > > Not that that's a good test, but I did quickly benchmark [1] this with > > intarray. There's about a 10% difference in performance between using the > > existing compASC() and one using > > return (int64) *(const int32 *) a - (int64) *(const int32 *) b; > > > > > > Perhaps we could have a central helper for this somewhere? > > Maybe said helper could use __builtin_sub_overflow() and fall back to the > slow "if" version only if absolutely necessary. I suspect that'll be worse code in the common case, given the cmov generated by gcc & clang for the typical branch-y formulation. But it's worth testing. > The assembly for that looks encouraging, but I still need to actually test > it... Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit that doesn't work, given the 32bit return, so we need something more. Greetings, Andres Freund
On Wed, Feb 07, 2024 at 04:42:07PM -0800, Andres Freund wrote: > On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote: >> The assembly for that looks encouraging, but I still need to actually test >> it... > > Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit > that doesn't work, given the 32bit return, so we need something more. For the same compASC() test, I see an ~8.4% improvement with your int64 code and a ~3.4% improvement with this: int compASC(const void *a, const void *b) { int result; if (unlikely(pg_sub_s32_overflow(*(const int32 *) a, *(const int32 *) b, &result))) { if (*(const int32 *) a > *(const int32 *) b) return 1; if (*(const int32 *) a < *(const int32 *) b) return -1; return 0; } return result; } -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2024-02-07 19:52:11 -0600, Nathan Bossart wrote: > On Wed, Feb 07, 2024 at 04:42:07PM -0800, Andres Freund wrote: > > On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote: > >> The assembly for that looks encouraging, but I still need to actually test > >> it... > > > > Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit > > that doesn't work, given the 32bit return, so we need something more. > > For the same compASC() test, I see an ~8.4% improvement with your int64 > code Just to be clear, that code unfortuntely isn't correct, the return value is a 32 bit integer, so the 64bit difference doesn't help. In contrast to the 16bit case. > and a ~3.4% improvement with this: I guess that's still something. Another branchless variant is (a > b) - (a < b). It seems to get a similar improvement as the overflow-checking version. Greetings, Andres Freund
On Thu, Feb 8, 2024 at 3:06 PM Andres Freund <andres@anarazel.de> wrote: > On 2024-02-07 19:52:11 -0600, Nathan Bossart wrote: > > On Wed, Feb 07, 2024 at 04:42:07PM -0800, Andres Freund wrote: > > > On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote: > > >> The assembly for that looks encouraging, but I still need to actually test > > >> it... > > > > > > Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit > > > that doesn't work, given the 32bit return, so we need something more. > > > > For the same compASC() test, I see an ~8.4% improvement with your int64 > > code > > Just to be clear, that code unfortuntely isn't correct, the return value is a > 32 bit integer, so the 64bit difference doesn't help. In contrast to the 16bit > case. Perhaps you could wrap it in a branch-free sign() function so you get a narrow answer? https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c
On Wed, Feb 07, 2024 at 06:06:37PM -0800, Andres Freund wrote: > Another branchless variant is (a > b) - (a < b). It seems to get a similar > improvement as the overflow-checking version. Well, that's certainly more elegant. I updated the patch to that approach for now. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Feb 8, 2024 at 3:38 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Perhaps you could wrap it in a branch-free sign() function so you get > a narrow answer? > > https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c Ah, strike that, it is much like the suggested (a > b) - (a < b) but with extra steps...
On Thu, Feb 08, 2024 at 03:49:03PM +1300, Thomas Munro wrote: > On Thu, Feb 8, 2024 at 3:38 PM Thomas Munro <thomas.munro@gmail.com> wrote: >> Perhaps you could wrap it in a branch-free sign() function so you get >> a narrow answer? >> >> https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c > > Ah, strike that, it is much like the suggested (a > b) - (a < b) but > with extra steps... Yeah, https://godbolt.org/ indicates that the sign approach compiles to movsx rsi, esi movsx rdi, edi xor eax, eax sub rdi, rsi test rdi, rdi setg al shr rdi, 63 sub eax, edi ret while the approach Andres suggested compiles to xor eax, eax cmp edi, esi setl dl setg al movzx edx, dl sub eax, edx ret -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Mats, I apologize for steamrolling a bit here. I'll take a step back into a reviewer role. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Wed, Feb 7, 2024 at 9:56 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, Feb 07, 2024 at 08:46:56PM +0200, Heikki Linnakangas wrote:
> Doesn't hurt to fix the comparison functions, and +1 on using the same
> pattern everywhere.
I attached a new version of the patch with some small adjustments. I
haven't looked through all in-tree qsort() comparators to see if any others
need to be adjusted, but we should definitely do so as part of this thread.
Mats, are you able to do this?
Sure, I checked them and the only ones remaining are those using int16. Shall I modify those as well?
> However, we use our qsort() with user-defined comparison functions, and we
> cannot make any guarantees about what they might do. So we must ensure that
> our qsort() doesn't overflow, no matter what the comparison function does.
>
> Looking at our ST_SORT(), it seems safe to me.
Cool.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Feb 8, 2024 at 4:08 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
Mats, I apologize for steamrolling a bit here. I'll take a step back into
a reviewer role.
No worries. :)
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Feb 8, 2024 at 12:01 PM Mats Kindahl <mats@timescale.com> wrote:
On Wed, Feb 7, 2024 at 9:56 PM Nathan Bossart <nathandbossart@gmail.com> wrote:On Wed, Feb 07, 2024 at 08:46:56PM +0200, Heikki Linnakangas wrote:
> Doesn't hurt to fix the comparison functions, and +1 on using the same
> pattern everywhere.
I attached a new version of the patch with some small adjustments. I
haven't looked through all in-tree qsort() comparators to see if any others
need to be adjusted, but we should definitely do so as part of this thread.
Mats, are you able to do this?Sure, I checked them and the only ones remaining are those using int16. Shall I modify those as well?
Seems your additional patch dealt with at least one of the cases.
> However, we use our qsort() with user-defined comparison functions, and we
> cannot make any guarantees about what they might do. So we must ensure that
> our qsort() doesn't overflow, no matter what the comparison function does.
>
> Looking at our ST_SORT(), it seems safe to me.
Cool.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Feb 8, 2024 at 3:56 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Feb 08, 2024 at 03:49:03PM +1300, Thomas Munro wrote:
> On Thu, Feb 8, 2024 at 3:38 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Perhaps you could wrap it in a branch-free sign() function so you get
>> a narrow answer?
>>
>> https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c
>
> Ah, strike that, it is much like the suggested (a > b) - (a < b) but
> with extra steps...
Yeah, https://godbolt.org/ indicates that the sign approach compiles to
movsx rsi, esi
movsx rdi, edi
xor eax, eax
sub rdi, rsi
test rdi, rdi
setg al
shr rdi, 63
sub eax, edi
ret
while the approach Andres suggested compiles to
xor eax, eax
cmp edi, esi
setl dl
setg al
movzx edx, dl
sub eax, edx
ret
Here is a patch that fixes existing cases and introduces a macro for this comparison (it uses the (a > b) - (a < b) approach). Not sure where to place the macro nor what a suitable name should be, so feel free to suggest anything.
I also noted that some functions are duplicated and it might be an idea to introduce a few standard functions like pg_qsort_strcmp for, e.g., integers and other common types.
Also noted it is quite common to have this pattern in various places to do lexicographic sort of multiple values and continue the comparison if they are equal. Not sure if that is something we should look at.
Best wishes,
Mats Kindahl
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Feb 08, 2024 at 02:16:11PM +0100, Mats Kindahl wrote: > +/* > + * Compare two integers and return -1, 0, or 1 without risking overflow. > + * > + * This macro is used to avoid running into overflow issues because a simple > + * subtraction of the two values when implementing a cmp function for qsort(). > +*/ > +#define INT_CMP(lhs,rhs) (((lhs) > (rhs)) - ((lhs) < (rhs))) I think we should offer a few different macros, i.e., separate macros for int8, uint8, int16, uint16, int32, etc. For int16, we can do something faster like (int32) (lhs) - (int32) (rhs) but for int32, we need to do someting more like what's in the patch. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes: > On Thu, Feb 08, 2024 at 02:16:11PM +0100, Mats Kindahl wrote: >> +/* >> + * Compare two integers and return -1, 0, or 1 without risking overflow. >> + * >> + * This macro is used to avoid running into overflow issues because a simple >> + * subtraction of the two values when implementing a cmp function for qsort(). >> +*/ >> +#define INT_CMP(lhs,rhs) (((lhs) > (rhs)) - ((lhs) < (rhs))) > I think we should offer a few different macros, i.e., separate macros for > int8, uint8, int16, uint16, int32, etc. For int16, we can do something > faster like > (int32) (lhs) - (int32) (rhs) > but for int32, we need to do someting more like what's in the patch. Are we okay with using macros that (a) have double evaluation hazards and (b) don't enforce the data types being compared are the same? I think static inlines might be a safer technology. Perhaps it's okay given the only expected use is in qsort comparators, but ... regards, tom lane
Hi, On 2024-02-08 13:44:02 -0500, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: > > On Thu, Feb 08, 2024 at 02:16:11PM +0100, Mats Kindahl wrote: > >> +/* > >> + * Compare two integers and return -1, 0, or 1 without risking overflow. > >> + * > >> + * This macro is used to avoid running into overflow issues because a simple > >> + * subtraction of the two values when implementing a cmp function for qsort(). > >> +*/ > >> +#define INT_CMP(lhs,rhs) (((lhs) > (rhs)) - ((lhs) < (rhs))) > > > I think we should offer a few different macros, i.e., separate macros for > > int8, uint8, int16, uint16, int32, etc. For int16, we can do something > > faster like +1 > > (int32) (lhs) - (int32) (rhs) > > > but for int32, we need to do someting more like what's in the patch. > > Are we okay with using macros that (a) have double evaluation hazards > and (b) don't enforce the data types being compared are the same? > I think static inlines might be a safer technology. +1 I'd put these static inlines into common/int.h. I don't think this is common enough to warrant being in c.h. Probably also doesn't hurt to have a not quite as generic name as INT_CMP, I'd not be too surprised if that's defined in some library. I think it's worth following int.h's pattern of including [s]igned/[u]nsigned in the name, an efficient implementation for signed might not be the same as for unsigned. And if we use static inlines, we need to do so for correct semantics anyway. Greetings, Andres
On Thu, Feb 08, 2024 at 11:59:54AM -0800, Andres Freund wrote: > On 2024-02-08 13:44:02 -0500, Tom Lane wrote: >> Are we okay with using macros that (a) have double evaluation hazards >> and (b) don't enforce the data types being compared are the same? >> I think static inlines might be a safer technology. > > +1 Agreed on static inlines. > I'd put these static inlines into common/int.h. I don't think this is common > enough to warrant being in c.h. Probably also doesn't hurt to have a not quite > as generic name as INT_CMP, I'd not be too surprised if that's defined in some > library. > > > I think it's worth following int.h's pattern of including [s]igned/[u]nsigned > in the name, an efficient implementation for signed might not be the same as > for unsigned. And if we use static inlines, we need to do so for correct > semantics anyway. Seems reasonable to me. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Thu, Feb 8, 2024 at 7:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Thu, Feb 08, 2024 at 02:16:11PM +0100, Mats Kindahl wrote:
>> +/*
>> + * Compare two integers and return -1, 0, or 1 without risking overflow.
>> + *
>> + * This macro is used to avoid running into overflow issues because a simple
>> + * subtraction of the two values when implementing a cmp function for qsort().
>> +*/
>> +#define INT_CMP(lhs,rhs) (((lhs) > (rhs)) - ((lhs) < (rhs)))
> I think we should offer a few different macros, i.e., separate macros for
> int8, uint8, int16, uint16, int32, etc. For int16, we can do something
> faster like
> (int32) (lhs) - (int32) (rhs)
> but for int32, we need to do someting more like what's in the patch.
Are we okay with using macros that (a) have double evaluation hazards
and (b) don't enforce the data types being compared are the same?
I think static inlines might be a safer technology. Perhaps it's
okay given the only expected use is in qsort comparators, but ...
I picked a macro simply because it can deal with all kinds of integers, but if we want to have separate implementations for each then inline functions work just as well and will be safer.
Best wishes,
Mats Kindahl
Mats Kindahl
regards, tom lane
On Thu, Feb 8, 2024 at 9:07 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Feb 08, 2024 at 11:59:54AM -0800, Andres Freund wrote:
> On 2024-02-08 13:44:02 -0500, Tom Lane wrote:
>> Are we okay with using macros that (a) have double evaluation hazards
>> and (b) don't enforce the data types being compared are the same?
>> I think static inlines might be a safer technology.
>
> +1
Agreed on static inlines.
Seems to be a general consensus on static inlines. I'll get a new patch.
> I'd put these static inlines into common/int.h. I don't think this is common
> enough to warrant being in c.h. Probably also doesn't hurt to have a not quite
> as generic name as INT_CMP, I'd not be too surprised if that's defined in some
> library.
>
>
> I think it's worth following int.h's pattern of including [s]igned/[u]nsigned
> in the name, an efficient implementation for signed might not be the same as
> for unsigned. And if we use static inlines, we need to do so for correct
> semantics anyway.
Seems reasonable to me.
Agree.
Best wishes,
Mats Kindahl
Mats Kindahl
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes: > On Thu, Feb 08, 2024 at 11:59:54AM -0800, Andres Freund wrote: >> I'd put these static inlines into common/int.h. I don't think this is common >> enough to warrant being in c.h. Probably also doesn't hurt to have a not quite >> as generic name as INT_CMP, I'd not be too surprised if that's defined in some >> library. >> >> I think it's worth following int.h's pattern of including [s]igned/[u]nsigned >> in the name, an efficient implementation for signed might not be the same as >> for unsigned. And if we use static inlines, we need to do so for correct >> semantics anyway. > Seems reasonable to me. +1 here also. regards, tom lane
On Thu, Feb 8, 2024 at 9:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Thu, Feb 08, 2024 at 11:59:54AM -0800, Andres Freund wrote:
>> I'd put these static inlines into common/int.h. I don't think this is common
>> enough to warrant being in c.h. Probably also doesn't hurt to have a not quite
>> as generic name as INT_CMP, I'd not be too surprised if that's defined in some
>> library.
>>
>> I think it's worth following int.h's pattern of including [s]igned/[u]nsigned
>> in the name, an efficient implementation for signed might not be the same as
>> for unsigned. And if we use static inlines, we need to do so for correct
>> semantics anyway.
> Seems reasonable to me.
+1 here also.
Here is a new version introducing pg_cmp_s32 and friends and use them instead of the INT_CMP macro introduced before. It also moves the definitions to common/int.h and adds that as an include to all locations using these functions.
Note that for integers with sizes less than sizeof(int), C standard conversions will convert the values to "int" before doing the arithmetic, so no casting is *necessary*. I did not force the 16-bit functions to return -1 or 1 and have updated the comment accordingly.
The types "int" and "size_t" are treated as s32 and u32 respectively since that seems to be the case for most of the code, even if strictly not correct (size_t can be an unsigned long int for some architecture).
I also noted that in many situations size_t values are treated as "int" so there is an overflow risk here, even if small. For example, the result of "list_length" is assigned to an integer. I do not think this is an immediate concern, but just wanted to mention it.
Best wishes,
Mats Kindahl
Mats Kindahl
regards, tom lane
Attachment
> On 8 Feb 2024, at 06:52, Nathan Bossart <nathandbossart@gmail.com> wrote: > > For the same compASC() test, I see an ~8.4% improvement with your int64 > code and a ~3.4% improvement with this: If we care about branch prediction in comparison function, maybe we could produce sorting that inlines comparator, thus eliminatingfunction call to comparator? We convert comparison logic to int, to extract comparison back then. I bet “call" is more expensive than “if". Best regards, Andrey Borodin.
On Fri, Feb 09, 2024 at 08:52:26AM +0100, Mats Kindahl wrote: > Here is a new version introducing pg_cmp_s32 and friends and use them > instead of the INT_CMP macro introduced before. It also moves the > definitions to common/int.h and adds that as an include to all locations > using these functions. Thanks for the new version of the patch. > Note that for integers with sizes less than sizeof(int), C standard > conversions will convert the values to "int" before doing the arithmetic, > so no casting is *necessary*. I did not force the 16-bit functions to > return -1 or 1 and have updated the comment accordingly. It might not be necessary, but this is one of those places where I would add casting anyway to make it abundantly clear what we are expecting to happen and why it is safe. > The types "int" and "size_t" are treated as s32 and u32 respectively since > that seems to be the case for most of the code, even if strictly not > correct (size_t can be an unsigned long int for some architecture). Why is it safe to do this? > - return ((const SPLITCOST *) a)->cost - ((const SPLITCOST *) b)->cost; > + return INT_CMP(((const SPLITCOST *) a)->cost, ((const SPLITCOST *) b)->cost); The patch still contains several calls to INT_CMP. > +/*------------------------------------------------------------------------ > + * Comparison routines for integers > + *------------------------------------------------------------------------ > + */ I'd suggest separating this part out to a 0001 patch to make it easier to review. The 0002 patch could take care of converting the existing qsort comparators. > +static inline int > +pg_cmp_s16(int16 a, int16 b) > +{ > + return a - b; > +} > + > +static inline int > +pg_cmp_u16(uint16 a, uint16 b) > +{ > + return a - b; > +} > + > +static inline int > +pg_cmp_s32(int32 a, int32 b) > +{ > + return (a > b) - (a < b); > +} > + > +static inline int > +pg_cmp_u32(uint32 a, uint32 b) > +{ > + return (a > b) - (a < b); > +} > + > +static inline int > +pg_cmp_s64(int64 a, int64 b) > +{ > + return (a > b) - (a < b); > +} > + > +static inline int > +pg_cmp_u64(uint64 a, uint64 b) > +{ > + return (a > b) - (a < b); > +} As suggested above, IMHO we should be rather liberal with the casting to ensure it is abundantly clear what is happening here. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes: > On Fri, Feb 09, 2024 at 08:52:26AM +0100, Mats Kindahl wrote: >> The types "int" and "size_t" are treated as s32 and u32 respectively since >> that seems to be the case for most of the code, even if strictly not >> correct (size_t can be an unsigned long int for some architecture). > Why is it safe to do this? We do pretty much assume that "int" is "int32". But I agree that assuming anything about the width of size_t is bad. I think we need a separate pg_cmp_size() or pg_cmp_size_t(). regards, tom lane
On Fri, Feb 09, 2024 at 01:19:49PM +0500, Andrey M. Borodin wrote: > If we care about branch prediction in comparison function, maybe we could > produce sorting that inlines comparator, thus eliminating function call > to comparator? We convert comparison logic to int, to extract comparison > back then. > > I bet “call" is more expensive than “if". It might make sense to have a couple of built-in qsort implementations for pointers to integers, pointers to unsigned integers, etc. However, a lot of current use-cases require inspecting specific fields of structs, so (assuming I understand your proposal correctly), we'd end up with many qsort implementations. If that can be made simple and elegant and demonstrates substantial improvements, then it might be worth considering, but I'm somewhat skeptical that the current uses are performance-sensitive enough to be worth the effort. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2024-02-09 13:19:49 +0500, Andrey M. Borodin wrote: > > On 8 Feb 2024, at 06:52, Nathan Bossart <nathandbossart@gmail.com> wrote: > > For the same compASC() test, I see an ~8.4% improvement with your int64 > > code and a ~3.4% improvement with this: > > If we care about branch prediction in comparison function, maybe we could > produce sorting that inlines comparator, thus eliminating function call to > comparator? We convert comparison logic to int, to extract comparison back > then. We have some infrastructure for that actually, see sort_template.h. But perhaps we should define a static inline of the generic pg_qsort() even. OTOH, plenty places that'll just end up to a pointless amount of code emitted to sort ~5 elements on average. > I bet “call" is more expensive than “if". Not really in this case. The call is perfectly predictable - a single qsort() will use the same callback for every comparison, whereas the if is perfectly *unpredictable*. A branch mispredict is far more expensive than a correctly predicted function call. Greetings, Andres
> On 9 Feb 2024, at 21:32, Nathan Bossart <nathandbossart@gmail.com> wrote: > a lot > of current use-cases require inspecting specific fields of structs Yes, I'm proposing to pass to sorting routine not a comparator, but value extractor. And then rely on operators <,>,==. In a pseudocode: instead of sort(array, (a,b)->a.field-b.field) use sort(array, x->x.field). And rely on "field" being comparable. > If that can be made simple and elegant and > demonstrates substantial improvements I'll try to produce a PoC and measure it with Andres' intarray test. > On 9 Feb 2024, at 23:40, Andres Freund <andres@anarazel.de> wrote: > > We have some infrastructure for that actually, see sort_template.h. But > perhaps we should define a static inline of the generic pg_qsort() even. OTOH, > plenty places that'll just end up to a pointless amount of code emitted to > sort ~5 elements on average. I think there might be another benefit. It's easier to think about values order than function comparator that returns -1,0,+1... >> I bet “call" is more expensive than “if". > > Not really in this case. The call is perfectly predictable - a single qsort() > will use the same callback for every comparison, whereas the if is perfectly > *unpredictable*. A branch mispredict is far more expensive than a correctly > predicted function call. Oh, make sense... I did not understand that. But does cpu predicts what instruction to fetch even after a call instruction?These cpus are really neat things... so, probably, yes, it does. Best regards, Andrey Borodin.
Hi, On 2024-02-10 00:02:08 +0500, Andrey Borodin wrote: > > Not really in this case. The call is perfectly predictable - a single qsort() > > will use the same callback for every comparison, whereas the if is perfectly > > *unpredictable*. A branch mispredict is far more expensive than a correctly > > predicted function call. > > Oh, make sense... I did not understand that. But does cpu predicts what > instruction to fetch even after a call instruction? Yes, it does predict that. Both for branches and calls (which are just special kinds of branches in the end). If you want to find more about this, the term to search for is "branch target buffer". There's also predictions about where a function return will jump to, since that obviously can differ. Modern predictors aren't just taking the instruction pointer into account, to predict where a jump/call will go to. Tey take the history of recent branches into account, i.e. the same instruction will be predicted to jump to different locations, depending on where the current function was called from. This is important as a function obviously can behave very differently depending on the input. > These cpus are really neat things... Indeed. Greetings, Andres Freund
On Fri, Feb 9, 2024 at 5:24 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Feb 09, 2024 at 08:52:26AM +0100, Mats Kindahl wrote:
> Here is a new version introducing pg_cmp_s32 and friends and use them
> instead of the INT_CMP macro introduced before. It also moves the
> definitions to common/int.h and adds that as an include to all locations
> using these functions.
Thanks for the new version of the patch.
> Note that for integers with sizes less than sizeof(int), C standard
> conversions will convert the values to "int" before doing the arithmetic,
> so no casting is *necessary*. I did not force the 16-bit functions to
> return -1 or 1 and have updated the comment accordingly.
It might not be necessary, but this is one of those places where I would
add casting anyway to make it abundantly clear what we are expecting to
happen and why it is safe.
I'll add it.
> The types "int" and "size_t" are treated as s32 and u32 respectively since
> that seems to be the case for most of the code, even if strictly not
> correct (size_t can be an unsigned long int for some architecture).
Why is it safe to do this?
> - return ((const SPLITCOST *) a)->cost - ((const SPLITCOST *) b)->cost;
> + return INT_CMP(((const SPLITCOST *) a)->cost, ((const SPLITCOST *) b)->cost);
The patch still contains several calls to INT_CMP.
I'll fix it.
> +/*------------------------------------------------------------------------
> + * Comparison routines for integers
> + *------------------------------------------------------------------------
> + */
I'd suggest separating this part out to a 0001 patch to make it easier to
review. The 0002 patch could take care of converting the existing qsort
comparators.
Ok. Will split it out into two patches.
> +static inline int
> +pg_cmp_s16(int16 a, int16 b)
> +{
> + return a - b;
> +}
> +
> +static inline int
> +pg_cmp_u16(uint16 a, uint16 b)
> +{
> + return a - b;
> +}
> +
> +static inline int
> +pg_cmp_s32(int32 a, int32 b)
> +{
> + return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_u32(uint32 a, uint32 b)
> +{
> + return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_s64(int64 a, int64 b)
> +{
> + return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_u64(uint64 a, uint64 b)
> +{
> + return (a > b) - (a < b);
> +}
As suggested above, IMHO we should be rather liberal with the casting to
ensure it is abundantly clear what is happening here.
Ok.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Fri, Feb 9, 2024 at 5:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Fri, Feb 09, 2024 at 08:52:26AM +0100, Mats Kindahl wrote:
>> The types "int" and "size_t" are treated as s32 and u32 respectively since
>> that seems to be the case for most of the code, even if strictly not
>> correct (size_t can be an unsigned long int for some architecture).
> Why is it safe to do this?
We do pretty much assume that "int" is "int32". But I agree that
assuming anything about the width of size_t is bad. I think we need
a separate pg_cmp_size() or pg_cmp_size_t().
I added precisely one first, but removed it when I saw that all uses assumed that it was an int. :)
I'll add it back.
Best wishes,
Mats Kindahl
Mats Kindahl
regards, tom lane
On Fri, Feb 9, 2024 at 5:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Fri, Feb 09, 2024 at 08:52:26AM +0100, Mats Kindahl wrote:
>> The types "int" and "size_t" are treated as s32 and u32 respectively since
>> that seems to be the case for most of the code, even if strictly not
>> correct (size_t can be an unsigned long int for some architecture).
> Why is it safe to do this?
We do pretty much assume that "int" is "int32". But I agree that
assuming anything about the width of size_t is bad. I think we need
a separate pg_cmp_size() or pg_cmp_size_t().
Do we want to have something similar for "int" as well? It seems to be quite common and even though it usually is an int32, it does not have to be.
Best wishes,
Mats Kindahl
Mats Kindahl
regards, tom lane
On Fri, Feb 9, 2024 at 8:26 PM Mats Kindahl <mats@timescale.com> wrote:
On Fri, Feb 9, 2024 at 5:24 PM Nathan Bossart <nathandbossart@gmail.com> wrote:On Fri, Feb 09, 2024 at 08:52:26AM +0100, Mats Kindahl wrote:
> Here is a new version introducing pg_cmp_s32 and friends and use them
> instead of the INT_CMP macro introduced before. It also moves the
> definitions to common/int.h and adds that as an include to all locations
> using these functions.
Thanks for the new version of the patch.
> Note that for integers with sizes less than sizeof(int), C standard
> conversions will convert the values to "int" before doing the arithmetic,
> so no casting is *necessary*. I did not force the 16-bit functions to
> return -1 or 1 and have updated the comment accordingly.
It might not be necessary, but this is one of those places where I would
add casting anyway to make it abundantly clear what we are expecting to
happen and why it is safe.I'll add it.> The types "int" and "size_t" are treated as s32 and u32 respectively since
> that seems to be the case for most of the code, even if strictly not
> correct (size_t can be an unsigned long int for some architecture).
Why is it safe to do this?
> - return ((const SPLITCOST *) a)->cost - ((const SPLITCOST *) b)->cost;
> + return INT_CMP(((const SPLITCOST *) a)->cost, ((const SPLITCOST *) b)->cost);
The patch still contains several calls to INT_CMP.I'll fix it.
> +/*------------------------------------------------------------------------
> + * Comparison routines for integers
> + *------------------------------------------------------------------------
> + */
I'd suggest separating this part out to a 0001 patch to make it easier to
review. The 0002 patch could take care of converting the existing qsort
comparators.Ok. Will split it out into two patches.
> +static inline int
> +pg_cmp_s16(int16 a, int16 b)
> +{
> + return a - b;
> +}
> +
> +static inline int
> +pg_cmp_u16(uint16 a, uint16 b)
> +{
> + return a - b;
> +}
> +
> +static inline int
> +pg_cmp_s32(int32 a, int32 b)
> +{
> + return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_u32(uint32 a, uint32 b)
> +{
> + return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_s64(int64 a, int64 b)
> +{
> + return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_u64(uint64 a, uint64 b)
> +{
> + return (a > b) - (a < b);
> +}
As suggested above, IMHO we should be rather liberal with the casting to
ensure it is abundantly clear what is happening here.Ok.
QQ: right now it looks like this:
static inline intpg_cmp_u16(uint16 a, uint16 b){
return (int32)a - (int32)b;
}
and
static inline intpg_cmp_u32(uint32 a, uint32 b){
return (a > b) - (a < b);
}
I think that is clear enough, but do you want more casts added for the return value as well?
Best wishes,
Mats Kindahl
Mats Kindahl
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Fri, Feb 09, 2024 at 08:40:47PM +0100, Mats Kindahl wrote: > On Fri, Feb 9, 2024 at 5:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> We do pretty much assume that "int" is "int32". But I agree that >> assuming anything about the width of size_t is bad. I think we need >> a separate pg_cmp_size() or pg_cmp_size_t(). > > Do we want to have something similar for "int" as well? It seems to be > quite common and even though it usually is an int32, it does not have to be. I don't think we need separate functions for int and int32. As Tom noted, we assume they are the same. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On 2024-02-09 14:04:29 -0600, Nathan Bossart wrote: > On Fri, Feb 09, 2024 at 08:40:47PM +0100, Mats Kindahl wrote: > > On Fri, Feb 9, 2024 at 5:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> We do pretty much assume that "int" is "int32". But I agree that > >> assuming anything about the width of size_t is bad. I think we need > >> a separate pg_cmp_size() or pg_cmp_size_t(). > > > > Do we want to have something similar for "int" as well? It seems to be > > quite common and even though it usually is an int32, it does not have to be. > > I don't think we need separate functions for int and int32. As Tom noted, > we assume they are the same. +1
On Fri, Feb 09, 2024 at 08:43:21PM +0100, Mats Kindahl wrote: > QQ: right now it looks like this: > > static inline int > pg_cmp_u16(uint16 a, uint16 b) > { > > return (int32)a - (int32)b; > > } > > > and > > static inline int > pg_cmp_u32(uint32 a, uint32 b) > { > > return (a > b) - (a < b); > > } > > > I think that is clear enough, but do you want more casts added for the > return value as well? I think that is reasonably clear. The latter does require you to know that < and > return (int) 0 or (int) 1, which might be worth a short comment. But that's just nitpicking... -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Fri, Feb 9, 2024 at 9:08 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Feb 09, 2024 at 08:43:21PM +0100, Mats Kindahl wrote:
> QQ: right now it looks like this:
>
> static inline int
> pg_cmp_u16(uint16 a, uint16 b)
> {
>
> return (int32)a - (int32)b;
>
> }
>
>
> and
>
> static inline int
> pg_cmp_u32(uint32 a, uint32 b)
> {
>
> return (a > b) - (a < b);
>
> }
>
>
> I think that is clear enough, but do you want more casts added for the
> return value as well?
I think that is reasonably clear. The latter does require you to know that
< and > return (int) 0 or (int) 1, which might be worth a short comment.
But that's just nitpicking...
Hi all,
Split the code into two patches: one that just adds the functions (including the new pg_cmp_size()) to common/int.h and one that starts using them. I picked the name "pg_cmp_size" rather than "pg_cmp_size_t" since "_t" is usually used as a suffix for types.
I added a comment to the (a > b) - (a < b) return and have also added casts to (int32) for the int16 and uint16 functions (we need a signed int for uin16 since we need to be able to get a negative number).
Changed the type of two instances that had an implicit cast from size_t to int and used the new pg_,cmp_size() function.
Also fixed the missed replacements in the "contrib" directory.
Best wishes,
Mats Kindahl
Mats Kindahl
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment
On Sat, Feb 10, 2024 at 08:59:06AM +0100, Mats Kindahl wrote: > Split the code into two patches: one that just adds the functions > (including the new pg_cmp_size()) to common/int.h and one that starts using > them. I picked the name "pg_cmp_size" rather than "pg_cmp_size_t" since > "_t" is usually used as a suffix for types. > > I added a comment to the (a > b) - (a < b) return and have also added casts > to (int32) for the int16 and uint16 functions (we need a signed int for > uin16 since we need to be able to get a negative number). > > Changed the type of two instances that had an implicit cast from size_t to > int and used the new pg_,cmp_size() function. > > Also fixed the missed replacements in the "contrib" directory. Thanks for the new patches. I think the comparison in resowner.c is backwards, and I think we should expand on some of the commentary in int.h. For example, the comment at the top of int.h seems very tailored to the existing functions and should probably be adjusted. And the "comparison routines for integers" comment might benefit from some additional details about the purpose and guarantees of the new functions. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Sat, Feb 10, 2024 at 9:53 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sat, Feb 10, 2024 at 08:59:06AM +0100, Mats Kindahl wrote:
> Split the code into two patches: one that just adds the functions
> (including the new pg_cmp_size()) to common/int.h and one that starts using
> them. I picked the name "pg_cmp_size" rather than "pg_cmp_size_t" since
> "_t" is usually used as a suffix for types.
>
> I added a comment to the (a > b) - (a < b) return and have also added casts
> to (int32) for the int16 and uint16 functions (we need a signed int for
> uin16 since we need to be able to get a negative number).
>
> Changed the type of two instances that had an implicit cast from size_t to
> int and used the new pg_,cmp_size() function.
>
> Also fixed the missed replacements in the "contrib" directory.
Thanks for the new patches. I think the comparison in resowner.c is
backwards,
Thanks for catching that.
and I think we should expand on some of the commentary in int.h.
For example, the comment at the top of int.h seems very tailored to the
existing functions and should probably be adjusted.
I rewrote the beginning to the following, does that look good?
* int.h* Routines to perform signed and unsigned integer arithmetics, including* comparisons, in an overflow-safe way.
And the "comparison
routines for integers" comment might benefit from some additional details
about the purpose and guarantees of the new functions.
I expanded that into the following. WDYT?
/*------------------------------------------------------------------------* Comparison routines for integers.** These routines are used to implement comparison functions for, e.g.,* qsort(). They are designed to be efficient and not risk overflows in* internal computations that could cause strange results, such as INT_MIN >* INT_MAX if you just return "lhs - rhs".*------------------------------------------------------------------------
On Sun, Feb 11, 2024 at 03:44:42PM +0100, Mats Kindahl wrote: > On Sat, Feb 10, 2024 at 9:53 PM Nathan Bossart <nathandbossart@gmail.com> > wrote: >> and I think we should expand on some of the commentary in int.h. >> For example, the comment at the top of int.h seems very tailored to the >> existing functions and should probably be adjusted. > > > I rewrote the beginning to the following, does that look good? > > * int.h > * Routines to perform signed and unsigned integer arithmetics, including > * comparisons, in an overflow-safe way. > > > >> And the "comparison >> routines for integers" comment might benefit from some additional details >> about the purpose and guarantees of the new functions. >> > > I expanded that into the following. WDYT? > > /*------------------------------------------------------------------------ > * Comparison routines for integers. > * > * These routines are used to implement comparison functions for, e.g., > * qsort(). They are designed to be efficient and not risk overflows in > * internal computations that could cause strange results, such as INT_MIN > > * INT_MAX if you just return "lhs - rhs". > *------------------------------------------------------------------------ LGTM. I might editorialize a bit before committing, but I think your proposed wording illustrates the thrust of the change. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Mon, Feb 12, 2024 at 4:57 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sun, Feb 11, 2024 at 03:44:42PM +0100, Mats Kindahl wrote:
> On Sat, Feb 10, 2024 at 9:53 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> and I think we should expand on some of the commentary in int.h.
>> For example, the comment at the top of int.h seems very tailored to the
>> existing functions and should probably be adjusted.
>
>
> I rewrote the beginning to the following, does that look good?
>
> * int.h
> * Routines to perform signed and unsigned integer arithmetics, including
> * comparisons, in an overflow-safe way.
>
>
>
>> And the "comparison
>> routines for integers" comment might benefit from some additional details
>> about the purpose and guarantees of the new functions.
>>
>
> I expanded that into the following. WDYT?
>
> /*------------------------------------------------------------------------
> * Comparison routines for integers.
> *
> * These routines are used to implement comparison functions for, e.g.,
> * qsort(). They are designed to be efficient and not risk overflows in
> * internal computations that could cause strange results, such as INT_MIN >
> * INT_MAX if you just return "lhs - rhs".
> *------------------------------------------------------------------------
LGTM. I might editorialize a bit before committing, but I think your
proposed wording illustrates the thrust of the change.
Thanks Nathan,
Here are the two fixed patches.
Best wishes,
Mats Kindahl
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, Feb 12, 2024 at 06:09:06PM +0100, Mats Kindahl wrote: > Here are the two fixed patches. These patches look pretty good to me. Barring additional feedback, I'll plan on committing them in the next few days. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Mon, Feb 12, 2024 at 5:51 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Feb 12, 2024 at 06:09:06PM +0100, Mats Kindahl wrote:
> > Here are the two fixed patches.
>
> These patches look pretty good to me. Barring additional feedback, I'll
> plan on committing them in the next few days.
>
Also did some tests locally and everything went well. Patches apply to the main branch without issues and all regression (including checkworld) pass!!
Regards
--
Fabrízio de Royes Mello
Hi, On 2024-02-12 14:51:38 -0600, Nathan Bossart wrote: > On Mon, Feb 12, 2024 at 06:09:06PM +0100, Mats Kindahl wrote: > > Here are the two fixed patches. > > These patches look pretty good to me. Barring additional feedback, I'll > plan on committing them in the next few days. One thing that's worth checking is if this ends up with *worse* code when the comparators are inlined. I think none of the changed comparators will end up getting used with an inlined sort, but ... The reason we could end up with worse code is that when inlining the comparisons would make less sense for the compiler. Consider e.g. return DO_COMPARE(a, b) < 0 ? (DO_COMPARE(b, c) < 0 ? b : (DO_COMPARE(a, c) < 0 ? c : a)) : (DO_COMPARE(b, c) > 0 ? b : (DO_COMPARE(a, c) < 0 ? a : c)); With a naive implementation the compiler will understand it only cares about a < b, not about the other possibilities. I'm not sure that's still true with the more complicated optimized version. Greetings, Andres Freund
On Mon, Feb 12, 2024 at 01:31:30PM -0800, Andres Freund wrote: > One thing that's worth checking is if this ends up with *worse* code when the > comparators are inlined. I think none of the changed comparators will end up > getting used with an inlined sort, but ... Yeah, AFAICT the only inlined sorts are in tuplesort.c and bufmgr.c, and the patches don't touch those files. > The reason we could end up with worse code is that when inlining the > comparisons would make less sense for the compiler. Consider e.g. > return DO_COMPARE(a, b) < 0 ? > (DO_COMPARE(b, c) < 0 ? b : (DO_COMPARE(a, c) < 0 ? c : a)) > : (DO_COMPARE(b, c) > 0 ? b : (DO_COMPARE(a, c) < 0 ? a : c)); > > With a naive implementation the compiler will understand it only cares about > a < b, not about the other possibilities. I'm not sure that's still true with > the more complicated optimized version. You aren't kidding [0]. Besides perhaps adding a comment in sort_template.h, is there anything else you think we should do about this now? [0] https://godbolt.org/z/bbTqK54zK -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2024-02-12 17:04:23 -0600, Nathan Bossart wrote: > On Mon, Feb 12, 2024 at 01:31:30PM -0800, Andres Freund wrote: > > One thing that's worth checking is if this ends up with *worse* code when the > > comparators are inlined. I think none of the changed comparators will end up > > getting used with an inlined sort, but ... > > Yeah, AFAICT the only inlined sorts are in tuplesort.c and bufmgr.c, and > the patches don't touch those files. > > > The reason we could end up with worse code is that when inlining the > > comparisons would make less sense for the compiler. Consider e.g. > > return DO_COMPARE(a, b) < 0 ? > > (DO_COMPARE(b, c) < 0 ? b : (DO_COMPARE(a, c) < 0 ? c : a)) > > : (DO_COMPARE(b, c) > 0 ? b : (DO_COMPARE(a, c) < 0 ? a : c)); > > > > With a naive implementation the compiler will understand it only cares about > > a < b, not about the other possibilities. I'm not sure that's still true with > > the more complicated optimized version. > > You aren't kidding [0]. Besides perhaps adding a comment in > sort_template.h, is there anything else you think we should do about this > now? I'd add also a comment to the new functions. I think it's fine otherwise. I wish there were formulation that'd be optimal for both cases, but this way we can at least adapt all places at once if either find a better formulation or change all our sorts to happen via an inline implementation of qsort or such. Greetings, Andres
On Tue, Feb 13, 2024 at 12:41 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2024-02-12 17:04:23 -0600, Nathan Bossart wrote:
> On Mon, Feb 12, 2024 at 01:31:30PM -0800, Andres Freund wrote:
> > One thing that's worth checking is if this ends up with *worse* code when the
> > comparators are inlined. I think none of the changed comparators will end up
> > getting used with an inlined sort, but ...
>
> Yeah, AFAICT the only inlined sorts are in tuplesort.c and bufmgr.c, and
> the patches don't touch those files.
>
> > The reason we could end up with worse code is that when inlining the
> > comparisons would make less sense for the compiler. Consider e.g.
> > return DO_COMPARE(a, b) < 0 ?
> > (DO_COMPARE(b, c) < 0 ? b : (DO_COMPARE(a, c) < 0 ? c : a))
> > : (DO_COMPARE(b, c) > 0 ? b : (DO_COMPARE(a, c) < 0 ? a : c));
> >
> > With a naive implementation the compiler will understand it only cares about
> > a < b, not about the other possibilities. I'm not sure that's still true with
> > the more complicated optimized version.
>
> You aren't kidding [0]. Besides perhaps adding a comment in
> sort_template.h, is there anything else you think we should do about this
> now?
I'd add also a comment to the new functions. I think it's fine otherwise. I
wish there were formulation that'd be optimal for both cases, but this way we
can at least adapt all places at once if either find a better formulation or
change all our sorts to happen via an inline implementation of qsort or such.
I suspect that this has to do with the fact that we "abuse" the type system by performing arithmetics on booleans converted to integers and the compilers do not have rules for dealing with these.
For example, with the inline function "static inline cmp(a,b) { return a < b ? -1 : a > b ? 1 : 0; }"
Which trivially can be rewritten by the compiler using very basic rules, as follows:
DO_COMPARE(a,b)
(a < b ? -1 : a > b ? 1 : 0) < 0
a < b ? (-1 < 0) : a > b ? (1 < 0) : (0 < 0)
a < b ? true : a > b ? false : false
a < b ? true : a > b ? false : false
a < b ? true : false
a < b
When it comes to (a < b) - (a > b) there are no such rules added since it is not a very common case.
Clang fares better for this case and at least shows the two alternatives as equal.
Maybe we should change to use the original version equivalent to the inline function above since that works better with surrounding code?
Best wishes,
Mats Kindahl
Mats Kindahl
Greetings,
Andres
On Tue, Feb 13, 2024 at 09:43:18AM +0100, Mats Kindahl wrote: > Maybe we should change to use the original version equivalent to the inline > function above since that works better with surrounding code? I don't think that's necessary. We just need to be cognizant of it when using inlined sorts, which are pretty rare at the moment. Your patches should still be a net improvement in many cases because most qsorts use a function pointer to the comparator. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Here is what I have staged for commit. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Fri, Feb 16, 2024 at 12:32 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
Here is what I have staged for commit.
Looks good to me.
Checked that all of the comparisons are in the expected order, except inside compDESC, cmp_lsn, and resource_priority_cmp, where the order is reversed.
Best wishes,
Mats Kindahl
Mats Kindahl
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Fri, Feb 16, 2024 at 01:45:52PM +0100, Mats Kindahl wrote: > Looks good to me. Committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Fri, Feb 16, 2024 at 9:09 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Feb 16, 2024 at 01:45:52PM +0100, Mats Kindahl wrote:
> Looks good to me.
Committed.
Thanks Nathan!
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com