Thread: glibc qsort() vulnerability

glibc qsort() vulnerability

From
Mats Kindahl
Date:
Hi hackers,

There is a bug in glibc's qsort() algorithm that runs the risk of creating an out-of-bounds error if the comparison function is not transitive, for example, if subtraction is used so that it can create an overflow.


I checked the existing functions in the latest version of Postgres source code and most are safe, but there were a few ones that could lead to overflow. I do not know if these can actually lead to problems, but better safe than sorry, so I created a patch to fix those few cases and add a comment to one case that was not clear that it could not overflow.

Best wishes,
Mats Kindahl, Timescale
Attachment

Re: glibc qsort() vulnerability

From
Tom Lane
Date:
Mats Kindahl <mats@timescale.com> writes:
> There is a bug in glibc's qsort() algorithm that runs the risk of creating
> an out-of-bounds error if the comparison function is not transitive, for
> example, if subtraction is used so that it can create an overflow.

We don't use glibc's qsort.  Have you checked whether there's a
problem with the code we do use?

            regards, tom lane



Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Tue, Feb 06, 2024 at 10:11:16AM -0500, Tom Lane wrote:
> Mats Kindahl <mats@timescale.com> writes:
>> There is a bug in glibc's qsort() algorithm that runs the risk of creating
>> an out-of-bounds error if the comparison function is not transitive, for
>> example, if subtraction is used so that it can create an overflow.
> 
> We don't use glibc's qsort.  Have you checked whether there's a
> problem with the code we do use?

Oh, interesting.  I didn't know that!  As of commit 6edd2b4, we've used an
in-tree qsort() for everything.  port.h has the following line:

    #define qsort(a,b,c,d) pg_qsort(a,b,c,d)

I see a handful of callers that use pg_qsort() directly, which IMO is odd.
I would've expected that to be something different if I didn't know about
that macro.  Maybe we should change those to qsort()...

Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest
that we make it project policy that comparison functions must be
transitive.  There might be no real issues today, but if we write all
comparison functions the way Mats is suggesting, it should be easier to
reason about overflow risks.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest
> that we make it project policy that comparison functions must be
> transitive.  There might be no real issues today, but if we write all
> comparison functions the way Mats is suggesting, it should be easier to
> reason about overflow risks.

A comparison routine that is not is probably broken, agreed.
I didn't look through the details of the patch --- I was more
curious whether we had a version of the qsort bug, because
if we do, we should fix that too.

            regards, tom lane



Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Tue, Feb 06, 2024 at 03:55:58PM -0500, Tom Lane wrote:
> A comparison routine that is not is probably broken, agreed.
> I didn't look through the details of the patch --- I was more
> curious whether we had a version of the qsort bug, because
> if we do, we should fix that too.

+1

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Tue, Feb 6, 2024 at 4:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mats Kindahl <mats@timescale.com> writes:
> There is a bug in glibc's qsort() algorithm that runs the risk of creating
> an out-of-bounds error if the comparison function is not transitive, for
> example, if subtraction is used so that it can create an overflow.

We don't use glibc's qsort.  Have you checked whether there's a
problem with the code we do use?

Interesting. No, haven't checked. Will do that.

Best wishes,
Mats Kindahl 

                        regards, tom lane

Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:


On Tue, Feb 6, 2024 at 9:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
> Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest
> that we make it project policy that comparison functions must be
> transitive.  There might be no real issues today, but if we write all
> comparison functions the way Mats is suggesting, it should be easier to
> reason about overflow risks.

A comparison routine that is not is probably broken, agreed.
I didn't look through the details of the patch --- I was more
curious whether we had a version of the qsort bug, because
if we do, we should fix that too.

The patch basically removes the risk of overflow in three routines and just returns -1, 0, or 1, and adds a comment in one.

The routines modified do a subtraction of int:s and return that, which can cause an overflow. This method is used for some int16 as well but since standard conversions in C will perform the arithmetics in "int" precision, this cannot overflow, so added a comment there. It might still be a good idea to follow the same pattern for the int16 routines, but since there is no bug there, I did not add them to the patch.

Best wishes,
Mats Kindahl

 

                        regards, tom lane

Re: glibc qsort() vulnerability

From
Heikki Linnakangas
Date:
On 07/02/2024 11:09, Mats Kindahl wrote:
> On Tue, Feb 6, 2024 at 9:56 PM Tom Lane <tgl@sss.pgh.pa.us 
> <mailto:tgl@sss.pgh.pa.us>> wrote:
> 
>   Nathan Bossart <nathandbossart@gmail.com
>     <mailto:nathandbossart@gmail.com>> writes:
>      > Even if the glibc issue doesn't apply to Postgres, I'm tempted to
>     suggest
>      > that we make it project policy that comparison functions must be
>      > transitive.  There might be no real issues today, but if we write all
>      > comparison functions the way Mats is suggesting, it should be
>     easier to
>      > reason about overflow risks.
> 
>     A comparison routine that is not is probably broken, agreed.
>     I didn't look through the details of the patch --- I was more
>     curious whether we had a version of the qsort bug, because
>     if we do, we should fix that too.
> 
> The patch basically removes the risk of overflow in three routines and 
> just returns -1, 0, or 1, and adds a comment in one.
> 
> The routines modified do a subtraction of int:s and return that, which 
> can cause an overflow. This method is used for some int16 as well but 
> since standard conversions in C will perform the arithmetics in "int" 
> precision, this cannot overflow, so added a comment there. It might 
> still be a good idea to follow the same pattern for the int16 routines, 
> but since there is no bug there, I did not add them to the patch.

Doesn't hurt to fix the comparison functions, and +1 on using the same 
pattern everywhere.

However, we use our qsort() with user-defined comparison functions, and 
we cannot make any guarantees about what they might do. So we must 
ensure that our qsort() doesn't overflow, no matter what the comparison 
function does.

Looking at our ST_SORT(), it seems safe to me.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Wed, Feb 07, 2024 at 08:46:56PM +0200, Heikki Linnakangas wrote:
> Doesn't hurt to fix the comparison functions, and +1 on using the same
> pattern everywhere.

I attached a new version of the patch with some small adjustments.  I
haven't looked through all in-tree qsort() comparators to see if any others
need to be adjusted, but we should definitely do so as part of this thread.
Mats, are you able to do this?

> However, we use our qsort() with user-defined comparison functions, and we
> cannot make any guarantees about what they might do. So we must ensure that
> our qsort() doesn't overflow, no matter what the comparison function does.
> 
> Looking at our ST_SORT(), it seems safe to me.

Cool.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: glibc qsort() vulnerability

From
Andres Freund
Date:
Hi,

On 2024-02-07 20:46:56 +0200, Heikki Linnakangas wrote:
> > The routines modified do a subtraction of int:s and return that, which
> > can cause an overflow. This method is used for some int16 as well but
> > since standard conversions in C will perform the arithmetics in "int"
> > precision, this cannot overflow, so added a comment there. It might
> > still be a good idea to follow the same pattern for the int16 routines,
> > but since there is no bug there, I did not add them to the patch.
>
> Doesn't hurt to fix the comparison functions, and +1 on using the same
> pattern everywhere.

It actually can hurt - the generated code will often be slower.

E.g.
#include <stdint.h>

int cmp_sub(int16_t a, int16_t b) {
    return (int32_t) a - (int32_t) b;
}

int cmp_if(int16_t a, int16_t b) {
    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}

yields

cmp_sub:
        movsx   eax, di
        movsx   esi, si
        sub     eax, esi
        ret
cmp_if:
        xor     eax, eax
        cmp     di, si
        mov     edx, -1
        setg    al
        cmovl   eax, edx
        ret

with gcc -O3.  With other compilers, e.g. msvc, the difference is considerably
bigger, due to msvc for some reason not using cmov.

See https://godbolt.org/z/34qerPaPE for a few more details.


Now, in most cases this won't matter, the sorting isn't performance
critical. But I don't think it's a good idea to standardize on a generally
slower pattern.

Not that that's a good test, but I did quickly benchmark [1] this with
intarray. There's about a 10% difference in performance between using the
existing compASC() and one using
    return (int64) *(const int32 *) a - (int64) *(const int32 *) b;


Perhaps we could have a central helper for this somewhere?

Greetings,

Andres Freund


[1]
-- prep
CREATE EXTENSION IF NOT EXISTS intarray;
DROP TABLE IF EXISTS arrays_to_sort;
CREATE TABLE arrays_to_sort AS SELECT array_shuffle(a) arr FROM (SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
generate_series(1,10);
 
-- bench
SELECT (sort(arr))[1] FROM arrays_to_sort;



Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Wed, Feb 07, 2024 at 01:48:57PM -0800, Andres Freund wrote:
> Now, in most cases this won't matter, the sorting isn't performance
> critical. But I don't think it's a good idea to standardize on a generally
> slower pattern.
> 
> Not that that's a good test, but I did quickly benchmark [1] this with
> intarray. There's about a 10% difference in performance between using the
> existing compASC() and one using
>     return (int64) *(const int32 *) a - (int64) *(const int32 *) b;
> 
> 
> Perhaps we could have a central helper for this somewhere?

Maybe said helper could use __builtin_sub_overflow() and fall back to the
slow "if" version only if absolutely necessary.  The assembly for that
looks encouraging, but I still need to actually test it...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Andres Freund
Date:
Hi,

On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote:
> On Wed, Feb 07, 2024 at 01:48:57PM -0800, Andres Freund wrote:
> > Now, in most cases this won't matter, the sorting isn't performance
> > critical. But I don't think it's a good idea to standardize on a generally
> > slower pattern.
> > 
> > Not that that's a good test, but I did quickly benchmark [1] this with
> > intarray. There's about a 10% difference in performance between using the
> > existing compASC() and one using
> >     return (int64) *(const int32 *) a - (int64) *(const int32 *) b;
> > 
> > 
> > Perhaps we could have a central helper for this somewhere?
> 
> Maybe said helper could use __builtin_sub_overflow() and fall back to the
> slow "if" version only if absolutely necessary.

I suspect that'll be worse code in the common case, given the cmov generated
by gcc & clang for the typical branch-y formulation. But it's worth testing.


> The assembly for that looks encouraging, but I still need to actually test
> it...

Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit
that doesn't work, given the 32bit return, so we need something more.

Greetings,

Andres Freund



Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Wed, Feb 07, 2024 at 04:42:07PM -0800, Andres Freund wrote:
> On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote:
>> The assembly for that looks encouraging, but I still need to actually test
>> it...
> 
> Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit
> that doesn't work, given the 32bit return, so we need something more.

For the same compASC() test, I see an ~8.4% improvement with your int64
code and a ~3.4% improvement with this:

    int
    compASC(const void *a, const void *b)
    {
        int         result;

        if (unlikely(pg_sub_s32_overflow(*(const int32 *) a,
                                         *(const int32 *) b,
                                         &result)))
        {
            if (*(const int32 *) a > *(const int32 *) b)
                return 1;
            if (*(const int32 *) a < *(const int32 *) b)
                return -1;
            return 0;
        }

        return result;
    }

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Andres Freund
Date:
Hi,

On 2024-02-07 19:52:11 -0600, Nathan Bossart wrote:
> On Wed, Feb 07, 2024 at 04:42:07PM -0800, Andres Freund wrote:
> > On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote:
> >> The assembly for that looks encouraging, but I still need to actually test
> >> it...
> > 
> > Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit
> > that doesn't work, given the 32bit return, so we need something more.
> 
> For the same compASC() test, I see an ~8.4% improvement with your int64
> code

Just to be clear, that code unfortuntely isn't correct, the return value is a
32 bit integer, so the 64bit difference doesn't help. In contrast to the 16bit
case.


> and a ~3.4% improvement with this:

I guess that's still something.

Another branchless variant is (a > b) - (a < b). It seems to get a similar
improvement as the overflow-checking version.

Greetings,

Andres Freund



Re: glibc qsort() vulnerability

From
Thomas Munro
Date:
On Thu, Feb 8, 2024 at 3:06 PM Andres Freund <andres@anarazel.de> wrote:
> On 2024-02-07 19:52:11 -0600, Nathan Bossart wrote:
> > On Wed, Feb 07, 2024 at 04:42:07PM -0800, Andres Freund wrote:
> > > On 2024-02-07 16:21:24 -0600, Nathan Bossart wrote:
> > >> The assembly for that looks encouraging, but I still need to actually test
> > >> it...
> > >
> > > Possible. For 16bit upcasting to 32bit is clearly the best way. For 32 bit
> > > that doesn't work, given the 32bit return, so we need something more.
> >
> > For the same compASC() test, I see an ~8.4% improvement with your int64
> > code
>
> Just to be clear, that code unfortuntely isn't correct, the return value is a
> 32 bit integer, so the 64bit difference doesn't help. In contrast to the 16bit
> case.

Perhaps you could wrap it in a branch-free sign() function so you get
a narrow answer?

https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c



Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Wed, Feb 07, 2024 at 06:06:37PM -0800, Andres Freund wrote:
> Another branchless variant is (a > b) - (a < b). It seems to get a similar
> improvement as the overflow-checking version.

Well, that's certainly more elegant.  I updated the patch to that approach
for now.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: glibc qsort() vulnerability

From
Thomas Munro
Date:
On Thu, Feb 8, 2024 at 3:38 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Perhaps you could wrap it in a branch-free sign() function so you get
> a narrow answer?
>
> https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c

Ah, strike that, it is much like the suggested (a > b) - (a < b) but
with extra steps...



Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Thu, Feb 08, 2024 at 03:49:03PM +1300, Thomas Munro wrote:
> On Thu, Feb 8, 2024 at 3:38 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Perhaps you could wrap it in a branch-free sign() function so you get
>> a narrow answer?
>>
>> https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c
> 
> Ah, strike that, it is much like the suggested (a > b) - (a < b) but
> with extra steps...

Yeah, https://godbolt.org/ indicates that the sign approach compiles to

        movsx   rsi, esi
        movsx   rdi, edi
        xor     eax, eax
        sub     rdi, rsi
        test    rdi, rdi
        setg    al
        shr     rdi, 63
        sub     eax, edi
        ret

while the approach Andres suggested compiles to

        xor     eax, eax
        cmp     edi, esi
        setl    dl
        setg    al
        movzx   edx, dl
        sub     eax, edx
        ret

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
Mats, I apologize for steamrolling a bit here.  I'll take a step back into
a reviewer role.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Wed, Feb 7, 2024 at 9:56 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, Feb 07, 2024 at 08:46:56PM +0200, Heikki Linnakangas wrote:
> Doesn't hurt to fix the comparison functions, and +1 on using the same
> pattern everywhere.

I attached a new version of the patch with some small adjustments.  I
haven't looked through all in-tree qsort() comparators to see if any others
need to be adjusted, but we should definitely do so as part of this thread.
Mats, are you able to do this?

Sure, I checked them and the only ones remaining are those using int16. Shall I modify those as well?
 
> However, we use our qsort() with user-defined comparison functions, and we
> cannot make any guarantees about what they might do. So we must ensure that
> our qsort() doesn't overflow, no matter what the comparison function does.
>
> Looking at our ST_SORT(), it seems safe to me.

Cool.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Thu, Feb 8, 2024 at 4:08 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
Mats, I apologize for steamrolling a bit here.  I'll take a step back into
a reviewer role.

No worries. :)
 

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Thu, Feb 8, 2024 at 12:01 PM Mats Kindahl <mats@timescale.com> wrote:
On Wed, Feb 7, 2024 at 9:56 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, Feb 07, 2024 at 08:46:56PM +0200, Heikki Linnakangas wrote:
> Doesn't hurt to fix the comparison functions, and +1 on using the same
> pattern everywhere.

I attached a new version of the patch with some small adjustments.  I
haven't looked through all in-tree qsort() comparators to see if any others
need to be adjusted, but we should definitely do so as part of this thread.
Mats, are you able to do this?

Sure, I checked them and the only ones remaining are those using int16. Shall I modify those as well?

Seems your additional patch dealt with at least one of the cases.
 
 
> However, we use our qsort() with user-defined comparison functions, and we
> cannot make any guarantees about what they might do. So we must ensure that
> our qsort() doesn't overflow, no matter what the comparison function does.
>
> Looking at our ST_SORT(), it seems safe to me.

Cool.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Thu, Feb 8, 2024 at 3:56 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Feb 08, 2024 at 03:49:03PM +1300, Thomas Munro wrote:
> On Thu, Feb 8, 2024 at 3:38 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Perhaps you could wrap it in a branch-free sign() function so you get
>> a narrow answer?
>>
>> https://stackoverflow.com/questions/14579920/fast-sign-of-integer-in-c
>
> Ah, strike that, it is much like the suggested (a > b) - (a < b) but
> with extra steps...

Yeah, https://godbolt.org/ indicates that the sign approach compiles to

        movsx   rsi, esi
        movsx   rdi, edi
        xor     eax, eax
        sub     rdi, rsi
        test    rdi, rdi
        setg    al
        shr     rdi, 63
        sub     eax, edi
        ret

while the approach Andres suggested compiles to

        xor     eax, eax
        cmp     edi, esi
        setl    dl
        setg    al
        movzx   edx, dl
        sub     eax, edx
        ret

Here is a patch that fixes existing cases and introduces a macro for this comparison (it uses the (a > b) - (a < b) approach). Not sure where to place the macro nor what a suitable name should be, so feel free to suggest anything.

I also noted that some functions are duplicated and it might be an idea to introduce a few standard functions like pg_qsort_strcmp for, e.g., integers and other common types.

Also noted it is quite common to have this pattern in various places to do lexicographic sort of multiple values and continue the comparison if they are equal. Not sure if that is something we should look at.

Best wishes,
Mats Kindahl

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment

Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Thu, Feb 08, 2024 at 02:16:11PM +0100, Mats Kindahl wrote:
> +/*
> + * Compare two integers and return -1, 0, or 1 without risking overflow.
> + *
> + * This macro is used to avoid running into overflow issues because a simple
> + * subtraction of the two values when implementing a cmp function for qsort().
> +*/
> +#define INT_CMP(lhs,rhs) (((lhs) > (rhs)) - ((lhs) < (rhs)))

I think we should offer a few different macros, i.e., separate macros for
int8, uint8, int16, uint16, int32, etc.  For int16, we can do something
faster like

    (int32) (lhs) - (int32) (rhs)

but for int32, we need to do someting more like what's in the patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Thu, Feb 08, 2024 at 02:16:11PM +0100, Mats Kindahl wrote:
>> +/*
>> + * Compare two integers and return -1, 0, or 1 without risking overflow.
>> + *
>> + * This macro is used to avoid running into overflow issues because a simple
>> + * subtraction of the two values when implementing a cmp function for qsort().
>> +*/
>> +#define INT_CMP(lhs,rhs) (((lhs) > (rhs)) - ((lhs) < (rhs)))

> I think we should offer a few different macros, i.e., separate macros for
> int8, uint8, int16, uint16, int32, etc.  For int16, we can do something
> faster like

>     (int32) (lhs) - (int32) (rhs)

> but for int32, we need to do someting more like what's in the patch.

Are we okay with using macros that (a) have double evaluation hazards
and (b) don't enforce the data types being compared are the same?
I think static inlines might be a safer technology.  Perhaps it's
okay given the only expected use is in qsort comparators, but ...

            regards, tom lane



Re: glibc qsort() vulnerability

From
Andres Freund
Date:
Hi,

On 2024-02-08 13:44:02 -0500, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
> > On Thu, Feb 08, 2024 at 02:16:11PM +0100, Mats Kindahl wrote:
> >> +/*
> >> + * Compare two integers and return -1, 0, or 1 without risking overflow.
> >> + *
> >> + * This macro is used to avoid running into overflow issues because a simple
> >> + * subtraction of the two values when implementing a cmp function for qsort().
> >> +*/
> >> +#define INT_CMP(lhs,rhs) (((lhs) > (rhs)) - ((lhs) < (rhs)))
>
> > I think we should offer a few different macros, i.e., separate macros for
> > int8, uint8, int16, uint16, int32, etc.  For int16, we can do something
> > faster like

+1


> >     (int32) (lhs) - (int32) (rhs)
>
> > but for int32, we need to do someting more like what's in the patch.
>
> Are we okay with using macros that (a) have double evaluation hazards
> and (b) don't enforce the data types being compared are the same?
> I think static inlines might be a safer technology.

+1


I'd put these static inlines into common/int.h. I don't think this is common
enough to warrant being in c.h. Probably also doesn't hurt to have a not quite
as generic name as INT_CMP, I'd not be too surprised if that's defined in some
library.


I think it's worth following int.h's pattern of including [s]igned/[u]nsigned
in the name, an efficient implementation for signed might not be the same as
for unsigned. And if we use static inlines, we need to do so for correct
semantics anyway.


Greetings,

Andres



Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Thu, Feb 08, 2024 at 11:59:54AM -0800, Andres Freund wrote:
> On 2024-02-08 13:44:02 -0500, Tom Lane wrote:
>> Are we okay with using macros that (a) have double evaluation hazards
>> and (b) don't enforce the data types being compared are the same?
>> I think static inlines might be a safer technology.
> 
> +1

Agreed on static inlines.

> I'd put these static inlines into common/int.h. I don't think this is common
> enough to warrant being in c.h. Probably also doesn't hurt to have a not quite
> as generic name as INT_CMP, I'd not be too surprised if that's defined in some
> library.
> 
> 
> I think it's worth following int.h's pattern of including [s]igned/[u]nsigned
> in the name, an efficient implementation for signed might not be the same as
> for unsigned. And if we use static inlines, we need to do so for correct
> semantics anyway.

Seems reasonable to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:


On Thu, Feb 8, 2024 at 7:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Thu, Feb 08, 2024 at 02:16:11PM +0100, Mats Kindahl wrote:
>> +/*
>> + * Compare two integers and return -1, 0, or 1 without risking overflow.
>> + *
>> + * This macro is used to avoid running into overflow issues because a simple
>> + * subtraction of the two values when implementing a cmp function for qsort().
>> +*/
>> +#define INT_CMP(lhs,rhs) (((lhs) > (rhs)) - ((lhs) < (rhs)))

> I think we should offer a few different macros, i.e., separate macros for
> int8, uint8, int16, uint16, int32, etc.  For int16, we can do something
> faster like

>       (int32) (lhs) - (int32) (rhs)

> but for int32, we need to do someting more like what's in the patch.

Are we okay with using macros that (a) have double evaluation hazards
and (b) don't enforce the data types being compared are the same?
I think static inlines might be a safer technology.  Perhaps it's
okay given the only expected use is in qsort comparators, but ...

I picked a macro simply because it can deal with all kinds of integers, but if we want to have separate implementations for each then inline functions work just as well and will be safer.

 Best wishes,
Mats Kindahl


                        regards, tom lane

Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Thu, Feb 8, 2024 at 9:07 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Feb 08, 2024 at 11:59:54AM -0800, Andres Freund wrote:
> On 2024-02-08 13:44:02 -0500, Tom Lane wrote:
>> Are we okay with using macros that (a) have double evaluation hazards
>> and (b) don't enforce the data types being compared are the same?
>> I think static inlines might be a safer technology.
>
> +1

Agreed on static inlines.

Seems to be a general consensus on static inlines. I'll get a new patch.
 
> I'd put these static inlines into common/int.h. I don't think this is common
> enough to warrant being in c.h. Probably also doesn't hurt to have a not quite
> as generic name as INT_CMP, I'd not be too surprised if that's defined in some
> library.
>
>
> I think it's worth following int.h's pattern of including [s]igned/[u]nsigned
> in the name, an efficient implementation for signed might not be the same as
> for unsigned. And if we use static inlines, we need to do so for correct
> semantics anyway.

Seems reasonable to me.

Agree.

Best wishes,
Mats Kindahl
 

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: glibc qsort() vulnerability

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Thu, Feb 08, 2024 at 11:59:54AM -0800, Andres Freund wrote:
>> I'd put these static inlines into common/int.h. I don't think this is common
>> enough to warrant being in c.h. Probably also doesn't hurt to have a not quite
>> as generic name as INT_CMP, I'd not be too surprised if that's defined in some
>> library.
>>
>> I think it's worth following int.h's pattern of including [s]igned/[u]nsigned
>> in the name, an efficient implementation for signed might not be the same as
>> for unsigned. And if we use static inlines, we need to do so for correct
>> semantics anyway.

> Seems reasonable to me.

+1 here also.

            regards, tom lane



Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Thu, Feb 8, 2024 at 9:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Thu, Feb 08, 2024 at 11:59:54AM -0800, Andres Freund wrote:
>> I'd put these static inlines into common/int.h. I don't think this is common
>> enough to warrant being in c.h. Probably also doesn't hurt to have a not quite
>> as generic name as INT_CMP, I'd not be too surprised if that's defined in some
>> library.
>>
>> I think it's worth following int.h's pattern of including [s]igned/[u]nsigned
>> in the name, an efficient implementation for signed might not be the same as
>> for unsigned. And if we use static inlines, we need to do so for correct
>> semantics anyway.

> Seems reasonable to me.

+1 here also.

Here is a new version introducing pg_cmp_s32 and friends and use them instead of the INT_CMP macro introduced before. It also moves the definitions to common/int.h and adds that as an include to all locations using these functions.

Note that for integers with sizes less than sizeof(int), C standard conversions will convert the values to "int" before doing the arithmetic, so no casting is *necessary*. I did not force the 16-bit functions to return -1 or 1 and have updated the comment accordingly.

The types "int" and "size_t" are treated as s32 and u32 respectively since that seems to be the case for most of the code, even if strictly not correct (size_t can be an unsigned long int for some architecture). 

I also noted that in many situations size_t values are treated as "int" so there is an overflow risk here, even if small. For example, the result of "list_length" is assigned to an integer. I do not think this is an immediate concern, but just wanted to mention it.

Best wishes,
Mats Kindahl 

                        regards, tom lane
Attachment

Re: glibc qsort() vulnerability

From
"Andrey M. Borodin"
Date:

> On 8 Feb 2024, at 06:52, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> For the same compASC() test, I see an ~8.4% improvement with your int64
> code and a ~3.4% improvement with this:

If we care about branch prediction in comparison function, maybe we could produce sorting that inlines comparator, thus
eliminatingfunction call to comparator? We convert comparison logic to int, to extract comparison back then. 

I bet “call" is more expensive than “if".


Best regards, Andrey Borodin.


Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Fri, Feb 09, 2024 at 08:52:26AM +0100, Mats Kindahl wrote:
> Here is a new version introducing pg_cmp_s32 and friends and use them
> instead of the INT_CMP macro introduced before. It also moves the
> definitions to common/int.h and adds that as an include to all locations
> using these functions.

Thanks for the new version of the patch.

> Note that for integers with sizes less than sizeof(int), C standard
> conversions will convert the values to "int" before doing the arithmetic,
> so no casting is *necessary*. I did not force the 16-bit functions to
> return -1 or 1 and have updated the comment accordingly.

It might not be necessary, but this is one of those places where I would
add casting anyway to make it abundantly clear what we are expecting to
happen and why it is safe.

> The types "int" and "size_t" are treated as s32 and u32 respectively since
> that seems to be the case for most of the code, even if strictly not
> correct (size_t can be an unsigned long int for some architecture).

Why is it safe to do this?

> -    return ((const SPLITCOST *) a)->cost - ((const SPLITCOST *) b)->cost;
> +    return INT_CMP(((const SPLITCOST *) a)->cost, ((const SPLITCOST *) b)->cost);

The patch still contains several calls to INT_CMP.

> +/*------------------------------------------------------------------------
> + * Comparison routines for integers
> + *------------------------------------------------------------------------
> + */

I'd suggest separating this part out to a 0001 patch to make it easier to
review.  The 0002 patch could take care of converting the existing qsort
comparators.

> +static inline int
> +pg_cmp_s16(int16 a, int16 b)
> +{
> +    return a - b;
> +}
> +
> +static inline int
> +pg_cmp_u16(uint16 a, uint16 b)
> +{
> +    return a - b;
> +}
> +
> +static inline int
> +pg_cmp_s32(int32 a, int32 b)
> +{
> +    return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_u32(uint32 a, uint32 b)
> +{
> +    return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_s64(int64 a, int64 b)
> +{
> +    return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_u64(uint64 a, uint64 b)
> +{
> +    return (a > b) - (a < b);
> +}

As suggested above, IMHO we should be rather liberal with the casting to
ensure it is abundantly clear what is happening here.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Fri, Feb 09, 2024 at 08:52:26AM +0100, Mats Kindahl wrote:
>> The types "int" and "size_t" are treated as s32 and u32 respectively since
>> that seems to be the case for most of the code, even if strictly not
>> correct (size_t can be an unsigned long int for some architecture).

> Why is it safe to do this?

We do pretty much assume that "int" is "int32".  But I agree that
assuming anything about the width of size_t is bad.  I think we need
a separate pg_cmp_size() or pg_cmp_size_t().

            regards, tom lane



Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Fri, Feb 09, 2024 at 01:19:49PM +0500, Andrey M. Borodin wrote:
> If we care about branch prediction in comparison function, maybe we could
> produce sorting that inlines comparator, thus eliminating function call
> to comparator? We convert comparison logic to int, to extract comparison
> back then.
> 
> I bet “call" is more expensive than “if".

It might make sense to have a couple of built-in qsort implementations for
pointers to integers, pointers to unsigned integers, etc.  However, a lot
of current use-cases require inspecting specific fields of structs, so
(assuming I understand your proposal correctly), we'd end up with many
qsort implementations.  If that can be made simple and elegant and
demonstrates substantial improvements, then it might be worth considering,
but I'm somewhat skeptical that the current uses are performance-sensitive
enough to be worth the effort.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Andres Freund
Date:
Hi,

On 2024-02-09 13:19:49 +0500, Andrey M. Borodin wrote:
> > On 8 Feb 2024, at 06:52, Nathan Bossart <nathandbossart@gmail.com> wrote:
> > For the same compASC() test, I see an ~8.4% improvement with your int64
> > code and a ~3.4% improvement with this:
>
> If we care about branch prediction in comparison function, maybe we could
> produce sorting that inlines comparator, thus eliminating function call to
> comparator? We convert comparison logic to int, to extract comparison back
> then.

We have some infrastructure for that actually, see sort_template.h.  But
perhaps we should define a static inline of the generic pg_qsort() even. OTOH,
plenty places that'll just end up to a pointless amount of code emitted to
sort ~5 elements on average.


> I bet “call" is more expensive than “if".

Not really in this case. The call is perfectly predictable - a single qsort()
will use the same callback for every comparison, whereas the if is perfectly
*unpredictable*.  A branch mispredict is far more expensive than a correctly
predicted function call.

Greetings,

Andres



Re: glibc qsort() vulnerability

From
Andrey Borodin
Date:

> On 9 Feb 2024, at 21:32, Nathan Bossart <nathandbossart@gmail.com> wrote:
>  a lot
> of current use-cases require inspecting specific fields of structs
Yes, I'm proposing to pass to sorting routine not a comparator, but value extractor. And then rely on operators <,>,==.
In a pseudocode: instead of sort(array, (a,b)->a.field-b.field) use sort(array, x->x.field). And rely on "field" being
comparable.

> If that can be made simple and elegant and
> demonstrates substantial improvements
I'll try to produce a PoC and measure it with Andres' intarray test.

> On 9 Feb 2024, at 23:40, Andres Freund <andres@anarazel.de> wrote:
>
> We have some infrastructure for that actually, see sort_template.h.  But
> perhaps we should define a static inline of the generic pg_qsort() even. OTOH,
> plenty places that'll just end up to a pointless amount of code emitted to
> sort ~5 elements on average.
I think there might be another benefit. It's easier to think about values order than function comparator that returns
-1,0,+1...

>> I bet “call" is more expensive than “if".
>
> Not really in this case. The call is perfectly predictable - a single qsort()
> will use the same callback for every comparison, whereas the if is perfectly
> *unpredictable*.  A branch mispredict is far more expensive than a correctly
> predicted function call.

Oh, make sense... I did not understand that. But does cpu predicts what instruction to fetch even after a call
instruction?These cpus are really neat things... so, probably, yes, it does. 


Best regards, Andrey Borodin.


Re: glibc qsort() vulnerability

From
Andres Freund
Date:
Hi,

On 2024-02-10 00:02:08 +0500, Andrey Borodin wrote:
> > Not really in this case. The call is perfectly predictable - a single qsort()
> > will use the same callback for every comparison, whereas the if is perfectly
> > *unpredictable*.  A branch mispredict is far more expensive than a correctly
> > predicted function call.
> 
> Oh, make sense... I did not understand that. But does cpu predicts what
> instruction to fetch even after a call instruction?

Yes, it does predict that. Both for branches and calls (which are just special
kinds of branches in the end). If you want to find more about this, the term
to search for is "branch target buffer".  There's also predictions about where
a function return will jump to, since that obviously can differ.

Modern predictors aren't just taking the instruction pointer into account, to
predict where a jump/call will go to. Tey take the history of recent branches
into account, i.e. the same instruction will be predicted to jump to different
locations, depending on where the current function was called from. This is
important as a function obviously can behave very differently depending on the
input.


> These cpus are really neat things...

Indeed.

Greetings,

Andres Freund



Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Fri, Feb 9, 2024 at 5:24 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Feb 09, 2024 at 08:52:26AM +0100, Mats Kindahl wrote:
> Here is a new version introducing pg_cmp_s32 and friends and use them
> instead of the INT_CMP macro introduced before. It also moves the
> definitions to common/int.h and adds that as an include to all locations
> using these functions.

Thanks for the new version of the patch.

> Note that for integers with sizes less than sizeof(int), C standard
> conversions will convert the values to "int" before doing the arithmetic,
> so no casting is *necessary*. I did not force the 16-bit functions to
> return -1 or 1 and have updated the comment accordingly.

It might not be necessary, but this is one of those places where I would
add casting anyway to make it abundantly clear what we are expecting to
happen and why it is safe.

I'll add it.
 
> The types "int" and "size_t" are treated as s32 and u32 respectively since
> that seems to be the case for most of the code, even if strictly not
> correct (size_t can be an unsigned long int for some architecture).

Why is it safe to do this?

> -     return ((const SPLITCOST *) a)->cost - ((const SPLITCOST *) b)->cost;
> +     return INT_CMP(((const SPLITCOST *) a)->cost, ((const SPLITCOST *) b)->cost);

The patch still contains several calls to INT_CMP.

I'll fix it.
 

> +/*------------------------------------------------------------------------
> + * Comparison routines for integers
> + *------------------------------------------------------------------------
> + */

I'd suggest separating this part out to a 0001 patch to make it easier to
review.  The 0002 patch could take care of converting the existing qsort
comparators.

Ok. Will split it out into two patches.
 

> +static inline int
> +pg_cmp_s16(int16 a, int16 b)
> +{
> +     return a - b;
> +}
> +
> +static inline int
> +pg_cmp_u16(uint16 a, uint16 b)
> +{
> +     return a - b;
> +}
> +
> +static inline int
> +pg_cmp_s32(int32 a, int32 b)
> +{
> +     return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_u32(uint32 a, uint32 b)
> +{
> +     return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_s64(int64 a, int64 b)
> +{
> +     return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_u64(uint64 a, uint64 b)
> +{
> +     return (a > b) - (a < b);
> +}

As suggested above, IMHO we should be rather liberal with the casting to
ensure it is abundantly clear what is happening here.

Ok.
 

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Fri, Feb 9, 2024 at 5:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Fri, Feb 09, 2024 at 08:52:26AM +0100, Mats Kindahl wrote:
>> The types "int" and "size_t" are treated as s32 and u32 respectively since
>> that seems to be the case for most of the code, even if strictly not
>> correct (size_t can be an unsigned long int for some architecture).

> Why is it safe to do this?

We do pretty much assume that "int" is "int32".  But I agree that
assuming anything about the width of size_t is bad.  I think we need
a separate pg_cmp_size() or pg_cmp_size_t().

I added precisely one first, but removed it when I saw that all uses assumed that it was an int. :)

I'll add it back.

Best wishes,
Mats Kindahl 

                        regards, tom lane

Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Fri, Feb 9, 2024 at 5:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Fri, Feb 09, 2024 at 08:52:26AM +0100, Mats Kindahl wrote:
>> The types "int" and "size_t" are treated as s32 and u32 respectively since
>> that seems to be the case for most of the code, even if strictly not
>> correct (size_t can be an unsigned long int for some architecture).

> Why is it safe to do this?

We do pretty much assume that "int" is "int32".  But I agree that
assuming anything about the width of size_t is bad.  I think we need
a separate pg_cmp_size() or pg_cmp_size_t().

Do we want to have something similar for "int" as well? It seems to be quite common and even though it usually is an int32, it does not have to be.

Best wishes,
Mats Kindahl 

                        regards, tom lane

Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Fri, Feb 9, 2024 at 8:26 PM Mats Kindahl <mats@timescale.com> wrote:
On Fri, Feb 9, 2024 at 5:24 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Feb 09, 2024 at 08:52:26AM +0100, Mats Kindahl wrote:
> Here is a new version introducing pg_cmp_s32 and friends and use them
> instead of the INT_CMP macro introduced before. It also moves the
> definitions to common/int.h and adds that as an include to all locations
> using these functions.

Thanks for the new version of the patch.

> Note that for integers with sizes less than sizeof(int), C standard
> conversions will convert the values to "int" before doing the arithmetic,
> so no casting is *necessary*. I did not force the 16-bit functions to
> return -1 or 1 and have updated the comment accordingly.

It might not be necessary, but this is one of those places where I would
add casting anyway to make it abundantly clear what we are expecting to
happen and why it is safe.

I'll add it.
 
> The types "int" and "size_t" are treated as s32 and u32 respectively since
> that seems to be the case for most of the code, even if strictly not
> correct (size_t can be an unsigned long int for some architecture).

Why is it safe to do this?

> -     return ((const SPLITCOST *) a)->cost - ((const SPLITCOST *) b)->cost;
> +     return INT_CMP(((const SPLITCOST *) a)->cost, ((const SPLITCOST *) b)->cost);

The patch still contains several calls to INT_CMP.

I'll fix it.
 

> +/*------------------------------------------------------------------------
> + * Comparison routines for integers
> + *------------------------------------------------------------------------
> + */

I'd suggest separating this part out to a 0001 patch to make it easier to
review.  The 0002 patch could take care of converting the existing qsort
comparators.

Ok. Will split it out into two patches.
 

> +static inline int
> +pg_cmp_s16(int16 a, int16 b)
> +{
> +     return a - b;
> +}
> +
> +static inline int
> +pg_cmp_u16(uint16 a, uint16 b)
> +{
> +     return a - b;
> +}
> +
> +static inline int
> +pg_cmp_s32(int32 a, int32 b)
> +{
> +     return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_u32(uint32 a, uint32 b)
> +{
> +     return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_s64(int64 a, int64 b)
> +{
> +     return (a > b) - (a < b);
> +}
> +
> +static inline int
> +pg_cmp_u64(uint64 a, uint64 b)
> +{
> +     return (a > b) - (a < b);
> +}

As suggested above, IMHO we should be rather liberal with the casting to
ensure it is abundantly clear what is happening here.

Ok.

QQ: right now it looks like this:

static inline int
pg_cmp_u16(uint16 a, uint16 b)
{
return (int32)a - (int32)b;
}

and

static inline int
pg_cmp_u32(uint32 a, uint32 b)
{
return (a > b) - (a < b);
}

I think that is clear enough, but do you want more casts added for the return value as well?

Best wishes,
Mats Kindahl
 
 

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Fri, Feb 09, 2024 at 08:40:47PM +0100, Mats Kindahl wrote:
> On Fri, Feb 9, 2024 at 5:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We do pretty much assume that "int" is "int32".  But I agree that
>> assuming anything about the width of size_t is bad.  I think we need
>> a separate pg_cmp_size() or pg_cmp_size_t().
> 
> Do we want to have something similar for "int" as well? It seems to be
> quite common and even though it usually is an int32, it does not have to be.

I don't think we need separate functions for int and int32.  As Tom noted,
we assume they are the same.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Andres Freund
Date:
On 2024-02-09 14:04:29 -0600, Nathan Bossart wrote:
> On Fri, Feb 09, 2024 at 08:40:47PM +0100, Mats Kindahl wrote:
> > On Fri, Feb 9, 2024 at 5:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> We do pretty much assume that "int" is "int32".  But I agree that
> >> assuming anything about the width of size_t is bad.  I think we need
> >> a separate pg_cmp_size() or pg_cmp_size_t().
> > 
> > Do we want to have something similar for "int" as well? It seems to be
> > quite common and even though it usually is an int32, it does not have to be.
> 
> I don't think we need separate functions for int and int32.  As Tom noted,
> we assume they are the same.

+1



Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Fri, Feb 09, 2024 at 08:43:21PM +0100, Mats Kindahl wrote:
> QQ: right now it looks like this:
> 
> static inline int
> pg_cmp_u16(uint16 a, uint16 b)
> {
> 
> return (int32)a - (int32)b;
> 
> }
> 
> 
> and
> 
> static inline int
> pg_cmp_u32(uint32 a, uint32 b)
> {
> 
> return (a > b) - (a < b);
> 
> }
> 
> 
> I think that is clear enough, but do you want more casts added for the
> return value as well?

I think that is reasonably clear.  The latter does require you to know that
< and > return (int) 0 or (int) 1, which might be worth a short comment.
But that's just nitpicking...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Fri, Feb 9, 2024 at 9:08 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Feb 09, 2024 at 08:43:21PM +0100, Mats Kindahl wrote:
> QQ: right now it looks like this:
>
> static inline int
> pg_cmp_u16(uint16 a, uint16 b)
> {
>
> return (int32)a - (int32)b;
>
> }
>
>
> and
>
> static inline int
> pg_cmp_u32(uint32 a, uint32 b)
> {
>
> return (a > b) - (a < b);
>
> }
>
>
> I think that is clear enough, but do you want more casts added for the
> return value as well?

I think that is reasonably clear.  The latter does require you to know that
< and > return (int) 0 or (int) 1, which might be worth a short comment.
But that's just nitpicking...


Hi all,

Split the code into two patches: one that just adds the functions (including the new pg_cmp_size()) to common/int.h and one that starts using them. I picked the name "pg_cmp_size" rather than "pg_cmp_size_t" since "_t" is usually used as a suffix for types.

I added a comment to the (a > b) - (a < b) return and have also added casts to (int32) for the int16 and uint16 functions (we need a signed int for uin16 since we need to be able to get a negative number).

Changed the type of two instances that had an implicit cast from size_t to int and used the new pg_,cmp_size() function.

Also fixed the missed replacements in the "contrib" directory.

Best wishes,
Mats Kindahl
  
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment

Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Sat, Feb 10, 2024 at 08:59:06AM +0100, Mats Kindahl wrote:
> Split the code into two patches: one that just adds the functions
> (including the new pg_cmp_size()) to common/int.h and one that starts using
> them. I picked the name "pg_cmp_size" rather than "pg_cmp_size_t" since
> "_t" is usually used as a suffix for types.
> 
> I added a comment to the (a > b) - (a < b) return and have also added casts
> to (int32) for the int16 and uint16 functions (we need a signed int for
> uin16 since we need to be able to get a negative number).
> 
> Changed the type of two instances that had an implicit cast from size_t to
> int and used the new pg_,cmp_size() function.
> 
> Also fixed the missed replacements in the "contrib" directory.

Thanks for the new patches.  I think the comparison in resowner.c is
backwards, and I think we should expand on some of the commentary in int.h.
For example, the comment at the top of int.h seems very tailored to the
existing functions and should probably be adjusted.  And the "comparison
routines for integers" comment might benefit from some additional details
about the purpose and guarantees of the new functions.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Sat, Feb 10, 2024 at 9:53 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sat, Feb 10, 2024 at 08:59:06AM +0100, Mats Kindahl wrote:
> Split the code into two patches: one that just adds the functions
> (including the new pg_cmp_size()) to common/int.h and one that starts using
> them. I picked the name "pg_cmp_size" rather than "pg_cmp_size_t" since
> "_t" is usually used as a suffix for types.
>
> I added a comment to the (a > b) - (a < b) return and have also added casts
> to (int32) for the int16 and uint16 functions (we need a signed int for
> uin16 since we need to be able to get a negative number).
>
> Changed the type of two instances that had an implicit cast from size_t to
> int and used the new pg_,cmp_size() function.
>
> Also fixed the missed replacements in the "contrib" directory.

Thanks for the new patches.  I think the comparison in resowner.c is
backwards,

Thanks for catching that.

 
and I think we should expand on some of the commentary in int.h.
For example, the comment at the top of int.h seems very tailored to the
existing functions and should probably be adjusted.

I rewrote the beginning to the following, does that look good?

 * int.h
 *  Routines to perform signed and unsigned integer arithmetics, including
 *  comparisons, in an overflow-safe way.
 
And the "comparison
routines for integers" comment might benefit from some additional details
about the purpose and guarantees of the new functions.

I expanded that into the following. WDYT?

/*------------------------------------------------------------------------
 * Comparison routines for integers.
 *
 * These routines are used to implement comparison functions for, e.g.,
 * qsort(). They are designed to be efficient and not risk overflows in
 * internal computations that could cause strange results, such as INT_MIN >
 * INT_MAX if you just return "lhs - rhs".
 *------------------------------------------------------------------------
 
Best wishes,
Mats Kindahl


--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Sun, Feb 11, 2024 at 03:44:42PM +0100, Mats Kindahl wrote:
> On Sat, Feb 10, 2024 at 9:53 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> and I think we should expand on some of the commentary in int.h.
>> For example, the comment at the top of int.h seems very tailored to the
>> existing functions and should probably be adjusted.
> 
> 
> I rewrote the beginning to the following, does that look good?
> 
>  * int.h
>  *  Routines to perform signed and unsigned integer arithmetics, including
>  *  comparisons, in an overflow-safe way.
> 
> 
> 
>> And the "comparison
>> routines for integers" comment might benefit from some additional details
>> about the purpose and guarantees of the new functions.
>>
> 
> I expanded that into the following. WDYT?
> 
> /*------------------------------------------------------------------------
>  * Comparison routines for integers.
>  *
>  * These routines are used to implement comparison functions for, e.g.,
>  * qsort(). They are designed to be efficient and not risk overflows in
>  * internal computations that could cause strange results, such as INT_MIN >
>  * INT_MAX if you just return "lhs - rhs".
>  *------------------------------------------------------------------------

LGTM.  I might editorialize a bit before committing, but I think your
proposed wording illustrates the thrust of the change.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Mon, Feb 12, 2024 at 4:57 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sun, Feb 11, 2024 at 03:44:42PM +0100, Mats Kindahl wrote:
> On Sat, Feb 10, 2024 at 9:53 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> and I think we should expand on some of the commentary in int.h.
>> For example, the comment at the top of int.h seems very tailored to the
>> existing functions and should probably be adjusted.
>
>
> I rewrote the beginning to the following, does that look good?
>
>  * int.h
>  *  Routines to perform signed and unsigned integer arithmetics, including
>  *  comparisons, in an overflow-safe way.
>
>
>
>> And the "comparison
>> routines for integers" comment might benefit from some additional details
>> about the purpose and guarantees of the new functions.
>>
>
> I expanded that into the following. WDYT?
>
> /*------------------------------------------------------------------------
>  * Comparison routines for integers.
>  *
>  * These routines are used to implement comparison functions for, e.g.,
>  * qsort(). They are designed to be efficient and not risk overflows in
>  * internal computations that could cause strange results, such as INT_MIN >
>  * INT_MAX if you just return "lhs - rhs".
>  *------------------------------------------------------------------------

LGTM.  I might editorialize a bit before committing, but I think your
proposed wording illustrates the thrust of the change.

Thanks Nathan,

Here are the two fixed patches.

Best wishes,
Mats Kindahl
 

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment

Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Mon, Feb 12, 2024 at 06:09:06PM +0100, Mats Kindahl wrote:
> Here are the two fixed patches.

These patches look pretty good to me.  Barring additional feedback, I'll
plan on committing them in the next few days.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Fabrízio de Royes Mello
Date:

On Mon, Feb 12, 2024 at 5:51 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Feb 12, 2024 at 06:09:06PM +0100, Mats Kindahl wrote:
> > Here are the two fixed patches.
>
> These patches look pretty good to me.  Barring additional feedback, I'll
> plan on committing them in the next few days.
>

Also did some tests locally and everything went well. Patches apply to the main branch without issues and all regression (including checkworld) pass!!

Regards
--
Fabrízio de Royes Mello

Re: glibc qsort() vulnerability

From
Andres Freund
Date:
Hi,

On 2024-02-12 14:51:38 -0600, Nathan Bossart wrote:
> On Mon, Feb 12, 2024 at 06:09:06PM +0100, Mats Kindahl wrote:
> > Here are the two fixed patches.
> 
> These patches look pretty good to me.  Barring additional feedback, I'll
> plan on committing them in the next few days.

One thing that's worth checking is if this ends up with *worse* code when the
comparators are inlined. I think none of the changed comparators will end up
getting used with an inlined sort, but ...

The reason we could end up with worse code is that when inlining the
comparisons would make less sense for the compiler. Consider e.g.
    return DO_COMPARE(a, b) < 0 ?
        (DO_COMPARE(b, c) < 0 ? b : (DO_COMPARE(a, c) < 0 ? c : a))
        : (DO_COMPARE(b, c) > 0 ? b : (DO_COMPARE(a, c) < 0 ? a : c));

With a naive implementation the compiler will understand it only cares about
a < b, not about the other possibilities. I'm not sure that's still true with
the more complicated optimized version.

Greetings,

Andres Freund



Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Mon, Feb 12, 2024 at 01:31:30PM -0800, Andres Freund wrote:
> One thing that's worth checking is if this ends up with *worse* code when the
> comparators are inlined. I think none of the changed comparators will end up
> getting used with an inlined sort, but ...

Yeah, AFAICT the only inlined sorts are in tuplesort.c and bufmgr.c, and
the patches don't touch those files.

> The reason we could end up with worse code is that when inlining the
> comparisons would make less sense for the compiler. Consider e.g.
>     return DO_COMPARE(a, b) < 0 ?
>         (DO_COMPARE(b, c) < 0 ? b : (DO_COMPARE(a, c) < 0 ? c : a))
>         : (DO_COMPARE(b, c) > 0 ? b : (DO_COMPARE(a, c) < 0 ? a : c));
> 
> With a naive implementation the compiler will understand it only cares about
> a < b, not about the other possibilities. I'm not sure that's still true with
> the more complicated optimized version.

You aren't kidding [0].  Besides perhaps adding a comment in
sort_template.h, is there anything else you think we should do about this
now?

[0] https://godbolt.org/z/bbTqK54zK

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Andres Freund
Date:
Hi,

On 2024-02-12 17:04:23 -0600, Nathan Bossart wrote:
> On Mon, Feb 12, 2024 at 01:31:30PM -0800, Andres Freund wrote:
> > One thing that's worth checking is if this ends up with *worse* code when the
> > comparators are inlined. I think none of the changed comparators will end up
> > getting used with an inlined sort, but ...
> 
> Yeah, AFAICT the only inlined sorts are in tuplesort.c and bufmgr.c, and
> the patches don't touch those files.
> 
> > The reason we could end up with worse code is that when inlining the
> > comparisons would make less sense for the compiler. Consider e.g.
> >     return DO_COMPARE(a, b) < 0 ?
> >         (DO_COMPARE(b, c) < 0 ? b : (DO_COMPARE(a, c) < 0 ? c : a))
> >         : (DO_COMPARE(b, c) > 0 ? b : (DO_COMPARE(a, c) < 0 ? a : c));
> > 
> > With a naive implementation the compiler will understand it only cares about
> > a < b, not about the other possibilities. I'm not sure that's still true with
> > the more complicated optimized version.
> 
> You aren't kidding [0].  Besides perhaps adding a comment in
> sort_template.h, is there anything else you think we should do about this
> now?

I'd add also a comment to the new functions. I think it's fine otherwise. I
wish there were formulation that'd be optimal for both cases, but this way we
can at least adapt all places at once if either find a better formulation or
change all our sorts to happen via an inline implementation of qsort or such.

Greetings,

Andres



Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Tue, Feb 13, 2024 at 12:41 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2024-02-12 17:04:23 -0600, Nathan Bossart wrote:
> On Mon, Feb 12, 2024 at 01:31:30PM -0800, Andres Freund wrote:
> > One thing that's worth checking is if this ends up with *worse* code when the
> > comparators are inlined. I think none of the changed comparators will end up
> > getting used with an inlined sort, but ...
>
> Yeah, AFAICT the only inlined sorts are in tuplesort.c and bufmgr.c, and
> the patches don't touch those files.
>
> > The reason we could end up with worse code is that when inlining the
> > comparisons would make less sense for the compiler. Consider e.g.
> >     return DO_COMPARE(a, b) < 0 ?
> >             (DO_COMPARE(b, c) < 0 ? b : (DO_COMPARE(a, c) < 0 ? c : a))
> >             : (DO_COMPARE(b, c) > 0 ? b : (DO_COMPARE(a, c) < 0 ? a : c));
> >
> > With a naive implementation the compiler will understand it only cares about
> > a < b, not about the other possibilities. I'm not sure that's still true with
> > the more complicated optimized version.
>
> You aren't kidding [0].  Besides perhaps adding a comment in
> sort_template.h, is there anything else you think we should do about this
> now?

I'd add also a comment to the new functions. I think it's fine otherwise. I
wish there were formulation that'd be optimal for both cases, but this way we
can at least adapt all places at once if either find a better formulation or
change all our sorts to happen via an inline implementation of qsort or such.

I suspect that this has to do with the fact that we "abuse" the type system by performing arithmetics on booleans converted to integers and the compilers do not have rules for dealing with these.

For example, with the inline function "static inline cmp(a,b) { return a < b ? -1 : a > b ? 1 : 0; }"

Which trivially can be rewritten by the compiler using very basic rules, as follows:

DO_COMPARE(a,b)
  (a < b ? -1 : a > b ? 1 : 0) < 0
  a < b ? (-1 < 0) : a > b ? (1 < 0) : (0 < 0)
  a < b ? true : a > b ? false : false
  a < b ? true : a > b ? false : false
  a < b ? true : false
  a < b

When it comes to (a < b) - (a > b) there are no such rules added since it is not a very common case.

Clang fares better for this case and at least shows the two alternatives as equal.

Maybe we should change to use the original version equivalent to the inline function above since that works better with surrounding code?

Best wishes,
Mats Kindahl
 

Greetings,

Andres

Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Tue, Feb 13, 2024 at 09:43:18AM +0100, Mats Kindahl wrote:
> Maybe we should change to use the original version equivalent to the inline
> function above since that works better with surrounding code?

I don't think that's necessary.  We just need to be cognizant of it when
using inlined sorts, which are pretty rare at the moment.  Your patches
should still be a net improvement in many cases because most qsorts use a
function pointer to the comparator.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Fri, Feb 16, 2024 at 12:32 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
Here is what I have staged for commit.

Looks good to me.

Checked that all of the comparisons are in the expected order, except inside compDESC, cmp_lsn, and resource_priority_cmp, where the order is reversed.

Best wishes,
Mats Kindahl

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: glibc qsort() vulnerability

From
Nathan Bossart
Date:
On Fri, Feb 16, 2024 at 01:45:52PM +0100, Mats Kindahl wrote:
> Looks good to me.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: glibc qsort() vulnerability

From
Mats Kindahl
Date:
On Fri, Feb 16, 2024 at 9:09 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Feb 16, 2024 at 01:45:52PM +0100, Mats Kindahl wrote:
> Looks good to me.

Committed.

Thanks Nathan!
 

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com