Thread: Avoid overflow with simplehash

Avoid overflow with simplehash

From
Ranier Vilela
Date:
Hi,

SimpleHash.

The function SH_START_ITERATE can trigger some overflow.

See:
typedef struct SH_ITERATOR
{
uint32 cur; /* current element */
uint32 end;
bool done; /* iterator exhausted? */
} SH_ITERATOR;

The cur field is uint32 size and currently can be stored a uint64,
which obviously does not fit.

Also, the current index is int, which is possibly insufficient 
since items can be up to uint32.

Attached a fix.

best regards,
Ranier Vilela

Attachment

Re: Avoid overflow with simplehash

From
Daniel Gustafsson
Date:
> On 6 Jul 2023, at 16:28, Ranier Vilela <ranier.vf@gmail.com> wrote:

> The function SH_START_ITERATE can trigger some overflow.
> 
> See:
> typedef struct SH_ITERATOR
> {
> uint32 cur; /* current element */
> uint32 end;
> bool done; /* iterator exhausted? */
> } SH_ITERATOR;
> 
> The cur field is uint32 size and currently can be stored a uint64,
> which obviously does not fit.

-    Assert(startelem < SH_MAX_SIZE);
+    Assert(startelem < PG_UINT32_MAX);

I mighe be missing something, but from skimming the current code, SH_MAX_SIZE
is currently defined as:

#define SH_MAX_SIZE (((uint64) PG_UINT32_MAX) + 1)

Can you show a reproducer example where you are able to overflow?

--
Daniel Gustafsson




Re: Avoid overflow with simplehash

From
Ranier Vilela
Date:
Em qui., 6 de jul. de 2023 às 11:37, Daniel Gustafsson <daniel@yesql.se> escreveu:
> On 6 Jul 2023, at 16:28, Ranier Vilela <ranier.vf@gmail.com> wrote:

> The function SH_START_ITERATE can trigger some overflow.
>
> See:
> typedef struct SH_ITERATOR
> {
> uint32 cur; /* current element */
> uint32 end;
> bool done; /* iterator exhausted? */
> } SH_ITERATOR;
>
> The cur field is uint32 size and currently can be stored a uint64,
> which obviously does not fit.

-       Assert(startelem < SH_MAX_SIZE);
+       Assert(startelem < PG_UINT32_MAX);

I mighe be missing something, but from skimming the current code, SH_MAX_SIZE
is currently defined as:

#define SH_MAX_SIZE (((uint64) PG_UINT32_MAX) + 1)
This is Assert, that is, in production this test is not done.

See the comments:
"Search for the first empty element."

If the empty element is not found, startelem has PG_UINT64_MAX value,
which do not fit in uint32.

Can you see this?

regards,
Ranier Vilela

Re: Avoid overflow with simplehash

From
Daniel Gustafsson
Date:
> On 6 Jul 2023, at 16:42, Ranier Vilela <ranier.vf@gmail.com> wrote:
> Em qui., 6 de jul. de 2023 às 11:37, Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> escreveu:

> #define SH_MAX_SIZE (((uint64) PG_UINT32_MAX) + 1)
> This is Assert, that is, in production this test is not done.

Correct, which implies that it's a test for something which is deemed highly
unlikely to happen in production.

> If the empty element is not found, startelem has PG_UINT64_MAX value,
> which do not fit in uint32.

Can you show an example where the hash isn't grown automatically to accomodate
this such that the assertion is tripped?

--
Daniel Gustafsson




Re: Avoid overflow with simplehash

From
Ranier Vilela
Date:
Em qui., 6 de jul. de 2023 às 12:00, Daniel Gustafsson <daniel@yesql.se> escreveu:
> On 6 Jul 2023, at 16:42, Ranier Vilela <ranier.vf@gmail.com> wrote:
> Em qui., 6 de jul. de 2023 às 11:37, Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> escreveu:

> #define SH_MAX_SIZE (((uint64) PG_UINT32_MAX) + 1)
> This is Assert, that is, in production this test is not done.

Correct, which implies that it's a test for something which is deemed highly
unlikely to happen in production.
 Highly improbable does not mean impossible, or that it will never happen.


> If the empty element is not found, startelem has PG_UINT64_MAX value,
> which do not fit in uint32.

Can you show an example where the hash isn't grown automatically to accomodate
this such that the assertion is tripped?
A demo won't change the fact that the function can fail, even if it isn't currently failing.
As a precaution to avoid future bugs, I think it's necessary to apply the patch to increase the robustness of the function.

regards,
Ranier Vilela

Re: Avoid overflow with simplehash

From
Tom Lane
Date:
Ranier Vilela <ranier.vf@gmail.com> writes:
> See the comments:
> "Search for the first empty element."
> If the empty element is not found, startelem has PG_UINT64_MAX value,
> which do not fit in uint32.

I think the point of that assertion is exactly that we're required to
have an empty element (because max fillfactor is less than 1),
so the search should have succeeded.

It does seem like we could do

    uint64        startelem = SH_MAX_SIZE;

    ...

    Assert(startelem < SH_MAX_SIZE);

which'd make it a little clearer that the expectation is for
startelem to have changed value.  And I agree that declaring "i"
as int is wrong.

            regards, tom lane



Re: Avoid overflow with simplehash

From
Andres Freund
Date:
Hi,

On 2023-07-06 11:16:26 -0400, Tom Lane wrote:
> Ranier Vilela <ranier.vf@gmail.com> writes:
> > See the comments:
> > "Search for the first empty element."
> > If the empty element is not found, startelem has PG_UINT64_MAX value,
> > which do not fit in uint32.
> 
> I think the point of that assertion is exactly that we're required to
> have an empty element (because max fillfactor is less than 1),
> so the search should have succeeded.

Right, that part of the proposed change seems bogus to me.


> It does seem like we could do
> 
>     uint64        startelem = SH_MAX_SIZE;
> 
>     ...
> 
>     Assert(startelem < SH_MAX_SIZE);
> 
> which'd make it a little clearer that the expectation is for
> startelem to have changed value.

I guess? I find it easier to understand all-bits-set in a coredump as
too-large than SH_MAX_SIZE, but ...


> And I agree that declaring "i" as int is wrong.

Yea, that's definitely not right, not sure how I ended up with that. Will push
a fix. I guess it should be backpatched...

Greetings,

Andres Freund



Re: Avoid overflow with simplehash

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2023-07-06 11:16:26 -0400, Tom Lane wrote:
>> It does seem like we could do
>>     uint64        startelem = SH_MAX_SIZE;
>>     ...
>>     Assert(startelem < SH_MAX_SIZE);
>> which'd make it a little clearer that the expectation is for
>> startelem to have changed value.

> I guess? I find it easier to understand all-bits-set in a coredump as
> too-large than SH_MAX_SIZE, but ...

What'd help even more is a comment:

    /* We should have found an empty element */
    Assert(startelem < SH_MAX_SIZE);

            regards, tom lane



Re: Avoid overflow with simplehash

From
Ranier Vilela
Date:
Em qui., 6 de jul. de 2023 às 12:16, Tom Lane <tgl@sss.pgh.pa.us> escreveu:
Ranier Vilela <ranier.vf@gmail.com> writes:
> See the comments:
> "Search for the first empty element."
> If the empty element is not found, startelem has PG_UINT64_MAX value,
> which do not fit in uint32.

Hi Tom,
I think the point of that assertion is exactly that we're required to
have an empty element (because max fillfactor is less than 1),
so the search should have succeeded.

It does seem like we could do

        uint64          startelem = SH_MAX_SIZE;

        ...

        Assert(startelem < SH_MAX_SIZE);

which'd make it a little clearer that the expectation is for
startelem to have changed value.
I still have doubts about this.

see:
#include <iostream>
#include <string>
#include <limits.h>

#define SH_MAX_SIZE1 (((unsigned long long) 0xFFFFFFFFU) + 1)
#define SH_MAX_SIZE2 (((unsigned long long) 0xFFFFFFFFU) - 1)

int main()
{
    unsigned long long max_size1 = SH_MAX_SIZE1;
    unsigned long long max_size2 = SH_MAX_SIZE2;
    unsigned int cur1 = SH_MAX_SIZE1;
    unsigned int cur2 = SH_MAX_SIZE2;

    printf("SH_MAX_SIZE1=%llu\n", max_size1);
    printf("SH_MAX_SIZE2=%llu\n", max_size2);
    printf("cur1=%u\n", cur1);
    printf("cur2=%u\n", cur2);
}
warning: implicit conversion from 'unsigned long long' to 'unsigned int' changes value from 4294967296 to 0 [-Wconstant-conversion]

outputs:
SH_MAX_SIZE1=4294967296
SH_MAX_SIZE2=4294967294
cur1=0
cur2=4294967294

And in the comments we have:
"Iterate backwards, that allows the current element to be deleted, even
* if there are backward shifts"

So if an empty element is not found and the *cur* field is set to 0 (SH_MAX_SIZE -> uint32),
then will it iterate forwards?

  And I agree that declaring "i"
as int is wrong.
Thanks for the confirmation.

regards,
Ranier Vilela

Re: Avoid overflow with simplehash

From
Andres Freund
Date:
Hi,

On 2023-07-06 13:40:00 -0300, Ranier Vilela wrote:
> I still have doubts about this.
> 
> see:
> #include <iostream>
> #include <string>
> #include <limits.h>
> 
> #define SH_MAX_SIZE1 (((unsigned long long) 0xFFFFFFFFU) + 1)
> #define SH_MAX_SIZE2 (((unsigned long long) 0xFFFFFFFFU) - 1)
> 
> int main()
> {
>     unsigned long long max_size1 = SH_MAX_SIZE1;
>     unsigned long long max_size2 = SH_MAX_SIZE2;
>     unsigned int cur1 = SH_MAX_SIZE1;
>     unsigned int cur2 = SH_MAX_SIZE2;
> 
>     printf("SH_MAX_SIZE1=%llu\n", max_size1);
>     printf("SH_MAX_SIZE2=%llu\n", max_size2);
>     printf("cur1=%u\n", cur1);
>     printf("cur2=%u\n", cur2);
> }
> warning: implicit conversion from 'unsigned long long' to 'unsigned int'
> changes value from 4294967296 to 0 [-Wconstant-conversion]

I don't think we at the moment try to not have implicit-conversion warnings
(nor do we enable them), this would be far from the only place. If we wanted
to here, we'd just need an explicit cast.


> outputs:
> SH_MAX_SIZE1=4294967296
> SH_MAX_SIZE2=4294967294
> cur1=0
> cur2=4294967294
> 
> And in the comments we have:
> "Iterate backwards, that allows the current element to be deleted, even
> * if there are backward shifts"
> 
> So if an empty element is not found and the *cur* field is set to 0
> (SH_MAX_SIZE -> uint32),

That should never be reachable - which the assert tries to ensure.


> then will it iterate forwards?

No, it'd still iterate backwards, but starting from the wrong place - but
there is no correct place to start iterating from if there is no unused
element.

Greetings,

Andres Freund



Re: Avoid overflow with simplehash

From
Ranier Vilela
Date:


Em qui., 6 de jul. de 2023 às 13:52, Andres Freund <andres@anarazel.de> escreveu:
Hi,

On 2023-07-06 13:40:00 -0300, Ranier Vilela wrote:
> I still have doubts about this.
>
> see:
> #include <iostream>
> #include <string>
> #include <limits.h>
>
> #define SH_MAX_SIZE1 (((unsigned long long) 0xFFFFFFFFU) + 1)
> #define SH_MAX_SIZE2 (((unsigned long long) 0xFFFFFFFFU) - 1)
>
> int main()
> {
>     unsigned long long max_size1 = SH_MAX_SIZE1;
>     unsigned long long max_size2 = SH_MAX_SIZE2;
>     unsigned int cur1 = SH_MAX_SIZE1;
>     unsigned int cur2 = SH_MAX_SIZE2;
>
>     printf("SH_MAX_SIZE1=%llu\n", max_size1);
>     printf("SH_MAX_SIZE2=%llu\n", max_size2);
>     printf("cur1=%u\n", cur1);
>     printf("cur2=%u\n", cur2);
> }
> warning: implicit conversion from 'unsigned long long' to 'unsigned int'
> changes value from 4294967296 to 0 [-Wconstant-conversion]

I don't think we at the moment try to not have implicit-conversion warnings
(nor do we enable them), this would be far from the only place. If we wanted
to here, we'd just need an explicit cast.
It was just for show.
 


> outputs:
> SH_MAX_SIZE1=4294967296
> SH_MAX_SIZE2=4294967294
> cur1=0
> cur2=4294967294
>
> And in the comments we have:
> "Iterate backwards, that allows the current element to be deleted, even
> * if there are backward shifts"
>
> So if an empty element is not found and the *cur* field is set to 0
> (SH_MAX_SIZE -> uint32),

That should never be reachable - which the assert tries to ensure.
Right.
 


> then will it iterate forwards?

No, it'd still iterate backwards, but starting from the wrong place - but
there is no correct place to start iterating from if there is no unused
element.
Thanks for the confirmation.

So I suppose we could have this in v1, attached.
With comments added by Tom.

regards,
Ranier Vilela
Attachment

Re: Avoid overflow with simplehash

From
Andres Freund
Date:
Hi,

I pushed changing i to uint32 and adding Tom's comment to 11-HEAD.


On 2023-07-06 14:01:55 -0300, Ranier Vilela wrote:
> > > then will it iterate forwards?
> >
> > No, it'd still iterate backwards, but starting from the wrong place - but
> > there is no correct place to start iterating from if there is no unused
> > element.
> >
> Thanks for the confirmation.
> 
> So I suppose we could have this in v1, attached.
> With comments added by Tom.


> diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
> index 48db837ec8..4fe627a921 100644
> --- a/src/include/lib/simplehash.h
> +++ b/src/include/lib/simplehash.h
> @@ -964,8 +964,8 @@ SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
>  SH_SCOPE void
>  SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter)
>  {
> -    int            i;
> -    uint64        startelem = PG_UINT64_MAX;
> +    uint32        i;
> +    uint32        startelem = PG_UINT32_MAX;

The startelem type change doesn't strike me as a good idea.  Currently
PG_UINT32_MAX is a valid element.

Greetings,

Andres Freund



Re: Avoid overflow with simplehash

From
Ranier Vilela
Date:
Em qui., 6 de jul. de 2023 às 14:33, Andres Freund <andres@anarazel.de> escreveu:
Hi,

I pushed changing i to uint32 and adding Tom's comment to 11-HEAD.
Thank you.
 
regards,
Ranier Vilela