Thread: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

Hi,

At function has_matching_range, if variable ranges->nranges == 0,
we exit quickly with a result equal to false.

This means that nranges can be zero.
It occurs then that it is possible then to occur an array out of bonds, in the initialization of the variable maxvalue.
So if nranges is equal to zero, there is no need to initialize minvalue and maxvalue.

The patch tries to fix it, avoiding possible errors by using maxvalue.

regards,
Ranier Vilela
Attachment
At Fri, 26 Aug 2022 10:28:50 -0300, Ranier Vilela <ranier.vf@gmail.com> wrote in 
> At function has_matching_range, if variable ranges->nranges == 0,
> we exit quickly with a result equal to false.
> 
> This means that nranges can be zero.
> It occurs then that it is possible then to occur an array out of bonds, in
> the initialization of the variable maxvalue.
> So if nranges is equal to zero, there is no need to initialize minvalue and
> maxvalue.
> 
> The patch tries to fix it, avoiding possible errors by using maxvalue.

However it seems that nranges will never be zero, still the fix looks
good to me since it is generally allowed to be zero. I don't find a
similar mistake related to Range.nranges.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Em dom., 28 de ago. de 2022 às 22:06, Kyotaro Horiguchi <horikyota.ntt@gmail.com> escreveu:
At Fri, 26 Aug 2022 10:28:50 -0300, Ranier Vilela <ranier.vf@gmail.com> wrote in
> At function has_matching_range, if variable ranges->nranges == 0,
> we exit quickly with a result equal to false.
>
> This means that nranges can be zero.
> It occurs then that it is possible then to occur an array out of bonds, in
> the initialization of the variable maxvalue.
> So if nranges is equal to zero, there is no need to initialize minvalue and
> maxvalue.
>
> The patch tries to fix it, avoiding possible errors by using maxvalue.

However it seems that nranges will never be zero, still the fix looks
good to me since it is generally allowed to be zero. I don't find a
similar mistake related to Range.nranges.
Thanks  Kyotaro for taking a look at this.

regards,
Ranier Vilela
On Sat, 27 Aug 2022 at 01:29, Ranier Vilela <ranier.vf@gmail.com> wrote:
> At function has_matching_range, if variable ranges->nranges == 0,
> we exit quickly with a result equal to false.
>
> This means that nranges can be zero.
> It occurs then that it is possible then to occur an array out of bonds, in the initialization of the variable
maxvalue.
> So if nranges is equal to zero, there is no need to initialize minvalue and maxvalue.

I think there's more strange coding in the same file that might need
addressed, for example, AssertCheckRanges() does:

if (ranges->nranges == 0)
break;

from within the first for() loop.  Why can't that check be outside of
the loop. Nothing seems to make any changes to that field from within
the loop.

Also, in the final loop of the same function there's:

if (ranges->nsorted == 0)
break;

It's not very obvious to me why we don't only run that loop when
ranges->nsorted > 0.  Also, isn't it an array overrun to access:

Datum value = ranges->values[2 * ranges->nranges + i];

If there's only 1 range stored in the array, then there should be 2
elements, but that code will try to access the 3rd element with
ranges->values[2].

This is not so critical, but I'll note it down anyway.  The following
looks a bit suboptimal in brin_minmax_multi_summary_out():

StringInfoData str;

initStringInfo(&str);

a = FunctionCall1(&fmgrinfo, ranges_deserialized->values[idx++]);

appendStringInfoString(&str, DatumGetCString(a));

b = cstring_to_text(str.data);

Why do we need a StringInfoData there?  Why not just do:

b = cstring_to_text(DatumGetCString(a)); ?

That requires less memcpy()s and pallocs().

I've included Tomas just in case I've misunderstood the nrange stuff.

David



Em qui., 1 de set. de 2022 às 21:27, David Rowley <dgrowleyml@gmail.com> escreveu:
On Sat, 27 Aug 2022 at 01:29, Ranier Vilela <ranier.vf@gmail.com> wrote:
> At function has_matching_range, if variable ranges->nranges == 0,
> we exit quickly with a result equal to false.
>
> This means that nranges can be zero.
> It occurs then that it is possible then to occur an array out of bonds, in the initialization of the variable maxvalue.
> So if nranges is equal to zero, there is no need to initialize minvalue and maxvalue.

I think there's more strange coding in the same file that might need
addressed, for example, AssertCheckRanges() does:

if (ranges->nranges == 0)
break;

from within the first for() loop.  Why can't that check be outside of
the loop. Nothing seems to make any changes to that field from within
the loop.

Also, in the final loop of the same function there's:

if (ranges->nsorted == 0)
break;

It's not very obvious to me why we don't only run that loop when
ranges->nsorted > 0.  Also, isn't it an array overrun to access:

Datum value = ranges->values[2 * ranges->nranges + i];

If there's only 1 range stored in the array, then there should be 2
elements, but that code will try to access the 3rd element with
ranges->values[2].
Yeah, it seems to me that both nranges and nsorted are invariant there,
so we can safely avoid loops.
 

This is not so critical, but I'll note it down anyway.  The following
looks a bit suboptimal in brin_minmax_multi_summary_out():

StringInfoData str;

initStringInfo(&str);

a = FunctionCall1(&fmgrinfo, ranges_deserialized->values[idx++]);

appendStringInfoString(&str, DatumGetCString(a));

b = cstring_to_text(str.data);

Why do we need a StringInfoData there?  Why not just do:

b = cstring_to_text(DatumGetCString(a)); ?

That requires less memcpy()s and pallocs().
I agree that StringInfoData is not needed there.
Is it strange to convert char * to only store a temporary str.data.

Why not?
astate_values = accumArrayResult(astate_values,
                          PointerGetDatum(a),
                          false,
                         TEXTOID,
                         CurrentMemoryContext);

Is it possible to avoid cstring_to_text conversion?

regards,
Ranier Vilela
On Fri, 2 Sept 2022 at 12:55, Ranier Vilela <ranier.vf@gmail.com> wrote:
> Why not?
> astate_values = accumArrayResult(astate_values,
>                           PointerGetDatum(a),
>                           false,
>                          TEXTOID,
>                          CurrentMemoryContext);
>
> Is it possible to avoid cstring_to_text conversion?

Note the element_type is being passed to accumArrayResult() as
TEXTOID, so we should be passing a text type, not a cstring type. The
conversion to text is required.

David



On Mon, Aug 29, 2022 at 10:06:55AM +0900, Kyotaro Horiguchi wrote:
> At Fri, 26 Aug 2022 10:28:50 -0300, Ranier Vilela <ranier.vf@gmail.com> wrote in 
> > At function has_matching_range, if variable ranges->nranges == 0,
> > we exit quickly with a result equal to false.
> > 
> > This means that nranges can be zero.
> > It occurs then that it is possible then to occur an array out of bonds, in
> > the initialization of the variable maxvalue.
> > So if nranges is equal to zero, there is no need to initialize minvalue and
> > maxvalue.
> > 
> > The patch tries to fix it, avoiding possible errors by using maxvalue.
> 
> However it seems that nranges will never be zero, still the fix looks
> good to me since it is generally allowed to be zero. I don't find a
> similar mistake related to Range.nranges.

Actually, the nranges==0 branch is hit during regression tests:
https://coverage.postgresql.org/src/backend/access/brin/brin_minmax_multi.c.gcov.html

I'm not sure, but I *suspect* that compilers usually check
  ranges->nranges==0
before reading ranges->values[2 * ranges->nranges - 1];

Especially since it's a static function.

Even if they didn't (say, under -O0), values[-1] would probably point to
a palloc header, which would be enough to "not crash" before returning
one line later.

But +1 to fix this and other issues even if they would never crash.

-- 
Justin



Em sex., 2 de set. de 2022 às 09:01, Justin Pryzby <pryzby@telsasoft.com> escreveu:
On Mon, Aug 29, 2022 at 10:06:55AM +0900, Kyotaro Horiguchi wrote:
> At Fri, 26 Aug 2022 10:28:50 -0300, Ranier Vilela <ranier.vf@gmail.com> wrote in
> > At function has_matching_range, if variable ranges->nranges == 0,
> > we exit quickly with a result equal to false.
> >
> > This means that nranges can be zero.
> > It occurs then that it is possible then to occur an array out of bonds, in
> > the initialization of the variable maxvalue.
> > So if nranges is equal to zero, there is no need to initialize minvalue and
> > maxvalue.
> >
> > The patch tries to fix it, avoiding possible errors by using maxvalue.
>
> However it seems that nranges will never be zero, still the fix looks
> good to me since it is generally allowed to be zero. I don't find a
> similar mistake related to Range.nranges.

Actually, the nranges==0 branch is hit during regression tests:
https://coverage.postgresql.org/src/backend/access/brin/brin_minmax_multi.c.gcov.html

I'm not sure, but I *suspect* that compilers usually check
  ranges->nranges==0
before reading ranges->values[2 * ranges->nranges - 1];

Especially since it's a static function.

Even if they didn't (say, under -O0), values[-1] would probably point to
a palloc header, which would be enough to "not crash" before returning
one line later.

But +1 to fix this and other issues even if they would never crash.
Thanks Justin.

Based on comments by David, I made a new patch.
To simplify I've included the 0001 in the 0002 patch.

Summary:
1. Once that ranges->nranges is invariant, avoid the loop if ranges->nranges <= 0.
This matches the current behavior.

2. Once that ranges->nsorted is invariant, avoid the loop if ranges->nsorted <= 0.
This matches the current behavior.

3. Remove the invariant cxt from ranges->nsorted loop.

4. Avoid possible overflows when using int to store length strings.

5. Avoid possible out-of-bounds when ranges->nranges == 0.

6. Avoid overhead when using unnecessary StringInfoData to convert Datum a to Text b.

Attached is 0002.

regards,
Ranier Vilela
Attachment
On Sat, 3 Sept 2022 at 00:37, Ranier Vilela <ranier.vf@gmail.com> wrote:
>> But +1 to fix this and other issues even if they would never crash.

Yeah, I don't think any of this coding would lead to a crash, but it's
pretty weird coding and we should fix it.

> 1. Once that ranges->nranges is invariant, avoid the loop if ranges->nranges <= 0.
> This matches the current behavior.
>
> 2. Once that ranges->nsorted is invariant, avoid the loop if ranges->nsorted <= 0.
> This matches the current behavior.
>
> 3. Remove the invariant cxt from ranges->nsorted loop.
>
> 4. Avoid possible overflows when using int to store length strings.
>
> 5. Avoid possible out-of-bounds when ranges->nranges == 0.
>
> 6. Avoid overhead when using unnecessary StringInfoData to convert Datum a to Text b.

I've ripped out #4 and #6 for now. I think we should do #6 in master
only, probably as part of a wider cleanup of StringInfo misusages.

I also spent some time trying to ensure I understand this code
correctly.  I was unable to work out what the nsorted field was from
just the comments. I needed to look at the code to figure it out, so I
think the comments for that field need to be improved.  A few of the
others were not that clear either.  I hope I've improved them in the
attached.

I was also a bit confused at various other comments. e.g:

/*
* Is the value greater than the maxval? If yes, we'll recurse to the
* right side of range array.
*/

I don't see any sort of recursion going on here. All I see are
skipping of values that are out of bounds of the lower bound of the
lowest range, and above the upper bound of the highest range.

I propose to backpatch the attached into v14 tomorrow morning (about
12 hours from now).

The only other weird coding I found was in brin_range_deserialize:

for (i = 0; (i < nvalues) && (!typbyval); i++)

I imagine most compilers would optimize that so that the typbyval
check is done before the first loop and not done on every loop, but I
don't think that coding practice helps the human readers out much. I
left that one alone, for now.

David

Attachment
Em seg., 5 de set. de 2022 às 07:15, David Rowley <dgrowleyml@gmail.com> escreveu:
On Sat, 3 Sept 2022 at 00:37, Ranier Vilela <ranier.vf@gmail.com> wrote:
>> But +1 to fix this and other issues even if they would never crash.

Yeah, I don't think any of this coding would lead to a crash, but it's
pretty weird coding and we should fix it.

> 1. Once that ranges->nranges is invariant, avoid the loop if ranges->nranges <= 0.
> This matches the current behavior.
>
> 2. Once that ranges->nsorted is invariant, avoid the loop if ranges->nsorted <= 0.
> This matches the current behavior.
>
> 3. Remove the invariant cxt from ranges->nsorted loop.
>
> 4. Avoid possible overflows when using int to store length strings.
>
> 5. Avoid possible out-of-bounds when ranges->nranges == 0.
>
> 6. Avoid overhead when using unnecessary StringInfoData to convert Datum a to Text b.

I've ripped out #4 and #6 for now. I think we should do #6 in master
only, probably as part of a wider cleanup of StringInfo misusages.

I also spent some time trying to ensure I understand this code
correctly.  I was unable to work out what the nsorted field was from
just the comments. I needed to look at the code to figure it out, so I
think the comments for that field need to be improved.  A few of the
others were not that clear either.  I hope I've improved them in the
attached.

I was also a bit confused at various other comments. e.g:

/*
* Is the value greater than the maxval? If yes, we'll recurse to the
* right side of range array.
*/
The second comment in the v3 patch, must be:
/*
* Is the value greater than the maxval? If yes, we'll recurse
* to the right side of the range array.
*/

I think this is copy-and-paste thinko with the word "minval".
 

I don't see any sort of recursion going on here. All I see are
skipping of values that are out of bounds of the lower bound of the
lowest range, and above the upper bound of the highest range.
I think this kind recursion, because the loop is restarted
with:
start = (midpoint + 1);
continue;
 

I propose to backpatch the attached into v14 tomorrow morning (about
12 hours from now).

The only other weird coding I found was in brin_range_deserialize:

for (i = 0; (i < nvalues) && (!typbyval); i++)

I imagine most compilers would optimize that so that the typbyval
check is done before the first loop and not done on every loop, but I
don't think that coding practice helps the human readers out much. I
left that one alone, for now.
Yeah, I prefer write:
if (!typbyval)
{
    for (i = 0; (i < nvalues); i++)

regards,
Ranier Vilela
On Mon, 5 Sept 2022 at 22:15, David Rowley <dgrowleyml@gmail.com> wrote:
> On Sat, 3 Sept 2022 at 00:37, Ranier Vilela <ranier.vf@gmail.com> wrote:
> > 6. Avoid overhead when using unnecessary StringInfoData to convert Datum a to Text b.
>
> I've ripped out #4 and #6 for now. I think we should do #6 in master
> only, probably as part of a wider cleanup of StringInfo misusages.

I've attached a patch which does various other string operation cleanups.

* This changes cstring_to_text() to use cstring_to_text_with_len when
we're working with a StringInfo and can just access the .len field.
* Uses appendStringInfoString instead of appendStringInfo when there
is special formatting.
* Uses pstrdup(str) instead of psprintf("%s", str).  In many cases
this will save a bit of memory
* Uses appendPQExpBufferChar instead of appendPQExpBufferStr() when
appending a 1 byte string.
* Uses appendStringInfoChar() instead of appendStringInfo() when no
formatting and string is 1 byte.
* Uses appendStringInfoChar() instead of appendStringInfoString() when
string is 1 byte.
* Uses appendPQExpBuffer(b , ...) instead of appendPQExpBufferStr(b, "%s" ...)

I'm aware there are a few other places that we could use
cstring_to_text_with_len() instead of cstring_to_text(). For example,
using the return value of snprintf() to obtain the length. I just
didn't do that because we need to take care to check the return value
isn't -1.

My grep patterns didn't account for these function calls spanning
multiple lines, so I may have missed a few.

David

Attachment
Em seg., 5 de set. de 2022 às 10:40, David Rowley <dgrowleyml@gmail.com> escreveu:
On Mon, 5 Sept 2022 at 22:15, David Rowley <dgrowleyml@gmail.com> wrote:
> On Sat, 3 Sept 2022 at 00:37, Ranier Vilela <ranier.vf@gmail.com> wrote:
> > 6. Avoid overhead when using unnecessary StringInfoData to convert Datum a to Text b.
>
> I've ripped out #4 and #6 for now. I think we should do #6 in master
> only, probably as part of a wider cleanup of StringInfo misusages.

I've attached a patch which does various other string operation cleanups.

* This changes cstring_to_text() to use cstring_to_text_with_len when
we're working with a StringInfo and can just access the .len field.
* Uses appendStringInfoString instead of appendStringInfo when there
is special formatting.
* Uses pstrdup(str) instead of psprintf("%s", str).  In many cases
this will save a bit of memory
* Uses appendPQExpBufferChar instead of appendPQExpBufferStr() when
appending a 1 byte string.
* Uses appendStringInfoChar() instead of appendStringInfo() when no
formatting and string is 1 byte.
* Uses appendStringInfoChar() instead of appendStringInfoString() when
string is 1 byte.
* Uses appendPQExpBuffer(b , ...) instead of appendPQExpBufferStr(b, "%s" ...)

I'm aware there are a few other places that we could use
cstring_to_text_with_len() instead of cstring_to_text(). For example,
using the return value of snprintf() to obtain the length. I just
didn't do that because we need to take care to check the return value
isn't -1.

My grep patterns didn't account for these function calls spanning
multiple lines, so I may have missed a few.
I did a search and found a few more places.
v1 attached.

regards,
Ranier Vilela
Attachment
On Tue, 6 Sept 2022 at 06:07, Ranier Vilela <ranier.vf@gmail.com> wrote:
> I did a search and found a few more places.
> v1 attached.

Thanks.  I've done a bit more looking and found a few more places that
we can improve and I've pushed the result.

It feels like it would be good if we had a way to detect a few of
these issues much earlier than we are currently.  There's been a long
series of commits fixing up this sort of thing.  If we had a tool to
parse the .c files and look for things like a function call to
appendPQExpBuffer() and appendStringInfo() with only 2 parameters (i.e
no va_arg arguments).

I'll hold off a few days before pushing the other patch.  Tom stamped
beta4 earlier, so I'll hold off until after the tag.

David



Em seg., 5 de set. de 2022 às 22:29, David Rowley <dgrowleyml@gmail.com> escreveu:
On Tue, 6 Sept 2022 at 06:07, Ranier Vilela <ranier.vf@gmail.com> wrote:
> I did a search and found a few more places.
> v1 attached.

Thanks.  I've done a bit more looking and found a few more places that
we can improve and I've pushed the result.
Thanks.
 

It feels like it would be good if we had a way to detect a few of
these issues much earlier than we are currently.  There's been a long
series of commits fixing up this sort of thing.  If we had a tool to
parse the .c files and look for things like a function call to
appendPQExpBuffer() and appendStringInfo() with only 2 parameters (i.e
no va_arg arguments).
StaticAssert could check va_arg no?
 
regards,
Ranier Vilela
On Tue, 6 Sept 2022 at 13:52, Ranier Vilela <ranier.vf@gmail.com> wrote:
>
> Em seg., 5 de set. de 2022 às 22:29, David Rowley <dgrowleyml@gmail.com> escreveu:
>> It feels like it would be good if we had a way to detect a few of
>> these issues much earlier than we are currently.  There's been a long
>> series of commits fixing up this sort of thing.  If we had a tool to
>> parse the .c files and look for things like a function call to
>> appendPQExpBuffer() and appendStringInfo() with only 2 parameters (i.e
>> no va_arg arguments).
>
> StaticAssert could check va_arg no?

I'm not sure exactly what you have in mind. If you think you have a
way to make that work, it would be good to see a patch with it.

David



Em seg., 5 de set. de 2022 às 23:02, David Rowley <dgrowleyml@gmail.com> escreveu:
On Tue, 6 Sept 2022 at 13:52, Ranier Vilela <ranier.vf@gmail.com> wrote:
>
> Em seg., 5 de set. de 2022 às 22:29, David Rowley <dgrowleyml@gmail.com> escreveu:
>> It feels like it would be good if we had a way to detect a few of
>> these issues much earlier than we are currently.  There's been a long
>> series of commits fixing up this sort of thing.  If we had a tool to
>> parse the .c files and look for things like a function call to
>> appendPQExpBuffer() and appendStringInfo() with only 2 parameters (i.e
>> no va_arg arguments).
>
> StaticAssert could check va_arg no?

I'm not sure exactly what you have in mind. If you think you have a
way to make that work, it would be good to see a patch with it.
I will study the matter.
But first, I would like to continue with this correction of using strings.
In the following cases:
fprintf -> fputs -> fputc
printf -> puts -> putchar

There are many occurrences, do you think it would be worth the effort?

regards,
Ranier Vilela
On Tue, 6 Sept 2022 at 23:25, Ranier Vilela <ranier.vf@gmail.com> wrote:
> But first, I would like to continue with this correction of using strings.
> In the following cases:
> fprintf -> fputs -> fputc
> printf -> puts -> putchar
>
> There are many occurrences, do you think it would be worth the effort?

I'm pretty unexcited about that. Quite a bit of churn and adding
another precedent that we currently have no good way to enforce or
maintain.

In addition to that, puts() is a fairly seldom used function, which
perhaps is because it's a bit quirky and appends a \n to the end of
the string. I'm just imagining all the bugs where we append an extra
newline. But, feel free to open another thread about it and see if you
can drum up any support.

David



On Tue, 6 Sept 2022 at 13:29, David Rowley <dgrowleyml@gmail.com> wrote:
> I'll hold off a few days before pushing the other patch.  Tom stamped
> beta4 earlier, so I'll hold off until after the tag.

I've now pushed this.

David



Em seg., 12 de set. de 2022 às 20:06, David Rowley <dgrowleyml@gmail.com> escreveu:
On Tue, 6 Sept 2022 at 13:29, David Rowley <dgrowleyml@gmail.com> wrote:
> I'll hold off a few days before pushing the other patch.  Tom stamped
> beta4 earlier, so I'll hold off until after the tag.

I've now pushed this.
Thank you David.
But the correct thing is to put you also as author, after all, there's more of your code there than mine.
Anyway, I appreciate the consideration.

regards,
Ranier Vilela
Em seg., 5 de set. de 2022 às 23:02, David Rowley <dgrowleyml@gmail.com> escreveu:
On Tue, 6 Sept 2022 at 13:52, Ranier Vilela <ranier.vf@gmail.com> wrote:
>
> Em seg., 5 de set. de 2022 às 22:29, David Rowley <dgrowleyml@gmail.com> escreveu:
>> It feels like it would be good if we had a way to detect a few of
>> these issues much earlier than we are currently.  There's been a long
>> series of commits fixing up this sort of thing.  If we had a tool to
>> parse the .c files and look for things like a function call to
>> appendPQExpBuffer() and appendStringInfo() with only 2 parameters (i.e
>> no va_arg arguments).
>
> StaticAssert could check va_arg no?

I'm not sure exactly what you have in mind. If you think you have a
way to make that work, it would be good to see a patch with it.
About this:

1. StaticAssertSmt can not help.
Although some posts on the web show that it is possible to calculate the number of arguments,
I didn't get anything useful.
So I left this option.

2. Compiler supports
Best solution.
But currently does not allow the suggestion to use another function.

3.  Owner tool
Temporary solution.
Can help, until the compilers build support for it.

So, I made one very simple tool, can do the basics here.
Not meant to be some universal lint.
It only processes previously coded functions.

pg_check test1.c
line (1): should be appendPQExpBufferStr?
line (2): should be appendPQExpBufferChar?
line (4): should be appendPQExpBufferStr?
line (5): should be appendPQExpBufferStr?

I don't think it's anywhere near the quality to be considered Postgres, but it could be a start.
If it helps, great, if not, fine.

regards,
Ranier Vilela
Attachment