Thread: BUG #17950: Incorrect memory access in gtsvector_picksplit()

BUG #17950: Incorrect memory access in gtsvector_picksplit()

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17950
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 16beta1
Operating system:   Ubuntu 22.04
Description:

The following script:
CREATE TABLE test_tsvector(t text, a tsvector);

SELECT 'COPY test_tsvector FROM ''.../src/test/regress/data/tsearch.data'';'
FROM generate_series(1, 19)
\gexec

CREATE TRIGGER tsvectorupdate
        BEFORE UPDATE OR INSERT ON test_tsvector
        FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(a,
'pg_catalog.english', t);

SELECT 'COPY test_tsvector FROM ''.../src/test/regress/data/tsearch.data'';'
FROM generate_series(1, 38)
\gexec

CREATE INDEX gistidx ON test_tsvector USING gist (a
tsvector_ops(siglen=1));
-- I believe it's not the only way to get a data pattern needed, so
-- probably the repro could be simplified, if desired.

triggers a Valgrind-detected memory access error:
==00:00:00:53.414 342514== Invalid read of size 1
==00:00:00:53.414 342514==    at 0x79787D: pg_popcount (pg_bitutils.c:332)
==00:00:00:53.414 342514==    by 0x6F93C6: sizebitvec (tsgistidx.c:488)
==00:00:00:53.414 342514==    by 0x6FA24A: gtsvector_picksplit
(tsgistidx.c:731)
==00:00:00:53.414 342514==    by 0x74146D: FunctionCall2Coll (fmgr.c:1132)
==00:00:00:53.414 342514==    by 0x217E65: gistUserPicksplit
(gistsplit.c:433)
==00:00:00:53.414 342514==    by 0x2184A5: gistSplitByKey
(gistsplit.c:697)
==00:00:00:53.414 342514==    by 0x20C237: gistSplit (gist.c:1450)
==00:00:00:53.414 342514==    by 0x20CA32: gistplacetopage (gist.c:309)
==00:00:00:53.414 342514==    by 0x20DBFB: gistinserttuples (gist.c:1278)
==00:00:00:53.414 342514==    by 0x20DF85: gistfinishsplit (gist.c:1376)
==00:00:00:53.414 342514==    by 0x20DC79: gistinserttuples (gist.c:1305)
==00:00:00:53.414 342514==    by 0x20E39E: gistinserttuple (gist.c:1231)
==00:00:00:53.414 342514==  Address 0x7386c68 is 16,024 bytes inside a block
of size 16,384 alloc'd
==00:00:00:53.414 342514==    at 0x4848899: malloc
(vg_replace_malloc.c:381)
==00:00:00:53.414 342514==    by 0x7602C7: AllocSetAlloc (aset.c:924)
==00:00:00:53.414 342514==    by 0x76C26A: palloc (mcxt.c:1240)
==00:00:00:53.414 342514==    by 0x20C24A: gistSplit (gist.c:1453)
==00:00:00:53.414 342514==    by 0x20CA32: gistplacetopage (gist.c:309)
==00:00:00:53.414 342514==    by 0x20DBFB: gistinserttuples (gist.c:1278)
==00:00:00:53.414 342514==    by 0x20E39E: gistinserttuple (gist.c:1231)
==00:00:00:53.414 342514==    by 0x20E940: gistdoinsert (gist.c:886)
==00:00:00:53.414 342514==    by 0x21152B: gistBuildCallback
(gistbuild.c:929)
==00:00:00:53.414 342514==    by 0x24420D: heapam_index_build_range_scan
(heapam_handler.c:1708)
==00:00:00:53.414 342514==    by 0x2119E4: table_index_build_scan
(tableam.h:1781)
==00:00:00:53.414 342514==    by 0x2119E4: gistbuild (gistbuild.c:317)
==00:00:00:53.414 342514==    by 0x2E06F5: index_build (index.c:3032)
==00:00:00:53.414 342514==
(Several runs might be required for the issue reproduction.)

With the additional debug logging in gtsvector_picksplit():
@@ -722,6 +722,11 @@ gtsvector_picksplit(PG_FUNCTION_ARGS)
             continue;
         }
 
+if (!cache[j].allistrue) {
+elog(LOG, "!!!gtsvector_picksplit| j: %d, cache[j].sign: %p,
GETSIGN(cache[j].sign): %p", j, cache[j].sign, GETSIGN(cache[j].sign));
+VALGRIND_CHECK_MEM_IS_DEFINED(GETSIGN(cache[j].sign), siglen);
+}
+
         if (ISALLTRUE(datum_l) || cache[j].allistrue)
         {
             if (ISALLTRUE(datum_l) && cache[j].allistrue)

(and #include "utils/memdebug.h")
I see:
cache[j].sign: 0x723a999, GETSIGN(cache[j].sign): 0x723a9a1

==00:00:00:18.356 351519== Unaddressable byte(s) found during client check
request
==00:00:00:18.356 351519==    at 0x6FA2BE: gtsvector_picksplit
(tsgistidx.c:727)
...
==00:00:00:18.356 351519==  Address 0x723a9a1 is 4,977 bytes inside a block
of size 8,192 alloc'd

Reproduced starting from 911e70207.

But with the slight variation:
CREATE INDEX gistidx ON test_tsvector USING gist (a tsvector_ops);
and the debugging patch:
@@ -711,6 +712,9 @@ gtsvector_picksplit(PG_FUNCTION_ARGS)
         else
             size_alpha = hemdistsign(cache[j].sign, GETSIGN(datum_l));
 
+if (!cache[j].allistrue)
+VALGRIND_CHECK_MEM_IS_DEFINED(GETSIGN(cache[j].sign), SIGLEN);
+
         if (ISALLTRUE(datum_r) || cache[j].allistrue)
         {
             if (ISALLTRUE(datum_r) && cache[j].allistrue)

it's reproduced even on 911e70207~1:
==00:00:00:15.858 370963== Unaddressable byte(s) found during client check
request
==00:00:00:15.858 370963==    at 0x636E0E: gtsvector_picksplit
(tsgistidx.c:716)
==00:00:00:15.858 370963==    by 0x67B6D3: FunctionCall2Coll (fmgr.c:1162)
==00:00:00:15.858 370963==    by 0x1FB662: gistUserPicksplit
(gistsplit.c:433)
==00:00:00:15.858 370963==    by 0x1FBCCF: gistSplitByKey
(gistsplit.c:697)
==00:00:00:15.858 370963==    by 0x1F088A: gistSplit (gist.c:1441)
==00:00:00:15.858 370963==    by 0x1F0F3C: gistplacetopage (gist.c:302)
==00:00:00:15.858 370963==    by 0x1F202F: gistinserttuples (gist.c:1270)
==00:00:00:15.858 370963==    by 0x1F273A: gistinserttuple (gist.c:1223)
==00:00:00:15.858 370963==    by 0x1F2BCE: gistdoinsert (gist.c:879)
==00:00:00:15.858 370963==    by 0x1F4E8F: gistBuildCallback
(gistbuild.c:470)
==00:00:00:15.858 370963==    by 0x2262BD: heapam_index_build_range_scan
(heapam_handler.c:1659)
==00:00:00:15.858 370963==    by 0x1F50E7: table_index_build_scan
(tableam.h:1540)
==00:00:00:15.858 370963==    by 0x1F50E7: gistbuild (gistbuild.c:196)
==00:00:00:15.858 370963==  Address 0x11a3a68b is 4,939 bytes inside a block
of size 16,384 alloc'd
==00:00:00:15.858 370963==    at 0x4848899: malloc
(vg_replace_malloc.c:381)
==00:00:00:15.858 370963==    by 0x699954: AllocSetAlloc (aset.c:941)
==00:00:00:15.858 370963==    by 0x6A26CC: palloc (mcxt.c:963)
==00:00:00:15.858 370963==    by 0x636990: gtsvector_picksplit
(tsgistidx.c:613)
...

So it looks like this defect exists in core since 140d4ebcb.

IIUC, using the GETSIGN macro with cache[j].sign is a mistake -- it
erroneously adds 8 to an address of the sign field, so for the last j
it leads to an out-of-bounds memory read.


Re: BUG #17950: Incorrect memory access in gtsvector_picksplit()

From
Alexander Lakhin
Date:
29.05.2023 23:00, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      17950

I managed to reduce the reproducer to the following:
CREATE TABLE tst(t tsvector);
INSERT INTO tst SELECT array_to_string(array(SELECT 'a' || x::text FROM generate_series(1, 125) x), ' ')::tsvector FROM

generate_series(1, 3000);
INSERT INTO tst SELECT '' FROM generate_series(1, 100);
CREATE INDEX gistidx ON tst USING gist (t tsvector_ops(siglen=1));

(Sorry for the previous messy script.)

A trivial fix for the issue is attached.

BTW, when looking at the index contents (page 0) using pageinspect, I saw:
  itemoffset |    ctid     | itemlen | dead | keys
------------+-------------+---------+------+-----------------------------------
           1 | (367,65535) |      16 | f    | (a)=("0 true bits, 0 false bits")
           2 | (368,65535) |      16 | f    | (a)=("0 true bits, 0 false bits")
The text describing keys looks confusing, just as if siglen was 0, but it's
not the case.
This is explained by the code:
         int            siglen = GETSIGLEN(key);
         int            cnttrue = (ISALLTRUE(key)) ? SIGLENBIT(siglen) : sizebitvec(GETSIGN(key), siglen);

         sprintf(outbuf, SINGOUTSTR, cnttrue, (int) SIGLENBIT(siglen) - cnttrue);

When ISALLTRUE, the code tries to calculate bit count from siglen, but
siglen is 0 in this case.
So maybe fix it in passing too...

Best regards,
Alexander
Attachment

Re: BUG #17950: Incorrect memory access in gtsvector_picksplit()

From
Alexander Lakhin
Date:
17.06.2023 17:00, Alexander Lakhin wrote:
> 29.05.2023 23:00, PG Bug reporting form wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      17950
>
> I managed to reduce the reproducer to the following:
> CREATE TABLE tst(t tsvector);
> INSERT INTO tst SELECT array_to_string(array(SELECT 'a' || x::text FROM generate_series(1, 125) x), ' ')::tsvector 
> FROM generate_series(1, 3000);
> INSERT INTO tst SELECT '' FROM generate_series(1, 100);
> CREATE INDEX gistidx ON tst USING gist (t tsvector_ops(siglen=1));
>
>
> A trivial fix for the issue is attached.
>

I can also propose a regression test addition that demonstrates the valgrind
complaint and also the output of gtsvectorout() for the case ISALLTRUE and
the opposite.

This addition increases the duration of `make check -C contrib/pageinspect`
under valgrind by 7-8 seconds for me:
ok 5         - gist                                     2496 ms
->
ok 5         - gist                                     9890 ms

In absence of any objections or other propositions, I'm inclined to register
this bugfix on the commitfest.

Best regards,
Alexander
Attachment

Re: BUG #17950: Incorrect memory access in gtsvector_picksplit()

From
Tom Lane
Date:
Alexander Lakhin <exclusion@gmail.com> writes:
> I can also propose a regression test addition that demonstrates the valgrind
> complaint and also the output of gtsvectorout() for the case ISALLTRUE and
> the opposite.

OK, but ...

> This addition increases the duration of `make check -C contrib/pageinspect`
> under valgrind by 7-8 seconds for me:

[ ie, more than triple its previous runtime ]

... that seems completely unacceptable cost-wise.  I'd be inclined
to commit the fix without a supporting test case, instead of that.

Given that the misapplication of GETSIGN is causing an incorrect
pointer to be passed to sizebitvec(), how come the error is not
leading to outright wrong answers?  I guess because it's in
picksplit, the worst outcome normally is a poor choice of split,
so maybe exhibiting wrong behavior in a detectable way is hard.

> In absence of any objections or other propositions, I'm inclined to register
> this bugfix on the commitfest.

Please do that in any case, so we don't forget about it.

            regards, tom lane



Re: BUG #17950: Incorrect memory access in gtsvector_picksplit()

From
Alexander Lakhin
Date:
Hello Tom,

13.08.2023 17:35, Tom Lane wrote:
> Alexander Lakhin <exclusion@gmail.com> writes:
>> I can also propose a regression test addition that demonstrates the valgrind
>> complaint and also the output of gtsvectorout() for the case ISALLTRUE and
>> the opposite.
> OK, but ...
>
>> This addition increases the duration of `make check -C contrib/pageinspect`
>> under valgrind by 7-8 seconds for me:
> [ ie, more than triple its previous runtime ]

Yes, unfortunately.

> ... that seems completely unacceptable cost-wise.  I'd be inclined
> to commit the fix without a supporting test case, instead of that.

I've tried to make that addition as valuable as possible in the context of
this issue, but as it is too expensive (to be honest, I was bolstered by
the gin test duration (~10 secs too)), then let's leave it aside.

> Given that the misapplication of GETSIGN is causing an incorrect
> pointer to be passed to sizebitvec(), how come the error is not
> leading to outright wrong answers?  I guess because it's in
> picksplit, the worst outcome normally is a poor choice of split,
> so maybe exhibiting wrong behavior in a detectable way is hard.

(In fact, I stopped my previous research when I had come to the conclusion
that generating data pattern needed to demonstrate wrong answers or at least
an inefficient split require many more data rows than we can afford in a
regression test.)

>> In absence of any objections or other propositions, I'm inclined to register
>> this bugfix on the commitfest.
> Please do that in any case, so we don't forget about it.

Thank you!
Done:
https://commitfest.postgresql.org/44/4498/

Best regards,
Alexander



Re: BUG #17950: Incorrect memory access in gtsvector_picksplit()

From
Michael Paquier
Date:
On Sun, Aug 13, 2023 at 09:00:01PM +0300, Alexander Lakhin wrote:
> I've tried to make that addition as valuable as possible in the context of
> this issue, but as it is too expensive (to be honest, I was bolstered by
> the gin test duration (~10 secs too)), then let's leave it aside.

I have applied 0001 down to 11 to get the basic fix in place, but the
regression tests are really too expensive compared to the value they
bring.

Regarding the changes in gtsvectorout(), the output produced is indeed
confusing when ISALLTRUE is set.

-        int            siglen = GETSIGLEN(key);
-        int            cnttrue = (ISALLTRUE(key)) ? SIGLENBIT(siglen) : sizebitvec(GETSIGN(key), siglen);
+        if (ISALLTRUE(key))
+            sprintf(outbuf, "all true bits");
+        else
+        {
+            int            siglen = GETSIGLEN(key);
+            int            cnttrue = (ISALLTRUE(key)) ? SIGLENBIT(siglen) : sizebitvec(GETSIGN(key), siglen);

-        sprintf(outbuf, SINGOUTSTR, cnttrue, (int) SIGLENBIT(siglen) - cnttrue);
+            sprintf(outbuf, SINGOUTSTR, cnttrue, (int) SIGLENBIT(siglen) - cnttrue);
+        }

In the false branch of ISALLTRUE(key), why isn't cnttrue always
calculated with sizebitvec()?  It's also not something I would
backpatch.  That's confusing, for sure, but there is also the argument
of keeping a consistent output in the stable branches.
--
Michael

Attachment

Re: BUG #17950: Incorrect memory access in gtsvector_picksplit()

From
Alexander Lakhin
Date:
Hi Mchael,

Thank you for committing the fix!

04.09.2023 09:35, Michael Paquier wrote:
> Regarding the changes in gtsvectorout(), the output produced is indeed
> confusing when ISALLTRUE is set.
>
> -        int            siglen = GETSIGLEN(key);
> -        int            cnttrue = (ISALLTRUE(key)) ? SIGLENBIT(siglen) : sizebitvec(GETSIGN(key), siglen);
> +        if (ISALLTRUE(key))
> +            sprintf(outbuf, "all true bits");
> +        else
> +        {
> +            int            siglen = GETSIGLEN(key);
> +            int            cnttrue = (ISALLTRUE(key)) ? SIGLENBIT(siglen) : sizebitvec(GETSIGN(key), siglen);
>   
> -        sprintf(outbuf, SINGOUTSTR, cnttrue, (int) SIGLENBIT(siglen) - cnttrue);
> +            sprintf(outbuf, SINGOUTSTR, cnttrue, (int) SIGLENBIT(siglen) - cnttrue);
> +        }
>
> In the false branch of ISALLTRUE(key), why isn't cnttrue always
> calculated with sizebitvec()?

Yes, that expression should be simpler there. Please look at the v2 attached.

> It's also not something I would
> backpatch.  That's confusing, for sure, but there is also the argument
> of keeping a consistent output in the stable branches.

I agree. Let's not backpatch it.

Best regards,
Alexander
Attachment

Re: BUG #17950: Incorrect memory access in gtsvector_picksplit()

From
Michael Paquier
Date:
On Mon, Sep 04, 2023 at 05:00:00PM +0300, Alexander Lakhin wrote:
> 04.09.2023 09:35, Michael Paquier wrote:
>> It's also not something I would
>> backpatch.  That's confusing, for sure, but there is also the argument
>> of keeping a consistent output in the stable branches.
>
> I agree. Let's not backpatch it.

Applied on HEAD, then.  This item is now completely closed.
--
Michael

Attachment