Thread: BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit

BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit

From
olaf.gw@googlemail.com
Date:
The following bug has been logged on the website:

Bug reference:      12694
Logged by:          Olaf Gawenda
Email address:      olaf.gw@googlemail.com
PostgreSQL version: 9.4.0
Operating system:   Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7
Description:

the following sequence of commands get a crash if the numer of result rows
is lower than gin_fuzzy_search_limit:

create table test (t text, ts_vec tsvector);

insert into test (t) values (),(),(), ...; -- test data not posted

update test set ts_vec = to_tsvector('english', t);

create index on test using gin(ts_vec);
analyze test;
set enable_seqscan = off;
set gin_fuzzy_search_limit = 1000;

select t from test where ts_vec @@ to_tsquery('english', '...');
On Thu, Jan 29, 2015 at 3:12 AM,  <olaf.gw@googlemail.com> wrote:
> Bug reference:      12694
> Logged by:          Olaf Gawenda
> Email address:      olaf.gw@googlemail.com
> PostgreSQL version: 9.4.0
> Operating system:   Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7
> Description:
>
> the following sequence of commands get a crash if the numer of result rows
> is lower than gin_fuzzy_search_limit:
>
> create table test (t text, ts_vec tsvector);
>
> insert into test (t) values (),(),(), ...; -- test data not posted
>
> update test set ts_vec = to_tsvector('english', t);
>
> create index on test using gin(ts_vec);
> analyze test;
> set enable_seqscan = off;
> set gin_fuzzy_search_limit = 1000;
>
> select t from test where ts_vec @@ to_tsquery('english', '...');

This can be reproduced easily with a test case like that:
create table aa as
select array[(random() * 1000000)::int,
(random() * 1000000)::int,
(random() * 1000000)::int] as a
from generate_series(1,10);
create index aai on aa using gin(a);
set gin_fuzzy_search_limit = 1;
set enable_seqscan = off;
select * from aa where a <@ array[1,2];
--
Michael

Re: BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit

From
Heikki Linnakangas
Date:
On 01/29/2015 03:09 PM, Michael Paquier wrote:
> On Thu, Jan 29, 2015 at 3:12 AM,  <olaf.gw@googlemail.com> wrote:
>> Bug reference:      12694
>> Logged by:          Olaf Gawenda
>> Email address:      olaf.gw@googlemail.com
>> PostgreSQL version: 9.4.0
>> Operating system:   Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7
>> Description:
>>
>> the following sequence of commands get a crash if the numer of result rows
>> is lower than gin_fuzzy_search_limit:
>>
>> create table test (t text, ts_vec tsvector);
>>
>> insert into test (t) values (),(),(), ...; -- test data not posted
>>
>> update test set ts_vec = to_tsvector('english', t);
>>
>> create index on test using gin(ts_vec);
>> analyze test;
>> set enable_seqscan = off;
>> set gin_fuzzy_search_limit = 1000;
>>
>> select t from test where ts_vec @@ to_tsquery('english', '...');
>
> This can be reproduced easily with a test case like that:
> create table aa as
> select array[(random() * 1000000)::int,
> (random() * 1000000)::int,
> (random() * 1000000)::int] as a
> from generate_series(1,10);
> create index aai on aa using gin(a);
> set gin_fuzzy_search_limit = 1;
> set enable_seqscan = off;
> select * from aa where a <@ array[1,2];

The problem is in startScan() function:

>     if (GinFuzzySearchLimit > 0)
>     {
>         /*
>          * If all of keys more than threshold we will try to reduce result, we
>          * hope (and only hope, for intersection operation of array our
>          * supposition isn't true), that total result will not more than
>          * minimal predictNumberResult.
>          */
>
>         for (i = 0; i < so->totalentries; i++)
>             if (so->entries[i]->predictNumberResult <= so->totalentries * GinFuzzySearchLimit)
>                 return;
>
>         for (i = 0; i < so->totalentries; i++)
>             if (so->entries[i]->predictNumberResult > so->totalentries * GinFuzzySearchLimit)
>             {
>                 so->entries[i]->predictNumberResult /= so->totalentries;
>                 so->entries[i]->reduceResult = TRUE;
>             }
>     }
>
>     for (i = 0; i < so->nkeys; i++)
>         startScanKey(ginstate, so, so->keys + i);
> }

If the early return is taken, startScanKey() is not called, and many
fields in the GinScanKey struct are left uninitialized. That causes the
segfault later.

This was not as big a problem before 9.4, because startScanKey() didn't
do very much. It just reset a few fields, which in a new scan were reset
already by ginNewScanKey(). But it is in fact possible to get an
assertion failure on 9.3 too, if the plan contains a re-scan of GIN
scan, and gin_fuzzy_search_limit is set. Attached is a script that does
it. Not sure why, but I'm not seeing a segfault or assert failure on
earlier branches. The plan of the segfaulting query looks identical
between 9.2 and 9.3, so perhaps there have been some changes to the
executor on how and when it calls rescan. Nevertheless, the code looks
just as wrong on earlier branches, so I think it should be fixed all the
way to 9.1 where that early return in startScan() was introduced.

The fix is simple: make sure that startScanKey() is always called, by
getting rid of the early return above. Attached. I'll apply this later
today or tomorrow unless someone sees a problem with this.

- Heikki


Attachment

Re: BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit

From
Heikki Linnakangas
Date:
On 01/29/2015 04:07 PM, Heikki Linnakangas wrote:
> The problem is in startScan() function:
>
>>     if (GinFuzzySearchLimit > 0)
>>     {
>>         /*
>>          * If all of keys more than threshold we will try to reduce result, we
>>          * hope (and only hope, for intersection operation of array our
>>          * supposition isn't true), that total result will not more than
>>          * minimal predictNumberResult.
>>          */
>>
>>         for (i = 0; i < so->totalentries; i++)
>>             if (so->entries[i]->predictNumberResult <= so->totalentries * GinFuzzySearchLimit)
>>                 return;
>>
>>         for (i = 0; i < so->totalentries; i++)
>>             if (so->entries[i]->predictNumberResult > so->totalentries * GinFuzzySearchLimit)
>>             {
>>                 so->entries[i]->predictNumberResult /= so->totalentries;
>>                 so->entries[i]->reduceResult = TRUE;
>>             }
>>     }
>>
>>     for (i = 0; i < so->nkeys; i++)
>>         startScanKey(ginstate, so, so->keys + i);
>> }
>
> If the early return is taken, startScanKey() is not called, and many
> fields in the GinScanKey struct are left uninitialized. That causes the
> segfault later.
>
> This was not as big a problem before 9.4, because startScanKey() didn't
> do very much. It just reset a few fields, which in a new scan were reset
> already by ginNewScanKey(). But it is in fact possible to get an
> assertion failure on 9.3 too, if the plan contains a re-scan of GIN
> scan, and gin_fuzzy_search_limit is set. Attached is a script that does
> it. Not sure why, but I'm not seeing a segfault or assert failure on
> earlier branches. The plan of the segfaulting query looks identical
> between 9.2 and 9.3, so perhaps there have been some changes to the
> executor on how and when it calls rescan.

Never mind, I had some debugging code in place in my 9.3 branch that
caused the assertion failure. Still, it looks broken on 9.3 and below,
too. Either that, or GinNewKey() check in gingetbitmap is always true,
and the startScanKey() calls were unnecessary.

- Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> The fix is simple: make sure that startScanKey() is always called, by
> getting rid of the early return above. Attached. I'll apply this later
> today or tomorrow unless someone sees a problem with this.

Another, even simpler fix would be to just move the startScanKey()
call loop to before the "if (GinFuzzySearchLimit > 0)" block.
Is there a particular reason why it's a good idea to do things in
the current order?  It almost looks like a patch application error
as it stands.

With either fix, I concur that we should back-patch it.  It's not at all
clear how come older branches don't fail because of this.

            regards, tom lane

Re: BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit

From
Heikki Linnakangas
Date:
On 01/29/2015 05:26 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> The fix is simple: make sure that startScanKey() is always called, by
>> getting rid of the early return above. Attached. I'll apply this later
>> today or tomorrow unless someone sees a problem with this.
>
> Another, even simpler fix would be to just move the startScanKey()
> call loop to before the "if (GinFuzzySearchLimit > 0)" block.
> Is there a particular reason why it's a good idea to do things in
> the current order?  It almost looks like a patch application error
> as it stands.

The code in startScanKey() uses the predictNumberResult estimates, which
the loop modifies, so it would change how that behaves. I'm not sure if
it would actually be better that way though; it's not clear to me how
the fuzzy search limit should interact with the fast scan code. It won't
affect correctness either way.

The for()-loop sets the so->entries[i]-reduceResult fields, so it makes
sense to do it right after the startScanEntry() calls. Otherwise the
logic would be "1. do some initialization on entries, 2. initialize
keys, 3. do more initialization on entries", which would be a bit strange.

<looks at the loop a bit more>

Hmm, isn't the if-check in the second loop unnecessary? The first loop
checks that so->entries[i]->predictNumberResult > so->totalentries *
GinFuzzySearchLimit for all entries. If there are any where that doesn't
hold, it returns (which is a bad idea). There's no need to check for the
same condition again in the second loop.

- Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 01/29/2015 05:26 PM, Tom Lane wrote:
>> Another, even simpler fix would be to just move the startScanKey()
>> call loop to before the "if (GinFuzzySearchLimit > 0)" block.
>> Is there a particular reason why it's a good idea to do things in
>> the current order?  It almost looks like a patch application error
>> as it stands.

> The code in startScanKey() uses the predictNumberResult estimates, which
> the loop modifies, so it would change how that behaves.

Oh!  Sorry, I'd missed looking at what the qsort comparator did.

> I'm not sure if
> it would actually be better that way though; it's not clear to me how
> the fuzzy search limit should interact with the fast scan code.

Yeah, it might be better but it's not very clear what the implications
are.  Probably shouldn't touch that as part of an emergency bug fix.

            regards, tom lane

Re: BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit

From
Heikki Linnakangas
Date:
On 01/29/2015 06:04 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> On 01/29/2015 05:26 PM, Tom Lane wrote:
>> I'm not sure if
>> it would actually be better that way though; it's not clear to me how
>> the fuzzy search limit should interact with the fast scan code.
>
> Yeah, it might be better but it's not very clear what the implications
> are.  Probably shouldn't touch that as part of an emergency bug fix.

Committed and backpatched a minimal fix.

For master and 9.4, I'm thinking of applying the attached. It makes it
clear that startScan() is not used to re-start a scan with existing scan
keys, but is always called on a newly initialized scan keys.

It also plugs the obvious leaking of the arrays. I'll look at the other
memory leaks separately, but this seems appropriate for 9.4.

- Heikki


Attachment
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> For master and 9.4, I'm thinking of applying the attached. It makes it
> clear that startScan() is not used to re-start a scan with existing scan
> keys, but is always called on a newly initialized scan keys.

Looks reasonable to me, but should ginFreeScanKeys() null out the pointers
after freeing them, to be sure we find any incorrect accesses?  It might
not be worth the trouble; but if you have any doubts at all about the
order of operations this seems like a good safety feature.

Also, in the department of nitpicks, I'd do this:

 {
     IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
+    GinScanOpaque so = (GinScanOpaque) scan->opaque;
     TIDBitmap  *tbm = (TIDBitmap *) PG_GETARG_POINTER(1);
     int64        ntids;

more like this:

 {
     IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
     TIDBitmap  *tbm = (TIDBitmap *) PG_GETARG_POINTER(1);
+    GinScanOpaque so = (GinScanOpaque) scan->opaque;
     int64        ntids;

I think of the PG_GETARG calls as being an ugly stepchild of a proper
function header declaration, and as such, they should come first unless
there is an unavoidable reason not to.

            regards, tom lane

Re: BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit

From
Heikki Linnakangas
Date:
On 01/29/2015 08:59 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> For master and 9.4, I'm thinking of applying the attached. It makes it
>> clear that startScan() is not used to re-start a scan with existing scan
>> keys, but is always called on a newly initialized scan keys.
>
> Looks reasonable to me, but should ginFreeScanKeys() null out the pointers
> after freeing them, to be sure we find any incorrect accesses?  It might
> not be worth the trouble; but if you have any doubts at all about the
> order of operations this seems like a good safety feature.

Nah, I'm not worried about that. ginFreeScanKeys() frees the whole
'keys' array, so we'd have bigger problems if there was a
reference-after-free.

> Also, in the department of nitpicks, I'd do this:
>
>   {
>       IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
> +    GinScanOpaque so = (GinScanOpaque) scan->opaque;
>       TIDBitmap  *tbm = (TIDBitmap *) PG_GETARG_POINTER(1);
>       int64        ntids;
>
> more like this:
>
>   {
>       IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
>       TIDBitmap  *tbm = (TIDBitmap *) PG_GETARG_POINTER(1);
> +    GinScanOpaque so = (GinScanOpaque) scan->opaque;
>       int64        ntids;
>
> I think of the PG_GETARG calls as being an ugly stepchild of a proper
> function header declaration, and as such, they should come first unless
> there is an unavoidable reason not to.

Ok, committed with that fix.

- Heikki