Re: BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit
Msg-id 54CA3EB5.3030804@vmware.com
In response to Re: BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Re: BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On 01/29/2015 03:09 PM, Michael Paquier wrote:
> On Thu, Jan 29, 2015 at 3:12 AM,  <olaf.gw@googlemail.com> wrote:
>> Bug reference:      12694
>> Logged by:          Olaf Gawenda
>> Email address:      olaf.gw@googlemail.com
>> PostgreSQL version: 9.4.0
>> Operating system:   Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7
>> Description:
>>
>> the following sequence of commands causes a crash if the number of result rows
>> is lower than gin_fuzzy_search_limit:
>>
>> create table test (t text, ts_vec tsvector);
>>
>> insert into test (t) values (),(),(), ...; -- test data not posted
>>
>> update test set ts_vec = to_tsvector('english', t);
>>
>> create index on test using gin(ts_vec);
>> analyze test;
>> set enable_seqscan = off;
>> set gin_fuzzy_search_limit = 1000;
>>
>> select t from test where ts_vec @@ to_tsquery('english', '...');
>
> This can be reproduced easily with a test case like that:
> create table aa as
> select array[(random() * 1000000)::int,
> (random() * 1000000)::int,
> (random() * 1000000)::int] as a
> from generate_series(1,10);
> create index aai on aa using gin(a);
> set gin_fuzzy_search_limit = 1;
> set enable_seqscan = off;
> select * from aa where a <@ array[1,2];

The problem is in the startScan() function:

>     if (GinFuzzySearchLimit > 0)
>     {
>         /*
>          * If all of keys more than threshold we will try to reduce result, we
>          * hope (and only hope, for intersection operation of array our
>          * supposition isn't true), that total result will not more than
>          * minimal predictNumberResult.
>          */
>
>         for (i = 0; i < so->totalentries; i++)
>             if (so->entries[i]->predictNumberResult <= so->totalentries * GinFuzzySearchLimit)
>                 return;
>
>         for (i = 0; i < so->totalentries; i++)
>             if (so->entries[i]->predictNumberResult > so->totalentries * GinFuzzySearchLimit)
>             {
>                 so->entries[i]->predictNumberResult /= so->totalentries;
>                 so->entries[i]->reduceResult = TRUE;
>             }
>     }
>
>     for (i = 0; i < so->nkeys; i++)
>         startScanKey(ginstate, so, so->keys + i);
> }

If the early return is taken, startScanKey() is not called, and many
fields in the GinScanKey struct are left uninitialized. That causes the
segfault later.
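
To make the failure mode concrete, here is a minimal standalone sketch of that control flow. It is not PostgreSQL code: ToyScanKey, ToyScan and the toy_* functions are hypothetical stand-ins invented for illustration, the "started" flag stands in for the fields that startScanKey() sets up, and the reduceResult flag is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for GinScanKey; the real struct has many more fields. */
typedef struct ToyScanKey
{
    bool        started;    /* set only by toy_start_scan_key() */
} ToyScanKey;

/* Hypothetical stand-in for the scan state ("so" in the quoted code). */
typedef struct ToyScan
{
    int         nkeys;
    ToyScanKey *keys;
    int         totalentries;
    long       *predicted;  /* plays the role of predictNumberResult */
} ToyScan;

static void
toy_start_scan_key(ToyScanKey *key)
{
    key->started = true;
}

/* Same control flow as the quoted startScan(), including the early return. */
static void
toy_start_scan_buggy(ToyScan *so, long fuzzy_limit)
{
    int         i;

    if (fuzzy_limit > 0)
    {
        for (i = 0; i < so->totalentries; i++)
            if (so->predicted[i] <= so->totalentries * fuzzy_limit)
                return;     /* leaves every key unstarted */

        for (i = 0; i < so->totalentries; i++)
            if (so->predicted[i] > so->totalentries * fuzzy_limit)
                so->predicted[i] /= so->totalentries;
    }

    for (i = 0; i < so->nkeys; i++)
        toy_start_scan_key(&so->keys[i]);
}

/* Counts how many keys have been started. */
static int
toy_count_started(const ToyScan *so)
{
    int         i;
    int         n = 0;

    assert(so != NULL);
    for (i = 0; i < so->nkeys; i++)
        if (so->keys[i].started)
            n++;
    return n;
}
```

With one entry predicted at 10 rows and a limit of 1000, toy_start_scan_buggy() returns before the final loop and no key is started. With the limit set to 0 the same call starts every key. The sketch zero-initializes the flag so the skipped setup is observable without reading uninitialized memory.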

This was not as big a problem before 9.4, because startScanKey() didn't
do very much. It just reset a few fields, which in a new scan were reset
already by ginNewScanKey(). But it is in fact possible to get an
assertion failure on 9.3 too, if the plan contains a re-scan of a GIN
scan and gin_fuzzy_search_limit is set. Attached is a script that does
it. Not sure why, but I'm not seeing a segfault or assertion failure on
branches before 9.3. The plan of the segfaulting query looks identical
between 9.2 and 9.3, so perhaps there have been some changes to the
executor on how and when it calls rescan. Nevertheless, the code looks
just as wrong on earlier branches, so I think it should be fixed all the
way to 9.1 where that early return in startScan() was introduced.

The fix is simple: make sure that startScanKey() is always called, by
getting rid of the early return above. Attached. I'll apply this later
today or tomorrow unless someone sees a problem with this.
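
For illustration only, one way to drop the early return is to record the decision in a local flag and always fall through to the key loop. The sketch below is a standalone model under that assumption, not the attached patch, which may be structured differently. DemoScanKey, DemoScan and the demo_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for GinScanKey. */
typedef struct DemoScanKey
{
    bool        started;    /* set by demo_start_scan_key() */
} DemoScanKey;

/* Hypothetical stand-in for the scan state ("so" in the quoted code). */
typedef struct DemoScan
{
    int          nkeys;
    DemoScanKey *keys;
    int          totalentries;
    long        *predicted; /* plays the role of predictNumberResult */
    bool        *reduce;    /* plays the role of reduceResult */
} DemoScan;

static void
demo_start_scan_key(DemoScanKey *key)
{
    key->started = true;
}

/* Variant without the early return: the key loop always runs. */
static void
demo_start_scan_fixed(DemoScan *so, long fuzzy_limit)
{
    int         i;

    assert(so != NULL);

    if (fuzzy_limit > 0)
    {
        bool        reduce = true;

        /* Reduce only if every entry is above the threshold. */
        for (i = 0; i < so->totalentries; i++)
        {
            if (so->predicted[i] <= so->totalentries * fuzzy_limit)
            {
                reduce = false;
                break;
            }
        }

        if (reduce)
        {
            for (i = 0; i < so->totalentries; i++)
            {
                so->predicted[i] /= so->totalentries;
                so->reduce[i] = true;
            }
        }
    }

    /* Reached on every path, unlike in the quoted code. */
    for (i = 0; i < so->nkeys; i++)
        demo_start_scan_key(&so->keys[i]);
}
```

In this model the keys are started whether or not the estimates are reduced, so the decision about reducing results no longer affects key initialization.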

- Heikki

