Re: SP-GiST bug. - Mailing list pgsql-hackers

From Tom Lane
Subject Re: SP-GiST bug.
Date
Msg-id 15869.1401393098@sss.pgh.pa.us
Whole thread Raw
In response to Re: SP-GiST bug.  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: SP-GiST bug.
List pgsql-hackers
Teodor Sigaev <teodor@sigaev.ru> writes:
>> I don't think that checkAllTheSame is broken, but it probably would be
>> after this patch --- ISTM you're breaking the corner case described in the
>> header comment for checkAllTheSame, where we need to force splitting of a
>> set of existing tuples that the opclass can't distinguish.

> INHO, this patch will not break - because it forces (in corner case) to create a 
> node for new tuple but exclude it from list to insert. On next iteration we will 
> find a needed node with empty list.

> Comment:
>   * If we know that the leaf tuples wouldn't all fit on one page, then we
>   * exclude the last tuple (which is the incoming new tuple that forced a split)
>   * from the check to see if more than one node is used.  The reason for this
>   * is that if the existing tuples are put into only one chain, then even if
>   * we move them all to an empty page, there would still not be room for the
>   * new tuple, so we'd get into an infinite loop of picksplit attempts.

> If we reserve a node for new tuple then on next iteration we will not fall into 
> picksplit again - we will have an empty list. So new tuple will fall into 
> another page.

The point I'm making is that the scenario your test case exposes is not
an infinite loop of picksplits, but an infinite loop of choose calls.
There are certainly ways to get into that loop that don't involve this
specific checkAllTheSame logic.  Here's an example:

regression=# create table xxx ( t text);
CREATE TABLE
regression=# create index xxxidx on xxx using spgist (t);
CREATE INDEX
regression=# insert into xxx select repeat('x',4000);
INSERT 0 1
regression=# insert into xxx select repeat('x',4000);
INSERT 0 1
regression=# insert into xxx select repeat('x',4000);
INSERT 0 1
regression=# insert into xxx select repeat('x',3996);

In this example, we get a picksplit at the third insert, and here we
*must* take the allTheSame path because the values are, in fact, all the
same (the SPGIST_MAX_PREFIX_LENGTH constraint means we can only put 3996
characters into the prefix, and then the 3997'th character of each value
is 'x').  Then the fourth insert triggers this same infinite loop, because
spg_text_choose is asked how to insert the slightly-shorter value into the
allTheSame tuple, and it does the wrong thing.

It's certainly possible that we could/should change what checkAllTheSame
is doing --- on re-reading the comment, I'm not really sure that the
scenario it's concerned about can happen.  However, that will not fix
the bug in spgtextproc.c; it will only block one avenue for reaching
the bug.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: recovery testing for beta
Next
From: Tom Lane
Date:
Subject: Re: recovery testing for beta