Re: SP-GiST bug. - Mailing list pgsql-hackers

From Tom Lane
Subject Re: SP-GiST bug.
Date
Msg-id 14826.1401316338@sss.pgh.pa.us
Whole thread Raw
In response to SP-GiST bug.  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: SP-GiST bug.
List pgsql-hackers
Teodor Sigaev <teodor@sigaev.ru> writes:
> create table xxx ( t text);

> insert into xxx select 'x' || v::text from generate_series(1, 291) as v;
> insert into xxx  values ('');

> create index xxxidx on xxx using spgist (t);

Fun!

> And postgres will eat memory forever. It seems to me that checkAllTheSame 
> wrongly decides that all tuples are the same. All tuples except tuple to be 
> inserted have the same character 'x' and only new tuple has '\0'.

I don't think that checkAllTheSame is broken, but it probably would be
after this patch --- ISTM you're breaking the corner case described in the
header comment for checkAllTheSame, where we need to force splitting of a
set of existing tuples that the opclass can't distinguish.

What actually is broken, I think, is the spgist text opclass, because it's
using a zero node label for two different situations.  The scenario we're
looking at here is:

1. We call picksplit with all 292 of these entries.  Not surprisingly,
it puts the first 291 into a single node with label 'x' and the last one
into another node with label '\0'.

2. Because this is too many to fit on one index page, checkAllTheSame
decides it had better create an allTheSame tuple for the first 291 and
then try again to insert the empty string into that tuple.  While you
could argue whether this is *necessary*, it is not *incorrect*, so ISTM
the opclass had better cope with it.

3. We call spg_text_choose with the allTheSame tuple and the empty-string
value to be inserted.  It finds the empty-string value doesn't match
(since all the nodes have label 'x') so it requests a split:
    out->resultType = spgSplitTuple;    out->result.splitTuple.prefixHasPrefix = in->hasPrefix;
out->result.splitTuple.prefixPrefixDatum= in->prefixDatum;    out->result.splitTuple.nodeLabel = UInt8GetDatum('\0');
out->result.splitTuple.postfixHasPrefix = false;
 

After splitting, we have a new upper tuple with a single node with label
'\0', pointing to a new allTheSame tuple containing the original 291
entries.

4. We call spg_text_choose with the new upper tuple and the empty-string
value to be inserted.  It looks for a node labeled '\0', and finds it, so
it reports that we should descend into that node --- ie, down to the new
allTheSame tuple.

5. We call spg_text_choose with the new allTheSame tuple and the
empty-string value to be inserted.  This is exactly like the situation
at step 3, so we're in an infinite loop.

It doesn't look to me like the core code has done anything wrong here;
it just did what the opclass requested.  Rather, the problem is we're
using '\0' node label both to represent a "no op" label descent and
to represent "we're past the end of the string".

I'm afraid there may not be any way to fix this without breaking on-disk
compatibility for spgist text indexes :-(.  It seems like we have to
distinguish the case of zero-length string from that of a dummy label
for a pushed-down allTheSame tuple.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Extended Prefetching using Asynchronous IO - proposal and patch
Next
From: andres@anarazel.de (Andres Freund)
Date:
Subject: Re: CREATE REPLICATION SLOT fails on a timeout