Re: SP-GiST bug. - Mailing list pgsql-hackers

From Tom Lane
Subject Re: SP-GiST bug.
Date
Msg-id 18153.1401817060@sss.pgh.pa.us
Whole thread Raw
In response to Re: SP-GiST bug.  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: SP-GiST bug.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Teodor Sigaev <teodor@sigaev.ru> writes:
>> The point I'm making is that the scenario your test case exposes is not
>> an infinite loop of picksplits, but an infinite loop of choose calls.

> Thank you, now I see what I missed before. After some brainstorming, it's 
> possible to use '\0' only as end of string marker.  The idea is: when we split 
> allthesame tuple to upper/lower we copy node label to upper tuple instead of set 
> it to '\0'. Node labels of lower tuple will be unchanged but should be ignored 
> for reconstructionValue - and they are ignored because of allTheSame flag.  See 
> attached patch.

I don't believe this patch works at all.  In particular, for an allTheSame
node occurring at the root level, omitting the nodeChar from the
reconstructed value is certainly wrong since there was no parent that
could have had a label containing the same character.  More generally, the
patch is assuming that any allTheSame tuple is one that is underneath a
split node; but that's surely wrong (where did such a tuple come from
before the split happened?).

If the root case were the only possible one then we could fix the problem
by adding a test on whether level == 0, but AFAICS, allTheSame tuples can
also get created at lower tree levels.  All you need is a bunch of strings
that are duplicates for more than the maximum prefix length.

I think what we have to do is use a different dummy value for the node
label of a pushed-down allTheSame tuple than we do for end-of-string.
This requires widening the node labels from uint8 to (at least) int16.
However, I think that's essentially free because pass-by-value node
labels are stored as Datums anyway.  In fact, it might even be upward
compatible, at least for cases that aren't broken today.

> I rewrited a patch to fix missed way - allTheSame result of picksplit and tooBig 
> is set. I believe this patch is still needed because it could make a tree more 
> efficient as it was demonstrated for quad tree.

The question here continues to be how sure we are that the case described
in checkAllTheSame's header comment can't actually occur.  We certainly
thought it could when we originally wrote this stuff.  I think possibly
we were wrong, but I'd like to see a clear proof of that before we discuss
dropping the logic that's meant to cope with the situation.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_basebackup failed to back up large file
Next
From: Andres Freund
Date:
Subject: Hide 'Execution time' in EXPLAIN (COSTS OFF)