Thread: issues with range types, btree_gist and constraints

issues with range types, btree_gist and constraints

From
Tomas Vondra
Date:
Hi,

I'm having trouble with range types and btree_gist - after some playing
I believe it's
caused by a bug in how btree_gist handles text columns. All this is on
freshly compiled
9.2.2.

I'm trying to achieve almost exactly what's described in the second
example on


http://www.postgresql.org/docs/9.2/interactive/rangetypes.html#RANGETYPES-CONSTRAINT

i.e. I maintaining a list of ranges for each ID, except that I'm using
text instead of
integers for an ID. so the table looks like this:

=========================================================================================
CREATE TABLE test (
     id        TEXT,
     validity  TSRANGE NOT NULL DEFAULT tsrange('-infinity'::timestamp,
'infinity'::timestamp),
     CONSTRAINT test_validity_check EXCLUDE USING GIST (id WITH =,
validity WITH &&)
);
=========================================================================================

This table is repeatedly filled with new versions of the data (which
were removed from
the demo for sake of simplicity), so I've defined a trigger that checks
if there's a
range with overlapping range, and split the range accordingly.

Each record starts with validity=[-infinity, infinity). On the first
update this would
be split into [-infinity, now()) and [now(), infinity) and so on. This
is what the following
trigger should do:

=========================================================================================
CREATE OR REPLACE FUNCTION test_close() RETURNS trigger AS $$
BEGIN

     -- close the previous record (set upper bound of the range)
     UPDATE test SET validity = tsrange(lower(validity),
now()::timestamp)
      WHERE id = NEW.id AND (upper(validity) = 'infinity'::timestamp);

     -- if there was a preceding record, set the lower bound (otherwise
use unbounded range)
     IF FOUND THEN
         NEW.validity := tsrange(now()::timestamp,
'infinity'::timestamp);
     END IF;

     RETURN NEW;

END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_close BEFORE INSERT ON test FOR EACH ROW EXECUTE
PROCEDURE test_close();
=========================================================================================

To generate the sample data, do this:

=========================================================================================
     echo "SimpleTestString" > /tmp/data.csv
     for f in `seq 1 20000`; do
         echo $f > /tmp/x.log;
         md5sum /tmp/x.log | awk '{print $1}' >> /tmp/data.csv;
     done;
=========================================================================================

The first line (combination of upper and lower-case letters) is what
seems to trigger the
behavior. Now load the file into the table repeatedly, and you'll
eventually get this error

=========================================================================================
db=# copy test(id) from '/tmp/data.csv';
COPY 10001
db=# copy test(id) from '/tmp/data.csv';
COPY 10001
db=# copy test(id) from '/tmp/data.csv';
ERROR:  conflicting key value violates exclusion constraint
"test_validity_check"
DETAIL:  Key (id, validity)=(SimpleTestString, ["2013-02-01
23:32:04.329975",infinity))
          conflicts with existing key (id, validity)=(SimpleTestString,
[-infinity,infinity)).
CONTEXT:  COPY test, line 1: "SimpleTestString"
=========================================================================================

The number of necessary COPY executions varies - what's even stranger
is the result of
this select once it fails:

=========================================================================================
test=# select * from test where id = 'SimpleTestString';
         id        |       validity
------------------+----------------------
  SimpleTestString | [-infinity,infinity)
  SimpleTestString | [-infinity,infinity)
(2 rows)
=========================================================================================

Yup, there are two overlapping ranges for the same ID. Moreover after
disabling bitmap and
index scans, the COPY takes much longer but works just fine (including
the trigger).
Creating a plain b-tree index on the "ID" column seems to fix that too.

That leads me to the belief that this is a bug in the GIST indexing,
and the variations
are probably caused by the index scan kicking in after one of the COPY
executions (and
reaching some threshold). I'm using en_US.UTF-8 for the database.

By replacing the "infinity" with a plain NULL (in the table and
trigger), it fails too,
but in a slightly different way. For example I'm seeing this after the
failure:

=========================================================================================
test=# select * from test where id = 'SimpleTest';
      id     |            validity
------------+---------------------------------
  SimpleTest | (,"2013-02-02 00:07:07.038324")
(1 row)

test=# set enable_bitmapscan to off;
SET
test=# set enable_indexscan to off;
SET
test=# select * from test where id = 'SimpleTest';
      id     |            validity
------------+---------------------------------
  SimpleTest | (,"2013-02-02 00:07:07.038324")
  SimpleTest | ["2013-02-02 00:07:07.038324",)
(2 rows)
=========================================================================================

I've been unable to achieve this using a generated sample, therefore
prepared sample
scripts and CSV files

   1) with-infinity.sql + sample-1.csv (this is described in the text
above)
   2) with-nulls.sql + sample-2.csv (this is the NULL version)

available for download at http://www.fuzzy.cz/tmp/samples.tgz (~1MB).

kind regards
Tomas



Re: issues with range types, btree_gist and constraints

From
Tomas Vondra
Date:
Hi,

I've managed to further simplify the test-case, and I've verified that
it's reproducible on current 9.2 and 9.3 branches.

This is the necessary table structure:

-----------------------------------------------------------------------
CREATE TABLE test (   id        TEXT,   valid     TSRANGE NOT NULL DEFAULT tsrange(NULL, NULL),   CONSTRAINT unique_ids
EXCLUDEUSING GIST (id WITH =, valid WITH &&)
 
);

CREATE OR REPLACE FUNCTION skip_existing() RETURNS trigger AS $$
DECLARE   v_exists BOOLEAN;
BEGIN
   SELECT TRUE INTO v_exists FROM test WHERE id = NEW.id;   IF v_exists THEN       RETURN NULL;   END IF;
   RETURN NEW;

END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER skip_existing BEFORE INSERT ON test FOR EACH ROW EXECUTE
PROCEDURE skip_existing();
-----------------------------------------------------------------------

I've been unable to reproduce the bug with just a single text column.

The trigger simply skips existing records without throwing any error.

Now let's insert some data into the table - I'll use the same samples as
in the previous test-case (http://www.fuzzy.cz/tmp/samples.tgz).


test=# copy test(id) from '/tmp/sample-1.csv';
COPY 20001
test=# copy test(id) from '/tmp/sample-2.csv';
COPY 18590
test=# copy test(id) from '/tmp/sample-1.csv';
COPY 25
test=# copy test(id) from '/tmp/sample-2.csv';
COPY 45

The last two results are really suspicious - it means that some record
were inserted "again", so let's verify that:

test=# select id, count(*) from test group by id having count(*) > 1;               id                | count
----------------------------------+-------0aab4791e1e41f62fd8452ae2c854a34 |     20aa08441cd4526b972bb3451d9f8e4ea |
20ab969a3333342837484ec0f81bf1e03 |     20aea0c33c76b1fe5d123b18cec184dc0 |     20af75a99b37be6dde08afaa69de36d29 |
20af80d2c2931756b897b3ca5a0055820|     2... many more ...
 

On 9.3 the number of duplicates is much lower, and it's not stable - on
one run I get 6, on the very next one I get 2, then 5 and so on. That
leads me to a suspicion that it might be an uninitialized variable or
something like that.

Tomas



Re: issues with range types, btree_gist and constraints

From
Tom Lane
Date:
Tomas Vondra <tv@fuzzy.cz> writes:
> I've managed to further simplify the test-case, and I've verified that
> it's reproducible on current 9.2 and 9.3 branches.

Yeah, I've been able to reproduce it as well.  I found what I thought
might be the bug: gbt_var_bin_union() looks like it does the wrong thing
if both ends of the range need to be extended.  However, fixing that
doesn't fix your test case, so there's more to it.

> I've been unable to reproduce the bug with just a single text column.

Hm.  I noticed a lot of elog complaints about "picksplit method
... doesn't support secondary split", so maybe the bug is in the code
that's trying to deal with that.  I've not had time to look closer.
        regards, tom lane



Re: issues with range types, btree_gist and constraints

From
Tom Lane
Date:
Tomas Vondra <tv@fuzzy.cz> writes:
> I've managed to further simplify the test-case, and I've verified that
> it's reproducible on current 9.2 and 9.3 branches.

I'm not sure why you're getting unstable behavior --- it's pretty stable
for me, at least in a debug build.  What I'm finding is that gistsplit.c
appears to contain multiple bugs.  If you just insert the first 271 rows
of sample-1.csv, the 271'st row results in a gist page split.  The split
results in two pages having these root-page downlink entries:
Item   2 -- Length:   88  Offset: 8000 (0x1f40)  Flags: NORMAL Block Id: 2  linp Index: 65535  Size: 88 Has Nulls: 0
HasVarwidths: 1
 
 1f40: 00000200 ffff5840 93900000 00373637  ......X@.....767 1f50: 61626536 31313961 30313834 34366263
abe6119a018446bc1f60: 64373632 37666638 61346632 64900000  d7627ff8a4f2d... 1f70: 00626465 61613234 38323966 31626365
.bdeaa24829f1bce1f80: 36343561 39313463 38386665 61613739  645a914c88feaa79 1f90: 340d440f 00001800
4.D.....       
 

(ie, range 767abe6119a018446bcd7627ff8a4f2d..bdeaa24829f1bce645a914c88feaa794)
Item   3 -- Length:   72  Offset: 7928 (0x1ef8)  Flags: NORMAL Block Id: 3  linp Index: 65535  Size: 72 Has Nulls: 0
HasVarwidths: 1
 
 1ef8: 00000300 ffff4840 73900000 00626538  ......H@s....be8 1f08: 30333261 33656138 31326565 62323838
032a3ea812eeb2881f18: 64663361 65636161 30666361 34500000  df3aecaa0fca4P.. 1f28: 0053696d 706c6554 65737453 7472696e
.SimpleTestStrin1f38: 670d440f 00001800                    g.D.....        
 

(ie, range be8032a3ea812eeb288df3aecaa0fca4..SimpleTestString)

which looks fine .. but the leaf entry for SimpleTestString went into
the first of these two pages, which is 100% wrong.

The split produced by contrib/btree_gist is perfectly fine, and it
assigns SimpleTestString to the right-hand page as expected.  The
bug is being introduced in lines 435..480 of gistsplit.c, which
is code that doesn't fire unless we're working on a non-last
column of the index (which is why you couldn't reproduce this in
a 1-column index).  It seems that

(1) gistfindgroup decides that SimpleTestString is "equiv" to something.
It's not too clear what; for sure there is no other leaf key of the same
value.  The code in gistfindgroup looks a bit like it ought to think
that all the items are "equiv" to one of the union datums or the other,
but that's not the case --- no other item gets marked.  (After further
investigation it seems that gbt_var_penalty is returning a negative
penalty at the critical spot here, which might have something to
do with it --- the business with the tmp[4] array seems many degrees
away from trustworthy.)

(2) cleanupOffsets removes the item for SimpleTestString from
sv->spl_right[], evidently because it's "equiv".

(3) placeOne() sticks the entry into the spl_left[] array, which is
an absolutely horrid decision: it means the left-side page will now
need a range spanning the right-side page's range, when a moment
ago they had been disjoint.

(4) gistunionsubkey(), which apparently is supposed to fix the union
datums to reflect this rearrangement, does no such thing because it
only processes the index columns to the right of the one where the
damage has been done.  (It's not clear to me that it could ever
be allowed to skip any columns, but that's what it's doing.)

If I don't hear a commitment from Teodor to fix this, I'm going to
rip out pretty much all of this logic as being under-documented
and overly-buggy.
        regards, tom lane



Re: issues with range types, btree_gist and constraints

From
Tom Lane
Date:
I wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> I've managed to further simplify the test-case, and I've verified that
>> it's reproducible on current 9.2 and 9.3 branches.

> It seems that

> (1) gistfindgroup decides that SimpleTestString is "equiv" to something.
> It's not too clear what; for sure there is no other leaf key of the same
> value.  The code in gistfindgroup looks a bit like it ought to think
> that all the items are "equiv" to one of the union datums or the other,
> but that's not the case --- no other item gets marked.

After looking closer at this, I see that it's marking left-side tuples
as "equiv" if they could be put in the right-side page for zero penalty,
and similarly marking right-side tuples as "equiv" if they could be put
in the left-side page for zero penalty.  IOW it's finding the tuples for
which it doesn't matter which side they go to.  So this is not quite as
insane as I thought, although the name "equiv" for the flag does not
seem like le mot juste, and the function name and comment are rather
misleading IMO.

> (After further
> investigation it seems that gbt_var_penalty is returning a negative
> penalty at the critical spot here, which might have something to
> do with it --- the business with the tmp[4] array seems many degrees
> away from trustworthy.)

This is a bug, but the core gist code ought not generate invalid indexes
just because the type-specific penalty function is doing something
stupid.

> (2) cleanupOffsets removes the item for SimpleTestString from
> sv->spl_right[], evidently because it's "equiv".

> (3) placeOne() sticks the entry into the spl_left[] array, which is
> an absolutely horrid decision: it means the left-side page will now
> need a range spanning the right-side page's range, when a moment
> ago they had been disjoint.

The above behaviors are fine, really, because they are driven by the
penalty function's claim that there's zero/negative cost to moving this
tuple to the other side.

> (4) gistunionsubkey(), which apparently is supposed to fix the union
> datums to reflect this rearrangement, does no such thing because it
> only processes the index columns to the right of the one where the
> damage has been done.  (It's not clear to me that it could ever
> be allowed to skip any columns, but that's what it's doing.)

This is a bug.  It's also a bug that gistunionsubkey() fails to null out
the existing union keys for each side before calling gistMakeUnionItVec
--- that can result in the recomputed union keys being larger than
necessary.

Furthermore, there's also a bug in gbt_var_bin_union(), the same one
I saw previously that it does the wrong thing when both sides of the
existing range need to be enlarged.

The attached patch fixes these things, but not the buggy penalty
function, because we ought to try to verify that these changes are
enough to prevent creation of an incorrect index.  I can't reproduce any
misbehavior anymore with this patch applied.  However, I'm troubled by
your report that the behavior is unstable, because I get the same result
each time if I start from an empty (truncated) table, with or without
the patch.  You may be seeing some further bug(s).  Could you test this
fix against your own test cases?

There's a lot of other things I don't much like in gistsplit.c, but
they're in the nature of cosmetic and documentation fixes.

            regards, tom lane


Attachment

Re: issues with range types, btree_gist and constraints

From
Tomas Vondra
Date:
On 7.2.2013 01:10, Tom Lane wrote:
> The attached patch fixes these things, but not the buggy penalty
> function, because we ought to try to verify that these changes are
> enough to prevent creation of an incorrect index.  I can't reproduce any
> misbehavior anymore with this patch applied.  However, I'm troubled by

I can't reproduce any of the issues anymore - I've tested both 9.2 and
9.3 branches (both containing the already commited fixes). And not only
that the issues seem to be gone, but I'm actually getting significantly
better performance. For example this is the old code:

test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 20001
Time: 17802,500 ms
test=# analyze;
ANALYZE
Time: 122,192 ms
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 18590
Time: 81253,104 ms
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 7
Time: 11008,737 ms
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 1
Time: 21710,315 ms
test=#

and this is the new code:

test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 20001
Time: 2196,802 ms
test=# analyze;
ANALYZE
Time: 113,598 ms
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 18590
Time: 2369,028 ms
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 0
Time: 832,910 ms
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 0
Time: 778,241 ms

That's more than 10x faster in all cases (and about 30x faster in some
of them).

I've also tested both branches (9.2 and 9.3) with the provided patch,
and I can't reproduce any of the errors, but the performance seems to be
equal to the old code. So it seems that the performance boost is due to
the changes to the penalty function. Nice!

> your report that the behavior is unstable, because I get the same result
> each time if I start from an empty (truncated) table, with or without
> the patch.  You may be seeing some further bug(s).  Could you test this
> fix against your own test cases?

This is what I meant by unstability:

test=# truncate test;
TRUNCATE TABLE
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 20001
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 18590
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 1
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 0
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 0
test=# truncate test;
TRUNCATE TABLE
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 20001
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 18590
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 4
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 0
test=# truncate test;
TRUNCATE TABLE
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 20001
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 18590
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 3
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 0

Notice the numbers change all the time. But I can't really reproduce
these with the current HEAD (nor with the patch), so I assume these were
caused by switching plans at different times.

Seems fixed to me.


Tomas



Re: issues with range types, btree_gist and constraints

From
Tom Lane
Date:
Tomas Vondra <tv@fuzzy.cz> writes:
> I can't reproduce any of the issues anymore - I've tested both 9.2 and
> 9.3 branches (both containing the already commited fixes). And not only
> that the issues seem to be gone, but I'm actually getting significantly
> better performance.

Cool.  I noticed that it seemed faster too, but hadn't tried to quantify
that.

> I've also tested both branches (9.2 and 9.3) with the provided patch,
> and I can't reproduce any of the errors, but the performance seems to be
> equal to the old code. So it seems that the performance boost is due to
> the changes to the penalty function. Nice!

Yeah, the old penalty function was pretty badly broken with any non-C
sort order.

>> your report that the behavior is unstable, because I get the same result
>> each time if I start from an empty (truncated) table, with or without
>> the patch.  You may be seeing some further bug(s).  Could you test this
>> fix against your own test cases?

> This is what I meant by unstability:
> ....
> Notice the numbers change all the time.

[ scratches head... ]  In HEAD, that's not so surprising because of
commit ba1cc6501, which added some randomness to choices about which
GiST page to insert new entries to (which could then affect whether the
union-calculation bugs got exercised).  If you saw that in older
branches, though, it might merit some further investigation.
        regards, tom lane