Thread: issues with range types, btree_gist and constraints
Hi,

I'm having trouble with range types and btree_gist - after some playing I
believe it's caused by a bug in how btree_gist handles text columns. All
this is on freshly compiled 9.2.2.

I'm trying to achieve almost exactly what's described in the second example
on http://www.postgresql.org/docs/9.2/interactive/rangetypes.html#RANGETYPES-CONSTRAINT
i.e. I'm maintaining a list of ranges for each ID, except that I'm using
text instead of integers for the ID. So the table looks like this:

=========================================================================================
CREATE TABLE test (
    id       TEXT,
    validity TSRANGE NOT NULL
             DEFAULT tsrange('-infinity'::timestamp, 'infinity'::timestamp),
    CONSTRAINT test_validity_check
        EXCLUDE USING GIST (id WITH =, validity WITH &&)
);
=========================================================================================

This table is repeatedly filled with new versions of the data (which were
removed from the demo for the sake of simplicity), so I've defined a trigger
that checks whether there's a record with an overlapping range, and splits
the ranges accordingly. Each record starts with validity=[-infinity, infinity).
On the first update this would be split into [-infinity, now()) and
[now(), infinity), and so on. This is what the following trigger should do:

=========================================================================================
CREATE OR REPLACE FUNCTION test_close() RETURNS trigger AS $$
BEGIN

    -- close the previous record (set upper bound of the range)
    UPDATE test SET validity = tsrange(lower(validity), now()::timestamp)
     WHERE id = NEW.id AND (upper(validity) = 'infinity'::timestamp);

    -- if there was a preceding record, set the lower bound
    -- (otherwise use an unbounded range)
    IF FOUND THEN
        NEW.validity := tsrange(now()::timestamp, 'infinity'::timestamp);
    END IF;

    RETURN NEW;

END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_close BEFORE INSERT ON test
    FOR EACH ROW EXECUTE PROCEDURE test_close();
=========================================================================================
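A quick way to double-check the constraint by hand is a self-join along
these lines (just a sketch - the ctid test merely keeps a row from being
paired with itself, so every violating pair shows up twice; any equivalent
overlap test will do). With a working exclusion constraint it must return
zero rows:

=========================================================================================
-- two rows with the same id and overlapping validity ranges are exactly
-- what the exclusion constraint is supposed to forbid
SELECT a.id, a.validity AS range_a, b.validity AS range_b
  FROM test a
  JOIN test b ON a.id = b.id
             AND a.ctid <> b.ctid
             AND a.validity && b.validity;
=========================================================================================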
To generate the sample data, do this:

=========================================================================================
echo "SimpleTestString" > /tmp/data.csv
for f in `seq 1 20000`; do
    echo $f > /tmp/x.log;
    md5sum /tmp/x.log | awk '{print $1}' >> /tmp/data.csv;
done;
=========================================================================================

The first line (a combination of upper- and lower-case letters) is what
seems to trigger the behavior. Now load the file into the table repeatedly,
and you'll eventually get this error:

=========================================================================================
db=# copy test(id) from '/tmp/data.csv';
COPY 10001
db=# copy test(id) from '/tmp/data.csv';
COPY 10001
db=# copy test(id) from '/tmp/data.csv';
ERROR:  conflicting key value violates exclusion constraint "test_validity_check"
DETAIL:  Key (id, validity)=(SimpleTestString, ["2013-02-01 23:32:04.329975",infinity)) conflicts with existing key (id, validity)=(SimpleTestString, [-infinity,infinity)).
CONTEXT:  COPY test, line 1: "SimpleTestString"
=========================================================================================

The number of necessary COPY executions varies - what's even stranger is the
result of this select once it fails:

=========================================================================================
test=# select * from test where id = 'SimpleTestString';
        id        |       validity
------------------+----------------------
 SimpleTestString | [-infinity,infinity)
 SimpleTestString | [-infinity,infinity)
(2 rows)
=========================================================================================

Yup, there are two overlapping ranges for the same ID. Moreover, after
disabling bitmap and index scans the COPY takes much longer but works just
fine (including the trigger). Creating a plain b-tree index on the "ID"
column seems to fix that too. That leads me to believe this is a bug in the
GiST indexing, and the variation is probably caused by the index scan
kicking in after one of the COPY executions (once some planner threshold is
reached).

I'm using en_US.UTF-8 for the database.

With the "infinity" bounds replaced by plain NULLs (in the table and the
trigger), it fails too, but in a slightly different way. For example I'm
seeing this after the failure:

=========================================================================================
test=# select * from test where id = 'SimpleTest';
     id     |            validity
------------+---------------------------------
 SimpleTest | (,"2013-02-02 00:07:07.038324")
(1 row)

test=# set enable_bitmapscan to off;
SET
test=# set enable_indexscan to off;
SET
test=# select * from test where id = 'SimpleTest';
     id     |            validity
------------+---------------------------------
 SimpleTest | (,"2013-02-02 00:07:07.038324")
 SimpleTest | ["2013-02-02 00:07:07.038324",)
(2 rows)
=========================================================================================

I've been unable to reproduce this using a generated sample, so I've
prepared sample scripts and CSV files:

1) with-infinity.sql + sample-1.csv (this is described in the text above)
2) with-nulls.sql + sample-2.csv (this is the NULL version)

available for download at http://www.fuzzy.cz/tmp/samples.tgz (~1MB).

kind regards
Tomas
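(The b-tree workaround mentioned above boils down to a single statement - a
sketch, the index name is arbitrary:)

=========================================================================================
-- a plain btree index on id gives the planner a correct access path for
-- the id = ... lookups, sidestepping the apparently broken GiST scan
CREATE INDEX test_id_idx ON test (id);
=========================================================================================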
Hi,

I've managed to further simplify the test-case, and I've verified that it's
reproducible on the current 9.2 and 9.3 branches. This is the necessary
table structure:

-----------------------------------------------------------------------
CREATE TABLE test (
    id    TEXT,
    valid TSRANGE NOT NULL DEFAULT tsrange(NULL, NULL),
    CONSTRAINT unique_ids EXCLUDE USING GIST (id WITH =, valid WITH &&)
);

CREATE OR REPLACE FUNCTION skip_existing() RETURNS trigger AS $$
DECLARE
    v_exists BOOLEAN;
BEGIN
    SELECT TRUE INTO v_exists FROM test WHERE id = NEW.id;
    IF v_exists THEN
        RETURN NULL;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER skip_existing BEFORE INSERT ON test
    FOR EACH ROW EXECUTE PROCEDURE skip_existing();
-----------------------------------------------------------------------

I've been unable to reproduce the bug with just a single text column. The
trigger simply skips existing records without throwing any error.

Now let's insert some data into the table - I'll use the same samples as in
the previous test-case (http://www.fuzzy.cz/tmp/samples.tgz).

test=# copy test(id) from '/tmp/sample-1.csv';
COPY 20001
test=# copy test(id) from '/tmp/sample-2.csv';
COPY 18590
test=# copy test(id) from '/tmp/sample-1.csv';
COPY 25
test=# copy test(id) from '/tmp/sample-2.csv';
COPY 45

The last two results are really suspicious - they mean that some records
were inserted "again", so let's verify that:

test=# select id, count(*) from test group by id having count(*) > 1;
                id                | count
----------------------------------+-------
 0aab4791e1e41f62fd8452ae2c854a34 |     2
 0aa08441cd4526b972bb3451d9f8e4ea |     2
 0ab969a3333342837484ec0f81bf1e03 |     2
 0aea0c33c76b1fe5d123b18cec184dc0 |     2
 0af75a99b37be6dde08afaa69de36d29 |     2
 0af80d2c2931756b897b3ca5a0055820 |     2
 ... many more ...
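To confirm that it's the index (and not the heap) returning the wrong
answers, comparing an index-driven scan with a forced sequential scan works
well - a sketch, using one of the duplicated ids from the output above:

-----------------------------------------------------------------------
-- with the (apparently corrupted) GiST index:
SELECT count(*) FROM test WHERE id = '0aab4791e1e41f62fd8452ae2c854a34';

-- the same query, forced to read the heap directly:
SET enable_indexscan = off;
SET enable_bitmapscan = off;
SELECT count(*) FROM test WHERE id = '0aab4791e1e41f62fd8452ae2c854a34';
RESET enable_indexscan;
RESET enable_bitmapscan;
-----------------------------------------------------------------------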
On 9.3 the number of duplicates is much lower, and it's not stable - on one
run I get 6, on the very next one I get 2, then 5 and so on. That leads me
to a suspicion that it might be an uninitialized variable or something like
that.

Tomas


Tomas Vondra <tv@fuzzy.cz> writes:
> I've managed to further simplify the test-case, and I've verified that
> it's reproducible on current 9.2 and 9.3 branches.

Yeah, I've been able to reproduce it as well. I found what I thought might
be the bug: gbt_var_bin_union() looks like it does the wrong thing if both
ends of the range need to be extended. However, fixing that doesn't fix
your test case, so there's more to it.

> I've been unable to reproduce the bug with just a single text column.

Hm. I noticed a lot of elog complaints about "picksplit method ... doesn't
support secondary split", so maybe the bug is in the code that's trying to
deal with that. I've not had time to look closer.

            regards, tom lane
Tomas Vondra <tv@fuzzy.cz> writes:
> I've managed to further simplify the test-case, and I've verified that
> it's reproducible on current 9.2 and 9.3 branches.

I'm not sure why you're getting unstable behavior --- it's pretty stable
for me, at least in a debug build. What I'm finding is that gistsplit.c
appears to contain multiple bugs.

If you just insert the first 271 rows of sample-1.csv, the 271st row
results in a gist page split. The split results in two pages having these
root-page downlink entries:

 Item   2 -- Length:   88  Offset: 8000 (0x1f40)  Flags: NORMAL
  Block Id: 2  linp Index: 65535  Size: 88  Has Nulls: 0  Has Varwidths: 1

  1f40: 00000200 ffff5840 93900000 00373637  ......X@.....767
  1f50: 61626536 31313961 30313834 34366263  abe6119a018446bc
  1f60: 64373632 37666638 61346632 64900000  d7627ff8a4f2d...
  1f70: 00626465 61613234 38323966 31626365  .bdeaa24829f1bce
  1f80: 36343561 39313463 38386665 61613739  645a914c88feaa79
  1f90: 340d440f 00001800                    4.D.....

(ie, range 767abe6119a018446bcd7627ff8a4f2d..bdeaa24829f1bce645a914c88feaa794)

 Item   3 -- Length:   72  Offset: 7928 (0x1ef8)  Flags: NORMAL
  Block Id: 3  linp Index: 65535  Size: 72  Has Nulls: 0  Has Varwidths: 1

  1ef8: 00000300 ffff4840 73900000 00626538  ......H@s....be8
  1f08: 30333261 33656138 31326565 62323838  032a3ea812eeb288
  1f18: 64663361 65636161 30666361 34500000  df3aecaa0fca4P..
  1f28: 0053696d 706c6554 65737453 7472696e  .SimpleTestStrin
  1f38: 670d440f 00001800                    g.D.....

(ie, range be8032a3ea812eeb288df3aecaa0fca4..SimpleTestString)

which looks fine ... but the leaf entry for SimpleTestString went into the
first of these two pages, which is 100% wrong. The split produced by
contrib/btree_gist is perfectly fine, and it assigns SimpleTestString to
the right-hand page as expected.

The bug is being introduced in lines 435..480 of gistsplit.c, which is code
that doesn't fire unless we're working on a non-last column of the index
(which is why you couldn't reproduce this in a 1-column index). It seems
that

(1) gistfindgroup decides that SimpleTestString is "equiv" to something.
It's not too clear what; for sure there is no other leaf key of the same
value. The code in gistfindgroup looks a bit like it ought to think that
all the items are "equiv" to one of the union datums or the other, but
that's not the case --- no other item gets marked. (After further
investigation it seems that gbt_var_penalty is returning a negative penalty
at the critical spot here, which might have something to do with it --- the
business with the tmp[4] array seems many degrees away from trustworthy.)

(2) cleanupOffsets removes the item for SimpleTestString from
sv->spl_right[], evidently because it's "equiv".

(3) placeOne() sticks the entry into the spl_left[] array, which is an
absolutely horrid decision: it means the left-side page will now need a
range spanning the right-side page's range, when a moment ago they had been
disjoint.

(4) gistunionsubkey(), which apparently is supposed to fix the union datums
to reflect this rearrangement, does no such thing because it only processes
the index columns to the right of the one where the damage has been done.
(It's not clear to me that it could ever be allowed to skip any columns,
but that's what it's doing.)

If I don't hear a commitment from Teodor to fix this, I'm going to rip out
pretty much all of this logic as being under-documented and overly-buggy.

            regards, tom lane
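(A minimal way to provoke just that first page split, without loading the
whole sample - a sketch; it assumes the scan of the staging table returns
rows in file order, which isn't guaranteed in general but holds in practice
for a freshly loaded temp table:)

-- stage the whole CSV, then insert only the first 271 rows in file order
CREATE TEMP TABLE staging (id text);
COPY staging(id) FROM '/tmp/sample-1.csv';

INSERT INTO test(id)
  SELECT id
    FROM (SELECT id, row_number() OVER () AS rn FROM staging) s
   WHERE rn <= 271;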
I wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> I've managed to further simplify the test-case, and I've verified that
>> it's reproducible on current 9.2 and 9.3 branches.

> It seems that
> (1) gistfindgroup decides that SimpleTestString is "equiv" to something.
> It's not too clear what; for sure there is no other leaf key of the same
> value. The code in gistfindgroup looks a bit like it ought to think
> that all the items are "equiv" to one of the union datums or the other,
> but that's not the case --- no other item gets marked.

After looking closer at this, I see that it's marking left-side tuples as
"equiv" if they could be put in the right-side page for zero penalty, and
similarly marking right-side tuples as "equiv" if they could be put in the
left-side page for zero penalty. IOW it's finding the tuples for which it
doesn't matter which side they go to. So this is not quite as insane as I
thought, although the name "equiv" for the flag does not seem like le mot
juste, and the function name and comment are rather misleading IMO.

> (After further
> investigation it seems that gbt_var_penalty is returning a negative
> penalty at the critical spot here, which might have something to
> do with it --- the business with the tmp[4] array seems many degrees
> away from trustworthy.)

This is a bug, but the core gist code ought not generate invalid indexes
just because the type-specific penalty function is doing something stupid.

> (2) cleanupOffsets removes the item for SimpleTestString from
> sv->spl_right[], evidently because it's "equiv".
> (3) placeOne() sticks the entry into the spl_left[] array, which is
> an absolutely horrid decision: it means the left-side page will now
> need a range spanning the right-side page's range, when a moment
> ago they had been disjoint.

The above behaviors are fine, really, because they are driven by the
penalty function's claim that there's zero/negative cost to moving this
tuple to the other side.

> (4) gistunionsubkey(), which apparently is supposed to fix the union
> datums to reflect this rearrangement, does no such thing because it
> only processes the index columns to the right of the one where the
> damage has been done. (It's not clear to me that it could ever
> be allowed to skip any columns, but that's what it's doing.)

This is a bug. It's also a bug that gistunionsubkey() fails to null out
the existing union keys for each side before calling gistMakeUnionItVec
--- that can result in the recomputed union keys being larger than
necessary.

Furthermore, there's also a bug in gbt_var_bin_union(), the same one I saw
previously: it does the wrong thing when both sides of the existing range
need to be enlarged.

The attached patch fixes these things, but not the buggy penalty function,
because we ought to try to verify that these changes are enough to prevent
creation of an incorrect index. I can't reproduce any misbehavior anymore
with this patch applied. However, I'm troubled by your report that the
behavior is unstable, because I get the same result each time if I start
from an empty (truncated) table, with or without the patch. You may be
seeing some further bug(s). Could you test this fix against your own test
cases?

There's a lot of other things I don't much like in gistsplit.c, but they're
in the nature of cosmetic and documentation fixes.

            regards, tom lane
Attachment
On 7.2.2013 01:10, Tom Lane wrote:
> The attached patch fixes these things, but not the buggy penalty
> function, because we ought to try to verify that these changes are
> enough to prevent creation of an incorrect index. I can't reproduce any
> misbehavior anymore with this patch applied. However, I'm troubled by

I can't reproduce any of the issues anymore - I've tested both the 9.2 and
9.3 branches (both containing the already committed fixes). And not only do
the issues seem to be gone, but I'm actually getting significantly better
performance. For example this is the old code:

test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 20001
Time: 17802,500 ms
test=# analyze;
ANALYZE
Time: 122,192 ms
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 18590
Time: 81253,104 ms
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 7
Time: 11008,737 ms
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 1
Time: 21710,315 ms

and this is the new code:

test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 20001
Time: 2196,802 ms
test=# analyze;
ANALYZE
Time: 113,598 ms
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 18590
Time: 2369,028 ms
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 0
Time: 832,910 ms
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 0
Time: 778,241 ms

That's more than 10x faster in all cases (and about 30x faster in some of
them). I've also tested both branches (9.2 and 9.3) with the provided
patch, and I can't reproduce any of the errors, but the performance seems
to be equal to the old code. So it seems that the performance boost is due
to the changes to the penalty function. Nice!

> your report that the behavior is unstable, because I get the same result
> each time if I start from an empty (truncated) table, with or without
> the patch. You may be seeing some further bug(s). Could you test this
> fix against your own test cases?

This is what I meant by the instability:

test=# truncate test;
TRUNCATE TABLE
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 20001
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 18590
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 1
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 0
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 0
test=# truncate test;
TRUNCATE TABLE
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 20001
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 18590
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 4
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 0
test=# truncate test;
TRUNCATE TABLE
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 20001
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 18590
test=# copy test(id) from '/home/tomas/sample-1.csv';
COPY 3
test=# copy test(id) from '/home/tomas/sample-2.csv';
COPY 0

Notice the numbers change every time. But I can't really reproduce these
results with the current HEAD (nor with the patch), so I assume they were
caused by plans switching at different times. Seems fixed to me.

Tomas
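(For completeness: the Time: lines in the transcripts above come from
psql's \timing, so the whole before/after comparison is just a matter of
re-running a block like this on each build - a sketch, paths as above:)

\timing on
TRUNCATE test;
COPY test(id) FROM '/home/tomas/sample-1.csv';
COPY test(id) FROM '/home/tomas/sample-2.csv';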
Tomas Vondra <tv@fuzzy.cz> writes:
> I can't reproduce any of the issues anymore - I've tested both the 9.2
> and 9.3 branches (both containing the already committed fixes). And not
> only do the issues seem to be gone, but I'm actually getting
> significantly better performance.

Cool. I noticed that it seemed faster too, but hadn't tried to quantify
that.

> I've also tested both branches (9.2 and 9.3) with the provided patch,
> and I can't reproduce any of the errors, but the performance seems to be
> equal to the old code. So it seems that the performance boost is due to
> the changes to the penalty function. Nice!

Yeah, the old penalty function was pretty badly broken with any non-C sort
order.

>> your report that the behavior is unstable, because I get the same result
>> each time if I start from an empty (truncated) table, with or without
>> the patch. You may be seeing some further bug(s). Could you test this
>> fix against your own test cases?

> This is what I meant by the instability:
> ...
> Notice the numbers change every time.

[ scratches head... ] In HEAD, that's not so surprising because of commit
ba1cc6501, which added some randomness to choices about which GiST page to
insert new entries to (which could then affect whether the
union-calculation bugs got exercised). If you saw that in older branches,
though, it might merit some further investigation.

            regards, tom lane