issues with range types, btree_gist and constraints - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | issues with range types, btree_gist and constraints |
Date | |
Msg-id | 9c83e4bc23ca2a4a2b1fbb622f3abc1c@fuzzy.cz Whole thread Raw |
Responses |
Re: issues with range types, btree_gist and constraints
|
List | pgsql-hackers |
Hi, I'm having trouble with range types and btree_gist - after some playing I believe it's caused by a bug in how btree_gist handles text columns. All this is on freshly compiled 9.2.2. I'm trying to achieve almost exactly what's described in the second example on http://www.postgresql.org/docs/9.2/interactive/rangetypes.html#RANGETYPES-CONSTRAINT i.e. I maintaining a list of ranges for each ID, except that I'm using text instead of integers for an ID. so the table looks like this: ========================================================================================= CREATE TABLE test ( id TEXT, validity TSRANGE NOT NULL DEFAULT tsrange('-infinity'::timestamp, 'infinity'::timestamp), CONSTRAINT test_validity_check EXCLUDE USING GIST (id WITH =, validity WITH &&) ); ========================================================================================= This table is repeatedly filled with new versions of the data (which were removed from the demo for sake of simplicity), so I've defined a trigger that checks if there's a range with overlapping range, and split the range accordingly. Each record starts with validity=[-infinity, infinity). On the first update this would be split into [-infinity, now()) and [now(), infinity) and so on. This is what the following trigger should do: ========================================================================================= CREATE OR REPLACE FUNCTION test_close() RETURNS trigger AS $$ BEGIN -- close the previous record (set upper bound of the range) UPDATE test SET validity = tsrange(lower(validity), now()::timestamp) WHERE id = NEW.id AND (upper(validity) = 'infinity'::timestamp); -- if there was a preceding record, set the lower bound (otherwise use unbounded range) IF FOUND THEN NEW.validity := tsrange(now()::timestamp, 'infinity'::timestamp); END IF; RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER test_close BEFORE INSERT ON test FOR EACH ROW EXECUTE PROCEDURE test_close(); ========================================================================================= To generate the sample data, do this: ========================================================================================= echo "SimpleTestString" > /tmp/data.csv for f in `seq 1 20000`; do echo $f > /tmp/x.log; md5sum /tmp/x.log | awk '{print $1}' >> /tmp/data.csv; done; ========================================================================================= The first line (combination of upper and lower-case letters) is what seems to trigger the behavior. Now load the file into the table repeatedly, and you'll eventually get this error ========================================================================================= db=# copy test(id) from '/tmp/data.csv'; COPY 10001 db=# copy test(id) from '/tmp/data.csv'; COPY 10001 db=# copy test(id) from '/tmp/data.csv'; ERROR: conflicting key value violates exclusion constraint "test_validity_check" DETAIL: Key (id, validity)=(SimpleTestString, ["2013-02-01 23:32:04.329975",infinity)) conflicts with existing key (id, validity)=(SimpleTestString, [-infinity,infinity)). CONTEXT: COPY test, line 1: "SimpleTestString" ========================================================================================= The number of necessary COPY executions varies - what's even stranger is the result of this select once it fails: ========================================================================================= test=# select * from test where id = 'SimpleTestString'; id | validity ------------------+---------------------- SimpleTestString | [-infinity,infinity) SimpleTestString | [-infinity,infinity) (2 rows) ========================================================================================= Yup, there are two overlapping ranges for the same ID. Moreover after disabling bitmap and index scans, the COPY takes much longer but works just fine (including the trigger). Creating a plain b-tree index on the "ID" column seems to fix that too. That leads me to the belief that this is a bug in the GIST indexing, and the variations are probably caused by the index scan kicking in after one of the COPY executions (and reaching some threshold). I'm using en_US.UTF-8 for the database. By replacing the "infinity" with a plain NULL (in the table and trigger), it fails too, but in a slightly different way. For example I'm seeing this after the failure: ========================================================================================= test=# select * from test where id = 'SimpleTest'; id | validity ------------+--------------------------------- SimpleTest | (,"2013-02-02 00:07:07.038324") (1 row) test=# set enable_bitmapscan to off; SET test=# set enable_indexscan to off; SET test=# select * from test where id = 'SimpleTest'; id | validity ------------+--------------------------------- SimpleTest | (,"2013-02-02 00:07:07.038324") SimpleTest | ["2013-02-02 00:07:07.038324",) (2 rows) ========================================================================================= I've been unable to achieve this using a generated sample, therefore prepared sample scripts and CSV files 1) with-infinity.sql + sample-1.csv (this is described in the text above) 2) with-nulls.sql + sample-2.csv (this is the NULL version) available for download at http://www.fuzzy.cz/tmp/samples.tgz (~1MB). kind regards Tomas
pgsql-hackers by date: