Thread: Exclude constraint problem
I'm relatively unsure where and how to proceed.

Short version:

I dumped a table with an exclude constraint with pg_dump, and loaded it
into another instance of postgres, where it promptly failed to create the
constraint. The complaint was... key conflicts with key. IOW, duplicate
data.

This should not happen, correct? If it *should* happen, why? If it
shouldn't happen, where should I start looking?

Long version:

I'm playing with postgres 9.0b4 and postgis 1.5.2 (SVN r5851). The
btree_gist contrib module and the period pgFoundry module are used as
appropriate. The machine is a jail on a FreeBSD 7.2/x64 box.

I'm trying to track some GPS data that's in need of sanitization.
Oftentimes reports will come in a few seconds apart with identical
coordinates and identical non-zero speeds, so I'd like to reject rows
with a similar position and similar time for a given vehicle. The
constraint is:

    "exclude_time_buffer" EXCLUDE USING gist (vehicle WITH =, buffer_time WITH &&, bbox_dup WITH &&)

buffer_time is a period +/- 1 minute from the report time, calculated by
the client and inserted with the other columns. bbox_dup is a geometry
column containing a polygon.

- alex
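[The duplicate-rejection rule described above can be sketched outside the database. Everything below is illustrative, not the actual schema: the dict layout, helper names, and sample rows are invented, and the bounding boxes are plain (xmin, ymin, xmax, ymax) tuples rather than real geometries.]

```python
from datetime import datetime, timedelta

BUFFER = timedelta(minutes=1)  # buffer_time is the report time +/- 1 minute

def periods_overlap(a_start, a_end, b_start, b_end):
    # Half-open [start, end) interval overlap, like the period type's &&
    return a_start < b_end and b_start < a_end

def boxes_overlap(a, b):
    # a, b are (xmin, ymin, xmax, ymax); overlap unless disjoint on some axis
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def conflicts(new, old):
    # Mirrors EXCLUDE USING gist (vehicle WITH =, buffer_time WITH &&,
    # bbox_dup WITH &&): a row is rejected when ALL three operators match
    return (new["vehicle"] == old["vehicle"]
            and periods_overlap(new["t"] - BUFFER, new["t"] + BUFFER,
                                old["t"] - BUFFER, old["t"] + BUFFER)
            and boxes_overlap(new["bbox"], old["bbox"]))

# Hypothetical reports a minute apart at nearly the same position
row_a = {"vehicle": 54, "t": datetime(2010, 8, 28, 7, 9, 21),
         "bbox": (-122.50371, 37.74185, -122.50363, 37.74193)}
row_b = {"vehicle": 54, "t": datetime(2010, 8, 28, 7, 10, 34),
         "bbox": (-122.50375, 37.74183, -122.50367, 37.74191)}
print(conflicts(row_b, row_a))  # True: same vehicle, buffers and boxes overlap
```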
On Mon, 2010-08-30 at 14:05 -0700, Alex Zepeda wrote:
> Short version:
>
> I dumped a table with an exclude constraint with pg_dump, and loaded it
> into another instance of postgres where it promptly failed to create the
> constraint. The complaint was... key conflicts with key. IOW,
> duplicate data.
>
> This should not happen, correct? If it *should* happen, why? If it
> shouldn't happen, where should I start looking?

Are you using any floating point values, or floating-point timestamps
(that's a compile-time option)?

If you dump a floating-point value, and then reload it, it may be
different than the one you started with. That can cause a problem with
either UNIQUE or EXCLUDE constraints.

If you are not using floating point values, please try to make a
self-contained test case that includes data that can reproduce the
problem.

Regards,
	Jeff Davis
Alex Zepeda <zipzippy@sonic.net> writes:
> I dumped a table with an exclude constraint with pg_dump, and loaded it
> into another instance of postgres where it promptly failed to create the
> constraint. The complaint was... key conflicts with key. IOW,
> duplicate data.

> This should not happen, correct? If it *should* happen, why? If it
> shouldn't happen, where should I start looking?

Well, for starters, do the complained-of values actually violate the
constraint?

			regards, tom lane
Jeff Davis <pgsql@j-davis.com> writes:
> On Mon, 2010-08-30 at 14:05 -0700, Alex Zepeda wrote:
>> I dumped a table with an exclude constraint with pg_dump, and loaded it
>> into another instance of postgres where it promptly failed to create the
>> constraint. The complaint was... key conflicts with key. IOW,
>> duplicate data.

> If you dump a floating-point value, and then reload it, it may be
> different than the one you started with.

That's really *not* supposed to happen, assuming that both machines have
IEEE float arithmetic and competently written float I/O code.

My own WAG is that we're talking about a GiST bug that causes it to
recognize or not recognize duplicates depending on order of insertion.
But that's theorizing far in advance of the data.

> If you are not using floating point values, please try to make a
> self-contained test case that includes data that can reproduce the
> problem.

Yeah, a test case would be awfully helpful here.

			regards, tom lane
Jeff Davis wrote:
> Are you using any floating point values, or floating-point timestamps
> (that's a compile-time option)?

I'm using whatever the default compile time options are.

It's worth noting that if I delete the row, and then attempt to insert a
row with the same data on the original database... it fails. In this
case the period is the 'reported_at' column +- one minute, and the
second, conflicting, row has a 'reported_at' value more than one second
greater than the original row's.

> If you dump a floating-point value, and then reload it, it may be
> different than the one you started with. That can cause a problem with
> either UNIQUE or EXCLUDE constraints.

The index is on an integer (vehicle), a polygon (created with
ST_Expand), and a period. When I attempt to delete+reinsert the row, I
use the binary representation of the polygon.

> If you are not using floating point values, please try to make a
> self-contained test case that includes data that can reproduce the
> problem.

I'll try. Input is serialized, so would simply logging the queries
suffice?

Tom: the machine collecting the data is a FreeBSD 7.2p8/x64 box; the
other is a 32-bit Ubuntu 10.04 virtual machine for doing bad things with
mapnik. I certainly *hope* the BSD guys have their FP math stuff in
order.
If I try to manually add the constraint on the table I get:

blockface=# alter table bus_positions add constraint "exclude_time_buffer" EXCLUDE USING gist (vehicle WITH =, buffer_time WITH &&, bbox_dup WITH &&);
NOTICE:  ALTER TABLE / ADD EXCLUDE will create implicit index "exclude_time_buffer" for table "bus_positions"
ERROR:  could not create exclusion constraint "exclude_time_buffer"
DETAIL:  Key (vehicle, buffer_time, bbox_dup)=(54, [2010-08-28 07:08:21-07, 2010-08-28 07:10:21-07), 0103000020E6100000010000000500000076374F75C8995EC07E91D09673E5424076374F75C8995EC09A42E73576E54240E8DEC325C7995EC09A42E73576E54240E8DEC325C7995EC07E91D09673E5424076374F75C8995EC07E91D09673E54240) conflicts with key (vehicle, buffer_time, bbox_dup)=(54, [2010-08-28 07:09:34-07, 2010-08-28 07:11:34-07), 0103000020E6100000010000000500000016FBCBEEC9995EC09A42E73576E5424016FBCBEEC9995EC0B6F3FDD478E5424088A2409FC8995EC0B6F3FDD478E5424088A2409FC8995EC09A42E73576E5424016FBCBEEC9995EC09A42E73576E54240).

When I go back to the original data, yup, there are two rows with those
data in them. On both machines:

blockface=# SELECT period('2010-08-28 07:08:21-07', '2010-08-28 07:10:21-07') && period('2010-08-28 07:09:34-07', '2010-08-28 07:11:34-07') AS period_intersect, '0103000020E6100000010000000500000076374F75C8995EC07E91D09673E5424076374F75C8995EC09A42E73576E54240E8DEC325C7995EC09A42E73576E54240E8DEC325C7995EC07E91D09673E5424076374F75C8995EC07E91D09673E54240'::geometry && '0103000020E6100000010000000500000016FBCBEEC9995EC09A42E73576E5424016FBCBEEC9995EC0B6F3FDD478E5424088A2409FC8995EC0B6F3FDD478E5424088A2409FC8995EC09A42E73576E5424016FBCBEEC9995EC09A42E73576E54240'::geometry AS geom_intersect;
 period_intersect | geom_intersect
------------------+----------------
 t                | t
(1 row)

Another thing I stumbled over was that I could not (accidentally) insert
an empty period on a column with an exclude constraint using the &&
operator.

- alex
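[The period side of the DETAIL conflict above is easy to check by hand; a minimal Python sketch of the half-open [start, end) overlap test that the period type's && performs, using the two buffer_time values from the error message:]

```python
from datetime import datetime

# The two buffer_time periods from the DETAIL message, as half-open
# [start, end) intervals (timezone ignored for this illustration)
a = (datetime(2010, 8, 28, 7, 8, 21), datetime(2010, 8, 28, 7, 10, 21))
b = (datetime(2010, 8, 28, 7, 9, 34), datetime(2010, 8, 28, 7, 11, 34))

def overlaps(p, q):
    # && on half-open intervals: true iff they share at least one instant
    return p[0] < q[1] and q[0] < p[1]

print(overlaps(a, b))  # True, matching period_intersect = t
```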
Jeff Davis <pgsql@j-davis.com> writes:
> On Mon, 2010-08-30 at 19:40 -0400, Tom Lane wrote:
>> That's really *not* supposed to happen, assuming that both machines have
>> IEEE float arithmetic and competently written float I/O code.

> On my machine I see:

> => select float4in(float4out((1::numeric + 1e-7::numeric)::float4)) =
> (1::numeric + 1e-7::numeric)::float4;

You need extra_float_digits cranked up. Which pg_dump knows about.

			regards, tom lane
On Mon, 2010-08-30 at 19:40 -0400, Tom Lane wrote:
> That's really *not* supposed to happen, assuming that both machines have
> IEEE float arithmetic and competently written float I/O code.

On my machine I see:

=> select float4in(float4out((1::numeric + 1e-7::numeric)::float4)) =
   (1::numeric + 1e-7::numeric)::float4;
 ?column?
----------
 f
(1 row)

> My own WAG is that we're talking about a GiST bug that causes it to
> recognize or not recognize duplicates depending on order of insertion.
> But that's theorizing far in advance of the data.

Seems plausible.

Regards,
	Jeff Davis
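[The round-trip question Jeff and Tom are discussing is the usual "how many decimal digits does a binary float need" issue. A small illustration using Python, whose floats are IEEE doubles like float8; the exact digit counts for PostgreSQL's output are governed by extra_float_digits, but 17 significant digits is always enough for a double:]

```python
x = 0.1 + 0.2  # an IEEE double whose shortest exact decimal form is long

# Printed with only 15 significant digits, the text form loses the value:
assert float(f"{x:.15g}") != x   # "0.3" reloads as a different double

# Printed with 17 significant digits (enough for any IEEE double, which
# is what cranking up extra_float_digits buys you), it round-trips:
assert float(f"{x:.17g}") == x
```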
Alex Zepeda <zipzippy@sonic.net> writes:
> If I try to manually add the constraint on the table I get:
> blockface=# alter table bus_positions add constraint "exclude_time_buffer" EXCLUDE USING gist (vehicle WITH =, buffer_time WITH &&, bbox_dup WITH &&);
> NOTICE:  ALTER TABLE / ADD EXCLUDE will create implicit index "exclude_time_buffer" for table "bus_positions"
> ERROR:  could not create exclusion constraint "exclude_time_buffer"

Try manually adding a similar constraint (ie, one redundant with the
existing one) on the source box. Does it work?

			regards, tom lane
Tom Lane wrote:
> Alex Zepeda <zipzippy@sonic.net> writes:
>> If I try to manually add the constraint on the table I get:
>> blockface=# alter table bus_positions add constraint "exclude_time_buffer" EXCLUDE USING gist (vehicle WITH =, buffer_time WITH &&, bbox_dup WITH &&);
>> NOTICE:  ALTER TABLE / ADD EXCLUDE will create implicit index "exclude_time_buffer" for table "bus_positions"
>> ERROR:  could not create exclusion constraint "exclude_time_buffer"
>
> Try manually adding a similar constraint (ie, one redundant with the
> existing one) on the source box. Does it work?

So... erm... dinky machine (4GB physical + 4GB swap for now), rather
large data set (a hair over a million rows). I ran out of memory before
the add constraint operation could complete (reindex as well). My
inclination would be to drop the most recent half or so of the rows and
attempt to reindex.

I'd be happy to provide someone with a copy of the dump... but it's
~70MB bzip'd.

- alex
Alex Zepeda <zipzippy@sonic.net> writes:
> I'd be happy to provide someone with a copy of the dump...
> but it's ~70MB bzip'd.

I'd be willing to take a look ... but if my WAG is right, the insertion
order would be critical information to reproduce the problem. Is it
possible to tell what order the rows were inserted in?

			regards, tom lane
Tom Lane wrote:
> I'd be willing to take a look ... but if my WAG is right, the insertion
> order would be critical information to reproduce the problem. Is it
> possible to tell what order the rows were inserted in?

Yes. There are two relevant columns:

	id         -- a sequence
	created_at -- a timestamp w/ timezone

The timestamps are all a bit wonky, but in a consistent manner due to
how ActiveRecord handles timezones.

- alex
On Mon, 2010-08-30 at 20:06 -0400, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
>> On my machine I see:
>
>> => select float4in(float4out((1::numeric + 1e-7::numeric)::float4)) =
>> (1::numeric + 1e-7::numeric)::float4;
>
> You need extra_float_digits cranked up. Which pg_dump knows about.

I can't reproduce the problem with float4/8, but I still see a problem
with floating-point timestamps:

=> show integer_datetimes;
 integer_datetimes
-------------------
 off
(1 row)

=> show extra_float_digits;
 extra_float_digits
--------------------
 3
(1 row)

=> select timestamp_in(timestamp_out('2009-01-01'::timestamp +
   '0.0000007 sec'::interval), 0, -1) =
   ('2009-01-01'::timestamp + '0.0000007 sec'::interval);
 ?column?
----------
 f
(1 row)

Machine is 64-bit linux, 9.1devel. It's academic at this point, however,
because Alex compiled with the default options.

Regards,
	Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes:
> On Mon, 2010-08-30 at 20:06 -0400, Tom Lane wrote:
>> You need extra_float_digits cranked up. Which pg_dump knows about.

> I can't reproduce the problem with float4/8, but I still see a problem
> with floating-point timestamps:

Hmmm ... timestamp_out pays no attention to extra_float_digits, but just
arbitrarily rounds off at six fractional digits. Maybe we should change
that. Although I'm not certain it would result in any strong guarantees
anyway, because of the extra arithmetic involved in timestamp I/O
conversions. As you say, it's irrelevant for Alex's problem; and maybe
it's not worth working on now that float timestamps are deprecated.

			regards, tom lane
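[The six-fractional-digit rounding described above is easy to mimic with plain doubles. A hypothetical sketch, treating a float timestamp as seconds since some epoch and pushing it through a %.6f-style output format, as in Jeff's '0.0000007 sec' example:]

```python
# A float timestamp with a sub-microsecond fraction (seconds since an
# epoch; the value is from Jeff's example, the framing is illustrative)
t = 0.0000007

# timestamp_out-style formatting: round at six fractional digits,
# regardless of extra_float_digits
s = f"{t:.6f}"   # -> "0.000001"

# The reloaded value is not the value that was dumped
assert float(s) != t
```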
On Tue, 2010-08-31 at 14:13 -0400, Tom Lane wrote:
> Hmmm ... timestamp_out pays no attention to extra_float_digits, but just
> arbitrarily rounds off at six fractional digits. Maybe we should change
> that. Although I'm not certain it would result in any strong guarantees
> anyway, because of the extra arithmetic involved in timestamp I/O
> conversions. As you say, it's irrelevant for Alex's problem; and maybe
> it's not worth working on now that float timestamps are deprecated.

Agreed, it wouldn't be worth fixing if we can't really be sure that the
input/output functions are inverses.

Also, the potential problem with UNIQUE is pretty unlikely (both in
terms of "why would you do that?" and "you would have to be very
unlucky"). It seems like it's mostly a problem for people using floating
point timestamps as a boundary, because you may expect the boundaries to
line up in a certain way. Anyone using floating-point timestamps with
the PERIOD data type, or anyone using the two-column "_start" and "_end"
representation, is at risk.

I think this is a pretty strong argument that floating-point timestamps
are broken for all but fairly simple use cases.

Regards,
	Jeff Davis
Well, the answer is that Jeff's instinct was right: the dump and reload
isn't reproducing the original data exactly. It's not our fault though;
it's a postgis bug. Observe:

gisttest2=# select ST_expand(setsrid(makepoint(-122.50367,37.74189),4326), 0.00004);
                                                                                      st_expand
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 0103000020E61000000100000005000000C32ADEC83CA05EC0D044D8F0F4DE4240C32ADEC83CA05EC0ECF5EE8FF7DE424035D252793BA05EC0ECF5EE8FF7DE424035D252793BA05EC0D044D8F0F4DE4240C32ADEC83CA05EC0D044D8F0F4DE4240
(1 row)

gisttest2=# select ST_expand(setsrid(makepoint(-122.50376,37.74185),4326), 0.00004);
                                                                                      st_expand
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 0103000020E6100000010000000500000063EE5A423EA05EC042EC4CA1F3DE424063EE5A423EA05EC05E9D6340F6DE4240D595CFF23CA05EC05E9D6340F6DE4240D595CFF23CA05EC042EC4CA1F3DE424063EE5A423EA05EC042EC4CA1F3DE4240
(1 row)

gisttest2=# select ST_expand(setsrid(makepoint(-122.50367,37.74189),4326), 0.00004) && ST_expand(setsrid(makepoint(-122.50376,37.74185),4326), 0.00004);
 ?column?
----------
 f
(1 row)

gisttest2=# select '0103000020E61000000100000005000000C32ADEC83CA05EC0D044D8F0F4DE4240C32ADEC83CA05EC0ECF5EE8FF7DE424035D252793BA05EC0ECF5EE8FF7DE424035D252793BA05EC0D044D8F0F4DE4240C32ADEC83CA05EC0D044D8F0F4DE4240'::geometry &&
gisttest2-# '0103000020E6100000010000000500000063EE5A423EA05EC042EC4CA1F3DE424063EE5A423EA05EC05E9D6340F6DE4240D595CFF23CA05EC05E9D6340F6DE4240D595CFF23CA05EC042EC4CA1F3DE424063EE5A423EA05EC042EC4CA1F3DE4240'::geometry;
 ?column?
----------
 t
(1 row)

So these two geometry values do not overlap in the original database,
but they do overlap in the clone, apparently because the output
representation of geometry doesn't result in an exact reconstruction
of the value. Somebody better complain over in the postgis lists.

			regards, tom lane
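[The flip Tom demonstrates can be reproduced with plain doubles: box endpoints that differ only past the printed precision collapse together on reload, and an overlap test changes its answer. A self-contained one-axis illustration (ordinary floats run through 15-significant-digit text, not actual PostGIS output):]

```python
def overlaps(a, b):
    # Closed-interval overlap on one axis, the per-axis building block
    # of a bounding-box && test
    return a[0] <= b[1] and b[0] <= a[1]

x = 0.1 + 0.2             # 0.30000000000000004, a hair above 0.3
box_a = (x, 1.0)          # starts just above 0.3
box_b = (0.0, 0.3)        # ends at exactly 0.3

assert not overlaps(box_a, box_b)  # disjoint at full precision

# Round-trip the endpoints through too-short decimal text, the way a
# limited-precision output representation would:
reload = lambda v: float(f"{v:.15g}")
box_a2 = tuple(map(reload, box_a))  # x collapses to exactly 0.3
box_b2 = tuple(map(reload, box_b))

assert overlaps(box_a2, box_b2)    # the boxes now touch: && flips f -> t
```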
Tom Lane wrote:
> So these two geometry values do not overlap in the original database,
> but they do overlap in the clone, apparently because the output
> representation of geometry doesn't result in an exact reconstruction
> of the value. Somebody better complain over in the postgis lists.

Thanks for doing all of that leg work. Whinging has commenced.

- alex