Thread: BUG #4748: hash join and sort-merge join make different results
The following bug has been logged online: Bug reference: 4748 Logged by: Roman Kononov Email address: kononov@ftml.net PostgreSQL version: 8.3.7 Operating system: GNU/Linux x86_64 Description: hash join and sort-merge join make different results Details: test-std=# create table t(s int,i interval); CREATE TABLE test-std=# insert into t values (0,'30 days'), (1,'1 month'); INSERT 0 2 test-std=# select * from t as a, t as b where a.i=b.i; s | i | s | i ---+---------+---+--------- 0 | 30 days | 0 | 30 days 0 | 30 days | 1 | 1 mon 1 | 1 mon | 0 | 30 days 1 | 1 mon | 1 | 1 mon (4 rows) test-std=# analyze; ANALYZE test-std=# select * from t as a, t as b where a.i=b.i; s | i | s | i ---+---------+---+--------- 0 | 30 days | 0 | 30 days 1 | 1 mon | 1 | 1 mon (2 rows)
Roman Kononov wrote: > Description: hash join and sort-merge join make different results > Details: > > test-std=# create table t(s int,i interval); > CREATE TABLE > test-std=# insert into t values (0,'30 days'), (1,'1 month'); > INSERT 0 2 > test-std=# select * from t as a, t as b where a.i=b.i; Reproducible in 8.2.13 as well .. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Apr 3, 2009 at 2:10 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Roman Kononov wrote: > >> Description: =A0 =A0 =A0 =A0hash join and sort-merge join make different= results >> Details: >> >> test-std=3D# create table t(s int,i interval); >> CREATE TABLE >> test-std=3D# insert into t values (0,'30 days'), (1,'1 month'); >> INSERT 0 2 >> test-std=3D# select * from t as a, t as b where a.i=3Db.i; > > Reproducible in 8.2.13 as well .. > and the same in HEAD PS: the analyze not always brings the problems, i had to turn off enable_mergejoin and enable_nestloop --=20 Atentamente, Jaime Casanova Soporte y capacitaci=F3n de PostgreSQL Asesor=EDa y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
Alvaro Herrera <alvherre@commandprompt.com> writes: > Roman Kononov wrote: >> Description: hash join and sort-merge join make different results >> Details: >> >> test-std=# create table t(s int,i interval); >> CREATE TABLE >> test-std=# insert into t values (0,'30 days'), (1,'1 month'); >> INSERT 0 2 >> test-std=# select * from t as a, t as b where a.i=b.i; > Reproducible in 8.2.13 as well .. The problem is that interval_cmp_internal(), and hence interval_eq(), says that '1 month' = '30 days'. But they don't hash to the same thing. I think we could fix this by having interval_hash() duplicate the total-span calculation done by interval_cmp_internal, and then return the hash of the resulting TimeOffset. This is going to break existing hash indexes on intervals, but there seems little choice... regards, tom lane
Em Sex, 2009-04-03 =E0s 15:10 -0400, Alvaro Herrera escreveu: > Roman Kononov wrote: >=20 > > Description: hash join and sort-merge join make different results > > Details:=20 > >=20 > > test-std=3D# create table t(s int,i interval); > > CREATE TABLE > > test-std=3D# insert into t values (0,'30 days'), (1,'1 month'); > > INSERT 0 2 > > test-std=3D# select * from t as a, t as b where a.i=3Db.i; >=20 > Reproducible in 8.2.13 as well .. I could reproduce this once in a database that already have a table named "t", then after i did dropped it i couldn't anymore. I'm using 8.3.7. # create table tt_01(s int,i interval); CREATE TABLE # insert into tt_01 values (0,'30 days'), (1,'1 month'); INSERT 0 2 # select * from tt_01 as a, tt_01 as b where a.i=3Db.i; s | i | s | i=20=20=20=20 ---+---------+---+--------- 0 | 30 days | 0 | 30 days 0 | 30 days | 1 | 1 mon 1 | 1 mon | 0 | 30 days 1 | 1 mon | 1 | 1 mon (4 registros) # ANALYZE ; ANALYZE # select * from tt_01 as a, tt_01 as b where a.i=3Db.i; s | i | s | i=20=20=20=20 ---+---------+---+--------- 0 | 30 days | 0 | 30 days 0 | 30 days | 1 | 1 mon 1 | 1 mon | 0 | 30 days 1 | 1 mon | 1 | 1 mon (4 registros) --=20 Dickson S. Guedes=20 mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://planeta.postgresql.org.br
"Dickson S. Guedes" <listas@guedesoft.net> writes: > I could reproduce this once in a database that already have a table > named "t", then after i did dropped it i couldn't anymore. As noted, you might have to force use of a hash join (my machine preferred a mergejoin before the ANALYZE and a nestloop after). It's definitely broken :-( regards, tom lane
On 2009-04-03 14:57 Tom Lane said the following: > I think we could fix this by having interval_hash() duplicate the > total-span calculation done by interval_cmp_internal, and then return > the hash of the resulting TimeOffset. This is going to break existing > hash indexes on intervals, but there seems little choice... Consider hashing the result of justify_interval().
Roman Kononov <kononov@ftml.net> writes: > On 2009-04-03 14:57 Tom Lane said the following: >> I think we could fix this by having interval_hash() duplicate the >> total-span calculation done by interval_cmp_internal, and then return >> the hash of the resulting TimeOffset. This is going to break existing >> hash indexes on intervals, but there seems little choice... > Consider hashing the result of justify_interval(). Uh, what's your point? We have to match interval_eq, not justify_interval. regards, tom lane
"Roman Kononov" <kononov@ftml.net> writes: > Description: hash join and sort-merge join make different results I've applied a patch for this: http://archives.postgresql.org/pgsql-committers/2009-04/msg00048.php Thanks for the report! regards, tom lane
On 2009-04-03 23:32 Tom Lane said the following: > Roman Kononov <kononov@ftml.net> writes: >> On 2009-04-03 14:57 Tom Lane said the following: >>> I think we could fix this by having interval_hash() duplicate the >>> total-span calculation done by interval_cmp_internal, and then return >>> the hash of the resulting TimeOffset. This is going to break existing >>> hash indexes on intervals, but there seems little choice... > >> Consider hashing the result of justify_interval(). > > Uh, what's your point? We have to match interval_eq, not > justify_interval. For any two intervals a and b, saying that interval_cmp_interval(a,b)==0 is exactly the same as saying that (aj.month==bj.month && aj.day==bj.day && aj.time==bj.time), where aj=justify_interval(a) and bj=justify_interval(b). Therefore, instead of hashing interval_cmp_value() you can hash justify_interval(), where interval_cmp_value() is the transformation of intervals in interval_cmp_interval(). You said that hashing interval_cmp_value() breaks existing hash indexes. Hashing "justified" intervals avoids such breaking in some cases.
Roman Kononov <kononov@ftml.net> writes: > On 2009-04-03 23:32 Tom Lane said the following: >> Uh, what's your point? We have to match interval_eq, not >> justify_interval. > For any two intervals a and b, saying that interval_cmp_interval(a,b)==0 > is exactly the same as saying that (aj.month==bj.month && aj.day==bj.day > && aj.time==bj.time), where aj=justify_interval(a) and > bj=justify_interval(b). I doubt that that's exactly true when you take floating-point roundoff into account ... regards, tom lane