Thread: BUG #4748: hash join and sort-merge join make different results

BUG #4748: hash join and sort-merge join make different results

From

"Roman Kononov"

Date:

03 April 2009, 19:07:49

The following bug has been logged online:

Bug reference:      4748
Logged by:          Roman Kononov
Email address:      kononov@ftml.net
PostgreSQL version: 8.3.7
Operating system:   GNU/Linux x86_64
Description:        hash join and sort-merge join make different results
Details:

test-std=# create table t(s int,i interval);
CREATE TABLE
test-std=# insert into t values (0,'30 days'), (1,'1 month');
INSERT 0 2
test-std=# select * from t as a, t as b where a.i=b.i;
 s |    i    | s |    i
---+---------+---+---------
 0 | 30 days | 0 | 30 days
 0 | 30 days | 1 | 1 mon
 1 | 1 mon   | 0 | 30 days
 1 | 1 mon   | 1 | 1 mon
(4 rows)

test-std=# analyze;
ANALYZE
test-std=# select * from t as a, t as b where a.i=b.i;
 s |    i    | s |    i
---+---------+---+---------
 0 | 30 days | 0 | 30 days
 1 | 1 mon   | 1 | 1 mon
(2 rows)

Re: BUG #4748: hash join and sort-merge join make different results

From

Alvaro Herrera

Date:

03 April 2009, 19:10:26

Roman Kononov wrote:

> Description:        hash join and sort-merge join make different results
> Details:
>
> test-std=# create table t(s int,i interval);
> CREATE TABLE
> test-std=# insert into t values (0,'30 days'), (1,'1 month');
> INSERT 0 2
> test-std=# select * from t as a, t as b where a.i=b.i;

Reproducible in 8.2.13 as well ..

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #4748: hash join and sort-merge join make different results

From

Jaime Casanova

Date:

03 April 2009, 19:44:50

On Fri, Apr 3, 2009 at 2:10 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Roman Kononov wrote:
>
>> Description: =A0 =A0 =A0 =A0hash join and sort-merge join make different=
 results
>> Details:
>>
>> test-std=3D# create table t(s int,i interval);
>> CREATE TABLE
>> test-std=3D# insert into t values (0,'30 days'), (1,'1 month');
>> INSERT 0 2
>> test-std=3D# select * from t as a, t as b where a.i=3Db.i;
>
> Reproducible in 8.2.13 as well ..
>

and the same in HEAD

PS: the analyze not always brings the problems, i had to turn off
enable_mergejoin and enable_nestloop

--=20
Atentamente,
Jaime Casanova
Soporte y capacitaci=F3n de PostgreSQL
Asesor=EDa y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

Re: BUG #4748: hash join and sort-merge join make different results

From

Tom Lane

Date:

03 April 2009, 19:57:56

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Roman Kononov wrote:
>> Description:        hash join and sort-merge join make different results
>> Details:
>>
>> test-std=# create table t(s int,i interval);
>> CREATE TABLE
>> test-std=# insert into t values (0,'30 days'), (1,'1 month');
>> INSERT 0 2
>> test-std=# select * from t as a, t as b where a.i=b.i;

> Reproducible in 8.2.13 as well ..

The problem is that interval_cmp_internal(), and hence interval_eq(),
says that '1 month' = '30 days'.  But they don't hash to the same thing.

I think we could fix this by having interval_hash() duplicate the
total-span calculation done by interval_cmp_internal, and then return
the hash of the resulting TimeOffset.  This is going to break existing
hash indexes on intervals, but there seems little choice...

            regards, tom lane

Re: BUG #4748: hash join and sort-merge join make different results

From

"Dickson S. Guedes"

Date:

03 April 2009, 20:08:54

Em Sex, 2009-04-03 =E0s 15:10 -0400, Alvaro Herrera escreveu:
> Roman Kononov wrote:
>=20
> > Description:        hash join and sort-merge join make different results
> > Details:=20
> >=20
> > test-std=3D# create table t(s int,i interval);
> > CREATE TABLE
> > test-std=3D# insert into t values (0,'30 days'), (1,'1 month');
> > INSERT 0 2
> > test-std=3D# select * from t as a, t as b where a.i=3Db.i;
>=20
> Reproducible in 8.2.13 as well ..

I could reproduce this once in a database that already have a table
named "t", then after i did dropped it i couldn't anymore.

I'm using 8.3.7.

# create table tt_01(s int,i interval);
CREATE TABLE

# insert into tt_01 values (0,'30 days'), (1,'1 month');
INSERT 0 2

# select * from tt_01 as a, tt_01 as b where a.i=3Db.i;
 s |    i    | s |    i=20=20=20=20
---+---------+---+---------
 0 | 30 days | 0 | 30 days
 0 | 30 days | 1 | 1 mon
 1 | 1 mon   | 0 | 30 days
 1 | 1 mon   | 1 | 1 mon
(4 registros)

# ANALYZE ;
ANALYZE

# select * from tt_01 as a, tt_01 as b where a.i=3Db.i;
 s |    i    | s |    i=20=20=20=20
---+---------+---+---------
 0 | 30 days | 0 | 30 days
 0 | 30 days | 1 | 1 mon
 1 | 1 mon   | 0 | 30 days
 1 | 1 mon   | 1 | 1 mon
(4 registros)


--=20
Dickson S. Guedes=20
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br

Re: BUG #4748: hash join and sort-merge join make different results

From

Tom Lane

Date:

03 April 2009, 20:18:33

"Dickson S. Guedes" <listas@guedesoft.net> writes:
> I could reproduce this once in a database that already have a table
> named "t", then after i did dropped it i couldn't anymore.

As noted, you might have to force use of a hash join (my machine
preferred a mergejoin before the ANALYZE and a nestloop after).
It's definitely broken :-(

            regards, tom lane

Re: BUG #4748: hash join and sort-merge join make different results

From

Roman Kononov

Date:

04 April 2009, 04:01:25

On 2009-04-03 14:57 Tom Lane said the following:
> I think we could fix this by having interval_hash() duplicate the
> total-span calculation done by interval_cmp_internal, and then return
> the hash of the resulting TimeOffset.  This is going to break existing
> hash indexes on intervals, but there seems little choice...

Consider hashing the result of justify_interval().

Re: BUG #4748: hash join and sort-merge join make different results

From

Tom Lane

Date:

04 April 2009, 04:32:08

Roman Kononov <kononov@ftml.net> writes:
> On 2009-04-03 14:57 Tom Lane said the following:
>> I think we could fix this by having interval_hash() duplicate the
>> total-span calculation done by interval_cmp_internal, and then return
>> the hash of the resulting TimeOffset.  This is going to break existing
>> hash indexes on intervals, but there seems little choice...

> Consider hashing the result of justify_interval().

Uh, what's your point?  We have to match interval_eq, not
justify_interval.

            regards, tom lane

Re: BUG #4748: hash join and sort-merge join make different results

From

Tom Lane

Date:

04 April 2009, 05:02:44

"Roman Kononov" <kononov@ftml.net> writes:
> Description:        hash join and sort-merge join make different results

I've applied a patch for this:
http://archives.postgresql.org/pgsql-committers/2009-04/msg00048.php

Thanks for the report!

            regards, tom lane

Re: BUG #4748: hash join and sort-merge join make different results

From

Roman Kononov

Date:

04 April 2009, 05:14:00

On 2009-04-03 23:32 Tom Lane said the following:
> Roman Kononov <kononov@ftml.net> writes:
>> On 2009-04-03 14:57 Tom Lane said the following:
>>> I think we could fix this by having interval_hash() duplicate the
>>> total-span calculation done by interval_cmp_internal, and then return
>>> the hash of the resulting TimeOffset.  This is going to break existing
>>> hash indexes on intervals, but there seems little choice...
>
>> Consider hashing the result of justify_interval().
>
> Uh, what's your point?  We have to match interval_eq, not
> justify_interval.

For any two intervals a and b, saying that interval_cmp_interval(a,b)==0
is exactly the same as saying that (aj.month==bj.month && aj.day==bj.day
&& aj.time==bj.time), where aj=justify_interval(a) and
bj=justify_interval(b). Therefore, instead of hashing
interval_cmp_value() you can hash justify_interval(), where
interval_cmp_value() is the transformation of intervals in
interval_cmp_interval().

You said that hashing interval_cmp_value() breaks existing hash indexes.
Hashing "justified" intervals avoids such breaking in some cases.

Re: BUG #4748: hash join and sort-merge join make different results

From

Tom Lane

Date:

04 April 2009, 05:25:09

Roman Kononov <kononov@ftml.net> writes:
> On 2009-04-03 23:32 Tom Lane said the following:
>> Uh, what's your point?  We have to match interval_eq, not
>> justify_interval.

> For any two intervals a and b, saying that interval_cmp_interval(a,b)==0
> is exactly the same as saying that (aj.month==bj.month && aj.day==bj.day
> && aj.time==bj.time), where aj=justify_interval(a) and
> bj=justify_interval(b).

I doubt that that's exactly true when you take floating-point roundoff
into account ...

            regards, tom lane