Thread: bug with aggregate + multi column index + index_scan

bug with aggregate + multi column index + index_scan

From
Brian Hirt
Date:
I've run across a rather nasty bug in 8.1.2.  It seems when the
planer uses an index_scan within a GroupAggregate for a multi column
index you can get incorrect results.   fwiw i also see this on a dual
xeon box running 8.1.1 and redhat 7.3.

I've created a simple test case that I hope isolates the problems
sufficiently.


x86imac:/tmp bhirt$ psql --echo-all --file=test weblogs
select version();
                                                                 version
------------------------------------------------------------------------
----------------------------------------------------------------
PostgreSQL 8.1.2 on i686-apple-darwin8.4.1, compiled by GCC i686-
apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5250)
(1 row)

create table test (
id1 int4,
id2 int4,
day date,
grp text,
v int4);
CREATE TABLE
create index test_idx on test (id1,id2,day,grp);
CREATE INDEX
insert into test values (1,1,'1/1/2006','there',1);
INSERT 0 1
insert into test values (1,1,'1/2/2006','there',2);
INSERT 0 1
insert into test values (1,1,'1/3/2006','there',3);
INSERT 0 1
insert into test values (1,1,'1/1/2006','hi',2);
INSERT 0 1
insert into test values (1,1,'1/2/2006','hi',3);
INSERT 0 1
insert into test values (1,1,'1/3/2006','hi',4);
INSERT 0 1
select grp,sum(v) from test where id1 = 1 and id2 = 1 and day between
'1/1/2006' and '1/31/2006' group by grp order by sum(v) desc;
   grp  | sum
-------+-----
hi    |   4
hi    |   3
there |   3
hi    |   2
there |   2
there |   1
(6 rows)

set enable_indexscan to false;
SET
select grp,sum(v) from test where id1 = 1 and id2 = 1 and day between
'1/1/2006' and '1/31/2006' group by grp order by sum(v) desc;
   grp  | sum
-------+-----
hi    |   9
there |   6
(2 rows)

x86imac:/tmp bhirt$

Re: bug with aggregate + multi column index + index_scan

From
Tom Lane
Date:
Brian Hirt <bhirt@mobygames.com> writes:
> I've run across a rather nasty bug in 8.1.2.  It seems when the
> planer uses an index_scan within a GroupAggregate for a multi column
> index you can get incorrect results.

Good catch.  Looks to me like it only happens if two or more leading
index columns are equated to the same constant value, ie
    where id1 = 1 and id2 = 1 and ...
will show the bug but
    where id1 = 1 and id2 = 2 and ...
won't.  Does that match up with the original behavior that led you to
make the test case?

The problem is that implied equality deduction causes the planner to
conclude id1 = id2, and this extra bit of info is confusing the code
that determines whether the index's sort order can be considered to
match the needs of the GROUP BY clause.  So you get a plan that feeds
the IndexScan directly to GroupAggregate, which is wrong because the
data isn't sorted by "grp".

In the related case
    select grp,sum(v) from test where id1 = 1 and id2 = 2 and
    day =  '1/1/2006' group by grp order by sum(v) desc;
it *is* OK to decide that the indexscan result is effectively sorted
by "grp", so it's important to have this check ... it's just not being
done quite right.  Thanks for the test case!

            regards, tom lane

Re: bug with aggregate + multi column index + index_scan

From
Tom Lane
Date:
Brian Hirt <bhirt@mobygames.com> writes:
> I've run across a rather nasty bug in 8.1.2.  It seems when the
> planer uses an index_scan within a GroupAggregate for a multi column
> index you can get incorrect results.

Patch is here if you need it before 8.1.3:
http://archives.postgresql.org/pgsql-committers/2006-01/msg00377.php

            regards, tom lane

Re: bug with aggregate + multi column index + index_scan

From
Brian Hirt
Date:
Tom,

Yes, what you describe are exactly the circumstances that are
required for our query to fail.  Once again, thanks for the great
help and quick fix.  Do you think this fix will make 8.1.3?


Best Regards,

Brian Hirt

On Jan 29, 2006, at 9:31 AM, Tom Lane wrote:

> Brian Hirt <bhirt@mobygames.com> writes:
>> I've run across a rather nasty bug in 8.1.2.  It seems when the
>> planer uses an index_scan within a GroupAggregate for a multi column
>> index you can get incorrect results.
>
> Good catch.  Looks to me like it only happens if two or more leading
> index columns are equated to the same constant value, ie
>     where id1 = 1 and id2 = 1 and ...
> will show the bug but
>     where id1 = 1 and id2 = 2 and ...
> won't.  Does that match up with the original behavior that led you to
> make the test case?
>
> The problem is that implied equality deduction causes the planner to
> conclude id1 = id2, and this extra bit of info is confusing the code
> that determines whether the index's sort order can be considered to
> match the needs of the GROUP BY clause.  So you get a plan that feeds
> the IndexScan directly to GroupAggregate, which is wrong because the
> data isn't sorted by "grp".
>
> In the related case
>     select grp,sum(v) from test where id1 = 1 and id2 = 2 and
>     day =  '1/1/2006' group by grp order by sum(v) desc;
> it *is* OK to decide that the indexscan result is effectively sorted
> by "grp", so it's important to have this check ... it's just not being
> done quite right.  Thanks for the test case!
>
>             regards, tom lane
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

Re: bug with aggregate + multi column index + index_scan

From
Tom Lane
Date:
Brian Hirt <bhirt@mobygames.com> writes:
> Yes, what you describe are exactly the circumstances that are
> required for our query to fail.  Once again, thanks for the great
> help and quick fix.  Do you think this fix will make 8.1.3?

It's already in CVS ...

            regards, tom lane