Thread: bug with aggregate + multi column index + index_scan
I've run across a rather nasty bug in 8.1.2. It seems when the planer uses an index_scan within a GroupAggregate for a multi column index you can get incorrect results. fwiw i also see this on a dual xeon box running 8.1.1 and redhat 7.3. I've created a simple test case that I hope isolates the problems sufficiently. x86imac:/tmp bhirt$ psql --echo-all --file=test weblogs select version(); version ------------------------------------------------------------------------ ---------------------------------------------------------------- PostgreSQL 8.1.2 on i686-apple-darwin8.4.1, compiled by GCC i686- apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5250) (1 row) create table test ( id1 int4, id2 int4, day date, grp text, v int4); CREATE TABLE create index test_idx on test (id1,id2,day,grp); CREATE INDEX insert into test values (1,1,'1/1/2006','there',1); INSERT 0 1 insert into test values (1,1,'1/2/2006','there',2); INSERT 0 1 insert into test values (1,1,'1/3/2006','there',3); INSERT 0 1 insert into test values (1,1,'1/1/2006','hi',2); INSERT 0 1 insert into test values (1,1,'1/2/2006','hi',3); INSERT 0 1 insert into test values (1,1,'1/3/2006','hi',4); INSERT 0 1 select grp,sum(v) from test where id1 = 1 and id2 = 1 and day between '1/1/2006' and '1/31/2006' group by grp order by sum(v) desc; grp | sum -------+----- hi | 4 hi | 3 there | 3 hi | 2 there | 2 there | 1 (6 rows) set enable_indexscan to false; SET select grp,sum(v) from test where id1 = 1 and id2 = 1 and day between '1/1/2006' and '1/31/2006' group by grp order by sum(v) desc; grp | sum -------+----- hi | 9 there | 6 (2 rows) x86imac:/tmp bhirt$
Brian Hirt <bhirt@mobygames.com> writes: > I've run across a rather nasty bug in 8.1.2. It seems when the > planer uses an index_scan within a GroupAggregate for a multi column > index you can get incorrect results. Good catch. Looks to me like it only happens if two or more leading index columns are equated to the same constant value, ie where id1 = 1 and id2 = 1 and ... will show the bug but where id1 = 1 and id2 = 2 and ... won't. Does that match up with the original behavior that led you to make the test case? The problem is that implied equality deduction causes the planner to conclude id1 = id2, and this extra bit of info is confusing the code that determines whether the index's sort order can be considered to match the needs of the GROUP BY clause. So you get a plan that feeds the IndexScan directly to GroupAggregate, which is wrong because the data isn't sorted by "grp". In the related case select grp,sum(v) from test where id1 = 1 and id2 = 2 and day = '1/1/2006' group by grp order by sum(v) desc; it *is* OK to decide that the indexscan result is effectively sorted by "grp", so it's important to have this check ... it's just not being done quite right. Thanks for the test case! regards, tom lane
Brian Hirt <bhirt@mobygames.com> writes: > I've run across a rather nasty bug in 8.1.2. It seems when the > planer uses an index_scan within a GroupAggregate for a multi column > index you can get incorrect results. Patch is here if you need it before 8.1.3: http://archives.postgresql.org/pgsql-committers/2006-01/msg00377.php regards, tom lane
Tom, Yes, what you describe are exactly the circumstances that are required for our query to fail. Once again, thanks for the great help and quick fix. Do you think this fix will make 8.1.3? Best Regards, Brian Hirt On Jan 29, 2006, at 9:31 AM, Tom Lane wrote: > Brian Hirt <bhirt@mobygames.com> writes: >> I've run across a rather nasty bug in 8.1.2. It seems when the >> planer uses an index_scan within a GroupAggregate for a multi column >> index you can get incorrect results. > > Good catch. Looks to me like it only happens if two or more leading > index columns are equated to the same constant value, ie > where id1 = 1 and id2 = 1 and ... > will show the bug but > where id1 = 1 and id2 = 2 and ... > won't. Does that match up with the original behavior that led you to > make the test case? > > The problem is that implied equality deduction causes the planner to > conclude id1 = id2, and this extra bit of info is confusing the code > that determines whether the index's sort order can be considered to > match the needs of the GROUP BY clause. So you get a plan that feeds > the IndexScan directly to GroupAggregate, which is wrong because the > data isn't sorted by "grp". > > In the related case > select grp,sum(v) from test where id1 = 1 and id2 = 2 and > day = '1/1/2006' group by grp order by sum(v) desc; > it *is* OK to decide that the indexscan result is effectively sorted > by "grp", so it's important to have this check ... it's just not being > done quite right. Thanks for the test case! > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq
Brian Hirt <bhirt@mobygames.com> writes: > Yes, what you describe are exactly the circumstances that are > required for our query to fail. Once again, thanks for the great > help and quick fix. Do you think this fix will make 8.1.3? It's already in CVS ... regards, tom lane