Thread: Sequential Scan Index Bug
I have a table with an integer column with about 10M rows in it. This column has an index (btree). When I try to select a row using this column with an integer, e.g. select * from table where id=4, it always uses the index. However, if I select try to select a row using this column with a decimal, e.g. select * from table where id=4.343, it skips the index entirely and does a sequential scan of the table. I am using v7.4.2 on Freebsd 4.9. Gabriel _________________ Gabriel Weinberg yegg@alum.mit.edu
On Sat, Apr 03, 2004 at 13:51:56 -0500, Gabriel Weinberg <yegg@alum.mit.edu> wrote: > > I have a table with an integer column with about 10M rows in it. > > This column has an index (btree). > > When I try to select a row using this column with an integer, e.g. select * > from table where id=4, it always uses the index. However, if I select try > to select a row using this column with a decimal, e.g. select * from table > where id=4.343, it skips the index entirely and does a sequential scan of > the table. > > I am using v7.4.2 on Freebsd 4.9. Depending on what you want to do, you probably either want to cast the value to an int explicitly or combine that with a test (using a stable function) to make sure the number is actually an integer.
Yes, I thought I had done that, but now that I figured out what was going on, I did it for all cases. So it is no longer occurring for me, but it still seems like a bug in PostgreSQL. I would expect it to throw an error immediately, instead of scanning the table for a value of a different type. In my case, the table is huge, so it really put a hamper on the system. Gabriel _________________ Gabriel Weinberg yegg@alum.mit.edu=20 -----Original Message----- From: Bruno Wolff III [mailto:bruno@wolff.to]=20 Sent: Wednesday, April 07, 2004 1:38 AM To: Gabriel Weinberg Cc: pgsql-bugs@postgresql.org Subject: Re: Sequential Scan Index Bug On Sat, Apr 03, 2004 at 13:51:56 -0500, Gabriel Weinberg <yegg@alum.mit.edu> wrote: >=20 > I have a table with an integer column with about 10M rows in it. >=20 > This column has an index (btree). >=20 > When I try to select a row using this column with an integer, e.g.=20 > select * from table where id=3D4, it always uses the index. However, if= =20 > I select try to select a row using this column with a decimal, e.g.=20 > select * from table where id=3D4.343, it skips the index entirely and=20 > does a sequential scan of the table. >=20 > I am using v7.4.2 on Freebsd 4.9. Depending on what you want to do, you probably either want to cast the value to an int explicitly or combine that with a test (using a stable function) to make sure the number is actually an integer.
On Wed, 7 Apr 2004, Gabriel Weinberg wrote: > Yes, I thought I had done that, but now that I figured out what was going > on, I did it for all cases. So it is no longer occurring for me, but it > still seems like a bug in PostgreSQL. I would expect it to throw an error > immediately, instead of scanning the table for a value of a different type. But what if you said id = 4.0? Would you want it to find the id=4 row?
On Wed, 7 Apr 2004, Gabriel Weinberg wrote: > Presumably, but that is not what I was doing. I was responding to the part that was: "I would expect it to throw an error immediately, instead of scanning the table for a value of a different type." If say intcol = 4.345 is an error, is intcol = 4.0 an error as well given that 4.345 and 4.0 are presumably the same type? I'm not sure what error you would expect.
On Wed, 7 Apr 2004, Gabriel Weinberg wrote: > I would expect if I did intcol = 4.35 or intcol = 'abc', it would throw a > type mismatch error. Well, in practice, the former is AFAICS required to do something "right" by the SQL spec because it explicitly states that all all numbers are mutually comparable, so erroring would technically be against spec. It's possible that we could do something more intelligent than the current behavior for that case but I can't come up with a particularly good choice that wouldn't have bad effects elsewhere.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > It's possible that we could do something more intelligent than the current > behavior for that case but I can't come up with a particularly good > choice that wouldn't have bad effects elsewhere. In theory we could recognize that "integer_column = 4.35" will yield a constant false. If the expression were replaced by "false" during constant folding then the planner would produce a short-circuited plan that won't actually examine the table. In practice, though, I don't see any way to do that that wouldn't be a horrendous kluge. I don't like putting special-case type-specific knowledge into the planner; yet here we have knowledge that's not only type-specific but specific to the combination of two different types. Yech. I don't see any hope for a catalog-driven, extensible approach for such things. You'd also have to ask questions about whether the planner time spent testing for such cases would really be a good investment... regards, tom lane
I would expect if I did intcol =3D 4.35 or intcol =3D 'abc', it would throw= a type mismatch error. Now whether you want to be clever and make intcol =3D 4.0 not throw an error and instead convert it to intcol =3D 4 is another thing. In that case, I still wouldn't scan the table. And now that I think of it, 4 is different than 4.0 in terms of precision. If you are querying an intcol, maybe that doesn't matter, but I would probably err on the side of precaution and throw a type mismatch error as well.=20=20 Gabriel _________________ Gabriel Weinberg yegg@alum.mit.edu=20 -----Original Message----- From: Stephan Szabo [mailto:sszabo@megazone.bigpanda.com]=20 Sent: Wednesday, April 07, 2004 12:58 PM To: Gabriel Weinberg Cc: 'Bruno Wolff III'; pgsql-bugs@postgresql.org Subject: RE: [BUGS] Sequential Scan Index Bug On Wed, 7 Apr 2004, Gabriel Weinberg wrote: > Presumably, but that is not what I was doing. I was responding to the part that was: "I would expect it to throw an error immediately, instead of scanning the table for a value of a different type." If say intcol =3D 4.345 is an error, is intcol =3D 4.0 an error as well giv= en that 4.345 and 4.0 are presumably the same type? I'm not sure what error you would expect.
Presumably, but that is not what I was doing. I was taking a number from a user, which was supposed to be divisible by an integer. Sometimes the user left off the last digit when typing in the number or otherwise typed it in wrong, rendering the input not divisible by that number. So it was looking for something like 4.345. Gabriel _________________ Gabriel Weinberg yegg@alum.mit.edu -----Original Message----- From: Stephan Szabo [mailto:sszabo@megazone.bigpanda.com] Sent: Wednesday, April 07, 2004 11:35 AM To: Gabriel Weinberg Cc: 'Bruno Wolff III'; pgsql-bugs@postgresql.org Subject: Re: [BUGS] Sequential Scan Index Bug On Wed, 7 Apr 2004, Gabriel Weinberg wrote: > Yes, I thought I had done that, but now that I figured out what was > going on, I did it for all cases. So it is no longer occurring for > me, but it still seems like a bug in PostgreSQL. I would expect it to > throw an error immediately, instead of scanning the table for a value > of a different type. But what if you said id = 4.0? Would you want it to find the id=4 row?