Thread: How to use index in WHERE int = float
I have a table with an index:

CREATE TABLE firma2.dok(
  ...
  dokumnr serial NOT NULL,
  ...
  CONSTRAINT dok_pkey PRIMARY KEY (dokumnr),
  ...
);

CREATE INDEX dok_dokumnr_idx
  ON firma2.dok
  USING btree
  (dokumnr);

I ran the ANALYZE command on it.

explain analyze select * from firma2.dok where dokumnr='1228137'::float8

"Seq Scan on dok  (cost=0.00..187766.23 rows=6255 width=1145) (actual time=43168.460..43176.063 rows=1 loops=1)"
"  Filter: ((dokumnr)::double precision = 1228137::double precision)"
"Total runtime: 43176.375 ms"

This shows that the index is not used and thus the query takes a very long time.

How can I force PostgreSQL to speed this up without changing the query?

"PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC i686-pc-linux-gnu-gcc (GCC) 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)"

Andrus.
On Wed, Nov 05, 2008 at 08:52:50PM +0200, Andrus wrote:
> explain analyze select * from firma2.dok where dokumnr='1228137'::float8
>
> "Seq Scan on dok  (cost=0.00..187766.23 rows=6255 width=1145) (actual
> time=43168.460..43176.063 rows=1 loops=1)"
> "  Filter: ((dokumnr)::double precision = 1228137::double precision)"
> "Total runtime: 43176.375 ms"
>
> This shows that the index is not used and thus the query takes a very
> long time.
>
> How can I force PostgreSQL to speed this up without changing the query?

I'm not sure if PG 8.1 will recognize it, but you could try creating a
functional index on the column when cast to the appropriate type, i.e.:

  CREATE INDEX dok_dokumnr_float8_idx ON dok (((dokumnr)::double precision));

Sam
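A minimal sketch of Sam's suggestion, using the firma2.dok table from the
original message (whether 8.1's planner will actually match the injected
cast against this expression index is, as Sam says, uncertain; on later
versions it should):

  -- Expression index over the column cast to double precision, matching
  -- the cast the planner injects for "dokumnr = <float8 constant>":
  CREATE INDEX dok_dokumnr_float8_idx
      ON firma2.dok (((dokumnr)::double precision));

  ANALYZE firma2.dok;

  -- The filter (dokumnr)::double precision = 1228137::double precision
  -- can now be matched against dok_dokumnr_float8_idx:
  EXPLAIN ANALYZE
  SELECT * FROM firma2.dok WHERE dokumnr = '1228137'::float8;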
On Wed, Nov 05, 2008 at 08:52:50PM +0200, Andrus wrote:
> explain analyze select * from firma2.dok where dokumnr='1228137'::float8
>
> "Seq Scan on dok  (cost=0.00..187766.23 rows=6255 width=1145) (actual
> time=43168.460..43176.063 rows=1 loops=1)"
> "  Filter: ((dokumnr)::double precision = 1228137::double precision)"
> "Total runtime: 43176.375 ms"

I've just noticed that "actual time" doesn't run from zero; is this part
of another query, or is something else going on?

Sam
On Wed, Nov 5, 2008 at 11:52 AM, Andrus <kobruleht2@hot.ee> wrote:
> I have a table with an index:
>
> CREATE TABLE firma2.dok(
>  ...
>  dokumnr serial NOT NULL,
>  ...
>  CONSTRAINT dok_pkey PRIMARY KEY (dokumnr),
>  ...
> );
>
> CREATE INDEX dok_dokumnr_idx
>  ON firma2.dok
>  USING btree
>  (dokumnr);
>
> I ran the ANALYZE command on it.
>
> explain analyze select * from firma2.dok where dokumnr='1228137'::float8
>
> "Seq Scan on dok  (cost=0.00..187766.23 rows=6255 width=1145) (actual
> time=43168.460..43176.063 rows=1 loops=1)"
> "  Filter: ((dokumnr)::double precision = 1228137::double precision)"
> "Total runtime: 43176.375 ms"
>
> This shows that the index is not used and thus the query takes a very
> long time.
>
> How can I force PostgreSQL to speed this up without changing the query?

Stop trying to compare exact and inexact types?

You do realize that a float is not an exact number. What you and I see
as 1228137 might really be, internally, 1228136.9999999999999999? So it
won't get an exact match. What's wrong with trying to match to an exact
number instead?

I have a feeling you've given us a simplified example of your problem.
It might help to see the real problem.
> Stop trying to compare exact and inexact types?
>
> You do realize that a float is not an exact number. What you and I
> see as 1228137 might really be, internally, 1228136.9999999999999999?
> So it won't get an exact match. What's wrong with trying to match to
> an exact number instead?

My query contains

'1228137'::float8

I don't see

1228136.9999999999999999

in this query.

Those strange casts are auto-generated by the ODBC parameter-passing
software which I must use, so I must live with it. The only way to fix
this is to replace the parameters manually by creating strings, which
I'm trying to do for slow queries.

> I have a feeling you've given us a simplified example of your problem.
> It might help to see the real problem.

I have a feeling that PostgreSQL should determine that the index is of
integer type and convert the float to an integer itself to speed this up.

Another issue: the query

SELECT dokumnr
FROM DOK
WHERE dokumnr IN (123)
  AND ( '0' or
        dokumnr IN (SELECT dokumnr FROM bilkaib WHERE
                    alusdok='LG' AND masin LIKE 'a%') )

runs slowly. If

'0' or

is removed from the where clause, it runs fast since it finds the
indexes. It is a real surprise that an expression containing

'0' or

prevents PostgreSQL from using indexes.

I have changed those queries to manually created, better SQL statements
which run fast; however, I expected that PostgreSQL could optimize those
itself.

Andrus.
On Wed, Nov 5, 2008 at 2:42 PM, Andrus <kobruleht2@hot.ee> wrote:
>> Stop trying to compare exact and inexact types?
>>
>> You do realize that a float is not an exact number. What you and I
>> see as 1228137 might really be, internally, 1228136.9999999999999999?
>> So it won't get an exact match. What's wrong with trying to match to
>> an exact number instead?
>
> My query contains
>
> '1228137'::float8
>
> I don't see
>
> 1228136.9999999999999999
>
> in this query.

Did you read what I wrote? Because you just repeated it as an argument
against my point. I don't think you get the difference between exact
and inexact types. Google it for more comprehensive reading on the
subject.

> Those strange casts are auto-generated by the ODBC parameter-passing
> software which I must use, so I must live with it.

Sorry to hear that. PostgreSQL doesn't automatically use indexes when
comparing incompatible types (i.e. exact versus non-exact types), so
it's up to you to use casting to make them match up. There are lots of
things you can do here, but if you're comparing ints with floats you
will eventually be bitten by some comparison that should match but
doesn't. Stick to exact number types if possible.
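To illustrate the inexactness Scott describes, a quick sketch (this is
standard IEEE 754 float8 behaviour, nothing version-specific): decimal
fractions that look exact in the source text are not exactly
representable in binary floating point.

  -- 0.1 and 0.2 have no exact binary representation, so their float8
  -- sum is not the float8 value nearest to 0.3:
  SELECT 0.1::float8 + 0.2::float8 = 0.3::float8;    -- false
  SELECT (0.1::float8 + 0.2::float8) - 0.3::float8;  -- ~5.55e-17, not 0

  -- The exact numeric type behaves as expected:
  SELECT 0.1::numeric + 0.2::numeric = 0.3::numeric; -- true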
On Wed, Nov 05, 2008 at 02:21:33PM -0700, Scott Marlowe wrote:
> On Wed, Nov 5, 2008 at 11:52 AM, Andrus <kobruleht2@hot.ee> wrote:
> > explain analyze select * from firma2.dok where dokumnr='1228137'::float8
> >
> > How can I force PostgreSQL to speed this up without changing the query?
>
> Stop trying to compare exact and inexact types?
>
> You do realize that a float is not an exact number. What you and I
> see as 1228137 might really be, internally, 1228136.9999999999999999?

A double precision number is defined to have 52 bits of precision,
which implies that integer values up to 4503599627370496 can be stored
accurately (I'd not recommend relying on this, though). In terms of
correctness, what he's doing is OK, as the column is a 32-bit int and
its complete range can be represented accurately.

Yet, from an aesthetic point of view, the OP is losing a lot of points
here. PG 8.3 would even throw it out, unless dokumnr was explicitly
cast to a float8 as well. Complaining about performance when doing this
seems somewhat contradictory! Asking the database to do extra work (in
this case, converting an integer to a floating point number, and then
doing a floating point compare) for no good reason is never good.

Sam
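The integer-precision limit Sam mentions can be demonstrated directly. A
float8 carries a 53-bit significand (52 explicit bits plus an implied
leading bit), so distinct integers above 2^53 can collapse to the same
float8 value, while any 32-bit int survives the round trip:

  -- 2^53 and 2^53 + 1 round to the same float8, so the comparison
  -- "succeeds" even though the integers differ:
  SELECT 9007199254740992::float8 = 9007199254740993::float8;  -- true

  -- A 32-bit int column such as dokumnr is far below this limit, so
  -- every int4 value converts to float8 and back without loss:
  SELECT 2147483647::int4::float8::int4;  -- 2147483647, unchanged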
>> explain analyze select * from firma2.dok where dokumnr='1228137'::float8
>>
>> "Seq Scan on dok  (cost=0.00..187766.23 rows=6255 width=1145) (actual
>> time=43168.460..43176.063 rows=1 loops=1)"
>> "  Filter: ((dokumnr)::double precision = 1228137::double precision)"
>> "Total runtime: 43176.375 ms"
>
> I've just noticed that "actual time" doesn't run from zero; is this part
> of another query, or is something else going on?

It is a standalone query. This is the exact statement executed in a
pgAdmin window and the exact result copied from the pgAdmin result
window. I tried it again and got

"Seq Scan on dok  (cost=0.00..186943.67 rows=5974 width=1173) (actual time=39275.713..39319.865 rows=1 loops=1)"
"  Filter: ((dokumnr)::double precision = 1228137::double precision)"
"Total runtime: 39320.030 ms"

Andrus.
On Wed, Nov 05, 2008 at 11:42:12PM +0200, Andrus wrote:
> Scott Marlowe wrote:
> > You do realize that a float is not an exact number. What you and I
> > see as 1228137 might really be, internally, 1228136.9999999999999999?
>
> My query contains
> '1228137'::float8
> I don't see
> 1228136.9999999999999999
> in this query.

As Scott said, I'd definitely recommend reading about how floating
point numbers are represented and manipulated. The semantics can get a
bit awkward in the corner cases, and having a rough idea of where these
are will help point you in the right direction should strangeness (e.g.
when does 1/3*3 not equal 1) start to occur.

> Those strange casts are auto-generated by the ODBC parameter-passing
> software which I must use, so I must live with it.
>
> The only way to fix this is to replace the parameters manually by
> creating strings, which I'm trying to do for slow queries.

These last two statements seem mutually exclusive; either use the "ODBC
parameter passing software" or don't. By the sounds of it, I'd
recommend not using it.

> I have a feeling that PostgreSQL should determine that the index is of
> integer type and convert the float to an integer itself to speed this up.

If you build a functional index as in an earlier email from me, it
should just work. Whether it's a good idea for it to work this way is
another matter entirely!

> Another issue:
>
> SELECT dokumnr
> FROM DOK
> WHERE dokumnr IN (123)
>   AND ( '0' or
>         dokumnr IN (SELECT dokumnr FROM bilkaib WHERE
>                     alusdok='LG' AND masin LIKE 'a%') )
>
> If
> '0' or
> is removed from the where clause, it runs fast since it finds the
> indexes. It is a real surprise that an expression containing
> '0' or
> prevents PostgreSQL from using indexes.

Constant propagation is something that PG doesn't do amazingly well at;
I think the main reason is that it's assumed the programmer is capable
of removing most constants, and so the optimizer doesn't worry too much
about it. That said, in 8.3.4, if I run:

  EXPLAIN SELECT * FROM source_livestock
  WHERE FALSE OR id IN (SELECT id FROM source_livestock LIMIT 10);

I get:

  Seq Scan on source_livestock  (cost=0.22..848037.12 rows=18949156 width=58)
    Filter: (hashed subplan)
    SubPlan
      ->  Limit  (cost=0.00..0.20 rows=10 width=4)
            ->  Seq Scan on source_livestock  (cost=0.00..753291.12 rows=37898312 width=4)

and if I replace the FALSE with TRUE, giving:

  EXPLAIN SELECT * FROM source_livestock
  WHERE TRUE OR id IN (SELECT id FROM source_livestock LIMIT 10);

it realizes the subselect isn't needed and I just get:

  Seq Scan on source_livestock  (cost=0.00..753291.12 rows=37898312 width=58)

so it seems some propagation is performed, but not all cases are
handled. I don't mind much though; it's a case I know about now and can
deal with.

Sam
> PG 8.3 would even throw it out, unless dokumnr was explicitly cast to
> a float8 as well.

I tried in 8.3:

create temp table dok ( dokumnr serial primary key );
select * from dok where dokumnr='1'::float8

and this ran without error, so I don't understand how 8.3 throws it out.

Andrus.
> Did you read what I wrote? Because you just repeated it as an argument
> against my point.

Let's re-visit the second issue in my reply. I tried in 8.3:

explain SELECT dokumnr
FROM DOK
WHERE dokumnr IN (1227714)
  AND ( '0' or
        dokumnr IN (SELECT dokumnr FROM firma1.bilkaib) )

"Index Scan using dok_pkey on dok  (cost=16.55..24.82 rows=1 width=4)"
"  Index Cond: (dokumnr = 1227714)"
"  Filter: (hashed subplan)"
"  SubPlan"
"    ->  Seq Scan on bilkaib  (cost=0.00..15.44 rows=444 width=4)"

and

explain SELECT dokumnr
FROM DOK
WHERE dokumnr IN (1227714)
  AND ( -- '0' or
        dokumnr IN (SELECT dokumnr FROM firma1.bilkaib) )

"Nested Loop IN Join  (cost=0.00..16.55 rows=1 width=4)"
"  ->  Index Scan using dok_pkey on dok  (cost=0.00..8.27 rows=1 width=4)"
"        Index Cond: (dokumnr = 1227714)"
"  ->  Index Scan using bilkaib_dokumnr_idx on bilkaib  (cost=0.00..8.27 rows=1 width=4)"
"        Index Cond: (bilkaib.dokumnr = 1227714)"

As you see, simply removing the constant expression

'0' or

produces a different query plan which is much faster for large amounts
of data. The results are the same for a large data set and for earlier
PostgreSQL versions.

Do you think that is OK and reasonable?

Andrus.
"Andrus" <kobruleht2@hot.ee> writes: > As you see simply removing constant expression > '0' or > produces different query plan which is much faster for large amoutnts of > data. The IN-pullup code runs before constant-simplification does, so it doesn't see that as a simple join condition. regards, tom lane
On Thu, Nov 06, 2008 at 12:08:57AM +0200, Andrus wrote:
> > PG 8.3 would even throw it out, unless dokumnr was explicitly cast
> > to a float8 as well.
>
> I tried in 8.3:
>
> create temp table dok ( dokumnr serial primary key );
> select * from dok where dokumnr='1'::float8
>
> and this ran without error.

Hmm, it doesn't seem to, does it. Sorry, I was thinking PG was more
strict than it appears to be.

> So I don't understand how 8.3 throws it out.

It won't; I was getting confused!

Sam
> The IN-pullup code runs before constant-simplification does, so it
> doesn't see that as a simple join condition.

That seems like a serious design flaw. How can the expression parser be
changed so that constant parts of expressions are removed before
IN-pullup?

Andrus.
"Andrus" <kobruleht2@hot.ee> writes: >> The IN-pullup code runs before constant-simplification does, so it >> doesn't see that as a simple join condition. > Seems serious design flaw. > How to change expression parser so that contant parts of expressions are > removed before IN-pullup? It would be far simpler to fix your query generator to not emit the useless "0 or". Even if we made the planner deal with that, the number of cycles it would expend to recover from the generator's stupidity would be several orders of magnitude higher than the number of cycles needed to not be so stupid. regards, tom lane
> It would be far simpler to fix your query generator to not emit the
> useless "0 or".

I'm using the ODBC and Npgsql drivers. Those drivers replace parameters
automatically. E.g. for Npgsql or any other ADO.NET provider I can write

SELECT * FROM t WHERE (:param1 OR x IN (SELECT y FROM z)) AND :param2 ... etc.

where param1, param2, ..., paramn are entered by the user and passed to
the driver as parameters.

The proposed solution requires either:

1. Stop using this type of parameter replacement and write case
statements for every possible param combination in application code --
n! case statements in total; or

2. Create an expression parser which understands the syntax of every
backend version and simplifies the expression as a string before passing
it to the backend.

> Even if we made the planner deal with that, the number of cycles it
> would expend to recover from the generator's stupidity would be several
> orders of magnitude higher than the number of cycles needed to not be so
> stupid.

The proposed solution requires re-writing the ODBC and Npgsql drivers,
and possibly every other driver, to add an expression parser with
constant optimization on the client side. This seems to be huge work,
and no DBMS driver implements this. In this case every query is
processed twice; I don't understand how this takes less time than
processing the query once in the backend.

Andrus.