Thread: Huge speed penalty using <>TRUE instead of =FALSE
One of our customers discovered that by replacing <>TRUE with =FALSE in a query of a table containing 750.000 records reduced the query time from about 12 seconds to about 60 milliseconds! The problematic query looks like this: SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE AND Alarm_status='X' ORDER BY ID DESC If it is changed to this it works as expected: SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE AND Alarm_status='X' ORDER BY ID DESC After investigation (on a smaller dataset on my own database) I found that the query was resulting in a sequential scan: "explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE AND Alarm_status='X' ORDER BY ID DESC" "Sort (cost=49936.96..49936.96 rows=1 width=405) (actual time=837.793..837.793 rows=0 loops=1)" " Sort Key: id" " Sort Method: quicksort Memory: 17kB" " -> Seq Scan on alarmlogg (cost=0.00..49936.95 rows=1 width=405) (actual time=837.782..837.782 rows=0 loops=1)" " Filter: ((logg_avsluttet <> true) AND ((alarm_status)::text = 'X'::text))" "Total runtime: 837.896 ms" The modified query gave this result: "explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE AND Alarm_status='X' ORDER BY ID DESC" "Sort (cost=8.36..8.37 rows=1 width=405) (actual time=0.032..0.032 rows=0 loops=1)" " Sort Key: id" " Sort Method: quicksort Memory: 17kB" " -> Index Scan using i_alarmlogg_logg_avsluttet on alarmlogg (cost=0.00..8.35 rows=1 width=405) (actual time=0.024..0.024 rows=0 loops=1)" " Index Cond: (logg_avsluttet = false)" " Filter: ((NOT logg_avsluttet) AND ((alarm_status)::text = 'X'::text))" "Total runtime: 0.123 ms" This is a dramatical difference, but I cannot understand why. In my head "<>TRUE" should behave exactly the same as "=FALSE". This looks like a bug to me, or am I overlooking something? This was verified on PostgreSQL 8.3.7, both on Windows Xp and Ubuntu 8.10. Some relevant details from the table definition: CREATE TABLE alarmlogg ( id serial NOT NULL, alarm_status character varying(1) DEFAULT ''::character varying, logg_avsluttet boolean DEFAULT false, ... CONSTRAINT alarmlogg_pkey PRIMARY KEY (id) ) CREATE INDEX i_alarmlogg_alarm_status ON alarmlogg USING btree (alarm_status); CREATE INDEX i_alarmlogg_logg_avsluttet ON alarmlogg USING btree (logg_avsluttet); Regards, Jan-Ivar Mellingen Securinet AS
On Friday 17 July 2009 11:12:41 Jan-Ivar Mellingen wrote: > One of our customers discovered that by replacing <>TRUE with =FALSE in > a query of a table containing 750.000 records reduced the query time > from about 12 seconds to about 60 milliseconds! > This is a dramatical difference, but I cannot understand why. In my head > "<>TRUE" should behave exactly the same as "=FALSE". This looks like a > bug to me, or am I overlooking something? The planner just isn't that smart. The boolean type is a special case where <> some_value implies = some_other_value, but this doesn't generalize well to other data types. And the planner doesn't have a whole lot of data type specific knowledge. I think a better index definition might actually be on alarm_status, with a partial index predicate on logg_avsluttet = false.
It might be that your column may be NULL as well as TRUE or FALSE. I am no expert in this matter though. /M On Fri, Jul 17, 2009 at 10:12 AM, Jan-Ivar Mellingen<jan-ivar.mellingen@alreg.no> wrote: > One of our customers discovered that by replacing <>TRUE with =3DFALSE in > a query of a table containing 750.000 records reduced the query time > from about 12 seconds to about 60 milliseconds! > > The problematic query looks like this: > SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE AND > Alarm_status=3D'X' ORDER BY ID DESC > > If it is changed to this it works as expected: > SELECT * FROM AlarmLogg WHERE Logg_avsluttet =3D FALSE AND > Alarm_status=3D'X' ORDER BY ID DESC > > After investigation (on a smaller dataset on my own database) I found > that the query was resulting in a sequential scan: > > "explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE > AND Alarm_status=3D'X' ORDER BY ID DESC" > "Sort =A0(cost=3D49936.96..49936.96 rows=3D1 width=3D405) (actual > time=3D837.793..837.793 rows=3D0 loops=3D1)" > " =A0Sort Key: id" > " =A0Sort Method: =A0quicksort =A0Memory: 17kB" > " =A0-> =A0Seq Scan on alarmlogg =A0(cost=3D0.00..49936.95 rows=3D1 width= =3D405) > (actual time=3D837.782..837.782 rows=3D0 loops=3D1)" > " =A0 =A0 =A0 =A0Filter: ((logg_avsluttet <> true) AND ((alarm_status)::t= ext =3D > 'X'::text))" > "Total runtime: 837.896 ms" > > The modified query gave this result: > "explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet =3D FALSE > AND Alarm_status=3D'X' ORDER BY ID DESC" > "Sort =A0(cost=3D8.36..8.37 rows=3D1 width=3D405) (actual time=3D0.032..0= .032 > rows=3D0 loops=3D1)" > " =A0Sort Key: id" > " =A0Sort Method: =A0quicksort =A0Memory: 17kB" > " =A0-> =A0Index Scan using i_alarmlogg_logg_avsluttet on alarmlogg > (cost=3D0.00..8.35 rows=3D1 width=3D405) (actual time=3D0.024..0.024 rows= =3D0 > loops=3D1)" > " =A0 =A0 =A0 =A0Index Cond: (logg_avsluttet =3D false)" > " =A0 =A0 =A0 =A0Filter: ((NOT logg_avsluttet) AND ((alarm_status)::text = =3D > 'X'::text))" > "Total runtime: 0.123 ms" > > This is a dramatical difference, but I cannot understand why. In my head > "<>TRUE" should behave exactly the same as "=3DFALSE". This looks like a > bug to me, or am I overlooking something? > > This was verified on PostgreSQL 8.3.7, both on Windows Xp and Ubuntu 8.10. > > Some relevant details from the table definition: > CREATE TABLE alarmlogg > ( > =A0 id serial NOT NULL, > =A0 alarm_status character varying(1) DEFAULT ''::character varying, > =A0 logg_avsluttet boolean DEFAULT false, > =A0 ... > =A0 CONSTRAINT alarmlogg_pkey PRIMARY KEY (id) > ) > > CREATE INDEX i_alarmlogg_alarm_status > =A0ON alarmlogg > =A0USING btree > =A0(alarm_status); > > CREATE INDEX i_alarmlogg_logg_avsluttet > =A0ON alarmlogg > =A0USING btree > =A0(logg_avsluttet); > > Regards, > Jan-Ivar Mellingen > Securinet AS > > > -- > Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-bugs >
On Friday 17 July 2009 12:45:47 Mikael Krantz wrote: > It might be that your column may be NULL as well as TRUE or FALSE. I > am no expert in this matter though. Nulls also need to be considered when attempting to substitute purportedly equivalent clauses. But in this case it wouldn't actually matter, because WHERE foo <> TRUE and WHERE foo = false would both omit the row if foo is null. Both expressions only return true if foo has the value "false". But again, this is data type specific knowledge.
Jan-Ivar Mellingen <jan-ivar.mellingen@alreg.no> writes: > One of our customers discovered that by replacing <>TRUE with =FALSE in > a query of a table containing 750.000 records reduced the query time > from about 12 seconds to about 60 milliseconds! This is not a bug. The set of operators that are indexable is well documented, and <> is not one of them. regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: > ... But again, this is data type specific knowledge. Actually, now that I think about it, the planner already has datatype-specific knowledge about boolean equality (see simplify_boolean_equality). It would take just a few more lines of code there to recognize "x <> true" and "x <> false" as additional variant spellings of the generic "x" or "NOT x" constructs. Not sure if it's worth the trouble though; how many people really write such things? If you really wanted to take it to extremes, you could also reduce cases like "x > false", but that's starting to get a bit silly. regards, tom lane
I don't think it even has to be so specific. We should just always rewrite bool <> bool into bool = NOT bool. Hmm. That only has a 50/50 chance of creating an indexable clause. Perhaps we could even rewrite it as "a = NOT b AND NOT a = b". -- Greg On 2009-07-17, at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> ... But again, this is data type specific knowledge. > > Actually, now that I think about it, the planner already has > datatype-specific knowledge about boolean equality (see > simplify_boolean_equality). It would take just a few more lines of > code > there to recognize "x <> true" and "x <> false" as additional variant > spellings of the generic "x" or "NOT x" constructs. Not sure if it's > worth the trouble though; how many people really write such things? > > If you really wanted to take it to extremes, you could also reduce > cases like "x > false", but that's starting to get a bit silly. > > regards, tom lane > > -- > Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-bugs
Jan-Ivar Mellingen skrev: > One of our customers discovered that by replacing <>TRUE with =FALSE in > a query of a table containing 750.000 records reduced the query time > from about 12 seconds to about 60 milliseconds! > > The problematic query looks like this: > SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE AND > Alarm_status='X' ORDER BY ID DESC > > If it is changed to this it works as expected: > SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE AND > Alarm_status='X' ORDER BY ID DESC > > After investigation (on a smaller dataset on my own database) I found > that the query was resulting in a sequential scan: > > "explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE > AND Alarm_status='X' ORDER BY ID DESC" > "Sort (cost=49936.96..49936.96 rows=1 width=405) (actual > time=837.793..837.793 rows=0 loops=1)" > " Sort Key: id" > " Sort Method: quicksort Memory: 17kB" > " -> Seq Scan on alarmlogg (cost=0.00..49936.95 rows=1 width=405) > (actual time=837.782..837.782 rows=0 loops=1)" > " Filter: ((logg_avsluttet <> true) AND ((alarm_status)::text = > 'X'::text))" > "Total runtime: 837.896 ms" > > The modified query gave this result: > "explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE > AND Alarm_status='X' ORDER BY ID DESC" > "Sort (cost=8.36..8.37 rows=1 width=405) (actual time=0.032..0.032 > rows=0 loops=1)" > " Sort Key: id" > " Sort Method: quicksort Memory: 17kB" > " -> Index Scan using i_alarmlogg_logg_avsluttet on alarmlogg > (cost=0.00..8.35 rows=1 width=405) (actual time=0.024..0.024 rows=0 > loops=1)" > " Index Cond: (logg_avsluttet = false)" > " Filter: ((NOT logg_avsluttet) AND ((alarm_status)::text = > 'X'::text))" > "Total runtime: 0.123 ms" > > This is a dramatical difference, but I cannot understand why. In my head > "<>TRUE" should behave exactly the same as "=FALSE". This looks like a > bug to me, or am I overlooking something? > > This was verified on PostgreSQL 8.3.7, both on Windows Xp and Ubuntu 8.10. > > Some relevant details from the table definition: > CREATE TABLE alarmlogg > ( > id serial NOT NULL, > alarm_status character varying(1) DEFAULT ''::character varying, > logg_avsluttet boolean DEFAULT false, > ... > CONSTRAINT alarmlogg_pkey PRIMARY KEY (id) > ) > > CREATE INDEX i_alarmlogg_alarm_status > ON alarmlogg > USING btree > (alarm_status); > > CREATE INDEX i_alarmlogg_logg_avsluttet > ON alarmlogg > USING btree > (logg_avsluttet); > > Regards, > Jan-Ivar Mellingen > Securinet AS > > > Thanks to all who answered my question and helped me discover that I forgot to consider the NULL values. After a cup of coffee and a little bit of thinking it became clear that <>TRUE is not the same as FALSE, and the NULLS are not in the index. PostgreSQL is a great database, but it does not hurt to think a little when using it... Thank You all! /Jan-Ivar
On Fri, Jul 17, 2009 at 10:21 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> ... But again, this is data type specific knowledge. > > Actually, now that I think about it, the planner already has > datatype-specific knowledge about boolean equality (see > simplify_boolean_equality). =A0It would take just a few more lines of code > there to recognize "x <> true" and "x <> false" as additional variant > spellings of the generic "x" or "NOT x" constructs. =A0Not sure if it's > worth the trouble though; how many people really write such things? I don't know, but there's probably somebody. I probably did it myself a few times, when I was just starting out. If it's easy, it seems worth doing. The problem with these things is that no matter how lame it seems to do whatever-it-is, the pain when someone does is really large... so adding a little bit of code to avoid that seems worthwhile, at least to me. > If you really wanted to take it to extremes, you could also reduce > cases like "x > false", but that's starting to get a bit silly. Probably that one is beyond even my tolerance. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jul 17, 2009 at 10:21 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Actually, now that I think about it, the planner already has >> datatype-specific knowledge about boolean equality (see >> simplify_boolean_equality). It would take just a few more lines of code >> there to recognize "x <> true" and "x <> false" as additional variant >> spellings of the generic "x" or "NOT x" constructs. Not sure if it's >> worth the trouble though; how many people really write such things? > I don't know, but there's probably somebody. I probably did it myself > a few times, when I was just starting out. If it's easy, it seems > worth doing. http://archives.postgresql.org/pgsql-committers/2009-07/msg00164.php regards, tom lane
On Mon, Aug 10, 2009 at 11:10 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Jul 17, 2009 at 10:21 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> Actually, now that I think about it, the planner already has >>> datatype-specific knowledge about boolean equality (see >>> simplify_boolean_equality). =A0It would take just a few more lines of c= ode >>> there to recognize "x <> true" and "x <> false" as additional variant >>> spellings of the generic "x" or "NOT x" constructs. =A0Not sure if it's >>> worth the trouble though; how many people really write such things? > >> I don't know, but there's probably somebody. =A0I probably did it myself >> a few times, when I was just starting out. =A0If it's easy, it seems >> worth doing. > > http://archives.postgresql.org/pgsql-committers/2009-07/msg00164.php > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0regards, tom lane Oh, cool. Sorry, I missed the fact that that email was almost a month old. ...Robert