NOT LIKE much faster than LIKE? - Mailing list pgsql-performance

From Andrea Arcangeli
Subject NOT LIKE much faster than LIKE?
Date
Msg-id 20060110014447.GB18474@opteron.random
Whole thread Raw
Responses Re: NOT LIKE much faster than LIKE?
Re: NOT LIKE much faster than LIKE?
List pgsql-performance
Hello,

I've a performance problem with the planner algorithm choosen in a website.
See the difference between this:

    http://klive.cpushare.com/?scheduler=cooperative

and this:

    http://klive.cpushare.com/?scheduler=preemptive

(note, there's much less data to show with preemptive, so it's not because of
the amount of data to output)

The second takes ages to complete and it overloads the server as well at 100%
cpu load for several seconds.

"cooperative" runs "WHERE kernel_version NOT LIKE '%% PREEMPT %%'", while
"preempt" runs "WHERE kernel_version LIKE '%% PREEMPT %%'. The only difference
is a NOT before "LIKE". No other differences at all.

The planner does apparently a big mistake using the nested loop in the "LIKE"
query, it should use the hash join lik in the "NOT LIKE" query instead.

I guess I can force it somehow (I found some docs about it here:

    http://www.postgresql.org/docs/8.1/static/runtime-config-query.html

) but it looks like something could be improved in the default mode too, so I'm
reporting the problem since it looks a performance bug to me. It just makes no
sense to me that the planner takes a difference decision based on a "not". It
can't know if it's more likely or less likely, this is a boolean return, it's
*exactly* the same cost to run it. Making assumption without user-provided
hints looks a mistake. I never said to the db that "not like" is more or less
likely to return data in output than "like".

Tried ANALYZE, including VACUUM FULL ANALYZE and it doesn't make a difference.

Perhaps it's analyze that suggests to use a different algorithm with "not like"
because there's much more data to analyze with "not like" than with "like", but
that algorithm works much better even when there's less data to analyze.

Indexes don't make any visible difference.

postgres is version 8.1.2 self compiled from CVS 8.1 branch of yesterday.

psandrea@opteron:~> psql -V
psql (PostgreSQL) 8.1.2
contains support for command-line editing
andrea@opteron:~>

The problem is reproducible on the shell, I only need to remove "explain". Of
course explain is wrong about the cost, according to explain the first query is
cheaper when infact it's an order of magnitude more costly.

klive=> explain SELECT class, vendor, device, revision, COUNT(*) as nr FROM pci NATURAL INNER JOIN klive WHERE
kernel_versionLIKE '%% PREEMPT %%' GROUP BY class, vendor, device, revision; 
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 HashAggregate  (cost=1687.82..1687.83 rows=1 width=16)
   ->  Nested Loop  (cost=235.86..1687.81 rows=1 width=16)
         ->  Seq Scan on klive  (cost=0.00..1405.30 rows=1 width=8)
               Filter: ((kernel_version)::text ~~ '%% PREEMPT %%'::text)
         ->  Bitmap Heap Scan on pci  (cost=235.86..282.32 rows=15 width=24)
               Recheck Cond: (pci.klive = "outer".klive)
               ->  Bitmap Index Scan on pci_pkey  (cost=0.00..235.86 rows=15 width=0)
                     Index Cond: (pci.klive = "outer".klive)
(8 rows)

klive=> explain SELECT class, vendor, device, revision, COUNT(*) as nr FROM pci NATURAL INNER JOIN klive WHERE
kernel_versionNOT LIKE '%% PREEMPT %%' GROUP BY class, vendor, device, revision; 
                                   QUERY PLAN
--------------------------------------------------------------------------------
 HashAggregate  (cost=3577.40..3612.00 rows=2768 width=16)
   ->  Hash Join  (cost=1569.96..3231.50 rows=27672 width=16)
         Hash Cond: ("outer".klive = "inner".klive)
         ->  Seq Scan on pci  (cost=0.00..480.73 rows=27673 width=24)
         ->  Hash  (cost=1405.30..1405.30 rows=22263 width=8)
               ->  Seq Scan on klive  (cost=0.00..1405.30 rows=22263 width=8)
                     Filter: ((kernel_version)::text !~~ '%% PREEMPT %%'::text)
(7 rows)

klive=>

Hints welcome, thanks!


PS. All the source code of the website where I'm reproducing the problem is
available at the above url under the GPL.

pgsql-performance by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [PERFORMANCE] Beetwen text and varchar field
Next
From: Tom Lane
Date:
Subject: Re: NOT LIKE much faster than LIKE?