Re: NOT LIKE much faster than LIKE?

From: Andrea Arcangeli
Subject: Re: NOT LIKE much faster than LIKE?
Date: ,
Msg-id: 20060110034721.GC20168@opteron.random
(view: Whole thread, Raw)
In response to: Re: NOT LIKE much faster than LIKE?  (Tom Lane)
Responses: Re: NOT LIKE much faster than LIKE?  (Greg Stark)
List: pgsql-performance

Tree view

NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
 Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
  Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  (Christopher Kings-Lynne, )
    Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
    Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
     Re: NOT LIKE much faster than LIKE?  (Greg Stark, )
    Re: NOT LIKE much faster than LIKE?  (Matteo Beccati, )
     Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
      Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
       Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
        Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
         Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
          Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
           Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
        Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
   Re: NOT LIKE much faster than LIKE?  (Stephan Szabo, )
 Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
  Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
   Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
 Re: NOT LIKE much faster than LIKE?  ("Jim C. Nasby", )
  Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  ("Jim C. Nasby", )
    Re: NOT LIKE much faster than LIKE?  (Tom Lane, )

On Mon, Jan 09, 2006 at 09:54:44PM -0500, Tom Lane wrote:
> Extrapolating from the observation that the heuristics don't work well
> on your data to the conclusion that they don't work for anybody is not
> good logic.  Replacing that code with a flat 50% is not going to happen
> (or if it does, I'll be sure to send the mob of unhappy users waving
> torches and pitchforks to your door not mine ;-)).

I'm not convinced but of course I cannot exclude that some people may be
depending on this very heuristic. But I consider this being
bug-compatible, I've an hard time to be convinced that such heuristic
isn't going to bite other people like it did with me.

> I did just think of something we could improve though.  The pattern
> selectivity code doesn't make any use of the statistics about "most
> common values".  For a constant pattern, we could actually apply the
> pattern test with each common value and derive answers that are exact
> for the portion of the population represented by the most-common-values
> list.  If the MCV list covers a large fraction of the population then
> this would be a big leg up in accuracy.  Dunno if that applies to your
> particular case or not, but it seems worth doing ...

Fixing this with proper stats would be great indeed. What would be the
most common value for the kernel_version? You can see samples of the
kernel_version here http://klive.cpushare.com/2.6.15/ .  That's the
string that is being searched against both PREEMPT and SMP.

BTW, I also run a LIKE '%% SMP %%' a NOT LIKE '%% SMP %%' but that runs
fine, probably because as you said in the first email PREEMPT is long
but SMP is short.

Thanks!


pgsql-performance by date:

From: Michael Fuhr
Date:
Subject: Re: Index isn't used during a join.
From: Alessandro Baretta
Date:
Subject: Re: 500x speed-down: Wrong statistics!