On Tue, 2006-01-10 at 17:21 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I think it's OK to use the MCV, but I have a problem with the current
> > heuristics: they only work for randomly generated strings, since the
> > selectivity goes down geometrically with length.
>
> We could certainly use a less aggressive curve for that. You got a
> specific proposal?
I read some research not too long ago that showed a frequency curve of
words by syllable length. I'll dig that out tomorrow.
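To make the problem concrete: with a fixed per-character selectivity of
0.2 (FIXED_CHAR_SEL in selfuncs.c, if I recall the constant correctly),
a plain 10-character string is estimated at 0.2^10, about 1e-7, which is
far below the actual frequency of common long words. A rough sketch of
the kind of gentler curve I mean; the damping formula here is purely
illustrative, not something derived from the research above:

    #include <math.h>

    #define FIXED_CHAR_SEL 0.20    /* per-char selectivity, as in selfuncs.c */

    /* Current behaviour: selectivity decays geometrically with length */
    static double
    sel_geometric(int len)
    {
        return pow(FIXED_CHAR_SEL, (double) len);
    }

    /*
     * A possible gentler curve (illustrative only): charge full
     * weight for the first few characters, then damp the exponent
     * logarithmically, so long-but-common words aren't estimated
     * as vanishingly rare.
     */
    static double
    sel_damped(int len)
    {
        double  exponent;

        if (len <= 4)
            exponent = (double) len;
        else
            exponent = 4.0 + log((double) len / 4.0);

        return pow(FIXED_CHAR_SEL, exponent);
    }

With that shape a 10-character string comes out around 4e-4 rather than
1e-7; the right break point and constants would have to come out of the
word-length frequency data.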
> > I would favour the idea of dynamic sampling using a block sampling
> > approach; that was a natural extension of improving ANALYZE also.
>
> One thing at a time please. Obtaining better statistics is one issue,
> but the one at hand here is what to do given particular statistics.
I meant using the same sampling approach as I was proposing for ANALYZE,
but doing it at plan time for the query. That way we can apply the
function directly to the sampled rows and estimate selectivity from the
fraction that match.
I specifically didn't mention that in the Ndistinct discussion because I
didn't want to confuse that subject further, but the underlying block
sampling method would be identical, so the code is already almost there;
we just need to evaluate the RestrictInfo against the sampled tuples.
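In outline, something like this; sample_block_rows() and qual_matches()
are placeholders for the shared ANALYZE-style block sampler and for
evaluating the clause against one sampled tuple (the real thing would go
through the normal executor machinery, not these stubs):

    /*
     * Sketch only: plan-time selectivity estimate from a block sample.
     * Neither helper below is a real API; they stand in for the shared
     * block sampler and for RestrictInfo evaluation.
     */
    static double
    sampled_selectivity(Relation rel, Node *clause, int targrows)
    {
        HeapTuple  *rows;
        int         nrows;
        int         nmatch = 0;
        int         i;

        /* take the same kind of block sample ANALYZE would */
        nrows = sample_block_rows(rel, targrows, &rows);

        for (i = 0; i < nrows; i++)
        {
            /* evaluate the RestrictInfo clause against one tuple */
            if (qual_matches(clause, rows[i]))
                nmatch++;
        }

        /* Laplace-style correction so a zero count doesn't return 0.0 */
        return (nmatch + 1.0) / (nrows + 2.0);
    }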
Best Regards, Simon Riggs