Re: Mnogosearch (Was: Re: website doc search is ... ) - Mailing list pgsql-general

From Tom Lane
Subject Re: Mnogosearch (Was: Re: website doc search is ... )
Date
Msg-id 29978.1072987956@sss.pgh.pa.us
Whole thread Raw
In response to Re: Mnogosearch (Was: Re: website doc search is ... )  ("Marc G. Fournier" <scrappy@postgresql.org>)
Responses Re: Mnogosearch (Was: Re: website doc search is ... )
List pgsql-general
"Marc G. Fournier" <scrappy@postgresql.org> writes:
> On Thu, 1 Jan 2004, Tom Lane wrote:
>> Is there a reason for the
>> concatenation part of the expression?

> Believe it or not, the concatenation was based on a discussion *way* back
> (2 years, maybe?) when we first started using Mnogosearch, in which you
> suggested going that route ... in fact, at the time (bear in mind, this is
> back in 7.2 days), it actually sped things up ...

Hmm, I vaguely remember that ... I think we were deliberately trying to
fool the planner at that time, because it was making some stupid
assumption about the selectivity of the LIKE clause.  It looks like that
problem is now mostly fixed, since your second example shows estimate of
236133 vs reality of 304811 rows for the URL condition:

>          ->  Index Scan using url_url on url  (cost=0.00..10768.79 rows=236133 width=4) (actual
time=225.243..8353.024rows=304811 loops=1) 
>                Index Cond: ((url >= 'http://archives.postgresql.org/'::text) AND (url <
'http://archives.postgresql.org0'::text))
>                Filter: (url ~~ 'http://archives.postgresql.org/%%'::text)
>  Total runtime: 16796.932 ms
> (12 rows)

> Closer to what you were looking/hoping for?

This probably says that we can stop using the concatenation hack, at
least.  I'd still suggest clustering the two tables as per my later
message.  (Note that clustering would help this mergejoin plan too,
so it could come out to be a win relative to the nestloop indexscan,
but we ought to try both and see.)

> what does that setting affect, *just* the time it takes to
> analyze the table?

Well, it will also bloat pg_statistic and slow down planning a little.
Can you try 100 and see if that gives reasonable estimates?  1000 is a
rather extreme setting I think; I'd go for 100 to start with.

> is this something that can be set database wide,

Yeah, see default_statistics_target in postgresql.conf.

            regards, tom lane

pgsql-general by date:

Previous
From: "Marc G. Fournier"
Date:
Subject: Re: Mnogosearch (Was: Re: website doc search is ... )
Next
From: Tom Lane
Date:
Subject: Re: Mnogosearch (Was: Re: website doc search is ... )