Re: Forcing the right queryplan - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Forcing the right queryplan
Date
Msg-id 8DCA2553-B84E-424B-8806-DA7A2FE6D8C0@solfertje.student.utwente.nl
Whole thread Raw
In response to Forcing the right queryplan  (Henk van Lingen <H.G.K.vanLingen@uu.nl>)
Responses Re: Forcing the right queryplan  (Henk van Lingen <H.G.K.vanLingen@uu.nl>)
List pgsql-general
Sorry for not replying earlier, I've been quite busy.

On 31 Aug 2010, at 16:50, Henk van Lingen wrote:

> syslog=# \d systemevents
>                                         Table "public.systemevents"
>       Column       |            Type             |                         Modi
> fiers
> --------------------+-----------------------------+-----------------------------
> ------------------------------
> id                 | integer                     | not null default nextval('sy

(...)

> message            | text                        |

(...)

> Indexes:
>    "systemevents_pkey" PRIMARY KEY, btree (id)
>    "fromhost_idx" btree (fromhost)
>    "msgs_idx" gin (to_tsvector('english'::regconfig, message))
>
> The GIN index is to do text searching (via LogAnalyzer).
>
> Now there are two types of query plans:

Do you have output of explain analyse for these queries as well? It's hard to see what is actually going on with just
theexplain - we can't see which part of the query is more expensive than the planner expected, for starters. 

> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM
systemeventsWHERE (  ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )  ORDER BY id DESC LIMIT 100;
                                                 QUERY PLAN                    
>
> --------------------------------------------------------------------------------
> ---------------------------------
> Limit  (cost=0.00..10177.22 rows=100 width=159)
>   ->  Index Scan Backward using systemevents_pkey on systemevents  (cost=0.00..
> 1052934.86 rows=10346 width=159)
>         Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.
> 211.112.9'::text))
> (3 rows)
>
> This one is useless (takes very long). However this one:
>
> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM
systemeventsWHERE (  ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )  ORDER BY id DESC LIMIT 500; 
>                                                    QUERY PLAN
>
> --------------------------------------------------------------------------------
> -----------------------------------
> Limit  (cost=40928.89..40930.14 rows=500 width=159)
>   ->  Sort  (cost=40928.89..40954.76 rows=10346 width=159)
>         Sort Key: id
>         ->  Bitmap Heap Scan on systemevents  (cost=2898.06..40413.36 rows=1034
> 6 width=159)
>               Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_t
> squery('131.211.112.9'::text))
>               ->  Bitmap Index Scan on msgs_idx  (cost=0.00..2895.47 rows=10346
> width=0)
>                     Index Cond: (to_tsvector('english'::regconfig, message) @@
> to_tsquery('131.211.112.9'::text))
> (7 rows)
>
> works acceptable.

Odd that more records and a more complicated plan gives faster results... That's why I think we'd really want to see
explainanalyse output. 
I'm guessing that there are a lot of records matching your search string and that you've found the cut-off point where
theplanner thinks you're throwing away enough rows that it's not very useful to first select all the matching records
beforesorting the results. 

I think it decided to just start searching backwards along the id and returning the rows that match that IP (and are
visibleto your transaction) would be faster than trying to work with all the rows that match that IP. 

This probably means it misjudged the costs of sorting your index backwards, which indicates that your planning
statisticsare off, or that your cost parameters aren't appropriate for your system. 



One thing I do notice is that the first plan uses the index on id instead of the ts_vector one. For queries like those
youcould try to use a combined index like this: 

CREATE INDEX msgs_idx (to_tsvector('english'::regconfig, message), id) ON systemevents USING (gin);

Whether to use gist or gin there I really don't know. I'd probably go for gist, I seem to recall that gin is fairly
heavyto use. 

> How to use the right plan regardless of the 'LIMIT-size'?


You could try turning off planner-options, but that's probably a fairly bad idea.

Other options are to use a prepared statement or a stored procedure with the IP as a parameter, which force the planner
touse a more general plan because it doesn't know which values you're going to search for before it plans the query. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4c8675d010409863511634!



pgsql-general by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Configure default for sorting of null-values?
Next
From: Scott Marlowe
Date:
Subject: Re: Forcing the right queryplan