Thread: Forcing the right query plan
Hi,

I have the problem that my database is not using the 'right' query plan in all
cases. Is there a way I can force that, and/or how should I tune the table
statistics?

I'm running an rsyslog database in PostgreSQL with millions of records
(firewall logging). The database schema is the so-called 'MonitorWare' schema,
to which I added two extra indexes.

syslog=# select version();
                                              version
------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.4.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit

syslog=# \d systemevents
                                    Table "public.systemevents"
       Column       |            Type             |                         Modifiers
--------------------+-----------------------------+------------------------------------------------------------
 id                 | integer                     | not null default nextval('systemevents_id_seq'::regclass)
 customerid         | bigint                      |
 receivedat         | timestamp without time zone |
 devicereportedtime | timestamp without time zone |
 facility           | smallint                    |
 priority           | smallint                    |
 fromhost           | character varying(60)       |
 message            | text                        |
 ntseverity         | integer                     |
 importance         | integer                     |
 eventsource        | character varying(60)       |
 eventuser          | character varying(60)       |
 eventcategory      | integer                     |
 eventid            | integer                     |
 eventbinarydata    | text                        |
 maxavailable       | integer                     |
 currusage          | integer                     |
 minusage           | integer                     |
 maxusage           | integer                     |
 infounitid         | integer                     |
 syslogtag          | character varying(60)       |
 eventlogtype       | character varying(60)       |
 genericfilename    | character varying(60)       |
 systemid           | integer                     |
Indexes:
    "systemevents_pkey" PRIMARY KEY, btree (id)
    "fromhost_idx" btree (fromhost)
    "msgs_idx" gin (to_tsvector('english'::regconfig, message))

The GIN index is there for text searching (via LogAnalyzer).

Now there are two types of query plans:

syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message
         FROM systemevents
         WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )
         ORDER BY id DESC LIMIT 100;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..10177.22 rows=100 width=159)
   ->  Index Scan Backward using systemevents_pkey on systemevents  (cost=0.00..1052934.86 rows=10346 width=159)
         Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))
(3 rows)

This one is useless (it takes very long). However, this one:

syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message
         FROM systemevents
         WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )
         ORDER BY id DESC LIMIT 500;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=40928.89..40930.14 rows=500 width=159)
   ->  Sort  (cost=40928.89..40954.76 rows=10346 width=159)
         Sort Key: id
         ->  Bitmap Heap Scan on systemevents  (cost=2898.06..40413.36 rows=10346 width=159)
               Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))
               ->  Bitmap Index Scan on msgs_idx  (cost=0.00..2895.47 rows=10346 width=0)
                     Index Cond: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))
(7 rows)

works acceptably.
Stats:

syslog=# SELECT relname, relkind, reltuples, relpages
         FROM pg_class WHERE relname LIKE 'systemevents%';
            relname            | relkind |  reltuples  | relpages
-------------------------------+---------+-------------+----------
 systemevents_pkey             | i       | 2.06915e+06 |    71985
 systemeventsproperties        | r       |           0 |        0
 systemeventsproperties_pkey   | i       |           0 |        1
 systemevents_id_seq           | S       |           1 |        1
 systemeventsproperties_id_seq | S       |           1 |        1
 systemevents                  | r       | 2.06915e+06 |   694826
(6 rows)

syslog=# SELECT relname, relkind, reltuples, relpages
         FROM pg_class WHERE relname LIKE 'msg%';
 relname  | relkind |  reltuples  | relpages
----------+---------+-------------+----------
 msgs_idx | i       | 2.06915e+06 |   128069
(1 row)

How can I get the right plan regardless of the LIMIT size?

Cheers,
--
Henk van Lingen, ICT-SC Netwerk & Telefonie,      (o-    -+
Universiteit Utrecht, Jenalaan 18a, room 0.12     /\     |
phone: +31-30-2538453                             v_/_   |
http://henk.vanlingen.net/   http://www.tuxtown.net/netiquette/
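As a quick diagnostic (not a fix), one way to see whether the bitmap plan would also win at LIMIT 100 is to disable the plan type the planner is currently picking, for one session only. This is only a sketch of that experiment, not something to leave enabled globally:

BEGIN;
-- Discourage plain index scans, so the backward scan of systemevents_pkey
-- is no longer considered the cheapest way to satisfy ORDER BY id DESC.
SET LOCAL enable_indexscan = off;
EXPLAIN SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message
FROM systemevents
WHERE to_tsvector('english', message) @@ to_tsquery('131.211.112.9')
ORDER BY id DESC LIMIT 100;
COMMIT;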
No ideas on this one?

Regards,

On Tue, Aug 31, 2010 at 04:50:09PM +0200, Henk van Lingen wrote:
> I have the problem that my database is not using the 'right' query plan in
> all cases. Is there a way I can force that, and/or how should I tune the
> table statistics?
> [...]
> How can I get the right plan regardless of the LIMIT size?

--
Henk van Lingen, ICT-SC Netwerk & Telefonie,      (o-    -+
Universiteit Utrecht, Jenalaan 18a, room 0.12     /\     |
phone: +31-30-2538453                             v_/_   |
http://henk.vanlingen.net/   http://www.tuxtown.net/netiquette/
Henk van Lingen wrote:
> Now there are two types of query plans:
>
> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message
>          FROM systemevents
>          WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )
>          ORDER BY id DESC LIMIT 100;
>  Limit  (cost=0.00..10177.22 rows=100 width=159)
>    ->  Index Scan Backward using systemevents_pkey on systemevents  (cost=0.00..1052934.86 rows=10346 width=159)
>          Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))
> (3 rows)
>
> This one is useless (it takes very long). However, this one:

Hello Henk,

I saw your other mail today; I'm replying to this one for better formatting.

With a LIMIT of 100 the planner guesses it will find 100 matching rows within
some cost. At 500 rows that cost is higher than the cost of the second plan:

> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message
>          FROM systemevents
>          WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )
>          ORDER BY id DESC LIMIT 500;
>  Limit  (cost=40928.89..40930.14 rows=500 width=159)
>    ->  Sort  (cost=40928.89..40954.76 rows=10346 width=159)
>          Sort Key: id
>          ->  Bitmap Heap Scan on systemevents  (cost=2898.06..40413.36 rows=10346 width=159)
>                Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))
>                ->  Bitmap Index Scan on msgs_idx  (cost=0.00..2895.47 rows=10346 width=0)
>                      Index Cond: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))
> (7 rows)
>
> works acceptably.
>
> How can I get the right plan regardless of the LIMIT size?

The planner apparently thinks that, scanning systemevents backwards with the
filter on the string 131.211.112.9, it will have collected 100 matching rows
much earlier than it actually does. Increasing the statistics target will
probably not help here, since the statistics cannot contain selectivity
estimates for every possible ts query.

If the index is useless anyway, you might consider dropping it. Otherwise,
increasing random_page_cost might help in choosing the other plan, but on
the other hand that plan does index scanning too, so I'm not too sure about
that.

If that doesn't help, it would be interesting to see some output of vmstat 1
(or better: iostat -xk 1) to see what the bottleneck is during execution of
the first plan. If it is IO bound, you might want to add RAM or spindles for
better random IO performance. If it is CPU bound, it is probably because of
executing the to_tsvector function. In that case it might be interesting to
see whether raising to_tsvector's cost (see ALTER FUNCTION ... COST ...,
http://developer.postgresql.org/pgdocs/postgres/sql-alterfunction.html)
helps the planner to favor the second plan over the first.

regards,
Yeb Havinga
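For example, the two knobs mentioned above could be tried roughly like this; the values are placeholders to experiment with, not recommendations:

-- Make random I/O look more expensive for this session only
-- (the 8.4 default is 4.0):
SET random_page_cost = 6.0;

-- Raise the planner's cost estimate for the two-argument to_tsvector(),
-- so a plan that evaluates it as a per-row filter is penalised more:
ALTER FUNCTION to_tsvector(regconfig, text) COST 500;

Note that ALTER FUNCTION ... COST is a persistent, catalog-level change, unlike the session-level SET.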
On Fri, Sep 03, 2010 at 09:20:39AM +0200, Yeb Havinga wrote:
>
> If the index is useless anyway, you might consider dropping it. Otherwise,
> increasing random_page_cost might help in choosing the other plan, but on
> the other hand that plan does index scanning too, so I'm not too sure about
> that.
>
> If that doesn't help, it would be interesting to see some output of vmstat 1
> (or better: iostat -xk 1) to see what the bottleneck is during execution of
> the first plan. If it is IO bound, you might want to add RAM or spindles for
> better random IO performance. If it is CPU bound, it is probably because of
> executing the to_tsvector function. In that case it might be interesting to
> see whether raising to_tsvector's cost (see ALTER FUNCTION

Hi Yeb,

Thanks for your answer. Dropping the (pkey) index is not an option. iostat
suggests the thing is CPU bound (%iowait stays around 11% but CPU rises from
1% to 13%).

However, I'm reluctant to change the to_tsvector cost (besides not knowing how
to find out the current value). The pkey is also used for queries like this
one, which also results in the wrong query plan:

syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message
         FROM systemevents
         WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( 'error')) ) AND id <= 26689837;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Index Scan using systemevents_pkey on systemevents  (cost=0.00..27302.74 rows=2174 width=158)
   Index Cond: (id <= 26689837)
   Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('error'::text))
(3 rows)

So I'm afraid that not being able to force a plan is a showstopper for using
PostgreSQL with full text search for this project.

Regards,
--
Henk van Lingen, ICT-SC Netwerk & Telefonie,      (o-    -+
Universiteit Utrecht, Jenalaan 18a, room 0.12     /\     |
phone: +31-30-2538453                             v_/_   |
http://henk.vanlingen.net/   http://www.tuxtown.net/netiquette/
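For what it's worth, the current cost setting of a function can be read from the system catalog; a small sketch:

-- pg_proc.procost holds the per-call cost estimate (in units of
-- cpu_operator_cost) that ALTER FUNCTION ... COST changes.
SELECT oid::regprocedure AS function, procost
FROM pg_proc
WHERE proname = 'to_tsvector';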
Sorry for not replying earlier, I've been quite busy.

On 31 Aug 2010, at 16:50, Henk van Lingen wrote:

> syslog=# \d systemevents
>                                     Table "public.systemevents"
>        Column       |            Type             |                         Modifiers
> --------------------+-----------------------------+------------------------------------------------------------
>  id                 | integer                     | not null default nextval('systemevents_id_seq'::regclass)
(...)
>  message            | text                        |
(...)
> Indexes:
>     "systemevents_pkey" PRIMARY KEY, btree (id)
>     "fromhost_idx" btree (fromhost)
>     "msgs_idx" gin (to_tsvector('english'::regconfig, message))
>
> The GIN index is there for text searching (via LogAnalyzer).
>
> Now there are two types of query plans:

Do you have output of explain analyse for these queries as well? It's hard to
see what is actually going on with just the explain - we can't see which part
of the query is more expensive than the planner expected, for starters.

> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message
>          FROM systemevents
>          WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )
>          ORDER BY id DESC LIMIT 100;
>  Limit  (cost=0.00..10177.22 rows=100 width=159)
>    ->  Index Scan Backward using systemevents_pkey on systemevents  (cost=0.00..1052934.86 rows=10346 width=159)
>          Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))
> (3 rows)
>
> This one is useless (it takes very long). However, this one:
>
> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message
>          FROM systemevents
>          WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )
>          ORDER BY id DESC LIMIT 500;
>  Limit  (cost=40928.89..40930.14 rows=500 width=159)
>    ->  Sort  (cost=40928.89..40954.76 rows=10346 width=159)
>          Sort Key: id
>          ->  Bitmap Heap Scan on systemevents  (cost=2898.06..40413.36 rows=10346 width=159)
>                Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))
>                ->  Bitmap Index Scan on msgs_idx  (cost=0.00..2895.47 rows=10346 width=0)
>                      Index Cond: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))
> (7 rows)
>
> works acceptably.

Odd that more records and a more complicated plan give faster results...
That's why I think we'd really want to see explain analyse output.

I'm guessing that there are a lot of records matching your search string and
that you've found the cut-off point where the planner thinks you're throwing
away enough rows that it's not very useful to first select all the matching
records before sorting the results.

I think it decided that just searching backwards along the id and returning
the rows that match that IP (and are visible to your transaction) would be
faster than trying to work with all the rows that match that IP.

This probably means it misjudged the cost of scanning your index backwards,
which indicates that your planning statistics are off, or that your cost
parameters aren't appropriate for your system.

One thing I do notice is that the first plan uses the index on id instead of
the ts_vector one.
For queries like those you could try to use a combined index like this:

CREATE INDEX msgs_idx ON systemevents USING gin (to_tsvector('english'::regconfig, message), id);

Whether to use gist or gin there I really don't know. I'd probably go for
gist; I seem to recall that gin is fairly heavy to use.

> How can I get the right plan regardless of the LIMIT size?

You could try turning off planner options, but that's probably a fairly bad
idea.

Other options are to use a prepared statement or a stored procedure with the
IP as a parameter, which forces the planner to use a more general plan because
it doesn't know which values you're going to search for before it plans the
query.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
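A sketch of the prepared-statement idea; the statement name is arbitrary. One caveat on the combined index above: putting the plain integer id column into a GIN index needs the btree_gin contrib module to supply a GIN operator class for integers, so treat that part as an assumption to verify first.

-- The planner has to produce one generic plan for the prepared statement,
-- because the search value is unknown at prepare time (8.4 always plans
-- prepared statements generically).
PREPARE logsearch(text) AS
    SELECT id, devicereportedtime, facility, priority, fromhost,
           syslogtag, infounitid, message
    FROM systemevents
    WHERE to_tsvector('english', message) @@ to_tsquery($1)
    ORDER BY id DESC
    LIMIT 100;

EXECUTE logsearch('131.211.112.9');

-- When it is no longer needed:
DEALLOCATE logsearch;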
On Tue, Sep 7, 2010 at 8:48 AM, Henk van Lingen <H.G.K.vanLingen@uu.nl> wrote:
>
> Thanks for your answer. Dropping the (pkey) index is not an option. iostat
> suggests the thing is CPU bound (%iowait stays around 11% but CPU rises
> from 1% to 13%).

How many cores does that server have? If you've got 8 cores and one of them is
IO bound on this query, it'll show about 12%. That's IO bound. Rely on
iostat -xd instead.
On Tue, Sep 07, 2010 at 07:26:25PM +0200, Alban Hertroys wrote:
>
> Do you have output of explain analyse for these queries as well? It's hard
> to see what is actually going on with just the explain - we can't see which
> part of the query is more expensive than the planner expected, for starters.

Hi Alban,

Here are the explain analyse versions:

syslog=# explain analyze SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message
         FROM systemevents
         WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )
         ORDER BY id DESC LIMIT 100;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1205.09 rows=100 width=158) (actual time=16740.139..2360334.006 rows=100 loops=1)
   ->  Index Scan Backward using systemevents_pkey on systemevents  (cost=0.00..2888974.17 rows=239730 width=158) (actual time=16740.137..2360333.916 rows=100 loops=1)
         Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))
 Total runtime: 2360334.078 ms
(4 rows)

syslog=# explain analyze SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message
         FROM systemevents
         WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )
         ORDER BY id DESC LIMIT 500000;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=727944.30..728543.82 rows=239805 width=158) (actual time=1805.251..1805.388 rows=464 loops=1)
   ->  Sort  (cost=727944.30..728543.82 rows=239805 width=158) (actual time=1805.249..1805.300 rows=464 loops=1)
         Sort Key: id
         Sort Method:  quicksort  Memory: 148kB
         ->  Bitmap Heap Scan on systemevents  (cost=61221.23..668806.93 rows=239805 width=158) (actual time=9.131..1786.406 rows=464 loops=1)
               Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))
               ->  Bitmap Index Scan on msgs_idx  (cost=0.00..61161.28 rows=239805 width=0) (actual time=0.790..0.790 rows=464 loops=1)
                     Index Cond: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))
 Total runtime: 1805.483 ms
(9 rows)

> Odd that more records and a more complicated plan give faster results...
> That's why I think we'd really want to see explain analyse output.
>
> I'm guessing that there are a lot of records matching your search string

As you can see, there are only 464 matches.

> One thing I do notice is that the first plan uses the index on id instead of
> the ts_vector one. For queries like those you could try to use a combined
> index like this:
>
> CREATE INDEX msgs_idx ON systemevents USING gin (to_tsvector('english'::regconfig, message), id);

I will look into this.

Thanks,
--
Henk van Lingen, ICT-SC Netwerk & Telefonie,      (o-    -+
Universiteit Utrecht, Jenalaan 18a, room 0.12     /\     |
phone: +31-30-2538453                             v_/_   |
http://henk.vanlingen.net/   http://www.tuxtown.net/netiquette/
Henk van Lingen <H.G.K.vanLingen@uu.nl> writes:
>          ->  Bitmap Heap Scan on systemevents  (cost=61221.23..668806.93 rows=239805 width=158) (actual time=9.131..1786.406 rows=464 loops=1)
>                Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))

Well, there's your problem: the planner is off by a factor of about 500 on its
estimate of the number of rows matching this query, and that's what's causing
it to pick the wrong plan.  What you need to look into is getting that
estimate to be more in sync with reality.  Probably increasing the stats
target for the message column would help.

			regards, tom lane
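A minimal sketch of that suggestion (1000 is only an example value; the 8.4 default statistics target is 100):

-- Collect a larger statistics sample for the message column,
-- then refresh the statistics so the planner sees it.
ALTER TABLE systemevents ALTER COLUMN message SET STATISTICS 1000;
ANALYZE systemevents;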
On Thu, Sep 09, 2010 at 10:50:52AM -0400, Tom Lane wrote:
> Henk van Lingen <H.G.K.vanLingen@uu.nl> writes:
> >          ->  Bitmap Heap Scan on systemevents  (cost=61221.23..668806.93 rows=239805 width=158) (actual time=9.131..1786.406 rows=464 loops=1)
> >                Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.112.9'::text))
>
> Well, there's your problem: the planner is off by a factor of about 500
> on its estimate of the number of rows matching this query, and that's
> what's causing it to pick the wrong plan.  What you need to look into
> is getting that estimate to be more in sync with reality.  Probably
> increasing the stats target for the message column would help.

But how can I get sane estimates for syslog data? Some search strings will
result in only a few hits, others in thousands of records or more.

Regards,
--
Henk van Lingen, ICT-SC Netwerk & Telefonie,      (o-    -+
Universiteit Utrecht, Jenalaan 18a, room 0.12     /\     |
phone: +31-30-2538453                             v_/_   |
http://henk.vanlingen.net/   http://www.tuxtown.net/netiquette/
Henk van Lingen <H.G.K.vanLingen@uu.nl> writes:
> On Thu, Sep 09, 2010 at 10:50:52AM -0400, Tom Lane wrote:
>> Well, there's your problem: the planner is off by a factor of about 500
>> on its estimate of the number of rows matching this query, and that's
>> what's causing it to pick the wrong plan.  What you need to look into
>> is getting that estimate to be more in sync with reality.  Probably
>> increasing the stats target for the message column would help.

> But how can I get sane estimates for syslog data? Some search strings will
> result in only a few hits, others in thousands of records or more.

That's what ANALYZE is for ...

			regards, tom lane
On Thu, Sep 09, 2010 at 11:16:36AM -0400, Tom Lane wrote:
> Henk van Lingen <H.G.K.vanLingen@uu.nl> writes:
> > But how can I get sane estimates for syslog data? Some search strings will
> > result in only a few hits, others in thousands of records or more.
>
> That's what ANALYZE is for ...

Yes, of course. But I don't see how the most_common_vals & freqs and the
histogram_bounds for a text field with syslog data make any sense when doing
a search for a substring. Increasing the number of entries in those stats
lists won't make any sense either, I presume.

Those stats would have to be based on an analysis of the to_tsvector index to
have any meaning, I think.

Today I will look into the multicolumn index suggestion.

Regards,
--
Henk van Lingen, ICT-SC Netwerk & Telefonie,      (o-    -+
Universiteit Utrecht, Jenalaan 18a, room 0.12     /\     |
phone: +31-30-2538453                             v_/_   |
http://henk.vanlingen.net/   http://www.tuxtown.net/netiquette/
Maybe you can extract things like IP addresses and words like 'error' and put
them in separate columns in the table. Full text search is not a solution for
data that is in the wrong format.

On Fri, Sep 10, 2010 at 10:27 AM, Henk van Lingen <H.G.K.vanLingen@uu.nl> wrote:
> Yes, of course. But I don't see how the most_common_vals & freqs and the
> histogram_bounds for a text field with syslog data make any sense when doing
> a search for a substring. Increasing the number of entries in those stats
> lists won't make any sense either, I presume.
>
> Those stats would have to be based on an analysis of the to_tsvector index
> to have any meaning, I think.
>
> Today I will look into the multicolumn index suggestion.
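A sketch of that idea, assuming a new srcip column and that the interesting address is the first dotted quad in each message; the column name, the regex, and the assumption that every match is a well-formed IPv4 address are illustrative only:

ALTER TABLE systemevents ADD COLUMN srcip inet;

-- One-off backfill: pull the first thing that looks like an IPv4 address
-- out of the message text.
UPDATE systemevents
SET srcip = substring(message from E'\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}')::inet
WHERE message ~ E'\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}';

CREATE INDEX systemevents_srcip_idx ON systemevents (srcip);

-- Lookups by address then use an ordinary btree index, for which the
-- planner keeps per-value statistics:
SELECT id, devicereportedtime, fromhost, message
FROM systemevents
WHERE srcip = '131.211.112.9'
ORDER BY id DESC
LIMIT 100;

New rows would need the same extraction at insert time, for example in the rsyslog template or in a trigger.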