Thread: Index problem.... GIST (tsearch2)
Hello, I have a table like this with some indexes as identified: CREATE TABLE sometable ( data TEXT, data_fti TSVECTOR, category1 INTEGER, category2 INTEGER, category3 INTEGER ); CREATE OR REPLACE FUNCTION is_null(anyelement) RETURNS BOOLEAN AS 'SELECT $1 IS NULL;' LANGUAGE 'SQL' IMMUTABLE; CREATE FUNCTION sometable_category1_idx ON sometable (category1); CREATE FUNCTION sometable_category2_idx ON sometable (category2); CREATE FUNCTION sometable_category3_idx ON sometable (category3); CREATE FUNCTION sometable_data_fti_idx ON sometable USING gist(data_fti); When I do a query like this, it uses sometable_category1_idx and is very fast (it only returns a few rows out of several thousand) SELECT * from sometable WHERE is_null(category1)='f'; When I do a query like this though it is slow because it insists on doing the full-text index first: SELECT * from sometable WHERE is_null(category1)='f' AND data_fti @@ to_tsquery('default', 'postgres'); How can I make this query first use the is_null index?... It strikes me that this would almost always be faster then doing the full-text search first, right?... Thanks! - Greg
Re: Index problem.... GIST (tsearch2)
From
"Net Virtual Mailing Lists" (by way of
Net Virtual Mailing Lists
Date:
Than you very much Andrew... Yes you are right.. I mis-typeed CREATE INDEX.. ;-) The actual create indexes are as you suggested: CREATE INDEX sometable_category1_idx ON sometable (is_null(category1)); CREATE INDEX sometable_category2_idx ON sometable (is_null(category2)); CREATE INDEX sometable_category3_idx ON sometable (is_null(category3)); In an effort to simplify the problem, I was indicating the columns are integers.. They are, in fact, ltree and everytime I did a "SELECT * from sometable WHERE category1 IS NOT NULL", it shows that it does a sequential scan.. Thats the reason I created the is_null function (perhaps that was a bad idea, but I just could not make it work)... Regardless, it still won't use my is_null index first, any way around that?.. - Greg >Hi: > >> CREATE OR REPLACE FUNCTION is_null(anyelement) RETURNS BOOLEAN AS 'SELECT >> $1 IS NULL;' LANGUAGE 'SQL' IMMUTABLE; >> CREATE FUNCTION sometable_category1_idx ON sometable (category1); >> CREATE FUNCTION sometable_category2_idx ON sometable (category2); >> CREATE FUNCTION sometable_category3_idx ON sometable (category3); >> > >Alright ... there's something whacky here. I think you meant to use CREATE >INDEX? Is that right? > >CREATE INDEX sometable_category1_idx ON sometable (category1); > >> When I do a query like this though it is slow because it insists on doing >> the full-text index first: >> >> SELECT * from sometable WHERE is_null(category1)='f' AND data_fti @@ >> to_tsquery('default', 'postgres'); >> >> >> How can I make this query first use the is_null index?... It strikes me >> that this would almost always be faster then doing the full-text search >> first, right?... >> > > >It looks to me like there are better ways to accomplish what you are doing. > >Your indexes have indexed the value of the column ... not the value of your >function is_null(column). Which you could also do like this: > >CREATE INDEX sometable_category1_idx ON sometable (is_null(category1)); > >That would probably speed up the second query ... but I think that would be >slower than leaving an index on the INTEGER value and writing the query like >this: > >SELECT * from sometable WHERE category1 IS NOT NULL > AND data_fti @@ to_tsquery('default', 'postgres'); > >Why bother to write a function when what you want is supported right in SQL? >Avoids a function call. You might want to think about exactly what you want >to index ... indexes can be costly if not used on frequent access columns. > > > >HTH, > > >Andy >
"Net Virtual Mailing Lists" <mailinglists@net-virtual.com> writes: > SELECT * from sometable WHERE is_null(category1)='f' AND data_fti @@ > to_tsquery('default', 'postgres'); > > How can I make this query first use the is_null index?... It strikes me > that this would almost always be faster then doing the full-text search > first, right?... Well that depends on how many are false versus how many the full-text search finds. In this circumstance postgres is trying to compare two unknowns. It doesn't know how often is_nul() is going to return false, and it doesn't know how many records the full text search will match. 8.0 will have statistics on how often is_null() will return false. But that isn't really going to solve your problem since it still won't have any idea how many rows the full text search will find. I don't even know of anything you can do to influence the selectivity estimates of the full text search. -- greg
"Net Virtual Mailing Lists" <mailinglists@net-virtual.com>(by way of Net Virtual Mailing Lists <mailinglists@net-virtual.com>)writes: > >SELECT * from sometable WHERE category1 IS NOT NULL > > AND data_fti @@ to_tsquery('default', 'postgres'); What you can do is a partial index: create index fulltextindex on sometable (...) WHERE category1 IS NOT NULL This is even better than the is_null() index because it will still be able to do the full text search. However note that having three entire full text indexes is going to consume a lot of space and incur a lot of disk i/o overhead to maintain. -- greg
Greg Stark <gsstark@mit.edu> writes: > 8.0 will have statistics on how often is_null() will return false. But that > isn't really going to solve your problem since it still won't have any idea > how many rows the full text search will find. > I don't even know of anything you can do to influence the selectivity > estimates of the full text search. Write some code ;-) ? Seriously, we desperately need some people thinking about how to do statistics and selectivity estimates for these sorts of complex indexable conditions. Even crude estimates would be better than none at all, which is where we're at now. I think that as of 8.0 there is sufficient infrastructure in place to collect datatype-specific stats and do something with them --- but *what* to do is now the pressing problem. regards, tom lane
"Net Virtual Mailing Lists" <mailinglists@net-virtual.com> writes: > I have a table like this with some indexes as identified: > CREATE OR REPLACE FUNCTION is_null(anyelement) RETURNS BOOLEAN AS 'SELECT > $1 IS NULL;' LANGUAGE 'SQL' IMMUTABLE; > CREATE FUNCTION sometable_category1_idx ON sometable (category1); > CREATE FUNCTION sometable_category2_idx ON sometable (category2); > CREATE FUNCTION sometable_category3_idx ON sometable (category3); > CREATE FUNCTION sometable_data_fti_idx ON sometable USING gist(data_fti); [ raises eyebrow... ] It'd be easier to offer advice if you accurately depicted what you'd done. The above isn't even syntactically valid. I suppose what you meant is CREATE INDEX sometable_category1_idx ON sometable (is_null(category1)); The main problem with this is that before 8.0 there are no stats on functional indexes, and so the planner has no idea that the condition is_null(category1)='f' is very selective. (If you looked at the rowcount estimates from EXPLAIN this would be pretty obvious.) What I would suggest is that you forget the functional indexes and use partial indexes: CREATE INDEX sometable_category1_idx ON sometable (category1) WHERE category1 IS NOT NULL; SELECT * from sometable WHERE category1 IS NOT NULL AND data_fti @@ to_tsquery('default', 'postgres'); 7.4 has a reasonable chance of figuring out that the category1_idx is the thing to use if you cast it this way. regards, tom lane
Thanks, you are right, I mis-typed the statements (lack of sleep *shrug*), thanks for parsing through it... Your suggestion did resolve the situation nicely! - Greg >"Net Virtual Mailing Lists" <mailinglists@net-virtual.com> writes: >> I have a table like this with some indexes as identified: > >> CREATE OR REPLACE FUNCTION is_null(anyelement) RETURNS BOOLEAN AS 'SELECT >> $1 IS NULL;' LANGUAGE 'SQL' IMMUTABLE; >> CREATE FUNCTION sometable_category1_idx ON sometable (category1); >> CREATE FUNCTION sometable_category2_idx ON sometable (category2); >> CREATE FUNCTION sometable_category3_idx ON sometable (category3); > >> CREATE FUNCTION sometable_data_fti_idx ON sometable USING gist(data_fti); > >[ raises eyebrow... ] It'd be easier to offer advice if you accurately >depicted what you'd done. The above isn't even syntactically valid. > >I suppose what you meant is > >CREATE INDEX sometable_category1_idx ON sometable (is_null(category1)); > >The main problem with this is that before 8.0 there are no stats on >functional indexes, and so the planner has no idea that the condition >is_null(category1)='f' is very selective. (If you looked at the >rowcount estimates from EXPLAIN this would be pretty obvious.) > >What I would suggest is that you forget the functional indexes and use >partial indexes: > >CREATE INDEX sometable_category1_idx ON sometable (category1) >WHERE category1 IS NOT NULL; > >SELECT * from sometable WHERE category1 IS NOT NULL AND data_fti @@ >to_tsquery('default', 'postgres'); > >7.4 has a reasonable chance of figuring out that the category1_idx >is the thing to use if you cast it this way. > > regards, tom lane >