Thread: ERROR: Gin doesn't support full scan due to it's awful inefficiency
I've run across another GIN index issue - using postgresql 8.1.4 on Window/Linux with the GIN/tsearch2 patch. I have two tables like this: CREATE TABLE maps ( id serial, query tsearch2.tsquery ) CREATE TABLE features ( id serial, vector tsearch2.tsvector ) CREATE INDEX idx_features_tags_vector ON features USING gin (tags_vector); Where maps.query contains cached tsquery (they are cached for performance reasons). When I run this query: select * from maps, features where to_tsquery('test') @@ features.tags_vector I get this error: ERROR: Gin doesn't support full scan due to it's awful inefficiency Here is explain (from a very small test database): Nested Loop (cost=0.00..1878.71 rows=370 width=208) -> Seq Scan on maps (cost=0.00..14.80 rows=480 width=136) -> Index Scan using idx_features_tags_vector on features (cost=0.00..3.87 rows=1 width=72) Index Cond: ("outer".query @@ features.tags_vector) I thought that this would solve my problem: set enable_indexscan to off; But it does not. Interestingly, this does work: select * from features where to_tsquery('test') @@ features.tags_vector; Explain: Index Scan using idx_features_tags_vector on features (cost=0.00..3.87 rows=1 width=72) Index Cond: ('''test'''::tsquery @@ tags_vector) At first I thought the issue was that you couldn't use an Index Scan on gin index, but that now seems like an incorrect conclusion. So, two things: 1. How do I work around this issue? 2. Seems like postgresql should be smart enough to pick a query that will run. Thanks, Charlie
Sorry, mistyped the query causing the problem. It is: select * from maps, features where maps.query @@ features.tags_vector; Thanks, Charlie Charlie Savage wrote: > I've run across another GIN index issue - using postgresql 8.1.4 on > Window/Linux with the GIN/tsearch2 patch. > > I have two tables like this: > > CREATE TABLE maps > ( > id serial, > query tsearch2.tsquery > ) > > > CREATE TABLE features > ( > id serial, > vector tsearch2.tsvector > ) > > CREATE INDEX idx_features_tags_vector ON features USING gin (tags_vector); > > > Where maps.query contains cached tsquery (they are cached for > performance reasons). > > When I run this query: > > select * > from maps, features > where to_tsquery('test') @@ features.tags_vector > > I get this error: > > ERROR: Gin doesn't support full scan due to it's awful inefficiency > > Here is explain (from a very small test database): > > Nested Loop (cost=0.00..1878.71 rows=370 width=208) > -> Seq Scan on maps (cost=0.00..14.80 rows=480 width=136) > -> Index Scan using idx_features_tags_vector on features > (cost=0.00..3.87 rows=1 width=72) > Index Cond: ("outer".query @@ features.tags_vector) > > I thought that this would solve my problem: > > set enable_indexscan to off; > > But it does not. > > Interestingly, this does work: > > select * > from features > where to_tsquery('test') @@ features.tags_vector; > > Explain: > > Index Scan using idx_features_tags_vector on features (cost=0.00..3.87 > rows=1 width=72) > Index Cond: ('''test'''::tsquery @@ tags_vector) > > At first I thought the issue was that you couldn't use an Index Scan on > gin index, but that now seems like an incorrect conclusion. > > So, two things: > > 1. How do I work around this issue? > 2. Seems like postgresql should be smart enough to pick a query that > will run. > > Thanks, > > Charlie > > > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org
Attachment
Charlie Savage <cfis@savagexi.com> writes: > Sorry, mistyped the query causing the problem. It is: > select * > from maps, features > where maps.query @@ features.tags_vector; In that case it's fair to ask what query values you have stored in maps. In particular I imagine that you'll find that a specific query is causing the problem ... regards, tom lane
Hi Tom, Thanks for the quick reply. >> Sorry, mistyped the query causing the problem. It is: > >> select * >> from maps, features >> where maps.query @@ features.tags_vector; > > In that case it's fair to ask what query values you have stored in maps. > In particular I imagine that you'll find that a specific query is > causing the problem ... > > regards, tom lane Interesting...that seems to be the case. For example, this will fail: explain analyze select * from test.features where to_tsquery('') @@ features.vector ERROR: Gin doesn't support full scan due to it's awful inefficiency Interestingly this works: explain analyze select * from test.features where NULL @@ features.vector Here is a slightly bigger test case: --drop schema test cascade; create schema test; CREATE TABLE test.maps ( id serial, query tsquery ); CREATE TABLE test.features ( id serial, vector tsvector ); CREATE INDEX features_vector ON test.features USING gin (vector); INSERT INTO test.maps (query) VALUES (to_tsquery('')); INSERT INTO test.features (vector) VALUES (to_tsvector('test')); analyze test.maps; analyze test.features; ---------- Now try this, which won't work (ERROR: Gin doesn't support full scan due to it's awful inefficiency): set enable_seqscan to off; explain select * from test.maps, test.features where features.vector @@ maps.query Nested Loop (cost=100000000.00..100000004.04 rows=1 width=36) -> Seq Scan on maps (cost=100000000.00..100000001.01 rows=1 width=12) -> Index Scan using features_vector on features (cost=0.00..3.01 rows=1 width=24) Index Cond: (features.vector @@ "outer".query) However, this works: set enable_seqscan to on; set enable_indexscan to off; set enable_bitmapscan to off; explain analyze select * from test.maps, test.features where features.vector @@ maps.query Nested Loop (cost=200000000.00..200000002.03 rows=1 width=36) (actual time=0.055..0.055 rows=0 loops=1) Join Filter: ("inner".vector @@ "outer".query) -> Seq Scan on maps (cost=100000000.00..100000001.01 rows=1 width=12) (actual time=0.011..0.014 rows=1 loops=1) -> Seq Scan on features (cost=100000000.00..100000001.01 rows=1 width=24) (actual time=0.006..0.010 rows=1 loops=1) Total runtime: 0.129 ms You see the same things if you put a NULL in the query column (unlike above). If instead, you do this in the script above: INSERT INTO test.maps (query) VALUES (to_tsquery('test')); Then it always works. Seems like the moral of the story, tsquery values of '' or NULL don't work. That is surprising to me - maybe the documentation should point out this issue? Thanks, Charlie
Attachment
> explain analyze > select * > from test.features > where to_tsquery('') @@ features.vector > > ERROR: Gin doesn't support full scan due to it's awful inefficiency Look: contrib_regression=# select '{1,2,3}'::int4[] @ '{}'; --contains ?column? ---------- t (1 row) contrib_regression=# select '{1,2,3}'::int4[] && '{}'; --overlap ?column? ---------- f (1 row) contrib_regression=# select to_tsvector('asdasd') @@ ''::tsquery; NOTICE: query doesn't contain lexeme(s) ?column? ---------- f (1 row) Semantic of different operation with void (but not NULL) argument is very different. If query doesn't contain any entry (returned by extractQuery() index support method), then GIN, in any case, doesn't know what it should return: whole set of pointers or nothing. But GIN can't return all - it will be very-very slow, because there is a lot of pointers in GIN index to each table's row. It seems to me that message makes confuse about reason of error... > > Interestingly this works: > > explain analyze > select * > from test.features > where NULL @@ features.vector That is because @@ is marked as 'returns NULL on NULL input', ie index will not be used. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Thanks for the clarification Teodor - it makes much more sense now. I'd agree the error message is confusing, particularly since this works: set enable_seqscan to on; set enable_indexscan to off; set enable_bitmapscan to off; select * from test.features where to_tsquery('') @@ features.vector; I see why it would now. But it would be nice if the message mentioned the root cause of the problem - that the tsquery value does not contain any lexemes and thus is not valid for doing a search. Thanks, Charlie Teodor Sigaev wrote: >> explain analyze >> select * >> from test.features >> where to_tsquery('') @@ features.vector >> >> ERROR: Gin doesn't support full scan due to it's awful inefficiency > > Look: > contrib_regression=# select '{1,2,3}'::int4[] @ '{}'; --contains > ?column? > ---------- > t > (1 row) > > contrib_regression=# select '{1,2,3}'::int4[] && '{}'; --overlap > ?column? > ---------- > f > (1 row) > > contrib_regression=# select to_tsvector('asdasd') @@ ''::tsquery; > NOTICE: query doesn't contain lexeme(s) > ?column? > ---------- > f > (1 row) > > Semantic of different operation with void (but not NULL) argument is > very different. If query doesn't contain any entry (returned by > extractQuery() index support method), then GIN, in any case, doesn't > know what it should return: whole set of pointers or nothing. But GIN > can't return all - it will be very-very slow, because there is a lot of > pointers in GIN index to each table's row. > > It seems to me that message makes confuse about reason of error... > > > >> >> Interestingly this works: >> >> explain analyze >> select * >> from test.features >> where NULL @@ features.vector > > That is because @@ is marked as 'returns NULL on NULL input', ie index > will not be used. >
Attachment
> I see why it would now. But it would be nice if the message mentioned > the root cause of the problem - that the tsquery value does not contain > any lexemes and thus is not valid for doing a search. ... doing a index search :(. GIN core doesn't know anything about tsvector/tsquery/tsearch - so, error message should mention something like this: GIN index doesn't support search with void argument. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/