full-text search doesn't fall back on sequential scan when it could - Mailing list pgsql-bugs
From | James Dietrich |
---|---|
Subject | full-text search doesn't fall back on sequential scan when it could |
Date | |
Msg-id | 9dd1c6ea0807151351g52776137nc0fdfcab59e95b5@mail.gmail.com Whole thread Raw |
List | pgsql-bugs |
Hi, I am running PostgreSQL 8.3.3 on Linux 2.6.18 jdietrch@io:~/z$ uname -a Linux io.mills.lan 2.6.18-6-vserver-k7 #1 SMP Sat Jun 7 01:10:29 UTC 2008 i686 GNU/Linux Short version of the problem: When I run a full-text search that requires a full scan of the table, an error is returned (GIN indexes don't support sequential scan) instead of falling back on a sequential scan which would return the results of the query. Long version: I have a database with two tables named 'one' and 'two': z1=> \d one Table "public.one" Column | Type | Modifiers --------+------+----------- a | text | Indexes: "fts_a" gin (to_tsvector('simple'::regconfig, a)) z1=> \d two Table "public.two" Column | Type | Modifiers --------+------+----------- b | text | Indexes: "fts_b" gin (to_tsvector('simple'::regconfig, b)) Table 'one' has 51 rows: z1=> select * from one; a -------------- Two Three Four Five ****<snip>**** Forty nine Fifty Fifty one Fifty two (51 rows) Table 'two' has 5001 rows: z1=> select * from two; b -------------------------------------------- Fifty three Fifty four Fifty five Fifty six ****<snip>**** Five thousand fifty Five thousand fifty one Five thousand fifty two Five thousand fifty three (5001 rows) (At the bottom of this email I have copied the commands I used to create the database.) Now I run a full-text query on table 'one': z1=> select count(*) from one where to_tsvector('simple', a) @@ to_tsquery('simple', '!Four'); count ------- 47 (1 row) Running the same query under 'explain analyze' shows that the index is not being used, but a sequential scan is being done: z1=> explain analyze select a from one where to_tsvector('simple', a) @@ to_tsquery('simple', '!Four'); QUERY PLAN ----------------------------------------------------------------------------------------------- Seq Scan on one (cost=0.00..1.77 rows=1 width=32) (actual time=0.019..0.309 rows=47 loops=1) Filter: (to_tsvector('simple'::regconfig, a) @@ '!''four'''::tsquery) Total runtime: 0.341 ms (3 rows) That all works fine. But all is not fine when I do the same thing on the longer table 'two': z1=> \set VERBOSITY verbose z1=> select count(*) from two where to_tsvector('simple', b) @@ to_tsquery('simple', '!Four'); ERROR: 0A000: query requires full scan, which is not supported by GIN indexes LOCATION: gin_extract_tsquery, tsginidx.c:74 I understand that this query does require a full scan, and I understand that GIN indexes don't support a full scan, but why couldn't the planner fall back to a sequential scan in this case? Of course it's slower, but I would prefer a slower answer than failure with an error and no answer at all. I can simulate this solution by doing the following, which forces a sequential scan. z1=> set enable_bitmapscan to off; SET z1=> set enable_indexscan to off; SET z1=> select count(*) from two where to_tsvector('simple', b) @@ to_tsquery('simple', '!Four'); count ------- 3277 (1 row) z1=> explain analyze select count(*) from two where to_tsvector('simple', b) @@ to_tsquery('simple', '!Four'); QUERY PLAN --------------------------------------------------------------------------------------------------------- Aggregate (cost=114.03..114.04 rows=1 width=0) (actual time=91.171..91.171 rows=1 loops=1) -> Seq Scan on two (cost=0.00..114.02 rows=5 width=0) (actual time=0.028..89.598 rows=3277 loops=1) Filter: (to_tsvector('simple'::regconfig, b) @@ '!''four'''::tsquery) Total runtime: 91.208 ms (4 rows) Any thoughts? Is this something that could be fixed in a future version of PostgreSQL? Or is the current behavior intentionally the way it is for some reason I haven't thought of yet? Thanks a lot for your help, James Dietrich P.S. Here is the procedure I used create the test database. I can also provide the output of pg_dump upon request. jdietrch@io:~/z$ psql template1 -U stariadmin -W Password for user stariadmin: Welcome to psql 8.3.3, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit template1=> create database z1; CREATE DATABASE template1=> \q jdietrch@io:~/z$ psql z1 -U stariadmin -W -f a_commands Password for user stariadmin: CREATE TABLE CREATE INDEX CREATE TABLE CREATE INDEX CREATE SEQUENCE CREATE LANGUAGE CREATE FUNCTION nextval --------- 1 (1 row) CREATE FUNCTION CREATE FUNCTION fill_one ---------- done (1 row) fill_two ---------- done (1 row) jdietrch@io:~/z$ cat a_commands create table one(a text); create index fts_a on one using gin(to_tsvector('simple', a)); create table two(b text); create index fts_b on two using gin(to_tsvector('simple', b)); create sequence sequ; create language plpgsql; create or replace function num_word(out m_word text) as $$ declare n integer; o money; begin select into o nextval('sequ')::text::money; select into m_word trim(substring(cash_words(o), 1, position('dollars' in cash_words(o))-1)); end; $$ language plpgsql; select nextval('sequ'); create or replace function fill_one(out f_o text) as $$ begin for i in 0..50 loop insert into one values(num_word()); end loop; f_o = 'done'; end; $$ language plpgsql; create or replace function fill_two(out f_tw text) as $$ begin for i in 0..5000 loop insert into two values(num_word()); end loop; f_tw = 'done'; end; $$ language plpgsql; select fill_one(); select fill_two(); jdietrch@io:~/z$
pgsql-bugs by date: