Thread: text search: tablescan cost for a tsvector
Hello, I have quite systematically better performance with the text search when I disable the statistics collection for the tsvector column. So I wonder if such statistics ever make sense. Here a testcase: The table contains 200'000 tsvector, whereas the lexeme 'fooblablabla' exists in all tsvector: Without statistics, the planner decide as expected for the gin index. After analyze, it switch to a table scan which is also expected, but the query is 3 times slower. My first thought was that the Bitmap Heap Scan was really fast as the searched term is always at the first position. So I repeated the test with an additional search term at the last position, but without significant change: (result from the 6. test below) without analyze: http://explain.depesz.com/s/6At with analyze: http://explain.depesz.com/s/r3B best regards, Marc Mamin Here all my results, always one of the fastest from a few runs. CREATE TABLE tsv_test ( id bigserial NOT NULL, v tsvector ); <The code to fill the table with test data can be found below> The test query: explain analyze select id from tsv_test where v @@ 'lexeme3179'::tsquery UNION ALL select id from tsv_test where v @@ 'lexeme5'::tsquery UNION ALL select id from tsv_test where v @@ 'fooblablabla'::tsquery The results A) on first lexeme 1) without indexes without analyze: http://explain.depesz.com/s/bOv 2) alter table tsv_test add constraint tsv_test_pk primary key(id); http://explain.depesz.com/s/9QQ (same as previous); 3) create index tsv_gin on tsv_test using gin(v); http://explain.depesz.com/s/r4M <= fastest 4) ANALYZE tsv_test (id); http://explain.depesz.com/s/MyC (same as previous); 5) ANALYZE tsv_test; http://explain.depesz.com/s/qu3S B) on lastlexeme 6) create table tsv_test2 as select id, v||'zzthisisalongerlexemethisisalongerlexeme'::tsvector from tsv_test; explain analyze select id from tsv_test2 where v @@ 'zzthisisalongerlexemethisisalongerlexeme'::tsquery http://explain.depesz.com/s/6At ANALYZE tsv_test2; http://explain.depesz.com/s/r3B test data: insert into tsv_test (v) select cast('fooblablabla' || ' lexeme'||s%2|| ' lexeme'||s%3|| ' lexeme'||s%4|| ' lexeme'||s%4|| ' lexeme'||s%5|| ' lexeme'||s%6|| ' lexeme'||s%7|| ' lexeme'||s%8|| ' lexeme'||s%9|| ' lexeme'||s%10 || ' lexeme2'||s%11 || ' lexeme3'||s%12 || ' lexeme'||s%11 || ' lexeme2'||s%12 || ' lexeme3'||s%22 || ' lexeme'||s%12 || ' lexeme2'||s%13 || ' lexeme3'||s%32 || ' lexeme'||s%13 || ' lexeme2'||s%14 || ' lexeme3'||s%42 || ' lexeme'||s%14 || ' lexeme2'||s%15 || ' lexeme3'||s%52 || ' lexeme'||s%15 || ' lexeme2'||s%16 || ' lexeme3'||s%62 || ' lexeme'||s%16 || ' lexeme2'||s%17 || ' lexeme3'||s%72 || ' lexeme'||s%17 || ' lexeme2'||s%18 || ' lexeme3'||s%82 || ' lexeme'||s%18 || ' lexeme2'||s%19 || ' lexeme3'||s%92 || ' lexeme'||s%19 || ' lexeme2'||s%10 || ' lexeme3'||s%15 || ' lexeme'||s%12 || ' lexeme2'||s%71 || ' lexeme3'||s%16 || ' lexeme'||s%20 || ' lexeme2'||s%81 || ' lexeme3'||s%17 || ' lexeme'||s%35 || ' lexeme2'||s%91 || ' lexeme3'||s%18 || ' lexeme'||s%100 || ' lexeme2'||s%110 || ' lexeme3'||s%120 || ' lexeme'||s%110 || ' lexeme2'||s%120 || ' lexeme3'||s%220 || ' lexeme'||s%120 || ' lexeme2'||s%130 || ' lexeme3'||s%320 || ' lexeme'||s%130 || ' lexeme2'||s%140 || ' lexeme3'||s%420 || ' lexeme'||s%140 || ' lexeme2'||s%150 || ' lexeme3'||s%520 || ' lexeme'||s%150 || ' lexeme2'||s%160 || ' lexeme3'||s%620 || ' lexeme'||s%160 || ' lexeme2'||s%170 || ' lexeme3'||s%720 || ' lexeme'||s%170 || ' lexeme2'||s%180 || ' lexeme3'||s%820 || ' lexeme'||s%180 || ' lexeme2'||s%190 || ' lexeme3'||s%920 || ' lexeme'||s%190 || ' lexeme2'||s%100 || ' lexeme3'||s%150 || ' lexeme'||s%120 || ' lexeme2'||s%710 || ' lexeme3'||s%160 || ' lexeme'||s%200 || ' lexeme2'||s%810 || ' lexeme3'||s%170 || ' lexeme'||s%350 || ' lexeme2'||s%910 || ' lexeme3'||s%180 as tsvector) FROM generate_series(1,100000) s UNION ALL select cast('fooblablabla' || ' thisisalongerlexemethisisalongerlexeme'||s%2|| ' thisisalongerlexemethisisalongerlexeme'||s%3|| ' thisisalongerlexemethisisalongerlexeme'||s%4|| ' thisisalongerlexemethisisalongerlexeme'||s%4|| ' thisisalongerlexemethisisalongerlexeme'||s%5|| ' thisisalongerlexemethisisalongerlexeme'||s%6|| ' thisisalongerlexemethisisalongerlexeme'||s%7|| ' thisisalongerlexemethisisalongerlexeme'||s%8|| ' thisisalongerlexemethisisalongerlexeme'||s%9|| ' thisisalongerlexemethisisalongerlexeme'||s%10 || ' thisisalongerlexemethisisalongerlexeme2'||s%11 || ' thisisalongerlexemethisisalongerlexeme3'||s%12 || ' thisisalongerlexemethisisalongerlexeme'||s%11 || ' thisisalongerlexemethisisalongerlexeme2'||s%12 || ' thisisalongerlexemethisisalongerlexeme3'||s%22 || ' thisisalongerlexemethisisalongerlexeme'||s%12 || ' thisisalongerlexemethisisalongerlexeme2'||s%13 || ' thisisalongerlexemethisisalongerlexeme3'||s%32 || ' thisisalongerlexemethisisalongerlexeme'||s%13 || ' thisisalongerlexemethisisalongerlexeme2'||s%14 || ' thisisalongerlexemethisisalongerlexeme3'||s%42 || ' thisisalongerlexemethisisalongerlexeme'||s%14 || ' thisisalongerlexemethisisalongerlexeme2'||s%15 || ' thisisalongerlexemethisisalongerlexeme3'||s%52 || ' thisisalongerlexemethisisalongerlexeme'||s%15 || ' thisisalongerlexemethisisalongerlexeme2'||s%16 || ' thisisalongerlexemethisisalongerlexeme3'||s%62 || ' thisisalongerlexemethisisalongerlexeme'||s%16 || ' thisisalongerlexemethisisalongerlexeme2'||s%17 || ' thisisalongerlexemethisisalongerlexeme3'||s%72 || ' thisisalongerlexemethisisalongerlexeme'||s%17 || ' thisisalongerlexemethisisalongerlexeme2'||s%18 || ' thisisalongerlexemethisisalongerlexeme3'||s%82 || ' thisisalongerlexemethisisalongerlexeme'||s%18 || ' thisisalongerlexemethisisalongerlexeme2'||s%19 || ' thisisalongerlexemethisisalongerlexeme3'||s%92 || ' thisisalongerlexemethisisalongerlexeme'||s%19 || ' thisisalongerlexemethisisalongerlexeme2'||s%10 || ' thisisalongerlexemethisisalongerlexeme3'||s%15 || ' thisisalongerlexemethisisalongerlexeme'||s%12 || ' thisisalongerlexemethisisalongerlexeme2'||s%71 || ' thisisalongerlexemethisisalongerlexeme3'||s%16 || ' thisisalongerlexemethisisalongerlexeme'||s%20 || ' thisisalongerlexemethisisalongerlexeme2'||s%81 || ' thisisalongerlexemethisisalongerlexeme3'||s%17 || ' thisisalongerlexemethisisalongerlexeme'||s%35 || ' thisisalongerlexemethisisalongerlexeme2'||s%91 || ' thisisalongerlexemethisisalongerlexeme3'||s%18 || ' thisisalongerlexemethisisalongerlexeme'||s%100 || ' thisisalongerlexemethisisalongerlexeme2'||s%110 || ' thisisalongerlexemethisisalongerlexeme3'||s%120 || ' thisisalongerlexemethisisalongerlexeme'||s%110 || ' thisisalongerlexemethisisalongerlexeme2'||s%120 || ' thisisalongerlexemethisisalongerlexeme3'||s%220 || ' thisisalongerlexemethisisalongerlexeme'||s%120 || ' thisisalongerlexemethisisalongerlexeme2'||s%130 || ' thisisalongerlexemethisisalongerlexeme3'||s%320 || ' thisisalongerlexemethisisalongerlexeme'||s%130 || ' thisisalongerlexemethisisalongerlexeme2'||s%140 || ' thisisalongerlexemethisisalongerlexeme3'||s%420 || ' thisisalongerlexemethisisalongerlexeme'||s%140 || ' thisisalongerlexemethisisalongerlexeme2'||s%150 || ' thisisalongerlexemethisisalongerlexeme3'||s%520 || ' thisisalongerlexemethisisalongerlexeme'||s%150 || ' thisisalongerlexemethisisalongerlexeme2'||s%160 || ' thisisalongerlexemethisisalongerlexeme3'||s%620 || ' thisisalongerlexemethisisalongerlexeme'||s%160 || ' thisisalongerlexemethisisalongerlexeme2'||s%170 || ' thisisalongerlexemethisisalongerlexeme3'||s%720 || ' thisisalongerlexemethisisalongerlexeme'||s%170 || ' thisisalongerlexemethisisalongerlexeme2'||s%180 || ' thisisalongerlexemethisisalongerlexeme3'||s%820 || ' thisisalongerlexemethisisalongerlexeme'||s%180 || ' thisisalongerlexemethisisalongerlexeme2'||s%190 || ' thisisalongerlexemethisisalongerlexeme3'||s%920 || ' thisisalongerlexemethisisalongerlexeme'||s%190 || ' thisisalongerlexemethisisalongerlexeme2'||s%100 || ' thisisalongerlexemethisisalongerlexeme3'||s%150 || ' thisisalongerlexemethisisalongerlexeme'||s%120 || ' thisisalongerlexemethisisalongerlexeme2'||s%710 || ' thisisalongerlexemethisisalongerlexeme3'||s%160 || ' thisisalongerlexemethisisalongerlexeme'||s%200 || ' thisisalongerlexemethisisalongerlexeme2'||s%810 || ' thisisalongerlexemethisisalongerlexeme3'||s%170 || ' thisisalongerlexemethisisalongerlexeme'||s%350 || ' thisisalongerlexemethisisalongerlexeme2'||s%910 || ' thisisalongerlexemethisisalongerlexeme3'||s%180 as tsvector) FROM generate_series(1,100000) s
On Mon, Feb 6, 2012 at 6:05 AM, Marc Mamin <M.Mamin@intershop.de> wrote: > without analyze: http://explain.depesz.com/s/6At > with analyze: http://explain.depesz.com/s/r3B I think this is the same issue complained about here: http://archives.postgresql.org/message-id/4ED68EEC.9030906@krogh.cc And here: http://archives.postgresql.org/message-id/CANxtv6XiuiqEkXRJU2vk=xKAFXrLeP7uVhgR-XMCyjgQz29EFQ@mail.gmail.com The problem seems to be that the cost estimator doesn't know that detoasting is expensive. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> Von: Robert Haas [mailto:robertmhaas@gmail.com]
> Gesendet: Mi 2/29/2012 7:32
>
> On Mon, Feb 6, 2012 at 6:05 AM, Marc Mamin <M.Mamin@intershop.de> wrote:
> > without analyze: http://explain.depesz.com/s/6At
> > with analyze: http://explain.depesz.com/s/r3B
...
> The problem seems to be that the cost estimator doesn't know that
> detoasting is expensive.
Hello,
Tom Lane has started a follow up thread in the hacker list.
Detoasting is indeed the main obstacle, but I've repeated my test using plain storage
and the planer still choose (systematically?) the slowest query.
It seems that I bumped into 2 different issues at the same time.
http://archives.postgresql.org/pgsql-hackers/2012-02/msg00896.php
Backround:
Our reporting system offers amongst others time histograms
combined with a FTS filtering on error occurences (imported from error logs),
It is hence not unusual that given search terms are found within a majority of the documents...
best regards,
Marc Mamin