Thread: text search: tablescan cost for a tsvector

text search: tablescan cost for a tsvector

From
"Marc Mamin"
Date:
Hello,

I have quite systematically better performance with the text search when
I disable the statistics collection for the tsvector column.
So I wonder if such statistics ever make sense.

Here a testcase:

The table contains 200'000 tsvector, whereas the lexeme 'fooblablabla'
exists in all tsvector:
Without statistics, the planner decide as expected for the gin index.
After analyze, it switch to a table scan which is also expected, but the
query is 3 times slower.

My first thought was that the Bitmap Heap Scan was really fast as the
searched term is always at the first position.
So I repeated the test with an additional search term at the last
position, but without significant change:

(result from the 6. test below)

without analyze: http://explain.depesz.com/s/6At
with analyze:    http://explain.depesz.com/s/r3B


best regards,

Marc Mamin




Here all my results, always one of the fastest from a few runs.


CREATE TABLE tsv_test
(
  id bigserial NOT NULL,
  v tsvector
);


<The code to fill the table with test data can be found below>


The test query:

explain analyze
select id from tsv_test where v @@ 'lexeme3179'::tsquery
UNION ALL
select id from tsv_test where v @@ 'lexeme5'::tsquery
UNION ALL
select id from tsv_test where v @@ 'fooblablabla'::tsquery

The results

A) on first lexeme

1) without indexes without analyze:
   http://explain.depesz.com/s/bOv

2) alter table tsv_test add constraint tsv_test_pk primary key(id);
   http://explain.depesz.com/s/9QQ (same as previous);

3) create index tsv_gin on tsv_test using gin(v);
   http://explain.depesz.com/s/r4M <= fastest

4) ANALYZE tsv_test (id);
   http://explain.depesz.com/s/MyC (same as previous);

5) ANALYZE tsv_test;
   http://explain.depesz.com/s/qu3S


B) on lastlexeme

6) create table  tsv_test2 as select id,
   v||'zzthisisalongerlexemethisisalongerlexeme'::tsvector
   from tsv_test;

   explain analyze
   select id from tsv_test2 where v @@
'zzthisisalongerlexemethisisalongerlexeme'::tsquery

   http://explain.depesz.com/s/6At

   ANALYZE tsv_test2;

   http://explain.depesz.com/s/r3B



test data:

insert into tsv_test (v)
select
cast('fooblablabla' ||
' lexeme'||s%2|| ' lexeme'||s%3|| ' lexeme'||s%4||
' lexeme'||s%4|| ' lexeme'||s%5|| ' lexeme'||s%6||
' lexeme'||s%7|| ' lexeme'||s%8|| ' lexeme'||s%9||
' lexeme'||s%10 || ' lexeme2'||s%11 || ' lexeme3'||s%12 ||
' lexeme'||s%11 || ' lexeme2'||s%12 || ' lexeme3'||s%22 ||
' lexeme'||s%12 || ' lexeme2'||s%13 || ' lexeme3'||s%32 ||
' lexeme'||s%13 || ' lexeme2'||s%14 || ' lexeme3'||s%42 ||
' lexeme'||s%14 || ' lexeme2'||s%15 || ' lexeme3'||s%52 ||
' lexeme'||s%15 || ' lexeme2'||s%16 || ' lexeme3'||s%62 ||
' lexeme'||s%16 || ' lexeme2'||s%17 || ' lexeme3'||s%72 ||
' lexeme'||s%17 || ' lexeme2'||s%18 || ' lexeme3'||s%82 ||
' lexeme'||s%18 || ' lexeme2'||s%19 || ' lexeme3'||s%92 ||
' lexeme'||s%19 || ' lexeme2'||s%10 || ' lexeme3'||s%15 ||
' lexeme'||s%12 || ' lexeme2'||s%71 || ' lexeme3'||s%16 ||
' lexeme'||s%20 || ' lexeme2'||s%81 || ' lexeme3'||s%17 ||
' lexeme'||s%35 || ' lexeme2'||s%91 || ' lexeme3'||s%18 ||
' lexeme'||s%100 || ' lexeme2'||s%110 || ' lexeme3'||s%120 ||
' lexeme'||s%110 || ' lexeme2'||s%120 || ' lexeme3'||s%220 ||
' lexeme'||s%120 || ' lexeme2'||s%130 || ' lexeme3'||s%320 ||
' lexeme'||s%130 || ' lexeme2'||s%140 || ' lexeme3'||s%420 ||
' lexeme'||s%140 || ' lexeme2'||s%150 || ' lexeme3'||s%520 ||
' lexeme'||s%150 || ' lexeme2'||s%160 || ' lexeme3'||s%620 ||
' lexeme'||s%160 || ' lexeme2'||s%170 || ' lexeme3'||s%720 ||
' lexeme'||s%170 || ' lexeme2'||s%180 || ' lexeme3'||s%820 ||
' lexeme'||s%180 || ' lexeme2'||s%190 || ' lexeme3'||s%920 ||
' lexeme'||s%190 || ' lexeme2'||s%100 || ' lexeme3'||s%150 ||
' lexeme'||s%120 || ' lexeme2'||s%710 || ' lexeme3'||s%160 ||
' lexeme'||s%200 || ' lexeme2'||s%810 || ' lexeme3'||s%170 ||
' lexeme'||s%350 || ' lexeme2'||s%910 || ' lexeme3'||s%180
as tsvector)
FROM generate_series(1,100000) s
UNION ALL
select
cast('fooblablabla' ||
' thisisalongerlexemethisisalongerlexeme'||s%2|| '
thisisalongerlexemethisisalongerlexeme'||s%3|| '
thisisalongerlexemethisisalongerlexeme'||s%4||
' thisisalongerlexemethisisalongerlexeme'||s%4|| '
thisisalongerlexemethisisalongerlexeme'||s%5|| '
thisisalongerlexemethisisalongerlexeme'||s%6||
' thisisalongerlexemethisisalongerlexeme'||s%7|| '
thisisalongerlexemethisisalongerlexeme'||s%8|| '
thisisalongerlexemethisisalongerlexeme'||s%9||
' thisisalongerlexemethisisalongerlexeme'||s%10 || '
thisisalongerlexemethisisalongerlexeme2'||s%11 || '
thisisalongerlexemethisisalongerlexeme3'||s%12 ||
' thisisalongerlexemethisisalongerlexeme'||s%11 || '
thisisalongerlexemethisisalongerlexeme2'||s%12 || '
thisisalongerlexemethisisalongerlexeme3'||s%22 ||
' thisisalongerlexemethisisalongerlexeme'||s%12 || '
thisisalongerlexemethisisalongerlexeme2'||s%13 || '
thisisalongerlexemethisisalongerlexeme3'||s%32 ||
' thisisalongerlexemethisisalongerlexeme'||s%13 || '
thisisalongerlexemethisisalongerlexeme2'||s%14 || '
thisisalongerlexemethisisalongerlexeme3'||s%42 ||
' thisisalongerlexemethisisalongerlexeme'||s%14 || '
thisisalongerlexemethisisalongerlexeme2'||s%15 || '
thisisalongerlexemethisisalongerlexeme3'||s%52 ||
' thisisalongerlexemethisisalongerlexeme'||s%15 || '
thisisalongerlexemethisisalongerlexeme2'||s%16 || '
thisisalongerlexemethisisalongerlexeme3'||s%62 ||
' thisisalongerlexemethisisalongerlexeme'||s%16 || '
thisisalongerlexemethisisalongerlexeme2'||s%17 || '
thisisalongerlexemethisisalongerlexeme3'||s%72 ||
' thisisalongerlexemethisisalongerlexeme'||s%17 || '
thisisalongerlexemethisisalongerlexeme2'||s%18 || '
thisisalongerlexemethisisalongerlexeme3'||s%82 ||
' thisisalongerlexemethisisalongerlexeme'||s%18 || '
thisisalongerlexemethisisalongerlexeme2'||s%19 || '
thisisalongerlexemethisisalongerlexeme3'||s%92 ||
' thisisalongerlexemethisisalongerlexeme'||s%19 || '
thisisalongerlexemethisisalongerlexeme2'||s%10 || '
thisisalongerlexemethisisalongerlexeme3'||s%15 ||
' thisisalongerlexemethisisalongerlexeme'||s%12 || '
thisisalongerlexemethisisalongerlexeme2'||s%71 || '
thisisalongerlexemethisisalongerlexeme3'||s%16 ||
' thisisalongerlexemethisisalongerlexeme'||s%20 || '
thisisalongerlexemethisisalongerlexeme2'||s%81 || '
thisisalongerlexemethisisalongerlexeme3'||s%17 ||
' thisisalongerlexemethisisalongerlexeme'||s%35 || '
thisisalongerlexemethisisalongerlexeme2'||s%91 || '
thisisalongerlexemethisisalongerlexeme3'||s%18 ||
' thisisalongerlexemethisisalongerlexeme'||s%100 || '
thisisalongerlexemethisisalongerlexeme2'||s%110 || '
thisisalongerlexemethisisalongerlexeme3'||s%120 ||
' thisisalongerlexemethisisalongerlexeme'||s%110 || '
thisisalongerlexemethisisalongerlexeme2'||s%120 || '
thisisalongerlexemethisisalongerlexeme3'||s%220 ||
' thisisalongerlexemethisisalongerlexeme'||s%120 || '
thisisalongerlexemethisisalongerlexeme2'||s%130 || '
thisisalongerlexemethisisalongerlexeme3'||s%320 ||
' thisisalongerlexemethisisalongerlexeme'||s%130 || '
thisisalongerlexemethisisalongerlexeme2'||s%140 || '
thisisalongerlexemethisisalongerlexeme3'||s%420 ||
' thisisalongerlexemethisisalongerlexeme'||s%140 || '
thisisalongerlexemethisisalongerlexeme2'||s%150 || '
thisisalongerlexemethisisalongerlexeme3'||s%520 ||
' thisisalongerlexemethisisalongerlexeme'||s%150 || '
thisisalongerlexemethisisalongerlexeme2'||s%160 || '
thisisalongerlexemethisisalongerlexeme3'||s%620 ||
' thisisalongerlexemethisisalongerlexeme'||s%160 || '
thisisalongerlexemethisisalongerlexeme2'||s%170 || '
thisisalongerlexemethisisalongerlexeme3'||s%720 ||
' thisisalongerlexemethisisalongerlexeme'||s%170 || '
thisisalongerlexemethisisalongerlexeme2'||s%180 || '
thisisalongerlexemethisisalongerlexeme3'||s%820 ||
' thisisalongerlexemethisisalongerlexeme'||s%180 || '
thisisalongerlexemethisisalongerlexeme2'||s%190 || '
thisisalongerlexemethisisalongerlexeme3'||s%920 ||
' thisisalongerlexemethisisalongerlexeme'||s%190 || '
thisisalongerlexemethisisalongerlexeme2'||s%100 || '
thisisalongerlexemethisisalongerlexeme3'||s%150 ||
' thisisalongerlexemethisisalongerlexeme'||s%120 || '
thisisalongerlexemethisisalongerlexeme2'||s%710 || '
thisisalongerlexemethisisalongerlexeme3'||s%160 ||
' thisisalongerlexemethisisalongerlexeme'||s%200 || '
thisisalongerlexemethisisalongerlexeme2'||s%810 || '
thisisalongerlexemethisisalongerlexeme3'||s%170 ||
' thisisalongerlexemethisisalongerlexeme'||s%350 || '
thisisalongerlexemethisisalongerlexeme2'||s%910 || '
thisisalongerlexemethisisalongerlexeme3'||s%180
as tsvector)
FROM generate_series(1,100000) s



Re: text search: tablescan cost for a tsvector

From
Robert Haas
Date:
On Mon, Feb 6, 2012 at 6:05 AM, Marc Mamin <M.Mamin@intershop.de> wrote:
> without analyze: http://explain.depesz.com/s/6At
> with analyze:    http://explain.depesz.com/s/r3B

I think this is the same issue complained about here:

http://archives.postgresql.org/message-id/4ED68EEC.9030906@krogh.cc

And here:

http://archives.postgresql.org/message-id/CANxtv6XiuiqEkXRJU2vk=xKAFXrLeP7uVhgR-XMCyjgQz29EFQ@mail.gmail.com

The problem seems to be that the cost estimator doesn't know that
detoasting is expensive.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: text search: tablescan cost for a tsvector

From
"Marc Mamin"
Date:

> Von: Robert Haas [mailto:robertmhaas@gmail.com]
> Gesendet: Mi 2/29/2012 7:32


> On Mon, Feb 6, 2012 at 6:05 AM, Marc Mamin <M.Mamin@intershop.de> wrote:
> > without analyze: http://explain.depesz.com/s/6At
> > with analyze:    http://explain.depesz.com/s/r3B
...
> The problem seems to be that the cost estimator doesn't know that
> detoasting is expensive.

Hello,

Tom Lane has started a follow up thread in the hacker list.
Detoasting is indeed the main obstacle, but I've repeated my test using plain storage
and the planer still choose (systematically?) the slowest query.
It seems that I bumped into 2 different issues at the same time.

http://archives.postgresql.org/pgsql-hackers/2012-02/msg00896.php

Backround:
Our reporting system offers amongst others time histograms
combined with a FTS filtering on error occurences (imported from error logs),
It is hence not unusual that given search terms are found within a majority of the documents...

best regards,

Marc Mamin