Re: Dumb question involving to_tsvector and a view - Mailing list pgsql-general

From Kevin Grittner
Subject Re: Dumb question involving to_tsvector and a view
Date
Msg-id 1361725014.4245.YahooMailNeo@web162905.mail.bf1.yahoo.com
Whole thread Raw
In response to Re: Dumb question involving to_tsvector and a view  (Jasen Betts <jasen@xnet.co.nz>)
List pgsql-general
Jasen Betts <jasen@xnet.co.nz> wrote:=0A>On 2013-02-23, Raymond C. Rodgers =
<sinful622@gmail.com> wrote:=0A=0A>> On 02/23/2013 05:26 AM, Tom Lane wrote=
:=0A>>> A "virtual" tsvector like that is probably going to be useless for=
=0A>>> searching as soon as you get a meaningful amount of data, because th=
e=0A>>> only way the DB can implement a search is to compute the tsvector=
=0A>>> value for each table row and then examine it for the target word(s).=
=0A>>> What you want is a GIST or GIN index on the contents of the tsvector=
.=0A>=0A>> I think the only real advantage to using something like this wou=
ld be a=0A>> space savings in terms of storing the tsvector data, but I don=
't see=0A>> that being a significant enough reason to go ahead and use this=
 idea in=0A>> a production situation. As mentioned [by pretty much all of u=
s], once=0A>> the table size is sufficiently large there would be a perform=
ance=0A>> penalty by to_tsvector being executed on every record in the tabl=
e.=0A>=0A>Unless the plan comes out as a table scan the index will be used=
=0A>instead ot to_tsvector()=0A>=0A>When there is a table scan to_tsvector =
will be used instead of reading=0A>from disk, I don't know how fast to_tsve=
ctor is compared to disk, but=0A>usually computing a result is faster than =
reading it from disk.=0A>=0A>Storing the tsvector in the table is likely to=
 be faster only when a=0A>tablescan is done and the table is fully cached i=
n ram.=0A=0AI guess I was being dumb in assuming that it was obvious that a=
 GIN=0Aor GiST index would be needed for decent performance at scale. =0AWi=
thout that, a scan of the whole table (or at least all rows=0Amatching othe=
r search criteria) is needed, which is going to hurt. =0AThe benchmarks I m=
entioned were for a GIN index on the results of=0Athe function which genera=
ted the tsvector, versus a GIN index on=0Athe stored tsvector.=A0 In our ca=
se, a typical scan for document text=0Aagainst years of accumulated court d=
ocuments was about 300 ms=0Aversus about 1.5 seconds.=A0 It may matter that=
 we weren't just=0Alooking for matches, but the top K matches based on the =
ranking=0Afunction.=0A=0A-- =0AKevin Grittner=0AEnterpriseDB: http://www.en=
terprisedb.com=0AThe Enterprise PostgreSQL Company

pgsql-general by date:

Previous
From: Alban Hertroys
Date:
Subject: Re: autoanalyze criteria
Next
From: Ray Stell
Date:
Subject: Re: [JDBC] can't access through SSL