Re: tsvector limitations - Mailing list pgsql-admin

From: Kevin Grittner
Subject: Re: tsvector limitations
Date:
Msg-id: 4DF7399B020000250003E5AB@gw.wicourts.gov
In response to: Re: tsvector limitations (Tim <elatllat@gmail.com>)
Responses: Re: tsvector limitations ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
           Re: tsvector limitations (Tim <elatllat@gmail.com>)
List: pgsql-admin
Tim <elatllat@gmail.com> wrote:

> I would be surprised if there is no general "how big is this
> object" method in PostgreSQL.

You could cast to text and use octet_length().
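For instance, against a hypothetical table "docs" with a tsvector
column "doc_tsv":

    -- Bytes in the tsvector's text representation.
    -- (pg_column_size(doc_tsv) would instead report the stored
    -- size of the value itself, possibly compressed.)
    SELECT octet_length(doc_tsv::text) AS tsv_bytes
      FROM docs
     ORDER BY tsv_bytes DESC
     LIMIT 10;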

> If it's "bad design" to store large text documents (pdf, docx,
> etc.) as BLOBs or on a filesystem and make them searchable with
> tsvectors, can you suggest a good design?

Well, I suggested that storing a series of novels as a single entry
seemed bad design to me.  Perhaps one entry per novel or even finer
granularity would make more sense in most applications, but there
could be exceptions.  Likewise, a list of distinct words is of
dubious value in most applications' text searches.  We extract text
from court documents and store a tsvector for each document; we
don't aggregate all court documents for a year and create a
tsvector for that -- that would not be useful for us.
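A rough sketch of that shape, with made-up table and column names
(the source pdf/docx files can stay in bytea columns or on the
filesystem; only the extracted text feeds the tsvector):

    CREATE TABLE court_doc (
        doc_id   bigint PRIMARY KEY,
        body     text,      -- text extracted from the document
        body_tsv tsvector   -- one tsvector per document
    );

    CREATE INDEX court_doc_tsv_idx
        ON court_doc USING gin (body_tsv);

    UPDATE court_doc
       SET body_tsv = to_tsvector('english', body);

    -- Matches come back per document:
    SELECT doc_id
      FROM court_doc
     WHERE body_tsv @@ to_tsquery('english', 'negligence & damages');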

> If making your own search implementation is "better" what is the
> point of tsvectors?

I remember you asking about doing that, but I don't think anyone
else has advocated it.

> Maybe I'm missing something here?

If you ask for real-world numbers you'll probably get farther than
demanding that people volunteer their time to perform tests that
you define but don't seem willing to run yourself.  And if you
describe your use case in more detail, with questions about
alternative approaches, you're likely to get useful advice.

-Kevin
