I wonder if PLPerl could be used to extract the words from a PDF document and create a tsvector column from it.
I don't know about PLPerl (though I'm pretty sure it could be used for this purpose). On the other hand, I've written code for this in Python that should be easy to adapt for PLPython, if necessary.
I'd swear someone has already built something to do this. All you need is a library that reads PDF files and converts them to text, and then you can index that text with full-text search (FTS). I know there's a module for OpenOffice docs somewhere as well, but heck if I can remember where.
I used pdftotext for that.
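A minimal sketch of the pdftotext approach, run outside the database rather than in PLPerl or PLPython. It assumes the `pdftotext` binary (e.g. from poppler-utils) is on the PATH, and it uses its `-enc UTF-8` option and `-` as the output file so the text goes to stdout. The table `documents(id, body_tsv tsvector)` and the helper names are hypothetical, for illustration only.

```python
import subprocess
from typing import List, Tuple

# Hypothetical schema: documents(id integer, body_tsv tsvector).
# %s placeholders follow the DB-API "format" paramstyle (e.g. psycopg2).
UPDATE_SQL = (
    "UPDATE documents SET body_tsv = to_tsvector('english', %s) "
    "WHERE id = %s"
)


def pdftotext_command(pdf_path: str) -> List[str]:
    # "-" as the output file makes pdftotext write the text to stdout.
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def extract_pdf_text(pdf_path: str) -> str:
    # Runs the external pdftotext tool; raises CalledProcessError on failure.
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def build_update(doc_id: int, text: str) -> Tuple[str, Tuple[str, int]]:
    # Let PostgreSQL's to_tsvector() do the tokenizing and stemming;
    # the client only ships the plain text as a bound parameter.
    return UPDATE_SQL, (text, doc_id)
```

With a DB-API driver such as psycopg2 this would be used roughly as `cur.execute(*build_update(42, extract_pdf_text("report.pdf")))`. Leaving the word extraction to `to_tsvector()` keeps the text-search configuration in one place on the server.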
I think it'd be useful to have one or more extensions that can convert anything to text. I remember someone indexing chemical formulae, TeX/LaTeX, and DOC files.
-- 
Josh Berkus
Red Hat OSAS
(any opinions are my own)