Re: Text searching HTML - Mailing list pgsql-sql

From Tom Lane
Subject Re: Text searching HTML
Date
Msg-id 25577.1415045341@sss.pgh.pa.us
In response to Text searching HTML  ("Campbell, Lance" <lance@illinois.edu>)
List pgsql-sql
"Campbell, Lance" <lance@illinois.edu> writes:
> Is there a preferred way to search text within an HTML document?  I have been reading up on searching via
> to_tsvector. You can pass to_tsvector two parameters.  The first appears to be a dictionary and the second text.
> Is there by chance an English HTML dictionary?  That way HTML tags or HTML attributes would be ignored.
 

I believe all the built-in text search configurations ignore HTML tags by
default, since they have no mapping for the "tag" token type, which is how
the built-in parser reports them.  You could of course make a custom
configuration that acts differently.
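
A minimal sketch of how one might check this, assuming a stock PostgreSQL
installation with the built-in "english" configuration.  The sample HTML
string and the configuration name html_aware are made up for illustration
and are not from the original message.  Note that the first argument to
to_tsvector names a text search configuration rather than a dictionary.

```sql
-- Show how the default parser classifies each token.  Markup such as
-- <p class="intro"> should appear with alias "tag" and no dictionaries.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english', '<p class="intro">Searching <b>HTML</b> documents</p>');

-- Build a tsvector with the built-in configuration.  Token types that
-- have no mapping (including "tag") are dropped from the result.
SELECT to_tsvector('english', '<p class="intro">Searching <b>HTML</b> documents</p>');

-- Hypothetical custom configuration that behaves differently: copy
-- "english" and map the "tag" token type to the "simple" dictionary,
-- so that tags are indexed instead of ignored.
CREATE TEXT SEARCH CONFIGURATION html_aware (COPY = pg_catalog.english);
ALTER TEXT SEARCH CONFIGURATION html_aware
    ADD MAPPING FOR tag WITH simple;

SELECT to_tsvector('html_aware', '<p class="intro">Searching <b>HTML</b> documents</p>');
```

Comparing the two to_tsvector results should show whether tags are being
skipped in your installation.  Since attributes are part of the tag token,
they are skipped or indexed along with it.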
        regards, tom lane


