Re: Full text search on partial URLs - Mailing list pgsql-general

From Zev Benjamin
Subject Re: Full text search on partial URLs
Date
Msg-id 527A937C.9000703@strangersgate.com
Whole thread Raw
In response to Re: Full text search on partial URLs  (bricklen <bricklen@gmail.com>)
List pgsql-general

On 11/06/2013 02:04 PM, bricklen wrote:
>
> On Wed, Nov 6, 2013 at 10:53 AM, Zev Benjamin
> <zev-pgsql@strangersgate.com <mailto:zev-pgsql@strangersgate.com>> wrote:
>
>     On 11/06/2013 01:47 PM, bricklen wrote:
>
>
>         On Wed, Nov 6, 2013 at 10:45 AM, Zev Benjamin
>         <zev-pgsql@strangersgate.com
>         <mailto:zev-pgsql@strangersgate.com>
>         <mailto:zev-pgsql@__strangersgate.com
>         <mailto:zev-pgsql@strangersgate.com>>> wrote:
>
>              Hi,
>
>              I have Postgres full text search set up for my application
>         and it's
>              been working great!  However, my users would like their
>         searches to
>              turn up parts of URLs.  For example, they would like a
>         search for
>              "foobar" to turn up a document that contains the string
>              "http://example.com/foobar/____blah
>         <http://example.com/foobar/__blah>
>         <http://example.com/foobar/__blah <http://example.com/foobar/blah>>"
>              (and similarly for queries like "example" and "blah).  With the
>              default dictionaries for host, url, and url_path, the
>         search query
>              would have to contain the complete host or url path.
>
>              What is the best way to accomplish this?  Should I be
>         looking at
>              building a custom dictionary that breaks down hosts and
>         urls or is
>              there something simpler I can do?
>
>
>         Have you looked into trigrams?
>         http://www.postgresql.org/__docs/current/static/pgtrgm.__html
>         <http://www.postgresql.org/docs/current/static/pgtrgm.html>
>
>
>     I've looked at it in the context of adding fuzzy search.  But my
>     understanding is that doing a fuzzy search here would only work if
>     the query were a significant fraction of, say, the url path.  For
>     example, I would expect a fuzzy search of "foobar" on "/foobar/x" to
>     return a high similarity, but a fuzzy search of "foobar" on
>     "/foobar/some/very/long/path/__x" to have a low similarity.
>
>     Or are you suggesting using trigrams in a different way?
>
>
> Yeah, I was thinking more along the lines of allowing wildcard
> searching, not similarity.
>
> Eg.
> CREATE INDEX yourtable_yourcol_gist_fbi ON yourtable using GIST (
> yourcol gist_trgm_ops );
> select * from yourtable where yourcol ~~ '%foobar%';
>

Hrm.  That might work.  So the application-level search functionality
would be the union of tsearch and trigram wildcard matching.

If anyone else has other ideas, I'd be interested in hearing them as well.


Thanks,
Zev


pgsql-general by date:

Previous
From: bricklen
Date:
Subject: Re: Full text search on partial URLs
Next
From: Jeff Ross
Date:
Subject: Re: After upgrade to 9.3, streaming replication fails to start