Thread: Full text search on a complex schema - a classic problem?

Full text search on a complex schema - a classic problem?

From
Ivan Voras
Date:
Hello,

I have a schema which tracks various pieces of information which would
need to be globally searchable. One approach I came up with to make all
of the data searchable is to create a view made of UNION ALL queries
that would integrate different tables into a common structure which
could be uniformly queried by using tsearch2 functions. This would work,
up to the point where it would be practically unavoidable (for
performance reasons) to create indexes on this view, which cannot be
done. I would like to avoid using a "hand-made" materialized view (via
triggers, etc.) because of administrative overhead and because it would
duplicate data, of which there is potentially a lot.

I think this looks like a fairly common problem with full text searches
on a large-ish schemas, so I'm wondering what are the best practices
here, specifically with using tsearch2?

Re: Full text search on a complex schema - a classic problem?

From
Craig Ringer
Date:
On 23/05/10 10:40, Ivan Voras wrote:
> Hello,
>
> I have a schema which tracks various pieces of information which would
> need to be globally searchable.

If systems that exist outside the database its self are acceptable,
check out Apache Lucerne, and tools that use it like Hibernate Search.

--
Craig Ringer

Re: Full text search on a complex schema - a classic problem?

From
Andy Colson
Date:
On 05/22/2010 09:40 PM, Ivan Voras wrote:
> Hello,
>
> I have a schema which tracks various pieces of information which would
> need to be globally searchable. One approach I came up with to make all
> of the data searchable is to create a view made of UNION ALL queries
> that would integrate different tables into a common structure which
> could be uniformly queried by using tsearch2 functions. This would work,
> up to the point where it would be practically unavoidable (for
> performance reasons) to create indexes on this view, which cannot be
> done. I would like to avoid using a "hand-made" materialized view (via
> triggers, etc.) because of administrative overhead and because it would
> duplicate data, of which there is potentially a lot.
>
> I think this looks like a fairly common problem with full text searches
> on a large-ish schemas, so I'm wondering what are the best practices
> here, specifically with using tsearch2?
>
>

I have something like this, but with PostGIS layers.  When a person clicks I search all the different layers (each a
table)for information.  I use a stored proc.  Each table has its own index so each table is fast.  It also lets me
abstractout differences between the layers (I can search each a little differently). 

If each of your tables had its own full text fields and indexes, then write a stored proc to search them all
individually,it should be pretty quick. 

-Andy

Re: Full text search on a complex schema - a classic problem?

From
Ivan Voras
Date:
On 05/23/10 18:03, Andy Colson wrote:
> On 05/22/2010 09:40 PM, Ivan Voras wrote:
>> Hello,
>>
>> I have a schema which tracks various pieces of information which would
>> need to be globally searchable. One approach I came up with to make all
>> of the data searchable is to create a view made of UNION ALL queries
>> that would integrate different tables into a common structure which
>> could be uniformly queried by using tsearch2 functions. This would work,
>> up to the point where it would be practically unavoidable (for
>> performance reasons) to create indexes on this view, which cannot be
>> done. I would like to avoid using a "hand-made" materialized view (via
>> triggers, etc.) because of administrative overhead and because it would
>> duplicate data, of which there is potentially a lot.
>>
>> I think this looks like a fairly common problem with full text searches
>> on a large-ish schemas, so I'm wondering what are the best practices
>> here, specifically with using tsearch2?
>>
>
> I have something like this, but with PostGIS layers.  When a person
> clicks I search all the different layers (each a table) for
> information.  I use a stored proc.  Each table has its own index so each
> table is fast.  It also lets me abstract out differences between the
> layers (I can search each a little differently).
>
> If each of your tables had its own full text fields and indexes, then
> write a stored proc to search them all individually, it should be pretty
> quick.

This looks like an interesting solution. And it could be done
generically in our case by having a separate table describing which
tables need to be searched and by what fields.

Re: Full text search on a complex schema - a classic problem?

From
Ivan Voras
Date:
On 05/23/10 07:17, Craig Ringer wrote:
> On 23/05/10 10:40, Ivan Voras wrote:
>> Hello,
>>
>> I have a schema which tracks various pieces of information which would
>> need to be globally searchable.
>
> If systems that exist outside the database its self are acceptable,
> check out Apache Lucerne, and tools that use it like Hibernate Search.

We are currently using such an external system and while it works fast
enough it has two problems:

1) the need to periodically refresh it (via cron)
2) it complicates deployment a bit by adding dependencies

so we're moving away from it.