Thread: full text search index

full text search index

From
Patrick Baker
Date:
Hi there,

I've got the following query:

SELECT COUNT(DISTINCT j0_.id) AS sclr10
FROM customers j0_
WHERE ((LOWER(j0_.name_first) LIKE '%some%'
        OR LOWER(j0_.name_last) LIKE '%some%')       
       AND j0_.id = 5)
  AND j0_.id = 5

The query is taking ages to run.

I read about wildcards and it seems I have to use a function with to_tsvector ? 

CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid", ("full_text_universal_cast"("name_first"::"text")), ("full_text_universal_cast"("name_last"::"text")));

full_text_universal_cast:
CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data "text")
  RETURNS "tsvector" AS
$BODY$
SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), ''));
$BODY$
  LANGUAGE sql IMMUTABLE
  COST 1000;



Would be something like above? Because it's not working...

What am I missing guys?
Thanks 

full text search index

From
"David G. Johnston"
Date:
On Thursday, May 26, 2016, Patrick Baker <patrickbakerbr@gmail.com> wrote:
Hi there,

I've got the following query:

SELECT COUNT(DISTINCT j0_.id) AS sclr10
FROM customers j0_
WHERE ((LOWER(j0_.name_first) LIKE '%some%'
        OR LOWER(j0_.name_last) LIKE '%some%')       
       AND j0_.id = 5)
  AND j0_.id = 5

The query is taking ages to run.

I read about wildcards and it seems I have to use a function with to_tsvector ? 

CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid", ("full_text_universal_cast"("name_first"::"text")), ("full_text_universal_cast"("name_last"::"text")));

full_text_universal_cast:
CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data "text")
  RETURNS "tsvector" AS
$BODY$
SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), ''));
$BODY$
  LANGUAGE sql IMMUTABLE
  COST 1000;



Would be something like above? Because it's not working...

What am I missing guys?
Thanks 

Maybe Lucas Possamai can help.  He seems to be a little bit further along in this exercise.

I'm too tired to care at the moment.  And I haven't had much practical work here anyway.

David J.
 


Re: full text search index

From
Patrick Baker
Date:


Maybe Lucas Possamai can help.  He seems to be a little bit further along in this exercise.

I'm too tired to care at the moment.  And I haven't had much practical work here anyway.

David J.
 




I subscribed to the list today, so don't have the old emails.... 

I had a look on the archives tough, Lucas do you have any progress? If so, please let me know :)


Thanks
Patrick

Re: full text search index

From
Alban Hertroys
Date:
On 26 May 2016 at 06:04, Patrick Baker <patrickbakerbr@gmail.com> wrote:
> Hi there,
>
> I've got the following query:
>>
>>
>> SELECT COUNT(DISTINCT j0_.id) AS sclr10
>> FROM customers j0_
>> WHERE ((LOWER(j0_.name_first) LIKE '%some%'
>>         OR LOWER(j0_.name_last) LIKE '%some%')
>>        AND j0_.id = 5)
>>   AND j0_.id = 5
>
>
> The query is taking ages to run.

Your guess is as good as ours without knowing what query plan the
database decided on. Post the output of explain analyze.

> I read about wildcards and it seems I have to use a function with
> to_tsvector ?

In general, you have to use an expression of which the query planner
can see that it's equivalent to the expression used in the index.
Otherwise the query planner has no way of knowing whether the index is
suitable for the query and it won't use the index.

>> CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid",
>> ("full_text_universal_cast"("name_first"::"text")),
>> ("full_text_universal_cast"("name_last"::"text")));

In your case, you should query on full_text_universal_cast(your_field)
instead of on like '%some%'.

Alternatively, if your query always uses the sanme wildcard expression
you could create indexes on your_field like '%some%'.

> full_text_universal_cast:
>>
>> CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data
>> "text")
>>   RETURNS "tsvector" AS
>> $BODY$
>> SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), ''));
>> $BODY$
>>   LANGUAGE sql IMMUTABLE
>>   COST 1000;

The query planner has no way of knowing what this function does
internally, so it certainly won't match the function results in the
index up with your like expression.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: full text search index

From
Kevin Grittner
Date:
You don't provide much context, like PostgreSQL version or machine
characteristics.

https://wiki.postgresql.org/wiki/SlowQueryQuestions

On Wed, May 25, 2016 at 11:04 PM, Patrick Baker
<patrickbakerbr@gmail.com> wrote:

>> SELECT COUNT(DISTINCT j0_.id) AS sclr10
>> FROM customers j0_
>> WHERE ((LOWER(j0_.name_first) LIKE '%some%'
>>         OR LOWER(j0_.name_last) LIKE '%some%')
>>        AND j0_.id = 5)
>>   AND j0_.id = 5
>
> The query is taking ages to run.
>
> I read about wildcards and it seems I have to use a function with
> to_tsvector ?

I very much doubt that full text search is going to be helpful here
-- perhaps trigrams with an appropriate gist or gin index could
help.  Depending on table sizes and data present, picking out rows
based on the OR of scanning for a sequence of characters in a
couple character string columns might not be your fastest query to
run.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company