Thread: full text search index
Hi there,
I've got the following query:
SELECT COUNT(DISTINCT j0_.id) AS sclr10
FROM customers j0_
WHERE ((LOWER(j0_.name_first) LIKE '%some%'
OR LOWER(j0_.name_last) LIKE '%some%')
AND j0_.id = 5)
AND j0_.id = 5
The query is taking ages to run.
I read about wildcards and it seems I have to use a function with to_tsvector ?
CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid", ("full_text_universal_cast"("name_first"::"text")), ("full_text_universal_cast"("name_last"::"text")));
full_text_universal_cast:
CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data "text")
RETURNS "tsvector" AS
$BODY$
SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), ''));
$BODY$
LANGUAGE sql IMMUTABLE
COST 1000;
Would be something like above? Because it's not working...
What am I missing guys?
Thanks
On Thursday, May 26, 2016, Patrick Baker <patrickbakerbr@gmail.com> wrote:
Hi there,I've got the following query:
SELECT COUNT(DISTINCT j0_.id) AS sclr10
FROM customers j0_
WHERE ((LOWER(j0_.name_first) LIKE '%some%'
OR LOWER(j0_.name_last) LIKE '%some%')
AND j0_.id = 5)
AND j0_.id = 5The query is taking ages to run.I read about wildcards and it seems I have to use a function with to_tsvector ?
CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid", ("full_text_universal_cast"("name_first"::"text")), ("full_text_universal_cast"("name_last"::"text")));full_text_universal_cast:CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data "text")
RETURNS "tsvector" AS
$BODY$
SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), ''));
$BODY$
LANGUAGE sql IMMUTABLE
COST 1000;Would be something like above? Because it's not working...What am I missing guys?Thanks
Maybe Lucas Possamai can help. He seems to be a little bit further along in this exercise.
I'm too tired to care at the moment. And I haven't had much practical work here anyway.
David J.
Maybe Lucas Possamai can help. He seems to be a little bit further along in this exercise.I'm too tired to care at the moment. And I haven't had much practical work here anyway.David J.
I subscribed to the list today, so don't have the old emails....
I had a look on the archives tough, Lucas do you have any progress? If so, please let me know :)
Thanks
Patrick
On 26 May 2016 at 06:04, Patrick Baker <patrickbakerbr@gmail.com> wrote: > Hi there, > > I've got the following query: >> >> >> SELECT COUNT(DISTINCT j0_.id) AS sclr10 >> FROM customers j0_ >> WHERE ((LOWER(j0_.name_first) LIKE '%some%' >> OR LOWER(j0_.name_last) LIKE '%some%') >> AND j0_.id = 5) >> AND j0_.id = 5 > > > The query is taking ages to run. Your guess is as good as ours without knowing what query plan the database decided on. Post the output of explain analyze. > I read about wildcards and it seems I have to use a function with > to_tsvector ? In general, you have to use an expression of which the query planner can see that it's equivalent to the expression used in the index. Otherwise the query planner has no way of knowing whether the index is suitable for the query and it won't use the index. >> CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid", >> ("full_text_universal_cast"("name_first"::"text")), >> ("full_text_universal_cast"("name_last"::"text"))); In your case, you should query on full_text_universal_cast(your_field) instead of on like '%some%'. Alternatively, if your query always uses the sanme wildcard expression you could create indexes on your_field like '%some%'. > full_text_universal_cast: >> >> CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data >> "text") >> RETURNS "tsvector" AS >> $BODY$ >> SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), '')); >> $BODY$ >> LANGUAGE sql IMMUTABLE >> COST 1000; The query planner has no way of knowing what this function does internally, so it certainly won't match the function results in the index up with your like expression. Regards, Alban Hertroys -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
You don't provide much context, like PostgreSQL version or machine characteristics. https://wiki.postgresql.org/wiki/SlowQueryQuestions On Wed, May 25, 2016 at 11:04 PM, Patrick Baker <patrickbakerbr@gmail.com> wrote: >> SELECT COUNT(DISTINCT j0_.id) AS sclr10 >> FROM customers j0_ >> WHERE ((LOWER(j0_.name_first) LIKE '%some%' >> OR LOWER(j0_.name_last) LIKE '%some%') >> AND j0_.id = 5) >> AND j0_.id = 5 > > The query is taking ages to run. > > I read about wildcards and it seems I have to use a function with > to_tsvector ? I very much doubt that full text search is going to be helpful here -- perhaps trigrams with an appropriate gist or gin index could help. Depending on table sizes and data present, picking out rows based on the OR of scanning for a sequence of characters in a couple character string columns might not be your fastest query to run. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company