Thread: Query taking long with levenshtein function

Query taking long with levenshtein function

From
Teju Jakkidi vlogs
Date:
Hello Admins,

We have a query as below which uses the levenshtein function to compare strings.

SELECT "name", levenshtein("name",'some string') as p FROM table1 where levenshtein("name",'some string')   <= 2 order by p desc;

We have a GIST index built on top of this table as below:
CREATE INDEX gist_idx ON table1 USING GIST("name");

The table has 70million rows.

The above query when executed in postgres, takes around 3 minutes to return the result whereas a similar query in BQ takes only 10 seconds.
We made sure that data is not being spilled to disks as the query has order by clause. 
work_mem looks good (4 MB) as no spilling to temp is observed.
Table is analyzed and vacuumed.
shared_buffer size is 5GB. (Instance has total of 15 GB mem)

We are not sure what else needs to be checked for improving the query performance. Please advise if levenshtein needs any sort of other indexes.

Thanks,
Teja. J.

Re: Query taking long with levenshtein function

From
Tom Lane
Date:
Teju Jakkidi vlogs <teja.jakkidi05@gmail.com> writes:
> We have a query as below which uses the levenshtein function to compare
> strings.

> SELECT "name", levenshtein("name",'some string') as p FROM table1
> where levenshtein("name",'some string')   <= 2 order by p desc;

> We have a GIST index built on top of this table as below:
> CREATE INDEX gist_idx ON table1 USING GIST("name");

AFAIK, that index is completely useless for this query.  I don't
really see a good way to index it either --- "levenshtein distance
less than X" seems like a not very tractable requirement.  Can
you formulate your matching rules some other way?

            regards, tom lane