Thread: sequential scan performance

sequential scan performance

From
Michael Engelhart
Date:
Hi -

I have a table of about 3 million rows of city "aliases" that I need
to query using LIKE - for example:

select * from city_alias where city_name like '%FRANCISCO'


When I do an EXPLAIN ANALYZE on the above query, the result is:

  Seq Scan on city_alias  (cost=0.00..59282.31 rows=2 width=42)
(actual time=73.369..3330.281 rows=407 loops=1)
    Filter: ((name)::text ~~ '%FRANCISCO'::text)
Total runtime: 3330.524 ms
(3 rows)


this is a query that our system needs to do a LOT.   Is there any way
to improve the performance on this either with changes to our query
or by configuring the database deployment?   We have an index on
city_name but when using the % operator on the front of the query
string postgresql can't use the index .

Thanks for any help.

Mike

Re: sequential scan performance

From
"Steinar H. Gunderson"
Date:
On Sun, May 29, 2005 at 08:27:26AM -0500, Michael Engelhart wrote:
> this is a query that our system needs to do a LOT.   Is there any way
> to improve the performance on this either with changes to our query
> or by configuring the database deployment?   We have an index on
> city_name but when using the % operator on the front of the query
> string postgresql can't use the index .

Try tsearch2 from contrib, it might help you.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: sequential scan performance

From
Christopher Kings-Lynne
Date:
> When I do an EXPLAIN ANALYZE on the above query, the result is:
>
>  Seq Scan on city_alias  (cost=0.00..59282.31 rows=2 width=42)  (actual
> time=73.369..3330.281 rows=407 loops=1)
>    Filter: ((name)::text ~~ '%FRANCISCO'::text)
> Total runtime: 3330.524 ms
> (3 rows)
>
>
> this is a query that our system needs to do a LOT.   Is there any way
> to improve the performance on this either with changes to our query  or
> by configuring the database deployment?   We have an index on  city_name
> but when using the % operator on the front of the query  string
> postgresql can't use the index .

Of course not.  There really is now way to make your literal query above
fast.  You could try making a functional index on the reverse() of the
string and querying for the reverse() of 'francisco'.

Or, if you want a general full text index, you should absolutely be
using contrib/tsearch2.

Chris

Re: sequential scan performance

From
Oleg Bartunov
Date:
Michael,

I'd recommend our contrib/pg_trgm module, which provides
trigram based fuzzy search and return results ordered by similarity
to your query.  Read http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm
for more details.

Oleg
On Sun, 29 May 2005, Michael Engelhart wrote:

> Hi -
>
> I have a table of about 3 million rows of city "aliases" that I need to query
> using LIKE - for example:
>
> select * from city_alias where city_name like '%FRANCISCO'
>
>
> When I do an EXPLAIN ANALYZE on the above query, the result is:
>
> Seq Scan on city_alias  (cost=0.00..59282.31 rows=2 width=42) (actual
> time=73.369..3330.281 rows=407 loops=1)
>   Filter: ((name)::text ~~ '%FRANCISCO'::text)
> Total runtime: 3330.524 ms
> (3 rows)
>
>
> this is a query that our system needs to do a LOT.   Is there any way to
> improve the performance on this either with changes to our query or by
> configuring the database deployment?   We have an index on city_name but when
> using the % operator on the front of the query string postgresql can't use
> the index .
>
> Thanks for any help.
>
> Mike
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>   (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: sequential scan performance

From
Michael Engelhart
Date:
Thanks everyone for all the  suggestions.  I'll check into those
contrib modules.

Michael
On May 29, 2005, at 2:44 PM, Oleg Bartunov wrote:

> Michael,
>
> I'd recommend our contrib/pg_trgm module, which provides
> trigram based fuzzy search and return results ordered by similarity
> to your query.  Read http://www.sai.msu.su/~megera/postgres/gist/
> pg_trgm/README.pg_trgm
> for more details.
>
> Oleg
> On Sun, 29 May 2005, Michael Engelhart wrote:
>
>
>> Hi -
>>
>> I have a table of about 3 million rows of city "aliases" that I
>> need to query using LIKE - for example:
>>
>> select * from city_alias where city_name like '%FRANCISCO'
>>
>>
>> When I do an EXPLAIN ANALYZE on the above query, the result is:
>>
>> Seq Scan on city_alias  (cost=0.00..59282.31 rows=2 width=42)
>> (actual time=73.369..3330.281 rows=407 loops=1)
>>   Filter: ((name)::text ~~ '%FRANCISCO'::text)
>> Total runtime: 3330.524 ms
>> (3 rows)
>>
>>
>> this is a query that our system needs to do a LOT.   Is there any
>> way to improve the performance on this either with changes to our
>> query or by configuring the database deployment?   We have an
>> index on city_name but when using the % operator on the front of
>> the query string postgresql can't use the index .
>>
>> Thanks for any help.
>>
>> Mike
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 2: you can get off all lists at once with the unregister command
>>   (send "unregister YourEmailAddressHere" to
>> majordomo@postgresql.org)
>>
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>      subscribe-nomail command to majordomo@postgresql.org so that your
>      message can get through to the mailing list cleanly
>


Re: sequential scan performance

From
Steve Atkins
Date:
On Sun, May 29, 2005 at 08:27:26AM -0500, Michael Engelhart wrote:
> Hi -
>
> I have a table of about 3 million rows of city "aliases" that I need
> to query using LIKE - for example:
>
> select * from city_alias where city_name like '%FRANCISCO'
>
>
> When I do an EXPLAIN ANALYZE on the above query, the result is:
>
>  Seq Scan on city_alias  (cost=0.00..59282.31 rows=2 width=42)
> (actual time=73.369..3330.281 rows=407 loops=1)
>    Filter: ((name)::text ~~ '%FRANCISCO'::text)
> Total runtime: 3330.524 ms
> (3 rows)
>
>
> this is a query that our system needs to do a LOT.   Is there any way
> to improve the performance on this either with changes to our query
> or by configuring the database deployment?   We have an index on
> city_name but when using the % operator on the front of the query
> string postgresql can't use the index .

If that's really what you're doing (the wildcard is always at the beginning)
then something like this

  create index city_name_idx on foo (reverse(city_name));

  select * from city_alias where reverse(city_name) like reverse('%FRANCISCO');

should do just what you need.

I use this, with a plpgsql implementation of reverse, and it works nicely.

CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS '
DECLARE
       original alias for $1;
       reverse_str text;
       i int4;
BEGIN
 reverse_str = '''';
 FOR i IN REVERSE LENGTH(original)..1 LOOP
  reverse_str = reverse_str || substr(original,i,1);
 END LOOP;
 return reverse_str;
END;'
LANGUAGE 'plpgsql' IMMUTABLE;


Someone will no doubt suggest using tsearch2, and you might want to
take a look at it if you actually need full-text search, but my
experience has been that it's too slow to be useful in production, and
it's not needed for the simple "leading wildcard" case.

Cheers,
  Steve