Thread: making 'like' queries quicker
Is there a way to make queries using the 'like' operator quicker, more
specifically for queries that look like:

    select name from table where name like '%abc%';

Thanks,
Marc
On 1999-12-17, admin mentioned:
> Is there a way to make queries using the 'like' operator quicker, more
> specifically for queries that look like:
> select name from table where name like '%abc%';

These kinds of queries (where the search string starts with a wildcard)
never use indexes, so you're stuck with the sequential scan.  A faster
computer is probably your best option. :(

--
Peter Eisentraut            Sernanders väg 10:115
peter_e@gmx.net             75262 Uppsala
http://yi.org/peter-e/      Sweden
What about queries which only end with a wildcard?  Is there any way to
accelerate such a query?

> > Is there a way to make queries using the 'like' operator quicker, more
> > specifically for queries that look like:
> > select name from table where name like '%abc%';
>
> These kinds of queries (where the search string starts with a wildcard)
> never use indexes, so you're stuck with the sequential scan.  A faster
> computer is probably your best option. :(
A general rule of thumb is that indexes only work on exact matches.

The reason why indexes are faster is that they store a pointer to each of
the actual records in a table in a separate file.  Each pointer is
associated with a number which corresponds to the record's "hashed value."
E.g. if you have a record which contains a field whose value is "dog ate
my sandal," then, when the record is indexed, the hashing algorithm
calculates a number based on the letters in the value, which is the
hashed value (imagine adding the letters together to arrive at a sum) *.
That hashed value is then stored in a hash or btree (in Postgres) index.

When you search for a word, if an index search is used, the same hashing
algorithm is used to arrive at a hash value for the search term.  Then
all records with the same hash value are checked.  If you are looking for
"dog ate my sandal," then the hash value for your search term and for the
record in the database is the same, so that record is returned.  Other
values, such as "oh oh, I ate my dog," can have the same hash value,
depending on the algorithm used.  If multiple records are found with the
index search, each one is checked in turn for an exact match.

The moral of the story is that the hashed value only matches for records
with identical data (plus the occasional collision), while other records
in the database are not even looked at.  So using an index when you don't
know an exact value to search for won't work.

Because Postgres allows function return values to be used as the basis
for an index, it is possible to find "non-exact matches."  E.g. using the
lowercased value of a column to calculate the index: "lower(fieldname)"
is a valid "column name" to use when building an index in Postgres.
Similarly you could use a function which only returns the first three
letters of the value, if those are the only letters you care about.  You
would have to write a C function to do that, and then use it when
creating the index.
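To make the exact-match limitation concrete, here is a toy sketch in
Python.  It is not Postgres's actual hash index implementation, and the
deliberately collision-prone "sum of letters" hash is just the example
from above; the point is only that the whole value is hashed, candidates
come from one bucket, and each candidate is rechecked:

```python
# Toy illustration of why a hash index only helps for exact matches.
# Not Postgres internals; just the idea: hash the whole value, bucket
# record pointers by that hash, then recheck every candidate.

def toy_hash(value: str) -> int:
    # Deliberately collision-prone "sum of letters" hash, as in the
    # example above; real hash functions spread values more evenly.
    return sum(ord(c) for c in value) % 8

def build_index(records: list[str]) -> dict[int, list[int]]:
    index: dict[int, list[int]] = {}
    for rowid, value in enumerate(records):
        index.setdefault(toy_hash(value), []).append(rowid)
    return index

def exact_lookup(records: list[str], index: dict[int, list[int]],
                 term: str) -> list[int]:
    # Only the bucket for toy_hash(term) is examined; each candidate is
    # rechecked because different strings can share a hash value.
    return [rid for rid in index.get(toy_hash(term), [])
            if records[rid] == term]

records = ["dog ate my sandal", "oh oh, I ate my dog", "cat"]
index = build_index(records)
print(exact_lookup(records, index, "dog ate my sandal"))   # [0]
```

There is no way to feed '%abc%' into such a scheme: without the full
value you cannot compute the hash, so no bucket can be chosen and every
row would have to be scanned.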
You cannot use SQL functions for creating an index in Postgres.  Built-in
functions, on the other hand (e.g. substring()), could be used, but I
believe the parser can't handle them if they have more than one argument.
I could be wrong on this; if I am, I would love to be enlightened on it
by somebody.

Just to elaborate on this a bit, it is possible to have other schemes for
calculating the hash value, with more complexity, such as finding all the
words in a value and then adding an entry into the index for each of
them.  This type of approach would take more space, and would be slower
to maintain, but it would allow searches for individual words within
strings.  Postgres doesn't have an indexing scheme like this built in;
this was just an example of what you could do with an index.

I don't know all the internals of how indexes are implemented in
Postgres, so if I am off, please correct me.

Troy

* A sum is not used.  The algorithm calculates a value which has the best
possible distribution of values within a certain range, so the index has
an even distribution of values in all areas.
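The word-level scheme described above is essentially an inverted index.
A minimal sketch, purely illustrative and not how any Postgres index is
implemented:

```python
# Sketch of the word-level scheme described above: index every word of
# every value, so whole-word searches become exact lookups in the index.

def build_word_index(records: list[str]) -> dict[str, set[int]]:
    index: dict[str, set[int]] = {}
    for rowid, value in enumerate(records):
        for word in value.lower().split():
            index.setdefault(word, set()).add(rowid)
    return index

records = ["dog ate my sandal", "oh oh, I ate my dog", "cat naps"]
index = build_word_index(records)
print(sorted(index["dog"]))   # rows containing the word "dog": [0, 1]
print(sorted(index["cat"]))   # [2]
```

A real implementation would also strip punctuation and maintain the index
on every insert, update, and delete, which is where the extra space and
maintenance cost comes from.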
"tjk@tksoft.com" <tjk@tksoft.com> writes:
> A general rule of thumb is that indexes
> only work on exact matches.

Troy's rule of thumb is correct, but there's an important additional
property of some types of indexes: you can scan them in order (for
whatever kind of "order" is imposed by the index comparison operator).
Postgres' btree indexes work that way, but hash indexes don't.

An ordered index can be used to process inequalities and range
queries as well as exact-match queries.  For example, with a btree
index you can do something like
	WHERE lastname >= 'Smith' AND lastname <= 'Szekely'
fairly efficiently: you scan the portion of the index falling between
the given limits, and then extract the main-table records pointed to
by those index entries.

Therefore, it's practical to use a btree index to speed up match queries
that require a match at the start of the string.  For example, given
	WHERE lastname LIKE 'Smith%'
Postgres will generate additional clauses
	lastname >= 'Smith' AND lastname <= 'Smith\377'
which can be used with a btree index to restrict the number of records
that have to be looked at.  You still have to do the LIKE comparison,
in general (consider LIKE 'Smith%Jr') but you don't have to do it for
every record in the table.

There isn't any obvious way to apply this trick for an unanchored match,
though (as in LIKE '%Smith%').

However, if you are actually interested in searching for whole words,
you could consider making an index that lists all of the whole words in
your target field, and doing an exact match with that index.  See
contrib/fulltextindex in the Postgres distribution for an example.

			regards, tom lane
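The anchored-prefix rewrite is easy to watch in action.  Postgres isn't
scriptable from the Python standard library, so this sketch uses sqlite3
purely as a stand-in: SQLite applies the same trick, turning LIKE 'abc%'
into a bounded index range scan while LIKE '%abc%' must visit every
entry (the PRAGMA is a SQLite-specific requirement for its LIKE
optimization, not anything from Postgres):

```python
# Demonstrating the anchored-prefix trick with sqlite3 as a stand-in for
# Postgres.  LIKE 'Smith%' becomes a bounded SEARCH over the btree index;
# LIKE '%mith%' degrades to a SCAN of every entry.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA case_sensitive_like = ON")  # needed for SQLite's LIKE optimization
con.execute("CREATE TABLE people (lastname TEXT)")
con.execute("CREATE INDEX people_lastname ON people (lastname)")
con.executemany("INSERT INTO people VALUES (?)",
                [("Smith",), ("Smithers",), ("Szekely",), ("Jones",)])

def plan(query: str) -> str:
    # Concatenate the query-plan detail strings for easy inspection.
    rows = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " ".join(row[-1] for row in rows)

# Anchored match: the planner bounds the index range (reported as SEARCH).
print(plan("SELECT * FROM people WHERE lastname LIKE 'Smith%'"))
# Unanchored match: no usable bounds (reported as SCAN).
print(plan("SELECT * FROM people WHERE lastname LIKE '%mith%'"))
```

The exact plan wording varies between SQLite versions, but the SEARCH
(bounded) versus SCAN (unbounded) distinction is the point.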
Good point.  My attempt at a simplification oversimplified things a bit
where it shouldn't have.

Because records in the btree are kept in alphabetical order, it is indeed
possible to use an index for finding records with 'abc%' etc.  Just like
using a phonebook, really.  You have no problem using the index in a
phonebook to find your friend's number if you know the first letters of
your friend's last name.  If you don't know even the first letter, you
will have to comb through the entire book to find your friend.

When records are inserted into a btree, they are inserted by comparing
them to the other records in the table.  This means that they will always
remain in sorted order.  The backend can rearrange them to speed things
up, but regardless, you will always be able to find all values which are
the same, and all values which are lower or higher in value.  In the case
of text strings, you would find records in alphabetical order.

Depending on the comparison function used, you will be able to find
records ordered according to a different language's alphabet, or another
ordering mechanism.  I haven't personally had the need to alter the
comparison function used to sort strings, but you could do it by using a
different set of operators for the index.  The default for text columns
is "text_ops," so you would change that to your own set, or use a
built-in one, if one exists.  The docs contain info about this under
"create index."  If you compiled with USE_LOCALE, it is possible that
sorting would be done according to the current locale, but I can't say
because I haven't tried it myself.

Troy

> "tjk@tksoft.com" <tjk@tksoft.com> writes:
> > A general rule of thumb is that indexes
> > only work on exact matches.
>
> Troy's rule of thumb is correct, but there's an important additional
> property of some types of indexes: you can scan them in order (for
> whatever kind of "order" is imposed by the index comparison operator).
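The phonebook analogy can be put directly in code.  A minimal sketch,
using a sorted Python list to stand in for a btree's ordered entries: a
prefix search is just two binary searches that bound the matching run
(the "\uffff" sentinel plays the same role as the '\377' in the rewrite
Tom described):

```python
# The phonebook analogy in code: with values kept in sorted order (as in
# a btree), a prefix search is two binary searches that bound one
# contiguous run of matches; an unanchored substring has no such bounds.
import bisect

names = sorted(["Jones", "Smith", "Smithers", "Smithson", "Szekely"])

def prefix_search(sorted_names: list[str], prefix: str) -> list[str]:
    # Everything matching 'prefix%' lies between prefix itself and a
    # string just past the largest possible match.
    lo = bisect.bisect_left(sorted_names, prefix)
    hi = bisect.bisect_left(sorted_names, prefix + "\uffff")
    return sorted_names[lo:hi]

print(prefix_search(names, "Smith"))   # ['Smith', 'Smithers', 'Smithson']
```

Only the slice between lo and hi is ever touched, which is why a leading
wildcard (no usable prefix) forces a full scan instead.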