Boyer-More-Horspool searching LIKE queries - Mailing list pgsql-hackers

From Atsushi Ogawa
Subject Boyer-More-Horspool searching LIKE queries
Date
Msg-id CAO2GH9Ekvcj19ihQyjk_S6RUjPHADhUfPQBU0gRMs4qVfk12Aw@mail.gmail.com
Whole thread Raw
Responses Re: Boyer-More-Horspool searching LIKE queries
List pgsql-hackers

I have created a patch to enable the Boyer-More-Horspool search
algorithm (B-M-H) for LIKE queries.

B-M-H needs to initialize the skip table and keep it during SQL execution.
In this patch, flinfo->fn_extra is used to keep the skip table.

The conditions under which B-M-H can be used are as follows.

(1) B-M-H in LIKE search supports only single-byte character sets and UTF8.
Multibyte character sets does not support, because it may contain another
characters in the byte sequence. For UTF-8, it works fine, because in
UTF-8 the byte sequence of one character cannot contain another character.

(2) The pattern string should be stable parameter, because B-M-H needs to keep
the skip table generated from the pattern string during the execution of
the query.

(3) The pattern string should be at least 4 characters.
For example, '%AB%' can use B-M-H.

(4) The first and last character of the pattern string should be '%'.

(5) Characters other than the first and last of the pattern string
should not be '%', '_'.  However, escaped characters such as
'\%', '\_' are available.

Also, this patch changes the collation validity check in functions
(such as textlike) to be performed at the first execution of the query,
instead of each function execution.
I have measured the performance with the following query.

---------- ---------- ---------- ---------- ---------- ---------- ----------
SET client_min_messages TO notice;

\timing

DO $$
DECLARE
  cnt integer := 0;
  total integer := 0;
BEGIN
  FOR i IN 1..500 LOOP
    select count(*) into cnt
      from pg_catalog.pg_description d
      where d.description like '%greater%';

    total := total + cnt;
  END LOOP;

  RAISE NOTICE 'TOTAL: %', total;
END
$$
;
---------- ---------- ---------- ---------- ---------- ---------- ----------

Result
Without patch: 257.504ms
With patch: 191.638ms

Regards,
Atsushi Ogawa

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [Ext:] Re: Stream Replication not working
Next
From: Fujii Masao
Date:
Subject: Re: enhance pg_log_backend_memory_contexts() to log memory contexts of auxiliary processes