Thread: optimize query?

optimize query?

From
hamann.w@t-online.de
Date:
Hi,

I am trying to match items from 2 tables based on a common string.
One is a big table which has one column with entries like XY123, ABC44, etc
The table has an index on that column.
The second table is, typically, much smaller

select .... from tab1, tab2 where tab1.code = tab2.code;

This works fine and fast.
Now, as a variant, I have some entries like XY423A, XY423B, GF55A, GF55D in the
big table and want them to match XY423, GF55 in the second table

Variants I have tried

select  .... from tab1, tab2 where tab1.code ~ (tab2.code||'($|[A-Z])');
select  .... from tab1, tab2 where tab1.code ~ ('^'||tab2.code||'($|[A-Z])');

both take an enormous time. In the better case that I can subset (e.g. all candidates in table 2
share initial "AX") I get back to manageable times by adding
  and tab1.code ~ '^AX'
into the recipe. Actual runtime with about a million entries in tab1 and 800 entries in tab2
is about 40 seconds.

Regards
Wolfgang Hamann



Re: optimize query?

From
hamann.w@t-online.de
Date:
Bob Futrelle wrote:

If looking for the variants with a single suffixed character is all you'll
ever need to do:
Do a single pass on the large table, after creating a field, 'trunc' that
contains a truncated version of the item, e.g., adding XY423 to trunc for
each entry of the form XY423A, or XY423B, etc.  This would be a one-time
operation.

Hi Bob,

unfortunately it is not because  the suffix pattern is not always the same.
On the other hand, I tried to split the query into

select substring (code from '^[A-Z]+[0-9]+') as code into tmp table t
select .... where tab2.code = t.code

which definitely was a big improvement. I believe that most of the time
passing in a pattern like that would work

So thank you for that tip

Regards
Wolfgang