Thread: optimizing record matching

optimizing record matching

From

Markus Wagner

Date:

25 October 2001, 08:31:42

Hi,

we have a very large table (about 1 million entries), and we have an "add"
operation that will check a new entry for equality or similarity with all of
the existing entries. The generated SQL queries look like that:

SELECT pid FROM rec WHERE (((f_lname_PC = '2C38D2E44501ED31778E0EFDFD5200CD'
OR f_lname_PH = 'CB85F68FFDDECD7CC39AF5BC2FBC0BBC') OR (f_lname_PC IS NULL OR
f_lname_PH IS NULL)) AND (f_fname_PC = '3A160A9BFF2EA5A0918F5F6667A411A7' OR
f_fname_PH = '5152F1177F0BD28FB51501597669962E') AND f_bd =
'9E6E0D70A9B76BB6990477FCF100557E' AND f_bm =
'4BE74390684A423853B68B9F05A4BAA0' AND f_by =
'15FF84F58774D638B1C4EC82B413EBA9');

We have set indices for each of the fields (f_*), but the matching process
doesn't seem to become faster.
Are there any things we could improve, e. g. special index types or things
like that?

Thanks,
Markus

Re: optimizing record matching

From

Einar Karttunen

Date:

25 October 2001, 09:20:23

On Thu, Oct 25, 2001 at 01:41:48PM +0200, Markus Wagner wrote:
> Hi,
>
> we have a very large table (about 1 million entries), and we have an "add"
> operation that will check a new entry for equality or similarity with all of
> the existing entries. The generated SQL queries look like that:
>
> SELECT pid FROM rec WHERE (((f_lname_PC = '2C38D2E44501ED31778E0EFDFD5200CD'
> OR f_lname_PH = 'CB85F68FFDDECD7CC39AF5BC2FBC0BBC') OR (f_lname_PC IS NULL OR
> f_lname_PH IS NULL)) AND (f_fname_PC = '3A160A9BFF2EA5A0918F5F6667A411A7' OR
> f_fname_PH = '5152F1177F0BD28FB51501597669962E') AND f_bd =
> '9E6E0D70A9B76BB6990477FCF100557E' AND f_bm =
> '4BE74390684A423853B68B9F05A4BAA0' AND f_by =
> '15FF84F58774D638B1C4EC82B413EBA9');
>
> We have set indices for each of the fields (f_*), but the matching process
> doesn't seem to become faster.
> Are there any things we could improve, e. g. special index types or things
> like that?
>
Are you vacuuming regularely? Please post the explain (ie. EXPLAIN SELECT pid...)
output of the query, so we can see what is taking the time. The default index
method should be best in this case, hash indeces are not faster in postgresql.

- Einar Karttunen