Re: Best way to use indexes for partial match at - Mailing list pgsql-general

From Andrus
Subject Re: Best way to use indexes for partial match at
Date
Msg-id dl2l8e$206p$1@news.hub.org
Whole thread Raw
In response to Best way to use indexes for partial match at beginning  ("Andrus Moor" <eetasoft@online.ee>)
Responses Re: Best way to use indexes for partial match at  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
>> I expected that making planner to use primary key index in case of
>>
>> WHERE bar::CHAR(3)='ABC'
>> WHERE bar BETWEEN 'ABC' AND 'ABC' || CHR(255)
>> WHERE SUBSTRING( bar FOR 3 )='ABC'
>
> Which doesn't necessarily work because of how some collations may work.
> The first 3 characters of the string may sort to a fundamentally different
> place than the first 4 characters. You can have a collation where
> 'ABC' < 'ABCR' < 'ABCT' < 'ABD' < 'ABZ' < 'ABCS'.  In the above, the ABCS
> value shortens so it should be found, however it's sort position in the
> index is outside the normal ABC range.

Thank you. I missed this since this does not occur in my locale.

Now I understood that cryptics text_pattern_ops, varchar_pattern_ops,
bpchar_pattern_ops
actually mean  "binary_sort_order".

> AFAIK the interfaces being used
> don't give enough information to detect such locales to handle them
> differently.

Maybe there is some setting which I can use to inform Postgres that sorting
order is strictly by single byte, not affected by byte sequences. Maybe this
information can be get from Windows API or from iconv

lc_collate='C'  assumes also binary order which is to restrictive.

lc_collate shoud be divided into two: lc_collate_order  and
lc_collate_multibyte  . lc_collate_multibyte has values true and false.
In this case primary key for LIKE comparison can be used in some non-C
locales.

My desicion based on this thread for my coding is:

1. Write all partial match queries using LIKE operator like

foo LIKE 'ABC%'


2. When data access becomes slow, create duplicate primary key index using
text_pattern_ops
or change only lc_collate to C by retaining all other settings non-C locale.

I'm afraid that my users use % and ? characters in data.
Using LIKE requires  escaping % and ? characters in search patterns. It
should be nicer to use foo:CHAR(3)='ABC'  type expressions which do not
require ? and % escaping.

I understand that  Postgres cannot use any index to speed up searches like
foo:CHAR(3)='ABC' which does not use regular expression match operators.

So LIKE with escaping is the only way.

Andrus.



pgsql-general by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: A good postgresql book
Next
From: Tom Lane
Date:
Subject: Re: Best way to use indexes for partial match at