Thread: Best way to use indexes for partial match at beginning

Best way to use indexes for partial match at beginning

From
"Andrus Moor"
Date:
I have 10 CHARACTER columns in table mytable. Mytable is indexed by some of
those columns.

CREATE TABLE mytable ( col1 CHARACTER(10),
col2 CHARACTER(10),col3 CHARACTER(10),col4 CHARACTER(10),col5
CHARACTER(10),col6 CHARACTER(10),col7 CHARACTER(10),col8 CHARACTER(10),col9
CHARACTER(10), col10 CHARACTER(10) );

CREATE INDEX i1 ON mytable(col1);
CREATE INDEX i2 ON mytable(col2);

I need to select records by knowing some characters from beginning.
I know always 1-10 first characters of col1. So my LIKE pattern starts
always with constant characters and ends with % .

I can use LIKE:

SELECT * FROM mytable
  WHERE col1 LIKE 'A%'
  AND col2 LIKE 'BC%'
  AND col3 LIKE 'DEF%'
  AND col4 LIKE 'G%';

or substring():

SELECT * FROM mytable
  WHERE substring(col1 for 1)='A'
  AND substring(col2 for 2)= 'BC'
  AND substring(col3 for 3)='DEF'
  AND substring(col4 for 1) ='G';


Can Postgres 8.1 use indexes to speed the queries above ?

Which is the best way to to write the where clause in this case so that
index is used ?

Andrus.



Re: Best way to use indexes for partial match at beginning

From
"Jim C. Nasby"
Date:
Well, for starters, see if PostgreSQL is currently using any indexes via
EXPLAIN. First rule of performance tuning: don't.

If it's not (which is probably the case), then your best bet is to
create functional indexes; ie:

CREATE INDEX mytable__col1_4 ON mytable( substring( col1 for 4 ) );

You can then either

SELECT ... WHERE substring( col1 for 4 ) = blah

or

SELECT ... WHERE substring( col1 for 4 ) LIKE 'bla%'

Though that last one might not use the index; you'll have to check and
see.

Also, keep in mind that PostgreSQL doesn't store CHAR the same as most
other databases; the internal storage is the same as what's used for
VARCHAR and TEXT.

On Sun, Nov 06, 2005 at 11:03:01PM +0200, Andrus Moor wrote:
> I have 10 CHARACTER columns in table mytable. Mytable is indexed by some of
> those columns.
>
> CREATE TABLE mytable ( col1 CHARACTER(10),
> col2 CHARACTER(10),col3 CHARACTER(10),col4 CHARACTER(10),col5
> CHARACTER(10),col6 CHARACTER(10),col7 CHARACTER(10),col8 CHARACTER(10),col9
> CHARACTER(10), col10 CHARACTER(10) );
>
> CREATE INDEX i1 ON mytable(col1);
> CREATE INDEX i2 ON mytable(col2);
>
> I need to select records by knowing some characters from beginning.
> I know always 1-10 first characters of col1. So my LIKE pattern starts
> always with constant characters and ends with % .
>
> I can use LIKE:
>
> SELECT * FROM mytable
>   WHERE col1 LIKE 'A%'
>   AND col2 LIKE 'BC%'
>   AND col3 LIKE 'DEF%'
>   AND col4 LIKE 'G%';
>
> or substring():
>
> SELECT * FROM mytable
>   WHERE substring(col1 for 1)='A'
>   AND substring(col2 for 2)= 'BC'
>   AND substring(col3 for 3)='DEF'
>   AND substring(col4 for 1) ='G';
>
>
> Can Postgres 8.1 use indexes to speed the queries above ?
>
> Which is the best way to to write the where clause in this case so that
> index is used ?
>
> Andrus.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Best way to use indexes for partial match at beginning

From
"Andrus"
Date:
> Well, for starters, see if PostgreSQL is currently using any indexes via
> EXPLAIN. First rule of performance tuning: don't.

I'm designing a new application. Data is not available yet.
I'm using Postgres 8.1 in Windows.  Database encoding is UTF-8
lc_ctype is Estonian_Estonia.1257.
lc_collate is Estonian currently. However I can set lc_collate to C if this
solves this issue.

Doc says that
" to allow PostgreSQL to use indexes with LIKE clauses under a non-C locale,
several custom operator classes exist"

I don't understand "non-C locale".  Does this mean lc_collate or also some
other lc_ setting ?

> If it's not (which is probably the case), then your best bet is to
> create functional indexes; ie:
>
> CREATE INDEX mytable__col1_4 ON mytable( substring( col1 for 4 ) );
>
> You can then either
>
> SELECT ... WHERE substring( col1 for 4 ) = blah

I need to optimize queries with variable number of characters in beginning
like

SELECT ... WHERE substring( col1 for 1 ) = 'f'
SELECT ... WHERE substring( col1 for 2 ) = 'fo'
SELECT ... WHERE substring( col1 for 3 ) = 'foo'
etc

This approach requires creating 10 indexes for each column which is
unreasonable.

In my current dbms, Microsoft Visual FoxPro I have a single index

CREATE INDEX i1 ON mytable(col1)

I can use queries:

WHERE col1 BETWEEN 'f' and 'f'+CHR(255)
WHERE col1 BETWEEN 'fo' and 'fo'+CHR(255)
WHERE col1 BETWEEN 'foo' and 'foo'+CHR(255)

All those queries can use same index automatically in all locales. CHR(255)
is last character in any lc_collate sequence. CHR(255) is not used in col1
data.

How to get same functionality in Postgres ?
Does there exist unicode special character which is greater than all other
chars ?

Andrus.



Re: Best way to use indexes for partial match at beginning

From
Martijn van Oosterhout
Date:
On Wed, Nov 09, 2005 at 12:37:25PM +0200, Andrus wrote:
> I'm using Postgres 8.1 in Windows.  Database encoding is UTF-8
> lc_ctype is Estonian_Estonia.1257.
> lc_collate is Estonian currently. However I can set lc_collate to C if this
> solves this issue.
>
> Doc says that
> " to allow PostgreSQL to use indexes with LIKE clauses under a non-C locale,
> several custom operator classes exist"
>
> I don't understand "non-C locale".  Does this mean lc_collate or also some
> other lc_ setting ?

lc == locale. There are several different locale settings but collation
affects ordering. And Estonian is not C (obviously).

> I need to optimize queries with variable number of characters in beginning
> like
>
> SELECT ... WHERE substring( col1 for 1 ) = 'f'
> SELECT ... WHERE substring( col1 for 2 ) = 'fo'
> SELECT ... WHERE substring( col1 for 3 ) = 'foo'
> etc

If you use queries like:

SELECT ... WHERE col1 LIKE 'fo%'

it can use an index declared like:

CREATE INDEX myindex on mytable(col1 text_pattern_ops);

> In my current dbms, Microsoft Visual FoxPro I have a single index
>
> CREATE INDEX i1 ON mytable(col1)
>
> I can use queries:
>
> WHERE col1 BETWEEN 'f' and 'f'+CHR(255)
> WHERE col1 BETWEEN 'fo' and 'fo'+CHR(255)
> WHERE col1 BETWEEN 'foo' and 'foo'+CHR(255)

Well, you could do that in postgresql too, you just need to use the SQL
standard concatination operator.

WHERE col1 BETWEEN 'f' and 'f' || chr(255);

> How to get same functionality in Postgres ?
> Does there exist unicode special character which is greater than all other
> chars ?

Umm, I don't think so. Order is defined by the locale, not the
character set. My guess is that text_pattern_ops is the way to go.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Best way to use indexes for partial match at beginning

From
"Andrus"
Date:
Martijn,

>> I can use queries:
>>
>> WHERE col1 BETWEEN 'f' and 'f'+CHR(255)

>Well, you could do that in postgresql too, you just need to use the SQL
>standard concatination operator.

>WHERE col1 BETWEEN 'f' and 'f' || chr(255);

thank you.

I think it is best to use regular indexes since regular indexes since they
can be used in other types of queries also.

It seems that only way is to use BETWEEN comparison for this in Postgres
8.1.

I tried

CREATE TABLE foo ( col1 CHAR(20));
CREATE INDEX i1 ON foo(col1);
INSERT INTO foo VALUES ('bar');
SELECT * FROM foo WHERE col1 BETWEEN 'b' and 'b' || chr(255);

But this does not return any data.

How to write index optimizable WHERE clause when only some (variable number)
of characters from beginning of col1 are known ?

Only way seems to use BETWEEN comparison by concatenating character greater
than all other characters in locale. Since CHR(255) does not work this is
not possible.

So

CREATE INDEX i1 ON foo(col1);

cannot be used to optimize queries of type "get all rows where first n
charaters of col1 are known" in Postgres.

Andrus.



Re: Best way to use indexes for partial match at beginning

From
Richard Huxton
Date:
Andrus wrote:
> So
>
> CREATE INDEX i1 ON foo(col1);
>
> cannot be used to optimize queries of type "get all rows where first n
> charaters of col1 are known" in Postgres.

Of course it will! Any btree based index will let you do that. Re-read
the previous answers and make sure you pay attention to the bit about
text_pattern_ops and LIKE in non-C locales.

--
   Richard Huxton
   Archonet Ltd

Re: Best way to use indexes for partial match at beginning

From
Jaime Casanova
Date:
On 11/9/05, Andrus <eetasoft@online.ee> wrote:
> Martijn,
>
> >> I can use queries:
> >>
> >> WHERE col1 BETWEEN 'f' and 'f'+CHR(255)
>
> >Well, you could do that in postgresql too, you just need to use the SQL
> >standard concatination operator.
>
> >WHERE col1 BETWEEN 'f' and 'f' || chr(255);
>
> thank you.
>
> I think it is best to use regular indexes since regular indexes since they
> can be used in other types of queries also.
>
> It seems that only way is to use BETWEEN comparison for this in Postgres
> 8.1.
>
> I tried
>
> CREATE TABLE foo ( col1 CHAR(20));
> CREATE INDEX i1 ON foo(col1);
> INSERT INTO foo VALUES ('bar');
> SELECT * FROM foo WHERE col1 BETWEEN 'b' and 'b' || chr(255);
>
> But this does not return any data.
>
> How to write index optimizable WHERE clause when only some (variable number)
> of characters from beginning of col1 are known ?
>
> Only way seems to use BETWEEN comparison by concatenating character greater
> than all other characters in locale. Since CHR(255) does not work this is
> not possible.
>
> So
>
> CREATE INDEX i1 ON foo(col1);
>
> cannot be used to optimize queries of type "get all rows where first n
> charaters of col1 are known" in Postgres.
>
> Andrus.
>
>

you can create two indexes:

CREATE INDEX myindex_lc on mytable(col1 text_pattern_ops);
and
CREATE INDEX myindex_normal ON foo(col1);

the first one will be used when using LIKE and the other for normal
comparisons .


--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

Re: Best way to use indexes for partial match at beginning

From
"Andrus"
Date:
>> CREATE INDEX i1 ON foo(col1);
>>
>> cannot be used to optimize queries of type "get all rows where first n
>> charaters of col1 are known" in Postgres.
>
> Of course it will! Any btree based index will let you do that. Re-read the
> previous answers and make sure you pay attention to the bit about
> text_pattern_ops and LIKE in non-C locales.

Richard,

thank you. I try to formulate my problem more presicely.
I have table

CREATE TABLE foo ( bar CHAR(10)  PRIMARY KEY);

Cluster locale is non-C. Database encoding is UTF-8. Postgres vers is 8.1

PRIMARY KEY clause creates btree based index so the index exists on bar.

I want to run fast queries by knowing first characters of bar like :

1. Select records from foo where first character of bar is A
2. Select records from foo where first character of bar is B
3. Select records from foo where first  two characters of bar are BC
4. Select records from foo where first  three characters of bar are ABC
etc.

Can you write sample WHERE clause which can use existing primary key index
for those queries ?

Andrus.



Re: Best way to use indexes for partial match at beginning

From
"Andrus"
Date:
> you can create two indexes:
>
> CREATE INDEX myindex_lc on mytable(col1 text_pattern_ops);
> and
> CREATE INDEX myindex_normal ON foo(col1);
>
> the first one will be used when using LIKE and the other for normal
> comparisons .

Jaime,

CREATE INDEX myindex_normal ON foo(col1);

Creates btree structure. In other dbm system btree structure can be used for
searches where only some first characters in index key are known.

So I see no reason to create second index using text_pattern_ops for this
purpose.

I'm searching a way to use Postgres regular index for this.

Andrus.



Re: Best way to use indexes for partial match at

From
Scott Marlowe
Date:
On Wed, 2005-11-09 at 14:56, Andrus wrote:
> > you can create two indexes:
> >
> > CREATE INDEX myindex_lc on mytable(col1 text_pattern_ops);
> > and
> > CREATE INDEX myindex_normal ON foo(col1);
> >
> > the first one will be used when using LIKE and the other for normal
> > comparisons .
>
> Jaime,
>
> CREATE INDEX myindex_normal ON foo(col1);
>
> Creates btree structure. In other dbm system btree structure can be used for
> searches where only some first characters in index key are known.
>
> So I see no reason to create second index using text_pattern_ops for this
> purpose.
>
> I'm searching a way to use Postgres regular index for this.

Easy, do what those other databases do.  Setup your database to not use
a locale.

initdb --locale=C

and you're golden.

Re: Best way to use indexes for partial match at beginning

From
Martijn van Oosterhout
Date:
On Wed, Nov 09, 2005 at 10:46:27PM +0200, Andrus wrote:
> thank you. I try to formulate my problem more presicely.
> I have table
>
> CREATE TABLE foo ( bar CHAR(10)  PRIMARY KEY);
>
> Cluster locale is non-C. Database encoding is UTF-8. Postgres vers is 8.1

Do this instead:

CREATE TABLE foo ( bar CHAR(10) NOT NULL );
CREATE UNIQUE INDEX foo_bar ON foo(bar char_pattern_ops);

> I want to run fast queries by knowing first characters of bar like :
>
> 1. Select records from foo where first character of bar is A
> 2. Select records from foo where first character of bar is B
> 3. Select records from foo where first  two characters of bar are BC
> 4. Select records from foo where first  three characters of bar are ABC

SELECT * FROM foo WHERE bar LIKE 'A%';
SELECT * FROM foo WHERE bar LIKE 'B%';
SELECT * FROM foo WHERE bar LIKE 'BC%';
SELECT * FROM foo WHERE bar LIKE 'ABC%';

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Best way to use indexes for partial match at beginning

From
Jaime Casanova
Date:
On 11/9/05, Andrus <eetasoft@online.ee> wrote:
> > you can create two indexes:
> >
> > CREATE INDEX myindex_lc on mytable(col1 text_pattern_ops);
> > and
> > CREATE INDEX myindex_normal ON foo(col1);
> >
> > the first one will be used when using LIKE and the other for normal
> > comparisons .
>
> Jaime,
>
> CREATE INDEX myindex_normal ON foo(col1);
>
> Creates btree structure. In other dbm system btree structure can be used for
> searches where only some first characters in index key are known.
>

and the same is true for postgres when you use C LOCALE, but because
some implementation details i don't know so deep when using non-C
LOCALE you need the class operator in order to use the index with LIKE
'pattern%'

> So I see no reason to create second index using text_pattern_ops for this
> purpose.
>

the reason is that you want to use the index in the search... and, at
least you go and solve the problem with code, that's the way to do
it...


--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

Re: Best way to use indexes for partial match at beginning

From
"Dean Gibson (DB Administrator)"
Date:
On 2005-11-09 13:08, Martijn van Oosterhout wrote:
>> I want to run fast queries by knowing first characters of bar like :
>>
>> 1. Select records from foo where first character of bar is A
>> 2. Select records from foo where first character of bar is B
>> 3. Select records from foo where first  two characters of bar are BC
>> 4. Select records from foo where first  three characters of bar are ABC
>>
>
> SELECT * FROM foo WHERE bar LIKE 'A%';
> SELECT * FROM foo WHERE bar LIKE 'B%';
> SELECT * FROM foo WHERE bar LIKE 'BC%';
> SELECT * FROM foo WHERE bar LIKE 'ABC%';
>
> Have a nice day,
>

Or:

SELECT * FROM foo WHERE bar::CHAR(1) = 'A';
SELECT * FROM foo WHERE bar::CHAR(1) = 'B';
SELECT * FROM foo WHERE bar::CHAR(2) = 'BC';
SELECT * FROM foo WHERE bar::CHAR(3) = 'ABC';

-- Dean

Re: Best way to use indexes for partial match at

From
"Andrus"
Date:
Scott,

>> I'm searching a way to use Postgres regular index for this.
>
> Easy, do what those other databases do.  Setup your database to not use
> a locale.
>
> initdb --locale=C
>
> and you're golden.

thank you.

My language has letters in ������ with correstonding upper case letters
������

I need to prevent entering of duplicate customer names into database.
For this I created unique index on UPPER(customer_name)

I need also search for customer name in case-insensitive way. For this I use
ILIKE operator.

Those two features are working in Postgres 8.1 when I use non-C locale.
If I switch to C locale, they will not work.

My current database, Microsoft Visual Foxpro implements this functionality.

How to implement this functionality in Postgres if I switch to C locale ?

Andrus.



Re: Best way to use indexes for partial match at beginning

From
"Andrus"
Date:
>> CREATE TABLE foo ( bar CHAR(10)  PRIMARY KEY);
>>
>> Cluster locale is non-C. Database encoding is UTF-8. Postgres vers is 8.1

>Do this instead:

>CREATE TABLE foo ( bar CHAR(10) NOT NULL );
>CREATE UNIQUE INDEX foo_bar ON foo(bar char_pattern_ops);

Martijn,

Thank you. I have CHAR columns and need a primary key also. So I tried the
code

CREATE TABLE foo ( bar CHAR(10) NOT NULL );
CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops);
ALTER TABLE foo ADD PRIMARY KEY (bar);

I found that adding primary key creates another index.

How to create primary key without duplicate index on bar column ?

Andrus.



Re: Best way to use indexes for partial match at

From
Scott Marlowe
Date:
On Wed, 2005-11-09 at 15:30, Andrus wrote:
> Scott,
>
> >> I'm searching a way to use Postgres regular index for this.
> >
> > Easy, do what those other databases do.  Setup your database to not use
> > a locale.
> >
> > initdb --locale=C
> >
> > and you're golden.
>
> thank you.
>
> My language has letters in  with correstonding upper case letters
>
>
> I need to prevent entering of duplicate customer names into database.
> For this I created unique index on UPPER(customer_name)
>
> I need also search for customer name in case-insensitive way. For this I use
> ILIKE operator.
>
> Those two features are working in Postgres 8.1 when I use non-C locale.
> If I switch to C locale, they will not work.
>
> My current database, Microsoft Visual Foxpro implements this functionality.
>
> How to implement this functionality in Postgres if I switch to C locale ?

You can't.  You have conflicting desires.  PostgreSQL IS NOT FOXPRO.  If
you want to use foxpro, then do so.  If you want to use PostgreSQL, then
you'll either have to accept that you need to make certain accomodations
to use it with a non-C locale, or accept a C locale and its limitations.

You say that Foxpro implements this functionality, but are you sure that
it gets things like collation correct?  I.e. does it truly understand
all the rules for what comes before something else in your language?
Locales are a complex and difficult thing to get exactly right, and
while, at first blush, Foxpro may seem to do the right thing, you may
find it isn't doing EVERYTHING exactly right, and still having good
performance.

Then again, maybe it is.

But PostgreSQL is limited to working either in a C locale and
automatically using indexes for like 'abc%' queries but getting
collation wrong, or working in the correct locale, not using indexes for
like 'abc%', having to use the special class operator if you want likes
to work, and getting the collation correct.

If that doesn't work for you, your only real choice is to either use
another database, or start hacking to make PostgreSQL the database you
want it to be.

It's not a simple problem, and there is no simple answer.  And if you
expect any database to not have things like this in it to deal with, you
just haven't looked very hard at any of them.  They've all got warts.
And sometimes, one db is just not a good fit.

Perhaps full text searching could help you out here?  Not sure.

Re: Best way to use indexes for partial match at beginning

From
"Andrus"
Date:
> SELECT * FROM foo WHERE bar::CHAR(1) = 'A';
> SELECT * FROM foo WHERE bar::CHAR(1) = 'B';
> SELECT * FROM foo WHERE bar::CHAR(2) = 'BC';
> SELECT * FROM foo WHERE bar::CHAR(3) = 'ABC';

 Dean,

thank you. That would be EXCELLENT solution!

Can you confirm that in this case Postgres 8.1 can use index created by

CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY );

even on non-C locale ?

Andrus.



Re: Best way to use indexes for partial match at beginning

From
Jaime Casanova
Date:
On 11/9/05, Andrus <eetasoft@online.ee> wrote:
> >> CREATE TABLE foo ( bar CHAR(10)  PRIMARY KEY);
> >>
> >> Cluster locale is non-C. Database encoding is UTF-8. Postgres vers is 8.1
>
> >Do this instead:
>
> >CREATE TABLE foo ( bar CHAR(10) NOT NULL );
> >CREATE UNIQUE INDEX foo_bar ON foo(bar char_pattern_ops);
>
> Martijn,
>
> Thank you. I have CHAR columns and need a primary key also. So I tried the
> code
>
> CREATE TABLE foo ( bar CHAR(10) NOT NULL );
> CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops);
> ALTER TABLE foo ADD PRIMARY KEY (bar);
>
> I found that adding primary key creates another index.
>
> How to create primary key without duplicate index on bar column ?
>
> Andrus.
>
>

you can't.
postgresql implements primary keys creating unique indexes and not
null constraints on the pk columns.

--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

Re: Best way to use indexes for partial match at

From
Scott Marlowe
Date:
On Wed, 2005-11-09 at 16:23, Jaime Casanova wrote:
> On 11/9/05, Andrus <eetasoft@online.ee> wrote:
> > >> CREATE TABLE foo ( bar CHAR(10)  PRIMARY KEY);
> > >>
> > >> Cluster locale is non-C. Database encoding is UTF-8. Postgres vers is 8.1
> >
> > >Do this instead:
> >
> > >CREATE TABLE foo ( bar CHAR(10) NOT NULL );
> > >CREATE UNIQUE INDEX foo_bar ON foo(bar char_pattern_ops);
> >
> > Martijn,
> >
> > Thank you. I have CHAR columns and need a primary key also. So I tried the
> > code
> >
> > CREATE TABLE foo ( bar CHAR(10) NOT NULL );
> > CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops);
> > ALTER TABLE foo ADD PRIMARY KEY (bar);
> >
> > I found that adding primary key creates another index.
> >
> > How to create primary key without duplicate index on bar column ?
> >
> > Andrus.
> >
> >
>
> you can't.
> postgresql implements primary keys creating unique indexes and not
> null constraints on the pk columns.

But, of course, you CAN delete that other index now that it's redundant.

Re: Best way to use indexes for partial match at beginning

From
Peter Eisentraut
Date:
Andrus wrote:
> > SELECT * FROM foo WHERE bar::CHAR(1) = 'A';
> > SELECT * FROM foo WHERE bar::CHAR(1) = 'B';
> > SELECT * FROM foo WHERE bar::CHAR(2) = 'BC';
> > SELECT * FROM foo WHERE bar::CHAR(3) = 'ABC';

> Can you confirm that in this case Postgres 8.1 can use index created
> by
> CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY );
> even on non-C locale ?

It will not use the index no matter what locale.  You would in these
cases need to create additional expression indexes on bar::char(1) etc.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Best way to use indexes for partial match at

From
"Andrus"
Date:
>> > How to create primary key without duplicate index on bar column ?
>> >
>> > Andrus.
>> >
>> >
>>
>> you can't.
>> postgresql implements primary keys creating unique indexes and not
>> null constraints on the pk columns.
>
> But, of course, you CAN delete that other index now that it's redundant.

Scott,

thank you. I don't understand how to delete primary key index without
removing primary key constraint or how to force primary key to use foo_bar
index.

I tried

CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY );
CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops);
DROP INDEX foo_pkey;

but got

ERROR:  cannot drop index foo_pkey because constraint foo_pkey on table foo
requires it

Andrus.



Re: Best way to use indexes for partial match at

From
Peter Eisentraut
Date:
Am Donnerstag, 10. November 2005 10:19 schrieb Andrus:
> thank you. I don't understand how to delete primary key index without
> removing primary key constraint or how to force primary key to use foo_bar
> index.

You cannot drop the primary key index without dropping the primary key, and
you cannot change the primary key to use a different index.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Best way to use indexes for partial match at

From
Scott Marlowe
Date:
On Thu, 2005-11-10 at 03:19, Andrus wrote:
> >> > How to create primary key without duplicate index on bar column ?
> >> >
> >> > Andrus.
> >> >
> >> >
> >>
> >> you can't.
> >> postgresql implements primary keys creating unique indexes and not
> >> null constraints on the pk columns.
> >
> > But, of course, you CAN delete that other index now that it's redundant.
>
> Scott,
>
> thank you. I don't understand how to delete primary key index without
> removing primary key constraint or how to force primary key to use foo_bar
> index.
>
> I tried
>
> CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY );
> CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops);
> DROP INDEX foo_pkey;
>
> but got
>
> ERROR:  cannot drop index foo_pkey because constraint foo_pkey on table foo
> requires it

Sorry, I thought you were looking at removing the OTHER index, the
unique one.  Now that I see you need it for the bpchar pattern ops, I
guess you're kinda stuck having two indexes if you want a primary key in
the table.

Note that if you don't use the pk in unnamed joins (i.e. you always
identify the field you're keying off of) then the primary key is
redundant and not needed, and you could just create the table without it
and then create the unique index.

Re: Best way to use indexes for partial match at

From
"Andrus"
Date:
> Note that if you don't use the pk in unnamed joins (i.e. you always
> identify the field you're keying off of) then the primary key is
> redundant and not needed, and you could just create the table without it
> and then create the unique index.

Thank you.
Can Postgres use index

CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops);

in usual WHERE expressions just like regular index ?

I.e can

SELECT * FROM foo WHERE bar='123'

use index foo_bar ?

Andrus.



Re: Best way to use indexes for partial match at

From
Scott Marlowe
Date:
On Thu, 2005-11-10 at 07:45, Peter Eisentraut wrote:
> Am Donnerstag, 10. November 2005 10:19 schrieb Andrus:
> > thank you. I don't understand how to delete primary key index without
> > removing primary key constraint or how to force primary key to use foo_bar
> > index.
>
> You cannot drop the primary key index without dropping the primary key, and
> you cannot change the primary key to use a different index.

Ya know, this brings up an interesting question, would it be feasible to
allow for a function of somekind to be applied as an argument to a
primary key declaration?  It would certainly be useful for character
type pks in non-C locales.

Re: Best way to use indexes for partial match at

From
Peter Eisentraut
Date:
Scott Marlowe wrote:
> Ya know, this brings up an interesting question, would it be feasible
> to allow for a function of somekind to be applied as an argument to a
> primary key declaration?

You would have to guarantee somehow that the function is a one-to-one
mapping, in order not to destroy the integrity of the primary key
constraint.  I cannot think of any useful functions that fulfill this
criterion.

> It would certainly be useful for character
> type pks in non-C locales.

I don't see how.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Best way to use indexes for partial match at

From
Peter Eisentraut
Date:
Andrus wrote:
> Can Postgres use index
>
> CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops);
>
> in usual WHERE expressions just like regular index ?
>
> I.e can
>
> SELECT * FROM foo WHERE bar='123'
>
> use index foo_bar ?

I think in the time it took you to write this email you could have typed
in those lines into psql and found out yourself.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Best way to use indexes for partial match at

From
Scott Marlowe
Date:
On Thu, 2005-11-10 at 11:27, Peter Eisentraut wrote:
> Scott Marlowe wrote:
> > Ya know, this brings up an interesting question, would it be feasible
> > to allow for a function of somekind to be applied as an argument to a
> > primary key declaration?
>
> You would have to guarantee somehow that the function is a one-to-one
> mapping, in order not to destroy the integrity of the primary key
> constraint.  I cannot think of any useful functions that fulfill this
> criterion.

No, I wouldn't think one to one would be necessary.  If you had a
primary key that was case insensitive, for example, it would not map one
to one.  It would, in fact, be MORE greedy about matching, so that you
could not have both a "Peter Eisentraut" and a "peter eisentraut" in the
same table.

While I could easily add a unique on table (lower(namefield)) to get
this, being able to use a lower() or some other function would be very
useful.  I don't see one to one being necessary, it just needs to be
consistent.

>
> > It would certainly be useful for character
> > type pks in non-C locales.
>
> I don't see how.

By only having to maintain one index on a large table instead of having
a PK AND a separate unique index for these kinds of cases.

Re: Best way to use indexes for partial match at

From
Peter Eisentraut
Date:
Scott Marlowe wrote:
> No, I wouldn't think one to one would be necessary.  If you had a
> primary key that was case insensitive, for example, it would not map
> one to one.  It would, in fact, be MORE greedy about matching, so
> that you could not have both a "Peter Eisentraut" and a "peter
> eisentraut" in the same table.

I suppose one could define equivalence classes that way, and if
everything else that interacted with the table (foreign keys, other
functions, etc.) behaved consistently with respect to those equivalence
classes, things would work out.  Lots of "if's" though.  The better and
safer way to do that, however, would be to define a data type.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Best way to use indexes for partial match at

From
"Andrus"
Date:
> Ya know, this brings up an interesting question, would it be feasible to
> allow for a function of somekind to be applied as an argument to a
> primary key declaration?  It would certainly be useful for character
> type pks in non-C locales.

Using this would require making non-standard changes to ddl statements and
thus is not good.

Best solution is as follows:

Planner must use index

CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY );

for queries like

SELECT * FROM foo WHERE bar::CHAR(3)='ABC';

Can you make this patch or add this to todo list  ?

Andrus.



Re: Best way to use indexes for partial match at

From
Martijn van Oosterhout
Date:
On Fri, Nov 11, 2005 at 11:31:37AM +0200, Andrus wrote:
> Best solution is as follows:
>
> Planner must use index
>
> CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY );
>
> for queries like
>
> SELECT * FROM foo WHERE bar::CHAR(3)='ABC';

Your query is the same as using LIKE, so why not express it that way?
Is it that unreasonable that a PRIMARY KEY should use the most natural
way to order strings for your locale and that if you want to use LIKE
in non-C locales that you need to specify that explicitly?

PRIMARY KEY == UNIQUE + NOT NULL

Incidently, another way might be COLLATE support, something like:

CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY COLLATE like_compatable );

But that's already on the cards.

[1] http://gborg.postgresql.org/project/citext/projdisplay.php

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Best way to use indexes for partial match at

From
"Andrus"
Date:
>> SELECT * FROM foo WHERE bar::CHAR(3)='ABC';

> Your query is the same as using LIKE, so why not express it that way?

I want simply to select by first 3 characters. LIKE is too powerful and
unnessecary. LIKE requires
escaping % and ? characters in pattern.

I expected that making planner to use primary key index in case of

WHERE bar::CHAR(3)='ABC'

or in

WHERE bar BETWEEN 'ABC' AND 'ABC' || CHR(255)

or in

WHERE SUBSTRING( bar FOR 3 )='ABC'

is more simpler than dealing with locale problems in WHERE bar LIKE 'ABC%':

Using LIKE with index optimization in Postgres in non-C locale requires

a. creating non-SQL standard compatible index unsin operator class
b. requires that primary key column has two indexes

This is too much overhead.

>Is it that unreasonable that a PRIMARY KEY should use the most natural
> way to order strings for your locale

This is very reasonable. PRIMARY KEY must use locale order always.

> and that if you want to use LIKE
> in non-C locales that you need to specify that explicitly?

This is unreasonable.

If I use SQL standard way to create table

CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY );

and use SQL standard WHERE clause

WHERE bar LIKE 'ABC%'

or

WHERE bar::CHAR(3)='ABC'

I expect that primary key index can be used without non-standard extensions
to SQL language

Andrus.



Re: Best way to use indexes for partial match at

From
Tom Lane
Date:
"Andrus" <eetasoft@online.ee> writes:
> Planner must use index
> CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY );
> for queries like
> SELECT * FROM foo WHERE bar::CHAR(3)='ABC';
> Can you make this patch or add this to todo list  ?

This isn't going to happen for exactly the same reason that LIKE
optimization doesn't happen (in non-C locales): the sort order of
non-C indexes isn't compatible with testing a prefix of the string.

            regards, tom lane

Re: Best way to use indexes for partial match at

From
Jaime Casanova
Date:
On 11/11/05, Andrus <eetasoft@online.ee> wrote:
> >> SELECT * FROM foo WHERE bar::CHAR(3)='ABC';
>
> > Your query is the same as using LIKE, so why not express it that way?
>
> I want simply to select by first 3 characters. LIKE is too powerful and
> unnessecary. LIKE requires
> escaping % and ? characters in pattern.
>
> I expected that making planner to use primary key index in case of
>
> WHERE bar::CHAR(3)='ABC'
>
> or in
>
> WHERE bar BETWEEN 'ABC' AND 'ABC' || CHR(255)
>
> or in
>
> WHERE SUBSTRING( bar FOR 3 )='ABC'
>
> is more simpler than dealing with locale problems in WHERE bar LIKE 'ABC%':
>
> Using LIKE with index optimization in Postgres in non-C locale requires
>
> a. creating non-SQL standard compatible index unsin operator class
>

are indexes in the SQL-STANDARD? i touhgt we can do with indexes what
we think is better... like other database has its own implementation
details about indexes...

> b. requires that primary key column has two indexes
>
> This is too much overhead.
>

it requires two indexes, yes... but one of them can be a normal
(accept duplicates) not unique one... just let the primary key create
its index and create the one you needs without the unique clause

> >Is it that unreasonable that a PRIMARY KEY should use the most natural
> > way to order strings for your locale
>
> This is very reasonable. PRIMARY KEY must use locale order always.
>
> > and that if you want to use LIKE
> > in non-C locales that you need to specify that explicitly?
>
> This is unreasonable.
>

Maybe, but is unfair for your side to said that without actually look
at the code and the problems... maybe, you want to look at the code
and fix what you think is wrong?

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

Re: Best way to use indexes for partial match at

From
Stephan Szabo
Date:
On Fri, 11 Nov 2005, Andrus wrote:

> >> SELECT * FROM foo WHERE bar::CHAR(3)='ABC';
>
> > Your query is the same as using LIKE, so why not express it that way?
>
> I want simply to select by first 3 characters. LIKE is too powerful and
> unnessecary. LIKE requires
> escaping % and ? characters in pattern.
>
> I expected that making planner to use primary key index in case of
>
> WHERE bar::CHAR(3)='ABC'
> WHERE bar BETWEEN 'ABC' AND 'ABC' || CHR(255)
> WHERE SUBSTRING( bar FOR 3 )='ABC'

Which doesn't necessarily work because of how some collations may work.
The first 3 characters of the string may sort to a fundamentally different
place than the first 4 characters. You can have a collation where
'ABC' < 'ABCR' < 'ABCT' < 'ABD' < 'ABZ' < 'ABCS'.  In the above, the ABCS
value shortens so it should be found, however it's sort position in the
index is outside the normal ABC range. AFAIK the interfaces being used
don't give enough information to detect such locales to handle them
differently.

Re: Best way to use indexes for partial match at

From
"Andrus"
Date:
>> I expected that making planner to use primary key index in case of
>>
>> WHERE bar::CHAR(3)='ABC'
>> WHERE bar BETWEEN 'ABC' AND 'ABC' || CHR(255)
>> WHERE SUBSTRING( bar FOR 3 )='ABC'
>
> Which doesn't necessarily work because of how some collations may work.
> The first 3 characters of the string may sort to a fundamentally different
> place than the first 4 characters. You can have a collation where
> 'ABC' < 'ABCR' < 'ABCT' < 'ABD' < 'ABZ' < 'ABCS'.  In the above, the ABCS
> value shortens so it should be found, however it's sort position in the
> index is outside the normal ABC range.

Thank you. I missed this since this does not occur in my locale.

Now I understood that cryptics text_pattern_ops, varchar_pattern_ops,
bpchar_pattern_ops
actually mean  "binary_sort_order".

> AFAIK the interfaces being used
> don't give enough information to detect such locales to handle them
> differently.

Maybe there is some setting which I can use to inform Postgres that sorting
order is strictly by single byte, not affected by byte sequences. Maybe this
information can be get from Windows API or from iconv

lc_collate='C'  assumes also binary order which is to restrictive.

lc_collate shoud be divided into two: lc_collate_order  and
lc_collate_multibyte  . lc_collate_multibyte has values true and false.
In this case primary key for LIKE comparison can be used in some non-C
locales.

My desicion based on this thread for my coding is:

1. Write all partial match queries using LIKE operator like

foo LIKE 'ABC%'


2. When data access becomes slow, create duplicate primary key index using
text_pattern_ops
or change only lc_collate to C by retaining all other settings non-C locale.

I'm afraid that my users use % and ? characters in data.
Using LIKE requires  escaping % and ? characters in search patterns. It
should be nicer to use foo:CHAR(3)='ABC'  type expressions which do not
require ? and % escaping.

I understand that  Postgres cannot use any index to speed up searches like
foo:CHAR(3)='ABC' which does not use regular expression match operators.

So LIKE with escaping is the only way.

Andrus.



Re: Best way to use indexes for partial match at

From
Tom Lane
Date:
"Andrus" <eetasoft@online.ee> writes:
> Maybe there is some setting which I can use to inform Postgres that sorting
> order is strictly by single byte, not affected by byte sequences.

AFAIK, C/POSIX is the *only* commonly used locale in which that holds.
Do you have a counterexample?  (Hint: all the other ones use dictionary
sorting rules, which have at least a discrimination against spaces.)

            regards, tom lane

Re: Best way to use indexes for partial match at

From
"Andrus"
Date:
> (Hint: all the other ones use dictionary
> sorting rules, which have at least a discrimination against spaces.)

Tom,

thank you. I ran the following code in 8.1

show lc_collate; -- returns "Estonian_Estonia.1257"
create temp table foo ( bar char(10) ) on commit drop ;
insert into foo values ('A');
insert into foo values ('A  C'); -- two spaces
insert into foo values ('A B'); -- single space
insert into foo values ('A C'); -- single space
select * from foo order by bar;

and got

"A         "
"A  C      "
"A B       "
"A C       "

I don't see any space discrimination on sorting here.
I sorted the same data in Microsoft Word and got the same result.

Andrus.




Re: Best way to use indexes for partial match at

From
Tom Lane
Date:
"Andrus" <eetasoft@online.ee> writes:
>> (Hint: all the other ones use dictionary
>> sorting rules, which have at least a discrimination against spaces.)

> show lc_collate; -- returns "Estonian_Estonia.1257"
> ...
> I don't see any space discrimination on sorting here.

Hmm ... I should probably have qualified my statement as "all the
locales generally used on Unix systems".  I have no idea what the
situation is like on Windoze.

Anyone have some data about the sort rules used by common Windows
locales?

            regards, tom lane