Thread: Best way to use indexes for partial match at beginning
I have 10 CHARACTER columns in table mytable. Mytable is indexed by some of those columns. CREATE TABLE mytable ( col1 CHARACTER(10), col2 CHARACTER(10),col3 CHARACTER(10),col4 CHARACTER(10),col5 CHARACTER(10),col6 CHARACTER(10),col7 CHARACTER(10),col8 CHARACTER(10),col9 CHARACTER(10), col10 CHARACTER(10) ); CREATE INDEX i1 ON mytable(col1); CREATE INDEX i2 ON mytable(col2); I need to select records by knowing some characters from beginning. I know always 1-10 first characters of col1. So my LIKE pattern starts always with constant characters and ends with % . I can use LIKE: SELECT * FROM mytable WHERE col1 LIKE 'A%' AND col2 LIKE 'BC%' AND col3 LIKE 'DEF%' AND col4 LIKE 'G%'; or substring(): SELECT * FROM mytable WHERE substring(col1 for 1)='A' AND substring(col2 for 2)= 'BC' AND substring(col3 for 3)='DEF' AND substring(col4 for 1) ='G'; Can Postgres 8.1 use indexes to speed the queries above ? Which is the best way to to write the where clause in this case so that index is used ? Andrus.
Well, for starters, see if PostgreSQL is currently using any indexes via EXPLAIN. First rule of performance tuning: don't. If it's not (which is probably the case), then your best bet is to create functional indexes; ie: CREATE INDEX mytable__col1_4 ON mytable( substring( col1 for 4 ) ); You can then either SELECT ... WHERE substring( col1 for 4 ) = blah or SELECT ... WHERE substring( col1 for 4 ) LIKE 'bla%' Though that last one might not use the index; you'll have to check and see. Also, keep in mind that PostgreSQL doesn't store CHAR the same as most other databases; the internal storage is the same as what's used for VARCHAR and TEXT. On Sun, Nov 06, 2005 at 11:03:01PM +0200, Andrus Moor wrote: > I have 10 CHARACTER columns in table mytable. Mytable is indexed by some of > those columns. > > CREATE TABLE mytable ( col1 CHARACTER(10), > col2 CHARACTER(10),col3 CHARACTER(10),col4 CHARACTER(10),col5 > CHARACTER(10),col6 CHARACTER(10),col7 CHARACTER(10),col8 CHARACTER(10),col9 > CHARACTER(10), col10 CHARACTER(10) ); > > CREATE INDEX i1 ON mytable(col1); > CREATE INDEX i2 ON mytable(col2); > > I need to select records by knowing some characters from beginning. > I know always 1-10 first characters of col1. So my LIKE pattern starts > always with constant characters and ends with % . > > I can use LIKE: > > SELECT * FROM mytable > WHERE col1 LIKE 'A%' > AND col2 LIKE 'BC%' > AND col3 LIKE 'DEF%' > AND col4 LIKE 'G%'; > > or substring(): > > SELECT * FROM mytable > WHERE substring(col1 for 1)='A' > AND substring(col2 for 2)= 'BC' > AND substring(col3 for 3)='DEF' > AND substring(col4 for 1) ='G'; > > > Can Postgres 8.1 use indexes to speed the queries above ? > > Which is the best way to to write the where clause in this case so that > index is used ? > > Andrus. > > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
> Well, for starters, see if PostgreSQL is currently using any indexes via > EXPLAIN. First rule of performance tuning: don't. I'm designing a new application. Data is not available yet. I'm using Postgres 8.1 in Windows. Database encoding is UTF-8 lc_ctype is Estonian_Estonia.1257. lc_collate is Estonian currently. However I can set lc_collate to C if this solves this issue. Doc says that " to allow PostgreSQL to use indexes with LIKE clauses under a non-C locale, several custom operator classes exist" I don't understand "non-C locale". Does this mean lc_collate or also some other lc_ setting ? > If it's not (which is probably the case), then your best bet is to > create functional indexes; ie: > > CREATE INDEX mytable__col1_4 ON mytable( substring( col1 for 4 ) ); > > You can then either > > SELECT ... WHERE substring( col1 for 4 ) = blah I need to optimize queries with variable number of characters in beginning like SELECT ... WHERE substring( col1 for 1 ) = 'f' SELECT ... WHERE substring( col1 for 2 ) = 'fo' SELECT ... WHERE substring( col1 for 3 ) = 'foo' etc This approach requires creating 10 indexes for each column which is unreasonable. In my current dbms, Microsoft Visual FoxPro I have a single index CREATE INDEX i1 ON mytable(col1) I can use queries: WHERE col1 BETWEEN 'f' and 'f'+CHR(255) WHERE col1 BETWEEN 'fo' and 'fo'+CHR(255) WHERE col1 BETWEEN 'foo' and 'foo'+CHR(255) All those queries can use same index automatically in all locales. CHR(255) is last character in any lc_collate sequence. CHR(255) is not used in col1 data. How to get same functionality in Postgres ? Does there exist unicode special character which is greater than all other chars ? Andrus.
On Wed, Nov 09, 2005 at 12:37:25PM +0200, Andrus wrote: > I'm using Postgres 8.1 in Windows. Database encoding is UTF-8 > lc_ctype is Estonian_Estonia.1257. > lc_collate is Estonian currently. However I can set lc_collate to C if this > solves this issue. > > Doc says that > " to allow PostgreSQL to use indexes with LIKE clauses under a non-C locale, > several custom operator classes exist" > > I don't understand "non-C locale". Does this mean lc_collate or also some > other lc_ setting ? lc == locale. There are several different locale settings but collation affects ordering. And Estonian is not C (obviously). > I need to optimize queries with variable number of characters in beginning > like > > SELECT ... WHERE substring( col1 for 1 ) = 'f' > SELECT ... WHERE substring( col1 for 2 ) = 'fo' > SELECT ... WHERE substring( col1 for 3 ) = 'foo' > etc If you use queries like: SELECT ... WHERE col1 LIKE 'fo%' it can use an index declared like: CREATE INDEX myindex on mytable(col1 text_pattern_ops); > In my current dbms, Microsoft Visual FoxPro I have a single index > > CREATE INDEX i1 ON mytable(col1) > > I can use queries: > > WHERE col1 BETWEEN 'f' and 'f'+CHR(255) > WHERE col1 BETWEEN 'fo' and 'fo'+CHR(255) > WHERE col1 BETWEEN 'foo' and 'foo'+CHR(255) Well, you could do that in postgresql too, you just need to use the SQL standard concatination operator. WHERE col1 BETWEEN 'f' and 'f' || chr(255); > How to get same functionality in Postgres ? > Does there exist unicode special character which is greater than all other > chars ? Umm, I don't think so. Order is defined by the locale, not the character set. My guess is that text_pattern_ops is the way to go. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
Martijn, >> I can use queries: >> >> WHERE col1 BETWEEN 'f' and 'f'+CHR(255) >Well, you could do that in postgresql too, you just need to use the SQL >standard concatination operator. >WHERE col1 BETWEEN 'f' and 'f' || chr(255); thank you. I think it is best to use regular indexes since regular indexes since they can be used in other types of queries also. It seems that only way is to use BETWEEN comparison for this in Postgres 8.1. I tried CREATE TABLE foo ( col1 CHAR(20)); CREATE INDEX i1 ON foo(col1); INSERT INTO foo VALUES ('bar'); SELECT * FROM foo WHERE col1 BETWEEN 'b' and 'b' || chr(255); But this does not return any data. How to write index optimizable WHERE clause when only some (variable number) of characters from beginning of col1 are known ? Only way seems to use BETWEEN comparison by concatenating character greater than all other characters in locale. Since CHR(255) does not work this is not possible. So CREATE INDEX i1 ON foo(col1); cannot be used to optimize queries of type "get all rows where first n charaters of col1 are known" in Postgres. Andrus.
Andrus wrote: > So > > CREATE INDEX i1 ON foo(col1); > > cannot be used to optimize queries of type "get all rows where first n > charaters of col1 are known" in Postgres. Of course it will! Any btree based index will let you do that. Re-read the previous answers and make sure you pay attention to the bit about text_pattern_ops and LIKE in non-C locales. -- Richard Huxton Archonet Ltd
On 11/9/05, Andrus <eetasoft@online.ee> wrote: > Martijn, > > >> I can use queries: > >> > >> WHERE col1 BETWEEN 'f' and 'f'+CHR(255) > > >Well, you could do that in postgresql too, you just need to use the SQL > >standard concatination operator. > > >WHERE col1 BETWEEN 'f' and 'f' || chr(255); > > thank you. > > I think it is best to use regular indexes since regular indexes since they > can be used in other types of queries also. > > It seems that only way is to use BETWEEN comparison for this in Postgres > 8.1. > > I tried > > CREATE TABLE foo ( col1 CHAR(20)); > CREATE INDEX i1 ON foo(col1); > INSERT INTO foo VALUES ('bar'); > SELECT * FROM foo WHERE col1 BETWEEN 'b' and 'b' || chr(255); > > But this does not return any data. > > How to write index optimizable WHERE clause when only some (variable number) > of characters from beginning of col1 are known ? > > Only way seems to use BETWEEN comparison by concatenating character greater > than all other characters in locale. Since CHR(255) does not work this is > not possible. > > So > > CREATE INDEX i1 ON foo(col1); > > cannot be used to optimize queries of type "get all rows where first n > charaters of col1 are known" in Postgres. > > Andrus. > > you can create two indexes: CREATE INDEX myindex_lc on mytable(col1 text_pattern_ops); and CREATE INDEX myindex_normal ON foo(col1); the first one will be used when using LIKE and the other for normal comparisons . -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
>> CREATE INDEX i1 ON foo(col1); >> >> cannot be used to optimize queries of type "get all rows where first n >> charaters of col1 are known" in Postgres. > > Of course it will! Any btree based index will let you do that. Re-read the > previous answers and make sure you pay attention to the bit about > text_pattern_ops and LIKE in non-C locales. Richard, thank you. I try to formulate my problem more presicely. I have table CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY); Cluster locale is non-C. Database encoding is UTF-8. Postgres vers is 8.1 PRIMARY KEY clause creates btree based index so the index exists on bar. I want to run fast queries by knowing first characters of bar like : 1. Select records from foo where first character of bar is A 2. Select records from foo where first character of bar is B 3. Select records from foo where first two characters of bar are BC 4. Select records from foo where first three characters of bar are ABC etc. Can you write sample WHERE clause which can use existing primary key index for those queries ? Andrus.
> you can create two indexes: > > CREATE INDEX myindex_lc on mytable(col1 text_pattern_ops); > and > CREATE INDEX myindex_normal ON foo(col1); > > the first one will be used when using LIKE and the other for normal > comparisons . Jaime, CREATE INDEX myindex_normal ON foo(col1); Creates btree structure. In other dbm system btree structure can be used for searches where only some first characters in index key are known. So I see no reason to create second index using text_pattern_ops for this purpose. I'm searching a way to use Postgres regular index for this. Andrus.
On Wed, 2005-11-09 at 14:56, Andrus wrote: > > you can create two indexes: > > > > CREATE INDEX myindex_lc on mytable(col1 text_pattern_ops); > > and > > CREATE INDEX myindex_normal ON foo(col1); > > > > the first one will be used when using LIKE and the other for normal > > comparisons . > > Jaime, > > CREATE INDEX myindex_normal ON foo(col1); > > Creates btree structure. In other dbm system btree structure can be used for > searches where only some first characters in index key are known. > > So I see no reason to create second index using text_pattern_ops for this > purpose. > > I'm searching a way to use Postgres regular index for this. Easy, do what those other databases do. Setup your database to not use a locale. initdb --locale=C and you're golden.
On Wed, Nov 09, 2005 at 10:46:27PM +0200, Andrus wrote: > thank you. I try to formulate my problem more presicely. > I have table > > CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY); > > Cluster locale is non-C. Database encoding is UTF-8. Postgres vers is 8.1 Do this instead: CREATE TABLE foo ( bar CHAR(10) NOT NULL ); CREATE UNIQUE INDEX foo_bar ON foo(bar char_pattern_ops); > I want to run fast queries by knowing first characters of bar like : > > 1. Select records from foo where first character of bar is A > 2. Select records from foo where first character of bar is B > 3. Select records from foo where first two characters of bar are BC > 4. Select records from foo where first three characters of bar are ABC SELECT * FROM foo WHERE bar LIKE 'A%'; SELECT * FROM foo WHERE bar LIKE 'B%'; SELECT * FROM foo WHERE bar LIKE 'BC%'; SELECT * FROM foo WHERE bar LIKE 'ABC%'; Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
On 11/9/05, Andrus <eetasoft@online.ee> wrote: > > you can create two indexes: > > > > CREATE INDEX myindex_lc on mytable(col1 text_pattern_ops); > > and > > CREATE INDEX myindex_normal ON foo(col1); > > > > the first one will be used when using LIKE and the other for normal > > comparisons . > > Jaime, > > CREATE INDEX myindex_normal ON foo(col1); > > Creates btree structure. In other dbm system btree structure can be used for > searches where only some first characters in index key are known. > and the same is true for postgres when you use C LOCALE, but because some implementation details i don't know so deep when using non-C LOCALE you need the class operator in order to use the index with LIKE 'pattern%' > So I see no reason to create second index using text_pattern_ops for this > purpose. > the reason is that you want to use the index in the search... and, at least you go and solve the problem with code, that's the way to do it... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
Re: Best way to use indexes for partial match at beginning
From
"Dean Gibson (DB Administrator)"
Date:
On 2005-11-09 13:08, Martijn van Oosterhout wrote: >> I want to run fast queries by knowing first characters of bar like : >> >> 1. Select records from foo where first character of bar is A >> 2. Select records from foo where first character of bar is B >> 3. Select records from foo where first two characters of bar are BC >> 4. Select records from foo where first three characters of bar are ABC >> > > SELECT * FROM foo WHERE bar LIKE 'A%'; > SELECT * FROM foo WHERE bar LIKE 'B%'; > SELECT * FROM foo WHERE bar LIKE 'BC%'; > SELECT * FROM foo WHERE bar LIKE 'ABC%'; > > Have a nice day, > Or: SELECT * FROM foo WHERE bar::CHAR(1) = 'A'; SELECT * FROM foo WHERE bar::CHAR(1) = 'B'; SELECT * FROM foo WHERE bar::CHAR(2) = 'BC'; SELECT * FROM foo WHERE bar::CHAR(3) = 'ABC'; -- Dean
Scott, >> I'm searching a way to use Postgres regular index for this. > > Easy, do what those other databases do. Setup your database to not use > a locale. > > initdb --locale=C > > and you're golden. thank you. My language has letters in ������ with correstonding upper case letters ������ I need to prevent entering of duplicate customer names into database. For this I created unique index on UPPER(customer_name) I need also search for customer name in case-insensitive way. For this I use ILIKE operator. Those two features are working in Postgres 8.1 when I use non-C locale. If I switch to C locale, they will not work. My current database, Microsoft Visual Foxpro implements this functionality. How to implement this functionality in Postgres if I switch to C locale ? Andrus.
>> CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY); >> >> Cluster locale is non-C. Database encoding is UTF-8. Postgres vers is 8.1 >Do this instead: >CREATE TABLE foo ( bar CHAR(10) NOT NULL ); >CREATE UNIQUE INDEX foo_bar ON foo(bar char_pattern_ops); Martijn, Thank you. I have CHAR columns and need a primary key also. So I tried the code CREATE TABLE foo ( bar CHAR(10) NOT NULL ); CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops); ALTER TABLE foo ADD PRIMARY KEY (bar); I found that adding primary key creates another index. How to create primary key without duplicate index on bar column ? Andrus.
On Wed, 2005-11-09 at 15:30, Andrus wrote: > Scott, > > >> I'm searching a way to use Postgres regular index for this. > > > > Easy, do what those other databases do. Setup your database to not use > > a locale. > > > > initdb --locale=C > > > > and you're golden. > > thank you. > > My language has letters in with correstonding upper case letters > > > I need to prevent entering of duplicate customer names into database. > For this I created unique index on UPPER(customer_name) > > I need also search for customer name in case-insensitive way. For this I use > ILIKE operator. > > Those two features are working in Postgres 8.1 when I use non-C locale. > If I switch to C locale, they will not work. > > My current database, Microsoft Visual Foxpro implements this functionality. > > How to implement this functionality in Postgres if I switch to C locale ? You can't. You have conflicting desires. PostgreSQL IS NOT FOXPRO. If you want to use foxpro, then do so. If you want to use PostgreSQL, then you'll either have to accept that you need to make certain accomodations to use it with a non-C locale, or accept a C locale and its limitations. You say that Foxpro implements this functionality, but are you sure that it gets things like collation correct? I.e. does it truly understand all the rules for what comes before something else in your language? Locales are a complex and difficult thing to get exactly right, and while, at first blush, Foxpro may seem to do the right thing, you may find it isn't doing EVERYTHING exactly right, and still having good performance. Then again, maybe it is. But PostgreSQL is limited to working either in a C locale and automatically using indexes for like 'abc%' queries but getting collation wrong, or working in the correct locale, not using indexes for like 'abc%', having to use the special class operator if you want likes to work, and getting the collation correct. If that doesn't work for you, your only real choice is to either use another database, or start hacking to make PostgreSQL the database you want it to be. It's not a simple problem, and there is no simple answer. And if you expect any database to not have things like this in it to deal with, you just haven't looked very hard at any of them. They've all got warts. And sometimes, one db is just not a good fit. Perhaps full text searching could help you out here? Not sure.
> SELECT * FROM foo WHERE bar::CHAR(1) = 'A'; > SELECT * FROM foo WHERE bar::CHAR(1) = 'B'; > SELECT * FROM foo WHERE bar::CHAR(2) = 'BC'; > SELECT * FROM foo WHERE bar::CHAR(3) = 'ABC'; Dean, thank you. That would be EXCELLENT solution! Can you confirm that in this case Postgres 8.1 can use index created by CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY ); even on non-C locale ? Andrus.
On 11/9/05, Andrus <eetasoft@online.ee> wrote: > >> CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY); > >> > >> Cluster locale is non-C. Database encoding is UTF-8. Postgres vers is 8.1 > > >Do this instead: > > >CREATE TABLE foo ( bar CHAR(10) NOT NULL ); > >CREATE UNIQUE INDEX foo_bar ON foo(bar char_pattern_ops); > > Martijn, > > Thank you. I have CHAR columns and need a primary key also. So I tried the > code > > CREATE TABLE foo ( bar CHAR(10) NOT NULL ); > CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops); > ALTER TABLE foo ADD PRIMARY KEY (bar); > > I found that adding primary key creates another index. > > How to create primary key without duplicate index on bar column ? > > Andrus. > > you can't. postgresql implements primary keys creating unique indexes and not null constraints on the pk columns. -- Atentamente, Jaime Casanova (DBA: DataBase Aniquilator ;)
On Wed, 2005-11-09 at 16:23, Jaime Casanova wrote: > On 11/9/05, Andrus <eetasoft@online.ee> wrote: > > >> CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY); > > >> > > >> Cluster locale is non-C. Database encoding is UTF-8. Postgres vers is 8.1 > > > > >Do this instead: > > > > >CREATE TABLE foo ( bar CHAR(10) NOT NULL ); > > >CREATE UNIQUE INDEX foo_bar ON foo(bar char_pattern_ops); > > > > Martijn, > > > > Thank you. I have CHAR columns and need a primary key also. So I tried the > > code > > > > CREATE TABLE foo ( bar CHAR(10) NOT NULL ); > > CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops); > > ALTER TABLE foo ADD PRIMARY KEY (bar); > > > > I found that adding primary key creates another index. > > > > How to create primary key without duplicate index on bar column ? > > > > Andrus. > > > > > > you can't. > postgresql implements primary keys creating unique indexes and not > null constraints on the pk columns. But, of course, you CAN delete that other index now that it's redundant.
Andrus wrote: > > SELECT * FROM foo WHERE bar::CHAR(1) = 'A'; > > SELECT * FROM foo WHERE bar::CHAR(1) = 'B'; > > SELECT * FROM foo WHERE bar::CHAR(2) = 'BC'; > > SELECT * FROM foo WHERE bar::CHAR(3) = 'ABC'; > Can you confirm that in this case Postgres 8.1 can use index created > by > CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY ); > even on non-C locale ? It will not use the index no matter what locale. You would in these cases need to create additional expression indexes on bar::char(1) etc. -- Peter Eisentraut http://developer.postgresql.org/~petere/
>> > How to create primary key without duplicate index on bar column ? >> > >> > Andrus. >> > >> > >> >> you can't. >> postgresql implements primary keys creating unique indexes and not >> null constraints on the pk columns. > > But, of course, you CAN delete that other index now that it's redundant. Scott, thank you. I don't understand how to delete primary key index without removing primary key constraint or how to force primary key to use foo_bar index. I tried CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY ); CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops); DROP INDEX foo_pkey; but got ERROR: cannot drop index foo_pkey because constraint foo_pkey on table foo requires it Andrus.
Am Donnerstag, 10. November 2005 10:19 schrieb Andrus: > thank you. I don't understand how to delete primary key index without > removing primary key constraint or how to force primary key to use foo_bar > index. You cannot drop the primary key index without dropping the primary key, and you cannot change the primary key to use a different index. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Thu, 2005-11-10 at 03:19, Andrus wrote: > >> > How to create primary key without duplicate index on bar column ? > >> > > >> > Andrus. > >> > > >> > > >> > >> you can't. > >> postgresql implements primary keys creating unique indexes and not > >> null constraints on the pk columns. > > > > But, of course, you CAN delete that other index now that it's redundant. > > Scott, > > thank you. I don't understand how to delete primary key index without > removing primary key constraint or how to force primary key to use foo_bar > index. > > I tried > > CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY ); > CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops); > DROP INDEX foo_pkey; > > but got > > ERROR: cannot drop index foo_pkey because constraint foo_pkey on table foo > requires it Sorry, I thought you were looking at removing the OTHER index, the unique one. Now that I see you need it for the bpchar pattern ops, I guess you're kinda stuck having two indexes if you want a primary key in the table. Note that if you don't use the pk in unnamed joins (i.e. you always identify the field you're keying off of) then the primary key is redundant and not needed, and you could just create the table without it and then create the unique index.
> Note that if you don't use the pk in unnamed joins (i.e. you always > identify the field you're keying off of) then the primary key is > redundant and not needed, and you could just create the table without it > and then create the unique index. Thank you. Can Postgres use index CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops); in usual WHERE expressions just like regular index ? I.e can SELECT * FROM foo WHERE bar='123' use index foo_bar ? Andrus.
On Thu, 2005-11-10 at 07:45, Peter Eisentraut wrote: > Am Donnerstag, 10. November 2005 10:19 schrieb Andrus: > > thank you. I don't understand how to delete primary key index without > > removing primary key constraint or how to force primary key to use foo_bar > > index. > > You cannot drop the primary key index without dropping the primary key, and > you cannot change the primary key to use a different index. Ya know, this brings up an interesting question, would it be feasible to allow for a function of somekind to be applied as an argument to a primary key declaration? It would certainly be useful for character type pks in non-C locales.
Scott Marlowe wrote: > Ya know, this brings up an interesting question, would it be feasible > to allow for a function of somekind to be applied as an argument to a > primary key declaration? You would have to guarantee somehow that the function is a one-to-one mapping, in order not to destroy the integrity of the primary key constraint. I cannot think of any useful functions that fulfill this criterion. > It would certainly be useful for character > type pks in non-C locales. I don't see how. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Andrus wrote: > Can Postgres use index > > CREATE UNIQUE INDEX foo_bar ON foo(bar bpchar_pattern_ops); > > in usual WHERE expressions just like regular index ? > > I.e can > > SELECT * FROM foo WHERE bar='123' > > use index foo_bar ? I think in the time it took you to write this email you could have typed in those lines into psql and found out yourself. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Thu, 2005-11-10 at 11:27, Peter Eisentraut wrote: > Scott Marlowe wrote: > > Ya know, this brings up an interesting question, would it be feasible > > to allow for a function of somekind to be applied as an argument to a > > primary key declaration? > > You would have to guarantee somehow that the function is a one-to-one > mapping, in order not to destroy the integrity of the primary key > constraint. I cannot think of any useful functions that fulfill this > criterion. No, I wouldn't think one to one would be necessary. If you had a primary key that was case insensitive, for example, it would not map one to one. It would, in fact, be MORE greedy about matching, so that you could not have both a "Peter Eisentraut" and a "peter eisentraut" in the same table. While I could easily add a unique on table (lower(namefield)) to get this, being able to use a lower() or some other function would be very useful. I don't see one to one being necessary, it just needs to be consistent. > > > It would certainly be useful for character > > type pks in non-C locales. > > I don't see how. By only having to maintain one index on a large table instead of having a PK AND a separate unique index for these kinds of cases.
Scott Marlowe wrote: > No, I wouldn't think one to one would be necessary. If you had a > primary key that was case insensitive, for example, it would not map > one to one. It would, in fact, be MORE greedy about matching, so > that you could not have both a "Peter Eisentraut" and a "peter > eisentraut" in the same table. I suppose one could define equivalence classes that way, and if everything else that interacted with the table (foreign keys, other functions, etc.) behaved consistently with respect to those equivalence classes, things would work out. Lots of "if's" though. The better and safer way to do that, however, would be to define a data type. -- Peter Eisentraut http://developer.postgresql.org/~petere/
> Ya know, this brings up an interesting question, would it be feasible to > allow for a function of somekind to be applied as an argument to a > primary key declaration? It would certainly be useful for character > type pks in non-C locales. Using this would require making non-standard changes to ddl statements and thus is not good. Best solution is as follows: Planner must use index CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY ); for queries like SELECT * FROM foo WHERE bar::CHAR(3)='ABC'; Can you make this patch or add this to todo list ? Andrus.
On Fri, Nov 11, 2005 at 11:31:37AM +0200, Andrus wrote: > Best solution is as follows: > > Planner must use index > > CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY ); > > for queries like > > SELECT * FROM foo WHERE bar::CHAR(3)='ABC'; Your query is the same as using LIKE, so why not express it that way? Is it that unreasonable that a PRIMARY KEY should use the most natural way to order strings for your locale and that if you want to use LIKE in non-C locales that you need to specify that explicitly? PRIMARY KEY == UNIQUE + NOT NULL Incidently, another way might be COLLATE support, something like: CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY COLLATE like_compatable ); But that's already on the cards. [1] http://gborg.postgresql.org/project/citext/projdisplay.php -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
>> SELECT * FROM foo WHERE bar::CHAR(3)='ABC'; > Your query is the same as using LIKE, so why not express it that way? I want simply to select by first 3 characters. LIKE is too powerful and unnessecary. LIKE requires escaping % and ? characters in pattern. I expected that making planner to use primary key index in case of WHERE bar::CHAR(3)='ABC' or in WHERE bar BETWEEN 'ABC' AND 'ABC' || CHR(255) or in WHERE SUBSTRING( bar FOR 3 )='ABC' is more simpler than dealing with locale problems in WHERE bar LIKE 'ABC%': Using LIKE with index optimization in Postgres in non-C locale requires a. creating non-SQL standard compatible index unsin operator class b. requires that primary key column has two indexes This is too much overhead. >Is it that unreasonable that a PRIMARY KEY should use the most natural > way to order strings for your locale This is very reasonable. PRIMARY KEY must use locale order always. > and that if you want to use LIKE > in non-C locales that you need to specify that explicitly? This is unreasonable. If I use SQL standard way to create table CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY ); and use SQL standard WHERE clause WHERE bar LIKE 'ABC%' or WHERE bar::CHAR(3)='ABC' I expect that primary key index can be used without non-standard extensions to SQL language Andrus.
"Andrus" <eetasoft@online.ee> writes: > Planner must use index > CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY ); > for queries like > SELECT * FROM foo WHERE bar::CHAR(3)='ABC'; > Can you make this patch or add this to todo list ? This isn't going to happen for exactly the same reason that LIKE optimization doesn't happen (in non-C locales): the sort order of non-C indexes isn't compatible with testing a prefix of the string. regards, tom lane
On 11/11/05, Andrus <eetasoft@online.ee> wrote: > >> SELECT * FROM foo WHERE bar::CHAR(3)='ABC'; > > > Your query is the same as using LIKE, so why not express it that way? > > I want simply to select by first 3 characters. LIKE is too powerful and > unnessecary. LIKE requires > escaping % and ? characters in pattern. > > I expected that making planner to use primary key index in case of > > WHERE bar::CHAR(3)='ABC' > > or in > > WHERE bar BETWEEN 'ABC' AND 'ABC' || CHR(255) > > or in > > WHERE SUBSTRING( bar FOR 3 )='ABC' > > is more simpler than dealing with locale problems in WHERE bar LIKE 'ABC%': > > Using LIKE with index optimization in Postgres in non-C locale requires > > a. creating non-SQL standard compatible index unsin operator class > are indexes in the SQL-STANDARD? i touhgt we can do with indexes what we think is better... like other database has its own implementation details about indexes... > b. requires that primary key column has two indexes > > This is too much overhead. > it requires two indexes, yes... but one of them can be a normal (accept duplicates) not unique one... just let the primary key create its index and create the one you needs without the unique clause > >Is it that unreasonable that a PRIMARY KEY should use the most natural > > way to order strings for your locale > > This is very reasonable. PRIMARY KEY must use locale order always. > > > and that if you want to use LIKE > > in non-C locales that you need to specify that explicitly? > > This is unreasonable. > Maybe, but is unfair for your side to said that without actually look at the code and the problems... maybe, you want to look at the code and fix what you think is wrong? -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
On Fri, 11 Nov 2005, Andrus wrote: > >> SELECT * FROM foo WHERE bar::CHAR(3)='ABC'; > > > Your query is the same as using LIKE, so why not express it that way? > > I want simply to select by first 3 characters. LIKE is too powerful and > unnessecary. LIKE requires > escaping % and ? characters in pattern. > > I expected that making planner to use primary key index in case of > > WHERE bar::CHAR(3)='ABC' > WHERE bar BETWEEN 'ABC' AND 'ABC' || CHR(255) > WHERE SUBSTRING( bar FOR 3 )='ABC' Which doesn't necessarily work because of how some collations may work. The first 3 characters of the string may sort to a fundamentally different place than the first 4 characters. You can have a collation where 'ABC' < 'ABCR' < 'ABCT' < 'ABD' < 'ABZ' < 'ABCS'. In the above, the ABCS value shortens so it should be found, however it's sort position in the index is outside the normal ABC range. AFAIK the interfaces being used don't give enough information to detect such locales to handle them differently.
>> I expected that making planner to use primary key index in case of >> >> WHERE bar::CHAR(3)='ABC' >> WHERE bar BETWEEN 'ABC' AND 'ABC' || CHR(255) >> WHERE SUBSTRING( bar FOR 3 )='ABC' > > Which doesn't necessarily work because of how some collations may work. > The first 3 characters of the string may sort to a fundamentally different > place than the first 4 characters. You can have a collation where > 'ABC' < 'ABCR' < 'ABCT' < 'ABD' < 'ABZ' < 'ABCS'. In the above, the ABCS > value shortens so it should be found, however it's sort position in the > index is outside the normal ABC range. Thank you. I missed this since this does not occur in my locale. Now I understood that cryptics text_pattern_ops, varchar_pattern_ops, bpchar_pattern_ops actually mean "binary_sort_order". > AFAIK the interfaces being used > don't give enough information to detect such locales to handle them > differently. Maybe there is some setting which I can use to inform Postgres that sorting order is strictly by single byte, not affected by byte sequences. Maybe this information can be get from Windows API or from iconv lc_collate='C' assumes also binary order which is to restrictive. lc_collate shoud be divided into two: lc_collate_order and lc_collate_multibyte . lc_collate_multibyte has values true and false. In this case primary key for LIKE comparison can be used in some non-C locales. My desicion based on this thread for my coding is: 1. Write all partial match queries using LIKE operator like foo LIKE 'ABC%' 2. When data access becomes slow, create duplicate primary key index using text_pattern_ops or change only lc_collate to C by retaining all other settings non-C locale. I'm afraid that my users use % and ? characters in data. Using LIKE requires escaping % and ? characters in search patterns. It should be nicer to use foo:CHAR(3)='ABC' type expressions which do not require ? and % escaping. I understand that Postgres cannot use any index to speed up searches like foo:CHAR(3)='ABC' which does not use regular expression match operators. So LIKE with escaping is the only way. Andrus.
"Andrus" <eetasoft@online.ee> writes: > Maybe there is some setting which I can use to inform Postgres that sorting > order is strictly by single byte, not affected by byte sequences. AFAIK, C/POSIX is the *only* commonly used locale in which that holds. Do you have a counterexample? (Hint: all the other ones use dictionary sorting rules, which have at least a discrimination against spaces.) regards, tom lane
> (Hint: all the other ones use dictionary > sorting rules, which have at least a discrimination against spaces.) Tom, thank you. I ran the following code in 8.1 show lc_collate; -- returns "Estonian_Estonia.1257" create temp table foo ( bar char(10) ) on commit drop ; insert into foo values ('A'); insert into foo values ('A C'); -- two spaces insert into foo values ('A B'); -- single space insert into foo values ('A C'); -- single space select * from foo order by bar; and got "A " "A C " "A B " "A C " I don't see any space discrimination on sorting here. I sorted the same data in Microsoft Word and got the same result. Andrus.
"Andrus" <eetasoft@online.ee> writes: >> (Hint: all the other ones use dictionary >> sorting rules, which have at least a discrimination against spaces.) > show lc_collate; -- returns "Estonian_Estonia.1257" > ... > I don't see any space discrimination on sorting here. Hmm ... I should probably have qualified my statement as "all the locales generally used on Unix systems". I have no idea what the situation is like on Windoze. Anyone have some data about the sort rules used by common Windows locales? regards, tom lane