Thread: Query and index ... unexpected result need advice.
Hello, Yesterday when I read the email I came across an issue and I thought very interesting result. The topic is: "Table with million rows - and PostgreSQL 9.1 is not using the index". I decided to try it because from 250 ms with millions rows to 15 ms is very good, but I did not get the expected result, rather worse. I am interested to know where is my mistake or something wrong with server which I doubt. Here is my current query with explain: (I change names to XXX YYY ZZZ because original names is written on CP1251 and most ppl in list can't read them) db=# explain analyze SELECT *,COALESCE(firstname,'') || ' ' || COALESCE(middlename,'') || ' ' || COALESCE(lastname, '') AS name FROM clients_tbl WHERE firstname = 'XXX' AND middlename || lastname LIKE '%YYY%ZZZ%'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on clients_tbl (cost=230.90..32648.50 rows=1 width=602) (actual time=12.649..35.919 rows=1 loops=1) Recheck Cond: (firstname = 'XXX'::text) Filter: ((middlename || lastname) ~~ '%YYY%ZZZ%'::text) Rows Removed by Filter: 11727 -> Bitmap Index Scan on clients_tbl_firstname_idx (cost=0.00..230.90 rows=11886 width=0) (actual time=5.415..5.415 rows=11728 loops=1) Index Cond: (firstname = 'XXX'::text) Total runtime: 35.988 ms (7 rows) 35 ms isn't bad, but it's will be good if I can optimize it more. firstname, middlename, lastname is declarated as TEXT; create index clients_tbl_firstname_idx on clients_tbl using btree (firstname); create index clients_tbl_middlename_idx on clients_tbl using btree (middlename); create index clients_tbl_lastname_idx on clients_tbl using btree (lastname); I dropped both indexes and create new one: create index clients_tbl_firstname_idx on clients_tbl using btree (firstname COLLATE "bg_BG" text_pattern_ops); create index clients_tbl_middlename_idx on clients_tbl using btree (middlename COLLATE "bg_BG" text_pattern_ops); create index clients_tbl_lastname_idx on clients_tbl using btree (lastname COLLATE "bg_BG" text_pattern_ops); My server is in CP1251 encoding: List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description ------------+----------+----------+--------------+--------------+-----------------------+---------+------------+-------------------------------------------- db | postgres | WIN1251 | bg_BG.CP1251 | bg_BG.CP1251 | =Tc/postgres +| 121 GB | pg_default | I run the same query again: db=# explain analyze SELECT *,COALESCE(firstname,'') || ' ' || COALESCE(middlename,'') || ' ' || COALESCE(lastname, '') AS name FROM clients_tbl WHERE firstname = 'XXX' AND middlename || lastname LIKE '%YYY%ZZZ%'; QUERY PLAN --------------------------------------------------------------------------------------------------------------- Seq Scan on clients_tbl (cost=0.00..105444.47 rows=1 width=602) (actual time=56.343..381.068 rows=1 loops=1) Filter: ((firstname = 'XXX'::text) AND ((middlename || lastname) ~~ '%YYY%ZZZ%'::text)) Rows Removed by Filter: 1279568 Total runtime: 381.137 ms (4 rows) 381 ms ... Any one have ides ? Thanks, H.S.
On Sat, Dec 8, 2012 at 5:54 AM, Condor <condor@stz-bg.com> wrote: > I am interested to know where is my mistake or something wrong > with server which I doubt. Here is my current query with explain: > (I change names to XXX YYY ZZZ because original names is written on CP1251 > and most ppl in list can't read them) > > db=# explain analyze SELECT *,COALESCE(firstname,'') || ' ' || > COALESCE(middlename,'') || ' ' || COALESCE(lastname, '') AS name FROM > clients_tbl WHERE firstname = 'XXX' AND middlename || lastname LIKE > '%YYY%ZZZ%'; What is the meaning/purpose of the "middlename || lastname LIKE '%YYY%ZZZ%'" ? At least in my culture, that doesn't seem like a sensible thing to do. Is it trying to compensate for some known dirtiness in the data that has not yet been cleaned up? In any event, in order to benefit from an index on that query, you would need to create an index on the concatenated columns, not on the individual columns. create index on clients_tbl ((middlename||lastname) text_pattern_ops); But that still won't work because your patterns starts with a wild card, and that type of pattern cannot benefit from btree indexes. ... > > I dropped both indexes and create new one: > > create index clients_tbl_firstname_idx on clients_tbl using btree (firstname > COLLATE "bg_BG" text_pattern_ops); I don't understand why that is legal. I would think that text_pattern_ops implies something that contradicts COLLATE "bg_BG". In any event, the inclusion of both of those seems to prevent the index from being used for equality, while the inclusion of just one or the other property does not. (That is why the query got slower.) Since firstname is used as equality in your example, there is no reason to change this index to "text_pattern_ops" in order to support your example. Cheers, Jeff
On 2012-12-10 00:31, Jeff Janes wrote: > On Sat, Dec 8, 2012 at 5:54 AM, Condor <condor@stz-bg.com> wrote: > >> I am interested to know where is my mistake or something wrong >> with server which I doubt. Here is my current query with explain: >> (I change names to XXX YYY ZZZ because original names is written on >> CP1251 >> and most ppl in list can't read them) >> >> db=# explain analyze SELECT *,COALESCE(firstname,'') || ' ' || >> COALESCE(middlename,'') || ' ' || COALESCE(lastname, '') AS name >> FROM >> clients_tbl WHERE firstname = 'XXX' AND middlename || lastname LIKE >> '%YYY%ZZZ%'; > > What is the meaning/purpose of the "middlename || lastname LIKE > '%YYY%ZZZ%'" ? > > At least in my culture, that doesn't seem like a sensible thing to > do. > Is it trying to compensate for some known dirtiness in the data that > has not yet been cleaned up? > > In any event, in order to benefit from an index on that query, you > would need to create an index on the concatenated columns, not on the > individual columns. > > create index on clients_tbl ((middlename||lastname) > text_pattern_ops); > > But that still won't work because your patterns starts with a wild > card, and that type of pattern cannot benefit from btree indexes. > > > ... The point is that the first server should fulfill the condition which is equal sign and then move on to the rest condition. I can use it as a above example or query like bellow: SELECT *,COALESCE(firstname,'') || ' ' || COALESCE(middlename,'') || ' ' || COALESCE(lastname, '') AS name FROM clients_tbl WHERE firstname = 'XXX' AND middlename || lastname LIKE '%ZZZ%'; In this case I don't know where actually is ZZZ in the middle or in lastname because that is the input. Also can be: SELECT *,COALESCE(firstname,'') || ' ' || COALESCE(middlename,'') || ' ' || COALESCE(lastname, '') AS name FROM clients_tbl WHERE firstname = 'XXX' AND middlename || lastname LIKE '%Y%ZZZ%'; First part of the middle name only Y not YYY full middle name. And it's work fine. >> >> I dropped both indexes and create new one: >> >> create index clients_tbl_firstname_idx on clients_tbl using btree >> (firstname >> COLLATE "bg_BG" text_pattern_ops); > > I don't understand why that is legal. I would think that > text_pattern_ops implies something that contradicts COLLATE "bg_BG". > In any event, the inclusion of both of those seems to prevent the > index from being used for equality, while the inclusion of just one > or > the other property does not. (That is why the query got slower.) > I was thinking when I add COLLATE "bg_BG" text_pattern_ops it's will help to indexer to understand that data there is in specific encoding and will speed up like clause. When i make index like: create index on clients_tbl (middlename text_pattern_ops); or create index on clients_tbl (firstname text_pattern_ops); there is not different result ... 35 ms but I expect to dropped from 35 to 20 or 10 ms :) > Since firstname is used as equality in your example, there is no > reason to change this index to "text_pattern_ops" in order to support > your example. > Understand that, but if I need to do like in firstname what is the solution ? To make two indexes one with "text_pattern_ops" other without it ? > Cheers, > > Jeff Regards, H.S.
On Sun, Dec 9, 2012 at 10:59 PM, Condor <condor@stz-bg.com> wrote: > On 2012-12-10 00:31, Jeff Janes wrote: >> >> On Sat, Dec 8, 2012 at 5:54 AM, Condor <condor@stz-bg.com> wrote: >> >>> create index clients_tbl_firstname_idx on clients_tbl using btree >>> (firstname >>> COLLATE "bg_BG" text_pattern_ops); >> >> >> I don't understand why that is legal. I would think that >> text_pattern_ops implies something that contradicts COLLATE "bg_BG". >> In any event, the inclusion of both of those seems to prevent the >> index from being used for equality, while the inclusion of just one or >> the other property does not. (That is why the query got slower.) >> > > I was thinking when I add COLLATE "bg_BG" text_pattern_ops it's will help to > indexer to understand that data there is in specific encoding and > will speed up like clause. The text_pattern_ops tells it to use an collation which supports (some) like clauses, while COLLATE "bg_BG" tells it to use that named collation. I think that text_pattern_ops is almost identical to COLLATE "C". But COLLATE was not possible until 9.1 while the op_class has been around for much longer. >> Since firstname is used as equality in your example, there is no >> reason to change this index to "text_pattern_ops" in order to support >> your example. >> > > Understand that, but if I need to do like in firstname what is the solution? By experimentation, if you just do text_pattern_ops, then that supports both equality and LIKE. It will not support <, >, between. > To make two indexes one with "text_pattern_ops" other without it ? Yes. This is what the documentation recommends. It is sometimes not necessary, but I know of no way to determine when it is needed, other than experimentation with the exact encoding/collation you have and the types of queries you want to support. Cheers, Jeff