Thread: Query and index ... unexpected result need advice.

Query and index ... unexpected result need advice.

From
Condor
Date:
Hello,
Yesterday when I read the email I came across an issue and I thought
very interesting result. The topic is: "Table with million rows - and
PostgreSQL 9.1 is not using the index". I decided to try it because from
250 ms with millions rows to 15 ms is very good, but I did not get the
expected result, rather worse. I am interested to know where is my
mistake or something wrong with server which I doubt. Here is my current
query with explain:
(I change names to XXX YYY ZZZ because original names is written on
CP1251 and most ppl in list can't read them)

db=# explain analyze SELECT *,COALESCE(firstname,'') || ' ' ||
COALESCE(middlename,'') || ' ' || COALESCE(lastname, '') AS name FROM
clients_tbl WHERE firstname = 'XXX' AND middlename || lastname LIKE
'%YYY%ZZZ%';
                                                                QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on clients_tbl  (cost=230.90..32648.50 rows=1
width=602) (actual time=12.649..35.919 rows=1 loops=1)
    Recheck Cond: (firstname = 'XXX'::text)
    Filter: ((middlename || lastname) ~~ '%YYY%ZZZ%'::text)
    Rows Removed by Filter: 11727
    ->  Bitmap Index Scan on clients_tbl_firstname_idx
(cost=0.00..230.90 rows=11886 width=0) (actual time=5.415..5.415
rows=11728 loops=1)
          Index Cond: (firstname = 'XXX'::text)
  Total runtime: 35.988 ms
(7 rows)

35 ms isn't bad, but it's will be good if I can optimize it more.
firstname, middlename, lastname is declarated as TEXT;

create index clients_tbl_firstname_idx on clients_tbl using btree
(firstname);
create index clients_tbl_middlename_idx on clients_tbl using btree
(middlename);
create index clients_tbl_lastname_idx on clients_tbl using btree
(lastname);

I dropped both indexes and create new one:

create index clients_tbl_firstname_idx on clients_tbl using btree
(firstname COLLATE "bg_BG" text_pattern_ops);
create index clients_tbl_middlename_idx on clients_tbl using btree
(middlename COLLATE "bg_BG" text_pattern_ops);
create index clients_tbl_lastname_idx on clients_tbl using btree
(lastname COLLATE "bg_BG" text_pattern_ops);

My server is in CP1251 encoding:


List of databases
     Name    |  Owner   | Encoding |   Collate    |    Ctype     |
Access privileges   |  Size   | Tablespace |                Description

------------+----------+----------+--------------+--------------+-----------------------+---------+------------+--------------------------------------------
   db        | postgres | WIN1251  | bg_BG.CP1251 | bg_BG.CP1251 |
=Tc/postgres         +| 121 GB  | pg_default |

I run the same query again:

db=# explain analyze SELECT *,COALESCE(firstname,'') || ' ' ||
COALESCE(middlename,'') || ' ' || COALESCE(lastname, '') AS name FROM
clients_tbl WHERE firstname = 'XXX' AND middlename || lastname LIKE
'%YYY%ZZZ%';
                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------
  Seq Scan on clients_tbl  (cost=0.00..105444.47 rows=1 width=602)
(actual time=56.343..381.068 rows=1 loops=1)
    Filter: ((firstname = 'XXX'::text) AND ((middlename || lastname) ~~
'%YYY%ZZZ%'::text))
    Rows Removed by Filter: 1279568
  Total runtime: 381.137 ms
(4 rows)

381 ms ...

Any one have ides ?

Thanks,
H.S.


Re: Query and index ... unexpected result need advice.

From
Jeff Janes
Date:
On Sat, Dec 8, 2012 at 5:54 AM, Condor <condor@stz-bg.com> wrote:

> I am interested to know where is my mistake or something wrong
> with server which I doubt. Here is my current query with explain:
> (I change names to XXX YYY ZZZ because original names is written on CP1251
> and most ppl in list can't read them)
>
> db=# explain analyze SELECT *,COALESCE(firstname,'') || ' ' ||
> COALESCE(middlename,'') || ' ' || COALESCE(lastname, '') AS name FROM
> clients_tbl WHERE firstname = 'XXX' AND middlename || lastname LIKE
> '%YYY%ZZZ%';

What is the meaning/purpose of the "middlename || lastname LIKE '%YYY%ZZZ%'" ?

At least in my culture, that doesn't seem like a sensible thing to do.
 Is it trying to compensate for some known dirtiness in the data that
has not yet been cleaned up?

In any event, in order to benefit from an index on that query, you
would need to create an index on the concatenated columns, not on the
individual columns.

create index on clients_tbl ((middlename||lastname) text_pattern_ops);

But that still won't work because your patterns starts with a wild
card, and that type of pattern cannot benefit from btree indexes.


...
>
> I dropped both indexes and create new one:
>
> create index clients_tbl_firstname_idx on clients_tbl using btree (firstname
> COLLATE "bg_BG" text_pattern_ops);

I don't understand why that is legal.  I would think that
text_pattern_ops implies something that contradicts COLLATE "bg_BG".
In any event, the inclusion of both of those seems to prevent the
index from being used for equality, while the inclusion of just one or
the other property does not.  (That is why the query got slower.)

Since firstname is used as equality in your example, there is no
reason to change this index to "text_pattern_ops" in order to support
your example.

Cheers,

Jeff


Re: Query and index ... unexpected result need advice.

From
Condor
Date:
On 2012-12-10 00:31, Jeff Janes wrote:
> On Sat, Dec 8, 2012 at 5:54 AM, Condor <condor@stz-bg.com> wrote:
>
>> I am interested to know where is my mistake or something wrong
>> with server which I doubt. Here is my current query with explain:
>> (I change names to XXX YYY ZZZ because original names is written on
>> CP1251
>> and most ppl in list can't read them)
>>
>> db=# explain analyze SELECT *,COALESCE(firstname,'') || ' ' ||
>> COALESCE(middlename,'') || ' ' || COALESCE(lastname, '') AS name
>> FROM
>> clients_tbl WHERE firstname = 'XXX' AND middlename || lastname LIKE
>> '%YYY%ZZZ%';
>
> What is the meaning/purpose of the "middlename || lastname LIKE
> '%YYY%ZZZ%'" ?
>
> At least in my culture, that doesn't seem like a sensible thing to
> do.
>  Is it trying to compensate for some known dirtiness in the data that
> has not yet been cleaned up?
>
> In any event, in order to benefit from an index on that query, you
> would need to create an index on the concatenated columns, not on the
> individual columns.
>
> create index on clients_tbl ((middlename||lastname)
> text_pattern_ops);
>
> But that still won't work because your patterns starts with a wild
> card, and that type of pattern cannot benefit from btree indexes.
>
>
> ...

The point is that the first server should fulfill the condition which
is equal sign and then move on to the rest condition. I can use it as a
above example or query like bellow:
SELECT *,COALESCE(firstname,'') || ' ' || COALESCE(middlename,'') || '
' || COALESCE(lastname, '') AS name FROM clients_tbl WHERE firstname =
'XXX' AND middlename || lastname LIKE
  '%ZZZ%';
In this case I don't know where actually is ZZZ in the middle or in
lastname because that is the input. Also can be:

SELECT *,COALESCE(firstname,'') || ' ' || COALESCE(middlename,'') || '
' || COALESCE(lastname, '') AS name FROM clients_tbl WHERE firstname =
'XXX' AND middlename || lastname LIKE
  '%Y%ZZZ%';

First part of the middle name only Y not YYY full middle name.

And it's work fine.


>>
>> I dropped both indexes and create new one:
>>
>> create index clients_tbl_firstname_idx on clients_tbl using btree
>> (firstname
>> COLLATE "bg_BG" text_pattern_ops);
>
> I don't understand why that is legal.  I would think that
> text_pattern_ops implies something that contradicts COLLATE "bg_BG".
> In any event, the inclusion of both of those seems to prevent the
> index from being used for equality, while the inclusion of just one
> or
> the other property does not.  (That is why the query got slower.)
>

I was thinking when I add COLLATE "bg_BG" text_pattern_ops it's will
help to indexer to understand that data there is in specific encoding
and
will speed up like clause. When i make index like:

create index on clients_tbl (middlename text_pattern_ops);
or
create index on clients_tbl (firstname text_pattern_ops);

there is not different result ... 35 ms but I expect to dropped from 35
to 20 or 10 ms :)


> Since firstname is used as equality in your example, there is no
> reason to change this index to "text_pattern_ops" in order to support
> your example.
>

Understand that, but if I need to do like in firstname what is the
solution ?
To make two indexes one with "text_pattern_ops" other without it ?

> Cheers,
>
> Jeff


Regards,
H.S.


Re: Query and index ... unexpected result need advice.

From
Jeff Janes
Date:
On Sun, Dec 9, 2012 at 10:59 PM, Condor <condor@stz-bg.com> wrote:
> On 2012-12-10 00:31, Jeff Janes wrote:
>>
>> On Sat, Dec 8, 2012 at 5:54 AM, Condor <condor@stz-bg.com> wrote:
>>

>>> create index clients_tbl_firstname_idx on clients_tbl using btree
>>> (firstname
>>> COLLATE "bg_BG" text_pattern_ops);
>>
>>
>> I don't understand why that is legal.  I would think that
>> text_pattern_ops implies something that contradicts COLLATE "bg_BG".
>> In any event, the inclusion of both of those seems to prevent the
>> index from being used for equality, while the inclusion of just one or
>> the other property does not.  (That is why the query got slower.)
>>
>
> I was thinking when I add COLLATE "bg_BG" text_pattern_ops it's will help to
> indexer to understand that data there is in specific encoding and
> will speed up like clause.

The text_pattern_ops tells it to use an collation which supports
(some) like clauses, while COLLATE "bg_BG" tells it to use that named
collation.

I think that text_pattern_ops is almost identical to COLLATE "C".  But
COLLATE was not possible until 9.1 while the op_class has been around
for much longer.


>> Since firstname is used as equality in your example, there is no
>> reason to change this index to "text_pattern_ops" in order to support
>> your example.
>>
>
> Understand that, but if I need to do like in firstname what is the solution?

By experimentation, if you just do text_pattern_ops, then that
supports both equality and LIKE.  It will not support <, >, between.

> To make two indexes one with "text_pattern_ops" other without it ?

Yes.  This is what the documentation recommends.  It is sometimes not
necessary, but I know of no way to determine when it is needed, other
than experimentation with the exact encoding/collation you have and
the types of queries you want to support.

Cheers,

Jeff