Thread: [GENERAL] Special index for "like"-based query

[GENERAL] Special index for "like"-based query

From

Job

Date:

30 December 2016, 02:21:11

Hello,

in Postgresql 9.6 we have a query running on a very large table based, in some cases, on a like statement:

... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'

Which type of index can i create to speed to the search when the "like" case happens?

Thank you!
/F

Re: [GENERAL] Special index for "like"-based query

From

"David G. Johnston"

Date:

30 December 2016, 02:33:59

On Thu, Dec 29, 2016 at 4:21 PM, Job <Job@colliniconsulting.it> wrote:

Hello,

in Postgresql 9.6 we have a query running on a very large table based, in some cases, on a like statement:

... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'

Which type of index can i create to speed to the search when the "like" case happens?

GIST

https://www.postgresql.org/docs/9.6/static/pgtrgm.html

https://www.postgresql.org/docs/9.6/static/btree-gist.html

David J.

Re: [GENERAL] Special index for "like"-based query

From

Tomas Vondra

Date:

30 December 2016, 02:38:57

On 12/30/2016 12:33 AM, David G. Johnston wrote:
> On Thu, Dec 29, 2016 at 4:21 PM, Job <Job@colliniconsulting.it
> <mailto:Job@colliniconsulting.it>>wrote:
>
>     Hello,
>
>     in Postgresql 9.6 we have a query running on a very large table
>     based, in some cases, on a like statement:
>
>     ... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'
>
>     Which type of index can i create to speed to the search when the
>     "like" case happens?
>
>
> GIST
>
> https://www.postgresql.org/docs/9.6/static/pgtrgm.html
>
> https://www.postgresql.org/docs/9.6/static/btree-gist.html
> 

For prefix queries, it's also possible to use simple btree index with
varchar_pattern_ops.

https://www.postgresql.org/docs/9.6/static/indexes-opclass.html


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [GENERAL] Special index for "like"-based query

From

"David G. Johnston"

Date:

30 December 2016, 02:46:19

On Thu, Dec 29, 2016 at 4:38 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 12/30/2016 12:33 AM, David G. Johnston wrote:
On Thu, Dec 29, 2016 at 4:21 PM, Job <Job@colliniconsulting.it
<mailto:Job@colliniconsulting.it>>wrote:

Hello,

in Postgresql 9.6 we have a query running on a very large table
based, in some cases, on a like statement:

... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'

Which type of index can i create to speed to the search when the
"like" case happens?

GIST

https://www.postgresql.org/docs/9.6/static/pgtrgm.html

https://www.postgresql.org/docs/9.6/static/btree-gist.html

For prefix queries, it's also possible to use simple btree index with varchar_pattern_ops.

https://www.postgresql.org/docs/9.6/static/indexes-opclass.html

Even knowing that this feature exists I don't know that I could have found it within a reasonable amount of time in its present location. A few cross-references from elsewhere (probably at least the functions part of the documentation) would make learning about the capability a lot easier.

David J.

Re: [GENERAL] Special index for "like"-based query

From

Tomas Vondra

Date:

30 December 2016, 02:51:07

On 12/30/2016 12:46 AM, David G. Johnston wrote:
> On Thu, Dec 29, 2016 at 4:38 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>>wrote:
>
>     On 12/30/2016 12:33 AM, David G. Johnston wrote:
>
>         On Thu, Dec 29, 2016 at 4:21 PM, Job <Job@colliniconsulting.it
>         <mailto:Job@colliniconsulting.it>
>         <mailto:Job@colliniconsulting.it
>         <mailto:Job@colliniconsulting.it>>>wrote:
>
>             Hello,
>
>             in Postgresql 9.6 we have a query running on a very large table
>             based, in some cases, on a like statement:
>
>             ... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'
>
>             Which type of index can i create to speed to the search when the
>             "like" case happens?
>
>
>         GIST
>
>         https://www.postgresql.org/docs/9.6/static/pgtrgm.html
>         <https://www.postgresql.org/docs/9.6/static/pgtrgm.html>
>
>         https://www.postgresql.org/docs/9.6/static/btree-gist.html
>         <https://www.postgresql.org/docs/9.6/static/btree-gist.html>
>         
>
>
>     For prefix queries, it's also possible to use simple btree index
>     with varchar_pattern_ops.
>
>     https://www.postgresql.org/docs/9.6/static/indexes-opclass.html
>     <https://www.postgresql.org/docs/9.6/static/indexes-opclass.html>
>
>
> Even knowing that this feature exists I don't know that I could have
> found it within a reasonable amount of time in its present location.  A
> few cross-references from elsewhere (probably at least the functions
> part of the documentation) would make learning about the capability a
> lot easier.
>

Well, it's referenced right from the "Indexes" part of the documentation
(right at the beginning of "Index Types"):

https://www.postgresql.org/docs/9.6/static/indexes.html

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [GENERAL] Special index for "like"-based query

From

"David G. Johnston"

Date:

30 December 2016, 04:10:42

On Thu, Dec 29, 2016 at 4:51 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 12/30/2016 12:46 AM, David G. Johnston wrote:
On Thu, Dec 29, 2016 at 4:38 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>>wrote:

On 12/30/2016 12:33 AM, David G. Johnston wrote:

On Thu, Dec 29, 2016 at 4:21 PM, Job <Job@colliniconsulting.it
<mailto:Job@colliniconsulting.it>
<mailto:Job@colliniconsulting.it
<mailto:Job@colliniconsulting.it>>>wrote:

Hello,

in Postgresql 9.6 we have a query running on a very large table
based, in some cases, on a like statement:

... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'

Which type of index can i create to speed to the search when the
"like" case happens?

GIST

https://www.postgresql.org/docs/9.6/static/pgtrgm.html
<https://www.postgresql.org/docs/9.6/static/pgtrgm.html>

https://www.postgresql.org/docs/9.6/static/btree-gist.html
<https://www.postgresql.org/docs/9.6/static/btree-gist.html>

For prefix queries, it's also possible to use simple btree index
with varchar_pattern_ops.

https://www.postgresql.org/docs/9.6/static/indexes-opclass.html
<https://www.postgresql.org/docs/9.6/static/indexes-opclass.html>

Even knowing that this feature exists I don't know that I could have
found it within a reasonable amount of time in its present location. A
few cross-references from elsewhere (probably at least the functions
part of the documentation) would make learning about the capability a
lot easier.

Well, it's referenced right from the "Indexes" part of the documentation (right at the beginning of "Index Types"):

https://www.postgresql.org/docs/9.6/static/indexes.html

While I may have an understanding of what operator classes and families are when I am in my SQL thinking mode those terms don't really come to mind. Maybe part of the problem is that SQL doesn't have indexes and so my formal education never covered them. I learned how to use "CREATE INDEX" to meet most common needs but the fact that I'm getting a b-tree family index is well hidden.

While I'm all for learning the theory a more prescriptive approach (do this to get an index that will allow prefix LIKEs to use it - see this section for detail) to the topic would be welcome. Tell the user how to use an index when they are learning about the feature that they care about - LIKE - not require them to learn all about indexes and later realize/remember that one particular incantation will solve the LIKE problem.

David J.

R: [GENERAL] Special index for "like"-based query

From

Job

Date:

30 December 2016, 12:55:34

>>GIST

>>https://www.postgresql.org/docs/9.6/static/pgtrgm.html

>>https://www.postgresql.org/docs/9.6/static/btree-gist.html

I tried with a GIST-like index and queries improves a lot, thank you!

Furthermore, this type of index is also suitable for a timestamp query, where we can mix date and time parameters?

Thank you again!
/F

Da: David G. Johnston [david.g.johnston@gmail.com]
Inviato: venerdì 30 dicembre 2016 0.33
A: Job
Cc: pgsql-general@postgresql.org
Oggetto: Re: [GENERAL] Special index for "like"-based query

On Thu, Dec 29, 2016 at 4:21 PM, Job <Job@colliniconsulting.it> wrote:

Hello,

in Postgresql 9.6 we have a query running on a very large table based, in some cases, on a like statement:

... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'

Which type of index can i create to speed to the search when the "like" case happens?

GIST

https://www.postgresql.org/docs/9.6/static/pgtrgm.html

https://www.postgresql.org/docs/9.6/static/btree-gist.html

David J.

R: [GENERAL] Special index for "like"-based query

From

Job

Date:

30 December 2016, 13:00:28

I tried to create a GIST/GIN index on a timestamp without time zone field but it does not work.

Are there alternatives index types or timezone could speed query up?

Thank you

Da: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] per conto di Job [Job@colliniconsulting.it]
Inviato: venerdì 30 dicembre 2016 10.55
A: David G. Johnston
Cc: pgsql-general@postgresql.org
Oggetto: R: [GENERAL] Special index for "like"-based query

>>GIST

>>https://www.postgresql.org/docs/9.6/static/pgtrgm.html

>>https://www.postgresql.org/docs/9.6/static/btree-gist.html

I tried with a GIST-like index and queries improves a lot, thank you!

Furthermore, this type of index is also suitable for a timestamp query, where we can mix date and time parameters?

Thank you again!
/F

On Thu, Dec 29, 2016 at 4:21 PM, Job <Job@colliniconsulting.it> wrote:

Hello,

in Postgresql 9.6 we have a query running on a very large table based, in some cases, on a like statement:

... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'

Which type of index can i create to speed to the search when the "like" case happens?

GIST

https://www.postgresql.org/docs/9.6/static/pgtrgm.html

https://www.postgresql.org/docs/9.6/static/btree-gist.html

David J.

Re: [GENERAL] Special index for "like"-based query

From

Francisco Olarte

Date:

30 December 2016, 13:18:29

On Fri, Dec 30, 2016 at 11:00 AM, Job <Job@colliniconsulting.it> wrote:
> I tried to create a GIST/GIN index on a timestamp without time zone field
> but it does not work.
> Are there alternatives index types or timezone could speed query up?

Remember a timestamp is just a real number ( a point on the time line
) with some fancy formatting for I/O ( or you will suffer ). This
menas when you have a ts column and want to query for a date it is
usually better to do [ts>='2016-12-29' and ts<'2016-12-13'] than doing
[cast(ts as date) = '2016-12-29'] ( similar to how a real number is
better queried as [r>=1.0 and r<2.0] than [int(r)=1] ). Normally you
get good results with btree indexes.

And, basically, if you need help with some queries you could try
posting them whole, even redacted, along the table defs, this way
perople can see the problem and not invent one based on a partial
description. I do not see any thing in common between 'like based
query' and timestmap columns.

Francisco Olarte.

R: [GENERAL] Special index for "like"-based query

From

Job

Date:

30 December 2016, 13:42:24

>>And, basically, if you need help with some queries you could try
>>posting them whole, even redacted, along the table defs, this way
>>perople can see the problem and not invent one based on a partial
>>description

Thank you very much, very kind from you.

The index applied on the timestamp field is a btree("timestamp")

The query is:

select domain, sum(accessi) as c_count from TABLE where action='1' AND profile IN ('PROFILE_CODE') AND timestamp::date
BETWEEN'2016-12-27' AND '2016-12-30' AND timestamp::time BETWEEN '00:00:00' AND '23:59:59' GROUP BY domain ORDER BY
c_countDESC  LIMIT 101 

The table format is:
  Column   |           Type           |                                   Modifiers
-----------+--------------------------+--------------------------------------------------------------------------------
 id        | numeric(1000,1)          | not null default function_get_next_sequence('webtraffic_archive_id_seq'::text)
 timestamp | timestamp with time zone |
 domain    | character varying(255)   |
 action    | character varying(5)     |
 profile   | character varying        |
 accessi   | bigint                   |
 url       | text                     |

Indexes:
    "webtraffic_archive_day_2016_04_15_action_wbidx" btree (action)
    "webtraffic_archive_day_2016_04_15_domain_wbidx" btree (domain)
    "webtraffic_archive_day_2016_04_15_profile_wbidx" btree (profile) CLUSTER
    "webtraffic_archive_day_2016_04_15_timestamp_wbidx" btree ("timestamp")

Last question: the table is partitioned. I need to manually create index for every sub-tables or there is a way to
createon every sub-tables once? 

THANK YOU!
/F

Re: [GENERAL] Special index for "like"-based query

From

"Charles Clavadetscher"

Date:

30 December 2016, 14:28:42

Hello

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Job
> Sent: Freitag, 30. Dezember 2016 11:42
> To: Francisco Olarte <folarte@peoplecall.com>
> Cc: David G. Johnston <david.g.johnston@gmail.com>; pgsql-general@postgresql.org
> Subject: R: [GENERAL] Special index for "like"-based query
>
> >>And, basically, if you need help with some queries you could try
> >>posting them whole, even redacted, along the table defs, this way
> >>perople can see the problem and not invent one based on a partial
> >>description
>
> Thank you very much, very kind from you.
>
> The index applied on the timestamp field is a btree("timestamp")
>
> The query is:
>
> select domain, sum(accessi) as c_count from TABLE where action='1' AND profile IN ('PROFILE_CODE') AND
> timestamp::date  BETWEEN '2016-12-27' AND '2016-12-30' AND timestamp::time BETWEEN '00:00:00' AND '23:59:59' GROUP
> BY domain ORDER BY c_count DESC  LIMIT 101
>
> The table format is:
>   Column   |           Type           |                                   Modifiers
> -----------+--------------------------+-----------------------------------------------------------------------------
> ---
>  id        | numeric(1000,1)          | not null default
> function_get_next_sequence('webtraffic_archive_id_seq'::text)
>  timestamp | timestamp with time zone |
>  domain    | character varying(255)   |
>  action    | character varying(5)     |
>  profile   | character varying        |
>  accessi   | bigint                   |
>  url       | text                     |
>
> Indexes:
>     "webtraffic_archive_day_2016_04_15_action_wbidx" btree (action)
>     "webtraffic_archive_day_2016_04_15_domain_wbidx" btree (domain)
>     "webtraffic_archive_day_2016_04_15_profile_wbidx" btree (profile) CLUSTER
>     "webtraffic_archive_day_2016_04_15_timestamp_wbidx" btree ("timestamp")

Is the index on timestamp used at all? The index on timestamp is built on timestamp, but you query using
timestamp::date.
You can check this using EXPLAIN.

Bye
Charles

>
> Last question: the table is partitioned. I need to manually create index for every sub-tables or there is a way to
> create on every sub-tables once?
>
> THANK YOU!
> /F
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Special index for "like"-based query

From

Alban Hertroys

Date:

30 December 2016, 15:01:57

> On 30 Dec 2016, at 11:42, Job <Job@colliniconsulting.it> wrote:
>
>>> And, basically, if you need help with some queries you could try
>>> posting them whole, even redacted, along the table defs, this way
>>> perople can see the problem and not invent one based on a partial
>>> description
>
> Thank you very much, very kind from you.
>
> The index applied on the timestamp field is a btree("timestamp")
>
> The query is:
>
> select domain, sum(accessi) as c_count from TABLE where action='1' AND profile IN ('PROFILE_CODE') AND
timestamp::date BETWEEN '2016-12-27' AND '2016-12-30' AND timestamp::time BETWEEN '00:00:00' AND '23:59:59' GROUP BY
domainORDER BY c_count DESC  LIMIT 101 
>
> The table format is:
>  Column   |           Type           |                                   Modifiers
>
-----------+--------------------------+--------------------------------------------------------------------------------
> id        | numeric(1000,1)          | not null default function_get_next_sequence('webtraffic_archive_id_seq'::text)
> timestamp | timestamp with time zone |
> domain    | character varying(255)   |
> action    | character varying(5)     |
> profile   | character varying        |
> accessi   | bigint                   |
> url       | text                     |
>
> Indexes:
>    "webtraffic_archive_day_2016_04_15_action_wbidx" btree (action)
>    "webtraffic_archive_day_2016_04_15_domain_wbidx" btree (domain)
>    "webtraffic_archive_day_2016_04_15_profile_wbidx" btree (profile) CLUSTER
>    "webtraffic_archive_day_2016_04_15_timestamp_wbidx" btree ("timestamp")
>
> Last question: the table is partitioned. I need to manually create index for every sub-tables or there is a way to
createon every sub-tables once? 

It's usually more efficient to cast the constants you're comparing to, than to cast a field value for each record in
theset. The exception to that is when you have an index on the casted field. 

In your case, since you're casting to date and time separately, and whole days even, it's probably more efficient to
combinethat into: 

… AND timestamp BETWEEN '2016-12-27 00:00:00'::timestamp with time zone AND '2016-12-30 23:59:59'::timestamp with time
zone... 

But even then, you're excluding items that fall in the second between the end date and the next day. The new range
typesare useful there, for example: 

… AND timestamp <@ '[2016-12-27 00:00:00, 2016-12-31 00:00:00)'::tsrange

The above isn't entirely correct, as tsrange uses timestamp without time zone, but you get the gist.

However, if those time ranges can have other values than '[00:00. 23:59]', then you probably need 2 indexes on that
timestampcolumn; one cast to date and one to time. Otherwise, you end up creating timestamp range filters for each day
inthe range in the query (which could still be the better approach). 


Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

R: [GENERAL] Special index for "like"-based query

From

Job

Date:

30 December 2016, 15:13:50

Dear Alban,

Regarding:

>>... AND timestamp BETWEEN '2016-12-27 00:00:00'::timestamp with time zone AND '2016-12-30 23:59:59'::timestamp with
timezone ...  

I think it is a very good approach, and i would like to try.
My table has got two different field for "starting" and "ending" timestamp values.

Shall i create an index together with the two field (starting and ending) with the "tsrange" statement?

Thank you again!

/F

R: [GENERAL] Special index for "like"-based query

From

Job

Date:

30 December 2016, 15:49:58

Hi Alban,

I was wrong: i have only one column: tsrarnge.

Which index can i create to use this statement fastly:
 ... AND timestamp <@ '[2016-12-27 00:00:00, 2016-12-31 00:00:00)'::tsrange

Thank you again!

/F

Re: [GENERAL] Special index for "like"-based query

From

Francisco Olarte

Date:

30 December 2016, 16:03:14

Job:

On Fri, Dec 30, 2016 at 1:01 PM, Alban Hertroys <haramrae@gmail.com> wrote:
>> On 30 Dec 2016, at 11:42, Job <Job@colliniconsulting.it> wrote:
...
>> The index applied on the timestamp field is a btree("timestamp")
...
>> select domain, sum(accessi) as c_count from TABLE where action='1' AND profile IN ('PROFILE_CODE') AND
timestamp::date BETWEEN '2016-12-27' AND '2016-12-30' AND timestamp::time BETWEEN '00:00:00' AND '23:59:59' GROUP BY
domainORDER BY c_count DESC  LIMIT 101 
...
>> Last question: the table is partitioned. I need to manually create index for every sub-tables or there is a way to
createon every sub-tables once? 

I think Alban missed this ( or I missed his response ). Yes, you need
to create the indexes for the partitions. When in doubt, create a
partition and \d+ it.

IIRC you can do 'create partition LIKE master INCLUDING indexes
INHERITS(master)', but you'll have to test. Anyway, this is normally
not too useful as the master table is normally indexless and kept
empty. I normally script the partition creation, and I woill recommend
doing that too.

Now onto the BETWEEN PROBLEM:

> It's usually more efficient to cast the constants you're comparing to, than to cast a field value for each record in
theset. The exception to that is when you have an index on the casted field. 
> In your case, since you're casting to date and time separately, and whole days even, it's probably more efficient to
combinethat into: 
> … AND timestamp BETWEEN '2016-12-27 00:00:00'::timestamp with time zone AND '2016-12-30 23:59:59'::timestamp with
timezone ... 

Alban is correct here, in both counts.

- DO NOT CAST THE COLUMNS, cast the constants ( unless you are
building a specialized index, you can build an index in cast(timestamp
as date), and it would be useful if you did a lot of queries ONLY ON
DATES ).

- DO NOT USE  CLOSED INTERVALS for real number-like columns ( remember
I told you timestamps are a point in the time line, so real-like ).
The man problems strives from the fact tht you cannot cover the real
line with non-overlapping CLOSED intervals, BETWEEN uses closed
intervals and subtle problems permeate from this fact. Math is a harsh
mistress.

( Even when working with integer-like numbers half-open intervals are
normally the best way to go in the not so short term, but between
seems so nice and natural and reads so well that even I use it where I
should not )

> But even then, you're excluding items that fall in the second between the end date and the next day. The new range
typesare useful there, for example: 
>
> … AND timestamp <@ '[2016-12-27 00:00:00, 2016-12-31 00:00:00)'::tsrange

And this is how it is solved with those new-fangled interval thingies
( I've been keying (ts>=xxx and ts <yyy), parens included, for so long
that I never remember those, but they are nearly the same. Probably
the optimizer splits it anyway.

> The above isn't entirely correct, as tsrange uses timestamp without time zone, but you get the gist.

Or use >=, < those work.

> However, if those time ranges can have other values than '[00:00. 23:59]', then you probably need 2 indexes on that
timestampcolumn; one cast to date and one to time. Otherwise, you end up creating timestamp range filters for each day
inthe range in the query (which could still be the better approach). 

Even if they have other values, single index on timestamp column is
the way to go if you only select single intervals. I mean, Xmas
morning ( data between 25 and 25 and time between 8:00 and 12:59 can
easiliy be selected by the interval [20161225T080000,
20161225T130000), but all the mornings in december can not ( although
a query with ts>='20160101' and ts <'20170101' and ts:time >='08:00'
and ts:time<'13:00' should work quite well, the first two condition
guide to an index scan and the rest is done with a filtering ).

Francisco Olarte.