Thread: [GENERAL] Special index for "like"-based query
Hello, in Postgresql 9.6 we have a query running on a very large table based, in some cases, on a like statement: ... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%' Which type of index can i create to speed to the search when the "like" case happens? Thank you! /F
Hello,
in Postgresql 9.6 we have a query running on a very large table based, in some cases, on a like statement:
... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'
Which type of index can i create to speed to the search when the "like" case happens?
GIST
On 12/30/2016 12:33 AM, David G. Johnston wrote: > On Thu, Dec 29, 2016 at 4:21 PM, Job <Job@colliniconsulting.it > <mailto:Job@colliniconsulting.it>>wrote: > > Hello, > > in Postgresql 9.6 we have a query running on a very large table > based, in some cases, on a like statement: > > ... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%' > > Which type of index can i create to speed to the search when the > "like" case happens? > > > GIST > > https://www.postgresql.org/docs/9.6/static/pgtrgm.html > > https://www.postgresql.org/docs/9.6/static/btree-gist.html > For prefix queries, it's also possible to use simple btree index with varchar_pattern_ops. https://www.postgresql.org/docs/9.6/static/indexes-opclass.html regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/30/2016 12:33 AM, David G. Johnston wrote:On Thu, Dec 29, 2016 at 4:21 PM, Job <Job@colliniconsulting.it
<mailto:Job@colliniconsulting.it>>wrote:
Hello,
in Postgresql 9.6 we have a query running on a very large table
based, in some cases, on a like statement:
... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'
Which type of index can i create to speed to the search when the
"like" case happens?
GIST
https://www.postgresql.org/docs/9.6/static/pgtrgm.html
https://www.postgresql.org/docs/9.6/static/btree-gist.html
For prefix queries, it's also possible to use simple btree index with varchar_pattern_ops.
https://www.postgresql.org/docs/9.6/static/indexes-opclass. html
Even knowing that this feature exists I don't know that I could have found it within a reasonable amount of time in its present location. A few cross-references from elsewhere (probably at least the functions part of the documentation) would make learning about the capability a lot easier.
David J.
On 12/30/2016 12:46 AM, David G. Johnston wrote: > On Thu, Dec 29, 2016 at 4:38 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>>wrote: > > On 12/30/2016 12:33 AM, David G. Johnston wrote: > > On Thu, Dec 29, 2016 at 4:21 PM, Job <Job@colliniconsulting.it > <mailto:Job@colliniconsulting.it> > <mailto:Job@colliniconsulting.it > <mailto:Job@colliniconsulting.it>>>wrote: > > Hello, > > in Postgresql 9.6 we have a query running on a very large table > based, in some cases, on a like statement: > > ... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%' > > Which type of index can i create to speed to the search when the > "like" case happens? > > > GIST > > https://www.postgresql.org/docs/9.6/static/pgtrgm.html > <https://www.postgresql.org/docs/9.6/static/pgtrgm.html> > > https://www.postgresql.org/docs/9.6/static/btree-gist.html > <https://www.postgresql.org/docs/9.6/static/btree-gist.html> > > > > For prefix queries, it's also possible to use simple btree index > with varchar_pattern_ops. > > https://www.postgresql.org/docs/9.6/static/indexes-opclass.html > <https://www.postgresql.org/docs/9.6/static/indexes-opclass.html> > > > Even knowing that this feature exists I don't know that I could have > found it within a reasonable amount of time in its present location. A > few cross-references from elsewhere (probably at least the functions > part of the documentation) would make learning about the capability a > lot easier. > Well, it's referenced right from the "Indexes" part of the documentation (right at the beginning of "Index Types"): https://www.postgresql.org/docs/9.6/static/indexes.html regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/30/2016 12:46 AM, David G. Johnston wrote:On Thu, Dec 29, 2016 at 4:38 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>>wrote:
On 12/30/2016 12:33 AM, David G. Johnston wrote:
On Thu, Dec 29, 2016 at 4:21 PM, Job <Job@colliniconsulting.it
<mailto:Job@colliniconsulting.it>
<mailto:Job@colliniconsulting.it
<mailto:Job@colliniconsulting.it>>>wrote:
Hello,
in Postgresql 9.6 we have a query running on a very large table
based, in some cases, on a like statement:
... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'
Which type of index can i create to speed to the search when the
"like" case happens?
GIST
https://www.postgresql.org/docs/9.6/static/pgtrgm.html
<https://www.postgresql.org/docs/9.6/static/pgtrgm.html>
https://www.postgresql.org/docs/9.6/static/btree-gist.html
<https://www.postgresql.org/docs/9.6/static/btree-gist.html>
For prefix queries, it's also possible to use simple btree index
with varchar_pattern_ops.
https://www.postgresql.org/docs/9.6/static/indexes-opclass. html
<https://www.postgresql.org/docs/9.6/static/indexes-opclass. html>
Even knowing that this feature exists I don't know that I could have
found it within a reasonable amount of time in its present location. A
few cross-references from elsewhere (probably at least the functions
part of the documentation) would make learning about the capability a
lot easier.
Well, it's referenced right from the "Indexes" part of the documentation (right at the beginning of "Index Types"):
https://www.postgresql.org/docs/9.6/static/indexes.html
While I may have an understanding of what operator classes and families are when I am in my SQL thinking mode those terms don't really come to mind. Maybe part of the problem is that SQL doesn't have indexes and so my formal education never covered them. I learned how to use "CREATE INDEX" to meet most common needs but the fact that I'm getting a b-tree family index is well hidden.
While I'm all for learning the theory a more prescriptive approach (do this to get an index that will allow prefix LIKEs to use it - see this section for detail) to the topic would be welcome. Tell the user how to use an index when they are learning about the feature that they care about - LIKE - not require them to learn all about indexes and later realize/remember that one particular incantation will solve the LIKE problem.
David J.
>>GIST
I tried with a GIST-like index and queries improves a lot, thank you!
Furthermore, this type of index is also suitable for a timestamp query, where we can mix date and time parameters?
Thank you again!
/F
Thank you again!
/F
Da: David G. Johnston [david.g.johnston@gmail.com]
Inviato: venerdì 30 dicembre 2016 0.33
A: Job
Cc: pgsql-general@postgresql.org
Oggetto: Re: [GENERAL] Special index for "like"-based query
Hello,
in Postgresql 9.6 we have a query running on a very large table based, in some cases, on a like statement:
... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'
Which type of index can i create to speed to the search when the "like" case happens?
GIST
I tried to create a GIST/GIN index on a timestamp without time zone field but it does not work.
Are there alternatives index types or timezone could speed query up?
Thank you
Thank you
/F
Da: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] per conto di Job [Job@colliniconsulting.it]
Inviato: venerdì 30 dicembre 2016 10.55
A: David G. Johnston
Cc: pgsql-general@postgresql.org
Oggetto: R: [GENERAL] Special index for "like"-based query
>>GIST
I tried with a GIST-like index and queries improves a lot, thank you!
Furthermore, this type of index is also suitable for a timestamp query, where we can mix date and time parameters?
Thank you again!
/F
Thank you again!
/F
Da: David G. Johnston [david.g.johnston@gmail.com]
Inviato: venerdì 30 dicembre 2016 0.33
A: Job
Cc: pgsql-general@postgresql.org
Oggetto: Re: [GENERAL] Special index for "like"-based query
Hello,
in Postgresql 9.6 we have a query running on a very large table based, in some cases, on a like statement:
... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'
Which type of index can i create to speed to the search when the "like" case happens?
GIST
On Fri, Dec 30, 2016 at 11:00 AM, Job <Job@colliniconsulting.it> wrote: > I tried to create a GIST/GIN index on a timestamp without time zone field > but it does not work. > Are there alternatives index types or timezone could speed query up? Remember a timestamp is just a real number ( a point on the time line ) with some fancy formatting for I/O ( or you will suffer ). This menas when you have a ts column and want to query for a date it is usually better to do [ts>='2016-12-29' and ts<'2016-12-13'] than doing [cast(ts as date) = '2016-12-29'] ( similar to how a real number is better queried as [r>=1.0 and r<2.0] than [int(r)=1] ). Normally you get good results with btree indexes. And, basically, if you need help with some queries you could try posting them whole, even redacted, along the table defs, this way perople can see the problem and not invent one based on a partial description. I do not see any thing in common between 'like based query' and timestmap columns. Francisco Olarte.
>>And, basically, if you need help with some queries you could try >>posting them whole, even redacted, along the table defs, this way >>perople can see the problem and not invent one based on a partial >>description Thank you very much, very kind from you. The index applied on the timestamp field is a btree("timestamp") The query is: select domain, sum(accessi) as c_count from TABLE where action='1' AND profile IN ('PROFILE_CODE') AND timestamp::date BETWEEN'2016-12-27' AND '2016-12-30' AND timestamp::time BETWEEN '00:00:00' AND '23:59:59' GROUP BY domain ORDER BY c_countDESC LIMIT 101 The table format is: Column | Type | Modifiers -----------+--------------------------+-------------------------------------------------------------------------------- id | numeric(1000,1) | not null default function_get_next_sequence('webtraffic_archive_id_seq'::text) timestamp | timestamp with time zone | domain | character varying(255) | action | character varying(5) | profile | character varying | accessi | bigint | url | text | Indexes: "webtraffic_archive_day_2016_04_15_action_wbidx" btree (action) "webtraffic_archive_day_2016_04_15_domain_wbidx" btree (domain) "webtraffic_archive_day_2016_04_15_profile_wbidx" btree (profile) CLUSTER "webtraffic_archive_day_2016_04_15_timestamp_wbidx" btree ("timestamp") Last question: the table is partitioned. I need to manually create index for every sub-tables or there is a way to createon every sub-tables once? THANK YOU! /F
Hello > -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Job > Sent: Freitag, 30. Dezember 2016 11:42 > To: Francisco Olarte <folarte@peoplecall.com> > Cc: David G. Johnston <david.g.johnston@gmail.com>; pgsql-general@postgresql.org > Subject: R: [GENERAL] Special index for "like"-based query > > >>And, basically, if you need help with some queries you could try > >>posting them whole, even redacted, along the table defs, this way > >>perople can see the problem and not invent one based on a partial > >>description > > Thank you very much, very kind from you. > > The index applied on the timestamp field is a btree("timestamp") > > The query is: > > select domain, sum(accessi) as c_count from TABLE where action='1' AND profile IN ('PROFILE_CODE') AND > timestamp::date BETWEEN '2016-12-27' AND '2016-12-30' AND timestamp::time BETWEEN '00:00:00' AND '23:59:59' GROUP > BY domain ORDER BY c_count DESC LIMIT 101 > > The table format is: > Column | Type | Modifiers > -----------+--------------------------+----------------------------------------------------------------------------- > --- > id | numeric(1000,1) | not null default > function_get_next_sequence('webtraffic_archive_id_seq'::text) > timestamp | timestamp with time zone | > domain | character varying(255) | > action | character varying(5) | > profile | character varying | > accessi | bigint | > url | text | > > Indexes: > "webtraffic_archive_day_2016_04_15_action_wbidx" btree (action) > "webtraffic_archive_day_2016_04_15_domain_wbidx" btree (domain) > "webtraffic_archive_day_2016_04_15_profile_wbidx" btree (profile) CLUSTER > "webtraffic_archive_day_2016_04_15_timestamp_wbidx" btree ("timestamp") Is the index on timestamp used at all? The index on timestamp is built on timestamp, but you query using timestamp::date. You can check this using EXPLAIN. Bye Charles > > Last question: the table is partitioned. I need to manually create index for every sub-tables or there is a way to > create on every sub-tables once? > > THANK YOU! > /F > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
> On 30 Dec 2016, at 11:42, Job <Job@colliniconsulting.it> wrote: > >>> And, basically, if you need help with some queries you could try >>> posting them whole, even redacted, along the table defs, this way >>> perople can see the problem and not invent one based on a partial >>> description > > Thank you very much, very kind from you. > > The index applied on the timestamp field is a btree("timestamp") > > The query is: > > select domain, sum(accessi) as c_count from TABLE where action='1' AND profile IN ('PROFILE_CODE') AND timestamp::date BETWEEN '2016-12-27' AND '2016-12-30' AND timestamp::time BETWEEN '00:00:00' AND '23:59:59' GROUP BY domainORDER BY c_count DESC LIMIT 101 > > The table format is: > Column | Type | Modifiers > -----------+--------------------------+-------------------------------------------------------------------------------- > id | numeric(1000,1) | not null default function_get_next_sequence('webtraffic_archive_id_seq'::text) > timestamp | timestamp with time zone | > domain | character varying(255) | > action | character varying(5) | > profile | character varying | > accessi | bigint | > url | text | > > Indexes: > "webtraffic_archive_day_2016_04_15_action_wbidx" btree (action) > "webtraffic_archive_day_2016_04_15_domain_wbidx" btree (domain) > "webtraffic_archive_day_2016_04_15_profile_wbidx" btree (profile) CLUSTER > "webtraffic_archive_day_2016_04_15_timestamp_wbidx" btree ("timestamp") > > Last question: the table is partitioned. I need to manually create index for every sub-tables or there is a way to createon every sub-tables once? It's usually more efficient to cast the constants you're comparing to, than to cast a field value for each record in theset. The exception to that is when you have an index on the casted field. In your case, since you're casting to date and time separately, and whole days even, it's probably more efficient to combinethat into: … AND timestamp BETWEEN '2016-12-27 00:00:00'::timestamp with time zone AND '2016-12-30 23:59:59'::timestamp with time zone... But even then, you're excluding items that fall in the second between the end date and the next day. The new range typesare useful there, for example: … AND timestamp <@ '[2016-12-27 00:00:00, 2016-12-31 00:00:00)'::tsrange The above isn't entirely correct, as tsrange uses timestamp without time zone, but you get the gist. However, if those time ranges can have other values than '[00:00. 23:59]', then you probably need 2 indexes on that timestampcolumn; one cast to date and one to time. Otherwise, you end up creating timestamp range filters for each day inthe range in the query (which could still be the better approach). Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Dear Alban, Regarding: >>... AND timestamp BETWEEN '2016-12-27 00:00:00'::timestamp with time zone AND '2016-12-30 23:59:59'::timestamp with timezone ... I think it is a very good approach, and i would like to try. My table has got two different field for "starting" and "ending" timestamp values. Shall i create an index together with the two field (starting and ending) with the "tsrange" statement? Thank you again! /F
Hi Alban, I was wrong: i have only one column: tsrarnge. Which index can i create to use this statement fastly: ... AND timestamp <@ '[2016-12-27 00:00:00, 2016-12-31 00:00:00)'::tsrange Thank you again! /F
Job: On Fri, Dec 30, 2016 at 1:01 PM, Alban Hertroys <haramrae@gmail.com> wrote: >> On 30 Dec 2016, at 11:42, Job <Job@colliniconsulting.it> wrote: ... >> The index applied on the timestamp field is a btree("timestamp") ... >> select domain, sum(accessi) as c_count from TABLE where action='1' AND profile IN ('PROFILE_CODE') AND timestamp::date BETWEEN '2016-12-27' AND '2016-12-30' AND timestamp::time BETWEEN '00:00:00' AND '23:59:59' GROUP BY domainORDER BY c_count DESC LIMIT 101 ... >> Last question: the table is partitioned. I need to manually create index for every sub-tables or there is a way to createon every sub-tables once? I think Alban missed this ( or I missed his response ). Yes, you need to create the indexes for the partitions. When in doubt, create a partition and \d+ it. IIRC you can do 'create partition LIKE master INCLUDING indexes INHERITS(master)', but you'll have to test. Anyway, this is normally not too useful as the master table is normally indexless and kept empty. I normally script the partition creation, and I woill recommend doing that too. Now onto the BETWEEN PROBLEM: > It's usually more efficient to cast the constants you're comparing to, than to cast a field value for each record in theset. The exception to that is when you have an index on the casted field. > In your case, since you're casting to date and time separately, and whole days even, it's probably more efficient to combinethat into: > … AND timestamp BETWEEN '2016-12-27 00:00:00'::timestamp with time zone AND '2016-12-30 23:59:59'::timestamp with timezone ... Alban is correct here, in both counts. - DO NOT CAST THE COLUMNS, cast the constants ( unless you are building a specialized index, you can build an index in cast(timestamp as date), and it would be useful if you did a lot of queries ONLY ON DATES ). - DO NOT USE CLOSED INTERVALS for real number-like columns ( remember I told you timestamps are a point in the time line, so real-like ). The man problems strives from the fact tht you cannot cover the real line with non-overlapping CLOSED intervals, BETWEEN uses closed intervals and subtle problems permeate from this fact. Math is a harsh mistress. ( Even when working with integer-like numbers half-open intervals are normally the best way to go in the not so short term, but between seems so nice and natural and reads so well that even I use it where I should not ) > But even then, you're excluding items that fall in the second between the end date and the next day. The new range typesare useful there, for example: > > … AND timestamp <@ '[2016-12-27 00:00:00, 2016-12-31 00:00:00)'::tsrange And this is how it is solved with those new-fangled interval thingies ( I've been keying (ts>=xxx and ts <yyy), parens included, for so long that I never remember those, but they are nearly the same. Probably the optimizer splits it anyway. > The above isn't entirely correct, as tsrange uses timestamp without time zone, but you get the gist. Or use >=, < those work. > However, if those time ranges can have other values than '[00:00. 23:59]', then you probably need 2 indexes on that timestampcolumn; one cast to date and one to time. Otherwise, you end up creating timestamp range filters for each day inthe range in the query (which could still be the better approach). Even if they have other values, single index on timestamp column is the way to go if you only select single intervals. I mean, Xmas morning ( data between 25 and 25 and time between 8:00 and 12:59 can easiliy be selected by the interval [20161225T080000, 20161225T130000), but all the mornings in december can not ( although a query with ts>='20160101' and ts <'20170101' and ts:time >='08:00' and ts:time<'13:00' should work quite well, the first two condition guide to an index scan and the rest is done with a filtering ). Francisco Olarte.