Thread: SELECT with LIKE clause makes full table scan
Hello, We face in a PostgreSQL 11.4 installation on a potent Linux host a serious performance degree. A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a column with an Index ignores this and does a full table scan: sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1) Workers Planned: 4 Workers Launched: 4 Buffers: shared hit=102040 read=560674 -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0 loops=5) Filter: (d01ort ~~ 'Z 9610%'::text) Rows Removed by Filter: 1055853 Buffers: shared hit=102040 read=560674 Planning Time: 2.028 ms Execution Time: 1349.593 ms (10 Zeilen) Why is this (ignoring the Index) and what could be done? Thanks matthias -- Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub
st 26. 1. 2022 v 11:55 odesílatel Matthias Apitz <guru@unixarea.de> napsal: > > > Hello, > > We face in a PostgreSQL 11.4 installation on a potent Linux host a > serious performance degree. > > A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a > column with an Index ignores this and does a full table scan: Which index is ignored? Can you share the CREATE INDEX command as well? > sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------- > Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1) > Workers Planned: 4 > Workers Launched: 4 > Buffers: shared hit=102040 read=560674 > -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0 loops=5) > Filter: (d01ort ~~ 'Z 9610%'::text) > Rows Removed by Filter: 1055853 > Buffers: shared hit=102040 read=560674 > Planning Time: 2.028 ms > Execution Time: 1349.593 ms > (10 Zeilen) > > Why is this (ignoring the Index) and what could be done? > > Thanks > > matthias > > > -- > Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045 > Public GnuPG key: http://www.unixarea.de/key.pub > >
El día miércoles, enero 26, 2022 a las 12:20:08 +0100, Josef Šimánek escribió: > st 26. 1. 2022 v 11:55 odesílatel Matthias Apitz <guru@unixarea.de> napsal: > > > > > > Hello, > > > > We face in a PostgreSQL 11.4 installation on a potent Linux host a > > serious performance degree. > > > > A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a > > column with an Index ignores this and does a full table scan: > > Which index is ignored? Can you share the CREATE INDEX command as well? /* # $Revision: 1.1.2.21 $ */ create UNIQUE INDEX d01sig on d01buch(d01gsi,d01ex) ; /* alter table d01buch add constraint d01sig unique (d01gsi,d01ex) deferrable initially deferred; *//* D01SIG */ create INDEX d01mcopyno on d01buch(d01mcopyno) ;/* D01MCOPYNO */ create INDEX d01bnr on d01buch(d01bnr) ;/* D01BNR */ create INDEX d01ort on d01buch(d01ort) ;/* D01ORT */ create INDEX d01mcopynozweig on d01buch(d01mcopyno,d01zweig) ;/* D01KATZWEIG */ create INDEX d01ort2 on d01buch(d01ort2) ;/* D02ORT2 */ create INDEX d01aufnahme on d01buch(d01aufnahme) ;/* D01aufnahme */ create INDEX d01titlecatkey on d01buch(d01titlecatkey) ;/* D01TITLECATKEY */ create INDEX d01invkrnr on d01buch(d01invkreis,d01invnr) ;/* D01invkrnr */ matthias > > sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ; > > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------------------------- > > Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1) > > Workers Planned: 4 > > Workers Launched: 4 > > Buffers: shared hit=102040 read=560674 > > -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0 loops=5) > > Filter: (d01ort ~~ 'Z 9610%'::text) > > Rows Removed by Filter: 1055853 > > Buffers: shared hit=102040 read=560674 > > Planning Time: 2.028 ms > > Execution Time: 1349.593 ms > > (10 Zeilen) > > > > Why is this (ignoring the Index) and what could be done? > > > > Thanks > > > > matthias > > > > > > -- > > Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045 > > Public GnuPG key: http://www.unixarea.de/key.pub > > > > -- Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub
On Wed, Jan 26, 2022 at 11:55 AM Matthias Apitz <guru@unixarea.de> wrote: > A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a > column with an Index ignores this and does a full table scan: > > sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ; > -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952)... > Filter: (d01ort ~~ 'Z 9610%'::text) Could it be a collation issue? They need to match between the query and the index typically. See below how an index-scan transforms into a sequential-scan just from using ILIKE instead of LIKE. Just to illustrate how collations affect plans for prefix-like queries. Show the relevant DDL for the `d01ort` column, and its indexes, and that should help diagnose this. Perhaps it could be related to statistics too? --DD ``` ddevienne=> explain select count(*) from pg_class where relname like 'PNS%'; -> Index Only Scan using pg_class_relname_nsp_index on pg_class (cost=0.41..8.44 rows=5 width=0) Index Cond: ((relname >= 'PNS'::text) AND (relname < 'PNT'::text)) Filter: (relname ~~ 'PNS%'::text) Time: 1.647 ms ddevienne=> explain select count(*) from pg_class where relname ilike 'PNS%'; Aggregate (cost=2682.35..2682.36 rows=1 width=8) -> Seq Scan on pg_class (cost=0.00..2682.34 rows=5 width=0) Filter: (relname ~~* 'PNS%'::text) Time: 1.262 ms ```
Hi, On Wed, Jan 26, 2022 at 12:39:25PM +0100, Matthias Apitz wrote: > > > > sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ; > > > QUERY PLAN > > > ------------------------------------------------------------------------------------------------------------------------------- > > > Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1) > > > Workers Planned: 4 > > > Workers Launched: 4 > > > Buffers: shared hit=102040 read=560674 > > > -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0 loops=5) > > > Filter: (d01ort ~~ 'Z 9610%'::text) > > > Rows Removed by Filter: 1055853 > > > Buffers: shared hit=102040 read=560674 > > > Planning Time: 2.028 ms > > > Execution Time: 1349.593 ms > > > (10 Zeilen) > > > > > > Why is this (ignoring the Index) and what could be done? > [...] > create INDEX d01ort on d01buch(d01ort) ;/* D01ORT */ That index can't be used with a LIKE that has a trailing wildcard. You need to either create an index with text_pattern_ops operator class (see https://www.postgresql.org/docs/current/indexes-opclass.html), or a GIN index using pg_trgm (which will also work with non-trailing wildcards), see https://www.postgresql.org/docs/current/pgtrgm.html.
On Wed, Jan 26, 2022 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > Why is this (ignoring the Index) and what could be done? > > [...] > > create INDEX d01ort on d01buch(d01ort) ;/* D01ORT */ > > That index can't be used with a LIKE that has a trailing wildcard. Really? That seems to contradict the doc, i.e. default index type is B-Tree, which definitely supports trailing wildcard LIKE-predicates, as explicitly stated in said doc: https://www.postgresql.org/docs/14/indexes-types.html#INDEXES-TYPES-BTREE So what makes you say that? --DD PS: That also contradicts the small demo I made earlier up-thread: ``` ddevienne=> \d+ pg_class Indexes: "pg_class_oid_index" UNIQUE, btree (oid) "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace) <<<<<<<<< "pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode) ``` Whether the index is UNIQUE or not does not matter in this case.
Hi, On Wed, Jan 26, 2022 at 02:34:21PM +0100, Dominique Devienne wrote: > On Wed, Jan 26, 2022 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > Why is this (ignoring the Index) and what could be done? > > > [...] > > > create INDEX d01ort on d01buch(d01ort) ;/* D01ORT */ > > > > That index can't be used with a LIKE that has a trailing wildcard. > > Really? That seems to contradict the doc, i.e. default index type is B-Tree, > which definitely supports trailing wildcard LIKE-predicates, as > explicitly stated in said doc: > https://www.postgresql.org/docs/14/indexes-types.html#INDEXES-TYPES-BTREE > > So what makes you say that? --DD This part of the documentation you mentioned: > However, if your database does not use the C locale you will need to create > the index with a special operator class to support indexing of > pattern-matching queries; see Section 11.10 below. > > PS: That also contradicts the small demo I made earlier up-thread: relname datatype is name, which has a default C collation, so you are in the only case that natively work for btree indexes: # select unnest(indcollation)::regcollation from pg_index where indexrelid = 'pg_class_relname_nsp_index'::regclass; unnest -------- "C" - (2 rows) I highly doubt that OP tables are also using C collation, so almost no one does that.
> We face in a PostgreSQL 11.4 installation on a potent Linux host a
> ...
> Why is this (ignoring the Index) and what could be done?
> Why is this (ignoring the Index) and what could be done?
IMHO: 11.4 is very old. ( Release date: 2019-06-20 ) and missing a lot of patches.
The latest patch release is 11.14 ( see https://www.postgresql.org/docs/11/release.html )
You can test the PG11.14 with the PostgreSQL docker image ( https://hub.docker.com/_/postgres )
- docker pull postgres:11.14-bullseye
- import table ( d01buch )
- create indexes
- test your query
Regards,
Imre
Matthias Apitz <guru@unixarea.de> ezt írta (időpont: 2022. jan. 26., Sze, 11:55):
Hello,
We face in a PostgreSQL 11.4 installation on a potent Linux host a
serious performance degree.
A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a
column with an Index ignores this and does a full table scan:
sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=102040 read=560674
-> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0 loops=5)
Filter: (d01ort ~~ 'Z 9610%'::text)
Rows Removed by Filter: 1055853
Buffers: shared hit=102040 read=560674
Planning Time: 2.028 ms
Execution Time: 1349.593 ms
(10 Zeilen)
Why is this (ignoring the Index) and what could be done?
Thanks
matthias
--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
Julien Rouhaud <rjuju123@gmail.com> writes: > On Wed, Jan 26, 2022 at 02:34:21PM +0100, Dominique Devienne wrote: >> On Wed, Jan 26, 2022 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote: >>> That index can't be used with a LIKE that has a trailing wildcard. >> So what makes you say that? --DD > This part of the documentation you mentioned: >> However, if your database does not use the C locale you will need to create >> the index with a special operator class to support indexing of >> pattern-matching queries; see Section 11.10 below. Note that declaring the index with C collation should also work, and might be preferable to using the pattern_ops opclass. C collation has at least some chance of being used explicitly in queries, whereas a pattern_ops index is basically never going to match anything but LIKE/regex searches. regards, tom lane
We changed two relevant Indexes to CREATE INDEX d01ort ON d01buch(d01ort bpchar_pattern_ops ); CREATE INDEX d01ort2 ON d01buch(d01ort2 bpchar_pattern_ops ); and now the same queries are fast. We're looking through our code for more such LIKE clauses on VCHAR columns. Thanks for all the hints matthias -- Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub
Hi, On Wed, Jan 26, 2022 at 11:07 PM Matthias Apitz <guru@unixarea.de> wrote: > > We changed two relevant Indexes to > > CREATE INDEX d01ort ON d01buch(d01ort bpchar_pattern_ops ); > CREATE INDEX d01ort2 ON d01buch(d01ort2 bpchar_pattern_ops ); When you said changed, did you drop the previous ones? As Tom mentioned, those indexes are specialized and are only useful for LIKE 'something%' queries. It's quite likely that your existing indexes were useful for other queries, which may not be as fast without those indexes. You can check in pg_stat_user_indexes if your indexes seems to be used before actually dropping them for instance: https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW
El día miércoles, enero 26, 2022 a las 11:21:12p. m. +0800, Julien Rouhaud escribió: > Hi, > > On Wed, Jan 26, 2022 at 11:07 PM Matthias Apitz <guru@unixarea.de> wrote: > > > > We changed two relevant Indexes to > > > > CREATE INDEX d01ort ON d01buch(d01ort bpchar_pattern_ops ); > > CREATE INDEX d01ort2 ON d01buch(d01ort2 bpchar_pattern_ops ); > > When you said changed, did you drop the previous ones? Yes, of course. > As Tom > mentioned, those indexes are specialized and are only useful for LIKE > 'something%' queries. It's quite likely that your existing indexes > were useful for other queries, which may not be as fast without those > indexes. You can check in pg_stat_user_indexes if your indexes seems > to be used before actually dropping them for instance: > https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW Thanks, we're still investigating more cases with LIKE clause and will consider your hint. matthias -- Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub