Thread: SELECT with LIKE clause makes full table scan

SELECT with LIKE clause makes full table scan

From
Matthias Apitz
Date:
Hello,

We face in a PostgreSQL 11.4 installation on a potent Linux host a
serious performance degree.

A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a
column with an Index ignores this and does a full table scan:

sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=102040 read=560674
-> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0
loops=5)
Filter: (d01ort ~~ 'Z 9610%'::text)
Rows Removed by Filter: 1055853
Buffers: shared hit=102040 read=560674
Planning Time: 2.028 ms
Execution Time: 1349.593 ms
(10 Zeilen)

Why is this (ignoring the Index) and what could be done?

Thanks

    matthias


-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: SELECT with LIKE clause makes full table scan

From
Josef Šimánek
Date:
st 26. 1. 2022 v 11:55 odesílatel Matthias Apitz <guru@unixarea.de> napsal:
>
>
> Hello,
>
> We face in a PostgreSQL 11.4 installation on a potent Linux host a
> serious performance degree.
>
> A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a
> column with an Index ignores this and does a full table scan:

Which index is ignored? Can you share the CREATE INDEX command as well?

> sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ;
> QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------
> Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1)
> Workers Planned: 4
> Workers Launched: 4
> Buffers: shared hit=102040 read=560674
> -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0
loops=5)
> Filter: (d01ort ~~ 'Z 9610%'::text)
> Rows Removed by Filter: 1055853
> Buffers: shared hit=102040 read=560674
> Planning Time: 2.028 ms
> Execution Time: 1349.593 ms
> (10 Zeilen)
>
> Why is this (ignoring the Index) and what could be done?
>
> Thanks
>
>         matthias
>
>
> --
> Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
> Public GnuPG key: http://www.unixarea.de/key.pub
>
>



Re: SELECT with LIKE clause makes full table scan

From
Matthias Apitz
Date:
El día miércoles, enero 26, 2022 a las 12:20:08 +0100, Josef Šimánek escribió:

> st 26. 1. 2022 v 11:55 odesílatel Matthias Apitz <guru@unixarea.de> napsal:
> >
> >
> > Hello,
> >
> > We face in a PostgreSQL 11.4 installation on a potent Linux host a
> > serious performance degree.
> >
> > A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a
> > column with an Index ignores this and does a full table scan:
> 
> Which index is ignored? Can you share the CREATE INDEX command as well?


/* #   $Revision: 1.1.2.21 $ */
create UNIQUE INDEX d01sig on d01buch(d01gsi,d01ex)  ;
/* alter table d01buch add constraint d01sig unique (d01gsi,d01ex) 
deferrable initially deferred;  *//*  D01SIG   */
create INDEX d01mcopyno on d01buch(d01mcopyno)  ;/*  D01MCOPYNO  */
create INDEX d01bnr on d01buch(d01bnr)  ;/*  D01BNR   */
create INDEX d01ort on d01buch(d01ort)  ;/*  D01ORT    */
create INDEX d01mcopynozweig on d01buch(d01mcopyno,d01zweig)  ;/*  D01KATZWEIG   */
create INDEX d01ort2 on d01buch(d01ort2)  ;/*  D02ORT2  */
create INDEX d01aufnahme on d01buch(d01aufnahme)  ;/*  D01aufnahme  */
create INDEX d01titlecatkey on d01buch(d01titlecatkey)  ;/*  D01TITLECATKEY  */
create INDEX d01invkrnr on d01buch(d01invkreis,d01invnr)  ;/*  D01invkrnr  */

    matthias


> > sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ;
> > QUERY PLAN
> >
-------------------------------------------------------------------------------------------------------------------------------
> > Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1)
> > Workers Planned: 4
> > Workers Launched: 4
> > Buffers: shared hit=102040 read=560674
> > -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0
loops=5)
> > Filter: (d01ort ~~ 'Z 9610%'::text)
> > Rows Removed by Filter: 1055853
> > Buffers: shared hit=102040 read=560674
> > Planning Time: 2.028 ms
> > Execution Time: 1349.593 ms
> > (10 Zeilen)
> >
> > Why is this (ignoring the Index) and what could be done?
> >
> > Thanks
> >
> >         matthias
> >
> >
> > --
> > Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
> > Public GnuPG key: http://www.unixarea.de/key.pub
> >
> >

-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: SELECT with LIKE clause makes full table scan

From
Dominique Devienne
Date:
On Wed, Jan 26, 2022 at 11:55 AM Matthias Apitz <guru@unixarea.de> wrote:
> A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a
> column with an Index ignores this and does a full table scan:
>
> sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ;
> -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952)...
> Filter: (d01ort ~~ 'Z 9610%'::text)

Could it be a collation issue? They need to match between the query
and the index typically.

See below how an index-scan transforms into a sequential-scan just
from using ILIKE instead of LIKE.
Just to illustrate how collations affect plans for prefix-like
queries. Show the relevant DDL for the `d01ort` column,
and its indexes, and that should help diagnose this.

Perhaps it could be related to statistics too? --DD

```
ddevienne=> explain select count(*) from pg_class where relname like 'PNS%';
   ->  Index Only Scan using pg_class_relname_nsp_index on pg_class
(cost=0.41..8.44 rows=5 width=0)
         Index Cond: ((relname >= 'PNS'::text) AND (relname < 'PNT'::text))
         Filter: (relname ~~ 'PNS%'::text)
Time: 1.647 ms

ddevienne=> explain select count(*) from pg_class where relname ilike 'PNS%';
 Aggregate  (cost=2682.35..2682.36 rows=1 width=8)
   ->  Seq Scan on pg_class  (cost=0.00..2682.34 rows=5 width=0)
         Filter: (relname ~~* 'PNS%'::text)
Time: 1.262 ms
```



Re: SELECT with LIKE clause makes full table scan

From
Julien Rouhaud
Date:
Hi,

On Wed, Jan 26, 2022 at 12:39:25PM +0100, Matthias Apitz wrote:
> 
> > > sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ;
> > > QUERY PLAN
> > >
-------------------------------------------------------------------------------------------------------------------------------
> > > Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1)
> > > Workers Planned: 4
> > > Workers Launched: 4
> > > Buffers: shared hit=102040 read=560674
> > > -> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0
loops=5)
> > > Filter: (d01ort ~~ 'Z 9610%'::text)
> > > Rows Removed by Filter: 1055853
> > > Buffers: shared hit=102040 read=560674
> > > Planning Time: 2.028 ms
> > > Execution Time: 1349.593 ms
> > > (10 Zeilen)
> > >
> > > Why is this (ignoring the Index) and what could be done?
> [...]
> create INDEX d01ort on d01buch(d01ort)  ;/*  D01ORT    */

That index can't be used with a LIKE that has a trailing wildcard.  You need to
either create an index with text_pattern_ops operator class (see
https://www.postgresql.org/docs/current/indexes-opclass.html), or a GIN index
using pg_trgm (which will also work with non-trailing wildcards), see
https://www.postgresql.org/docs/current/pgtrgm.html.



Re: SELECT with LIKE clause makes full table scan

From
Dominique Devienne
Date:
On Wed, Jan 26, 2022 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > > Why is this (ignoring the Index) and what could be done?
> > [...]
> > create INDEX d01ort on d01buch(d01ort)  ;/*  D01ORT    */
>
> That index can't be used with a LIKE that has a trailing wildcard.

Really? That seems to contradict the doc, i.e. default index type is B-Tree,
which definitely supports trailing wildcard LIKE-predicates, as
explicitly stated in said doc:
https://www.postgresql.org/docs/14/indexes-types.html#INDEXES-TYPES-BTREE

So what makes you say that? --DD

PS: That also contradicts the small demo I made earlier up-thread:
```
ddevienne=> \d+ pg_class
Indexes:
    "pg_class_oid_index" UNIQUE, btree (oid)
    "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace) <<<<<<<<<
    "pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)
```
Whether the index is UNIQUE or not does not matter in this case.



Re: SELECT with LIKE clause makes full table scan

From
Julien Rouhaud
Date:
Hi,

On Wed, Jan 26, 2022 at 02:34:21PM +0100, Dominique Devienne wrote:
> On Wed, Jan 26, 2022 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > > > Why is this (ignoring the Index) and what could be done?
> > > [...]
> > > create INDEX d01ort on d01buch(d01ort)  ;/*  D01ORT    */
> >
> > That index can't be used with a LIKE that has a trailing wildcard.
> 
> Really? That seems to contradict the doc, i.e. default index type is B-Tree,
> which definitely supports trailing wildcard LIKE-predicates, as
> explicitly stated in said doc:
> https://www.postgresql.org/docs/14/indexes-types.html#INDEXES-TYPES-BTREE
> 
> So what makes you say that? --DD

This part of the documentation you mentioned:

> However, if your database does not use the C locale you will need to create
> the index with a special operator class to support indexing of
> pattern-matching queries; see Section 11.10 below.
> 
> PS: That also contradicts the small demo I made earlier up-thread:

relname datatype is name, which has a default C collation, so you are in the
only case that natively work for btree indexes:

# select unnest(indcollation)::regcollation from pg_index where indexrelid = 'pg_class_relname_nsp_index'::regclass;
 unnest
--------
 "C"
 -
(2 rows)

I highly doubt that OP tables are also using C collation, so almost no one
does that.



Re: SELECT with LIKE clause makes full table scan

From
Imre Samu
Date:
> We face in a PostgreSQL 11.4 installation on a potent Linux host a
> ...
> Why is this (ignoring the Index) and what could be done?

IMHO: 11.4 is very old.  ( Release date: 2019-06-20 ) and missing a lot of patches.
The latest patch release is 11.14  ( see https://www.postgresql.org/docs/11/release.html )

You can test the PG11.14 with the PostgreSQL docker image ( https://hub.docker.com/_/postgres )
- docker pull postgres:11.14-bullseye
- import table ( d01buch )
- create indexes
- test your query

Regards,
 Imre

Matthias Apitz <guru@unixarea.de> ezt írta (időpont: 2022. jan. 26., Sze, 11:55):

Hello,

We face in a PostgreSQL 11.4 installation on a potent Linux host a
serious performance degree.

A SELECT with a LIKE clause on a table with ca. 5.200.000 rows on a
column with an Index ignores this and does a full table scan:

sisis=# explain (analyze, buffers) select * from d01buch where d01ort like 'Z 9610%' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..680262.71 rows=510 width=952) (actual time=1324.096..1349.429 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=102040 read=560674
-> Parallel Seq Scan on d01buch (cost=0.00..679211.71 rows=128 width=952) (actual time=1117.663..1315.062 rows=0 loops=5)
Filter: (d01ort ~~ 'Z 9610%'::text)
Rows Removed by Filter: 1055853
Buffers: shared hit=102040 read=560674
Planning Time: 2.028 ms
Execution Time: 1349.593 ms
(10 Zeilen)

Why is this (ignoring the Index) and what could be done?

Thanks

        matthias


--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub


Re: SELECT with LIKE clause makes full table scan

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Wed, Jan 26, 2022 at 02:34:21PM +0100, Dominique Devienne wrote:
>> On Wed, Jan 26, 2022 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>>> That index can't be used with a LIKE that has a trailing wildcard.

>> So what makes you say that? --DD

> This part of the documentation you mentioned:

>> However, if your database does not use the C locale you will need to create
>> the index with a special operator class to support indexing of
>> pattern-matching queries; see Section 11.10 below.

Note that declaring the index with C collation should also work,
and might be preferable to using the pattern_ops opclass.
C collation has at least some chance of being used explicitly
in queries, whereas a pattern_ops index is basically never going
to match anything but LIKE/regex searches.

            regards, tom lane



Re: SELECT with LIKE clause makes full table scan

From
Matthias Apitz
Date:
We changed two relevant Indexes to

CREATE INDEX d01ort ON d01buch(d01ort bpchar_pattern_ops );
CREATE INDEX d01ort2 ON d01buch(d01ort2 bpchar_pattern_ops );

and now the same queries are fast. We're looking through our code for
more such LIKE clauses on VCHAR columns.

Thanks for all the hints

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: SELECT with LIKE clause makes full table scan

From
Julien Rouhaud
Date:
Hi,

On Wed, Jan 26, 2022 at 11:07 PM Matthias Apitz <guru@unixarea.de> wrote:
>
> We changed two relevant Indexes to
>
> CREATE INDEX d01ort ON d01buch(d01ort bpchar_pattern_ops );
> CREATE INDEX d01ort2 ON d01buch(d01ort2 bpchar_pattern_ops );

When you said changed, did you drop the previous ones?  As Tom
mentioned, those indexes are specialized and are only useful for LIKE
'something%' queries.  It's quite likely that your existing indexes
were useful for other queries, which may not be as fast without those
indexes.  You can check in pg_stat_user_indexes if your indexes seems
to be used before actually dropping them for instance:
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW



Re: SELECT with LIKE clause makes full table scan

From
Matthias Apitz
Date:
El día miércoles, enero 26, 2022 a las 11:21:12p. m. +0800, Julien Rouhaud escribió:

> Hi,
> 
> On Wed, Jan 26, 2022 at 11:07 PM Matthias Apitz <guru@unixarea.de> wrote:
> >
> > We changed two relevant Indexes to
> >
> > CREATE INDEX d01ort ON d01buch(d01ort bpchar_pattern_ops );
> > CREATE INDEX d01ort2 ON d01buch(d01ort2 bpchar_pattern_ops );
> 
> When you said changed, did you drop the previous ones?  

Yes, of course.

> As Tom
> mentioned, those indexes are specialized and are only useful for LIKE
> 'something%' queries.  It's quite likely that your existing indexes
> were useful for other queries, which may not be as fast without those
> indexes.  You can check in pg_stat_user_indexes if your indexes seems
> to be used before actually dropping them for instance:
> https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW

Thanks, we're still investigating more cases with LIKE clause and will
consider your hint.

    matthias


-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub