Re: [GENERAL] Slow query plan used - Mailing list pgsql-general

From Tom Lane
Subject Re: [GENERAL] Slow query plan used
Date
Msg-id 5312.1496335302@sss.pgh.pa.us
Whole thread Raw
In response to Re: [GENERAL] Slow query plan used  (Bill Moran <wmoran@potentialtech.com>)
List pgsql-general
Bill Moran <wmoran@potentialtech.com> writes:
> LIKE queries are probably challenging to plan, especially when they're not
> left-anchored: how can the planner be reasonalbly expected to estimate how
> many rows will be matched by a given LIKE expression.

Yeah, especially without any statistics.  The core problem here appears
to be the poor rowcount estimate for the LIKE:

              ->  Seq Scan on c3k_document_index  (cost=0.00..15160.48 rows=1063 width=285) (actual
time=0.206..13539.353rows=33022 loops=1) 
                    Filter: ((lower((searchfield1)::text) ~~ '%sehr%'::text) OR (lower(searchfield8) ~~
'%sehr%'::text))
                    Rows Removed by Filter: 33832

That's off by a factor of 30.

> Not having looked at the code, I would guess that the length of the LIKE
> expression will make the planner assume that the match is more restrictive,
> while many % and _ in the LIKE expression make the planner assume that the
> match is less restrictive.

Exactly.  It's not terribly bright, but the extra %'s make it think that
the pattern is less selective, producing a higher rowcount estimate that
--- by chance -- better matches reality.  Then you get better decisions
about how to shape the rest of the plan.

You might get some traction by creating indexes on lower(searchfield1)
etc.  This isn't even necessarily with an expectation that the planner
would use those indexes in the plan ... but what it would do is make
use of the statistics that ANALYZE will accumulate about the indexed
expressions.  I think that would give you better estimates about the
LIKE rowcounts.  You might have to crank up the statistics target for
those indexes if the default isn't enough to make the estimates
significantly better.  (Obviously, don't forget to re-ANALYZE before
checking results.)

> That in mind, let me throw pg_trgm into the mix of things to try:
> https://www.postgresql.org/docs/current/static/pgtrgm.html
> The trigram module allows you to create indexes that LIKE can use
> to do index searches instead of always having to do sequential scans
> or push the LIKE matching to another part of the plan tree.

For cases like the above, where half the table is getting selected,
I would not expect the planner to use an index anyway.  Any old index
on lower(searchfieldN) will work in terms of inducing ANALYZE to
collect stats.  But if you have other queries with more selective
LIKE patterns then maybe a trigram index in particular is worthwhile.

            regards, tom lane


pgsql-general by date:

Previous
From: Torsten Förtsch
Date:
Subject: Re: [GENERAL] Rounding Double Precision or Numeric
Next
From: Steve Atkins
Date:
Subject: Re: [GENERAL] Rounding Double Precision or Numeric