Re: unoptimized nested loops - Mailing list pgsql-general

From David Rowley
Subject Re: unoptimized nested loops
Date
Msg-id CAApHDvqVUpDajMa31Yq-_o4+W7frGLp+6QQD5pJXxCP1oqTOVw@mail.gmail.com
Whole thread Raw
In response to unoptimized nested loops  (Tim Kelly <gtkelly@dialectronics.com>)
List pgsql-general
On Wed, 1 Jun 2022 at 08:04, Tim Kelly <gtkelly@dialectronics.com> wrote:
>           ->  Seq Scan on data  (cost=0.00..47132.93 rows=125 width=135)
> (actual time=0.542..182952.708 rows=1167810 loops=1)
>                 Filter: (data ~~ '%some text%'::text)

The problem is coming from the 125 row estimate in the above plan
fragment.  Because the number of estimated rows is low, the query
planner likely thinks a Nested Loop join is best.

What I'd do first is verify that some other join method is better by
running the query after having done:

SET enable_nestloop TO off;
<run query>
RESET enble_nestloop;

If the query then runs faster then it's going to be worth doing
something about trying to improve those statistics.

I see the like pattern matching selectivity estimation code does look
at histogram buckets, so you might have luck if you increase the
statistics targets on this column:

ALTER TABLE data ALTER COLUMN data SET STATISTICS 1000;
ANALYZE data;

The standard number of buckets is 100. The above will set it to 1000.
You can go as high as 10000, but going too high is going to slow down
the planner, so you should only go as high as you need to go.

David



pgsql-general by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Extended multivariate statistics are ignored (potentially related to high null fraction, not sure)
Next
From: Jeff Janes
Date:
Subject: Re: unoptimized nested loops