Home > mailing lists

Re: unoptimized nested loops - Mailing list pgsql-general

From	David Rowley
Subject	Re: unoptimized nested loops
Date	June 2, 2022 00:10:07
Msg-id	CAApHDvqVUpDajMa31Yq-_o4+W7frGLp+6QQD5pJXxCP1oqTOVw@mail.gmail.com Whole thread Raw
In response to	unoptimized nested loops (Tim Kelly <gtkelly@dialectronics.com>)
List	pgsql-general

Tree view

On Wed, 1 Jun 2022 at 08:04, Tim Kelly <gtkelly@dialectronics.com> wrote:
>           ->  Seq Scan on data  (cost=0.00..47132.93 rows=125 width=135)
> (actual time=0.542..182952.708 rows=1167810 loops=1)
>                 Filter: (data ~~ '%some text%'::text)

The problem is coming from the 125 row estimate in the above plan
fragment.  Because the number of estimated rows is low, the query
planner likely thinks a Nested Loop join is best.

What I'd do first is verify that some other join method is better by
running the query after having done:

SET enable_nestloop TO off;
<run query>
RESET enble_nestloop;

If the query then runs faster then it's going to be worth doing
something about trying to improve those statistics.

I see the like pattern matching selectivity estimation code does look
at histogram buckets, so you might have luck if you increase the
statistics targets on this column:

ALTER TABLE data ALTER COLUMN data SET STATISTICS 1000;
ANALYZE data;

The standard number of buckets is 100. The above will set it to 1000.
You can go as high as 10000, but going too high is going to slow down
the planner, so you should only go as high as you need to go.

David

pgsql-general by date:

From: Bruce Momjian
Date: 01 June 2022, 23:08:38
Subject: Re: Extended multivariate statistics are ignored (potentially related to high null fraction, not sure)

From: Jeff Janes
Date: 02 June 2022, 07:05:46
Subject: Re: unoptimized nested loops

Re: unoptimized nested loops - Mailing list pgsql-general

Previous

Next